running-length-encoding
Running-length encoding, or RLE, is a very simple kind of encoding/compression method. The idea is to convert data into datum-count pairs. For example the string "AAABBBBCC" would become something like "3A4B2C", and the string "ABC" would become "1A1B1C".
One could easily see the cases where one would benifit from this are those where the data contains lots of repeating "single-width" patterns; "single-width", as in the width of the repeating pattern is of the width of a single datum. (For example: the string "ABABABABABABABAB" (16 characters) would become something like "8AB" (3 characters) when the width of a single datum is two characters but will become "1A1B" repeated 8 times (32 characters) when the width of a single datum is one character.) For this reason, it was (and probably still is) very prominent in encoding bitmap textures and images, where one would often have big swashes of single color.
Encoding
Encoding problem (that are similar to those in other compression method, e.g. LZ77) exists. First of all, it obviously would be a waste to encode every single-width datum as "1 + datum" (which, in the worst case, would lead to 1/W
percent extra cost, where W
is the width of a single datum). A common way is to have two commands of forms (LIT, length, data)
and (RLE, length, datum)
respectively.
Consider this example encoding scheme of RLE:
Command | Width (Bytes) | Format | Semantics |
---|---|---|---|
LIT | 1 + n | 0xxxxxxx … | Encode a literal sequence of length 1~128 (denoted with x). The data follows after the first byte. |
RLE | 1 + 1 | 1xxxxxxx b | Encode a repeating sequence of length 1~128 (denoted with x) of the byte b . |
It has the following properties:
- In the worst case it introduces
1/M
percent extra cost than original, whereM
is the maximum length allowed in anLIT
command. - It doesn't make sense to encode repeating sequence of length equal or below 3, since the introduction of an
RLE
command introduces 2 or 2+1 bytes of cost (2 for theRLE
command, and 1 for the beginning of the possible nextLIT
command).
2024.8.17