running-length-encoding

Running-length encoding, or RLE, is a very simple kind of encoding/compression method. The idea is to convert data into datum-count pairs. For example the string "AAABBBBCC" would become something like "3A4B2C", and the string "ABC" would become "1A1B1C".

One could easily see the cases where one would benifit from this are those where the data contains lots of repeating "single-width" patterns; "single-width", as in the width of the repeating pattern is of the width of a single datum. (For example: the string "ABABABABABABABAB" (16 characters) would become something like "8AB" (3 characters) when the width of a single datum is two characters but will become "1A1B" repeated 8 times (32 characters) when the width of a single datum is one character.) For this reason, it was (and probably still is) very prominent in encoding bitmap textures and images, where one would often have big swashes of single color.

Encoding

Encoding problem (that are similar to those in other compression method, e.g. LZ77) exists. First of all, it obviously would be a waste to encode every single-width datum as "1 + datum" (which, in the worst case, would lead to 1/W percent extra cost, where W is the width of a single datum). A common way is to have two commands of forms (LIT, length, data) and (RLE, length, datum) respectively.

Consider this example encoding scheme of RLE:

Command Width (Bytes) Format Semantics
LIT 1 + n 0xxxxxxx … Encode a literal sequence of length 1~128 (denoted with x). The data follows after the first byte.
RLE 1 + 1 1xxxxxxx b Encode a repeating sequence of length 1~128 (denoted with x) of the byte b.

It has the following properties:

  • In the worst case it introduces 1/M percent extra cost than original, where M is the maximum length allowed in an LIT command.
  • It doesn't make sense to encode repeating sequence of length equal or below 3, since the introduction of an RLE command introduces 2 or 2+1 bytes of cost (2 for the RLE command, and 1 for the beginning of the possible next LIT command).

2024.8.17

Back