lz78

LZ78 is a lossless compression algorithm by Abraham Lempel and Jacob Ziv in 1978. The compression algorithm goes as follows:

At the end of the algorithm, the output should be a list of ID-character pair. The decompression algorithm is thus as follows:

A simple implementation in Python is thus as follows:

def encode(x):
    res = []
    d = {'': 0}
    gensym = 1
    p = ''
    for ch in x:
        if p+ch in d:
            p += ch
        else:
            res.append((d[p], ch))
            d[p+ch] = gensym
            gensym += 1
            p = ''
    if p:
        res.append((d[p], ''))
    return res

def decode(x):
    res = ''
    d = {0:''}
    gensym = 1
    for c, ch in x:
        res += d[c]+ch
        d[gensym] = d[c]+ch
        gensym += 1
    return res

On the Wikipedia page LZ78 doesn't even get a pseudocode part. Some people are truly over-estimating their ability of explaining things properly.

A few observations can be made:

A simpler variant named LZW exists; it's simpler encoding-wise and much more popular.


2024.8.17

Back