# Range Coding

2021-05-25

## Introduction

Range coding is an entropy coding technique where a large number is divided into ranges, one per symbol, with the size of each range proportional to the probability of encountering that symbol. After encoding the first symbol this way, the remaining range is subdivided into subranges to allow for encoding the next symbol, and so on. Because the size of the subranges decreases more slowly for common symbols than for uncommon ones, more symbols can be encoded in a number of a given magnitude, thus requiring fewer bits per symbol on average.

## Range Coding Example

Although range coded data can conceptually be thought of as a single large number which could take millions of bits to represent, practical range coders work on small numbers that computers can easily process.

As a simple example, consider two symbols, which we will call *a* and *b*. If the probability of encountering an *a* is 3 times the probability of encountering a *b*, we can divide the range like so:

```
+-----------+---+
|     a     | b |
+-----------+---+
```

Putting some numbers on this, we might say that values 0 through 191 represent *a*, and values 192 through 255 represent *b*. This requires 8 bits to represent; more than enough to represent a single *a* or *b* choice, which could have been done in a single bit. However, we can use the remaining code space to encode additional symbols. If the first symbol was *a*, we can use values 0 through 191 to encode the second symbol and still have the first symbol decode correctly. So, for the second symbol, we can use values 0 through 143 to represent *a*, and 144 through 191 to represent *b*. The following table shows the ranges for the first three symbols:

| first | range | second | range | third | range |
|---|---|---|---|---|---|
| a | [0, 191] | a | [0, 143] | a | [0, 107] |
| | | | | b | [108, 143] |
| | | b | [144, 191] | a | [144, 179] |
| | | | | b | [180, 191] |
| b | [192, 255] | a | [192, 239] | a | [192, 227] |
| | | | | b | [228, 239] |
| | | b | [240, 255] | a | [240, 251] |
| | | | | b | [252, 255] |

As can be seen, if the first 3 symbols are *bbb*, we only have 4 code points left: 252, 253, 254, and 255. By contrast, if the first 3 symbols are *aaa*, we still have 108 code points left. This is where the compression comes from: the code space is used up more slowly for symbols we are more likely to encounter, meaning we can encode more symbols in a given number of bits.
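The subdivision rule behind the table can be sketched in a few lines (a toy helper, not part of the coder developed below): starting from [0, 255], each *a* keeps the lower 3/4 of the current range and each *b* takes the upper 1/4.

```
/// Toy helper: compute the code-point interval [low, high] for a
/// sequence of symbols, starting from [0, 255]. `true` stands for
/// `a` (probability 3/4), `false` for `b` (probability 1/4).
fn interval(symbols: &[bool]) -> (u32, u32) {
    let (mut low, mut size) = (0u32, 256u32);
    for &is_a in symbols {
        let a_size = size * 3 / 4; // `a` keeps 3/4 of the range
        if is_a {
            size = a_size; // `a`: lower part
        } else {
            low += a_size; // `b`: upper part
            size -= a_size;
        }
    }
    (low, low + size - 1)
}
```

For example, `interval(&[false, false, false])` yields (252, 255), matching the last row of the table.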

At some point, the remaining range will become too small to represent the probabilities for each symbol accurately, or at all. In the example above, we can encode *bbbb* as 255, but then there would be no more code points left to distinguish between *a* and *b* for the next symbol. This is solved by a process called normalization, which outputs a number of the most significant bits (which the low and high ends of the range agree on) and then shifts left to make the range larger.

In our example, after encoding *ab*, our range is [144, 191]. In binary, this is [10010000, 10111111]. Because of this, we know that the first two bits are 10 for any value in the range. We can write those bits out and then shift the bits left to get a new range of [01000000, 11111111] in binary, or [64, 255] in decimal. This increases the available code points from 48 to 192.
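This bit-level version of normalization can be written out directly (a toy illustration; the coder below normalizes a byte at a time):

```
/// Toy illustration: shift out the two leading bits that the low and
/// high ends of an 8-bit range agree on, widening the range.
fn shift_out_two(low: u8, high: u8) -> (u8, u8) {
    assert_eq!(low >> 6, high >> 6); // top two bits must agree
    // `<<` on u8 discards the shifted-out bits; the new low bits are
    // 0 for the low end and 1 for the high end.
    (low << 2, (high << 2) | 0b11)
}
```

Here `shift_out_two(144, 191)` returns (64, 255), growing the range from 48 code points to 192.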

## Example Range Coder

The following code implements a simple binary range coder, meaning it encodes one bit at a time. The probability of the bit being 0 is given by `p0`, which ranges from 1 (least likely) to 255 (most likely).

```
const STATE_BITS: u32 = 24;
const PROB_BITS: u32 = 8;
const MAX_RANGE: u32 = (1 << STATE_BITS) - 1;
const NORM_SHIFT: u32 = STATE_BITS - 8;
const NORM_MASK: u32 = (1 << NORM_SHIFT) - 1;

struct Encoder {
    low: u32,
    range: u32,
}

fn compute_threshold(range: u32, p0: u8) -> u32 {
    (range * p0 as u32) >> PROB_BITS
}

impl Encoder {
    pub fn encode_bit(&mut self, p0: u8, bit: bool) {
        let threshold = compute_threshold(self.range, p0);
        if bit {
            self.low += threshold + 1;
            self.range -= threshold + 1;
        } else {
            self.range = threshold;
        }
    }
    // impl Encoder continued later ...
```

We also need some code to initialize the encoder, to perform normalization, and to send the remaining encoded data once all input data has been processed.

```
// impl Encoder {
    pub fn new() -> Encoder {
        Encoder {
            low: 0,
            range: MAX_RANGE,
        }
    }

    fn needs_normalize(&self) -> bool {
        normalize_needed(self.low, self.range)
    }

    fn normalize(&mut self) -> u8 {
        let out = (self.low >> NORM_SHIFT) as u8;
        self.range = (self.range << 8) | 0xff;
        // Mask `low` to STATE_BITS so later additions cannot overflow.
        self.low = (self.low << 8) & MAX_RANGE;
        out
    }

    fn flush(&mut self) -> u8 {
        ((self.low + self.range) >> NORM_SHIFT) as u8
    }
}

fn normalize_needed(low: u32, range: u32) -> bool {
    (low & NORM_MASK) + range <= NORM_MASK
}
```

This contains the core of the range coder. It requires a driver to actually get data in and out and to supply the probability for each bit to be encoded. A sketch of what such a driver would look like is:

```
let mut encoder = Encoder::new();
while let Some((bit, probability)) = next_bit_and_probability()? {
    encoder.encode_bit(probability, bit);
    while encoder.needs_normalize() {
        write_byte(encoder.normalize())?;
    }
}
write_byte(encoder.flush())?;
```

Some important details of the implementation:

- The low end of the encoded range is in `low`. The high end of the range is `low + range`.
- Normalization is performed only when the 8 most significant bits of the high end of the range are identical to those of the low end of the range. There is no minimum range size. This works for a binary range coder, because a range size of 0 means the upper 8 bits are identical, and any larger range size is large enough to code for both possible symbols. This simplifies the implementation, although it does come at the cost of not being able to accurately represent probabilities when the range becomes too small (which results in less compression).
- In normalization, we increase `range` by a minimum of 255 (because we bitwise-or with 0xff). This ensures that, after a single normalization step, we have enough range to be able to encode both possible symbols.
- When the range or the probability becomes sufficiently small, `threshold` will be 0. By adjusting `low` and `range` by `threshold + 1`, we ensure that bit == 0 and bit == 1 have distinct ranges, even when `threshold` equals 0.
- The maximum range is 0xffffff so that when multiplying it by 0xff (the maximum possible value for `p0`), the result fits in 32 bits.
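The `threshold + 1` detail can be checked in isolation. This small sketch (reusing the `compute_threshold` function from the listing above) returns the code-point intervals assigned to bit 0 and bit 1:

```
const PROB_BITS: u32 = 8;

fn compute_threshold(range: u32, p0: u8) -> u32 {
    (range * p0 as u32) >> PROB_BITS
}

/// Return the inclusive intervals assigned to bit 0 and bit 1
/// for the current (low, range) state.
fn subranges(low: u32, range: u32, p0: u8) -> ((u32, u32), (u32, u32)) {
    let t = compute_threshold(range, p0);
    ((low, low + t), (low + t + 1, low + range))
}
```

Even in the degenerate case `subranges(0, 1, 1)`, where the threshold is 0, the two bits get the distinct single-point intervals (0, 0) and (1, 1).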

As an example, let's say the data we want to encode is `Hi`, represented in bits as 01001000 01101001. There are 10 zeros and 6 ones, so we will set `p0` to 160 to approximate the probability of a bit being 0. This leads to the following values for `low`, `range`, and `threshold` after each bit:

```
after  low       range     thresh.
-----  --------  --------  --------
init   0x000000  0xffffff  0x9fffff
0      0x000000  0x9fffff  0x63ffff
1      0x640000  0x3bffff  0x257fff
0      0x640000  0x257fff  0x176fff
0      0x640000  0x176fff  0x0ea5ff
1      0x72a600  0x08c9ff  0x057e3f
0      0x72a600  0x057e3f  0x036ee7
0      0x72a600  0x036ee7  0x022550
0      0x72a600  0x022550  0x015752
0      0x72a600  0x015752  0x00d693
1      0x737c94  0x0080be  0x005076
```

At this point, `low` (0x737c94) and `low + range` (0x73fd52) have the same first 8 bits (0x73), so we normalize by emitting 0x73, shifting `low` and `range` 8 bits to the left, and adding 0xff to `range`:

```
after  low       range     thresh.
-----  --------  --------  --------
norm   0x7c9400  0x80beff  0x50775f
1      0xcd0b60  0x30479f  0x1e2cc3
0      0xcd0b60  0x1e2cc3  0x12dbf9
1      0xdfe75a  0x0b50c9  0x07127d
0      0xdfe75a  0x07127d  0x046b8e
0      0xdfe75a  0x046b8e  0x02c338
1      0xe2aa93  0x01a855  0x010935
```

This is the end of the input data, so we output the remaining bits. There are different possibilities for how many bits to emit. We could simply emit all 3 bytes in `low`. However, any value of at least `low` (0xe2aa93) and at most `low + range` (0xe452e8) will decode to the same message, so we only really need to output enough bits to get somewhere in that range. If we take the most significant byte of `low + range` and add zeros, we get a value (0xe40000) that is guaranteed to be in the required range. Our encoder outputs that first byte and omits the trailing zeros, relying on the decoder to fill those in.

Using the byte from the normalization (0x73) and the final byte (0xe4), our encoded data becomes `73 e4`.
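The whole walk-through can be reproduced by driving the encoder over the 16 bits of `Hi`. The following sketch repeats the encoder so it is self-contained, and adds a hypothetical `encode` driver (not part of the listings above) that uses a fixed `p0`:

```
const STATE_BITS: u32 = 24;
const PROB_BITS: u32 = 8;
const MAX_RANGE: u32 = (1 << STATE_BITS) - 1;
const NORM_SHIFT: u32 = STATE_BITS - 8;
const NORM_MASK: u32 = (1 << NORM_SHIFT) - 1;

fn compute_threshold(range: u32, p0: u8) -> u32 {
    (range * p0 as u32) >> PROB_BITS
}

struct Encoder {
    low: u32,
    range: u32,
}

impl Encoder {
    fn new() -> Encoder {
        Encoder { low: 0, range: MAX_RANGE }
    }
    fn encode_bit(&mut self, p0: u8, bit: bool) {
        let threshold = compute_threshold(self.range, p0);
        if bit {
            self.low += threshold + 1;
            self.range -= threshold + 1;
        } else {
            self.range = threshold;
        }
    }
    fn needs_normalize(&self) -> bool {
        (self.low & NORM_MASK) + self.range <= NORM_MASK
    }
    fn normalize(&mut self) -> u8 {
        let out = (self.low >> NORM_SHIFT) as u8;
        self.range = (self.range << 8) | 0xff;
        self.low = (self.low << 8) & MAX_RANGE; // keep low in 24 bits
        out
    }
    fn flush(&mut self) -> u8 {
        ((self.low + self.range) >> NORM_SHIFT) as u8
    }
}

/// Hypothetical driver: encode every bit of `data` (most significant
/// bit first) with the fixed probability `p0`.
fn encode(data: &[u8], p0: u8) -> Vec<u8> {
    let mut enc = Encoder::new();
    let mut out = Vec::new();
    for &byte in data {
        for i in 0..8 {
            enc.encode_bit(p0, byte & (1 << (7 - i)) != 0);
            while enc.needs_normalize() {
                out.push(enc.normalize());
            }
        }
    }
    out.push(enc.flush());
    out
}
```

Running `encode(b"Hi", 160)` yields exactly the two bytes `73 e4` from the walk-through.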

## Example Decoder

The decoder will have to reconstruct the original data from the encoded form. To do this, it reads the bytes written by the encoder, then performs computations similar to the encoder's to determine range and threshold for each bit. The following code accomplishes this:

```
struct Decoder {
    low: u32,
    range: u32,
    code: u32,
}

impl Decoder {
    pub fn new() -> Decoder {
        Decoder {
            low: 0,
            range: 0,
            code: 0,
        }
    }

    pub fn decode_bit(&mut self, p0: u8) -> bool {
        let threshold = compute_threshold(self.range, p0);
        let bit = self.code > threshold;
        if bit {
            self.code -= threshold + 1;
            self.range -= threshold + 1;
            self.low += threshold + 1;
        } else {
            self.range = threshold;
        }
        bit
    }

    fn needs_normalize(&self) -> bool {
        normalize_needed(self.low, self.range)
    }

    fn normalize(&mut self, b: u8) {
        self.range = (self.range << 8) | 0xff;
        self.code = (self.code << 8) | b as u32;
        // Mask `low` to STATE_BITS so later additions cannot overflow.
        self.low = (self.low << 8) & MAX_RANGE;
    }
}
```

Like the encoder, the decoder needs a driver to feed it bytes and probabilities, and to consume the decoded bits. Something like the following:

```
let mut decoder = Decoder::new();
while want_more_bits() {
    while decoder.needs_normalize() {
        decoder.normalize(read_byte()?);
    }
    process_bit(decoder.decode_bit(get_bit_probability()));
}
```

In the example we used for the encoder, we produced the bytes `73 e4`. The decoder starts with a range of 0 and will need to read 3 bytes to get its initial code value and grow its range to the point where it no longer needs normalizing. In this case, we only have 2 bytes of encoded data. Whenever the decoder wants more bytes, but they don't exist, we fill in zeros. This gives us:

```
after  low       range     thresh.   code
-----  --------  --------  --------  --------
init   0x000000  0x000000  0x000000  0x000000
norm   0x000000  0x0000ff  0x00009f  0x000073
norm   0x000000  0x00ffff  0x009fff  0x0073e4
norm   0x000000  0xffffff  0x9fffff  0x73e400
```

This sets up the initial state for the decoder. We can then decode our first bit. Because 0x73e400 is not greater than the threshold of 0x9fffff, the first bit is a 0. Starting there, decoding proceeds as shown below. If you compare this sequence of states to the sequence of states in the encoder, you will notice that `low`, `range`, and `threshold` track the values in the encoder. The additional `code` field is increased by reading encoded bytes, and decreases whenever `low` increases (by the same amount).

```
after  low       range     thresh.   code
-----  --------  --------  --------  --------
0      0x000000  0x9fffff  0x63ffff  0x73e400
1      0x640000  0x3bffff  0x257fff  0x0fe400
0      0x640000  0x257fff  0x176fff  0x0fe400
0      0x640000  0x176fff  0x0ea5ff  0x0fe400
1      0x72a600  0x08c9ff  0x057e3f  0x013e00
0      0x72a600  0x057e3f  0x036ee7  0x013e00
0      0x72a600  0x036ee7  0x022550  0x013e00
0      0x72a600  0x022550  0x015752  0x013e00
0      0x72a600  0x015752  0x00d693  0x013e00
1      0x737c94  0x0080be  0x005076  0x00676c
```

At this point, we need to normalize. We shift `code`, `range`, and `low` 8 bits to the left. Then we read the next byte from the encoded data and store it in the low bits of `code`. In this case, because we have already read all the encoded data, we treat that byte as if it were 0x00. The low 8 bits of `range` become 0xff, as in the encoder. After that, decoding continues.

```
after  low       range     thresh.   code
-----  --------  --------  --------  --------
norm   0x7c9400  0x80beff  0x50775f  0x676c00
1      0xcd0b60  0x30479f  0x1e2cc3  0x16f4a0
0      0xcd0b60  0x1e2cc3  0x12dbf9  0x16f4a0
1      0xdfe75a  0x0b50c9  0x07127d  0x0418a6
0      0xdfe75a  0x07127d  0x046b8e  0x0418a6
0      0xdfe75a  0x046b8e  0x02c338  0x0418a6
1      0xe2aa93  0x01a855  0x010935  0x01556d
```

And now we have the fully decoded data: 01001000 01101001, which is ASCII for `Hi`.
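The decoder side can be verified the same way. This sketch repeats the decoder so it is self-contained, and adds a hypothetical `decode` driver (not part of the listings above) that substitutes 0x00 for missing bytes, as described in the text:

```
const STATE_BITS: u32 = 24;
const PROB_BITS: u32 = 8;
const MAX_RANGE: u32 = (1 << STATE_BITS) - 1;
const NORM_SHIFT: u32 = STATE_BITS - 8;
const NORM_MASK: u32 = (1 << NORM_SHIFT) - 1;

struct Decoder {
    low: u32,
    range: u32,
    code: u32,
}

impl Decoder {
    fn new() -> Decoder {
        Decoder { low: 0, range: 0, code: 0 }
    }
    fn decode_bit(&mut self, p0: u8) -> bool {
        let threshold = (self.range * p0 as u32) >> PROB_BITS;
        let bit = self.code > threshold;
        if bit {
            self.code -= threshold + 1;
            self.range -= threshold + 1;
            self.low += threshold + 1;
        } else {
            self.range = threshold;
        }
        bit
    }
    fn needs_normalize(&self) -> bool {
        (self.low & NORM_MASK) + self.range <= NORM_MASK
    }
    fn normalize(&mut self, b: u8) {
        self.range = (self.range << 8) | 0xff;
        self.code = (self.code << 8) | b as u32;
        self.low = (self.low << 8) & MAX_RANGE; // keep low in 24 bits
    }
}

/// Hypothetical driver: decode `nbits` bits, each with the fixed
/// probability `p0`, reading 0x00 once `bytes` runs out.
fn decode(bytes: &[u8], p0: u8, nbits: usize) -> Vec<u8> {
    let mut input = bytes.iter().copied();
    let mut dec = Decoder::new();
    let mut out = vec![0u8; (nbits + 7) / 8];
    for i in 0..nbits {
        while dec.needs_normalize() {
            dec.normalize(input.next().unwrap_or(0));
        }
        if dec.decode_bit(p0) {
            out[i / 8] |= 1 << (7 - (i % 8));
        }
    }
    out
}
```

With the data from the walk-through, `decode(&[0x73, 0xe4], 160, 16)` returns the bytes of `Hi`.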

## Dynamic Probabilities

In the examples presented so far, the probability for each symbol has always been fixed. This need not be the case. In practice, the probability of a symbol occurring in a given position is often not fixed, and we can achieve much better efficiency by more accurately predicting the probability.

As an example of this, if our input data is English text encoded in 8-bit ASCII, the first three bits will be:

- 011 for lowercase letters.
- 010 for uppercase letters.
- 001 for punctuation (space, comma, period, ...) or digits.

We expect those three prefixes to be much more common than the other possible 3-bit prefixes. We can use this to our advantage by writing a predictor that predicts different probabilities of a bit being 0 based on its position in a byte:

```
struct Predictor {
    bit: u8,
}

impl Predictor {
    pub fn new() -> Predictor {
        Predictor { bit: 0 }
    }

    pub fn p0(&mut self) -> u8 {
        let p = match 7 - self.bit {
            7 => 255, // the top bit of an ASCII byte is almost always 0
            6 => 51,  // the next bit is usually 1 (letters)
            5 => 20,  // and the third is usually 1 (lowercase letters)
            _ => 128, // no prediction for the remaining bits
        };
        self.bit = (self.bit + 1) & 7;
        p
    }
}
```

This can be used with the encoder as follows:

```
let input = b"Hello, world!\n";
let mut output = Vec::new();
let mut predictor = Predictor::new();
let mut encoder = Encoder::new();
for b in input {
    for i in 0..8 {
        let bit = b & (1 << (7 - i)) != 0;
        encoder.encode_bit(predictor.p0(), bit);
        while encoder.needs_normalize() {
            output.push(encoder.normalize());
        }
    }
}
output.push(encoder.flush());
```

In this case, it compresses the 14-byte input string to 11 bytes of compressed
data: `36 fb 8d 6b 64 3e 16 af 23 d8 fa`. That's a reduction of about 21%.

The predictor can be much more sophisticated than the example shown here. For example, it might take into account previously seen bits in the same byte, or even previously seen bytes. This allows us to encode the knowledge that "a" is more likely than "`", and "?" is likely to be followed by " ".

If we don't know in advance what our input is likely
to look like, we can often get good results by learning good
probabilities as we process the input. Combining this with range
coding leads to *adaptive range coding*, where we start with
some estimated probability values and adapt them over time as we learn
what the real probability values are in our data.
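As a sketch of what such adaptation might look like (this simple update rule is an illustrative assumption, not taken from the coder above), we can nudge `p0` toward each bit we observe:

```
/// Sketch of an adaptive bit model: `p0` starts at 128 (no
/// prediction) and moves roughly 1/16 of the way toward each
/// observed bit, staying within the coder's valid range of 1..=255.
struct AdaptiveModel {
    p0: u8,
}

impl AdaptiveModel {
    fn new() -> AdaptiveModel {
        AdaptiveModel { p0: 128 }
    }
    fn p0(&self) -> u8 {
        self.p0
    }
    fn update(&mut self, bit: bool) {
        if bit {
            self.p0 -= (self.p0 - 1) >> 4; // a 1 makes 0 less likely
        } else {
            self.p0 += (255 - self.p0) >> 4; // a 0 makes 0 more likely
        }
    }
}
```

The same model must be updated with the same bits on both the encoding and the decoding side, so that both compute identical probabilities.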

## Larger Alphabets

Thus far, our examples have had only two possibilities for each symbol. Range coding also works for larger alphabets. For example, one might have a symbol corresponding to each of the 256 possible byte values, or even all of the byte values plus some additional symbols for other needs.
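A multi-symbol coder splits the range by cumulative frequencies instead of a single threshold. A minimal sketch of that computation (the frequency table and scaling scheme here are illustrative assumptions):

```
/// Sketch: given per-symbol frequencies, return the (offset, size)
/// of symbol `s`'s sub-range within the current `range`.
fn symbol_subrange(range: u32, freqs: &[u32], s: usize) -> (u32, u32) {
    let total: u32 = freqs.iter().sum();
    let cumulative: u32 = freqs[..s].iter().sum();
    let unit = range / total; // code points per frequency count
    (unit * cumulative, unit * freqs[s])
}
```

With `freqs = [192, 64]` and a 256-wide range, this reproduces the two-symbol split from the first example: symbol 0 gets offset 0 and size 192, and symbol 1 gets offset 192 and size 64.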

## Example Code

The example code from earlier, along with supporting code to produce a working encoder and decoder, is included in the compression_toolkit project.