# Range Coding

2021-05-25

## Introduction

Range coding is an entropy coding technique where a large number is divided into ranges, one per symbol, with the size of each range proportional to the probability of encountering that symbol. After encoding the first symbol this way, the remaining range is subdivided into subranges to allow for encoding the next symbol, and so on. Because the size of the subranges decreases more slowly for common symbols than for uncommon ones, more symbols can be encoded in a number of a given magnitude, thus requiring fewer bits per symbol on average.

## Range Coding Example

Although range coded data can conceptually be thought of as a single large number which could take millions of bits to represent, practical range coders work on small numbers that computers can easily process.

As a simple example, consider two symbols, which we will call *a* and *b*. If the probability of encountering an *a* is 3 times the probability of encountering a *b*, we can divide the range like so:

```
+-----------+---+
|     a     | b |
+-----------+---+
```

Putting some numbers on this, we might say that values 0 through 191 represent *a*, and values 192 through 255 represent *b*. This requires 8 bits to represent; more than enough to represent a single *a* or *b* choice, which could have been done in a single bit. However, we can use the remaining code space to encode additional symbols. If the first symbol was *a*, we can use values 0 through 191 to encode the second symbol and still have the first symbol decode correctly. So, for the second symbol, we can use values 0 through 143 to represent *a*, and 144 through 191 to represent *b*. The following table shows the ranges for the first three symbols:

| first | range | second | range | third | range |
|---|---|---|---|---|---|
| a | [0, 191] | a | [0, 143] | a | [0, 107] |
| | | | | b | [108, 143] |
| | | b | [144, 191] | a | [144, 179] |
| | | | | b | [180, 191] |
| b | [192, 255] | a | [192, 239] | a | [192, 227] |
| | | | | b | [228, 239] |
| | | b | [240, 255] | a | [240, 251] |
| | | | | b | [252, 255] |

As can be seen, if the first 3 symbols are *bbb*, we only have 4 code points left: 252, 253, 254, and 255. By contrast, if the first 3 symbols are *aaa*, we still have 108 code points left. This is where the compression comes from: the code space is used up more slowly for symbols we are more likely to encounter, meaning we can encode more symbols in a given number of bits.
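The subdivision rule behind the table can be sketched in a few lines (a toy helper, not part of the coder developed below): starting from [0, 255], each *a* keeps the lower 3/4 of the current range and each *b* takes the upper 1/4.

```
/// Toy helper: compute the code-point interval [low, high] for a
/// sequence of symbols, starting from [0, 255]. `true` stands for
/// `a` (probability 3/4), `false` for `b` (probability 1/4).
fn interval(symbols: &[bool]) -> (u32, u32) {
    let (mut low, mut size) = (0u32, 256u32);
    for &is_a in symbols {
        let a_size = size * 3 / 4; // `a` keeps 3/4 of the range
        if is_a {
            size = a_size; // `a`: lower part
        } else {
            low += a_size; // `b`: upper part
            size -= a_size;
        }
    }
    (low, low + size - 1)
}
```

For example, `interval(&[false, false, false])` yields (252, 255), matching the last row of the table.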

At some point, the remaining range will become too small to represent the probabilities for each symbol accurately, or at all. In the example above, we can encode *bbbb* as 255, but then there would be no more code points left to distinguish between *a* and *b* for the next symbol. This is solved by a process called normalization, which outputs a number of the most significant bits (which the low and high ends of the range agree on) and then shifts left to make the range larger.

In our example, after encoding *ab*, our range is [144, 191]. In binary, this is [10010000, 10111111]. Because of this, we know that the first two bits are 10 for any value in the range. We can write those bits out and then shift the bits left to get a new range of [01000000, 11111111] in binary, or [64, 255] in decimal. This increases the available code points from 48 to 192.
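This bit-level version of normalization can be written out directly (a toy illustration; the coder below normalizes a byte at a time):

```
/// Toy illustration: shift out the two leading bits that the low and
/// high ends of an 8-bit range agree on, widening the range.
fn shift_out_two(low: u8, high: u8) -> (u8, u8) {
    assert_eq!(low >> 6, high >> 6); // top two bits must agree
    // `<<` on u8 discards the shifted-out bits; the new low bits are
    // 0 for the low end and 1 for the high end.
    (low << 2, (high << 2) | 0b11)
}
```

Here `shift_out_two(144, 191)` returns (64, 255), growing the range from 48 code points to 192.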

## Example Range Coder

The following code implements a simple binary range coder, meaning it encodes one bit at a time. The probability of the bit being 0 is given by `p0`, which ranges from 1 (least likely) to 255 (most likely).

```
const STATE_BITS: u32 = 24;
const PROB_BITS: u32 = 8;
const MAX_RANGE: u32 = (1 << STATE_BITS) - 1;
const NORM_SHIFT: u32 = STATE_BITS - 8;
const NORM_MASK: u32 = (1 << NORM_SHIFT) - 1;

struct Encoder {
    low: u32,
    range: u32,
}

fn compute_threshold(range: u32, p0: u8) -> u32 {
    (range * p0 as u32) >> PROB_BITS
}

impl Encoder {
    pub fn encode_bit(&mut self, p0: u8, bit: bool) {
        let threshold = compute_threshold(self.range, p0);
        if bit {
            self.low += threshold + 1;
            self.range -= threshold + 1;
        } else {
            self.range = threshold;
        }
    }
    // impl Encoder continued later ...
```

We also need some code to initialize the encoder, to perform normalization, and to send the remaining encoded data once all input data has been processed.

```
// impl Encoder {
    pub fn new() -> Encoder {
        Encoder {
            low: 0,
            range: MAX_RANGE,
        }
    }

    fn needs_normalize(&self) -> bool {
        normalize_needed(self.low, self.range)
    }

    fn normalize(&mut self) -> u8 {
        let out = (self.low >> NORM_SHIFT) as u8;
        self.range = (self.range << 8) | 0xff;
        // Mask `low` to STATE_BITS so later additions cannot overflow.
        self.low = (self.low << 8) & MAX_RANGE;
        out
    }

    fn flush(&mut self) -> u8 {
        ((self.low + self.range) >> NORM_SHIFT) as u8
    }
}

fn normalize_needed(low: u32, range: u32) -> bool {
    (low & NORM_MASK) + range <= NORM_MASK
}
```

This contains the core of the range coder. It requires a driver to actually get data in and out and to supply the probability for each bit to be encoded. A sketch of what such a driver would look like is:

```
let mut encoder = Encoder::new();
while let Some((bit, probability)) = next_bit_and_probability()? {
    encoder.encode_bit(probability, bit);
    while encoder.needs_normalize() {
        write_byte(encoder.normalize())?;
    }
}
write_byte(encoder.flush())?;
```

Some important details of the implementation:

- The low end of the encoded range is in `low`. The high end of the range is `low + range`.
- Normalization is performed only when the 8 most significant bits of the high end of the range are identical to those of the low end of the range. There is no minimum range size. This works for a binary range coder, because a range size of 0 means the upper 8 bits are identical, and any larger range size is large enough to code for both possible symbols. This simplifies the implementation, although it does come at the cost of not being able to accurately represent probabilities when the range becomes too small (which results in less compression).
- In normalization, we increase `range` by a minimum of 255 (because we bitwise-or with 0xff). This ensures that, after a single normalization step, we have enough range to be able to encode both possible symbols.
- When the range or the probability becomes sufficiently small, `threshold` will be 0. By adjusting `low` and `range` by `threshold + 1`, we ensure that bit == 0 and bit == 1 have distinct ranges, even when `threshold` equals 0.
- The maximum range is 0xffffff so that when multiplying it by 0xff (the maximum possible value for `p0`), the result fits in 32 bits.
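The `threshold + 1` detail can be checked in isolation. This small sketch (reusing the `compute_threshold` function from the listing above) returns the code-point intervals assigned to bit 0 and bit 1:

```
const PROB_BITS: u32 = 8;

fn compute_threshold(range: u32, p0: u8) -> u32 {
    (range * p0 as u32) >> PROB_BITS
}

/// Return the inclusive intervals assigned to bit 0 and bit 1
/// for the current (low, range) state.
fn subranges(low: u32, range: u32, p0: u8) -> ((u32, u32), (u32, u32)) {
    let t = compute_threshold(range, p0);
    ((low, low + t), (low + t + 1, low + range))
}
```

Even in the degenerate case `subranges(0, 1, 1)`, where the threshold is 0, the two bits get the distinct single-point intervals (0, 0) and (1, 1).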

As an example, let's say the data we want to encode is `Hi`, represented in bits as 01001000 01101001. There are 10 zeros and 6 ones, so we will set `p0` to 160 to approximate the probability of a bit being 0. This leads to the following values for `low`, `range`, and `threshold` after each bit:

```
after  low       range     thresh.
-----  --------  --------  --------
init   0x000000  0xffffff  0x9fffff
0      0x000000  0x9fffff  0x63ffff
1      0x640000  0x3bffff  0x257fff
0      0x640000  0x257fff  0x176fff
0      0x640000  0x176fff  0x0ea5ff
1      0x72a600  0x08c9ff  0x057e3f
0      0x72a600  0x057e3f  0x036ee7
0      0x72a600  0x036ee7  0x022550
0      0x72a600  0x022550  0x015752
0      0x72a600  0x015752  0x00d693
1      0x737c94  0x0080be  0x005076
```

At this point, `low` (0x737c94) and `low + range` (0x73fd52) have the same first 8 bits (0x73), so we normalize by emitting 0x73, shifting `low` and `range` 8 bits to the left, and adding 0xff to `range`:

```
after  low       range     thresh.
-----  --------  --------  --------
norm   0x7c9400  0x80beff  0x50775f
1      0xcd0b60  0x30479f  0x1e2cc3
0      0xcd0b60  0x1e2cc3  0x12dbf9
1      0xdfe75a  0x0b50c9  0x07127d
0      0xdfe75a  0x07127d  0x046b8e
0      0xdfe75a  0x046b8e  0x02c338
1      0xe2aa93  0x01a855  0x010935
```

This is the end of the input data, so we output the remaining bits. There are different possibilities for how many bits to emit. We could simply emit all 3 bytes in `low`. However, any value of at least `low` (0xe2aa93) and at most `low + range` (0xe452e8) will decode to the same message, so we only really need to output enough bits to get somewhere in that range. If we take the most significant byte of `low + range` and add zeros, we get a value (0xe40000) that is guaranteed to be in the required range. Our encoder outputs that first byte and omits the trailing zeros, relying on the decoder to fill those in.

Using the byte from the normalization (0x73) and the final byte (0xe4), our encoded data becomes `73 e4`.
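The whole walk-through can be reproduced by driving the encoder over the 16 bits of `Hi`. The following sketch repeats the encoder so it is self-contained, and adds a hypothetical `encode` driver (not part of the listings above) that uses a fixed `p0`:

```
const STATE_BITS: u32 = 24;
const PROB_BITS: u32 = 8;
const MAX_RANGE: u32 = (1 << STATE_BITS) - 1;
const NORM_SHIFT: u32 = STATE_BITS - 8;
const NORM_MASK: u32 = (1 << NORM_SHIFT) - 1;

fn compute_threshold(range: u32, p0: u8) -> u32 {
    (range * p0 as u32) >> PROB_BITS
}

struct Encoder {
    low: u32,
    range: u32,
}

impl Encoder {
    fn new() -> Encoder {
        Encoder { low: 0, range: MAX_RANGE }
    }
    fn encode_bit(&mut self, p0: u8, bit: bool) {
        let threshold = compute_threshold(self.range, p0);
        if bit {
            self.low += threshold + 1;
            self.range -= threshold + 1;
        } else {
            self.range = threshold;
        }
    }
    fn needs_normalize(&self) -> bool {
        (self.low & NORM_MASK) + self.range <= NORM_MASK
    }
    fn normalize(&mut self) -> u8 {
        let out = (self.low >> NORM_SHIFT) as u8;
        self.range = (self.range << 8) | 0xff;
        self.low = (self.low << 8) & MAX_RANGE; // keep low in 24 bits
        out
    }
    fn flush(&mut self) -> u8 {
        ((self.low + self.range) >> NORM_SHIFT) as u8
    }
}

/// Hypothetical driver: encode every bit of `data` (most significant
/// bit first) with the fixed probability `p0`.
fn encode(data: &[u8], p0: u8) -> Vec<u8> {
    let mut enc = Encoder::new();
    let mut out = Vec::new();
    for &byte in data {
        for i in 0..8 {
            enc.encode_bit(p0, byte & (1 << (7 - i)) != 0);
            while enc.needs_normalize() {
                out.push(enc.normalize());
            }
        }
    }
    out.push(enc.flush());
    out
}
```

Running `encode(b"Hi", 160)` yields exactly the two bytes `73 e4` from the walk-through.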

## Example Decoder

The decoder will have to reconstruct the original data from the encoded form. To do this, it reads the bytes written by the encoder, then performs computations similar to the encoder's to determine range and threshold for each bit. The following code accomplishes this:

```
struct Decoder {
    low: u32,
    range: u32,
    code: u32,
}

impl Decoder {
    pub fn new() -> Decoder {
        Decoder {
            low: 0,
            range: 0,
            code: 0,
        }
    }

    pub fn decode_bit(&mut self, p0: u8) -> bool {
        let threshold = compute_threshold(self.range, p0);
        let bit = self.code > threshold;
        if bit {
            self.code -= threshold + 1;
            self.range -= threshold + 1;
            self.low += threshold + 1;
        } else {
            self.range = threshold;
        }
        bit
    }

    fn needs_normalize(&self) -> bool {
        normalize_needed(self.low, self.range)
    }

    fn normalize(&mut self, b: u8) {
        self.range = (self.range << 8) | 0xff;
        self.code = (self.code << 8) | b as u32;
        // Mask `low` to STATE_BITS so later additions cannot overflow.
        self.low = (self.low << 8) & MAX_RANGE;
    }
}
```

Like the encoder, the decoder needs a driver to feed it bytes and probabilities, and to consume the decoded bits. Something like the following:

```
let mut decoder = Decoder::new();
while want_more_bits() {
    while decoder.needs_normalize() {
        decoder.normalize(read_byte()?);
    }
    process_bit(decoder.decode_bit(get_bit_probability()));
}
```

In the example we used for the encoder, we produced the bytes `73 e4`. The decoder starts with a range of 0 and will need to read 3 bytes to get its initial code value and grow its range to the point where it no longer needs normalizing. In this case, we only have 2 bytes of encoded data. Whenever the decoder wants more bytes, but they don't exist, we fill in zeros. This gives us:

```
after  low       range     thresh.   code
-----  --------  --------  --------  --------
init   0x000000  0x000000  0x000000  0x000000
norm   0x000000  0x0000ff  0x00009f  0x000073
norm   0x000000  0x00ffff  0x009fff  0x0073e4
norm   0x000000  0xffffff  0x9fffff  0x73e400
```

This sets up the initial state for the decoder. We can then decode our first bit. Because 0x73e400 is not greater than the threshold of 0x9fffff, the first bit is a 0. Starting there, decoding proceeds as shown below. If you compare this sequence of states to the sequence of states in the encoder, you will notice that `low`, `range`, and `threshold` track the values in the encoder. The additional `code` field is increased by reading encoded bytes, and decreases whenever `low` increases (by the same amount).

```
after  low       range     thresh.   code
-----  --------  --------  --------  --------
0      0x000000  0x9fffff  0x63ffff  0x73e400
1      0x640000  0x3bffff  0x257fff  0x0fe400
0      0x640000  0x257fff  0x176fff  0x0fe400
0      0x640000  0x176fff  0x0ea5ff  0x0fe400
1      0x72a600  0x08c9ff  0x057e3f  0x013e00
0      0x72a600  0x057e3f  0x036ee7  0x013e00
0      0x72a600  0x036ee7  0x022550  0x013e00
0      0x72a600  0x022550  0x015752  0x013e00
0      0x72a600  0x015752  0x00d693  0x013e00
1      0x737c94  0x0080be  0x005076  0x00676c
```

At this point, we need to normalize. We shift `code`, `range`, and `low` 8 bits to the left. Then we read the next byte from the encoded data and store it in the low bits of `code`. In this case, because we have already read all the encoded data, we treat that byte as if it were 0x00. The low 8 bits of `range` become 0xff, as in the encoder. After that, decoding continues.

```
after  low       range     thresh.   code
-----  --------  --------  --------  --------
norm   0x7c9400  0x80beff  0x50775f  0x676c00
1      0xcd0b60  0x30479f  0x1e2cc3  0x16f4a0
0      0xcd0b60  0x1e2cc3  0x12dbf9  0x16f4a0
1      0xdfe75a  0x0b50c9  0x07127d  0x0418a6
0      0xdfe75a  0x07127d  0x046b8e  0x0418a6
0      0xdfe75a  0x046b8e  0x02c338  0x0418a6
1      0xe2aa93  0x01a855  0x010935  0x01556d
```

And now we have the fully decoded data: 01001000 01101001, which is ASCII for `Hi`.
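The decoder side can be verified the same way. This sketch repeats the decoder so it is self-contained, and adds a hypothetical `decode` driver (not part of the listings above) that substitutes 0x00 for missing bytes, as described in the text:

```
const STATE_BITS: u32 = 24;
const PROB_BITS: u32 = 8;
const MAX_RANGE: u32 = (1 << STATE_BITS) - 1;
const NORM_SHIFT: u32 = STATE_BITS - 8;
const NORM_MASK: u32 = (1 << NORM_SHIFT) - 1;

struct Decoder {
    low: u32,
    range: u32,
    code: u32,
}

impl Decoder {
    fn new() -> Decoder {
        Decoder { low: 0, range: 0, code: 0 }
    }
    fn decode_bit(&mut self, p0: u8) -> bool {
        let threshold = (self.range * p0 as u32) >> PROB_BITS;
        let bit = self.code > threshold;
        if bit {
            self.code -= threshold + 1;
            self.range -= threshold + 1;
            self.low += threshold + 1;
        } else {
            self.range = threshold;
        }
        bit
    }
    fn needs_normalize(&self) -> bool {
        (self.low & NORM_MASK) + self.range <= NORM_MASK
    }
    fn normalize(&mut self, b: u8) {
        self.range = (self.range << 8) | 0xff;
        self.code = (self.code << 8) | b as u32;
        self.low = (self.low << 8) & MAX_RANGE; // keep low in 24 bits
    }
}

/// Hypothetical driver: decode `nbits` bits, each with the fixed
/// probability `p0`, reading 0x00 once `bytes` runs out.
fn decode(bytes: &[u8], p0: u8, nbits: usize) -> Vec<u8> {
    let mut input = bytes.iter().copied();
    let mut dec = Decoder::new();
    let mut out = vec![0u8; (nbits + 7) / 8];
    for i in 0..nbits {
        while dec.needs_normalize() {
            dec.normalize(input.next().unwrap_or(0));
        }
        if dec.decode_bit(p0) {
            out[i / 8] |= 1 << (7 - (i % 8));
        }
    }
    out
}
```

With the data from the walk-through, `decode(&[0x73, 0xe4], 160, 16)` returns the bytes of `Hi`.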

## Dynamic Probabilities

In the examples presented so far, the probability for each symbol has always been fixed. This need not be the case. In practice, the probability of a symbol occurring in a given position is often not fixed, and we can achieve much better efficiency by more accurately predicting the probability.

As an example of this, if our input data is English text encoded in 8-bit ASCII, the first three bits will be:

- 011 for lowercase letters.
- 010 for uppercase letters.
- 001 for punctuation (space, comma, period, ...) or digits.

We expect those three prefixes to be much more common than the other possible 3-bit prefixes. We can use this to our advantage by writing a predictor that predicts different probabilities of a bit being 0 based on its position in a byte:

```
struct Predictor {
    bit: u8,
}

impl Predictor {
    pub fn new() -> Predictor {
        Predictor { bit: 0 }
    }

    pub fn p0(&mut self) -> u8 {
        let p = match 7 - self.bit {
            7 => 255, // the top bit of an ASCII byte is almost always 0
            6 => 51,  // the next bit is usually 1 (letters)
            5 => 20,  // and the third is usually 1 (lowercase letters)
            _ => 128, // no prediction for the remaining bits
        };
        self.bit = (self.bit + 1) & 7;
        p
    }
}
```

This can be used with the encoder as follows:

```
let input = b"Hello, world!\n";
let mut output = Vec::new();
let mut predictor = Predictor::new();
let mut encoder = Encoder::new();
for b in input {
    for i in 0..8 {
        let bit = b & (1 << (7 - i)) != 0;
        encoder.encode_bit(predictor.p0(), bit);
        while encoder.needs_normalize() {
            output.push(encoder.normalize());
        }
    }
}
output.push(encoder.flush());
```

In this case, it compresses the 14-byte input string to 11 bytes of compressed
data: `36 fb 8d 6b 64 3e 16 af 23 d8 fa`. That's a reduction of about 21%.

The predictor can be much more sophisticated than the example shown here. For example, it might take into account previously seen bits in the same byte, or even previously seen bytes. This allows us to encode the knowledge that "a" is more likely than "`", and "?" is likely to be followed by " ".

If we don't know in advance what our input is likely
to look like, we can often get good results by learning good
probabilities as we process the input. Combining this with range
coding leads to *adaptive range coding*, where we start with
some estimated probability values and adapt them over time as we learn
what the real probability values are in our data.
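As a sketch of what such adaptation might look like (this simple update rule is an illustrative assumption, not taken from the coder above), we can nudge `p0` toward each bit we observe:

```
/// Sketch of an adaptive bit model: `p0` starts at 128 (no
/// prediction) and moves roughly 1/16 of the way toward each
/// observed bit, staying within the coder's valid range of 1..=255.
struct AdaptiveModel {
    p0: u8,
}

impl AdaptiveModel {
    fn new() -> AdaptiveModel {
        AdaptiveModel { p0: 128 }
    }
    fn p0(&self) -> u8 {
        self.p0
    }
    fn update(&mut self, bit: bool) {
        if bit {
            self.p0 -= (self.p0 - 1) >> 4; // a 1 makes 0 less likely
        } else {
            self.p0 += (255 - self.p0) >> 4; // a 0 makes 0 more likely
        }
    }
}
```

The same model must be updated with the same bits on both the encoding and the decoding side, so that both compute identical probabilities.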

## Larger Alphabets

Thus far, our examples have had only two possibilities for each symbol. Range coding also works for larger alphabets. For example, one might have a symbol corresponding to each of the 256 possible byte values, or even all of the byte values plus some additional symbols for other needs.
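A multi-symbol coder splits the range by cumulative frequencies instead of a single threshold. A minimal sketch of that computation (the frequency table and scaling scheme here are illustrative assumptions):

```
/// Sketch: given per-symbol frequencies, return the (offset, size)
/// of symbol `s`'s sub-range within the current `range`.
fn symbol_subrange(range: u32, freqs: &[u32], s: usize) -> (u32, u32) {
    let total: u32 = freqs.iter().sum();
    let cumulative: u32 = freqs[..s].iter().sum();
    let unit = range / total; // code points per frequency count
    (unit * cumulative, unit * freqs[s])
}
```

With `freqs = [192, 64]` and a 256-wide range, this reproduces the two-symbol split from the first example: symbol 0 gets offset 0 and size 192, and symbol 1 gets offset 192 and size 64.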

## Example Code

The example code from earlier, along with supporting code to produce a working encoder and decoder, is included in the compression_toolkit project.