arithmetic_coder 1.0.4
arithmetic_coder: ^1.0.4 copied to clipboard
Adaptive arithmetic coding for efficient lossless compression of bytes in pure Dart.
arithmetic_coder #
arithmetic_coder is a pure Dart implementation of adaptive arithmetic coding with context modeling, designed for efficient lossless compression of byte streams.
It combines:
- π Adaptive probability modeling
- π§ Finite-order context models (order 0, 1, 2)
- π² Fenwick Tree (Binary Indexed Tree) for fast cumulative frequencies
- π’ Bit-level I/O (BitReader / BitWriter)
- βοΈ Automatic rescaling to prevent overflow
Features #
- β Adaptive (no pre-trained model required)
- π§ Context modeling (order 0, 1, 2)
- β‘ Efficient O(log n) updates via Fenwick tree
- π¦ Byte-level compression and decompression
- π Streaming-friendly design
- π Deterministic decoding with mirrored context state
Installation #
Add to your pubspec.yaml:
dependencies:
arithmetic_coder: ^latest
Usage #
Encode #
import 'dart:convert';
import 'dart:typed_data';
import 'package:arithmetic_coder/arithmetic_coder.dart';
void main() {
final input = Uint8List.fromList(utf8.encode('Hello world'));
final ac = ArithmeticCoder(order: 2);
final compressed = ac.encode(input);
print('Original : ${input.length} bytes');
print('Compressed : ${compressed.length} bytes');
}
Decode #
import 'dart:convert';
final ac = ArithmeticCoder(order: 2);
final decompressed = ac.decode(compressed);
print(utf8.decode(decompressed));
How It Works #
1. Range Encoding #
Each symbol narrows a [low, high] range based on its probability:
range = high - low + 1
high = low + (range * cumulativeHigh / total) - 1
low = low + (range * cumulativeLow / total)
2. Renormalization #
To maintain precision, the coder emits bits when:
- range falls entirely in lower half β emit
0 - range falls entirely in upper half β emit
1 - range is in the middle β defer bits (underflow handling)
3. Context Modeling #
The coder adapts probabilities based on previous symbols using a finite-order Markov model:
- Order 0 β no context (global distribution)
- Order 1 β previous symbol
- Order 2 β previous two symbols
Each context maintains its own adaptive frequency table.
final ac = ArithmeticCoder(order: 1);
4. Adaptive Model #
Frequencies are updated after each symbol:
model.update(symbol);
- Starts with uniform distribution
- Learns symbol probabilities dynamically
- Rescales when totals exceed a threshold
5. Fenwick Tree #
Efficiently supports:
- prefix sums β cumulative frequencies
- updates β O(log n)
- inverse lookup β symbol decoding
Arithmetic Coding Overview #
Arithmetic coding is a lossless compression technique that encodes an entire message as a single number within a fractional interval [0, 1).
Instead of assigning fixed bit patterns to symbols (like Huffman coding), it:
- Encodes the whole sequence into a continuously refined numeric range
- Uses symbol probabilities to narrow that range step by step
- Achieves compression close to the entropy limit
Key Idea #
Each symbol reduces the interval:
[low, high) β progressively narrower range
The final value uniquely represents the entire message.
Why Itβs Powerful #
- Near-optimal compression efficiency
- Handles fractional probabilities naturally
- Benefits significantly from context modeling
API Overview #
ArithmeticCoder #
ArithmeticCoder({int order = 0});
Uint8List encode(Uint8List input);
Uint8List decode(Uint8List input);
Core Components #
ArithmeticCoderβ encoder/decoderContextModelβ context-aware probability modelFenwickβ frequency structureBitWriterβ bit-level outputBitReaderβ bit-level input
Example #
import 'dart:convert';
import 'dart:typed_data';
import 'package:arithmetic_coder/arithmetic_coder.dart';
void main() {
final text = 'Dart is awesome! ' * 100;
final input = Uint8List.fromList(utf8.encode(text));
final ac = ArithmeticCoder(order: 2);
final encoded = ac.encode(input);
final decoded = ac.decode(encoded);
final ratio = encoded.length / input.length;
print('Original : ${input.length} bytes');
print('Encoded : ${encoded.length} bytes');
print('Decoded : ${decoded.length} bytes');
print('Ratio : $ratio');
}
Limitations #
- Pure Dart (no SIMD / native optimizations)
- No built-in file streaming (yet)
- Higher memory usage for higher-order models (especially order 2)
Roadmap #
- Streaming API
- Higher-order / PPM models
- Performance tuning
- Optional static models
Contributing #
Contributions are welcome:
- Bug reports
- Performance improvements
- New modeling strategies
- Documentation improvements
Author #
Graciliano M. Passos: gmpassos@GitHub.
Sponsor #
If you find this project useful, consider supporting it through GitHub Sponsors.
Your support helps keep the project active, maintained, and continuously improving.
Thank you! π