slob_reader 0.1.6
slob_reader: ^0.1.6 copied to clipboard
A pure Dart implementation of the Slob (Sorted List of Blobs) file format reader.
slob_reader #
A pure Dart implementation of the Slob (Sorted List of Blobs) file format reader. Supports zlib, bz2, and lzma2 compression and is compatible with files produced by the pyslob reference implementation.
Features #
- 🔓 Open any
.slobfile (read-only, random access) - 📖 Read individual entries by index (
getBlob) - 📌 Read raw index entries (
getRef) - 🚀 Batch read multiple ranges efficiently (
getBlobs) - 🗜️ Transparent decompression —
zlib,bz2,lzma2 - 🏷️ Rich file metadata — UUID, encoding, tags, content types
- ✅ Tested against the reference Python implementation
Installation #
Add to your pubspec.yaml:
dependencies:
slob_reader: ^0.1.2
Then run:
dart pub get
Quick Start #
import 'package:slob_reader/slob_reader.dart';
void main() async {
final reader = await SlobReader.open('path/to/dictionary.slob');
// Read the first entry
final blob = await reader.getBlob(0);
print('Key: ${blob.key}');
print('Content-Type: ${blob.contentType}');
print('Content: ${String.fromCharCodes(blob.content)}');
await reader.close();
}
Core API #
SlobReader.open(String path) #
Opens a .slob file for reading. Validates the magic bytes, parses the header, and loads both the ref-index and store-index into memory. This is a convenience wrapper around openSource using FileRandomAccessSource.
final reader = await SlobReader.open('en-wiktionary.slob');
SlobReader.openSource(RandomAccessSource source) #
Opens a .slob from an arbitrary source. This is useful for environments where dart:io File is not directly accessible, such as Android Storage Access Framework (SAF) content:// URIs or Web Blobs.
class MyCustomSource implements RandomAccessSource {
@override
Future<Uint8List> read(int offset, int length) async {
// Implement your own reading logic here (e.g., platform channel call)
}
@override
Future<int> get length async => 12345;
@override
Future<void> close() async {}
}
final reader = await SlobReader.openSource(MyCustomSource());
reader.header → SlobHeader #
Provides access to the file's metadata. All fields are populated during open().
| Field | Type | Description |
|---|---|---|
uuid |
String |
Unique file identifier (hex string) |
encoding |
String |
Character encoding (e.g. "utf-8") |
compression |
String |
Compression algorithm ("zlib", "bz2", "lzma2", or "") |
tags |
Map<String, String> |
Arbitrary key-value metadata set by the creator |
contentTypes |
List<String> |
MIME types used for blobs (e.g. "text/html; charset=utf-8") |
blobCount |
int |
Total number of entries in the file |
size |
int |
Total file size in bytes |
Example — inspecting metadata:
final h = reader.header;
print('UUID: ${h.uuid}');
print('Encoding: ${h.encoding}');
print('Compression: ${h.compression}');
print('Entries: ${h.blobCount}');
print('File size: ${h.size} bytes');
// Tags set by the dictionary creator, e.g. 'label', 'uri', 'copyright'
h.tags.forEach((key, value) => print(' tag[$key] = $value'));
// Content-type strings (indexed by blob.contentType id)
for (final ct in h.contentTypes) {
print(' content-type: $ct');
}
reader.getBlob(int index) → Future<SlobBlob> #
Fetches the complete entry at the given position. This is the primary way to retrieve content.
Returns a SlobBlob with the following fields:
| Field | Type | Description |
|---|---|---|
key |
String |
The dictionary headword / lookup key |
fragment |
String |
Optional in-page fragment (anchor), may be empty |
contentType |
String |
Full MIME type string |
content |
Uint8List |
Raw (decompressed) entry content |
id |
int |
Composite id: `(binIndex << 16) |
Example — reading entries sequentially:
for (var i = 0; i < reader.header.blobCount; i++) {
final blob = await reader.getBlob(i);
if (blob.contentType.startsWith('text/html')) {
final html = String.fromCharCodes(blob.content);
print('=== ${blob.key} ===');
print(html.substring(0, html.length.clamp(0, 200)));
} else {
// Binary content (images, CSS, etc.)
print('${blob.key}: ${blob.content.length} bytes (${blob.contentType})');
}
}
Example — using the fragment for deep linking:
final blob = await reader.getBlob(42);
if (blob.fragment.isNotEmpty) {
// In a WebView you might navigate to: article.html#${blob.fragment}
print('Fragment: #${blob.fragment}');
}
reader.getRef(int index) → Future<SlobRef> #
Fetches only the lightweight index entry for a given position, without decompressing the content. Useful for building search indexes or enumerating keys.
Returns a SlobRef:
| Field | Type | Description |
|---|---|---|
key |
String |
The headword / lookup key |
binIndex |
int |
Which compressed bin this entry lives in |
itemIndex |
int |
Position within that bin |
fragment |
String |
Optional anchor fragment |
Example — listing all headwords without decompressing content:
print('Total entries: ${reader.header.blobCount}');
for (var i = 0; i < reader.header.blobCount; i++) {
final ref = await reader.getRef(i);
print('[$i] ${ref.key} (bin=${ref.binIndex}, item=${ref.itemIndex})');
}
Example — simple binary search for a word:
Future<SlobRef?> findRef(SlobReader reader, String word) async {
var lo = 0;
var hi = reader.header.blobCount - 1;
while (lo <= hi) {
final mid = (lo + hi) ~/ 2;
final ref = await reader.getRef(mid);
final cmp = ref.key.compareTo(word);
if (cmp == 0) return ref;
if (cmp < 0) lo = mid + 1;
else hi = mid - 1;
}
return null; // not found
}
reader.getBlobContent(int binIndex, int itemIndex) → Future<Uint8List> #
Low-level method: decompresses the given bin and extracts the raw bytes for the specified item. You normally get binIndex and itemIndex from a SlobRef.
final ref = await reader.getRef(0);
final bytes = await reader.getBlobContent(ref.binIndex, ref.itemIndex);
print('Raw content length: ${bytes.length} bytes');
reader.getBlobs(List<(int, int)> ranges) → Future<List<SlobBlob>> #
Batch reads multiple ranges of entries efficiently. Entries that share the same compressed bin are decompressed only once, making this significantly faster than calling getBlob in a loop when reading many entries.
Each element in ranges is a record (int startIndex, int length).
Example — read first 10 and entries 500–509:
final blobs = await reader.getBlobs([
(0, 10), // indices 0–9
(500, 10), // indices 500–509
]);
for (final blob in blobs) {
print('${blob.key}: ${blob.contentType}');
}
Example — reading a page of results (e.g. for a list view):
Future<List<SlobBlob>> fetchPage(SlobReader reader, {
required int page,
int pageSize = 20,
}) async {
final start = page * pageSize;
final safeLength = (start + pageSize)
.clamp(0, reader.header.blobCount) - start;
if (safeLength <= 0) return [];
return reader.getBlobs([(start, safeLength)]);
}
final page0 = await fetchPage(reader, page: 0);
final page1 = await fetchPage(reader, page: 1);
reader.close() #
Closes the underlying file handle. Always call this when you are done.
await reader.close();
Complete Usage Examples #
Print the first 5 entries #
import 'package:slob_reader/slob_reader.dart';
void main() async {
final reader = await SlobReader.open('dictionary.slob');
final blobs = await reader.getBlobs([(0, 5)]);
for (final blob in blobs) {
print('--- ${blob.key} ---');
print(String.fromCharCodes(blob.content));
print('');
}
await reader.close();
}
Lookup a word using binary search #
import 'package:slob_reader/slob_reader.dart';
void main() async {
final reader = await SlobReader.open('dictionary.slob');
final word = 'hello';
var lo = 0;
var hi = reader.header.blobCount - 1;
SlobBlob? result;
while (lo <= hi) {
final mid = (lo + hi) ~/ 2;
final blob = await reader.getBlob(mid);
final cmp = blob.key.toLowerCase().compareTo(word);
if (cmp == 0) { result = blob; break; }
if (cmp < 0) lo = mid + 1;
else hi = mid - 1;
}
if (result != null) {
print('Found: ${result.key}');
print(String.fromCharCodes(result.content));
} else {
print('"$word" not found.');
}
await reader.close();
}
Print file metadata and tag information #
import 'package:slob_reader/slob_reader.dart';
void main() async {
final reader = await SlobReader.open('dictionary.slob');
final h = reader.header;
print('UUID: ${h.uuid}');
print('Encoding: ${h.encoding}');
print('Compression: ${h.compression}');
print('Total entries: ${h.blobCount}');
print('File size: ${h.size} bytes');
print('\nTags:');
h.tags.forEach((k, v) => print(' $k = $v'));
print('\nContent Types:');
for (var i = 0; i < h.contentTypes.length; i++) {
print(' [$i] ${h.contentTypes[i]}');
}
await reader.close();
}
Export all HTML entries to files #
import 'dart:io';
import 'package:slob_reader/slob_reader.dart';
void main() async {
final reader = await SlobReader.open('dictionary.slob');
final outDir = Directory('output')..createSync();
for (var i = 0; i < reader.header.blobCount; i++) {
final blob = await reader.getBlob(i);
if (blob.contentType.contains('text/html')) {
final safe = blob.key.replaceAll(RegExp(r'[^\w]'), '_');
File('output/$safe.html')
.writeAsBytesSync(blob.content);
}
}
print('Done.');
await reader.close();
}
Supported Compressions #
| Value in header | Algorithm | Notes |
|---|---|---|
zlib |
Deflate | Most common in Wikipedia slobs |
bz2 |
BZip2 | Older slob files |
lzma2 |
LZMA2 (XZ) | High compression ratio |
"" (empty) |
None | Raw, uncompressed bins |
Dependencies #
archive— Decompression (zlib, bz2, lzma2/XZ)
License #
MIT