byte_search_io 0.1.0
Low-allocation building blocks for reading large delimiter-separated files with RandomAccessFile (chunking, record extraction, binary search).
byte_search_io #
Low-allocation building blocks for working with large, delimiter-separated files using RandomAccessFile.
This package is meant to be the “disk-first” layer for workflows like:
- reading records (lines) without loading the full file
- scanning a byte range efficiently
- doing a binary search over a sorted file to find [start, end) offsets, then scanning only that window
If you’re parsing huge log files, this is the package that lets you jump to the right part of the file and read just what you need.
⚠️ This package requires dart:io and is not supported on the web.
See also:
byte_search, which is re-exported by this package and provides high-performance, allocation-free byte searching primitives that can be applied within records or chunks.
Which package should I import? #
- If you only have bytes in memory (Uint8List) → use byte_search.
- If you're working with large files (RandomAccessFile) → use byte_search_io.
byte_search_io re-exports byte_search, so import one or the other.
Features #
RecordReader #
Read the record containing an arbitrary byte offset by scanning backward/forward to the nearest delimiters.
- delimiter-separated records (default: \n)
- CRLF-friendly via trimCarriageReturn
- bounded scanning via maxBackwardScanBytes/maxForwardScanBytes
- neighbor navigation: readRecordBeforeOffset(...), readRecordAfterOffset(...)
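A minimal sketch of neighbor navigation, assuming the readRecordBeforeOffset/readRecordAfterOffset methods take a RandomAccessFile plus an absolute byte offset and return null when there is no neighbor (check your version's API docs for the exact signatures):

```dart
import 'dart:io';
import 'package:byte_search_io/byte_search_io.dart';

Future<void> main() async {
  final raf = await File('log.txt').open(mode: FileMode.read);
  final reader = RecordReader();

  // Anchor on the record containing byte offset 1 MiB...
  final here = await reader.readRecordContainingOffset(raf, 1 << 20);

  // ...then walk to its neighbors. A null result is assumed here to mean
  // "no neighbor found" (start/end of file, or scan limits hit).
  final prev = await reader.readRecordBeforeOffset(raf, here.startOffset);
  final next = await reader.readRecordAfterOffset(raf, here.endOffsetExclusive);

  print(prev?.toStringUtf8());
  print(next?.toStringUtf8());
  await raf.close();
}
```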
ChunkedFileReader #
Stream file data in chunks (optionally with overlap for raw scanning), or stream records over a range (record-aware; no overlap required).
- openRandomAccessFile(...) → Stream&lt;ByteChunk&gt; (whole file)
- openRandomAccessFileRange(...) → Stream&lt;ByteChunk&gt; ([start, end) only)
- openRandomAccessFileRecords(...) → Stream&lt;RecordSlice&gt; (record-aware; no overlap required)
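A range-scan sketch, assuming openRandomAccessFileRange takes the same named raf/startOffset/endOffsetExclusive/chunkSize parameters as the records variant shown in the examples below:

```dart
import 'dart:io';
import 'package:byte_search_io/byte_search_io.dart';

Future<void> main() async {
  final raf = await File('log.txt').open(mode: FileMode.read);
  final chunked = ChunkedFileReader(closeRafOnDone: false);

  // Stream raw chunks covering bytes [0, 8 MiB) only.
  await for (final chunk in chunked.openRandomAccessFileRange(
    raf: raf,
    startOffset: 0,
    endOffsetExclusive: 8 << 20,
    chunkSize: 1 << 22, // 4 MiB
  )) {
    // chunk.fileOffset is the absolute offset of chunk.bytes[0].
    print('${chunk.fileOffset}: ${chunk.bytes.length} bytes');
  }
  await raf.close();
}
```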
BinarySearchFile<K> #
Binary search a sorted delimiter-separated file to find offsets:
The file must be sorted by key for all records where parseKey returns a non-null value. Records for which parseKey returns null are skipped during probing but must not violate the overall sort order.
- lowerBound(...) → first record with key &gt;= target
- upperBound(...) → first record with key &gt; target
- supports skipping unparseable records (parseKey returns null)
- throws StateError if a probe lands on a truncated record (increase RecordReader scan limits)
This is ideal for “timestamp-sorted logs”: binary search the time bounds, then scan only that subrange.
Installation #
Add to your pubspec.yaml:
dependencies:
byte_search_io: ^0.1.0
Quick start: read a record at an offset #
import 'dart:io';
import 'package:byte_search_io/byte_search_io.dart';

Future<void> main() async {
  final raf = await File('log.txt').open(mode: FileMode.read);
  final reader = RecordReader(); // newline-delimited by default
  final rec = await reader.readRecordContainingOffset(raf, 1 << 20);
  print(rec.toStringUtf8());
  await raf.close();
}
Stream records (lines) without loading the file #
import 'dart:io';
import 'package:byte_search_io/byte_search_io.dart';

Future<void> main() async {
  final raf = await File('log.txt').open(mode: FileMode.read);
  final recordReader = RecordReader(
    delimiter: 0x0A /* '\n' */,
    maxBackwardScanBytes: 1024 * 1024,
    maxForwardScanBytes: 1024 * 1024,
  );
  final chunked = ChunkedFileReader(closeRafOnDone: false);
  await for (final rec in chunked.openRandomAccessFileRecords(
    raf: raf,
    recordReader: recordReader,
    chunkSize: 1 << 22, // 4 MiB
  )) {
    // rec.bytes is the line (no '\n', and optionally no '\r')
    // rec.startOffset/endOffsetExclusive are absolute file offsets
    final line = rec.toStringUtf8(allowMalformed: true);
    // do something with line...
  }
  await raf.close();
}
Binary search a sorted file, then scan just that range #
import 'dart:io';
import 'dart:typed_data';
import 'package:byte_search_io/byte_search_io.dart';
DateTime? tryParseLogTime(Uint8List bytes) {
  // Example parser: assumes each record starts with an ISO-8601 timestamp
  // such as "2026-01-01T00:00:00". Adapt this to your log format; return
  // null for records without a parseable timestamp (they are skipped).
  if (bytes.length < 19) return null;
  return DateTime.tryParse(String.fromCharCodes(bytes, 0, 19));
}
Future<void> main() async {
  final raf = await File('sorted.log').open(mode: FileMode.read);
  final recordReader = RecordReader(
    delimiter: 0x0A /* '\n' */,
    maxBackwardScanBytes: 1024 * 1024,
    maxForwardScanBytes: 1024 * 1024,
  );
  final bs = BinarySearchFile<DateTime>(
    recordReader: recordReader,
    parseKey: (rec) => tryParseLogTime(rec.bytes),
    compare: (a, b) => a.compareTo(b),
  );
  final startTime = DateTime(2026, 1, 1, 0, 0, 0);
  final endTime = DateTime(2026, 1, 1, 1, 0, 0);
  final startOffset = await bs.lowerBound(raf: raf, target: startTime);
  final endOffset = await bs.upperBound(raf: raf, target: endTime);
  final chunked = ChunkedFileReader(closeRafOnDone: false);
  await for (final rec in chunked.openRandomAccessFileRecords(
    raf: raf,
    recordReader: recordReader,
    startOffset: startOffset,
    endOffsetExclusive: endOffset,
  )) {
    // process only records in [startTime, endTime]
  }
  await raf.close();
}
Concepts #
Records and Delimiters
A “record” is the bytes between delimiters. The delimiter byte is not included in RecordSlice.bytes.
For newline-delimited files:
- delimiter: 0x0A (\n)
- if trimCarriageReturn == true, a trailing \r is removed (CRLF support)
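For example, a CRLF-tolerant reader can be configured with the constructor flag named above (a sketch; defaults for the other parameters are assumed):

```dart
import 'package:byte_search_io/byte_search_io.dart';

// Records end at '\n'; a trailing '\r' is stripped, so a CRLF file
// containing "abc\r\n" yields a RecordSlice whose bytes spell "abc".
final crlfReader = RecordReader(
  delimiter: 0x0A,
  trimCarriageReturn: true,
);
```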
Sorted Files
Records for which parseKey returns null may appear anywhere, but all parseable keys must be in monotonic order for binary search to be correct.
Offsets
Offsets are absolute file byte offsets:
- RecordSlice.startOffset is inclusive
- RecordSlice.endOffsetExclusive is exclusive
- ByteChunk.fileOffset is the absolute offset where bytes[0] belongs
This lets you:
- seek back to exact positions
- slice ranges
- create reproducible “bookmarks” into huge files
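A bookmark sketch built on those offset guarantees (offsets are assumed stable only while the file is unchanged or append-only):

```dart
import 'dart:io';
import 'package:byte_search_io/byte_search_io.dart';

Future<void> main() async {
  final raf = await File('log.txt').open(mode: FileMode.read);
  final reader = RecordReader();

  // Read a record and remember where it starts.
  final rec = await reader.readRecordContainingOffset(raf, 1 << 20);
  final bookmark = rec.startOffset; // absolute, inclusive

  // Later (even in another process), the same offset resolves to the
  // same record, provided the file has not been rewritten in place.
  final again = await reader.readRecordContainingOffset(raf, bookmark);
  assert(again.startOffset == bookmark);
  await raf.close();
}
```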
Truncation Flags
- BinarySearchFile may throw StateError if a probe lands on a truncated record slice.
- Neighbor navigation may return null in truncation scenarios.
When scan limits are hit:
- startTruncated == true means the true record start was not found
- endTruncated == true means the true record end was not found
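Checking the flags might look like this (a sketch assuming startTruncated/endTruncated are exposed on the returned RecordSlice):

```dart
import 'dart:io';
import 'package:byte_search_io/byte_search_io.dart';

Future<void> main() async {
  final raf = await File('log.txt').open(mode: FileMode.read);
  // Deliberately tight limits make truncation easy to observe.
  final reader = RecordReader(
    maxBackwardScanBytes: 64,
    maxForwardScanBytes: 64,
  );
  final rec = await reader.readRecordContainingOffset(raf, 1 << 20);
  if (rec.startTruncated || rec.endTruncated) {
    // The slice is incomplete: retry with larger scan limits.
    final retry = RecordReader(
      maxBackwardScanBytes: 1 << 20,
      maxForwardScanBytes: 1 << 20,
    );
    final full = await retry.readRecordContainingOffset(raf, 1 << 20);
    print(full.toStringUtf8());
  }
  await raf.close();
}
```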
Error Handling
- Methods may throw ArgumentError/StateError for invalid parameters or oversized records.
- May rethrow I/O errors from RandomAccessFile operations or from RecordReader.
Performance Notes #
byte_search_io handles I/O, chunking, and record boundaries;
byte_search primitives can be applied within those records or chunks
to keep hot-path searching allocation-free.
ChunkedFileReader.openRandomAccessFileRecords(...) is record-aware and avoids overlap.
ChunkedFileReader.openRandomAccessFile(...) can use overlap for raw chunk scanning use-cases.
Both APIs allocate:
- per chunk (when yielding chunk bytes)
- per record (when yielding record bytes)
...but avoid per-byte allocations and avoid loading whole files.
When to use what #
- Need “the line around this offset”? → RecordReader.readRecordContainingOffset
- Need to stream lines/records? → ChunkedFileReader.openRandomAccessFileRecords
- Have a sorted file and want a time window fast? → BinarySearchFile + range record streaming
When NOT to use this package #
- You already have the data in memory. If your input is already a Uint8List, byte_search alone may be sufficient.
- Files are small. For small files, simpler APIs like readAsLines() may be clearer.
- You need web support. This package depends on dart:io and is not available on the web.
- You want a full log parser. This package provides navigation and record extraction, not domain-specific parsing.
License #
BSD 3-Clause (see LICENSE).