ocr_stabilizer 0.5.0
Real-time OCR overlay stabilization engine — drift correction, spatial indexing, block tracking. Built for Flutter.
# ocr_stabilizer
A real-time stabilization engine for live OCR overlays. Tracks text block identity across noisy captures, corrects positional drift, and provides spatial indexing for deduplication.
Built for Flutter. Designed for OCR pipelines where screenshots are captured at 1-2 Hz and translated overlays must remain stable as the user scrolls.
## The Problem
Live OCR on scrollable content produces a stream of noisy, jittery observations. The same paragraph appears at slightly different positions each capture. Without a stabilization layer, overlays flicker, duplicate, and drift.
This is the same problem visual SLAM (Simultaneous Localization and Mapping)
solves in robotics: associate noisy sensor observations to persistent landmarks,
correct accumulated drift, and maintain a consistent map. ocr_stabilizer
adapts SLAM techniques to the OCR domain.
## Installation

```yaml
dependencies:
  ocr_stabilizer: ^0.5.0
```
What's new in 0.5.0 (additive surface only; safe upgrade from 0.4.x):

- A typed `BandPredicateException` surfaces throws from consumer-supplied predicates instead of swallowing them.
- A new `rejectedTextBand` counter makes the band funnel decomposable.
- An internal `assertConfidenceRange` utility centralises the `[0.0, 1.0]` check across `DefaultTrackedBlock`, `MergeResult`, the engine guards, and the `PositionConfidence.from` / `TextConfidence.from` factories.

0.4.0 introduced the band-fallback path; see `BandFallbackConfig` below. The default `BandFallbackMode.off` keeps the upgrade backwards-compatible. 0.4.0 also tightened confidence validation: `stabilize()`, `merge()`, and the `DefaultTrackedBlock` constructor now throw `ArgumentError` on NaN or out-of-range (outside `[0.0, 1.0]`) confidences. Consumers going through the `.from()` factories were already covered. See the CHANGELOG for migration details.
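The validation rule above amounts to a small guard. The sketch below is hypothetical: `checkConfidence` is not the package's `assertConfidenceRange` (which is internal), and throwing `ArgumentError` for both NaN and out-of-range values simply mirrors the 0.4.0 behaviour described above. The explicit NaN test matters because NaN compares false against every bound, so a plain range check would let it through.

```dart
/// Hypothetical [0.0, 1.0] confidence guard (not the package's internal
/// assertConfidenceRange). Returns the value unchanged when it is valid.
double checkConfidence(double value, String name) {
  // NaN compares false against every bound, so it must be rejected
  // explicitly; otherwise it would slip through the range test below.
  if (value.isNaN) {
    throw ArgumentError.value(value, name, 'must not be NaN');
  }
  // Catches negatives, values above 1.0, and both infinities.
  if (value < 0.0 || value > 1.0) {
    throw ArgumentError.value(value, name, 'must be within [0.0, 1.0]');
  }
  return value;
}
```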
## Getting Started
The fastest path is DefaultTrackedBlock<T> — a concrete reference
implementation with documented defaults for every required field, including
the load-bearing ones like carouselIdVotes: {-1: 1} that need careful
initialization.
```dart
import 'package:ocr_stabilizer/ocr_stabilizer.dart';

final engine = StabilizationEngine<DefaultTrackedBlock<MyPayload>, MyPayload>(
  merger: (existing, fresh, merge) => existing.applyMerge(merge),
);

// Each capture. `ocrResults` and `MyPayload` are your app's own OCR output
// and payload types.
final blocks = ocrResults.map((ocr) => DefaultTrackedBlock<MyPayload>(
      absoluteRect: ocr.absoluteRect,
      originalText: ocr.text,
      payload: ocr.payload,
      positionConfidence: PositionConfidence.from(ocr.posConf),
      textConfidence: TextConfidence.from(ocr.txtConf),
    )).toList();
final result = engine.stabilize(blocks);
// stabilize() rebuilds engine.spatialIndex internally; no caller action needed.
```
See example/example.dart for a runnable version.
For app-specific block types not covered by DefaultTrackedBlock, implement
TrackedBlock<T> directly — see the next section.
## BandFallback: the band-relaxed matching path
OCR jitter — one character flipped or one ligature mis-segmented — can drop a
stable block below the primary text-similarity floor for a single frame.
BandFallbackConfig opens a relaxed second-pass match path so that spatially
unambiguous blocks don't "blink off and back on."
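The two-pass idea reduces to a small decision function. A minimal sketch: `decideMatch`, `MatchPath`, and both floor values are hypothetical, and text similarity is taken as an input in `[0, 1]` rather than computed. The point it shows is that the relaxed floor applies only when geometry already vouches for the pairing.

```dart
/// Which path, if any, matched a fresh observation to a tracked block.
enum MatchPath { primary, band, none }

/// Hypothetical two-pass decision. Both floors are illustrative values,
/// not the package's defaults.
MatchPath decideMatch({
  required double textSimilarity, // in [0, 1], computed by the caller
  required bool spatiallyUnambiguous, // e.g. the result of a spatial predicate
  double primaryFloor = 0.85,
  double bandFloor = 0.60,
}) {
  // Pass 1: the ordinary text-similarity match.
  if (textSimilarity >= primaryFloor) return MatchPath.primary;
  // Pass 2: a relaxed text floor, allowed only when geometry vouches for
  // the pairing, so a one-frame OCR glitch does not drop a stable block.
  if (spatiallyUnambiguous && textSimilarity >= bandFloor) {
    return MatchPath.band;
  }
  return MatchPath.none;
}
```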
```dart
final engine = StabilizationEngine<DefaultTrackedBlock<MyPayload>, MyPayload>(
  merger: (existing, fresh, merge) => existing.applyMerge(merge),
  // Opt in: start in observeOnly to read counters, then flip to admit.
  bandFallback: const BandFallbackConfig(mode: BandFallbackMode.observeOnly),
);

// After a few captures, inspect the counters before flipping to admit.
// Note: in admit mode, once a band candidate is locked for a fresh
// observation, subsequent candidates skip band evaluation, so
// candidatesConsidered is mode-variant (observeOnly will show a higher
// figure). The funnel identity
//   rejectedCandidateFloor + rejectedSpatial + rejectedTextBand
//     + bandMatchesIdentified == candidatesConsidered
// is itself mode-invariant: every term ticks before the early exit fires.
final s = engine.bandStats;
print('primary admits=${s.primaryMatchesAdmitted}, '
    'primary misses=${s.primaryMatchesRejected}, '
    'candidates considered=${s.candidatesConsidered}, '
    'band would-admit=${s.bandMatchesIdentified}, '
    'rejected obs-floor=${s.rejectedCandidateFloor}, '
    'rejected spatial=${s.rejectedSpatial}, '
    'rejected text-band=${s.rejectedTextBand}, '
    'matches admitted=${s.matchesAdmitted}');
```
Recommended adoption flow for callers that want band coverage: ship with
off (the default — a ^0.5.0 upgrade is a no-op), switch to
observeOnly to read the counters in production, then flip to admit
once the ratios justify it. Staying on off permanently is also valid —
it disables the band path entirely and pays no extra cost.
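To make the funnel identity from the counter example concrete, here is a plain-Dart sketch. `BandFunnelSketch` and `record` are hypothetical: the field names mirror `engine.bandStats`, but the order in which a candidate is tested (observation floor, then spatial, then text band) is an assumption taken from the order of the printed counters. Because each considered candidate lands in exactly one bucket, the identity holds by construction, and the would-admit ratio is one number to watch before flipping to admit.

```dart
/// Hypothetical mirror of the band-funnel counters; not the package's
/// BandFallbackStats class.
class BandFunnelSketch {
  int candidatesConsidered = 0;
  int rejectedCandidateFloor = 0;
  int rejectedSpatial = 0;
  int rejectedTextBand = 0;
  int bandMatchesIdentified = 0;

  /// Routes one candidate into exactly one terminal bucket.
  /// The check order (floor, spatial, text band) is an assumption.
  void record({
    required bool passesCandidateFloor,
    required bool passesSpatial,
    required bool passesTextBand,
  }) {
    candidatesConsidered++;
    if (!passesCandidateFloor) {
      rejectedCandidateFloor++;
    } else if (!passesSpatial) {
      rejectedSpatial++;
    } else if (!passesTextBand) {
      rejectedTextBand++;
    } else {
      bandMatchesIdentified++;
    }
  }

  /// The funnel identity: every considered candidate is counted once.
  bool get isBalanced =>
      rejectedCandidateFloor +
          rejectedSpatial +
          rejectedTextBand +
          bandMatchesIdentified ==
      candidatesConsidered;

  /// Fraction of considered candidates the band path would admit.
  double get wouldAdmitRatio => candidatesConsidered == 0
      ? 0.0
      : bandMatchesIdentified / candidatesConsidered;
}
```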
## Core Components

### `TrackedBlock<T>`
The engine's central interface. Every block the engine processes implements this.
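```dart
import 'package:ocr_stabilizer/ocr_stabilizer.dart';
// MyPayload is your own app type (translations, styles, ...).
class MyBlock implements TrackedBlock<MyPayload> {
  MyBlock({
    required this.absoluteRect, required this.originalText,
    required this.containerId, required this.isViewportRelative,
    required this.isInnerScrollerChild, required this.innerScrollerTop,
    required this.isHorizontalScrollChild, required this.scrollContext,
    required this.isFromStickyElement, required this.stickyFallback,
    required this.positionConfidence, required this.textConfidence,
    required this.sourceQuality, required this.payload,
  });

  @override final AbsoluteRect absoluteRect;
  @override final String originalText;
  @override final ContainerId? containerId;
  @override final bool isViewportRelative;
  @override final bool isInnerScrollerChild;
  @override final double innerScrollerTop;
  @override final bool isHorizontalScrollChild;
  @override final ScrollContext scrollContext;
  @override final bool isFromStickyElement;
  @override final StickyFallback stickyFallback;
  @override final PositionConfidence positionConfidence;
  @override final TextConfidence textConfidence;
  @override final int sourceQuality;
  @override final MyPayload payload; // opaque: the engine carries it but never reads it
}
```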
For the stabilization pipeline (vote accumulation, provisional state,
SAR-merge history), implement ObservableBlock<T> instead — it extends
TrackedBlock<T> with 8 more getters. Most integrators want
DefaultTrackedBlock<T> rather than rolling their own.
The generic T carries app-specific data (translations, styles) without
coupling the engine to your domain types.
### DriftTracker
Tracks positional drift per coordinate-space region. OCR positions jitter between captures due to scroll timing, viewport changes, and sensor noise. DriftTracker accumulates observations and computes a robust median correction per region.
```dart
final drift = DriftTracker();

// Record a drift observation.
drift.addObservation(block, measuredDrift);

// Query the correction for a region.
final correction = drift.medianDriftForKey(spaceKey);

// Apply the correction to a fresh observation.
final corrected = DriftTracker.applyCorrectedPosition(rect, correction);
```
Key properties:
- Bounded corrections: Drift is clamped to the median block height per region — the engine can never shift a block farther than a typical line of text.
- Rolling window: Keeps the last 20 observations per region, so drift adapts to changing conditions.
- Submap isolation: Normal page-scroll and inner-scroller containers track drift independently via `SpaceKey`.
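The three properties above fit in a few lines. A minimal sketch, assuming plain `String` keys in place of `SpaceKey`, a vertical-only `double` drift, and symmetric clamping to the region's median block height. `DriftWindowSketch` is hypothetical and is not how `DriftTracker` is implemented internally.

```dart
import 'dart:collection';

/// Hypothetical per-region drift estimator illustrating the three
/// properties above; not DriftTracker's actual implementation.
class DriftWindowSketch {
  DriftWindowSketch({this.windowSize = 20});

  /// Observations kept per region (the README documents 20).
  final int windowSize;

  // One rolling window of drifts and one of block heights per region key.
  final Map<String, ListQueue<double>> _drifts = {};
  final Map<String, ListQueue<double>> _heights = {};

  /// Records a measured vertical drift and the observed block's height.
  void add(String spaceKey, double drift, double blockHeight) {
    _push(_drifts, spaceKey, drift);
    _push(_heights, spaceKey, blockHeight);
  }

  /// Median drift for [spaceKey], clamped to plus or minus the region's
  /// median block height (heights assumed positive). 0.0 if unseen.
  double correctionFor(String spaceKey) {
    final drifts = _drifts[spaceKey];
    if (drifts == null || drifts.isEmpty) return 0.0;
    final limit = _median(_heights[spaceKey]!);
    return _median(drifts).clamp(-limit, limit).toDouble();
  }

  void _push(Map<String, ListQueue<double>> windows, String key, double v) {
    final window = windows.putIfAbsent(key, () => ListQueue<double>());
    window.addLast(v);
    if (window.length > windowSize) window.removeFirst(); // drop the oldest
  }

  static double _median(Iterable<double> values) {
    final sorted = values.toList()..sort();
    final mid = sorted.length ~/ 2;
    return sorted.length.isOdd
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
  }
}
```

The median makes a single outlier capture (the `100.0` case below) harmless, and the clamp caps even a consistently large drift at one typical line height.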
### SpatialBlockIndex
Grid-cell spatial index for O(cells) overlap candidate lookup during deduplication. Blocks are indexed by their center position into adaptive grid cells.
```dart
final index = SpatialBlockIndex();
index.updateBucketSizes(viewportWidth: 1000, viewportHeight: 800);
index.add(block);
final nearby = index.candidates(queryBlock);
index.remove(block);
```
Three coordinate-space namespaces prevent cross-space false matches:

- Normal page-absolute blocks
- Viewport-relative (fixed/sticky) blocks (`vr:` prefix)
- Inner-scroller relative blocks (`ic:` prefix), dual-indexed for both IC-to-normal and IC-to-IC comparisons
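A grid-cell index of this kind can be sketched in plain Dart. Assumptions: a fixed cell size (the package adapts it via `updateBucketSizes`, whose sizing rule is not documented here), a 3×3 neighbourhood lookup, a minimal `Box` type standing in for `dart:ui`'s `Rect`, and namespaces encoded as key prefixes. `GridIndexSketch` is hypothetical, and it omits the dual indexing of inner-scroller blocks.

```dart
/// Minimal stand-in for dart:ui's Rect so the sketch runs in plain Dart.
class Box {
  const Box(this.left, this.top, this.width, this.height);
  final double left, top, width, height;
  double get centerX => left + width / 2;
  double get centerY => top + height / 2;
}

/// Hypothetical grid-cell index keyed by namespace + cell of the centre.
class GridIndexSketch<B> {
  GridIndexSketch({
    required this.cellWidth,
    required this.cellHeight,
    required this.boxOf,
    required this.namespaceOf,
  });

  final double cellWidth;
  final double cellHeight;
  final Box Function(B block) boxOf;

  /// '' for normal blocks, 'vr:' for viewport-relative, 'ic:' for
  /// inner-scroller blocks.
  final String Function(B block) namespaceOf;

  final Map<String, Set<B>> _cells = {};

  // Cell key for the block's centre, optionally offset by (dx, dy) cells.
  String _keyFor(B block, {int dx = 0, int dy = 0}) {
    final box = boxOf(block);
    final cx = (box.centerX / cellWidth).floor() + dx;
    final cy = (box.centerY / cellHeight).floor() + dy;
    return '${namespaceOf(block)}$cx,$cy';
  }

  void add(B block) =>
      _cells.putIfAbsent(_keyFor(block), () => <B>{}).add(block);

  void remove(B block) => _cells[_keyFor(block)]?.remove(block);

  /// Same-namespace blocks whose centres fall in the query's cell or one of
  /// its 8 neighbours; each query touches at most 9 cells.
  Set<B> candidates(B query) {
    final out = <B>{};
    for (var dx = -1; dx <= 1; dx++) {
      for (var dy = -1; dy <= 1; dy++) {
        final cell = _cells[_keyFor(query, dx: dx, dy: dy)];
        if (cell != null) out.addAll(cell);
      }
    }
    out.remove(query); // a block is not its own candidate
    return out;
  }
}
```

Because the namespace is part of the key, a sticky header and a page paragraph at identical coordinates never become candidates for each other.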
### HierarchyWeightX
Extension on TrackedBlock computing hierarchy weight from coordinate-space
flags. Higher weight means more constrained coordinate space:
| Tier | Weight | Meaning |
|---|---|---|
| Viewport-relative | 40 | Fixed/sticky — no scroll drift |
| Nested IC+carousel | 30 | Compound coordinate space |
| IC or carousel | 20 | Single-axis constraint |
| Normal | 10 | Unrestricted page scroll |
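The table reads as a precedence chain. The sketch below assumes viewport-relative blocks outrank every other flag and treats "carousel" as a single boolean. `hierarchyWeightSketch` and `isCarouselChild` are hypothetical names; the real extension reads its coordinate-space flags from `TrackedBlock`.

```dart
/// Hypothetical restatement of the tier table as a pure function.
int hierarchyWeightSketch({
  required bool isViewportRelative,
  required bool isInnerScrollerChild,
  required bool isCarouselChild,
}) {
  if (isViewportRelative) return 40; // fixed/sticky: no scroll drift
  if (isInnerScrollerChild && isCarouselChild) return 30; // compound space
  if (isInnerScrollerChild || isCarouselChild) return 20; // single-axis
  return 10; // unrestricted page scroll
}
```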
### Extension Types

Zero-cost compile-time wrappers for coordinate safety:

- `AbsoluteRect` wraps `Rect` for world-space coordinates. Spatial operations (`overlaps`, `expandToInclude`) only accept other `AbsoluteRect` values, preventing accidental coordinate-space mixing.
- `ContainerId` wraps `String` for stable container identity hashes.
- `SpaceKey` wraps `String` with typed constructors (`normal`, `ic`, `unknown`) for drift-observation coordinate spaces.
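Dart extension types (Dart 3.3+) erase to their representation type at runtime, which is what makes wrappers like these zero-cost while still rejecting a bare `String` at compile time. A hypothetical `SpaceKeySketch` with typed constructors; the `'normal'`, `'ic:<hash>'`, and `'unknown'` encodings are assumptions, not the package's real key format.

```dart
/// Hypothetical SpaceKey-style wrapper (requires Dart 3.3+). The string
/// encodings are illustrative, not the package's real key format.
extension type const SpaceKeySketch._(String value) {
  /// Normal page-scroll space.
  const SpaceKeySketch.normal() : this._('normal');

  /// Space of one inner-scroller container, identified by its hash.
  SpaceKeySketch.ic(String containerHash) : this._('ic:$containerHash');

  /// Fallback when the coordinate space cannot be determined.
  const SpaceKeySketch.unknown() : this._('unknown');

  bool get isInnerScroller => value.startsWith('ic:');
}

/// Accepts only a SpaceKeySketch: passing a bare String is a compile error,
/// yet at runtime the argument is just the underlying String.
String describe(SpaceKeySketch key) => 'space ${key.value}';
```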
## Six-Dimension Block Identity
A block's identity is a six-dimensional signature:
| Dimension | What It Answers | Package Support |
|---|---|---|
| Textual | What does this text say? | originalText on TrackedBlock |
| Spatial | Where is it in the page? | absoluteRect, confidence scores |
| Relative | Which coordinate space? | SpaceKey, ContainerId |
| Semantic | What kind of element? | hierarchyWeight (extension) |
| Temporal | How much evidence? | observationCount (ObservableBlock) |
| Contextual | What context was it in? | ContextualInvalidationCheck (callback) |
## API Reference

### Interfaces

| Type | Purpose |
|---|---|
| `TrackedBlock<T>` | Core block contract (14 getters, including the opaque `payload`) |
| `ObservableBlock<T>` | Extends `TrackedBlock`; adds observation history (8 getters: counts, votes, provisional state) |
| `ClassificationInput` | Platform-agnostic viewport geometry |
| `CarouselInput` | Carousel-specific geometry |
| `SubmapMembership` | Strategy for coordinate-space partitioning |
| `ContextualInvalidationCheck` | Callback for context-change detection |
### Components

| Type | Purpose |
|---|---|
| `StabilizationEngine<T, P>` | SAR-merge, intra-batch dedup, contradiction detection |
| `DriftTracker` | Regional drift correction with submap isolation |
| `SpatialBlockIndex` | Grid-cell spatial index for overlap queries |
| `BlockClassifierService` | Classifies blocks into fixed / sticky / carousel / IC / normal |
| `OverlapResolver` | Spatial NMS with language-aware thresholds |
| `BlockKeyGenerator` | Position + text dedup keys with fuzzy neighbor matching |
| `CssSubmapMembership` | Default WebView submap partitioning |
| `RobustStats` | Robust statistics (median, MAD, IQR) |
| `IqrOutlier` | Tukey-fence outlier detection |
| `TextDedupUtils` | Levenshtein, Jaccard, CJK detection helpers |
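For reference, the statistics named under `RobustStats` and `IqrOutlier` can be sketched as follows. These are hypothetical helpers, not the package's signatures. The quantile uses linear interpolation between closest ranks, one of several common conventions, so quartiles (and therefore Tukey fences) may differ slightly from the package's. MAD is returned unscaled.

```dart
/// Hypothetical median helper; throws on an empty sample.
double medianOf(List<double> xs) {
  if (xs.isEmpty) throw ArgumentError('empty sample');
  final s = [...xs]..sort();
  final m = s.length ~/ 2;
  return s.length.isOdd ? s[m] : (s[m - 1] + s[m]) / 2;
}

/// Median absolute deviation from the median (unscaled).
double madOf(List<double> xs) {
  final med = medianOf(xs);
  return medianOf([for (final x in xs) (x - med).abs()]);
}

/// Quantile by linear interpolation between closest ranks.
double quantileOf(List<double> xs, double q) {
  if (xs.isEmpty) throw ArgumentError('empty sample');
  if (q < 0 || q > 1) throw ArgumentError.value(q, 'q', 'must be in [0, 1]');
  final s = [...xs]..sort();
  final pos = (s.length - 1) * q;
  final lo = pos.floor();
  final hi = pos.ceil();
  return s[lo] + (s[hi] - s[lo]) * (pos - lo);
}

/// Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are outliers.
List<double> tukeyOutliers(List<double> xs, {double k = 1.5}) {
  final q1 = quantileOf(xs, 0.25);
  final q3 = quantileOf(xs, 0.75);
  final iqr = q3 - q1;
  return [for (final x in xs) if (x < q1 - k * iqr || x > q3 + k * iqr) x];
}
```

With the sample `[1, 2, 3, 4, 100]`, Q1 = 2 and Q3 = 4, so the fences are `[-1, 7]` and only `100` is flagged; the median (3) and MAD (1) are unaffected by it.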
### BandFallback (v0.4.0+)

| Type | Purpose |
|---|---|
| `BandFallbackConfig` | Configures the band-relaxed matching path. Default mode: `off`. |
| `BandFallbackMode` | `off` (no band loop) / `observeOnly` (counters only) / `admit` (production). |
| `BandFallbackStats` | Read-only per-capture telemetry exposed via `engine.bandStats`. |
| `BandSpatialPredicate` | Optional `bool Function(TrackedBlock fresh, TrackedBlock candidate)` injection. If `null`, the engine substitutes a drift-aware `overlapRatio >= 0.80` closure. |
| `BandPredicateException` | Typed wrapper for throws from consumer predicates (v0.5.0+). The engine catches and rewraps them so failures surface with a typed shape and are never swallowed. The original predicate stack trace lives on `predicateStackTrace`. |
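The default spatial predicate is described only as a drift-aware `overlapRatio >= 0.80` closure. Below is a sketch of one plausible reading, assuming the ratio is intersection area divided by the smaller box's area, drift is a vertical offset subtracted from the fresh box, and a plain `Box` type stands in for `TrackedBlock`. `overlapRatioSketch`, `driftAwareOverlap`, `BoxPredicate`, and `Box` are hypothetical.

```dart
import 'dart:math';

/// Minimal stand-in for dart:ui's Rect.
class Box {
  const Box(this.left, this.top, this.width, this.height);
  final double left, top, width, height;
  double get right => left + width;
  double get bottom => top + height;
  double get area => width * height;
}

/// Assumed definition: intersection area over the smaller box's area.
double overlapRatioSketch(Box a, Box b) {
  final w = min(a.right, b.right) - max(a.left, b.left);
  final h = min(a.bottom, b.bottom) - max(a.top, b.top);
  if (w <= 0 || h <= 0) return 0.0; // disjoint or merely touching
  final smaller = min(a.area, b.area);
  return smaller <= 0 ? 0.0 : (w * h) / smaller;
}

/// Same shape as the documented predicate, but over Box.
typedef BoxPredicate = bool Function(Box fresh, Box candidate);

/// Shifts the fresh box back by the region's drift before comparing, so a
/// block that merely drifted still overlaps its tracked counterpart.
BoxPredicate driftAwareOverlap({
  required double verticalDrift,
  double threshold = 0.80,
}) {
  return (fresh, candidate) {
    final corrected = Box(
        fresh.left, fresh.top - verticalDrift, fresh.width, fresh.height);
    return overlapRatioSketch(corrected, candidate) >= threshold;
  };
}
```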
### Reference Implementations

| Type | Purpose |
|---|---|
| `DefaultTrackedBlock<T>` | Concrete `ObservableBlock<T>` with documented defaults, `copyWith`, and `applyMerge(MergeResult)`; the fastest path for new integrators |
### Result Types

| Type | Purpose |
|---|---|
| `StabilizationResult<T>` | Output of `engine.stabilize()`: stable blocks plus bookkeeping |
| `MergeResult` | Exhaustive engine-computed delta passed to `BlockMerger` |
| `ClassificationResult` | Output of `BlockClassifierService` |
### Value Types

| Type | Purpose |
|---|---|
| `ScrollContext` | Scroll offsets and carousel identity at capture time |
| `StickyFallback` | Fallback coordinate context for demoted sticky elements |
| `TextVote` | Accumulated confidence evidence for one text variant |
### Extension Types

| Type | Wraps | Purpose |
|---|---|---|
| `AbsoluteRect` | `Rect` | World-space coordinate safety |
| `ContainerId` | `String` | Stable container identity |
| `SpaceKey` | `String` | Typed drift observation keys |
| `PositionConfidence` | `double` | Position-accuracy confidence in `[0, 1]` |
| `TextConfidence` | `double` | OCR-text confidence in `[0, 1]` |
## Platform Support
The package depends on dart:ui (for Rect, Offset) and therefore
requires the Flutter SDK. It has no platform-specific code — it works on
Android, iOS, macOS, Windows, Linux, and Web.
The SubmapMembership and ClassificationInput interfaces allow the engine
to support different input sources:

| Platform | SubmapMembership | ClassificationInput |
|---|---|---|
| WebView | `CssSubmapMembership` (default) | `CaptureSnapshotAdapter` (app-side) |
| | Custom (page-based submaps) | Custom (page geometry) |
| Camera | Custom (frame regions) | Custom (camera frame) |
## Contributing
See CONTRIBUTING.md for dev setup, conventions, and the release flow.