ocr_stabilizer 0.1.0
Real-time OCR overlay stabilization engine — drift correction, spatial indexing, block tracking. Built for Flutter.
ocr_stabilizer #
A real-time stabilization engine for live OCR overlays. Tracks text block identity across noisy captures, corrects positional drift, and provides spatial indexing for deduplication.
Built for Flutter. Designed for OCR pipelines where screenshots are captured at 1-2 Hz and translated overlays must remain stable as the user scrolls.
The Problem #
Live OCR on scrollable content produces a stream of noisy, jittery observations. The same paragraph appears at slightly different positions each capture. Without a stabilization layer, overlays flicker, duplicate, and drift.
This is the same problem visual SLAM (Simultaneous Localization and Mapping)
solves in robotics: associate noisy sensor observations to persistent landmarks,
correct accumulated drift, and maintain a consistent map. ocr_stabilizer
adapts SLAM techniques to the OCR domain.
Installation #
Add the dependency to pubspec.yaml:

```yaml
dependencies:
  ocr_stabilizer: ^0.1.0
```
Core Components #
TrackedBlock<T> #
The engine's central interface. Every block the engine processes implements this.
```dart
import 'package:ocr_stabilizer/ocr_stabilizer.dart';

// MyPayload is your own app type (translations, styles).
class MyBlock implements TrackedBlock<MyPayload> {
  @override final AbsoluteRect absoluteRect;
  @override final String originalText;
  @override final ContainerId? containerId;
  @override final bool isViewportRelative;
  @override final bool isInnerScrollerChild;
  @override final bool isHorizontalScrollChild;
  @override final ScrollContext scrollContext;
  @override final bool isFromStickyElement;
  @override final StickyFallback stickyFallback;
  // ... other required getters
  @override final MyPayload payload; // opaque: the engine carries it but never reads it

  // A constructor initializing these final fields is omitted for brevity.
}
```
The generic T carries app-specific data (translations, styles) without
coupling the engine to your domain types.
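To illustrate the decoupling, here is a minimal, self-contained sketch in plain Dart. Trackable<T>, sortTopToBottom, Translation, and AppBlock are hypothetical names, not part of ocr_stabilizer, and the interface is cut down to three getters so the example runs without Flutter. Engine-side code is written only against the interface, so it compiles for any payload type.

```dart
/// Hypothetical, stripped-down stand-in for TrackedBlock<T>: engine code
/// reads geometry and text but treats the payload as opaque.
abstract interface class Trackable<T> {
  double get top;
  String get originalText;
  T get payload;
}

/// Engine-side helper: orders blocks top-to-bottom without ever touching
/// the payload, so it works for any T.
List<Trackable<T>> sortTopToBottom<T>(Iterable<Trackable<T>> blocks) =>
    blocks.toList()..sort((a, b) => a.top.compareTo(b.top));

/// App-side payload (hypothetical): the engine never inspects it.
class Translation {
  const Translation(this.text, this.language);
  final String text;
  final String language;
}

/// App-side block type carrying a Translation payload.
class AppBlock implements Trackable<Translation> {
  const AppBlock(this.top, this.originalText, this.payload);

  @override
  final double top;
  @override
  final String originalText;
  @override
  final Translation payload;
}
```

Sorting AppBlock values with sortTopToBottom<Translation> returns them in reading order, and the caller can still read payload.text with its concrete type.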
DriftTracker #
Tracks positional drift per coordinate-space region. OCR positions jitter between captures due to scroll timing, viewport changes, and sensor noise. DriftTracker accumulates observations and computes a robust median correction per region.
```dart
final drift = DriftTracker();

// Record a drift observation
drift.addObservation(block, measuredDrift);

// Query the correction for a region
final correction = drift.medianDriftForKey(spaceKey);

// Apply correction to a fresh observation
final corrected = DriftTracker.applyCorrectedPosition(rect, correction);
```
Key properties:
- Bounded corrections: Drift is clamped to the median block height per region — the engine can never shift a block farther than a typical line of text.
- Rolling window: Keeps the last 20 observations per region, so drift adapts to changing conditions.
- Submap isolation: Normal page-scroll and inner-scroller containers track drift independently via SpaceKey.
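The properties above can be modeled in a few lines. This is a hypothetical stand-in, not DriftTracker's implementation: DriftSketch, correctionFor, and plain string region keys (in place of SpaceKey) are assumptions, and drift is reduced to a single vertical value. It keeps a rolling window per region, takes the median so one outlier capture cannot move the correction, and clamps the result to the median observed block height.

```dart
import 'dart:collection';

/// Median of a non-empty list; sorts a copy so the input is untouched.
double median(List<double> values) {
  final sorted = [...values]..sort();
  final mid = sorted.length ~/ 2;
  return sorted.length.isOdd
      ? sorted[mid]
      : (sorted[mid - 1] + sorted[mid]) / 2;
}

/// Hypothetical model of per-region drift correction (not DriftTracker's API).
class DriftSketch {
  DriftSketch({this.windowSize = 20});

  /// Observations kept per region; older ones fall out of the window.
  final int windowSize;
  final Map<String, ListQueue<double>> _drifts = {};
  final Map<String, ListQueue<double>> _heights = {};

  /// Records one measured vertical drift and the observed block's height.
  void addObservation(String regionKey, double drift, double blockHeight) {
    _push(_drifts.putIfAbsent(regionKey, () => ListQueue<double>()), drift);
    _push(_heights.putIfAbsent(regionKey, () => ListQueue<double>()), blockHeight);
  }

  void _push(ListQueue<double> window, double value) {
    window.addLast(value);
    if (window.length > windowSize) window.removeFirst();
  }

  /// Median drift for the region, clamped to +/- the median block height.
  /// Regions with no observations get no correction.
  double correctionFor(String regionKey) {
    final drifts = _drifts[regionKey];
    if (drifts == null || drifts.isEmpty) return 0;
    final bound = median(_heights[regionKey]!.toList());
    return median(drifts.toList()).clamp(-bound, bound).toDouble();
  }
}
```

Because each region key has its own window, a noisy inner scroller cannot pull the correction applied to normal page-scroll blocks.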
SpatialBlockIndex #
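Grid-cell spatial index for overlap-candidate lookup during deduplication: lookup cost scales with the number of grid cells examined (O(cells)) rather than with the total number of indexed blocks. Blocks are indexed by their center position into adaptive grid cells.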
```dart
final index = SpatialBlockIndex();

// Size the grid cells from the current viewport
index.updateBucketSizes(viewportWidth: 1000, viewportHeight: 800);

index.add(block);

// Overlap candidates for queryBlock
final nearby = index.candidates(queryBlock);

index.remove(block);
```
Three coordinate-space namespaces prevent cross-space false matches:
- Normal page-absolute blocks
- Viewport-relative (fixed/sticky) blocks (vr: prefix)
- Inner-scroller-relative blocks (ic: prefix), dual-indexed for both IC-to-normal and IC-to-IC comparisons
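A minimal sketch of the grid-bucketing idea, with namespace-prefixed cell keys so that blocks from different coordinate spaces never share a bucket. GridIndexSketch and its methods are hypothetical: blocks are reduced to an id plus a center point, the cell size is fixed at construction (rather than adapted to the viewport as updateBucketSizes suggests), and dual indexing of inner-scroller blocks is left out. Candidates come from the query's cell and its eight neighbors, so the sketch assumes blocks are no larger than one cell; a real deduplication pass would still run an exact overlap test on each candidate.

```dart
/// Hypothetical, simplified grid index (not the package's SpatialBlockIndex).
class GridIndexSketch {
  GridIndexSketch({required this.cellWidth, required this.cellHeight});

  final double cellWidth;
  final double cellHeight;

  /// Cell key -> ids of blocks whose center falls in that cell.
  final Map<String, Set<String>> _cells = {};

  /// The namespace prefix ('' for normal, 'vr:', 'ic:') keeps spaces apart.
  String _key(String namespace, int col, int row) => '$namespace$col,$row';

  (int, int) _cellOf(double cx, double cy) =>
      ((cx / cellWidth).floor(), (cy / cellHeight).floor());

  void add(String id, String namespace, double cx, double cy) {
    final (col, row) = _cellOf(cx, cy);
    _cells.putIfAbsent(_key(namespace, col, row), () => <String>{}).add(id);
  }

  void remove(String id, String namespace, double cx, double cy) {
    final (col, row) = _cellOf(cx, cy);
    _cells[_key(namespace, col, row)]?.remove(id);
  }

  /// Ids in the query's cell and its 8 neighbors, same namespace only.
  Set<String> candidates(String namespace, double cx, double cy) {
    final (col, row) = _cellOf(cx, cy);
    final result = <String>{};
    for (var dc = -1; dc <= 1; dc++) {
      for (var dr = -1; dr <= 1; dr++) {
        result.addAll(_cells[_key(namespace, col + dc, row + dr)] ?? const <String>{});
      }
    }
    return result;
  }
}
```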
HierarchyWeightX #
Extension on TrackedBlock computing hierarchy weight from coordinate-space
flags. Higher weight means more constrained coordinate space:
| Tier | Weight | Meaning |
|---|---|---|
| Viewport-relative | 40 | Fixed/sticky — no scroll drift |
| Nested IC+carousel | 30 | Compound coordinate space |
| IC or carousel | 20 | Single-axis constraint |
| Normal | 10 | Unrestricted page scroll |
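The tier table maps onto a chain of flag checks. The sketch below is an assumption about how the tiers could be derived, written against a hypothetical BlockFlags class rather than TrackedBlock. It treats isInnerScrollerChild as the "IC" flag and isHorizontalScrollChild as the "carousel" flag, a mapping the README does not state explicitly.

```dart
/// Hypothetical stand-in for the coordinate-space flags on TrackedBlock.
class BlockFlags {
  const BlockFlags({
    this.isViewportRelative = false,
    this.isInnerScrollerChild = false,
    this.isHorizontalScrollChild = false,
  });

  final bool isViewportRelative;
  final bool isInnerScrollerChild;
  final bool isHorizontalScrollChild;
}

/// The tier table as code. Assumes IC = isInnerScrollerChild and
/// carousel = isHorizontalScrollChild.
extension HierarchyWeightSketch on BlockFlags {
  int get hierarchyWeight {
    if (isViewportRelative) return 40; // fixed/sticky: no scroll drift
    if (isInnerScrollerChild && isHorizontalScrollChild) return 30; // nested IC + carousel
    if (isInnerScrollerChild || isHorizontalScrollChild) return 20; // single-axis constraint
    return 10; // unrestricted page scroll
  }
}
```

Checking the most constrained tier first means a viewport-relative block inside a scroller still gets weight 40.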
Extension Types #
Zero-cost compile-time wrappers for coordinate safety:
- AbsoluteRect: wraps Rect for world-space coordinates. Spatial operations (overlaps, expandToInclude) accept only other AbsoluteRect values, preventing accidental coordinate-space mixing.
- ContainerId: wraps String for stable container identity hashes.
- SpaceKey: wraps String with typed constructors (normal, ic, unknown) for drift-observation coordinate spaces.
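Dart 3.3 extension types are what make wrappers like these zero-cost: the wrapper type exists only at compile time and is erased to the underlying value at runtime. The sketch below uses hypothetical names (Box, WorldBox, RegionKey) and a small Box class instead of dart:ui's Rect so it runs outside Flutter; it illustrates the pattern and is not the package's code.

```dart
/// Minimal rectangle so the sketch runs without Flutter's dart:ui.
class Box {
  const Box(this.left, this.top, this.right, this.bottom);
  final double left, top, right, bottom;
}

/// Hypothetical world-space wrapper in the spirit of AbsoluteRect:
/// overlaps() only accepts another WorldBox, never a raw Box.
extension type const WorldBox(Box _box) {
  bool overlaps(WorldBox other) =>
      _box.left < other._box.right &&
      other._box.left < _box.right &&
      _box.top < other._box.bottom &&
      other._box.top < _box.bottom;
}

/// Hypothetical typed key in the spirit of SpaceKey: callers build keys
/// through named constructors instead of hand-writing prefixed strings.
extension type const RegionKey._(String value) {
  const RegionKey.normal() : value = 'normal';
  RegionKey.ic(String containerId) : value = 'ic:$containerId';
}
```

Passing a raw Box to WorldBox.overlaps is a compile-time error, which is the kind of coordinate-space mixing AbsoluteRect is described as preventing.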
Six-Dimension Block Identity #
A block's identity is a six-dimensional signature:
| Dimension | What It Answers | Package Support |
|---|---|---|
| Textual | What does this text say? | originalText on TrackedBlock |
| Spatial | Where is it in the page? | absoluteRect, confidence scores |
| Relative | Which coordinate space? | SpaceKey, ContainerId |
| Semantic | What kind of element? | hierarchyWeight (extension) |
| Temporal | How much evidence? | observationCount (ObservableBlock) |
| Contextual | What context was it in? | ContextualInvalidationCheck (callback) |
API Reference #
Interfaces #
| Type | Purpose |
|---|---|
| TrackedBlock<T> | Core block contract (13 getters including payload) |
| ObservableBlock | Observation history (counts, votes, provisional state) |
| ClassificationInput | Platform-agnostic viewport geometry |
| CarouselInput | Carousel-specific geometry |
| SubmapMembership | Strategy for coordinate-space partitioning |
| ContextualInvalidationCheck | Callback for context-change detection |
Components #
| Type | Purpose |
|---|---|
| DriftTracker | Regional drift correction with submap isolation |
| SpatialBlockIndex | Grid-cell spatial index for overlap queries |
| CssSubmapMembership | Default WebView submap partitioning |
| RobustStats | Robust statistics (median, MAD, IQR) |
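As a hedged illustration of why robust statistics suit jittery OCR positions, the sketch below computes the median absolute deviation (MAD) and uses it to drop outliers. median, mad, and rejectOutliers are hypothetical helper names, not RobustStats' API, and the threshold k = 3 is an arbitrary choice rather than a package default.

```dart
/// Median of a non-empty list; sorts a copy so the input is untouched.
double median(List<double> xs) {
  final s = [...xs]..sort();
  final m = s.length ~/ 2;
  return s.length.isOdd ? s[m] : (s[m - 1] + s[m]) / 2;
}

/// Median absolute deviation: the median of |x - median(xs)|.
/// Unlike the standard deviation, one wild value barely changes it.
double mad(List<double> xs) {
  final med = median(xs);
  return median([for (final x in xs) (x - med).abs()]);
}

/// Keeps values within k MADs of the median.
List<double> rejectOutliers(List<double> xs, {double k = 3}) {
  final med = median(xs);
  final spread = mad(xs);
  return [for (final x in xs) if ((x - med).abs() <= k * spread) x];
}
```

For positions [1, 2, 3, 4, 100], the median is 3 and the MAD is 1, so the 100 is rejected while the mean (22) would have been dragged far off.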
Value Types #
| Type | Purpose |
|---|---|
| ScrollContext | Scroll offsets and carousel identity at capture time |
| StickyFallback | Fallback coordinate context for demoted sticky elements |
Extension Types #
| Type | Wraps | Purpose |
|---|---|---|
| AbsoluteRect | Rect | World-space coordinate safety |
| ContainerId | String | Stable container identity |
| SpaceKey | String | Typed drift observation keys |
Platform Support #
The package depends on dart:ui (for Rect, Offset) and therefore
requires the Flutter SDK. It has no platform-specific code — it works on
Android, iOS, macOS, Windows, Linux, and Web.
The SubmapMembership and ClassificationInput interfaces allow the engine
to support different input sources:
| Platform | SubmapMembership | ClassificationInput |
|---|---|---|
| WebView | CssSubmapMembership (default) | CaptureSnapshotAdapter (app-side) |
| | Custom (page-based submaps) | Custom (page geometry) |
| Camera | Custom (frame regions) | Custom (camera frame) |
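A custom SubmapMembership decides which coordinate-space region a block belongs to. Since the interface's members are not listed here, the sketch below uses a hypothetical RegionStrategy interface with a page-based implementation in the spirit of the "page-based submaps" row; the method name, parameters, and key format are all assumptions.

```dart
/// Hypothetical strategy interface, loosely modeled on the role described
/// for SubmapMembership; not the package's actual signature.
abstract interface class RegionStrategy {
  /// Returns the key of the coordinate-space region a block belongs to.
  String regionFor({required double top, String? containerId});
}

/// Page-based partitioning: each fixed-height page is its own region.
class PageRegionStrategy implements RegionStrategy {
  const PageRegionStrategy(this.pageHeight);

  final double pageHeight;

  @override
  String regionFor({required double top, String? containerId}) {
    // Blocks inside an inner scroller keep their own region regardless of page.
    if (containerId != null) return 'ic:$containerId';
    return 'page:${(top / pageHeight).floor()}';
  }
}
```

Swapping the strategy, for example to one that splits a camera frame into fixed regions, changes how drift is grouped without touching the code that consumes the region keys.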
Roadmap #
- ✅ Extract DriftTracker and SpatialBlockIndex (#516)
- ✅ Specify TrackedBlock public contract (#518)
- ✅ ClassificationInput abstraction (#519)
- ✅ SubmapMembership interface (#520)
- ✅ Contextual invalidation callback (#521)
- ✅ Package README and API docs (#522)
- ✅ Extract OverlayCacheService (SAR merge, dedup pipeline)
- ✅ Extract BlockClassifierService
- ✅ Graduate to standalone repository