Searchlight

Searchlight is an independent pure Dart reimplementation of Orama's in-memory search and indexing model for Dart and Flutter apps. It gives you schema-based indexing, scoring, filtering, facets, persistence, and tokenizer control without requiring a server.

Searchlight is especially useful when your app already has content available locally or can download and cache it, and you want fast in-app search over that data.

Status

searchlight is the core package: indexing, querying, persistence, tokenizer configuration, and a limited create-time extension surface.

Current extension support includes:

ordered SearchlightPlugin registration
lifecycle hooks via top-level SearchlightPlugin fields
component replacement via SearchlightComponents
restore-time validation that a persisted snapshot is loaded with a compatible plugin/component graph

It does not currently include:

PDF parsing or rendering
Flutter UI widgets

The extension API is intentionally narrow today. Searchlight does not yet expose async plugin initialization or every declared hook dispatch path, but it supports more than index and sorter replacement.

Companion Packages

Current companion packages:

searchlight_highlight for text highlighting, excerpts, HTML <mark> output, and Position match ranges
searchlight_parsedoc for HTML and Markdown extraction plus population helpers

PDF extraction, viewer integration, and other source-format-specific ingestion still belong in your app or in future companion packages above the core library.

Platform Support

searchlight is a pure Dart package. It works anywhere Dart runs, including Flutter mobile, desktop, and web. The core package does not include platform-channel code or platform-specific subpackages.

Start Here

Read doc/app-integration.md for the recommended app architecture.
Read doc/validation-workflow.md for the canonical repository validation sequence.
Open example/README.md for the Flutter validation app.

What It Provides

Full-text indexing for structured documents
BM25, QPS, and PT15 ranking algorithms
Typed filters, sorting, grouping, and facets
JSON and CBOR persistence for cached indexes
Standalone tokenizer utilities with language support, stemming, and optional stop words
A create-time extension API for lifecycle hooks and component replacement

Install companion packages when you need them:

searchlight_highlight for snippets, marked ranges, and HTML <mark> output
searchlight_parsedoc for Markdown and HTML extraction before indexing

Searchlight.create() also exposes tokenizer-related configuration for the built-in database tokenizer, including stemming, stemmer, stopWords, useDefaultStopWords, allowDuplicates, tokenizeSkipProperties, and stemmerSkipProperties.

By default, stemming is off. Built-in tokenizer settings round-trip through persistence. Injected Tokenizer instances and custom stemmer callbacks do not serialize.

Installation

dart pub add searchlight

# or from a Flutter app
flutter pub add searchlight

Quick Start

import 'package:searchlight/searchlight.dart';

Future<void> main() async {
  final db = Searchlight.create(
    schema: Schema({
      'url': const TypedField(SchemaType.string),
      'title': const TypedField(SchemaType.string),
      'content': const TypedField(SchemaType.string),
      'type': const TypedField(SchemaType.enumType),
    }),
  );

  db.insert({
    'id': 'ember-lance',
    'url': '/spells/ember-lance',
    'title': 'Ember Lance',
    'content': 'A focused lance of heat that ignites dry brush.',
    'type': 'spell',
  });

  db.insert({
    'id': 'iron-boar',
    'url': '/creatures/iron-boar',
    'title': 'Iron Boar',
    'content': 'A plated beast known for explosive charges.',
    'type': 'monster',
  });

  final results = db.search(
    term: 'ember',
    properties: const ['title', 'content'],
  );

  for (final hit in results.hits) {
    print('${hit.score.toStringAsFixed(2)} ${hit.document.getString('title')}');
  }

  await db.dispose();
}

Core Workflow

Searchlight does not extract your source data for you. Your app or tooling is responsible for turning content into records, and Searchlight handles the indexing and querying.

The common integration flow is:

Read or receive source content.
Convert it into structured records.
Insert those records into a Searchlight database.
Persist the built index if you want fast startup later.
Restore the persisted index and query it at runtime.

This applies equally to:

App-bundled JSON or Markdown content
Remote content downloaded and cached on device
User-imported files such as PDFs after text extraction

If your app needs reusable extraction, keep that conversion layer in your app or in a companion package. For small integrations, simple record-conversion functions are often enough.
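
As a minimal sketch of such a record-conversion function, the mapping below turns an app-side model into a record with the same field names as the Quick Start schema. The Article class, its fields, and the /articles/ URL prefix are hypothetical and only illustrate the shape; they are not part of Searchlight.

```dart
/// Hypothetical app-side content model; not part of Searchlight.
class Article {
  const Article({
    required this.slug,
    required this.title,
    required this.body,
    required this.kind,
  });

  final String slug;
  final String title;
  final String body;
  final String kind;
}

/// Converts an [Article] into a schema-shaped record that uses the
/// Quick Start field names (url, title, content, type).
Map<String, Object?> toRecord(Article article) {
  return {
    'id': article.slug,
    'url': '/articles/${article.slug}', // assumed URL layout
    'title': article.title.trim(),
    'content': article.body.trim(),
    'type': article.kind,
  };
}

void main() {
  const article = Article(
    slug: 'ember-lance',
    title: ' Ember Lance ',
    body: 'A focused lance of heat that ignites dry brush.',
    kind: 'spell',
  );

  final record = toRecord(article);
  print(record['url']); // /articles/ember-lance
  print(record['title']); // Ember Lance
}
```

In an app, each record produced this way would then be passed to insert().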

What Searchlight Can Index

Searchlight indexes schema-shaped records, not raw files.

That means the core package directly supports:

Map<String, Object?> records inserted with insert()
persisted snapshots restored with restore() or fromJson()
any source format that your app converts into those records first

The core package does not currently include built-in parsers for:

Markdown files
HTML files
PDF files
CSV, XML, or other file formats

If you insert raw HTML or Markdown into a string field yourself, Searchlight will tokenize that raw text. It will not strip tags, ignore attributes, or understand Markdown structure automatically. In practice, that means markup tokens and link-destination fragments can become searchable unless you clean or extract the text first.
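
To illustrate what cleaning the text first can mean, here is a deliberately naive pure Dart sketch that removes HTML tags and Markdown link destinations before a string is inserted. The function names stripHtml and stripMarkdown are hypothetical, and the regular expressions do not decode entities, skip script or style bodies, or cover the full Markdown syntax. For real extraction, the searchlight_parsedoc companion package is likely the better fit.

```dart
/// Replaces HTML tags with spaces and collapses whitespace.
/// Naive: does not decode entities or drop <script>/<style> bodies.
String stripHtml(String html) {
  final withoutTags = html.replaceAll(RegExp(r'<[^>]*>'), ' ');
  return withoutTags.replaceAll(RegExp(r'\s+'), ' ').trim();
}

/// Keeps the label of Markdown links and images, drops the destination,
/// and removes heading, emphasis, and inline-code markers.
String stripMarkdown(String markdown) {
  final withoutLinks = markdown.replaceAllMapped(
    RegExp(r'!?\[([^\]]*)\]\([^)]*\)'),
    (match) => match.group(1) ?? '',
  );
  final withoutMarkers = withoutLinks.replaceAll(RegExp(r'[#*`]'), '');
  return withoutMarkers.replaceAll(RegExp(r'\s+'), ' ').trim();
}

void main() {
  print(stripHtml('<p>Ember <b>Lance</b></p>'));
  // Ember Lance

  print(stripMarkdown(
    '# Spells\nSee [Ember Lance](/spells/ember-lance) for **details**.',
  ));
  // Spells See Ember Lance for details.
}
```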

In this repository specifically:

the core package accepts records and snapshots only
the validation example's live folder mode currently reads .md files only
the validation assets are JSON corpus and JSON snapshot files

Choose the Right Runtime Pattern

There are two common integration modes:

Build in memory from records
- best for tests, small corpora, and validation
- create Searchlight, insert records, search immediately
Restore from a persisted snapshot
- best for production apps with a non-trivial corpus
- build once, persist, then restore on future launches

The package supports both paths directly.
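
One way to combine the two modes is a small helper that restores when a snapshot file exists and builds otherwise. The sketch below is generic so that it runs without Searchlight. The name restoreOrBuild and its parameters are hypothetical, and it assumes a dart:io platform with a file-based snapshot.

```dart
import 'dart:io';

/// Returns a restored index when a snapshot file exists at [snapshotPath];
/// otherwise builds a fresh one through [buildAndPersist].
///
/// Generic over [T] so the sketch runs standalone; in an app, T would be
/// the Searchlight database type.
Future<T> restoreOrBuild<T>({
  required String snapshotPath,
  required Future<T> Function() restore,
  required Future<T> Function() buildAndPersist,
}) async {
  if (File(snapshotPath).existsSync()) {
    return restore();
  }
  return buildAndPersist();
}

Future<void> main() async {
  // A fresh temp directory guarantees that no snapshot exists yet.
  final dir = Directory.systemTemp.createTempSync('searchlight_demo_');
  final path = '${dir.path}/search-index.cbor';

  final first = await restoreOrBuild<String>(
    snapshotPath: path,
    restore: () async => 'restored',
    buildAndPersist: () async {
      // Stand-in for building the index and persisting it.
      File(path).writeAsStringSync('snapshot placeholder');
      return 'built';
    },
  );
  print(first); // built

  final second = await restoreOrBuild<String>(
    snapshotPath: path,
    restore: () async => 'restored',
    buildAndPersist: () async => 'built',
  );
  print(second); // restored

  dir.deleteSync(recursive: true);
}
```

In an app, restore would wrap Searchlight.restore(storage: storage), and buildAndPersist would create the database, insert records, and call persist(storage: storage).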

The repository validation workflow exercises both:

public fixture corpus -> build in memory -> search
generated local corpus -> build in memory -> search
generated local snapshot -> restore persisted index -> search

For the exact command sequence, see doc/validation-workflow.md.

Document writes are available through:

insert() / insertMultiple()
update() / updateMultiple()
upsert() / upsertMultiple()
patch()
remove() / removeMultiple()

Extensions

Searchlight exposes a Dart-native create-time extension surface:

SearchlightPlugin is the registration unit
lifecycle hooks register through top-level SearchlightPlugin fields such as beforeInsert, afterSearch, and afterCreate
SearchlightComponents can replace the active tokenizer, index, sorter, documentsStore, or pinning, and can override validateSchema, getDocumentIndexId, getDocumentProperties, and formatElapsedTime

This surface is enough for real component replacement. The test suite includes plugin-driven index swaps that force PT15 and QPS behavior through the plugin path rather than through the top-level algorithm flag alone.

Current limits to know before depending on extensions heavily:

registration unit: SearchlightPlugin
supported replacement surface: tokenizer, index, sorter, documentsStore, pinning, validateSchema, getDocumentIndexId, getDocumentProperties, and formatElapsedTime
hooks are sync-only in core operations; async hooks fail fast
restore contract: extension-backed snapshots must be restored with matching plugin order and compatible component IDs
conflicting component registrations now fail fast instead of using last-writer-wins resolution
hook coverage is intentionally limited to the documented SearchlightPlugin fields; there is no broader async initialization surface

Deeper parity notes live in docs/research/searchlight-extension-status.md.

Defining a Schema

Every database is created from a schema. String fields are searchable by full text. Other field types support filtering, grouping, sorting, or geosearch.

| SchemaType | Dart type | Primary use |
| --- | --- | --- |
| `string` | `String` | Full-text search |
| `number` | `num` | Range filters and sorting |
| `boolean` | `bool` | Boolean filters |
| `enumType` | `String` or `num` | Facets and exact-match filters |
| `geopoint` | `GeoPoint` | Geo radius and polygon filters |
| `stringArray` | `List<String>` | Full-text search over multiple values |
| `numberArray` | `List<num>` | Numeric filtering |
| `booleanArray` | `List<bool>` | Boolean filtering |
| `enumArray` | `List<String>` or `List<num>` | Facets and filters |
| `NestedField` | nested object | Dot-path access such as `meta.rating` |

Searching

Searchlight supports full-text search with optional filters and result shaping.

final result = db.search(
  term: 'ember lance',
  properties: const ['title', 'content'],
  tolerance: 1,
  limit: 10,
  offset: 0,
  where: {
    'type': eq('spell'),
  },
  sortBy: const SortBy(field: 'title', order: SortOrder.asc),
);

Useful search options:

properties: limit search to specific string fields
where: apply typed filters
tolerance: allow fuzzy term matches
exact: require whole-word matches after scoring
limit and offset: paginate
sortBy: sort on sortable fields
facets: collect counts for enum and numeric fields
groupBy: group matching hits by one or more fields
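
For pagination, limit and offset are usually derived from a page number. The helper below is a small sketch of that arithmetic. The name pageWindow is hypothetical, it assumes offset counts the hits to skip, and it uses Dart 3 records.

```dart
/// Computes `limit` and `offset` for a 1-based [page] number.
/// Assumes `offset` is the number of hits to skip.
({int limit, int offset}) pageWindow({
  required int page,
  required int pageSize,
}) {
  if (page < 1) {
    throw ArgumentError.value(page, 'page', 'must be >= 1');
  }
  if (pageSize < 1) {
    throw ArgumentError.value(pageSize, 'pageSize', 'must be >= 1');
  }
  return (limit: pageSize, offset: (page - 1) * pageSize);
}

void main() {
  final window = pageWindow(page: 3, pageSize: 10);
  print('limit=${window.limit} offset=${window.offset}');
  // limit=10 offset=20
}
```

The resulting values can be passed straight through as the limit and offset arguments of search().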

Choosing a Search Algorithm

Searchlight supports three ranking algorithms:

SearchAlgorithm.bm25: default general-purpose relevance ranking
SearchAlgorithm.qps: proximity-aware scoring optimized for faster search and smaller indexes
SearchAlgorithm.pt15: position-aware scoring that can work well when term order and early-token placement matter

Choose the algorithm when creating the database:

final db = Searchlight.create(
  schema: schema,
  algorithm: SearchAlgorithm.qps,
);

Or rebuild an existing database with a different algorithm:

final qpsDb = db.reindex(algorithm: SearchAlgorithm.qps);

PT15 has important query limitations:

tolerance is not supported
exact is not supported
string-field where filters are not supported

If you need the broadest query feature support, stay with bm25.

Filters, facets, and grouping can be combined in a single query:

final result = db.search(
  term: 'boar',
  where: {
    'type': eq('monster'),
  },
  facets: {
    'type': const FacetConfig(),
  },
  groupBy: const GroupBy(field: 'type', limit: 5),
);

Supported filters include eq, gt, gte, lt, lte, between, inFilter, ninFilter, filterContainsAll, filterContainsAny, geoRadius, geoPolygon, and, or, and not.

Persistence

If you have a non-trivial corpus, build the index once and persist it. Restoring a saved index is usually the right runtime path for production apps.

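Future<void> example(Searchlight db) async {
  // FileStorage targets dart:io platforms; CBOR is the default format.
  final storage = FileStorage(path: 'search-index.cbor');

  // Write the built index to disk.
  await db.persist(storage: storage);

  // Later (for example on the next launch), restore and query it.
  final restored = await Searchlight.restore(storage: storage);
  final result = restored.search(term: 'ember');
  print('${result.hits.length} hits');
  await restored.dispose();
}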

FileStorage is intended for dart:io platforms. If you want persisted JSON instead of CBOR, pass format: PersistenceFormat.json to both persist() and restore(). On web or in a custom app storage layer, implement your own SearchlightStorage or use toJson() and fromJson() directly.

Persistence supports reconstructible Searchlight.create() tokenizer settings such as stemming toggles, stop words, duplicate handling, and skip-property sets. Databases created with an injected Tokenizer or custom stemmer callback must be rebuilt instead of serialized.

If a snapshot was created with plugins or replacement components, restore it with the same plugin order and compatible component IDs. Searchlight stores extension compatibility metadata in the snapshot and rejects mismatched restore graphs instead of silently loading into the wrong runtime shape.

You can also work directly with JSON-compatible maps:

Future<void> jsonRoundTrip(Searchlight db) async {
  // toJson() returns a JSON-compatible map of the index.
  final json = db.toJson();
  final restored = Searchlight.fromJson(json);
  await restored.dispose();
}
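
When the snapshot has to live in a string-based store (for example a web key-value store), the map from toJson() can be encoded with dart:convert. The sketch below shows only that encoding step, using a stand-in map. The helper names encodeSnapshot and decodeSnapshot are hypothetical, and the assumption is that fromJson() accepts the decoded map unchanged.

```dart
import 'dart:convert';

/// Encodes a JSON-compatible snapshot map as a string.
String encodeSnapshot(Map<String, Object?> snapshot) => jsonEncode(snapshot);

/// Decodes a snapshot string back into a JSON-compatible map.
Map<String, Object?> decodeSnapshot(String text) {
  final decoded = jsonDecode(text);
  if (decoded is! Map<String, dynamic>) {
    throw const FormatException('Snapshot JSON must be an object');
  }
  return decoded;
}

void main() {
  // Stand-in for db.toJson(); the real snapshot layout is defined by
  // Searchlight and is not reproduced here.
  final snapshot = <String, Object?>{'example': true, 'count': 2};

  final text = encodeSnapshot(snapshot);
  final restoredMap = decodeSnapshot(text);
  print(restoredMap['count']); // 2
}
```

In an app, encodeSnapshot(db.toJson()) would produce the stored string, and the map returned by decodeSnapshot would be handed to Searchlight.fromJson().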

Highlighting and Excerpts

Use the companion package searchlight_highlight after search to build excerpts or render marked matches. It does not change how documents are indexed.

import 'package:searchlight_highlight/searchlight_highlight.dart';

String buildExcerpt(SearchHit hit) {
  final highlighter = Highlight();
  final text = hit.document.getString('content');
  final highlight = highlighter.highlight(text, 'ember');
  return highlight.trim(160);
}

This is a good fit for:

Search result snippets
Inline <mark> or TextSpan rendering
Page-level excerpt generation in Flutter UI

App Integration Pattern

For most apps, you will want a small indexing layer that sits above Searchlight.

Example pattern:

Define the record shape your app will search.
Convert your content into that shape.
Build or restore the index in a repository/service.
Query from your UI layer.
Use searchlight_highlight to render excerpts.

The package includes a practical reference implementation:

example/ shows a Flutter validation app for fixture, snapshot, and desktop-folder indexing flows
example/tool/build_validation_assets.dart shows a simple extraction-to-index flow used by the example

For a fuller walkthrough, see doc/app-integration.md.

Validation Example

The package includes a validation workflow with:

Public-safe fixture data under test/fixtures/
A local-only corpus flow under .local/, owned by the example, for private validation
A Flutter example app that can load either raw records or a persisted snapshot

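See doc/validation-workflow.md for the command sequence and example/README.md for the Flutter validation app.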

License

Apache License 2.0. See LICENSE.

Searchlight is an independent pure Dart reimplementation of Orama. It is not affiliated with or endorsed by the Orama project. See NOTICE for attribution.