searchlight_parsedoc 0.2.2
HTML and Markdown population helpers for Searchlight with VM file support.
Searchlight Parsedoc #
Searchlight Parsedoc is a pure Dart reimplementation of the helper package shape of Orama's Parsedoc, built for Searchlight. Searchlight is itself an independent Dart reimplementation of an in-memory search and indexing model.
Companion core package:
- searchlight provides the core indexing, querying, and persistence runtime this package builds on.
It turns HTML and Markdown into flat Searchlight-ready records through this helper surface:
- defaultHtmlSchema
- parseFile(...)
- populate(...)
- populateFromGlob(...)
Status #
searchlight_parsedoc currently exposes this documented helper contract:
- defaultHtmlSchema
- parseFile(data, fileType, options: ...)
- populate(db, data, fileType, options: ...)
- populateFromGlob(db, pattern, options: ...)
- MergeStrategy
- NodeContent
- TransformFn
- PopulateFnContext
Important package-shape note:
- the package is helper-driven, not a create-time plugin object
- the public package surface stays intentionally narrow
Platform Support #
searchlight_parsedoc is a pure Dart package, but the current top-level import
also exports dart:io helpers such as populateFromGlob(...),
parseMarkdownFile(...), parseHtmlFile(...), and parseLocalFile(...).
That means:
- import package:searchlight_parsedoc/searchlight_parsedoc.dart only in Dart or Flutter VM targets for now
- parseFile(...) and populate(...) accept String or List<int> input
- separate web-safe exports are not available yet
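Because parseFile(...) and populate(...) accept either String or List<int>, code that receives content from mixed sources may want to normalize it first, for example for logging or previews. The following is a minimal sketch of that idea. toDocumentText is a hypothetical helper that is not part of this package, and it assumes byte input is UTF-8 encoded.

```dart
import 'dart:convert';

/// Hypothetical helper (not part of searchlight_parsedoc): normalizes the two
/// documented input kinds, String or List<int>, into a String.
/// Assumption: byte input is UTF-8 encoded.
String toDocumentText(Object data) {
  if (data is String) return data;
  if (data is List<int>) return utf8.decode(data);
  throw ArgumentError.value(data, 'data', 'Expected String or List<int>');
}

void main() {
  final fromString = toDocumentText('# Ember Lance');
  final fromBytes = toDocumentText(utf8.encode('# Ember Lance'));
  // Both input kinds yield the same text.
  print(fromString == fromBytes);
}
```

The helper only touches dart:convert, so it does not add a dart:io dependency of its own.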
Installation #
dart pub add searchlight_parsedoc
# or from a Flutter app
flutter pub add searchlight_parsedoc
Quick Start #
Create a Searchlight database that covers the minimal searchable Parsedoc fields:
import 'package:searchlight/searchlight.dart';
import 'package:searchlight_parsedoc/searchlight_parsedoc.dart';
final db = Searchlight.create(
  schema: Schema({
    'type': const TypedField(SchemaType.string),
    'content': const TypedField(SchemaType.string),
    'path': const TypedField(SchemaType.string),
  }),
);
Populate it from Markdown or HTML content:
import 'dart:convert';
import 'package:searchlight/searchlight.dart';
import 'package:searchlight_parsedoc/searchlight_parsedoc.dart';
Future<void> main() async {
  final db = Searchlight.create(
    schema: Schema({
      'type': const TypedField(SchemaType.string),
      'content': const TypedField(SchemaType.string),
      'path': const TypedField(SchemaType.string),
    }),
  );
  final ids = await populate(
    db,
    utf8.encode('# Ember Lance\n\nA focused lance of heat.'),
    'md',
  );
  final result = db.search(
    term: 'ember',
    properties: const ['content'],
  );
  print(ids);
  print(result.count);
  await db.dispose();
}
Inspect extracted records without inserting them:
import 'dart:convert';
import 'package:searchlight_parsedoc/searchlight_parsedoc.dart';
Future<void> main() async {
  final records = await parseFile(
    utf8.encode('<div><p>First</p><p>Second</p></div>'),
    'html',
  );
  print(records);
}
The extracted record shape follows the current helper contract:
{
  'type': 'p',
  'content': 'First Second',
  'path': 'root[0].div[0]',
  'properties': <String, Object?>{},
}
properties is optional record metadata. It is not part of
defaultHtmlSchema, and Searchlight does not need it declared in the schema
unless you choose to model it yourself outside the default helper path.
Merge Strategies #
Parsedoc exposes these merge modes:
- MergeStrategy.merge: merge consecutive compatible sibling text nodes
- MergeStrategy.split: emit one record per text node
- MergeStrategy.both: emit split records plus a merged companion record
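To make the three modes concrete, here is a self-contained sketch that models them on a plain list of sibling text nodes. It does not use the package's own types: SketchStrategy, TextNode, mergeRuns, and applyStrategy are illustrative stand-ins, and "compatible" is assumed to mean "same tag", which may differ from the rule the package applies.

```dart
/// Stand-in types for illustration only; the package's own MergeStrategy and
/// record types may differ.
enum SketchStrategy { merge, split, both }

class TextNode {
  const TextNode(this.tag, this.text);
  final String tag;
  final String text;
}

/// Merges runs of consecutive nodes into one node per run.
/// Assumption: "compatible" means "same tag".
List<TextNode> mergeRuns(List<TextNode> nodes) {
  final merged = <TextNode>[];
  for (final node in nodes) {
    if (merged.isNotEmpty && merged.last.tag == node.tag) {
      final previous = merged.removeLast();
      merged.add(TextNode(node.tag, '${previous.text} ${node.text}'));
    } else {
      merged.add(node);
    }
  }
  return merged;
}

List<TextNode> applyStrategy(List<TextNode> nodes, SketchStrategy strategy) {
  // split: one record per text node.
  if (strategy == SketchStrategy.split) return List.of(nodes);
  final merged = mergeRuns(nodes);
  // merge: only the merged records.
  if (strategy == SketchStrategy.merge) return merged;
  // both: the split records followed by the merged companions.
  return [...nodes, ...merged];
}

void main() {
  const siblings = [TextNode('p', 'First'), TextNode('p', 'Second')];
  for (final strategy in SketchStrategy.values) {
    final contents =
        applyStrategy(siblings, strategy).map((n) => n.text).toList();
    print('${strategy.name}: $contents');
  }
}
```

In this model, the two `<p>` siblings from the Quick Start example collapse into a single "First Second" entry under merge, stay as two entries under split, and produce three entries under both.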
Transform Contract #
The transform contract looks like this:
final options = PopulateOptions(
  transformFn: (node, context) {
    return node.copyWith(
      additionalProperties: {'section': context['section']},
    );
  },
  context: const {'section': 'intro'},
);
NodeContent exposes:
- tag
- raw
- content
- properties
- additionalProperties
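A transform can be written and unit-tested as a plain function before it is wired into PopulateOptions. The sketch below shows that pattern with a local stand-in class. SketchNodeContent mirrors the field names listed above, but its field types, its copyWith signature, and the tagHeadings function are assumptions made for illustration rather than the package's actual definitions.

```dart
/// Stand-in for illustration; field names follow the list above, but the
/// types and the copyWith signature are assumptions.
class SketchNodeContent {
  const SketchNodeContent({
    required this.tag,
    required this.raw,
    required this.content,
    this.properties = const {},
    this.additionalProperties = const {},
  });

  final String tag;
  final String raw;
  final String content;
  final Map<String, Object?> properties;
  final Map<String, Object?> additionalProperties;

  SketchNodeContent copyWith({
    String? content,
    Map<String, Object?>? additionalProperties,
  }) {
    return SketchNodeContent(
      tag: tag,
      raw: raw,
      content: content ?? this.content,
      properties: properties,
      additionalProperties: additionalProperties ?? this.additionalProperties,
    );
  }
}

final _headingTag = RegExp(r'^h[1-6]$');

/// Hypothetical transform: trims heading text and attaches the caller's
/// section plus an isHeading flag. Non-heading nodes pass through unchanged.
SketchNodeContent tagHeadings(
  SketchNodeContent node,
  Map<String, Object?> context,
) {
  if (!_headingTag.hasMatch(node.tag)) return node;
  return node.copyWith(
    content: node.content.trim(),
    additionalProperties: {
      ...node.additionalProperties,
      'section': context['section'],
      'isHeading': true,
    },
  );
}

void main() {
  const heading = SketchNodeContent(
    tag: 'h1',
    raw: '<h1> Ember Lance </h1>',
    content: ' Ember Lance ',
  );
  final result = tagHeadings(heading, const {'section': 'intro'});
  print(result.content);
  print(result.additionalProperties);
}
```

Spreading the existing additionalProperties before adding new keys keeps earlier metadata intact, which matters if several transforms are composed.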
Folder Population #
For VM targets, populateFromGlob(...) populates a database from the files matching a glob pattern:
import 'package:searchlight/searchlight.dart';
import 'package:searchlight_parsedoc/searchlight_parsedoc.dart';
Future<void> main() async {
  final db = Searchlight.create(
    schema: Schema({
      'type': const TypedField(SchemaType.string),
      'content': const TypedField(SchemaType.string),
      'path': const TypedField(SchemaType.string),
    }),
  );
  await populateFromGlob(db, 'content/*');
  await db.dispose();
}
Current note:
- the current helper supports .md and .html
- .markdown and .htm are not part of the current documented helper surface
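A broad pattern such as 'content/*' can match files with other extensions, and this README does not state how the helper treats them. One cautious option is to check candidate paths up front. The sketch below is a hypothetical pre-check that is not part of the package; treating extensions case-insensitively is an assumption.

```dart
/// The two extensions named in the note above.
const supportedExtensions = {'.md', '.html'};

/// Hypothetical pre-check (not part of searchlight_parsedoc).
/// Assumption: extensions are compared case-insensitively.
bool hasSupportedExtension(String path) {
  final lower = path.toLowerCase();
  return supportedExtensions.any(lower.endsWith);
}

void main() {
  const candidates = [
    'content/a.md',
    'content/b.html',
    'content/c.markdown',
    'content/d.htm',
    'content/e.txt',
  ];
  // Split the candidates into files the documented surface covers and
  // files that fall outside it.
  final accepted = candidates.where(hasSupportedExtension).toList();
  final skipped =
      candidates.where((p) => !hasSupportedExtension(p)).toList();
  print('accepted: $accepted');
  print('skipped: $skipped');
}
```

Narrowing the glob itself, for example to 'content/*.md', is another way to get the same effect without a separate check.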
Additive Dart APIs #
Beyond the core helper surface, this package also keeps additive Dart-oriented parser/model APIs:
- ParsedFormat
- ParsedBlock
- ParsedDocument
- parseMarkdownString(...)
- parseHtmlString(...)
- parseMarkdownFile(...)
- parseHtmlFile(...)
- parseLocalFile(...)
- SearchlightDocumentRecordMapper
- SearchlightBlockRecordMapper
These APIs are additive. They are useful for Dart apps, but they are not part of the narrow helper contract described above.
Combining With searchlight_highlight #
Use searchlight_parsedoc to extract Markdown and HTML into Searchlight-ready
records, then pair it with
searchlight_highlight when
you want post-search snippets, highlighted match ranges, or rendered excerpts.
Example App #
The repo includes a Flutter desktop validation app under
example/.
That app is intentionally wired through the public package surface:
- it depends on published searchlight from pub.dev
- it depends on local searchlight_parsedoc by path
- it loads a folder of live .md and .html files
- it uses populate(...) plus parseFile(...)
- it searches the populated Searchlight database
- it can be paired with searchlight_highlight after search for snippets or highlighted match ranges
- it lets you inspect extracted record paths alongside the source preview
Additional Information #
This package follows Searchlight's architecture split:
- searchlight core owns indexing and search
- companion packages own source-format extraction and ingestion helpers
License And Attribution #
Searchlight Parsedoc is an independent pure Dart reimplementation of Orama's Parsedoc package shape. It is not affiliated with or endorsed by the Orama project.