addSourceWithChunking method
Add a source document with automatic chunking and embedding.
The document is:
- Split into chunks based on file type (auto-detected from filePath)
- Each chunk is embedded (micro-batch streaming, kIngestionBatchSize at a time)
- Source and chunks are incrementally stored in DB
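If filePath is provided and no strategy is passed, the chunking strategy is auto-detected from the file extension:
- .md, .markdown → Markdown-aware chunking (preserves headers, code blocks)
- Other files → Default recursive chunking
An explicit strategy argument takes precedence over auto-detection. When neither filePath nor strategy is given, default recursive chunking is used.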
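The selection rule above can be sketched in isolation. This is a minimal illustration, not the library's code: the enum here is a local stand-in for ChunkingStrategy, and resolveStrategy is a hypothetical helper name. It assumes the extension check is a plain, case-sensitive endsWith, as in the implementation listing.

```dart
// Local stand-in for the library's ChunkingStrategy enum (assumption).
enum ChunkingStrategy { markdown, recursive }

/// Hypothetical helper: returns the explicit strategy when one is given,
/// otherwise picks one from the file extension (case-sensitive match).
ChunkingStrategy resolveStrategy({
  String? filePath,
  ChunkingStrategy? strategy,
}) {
  if (strategy != null) return strategy;
  final isMarkdown = filePath != null &&
      (filePath.endsWith('.md') || filePath.endsWith('.markdown'));
  return isMarkdown ? ChunkingStrategy.markdown : ChunkingStrategy.recursive;
}

void main() {
  print(resolveStrategy(filePath: 'notes/readme.md')); // ChunkingStrategy.markdown
  print(resolveStrategy(filePath: 'paper.txt')); // ChunkingStrategy.recursive
  // An explicit strategy wins over the extension.
  print(resolveStrategy(
    filePath: 'readme.md',
    strategy: ChunkingStrategy.recursive,
  )); // ChunkingStrategy.recursive
  print(resolveStrategy()); // ChunkingStrategy.recursive
}
```

Because the match is case-sensitive, a path such as README.MD would fall through to recursive chunking under this sketch; callers who need Markdown-aware chunking for such files can pass strategy explicitly.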
Memory safety: Uses a streaming micro-batch pipeline instead of loading
all embeddings into memory. Each batch of kIngestionBatchSize chunks is
embedded, saved to the DB, and then released, which keeps memory usage flat.
chunkDelay controls the yield duration between batches (default: 10ms).
This gives the GC a chance to run and helps avoid thermal throttling on mobile devices.
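The micro-batch pattern described above might look roughly like the following. This is a sketch under stated assumptions rather than the actual pipeline: ingestInBatches, EmbedBatch, and SaveBatch are hypothetical names, the batch size of 8 is a placeholder (the real value of kIngestionBatchSize is not shown here), and the embedder and store are passed in as plain callbacks.

```dart
import 'dart:math' as math;

// Hypothetical callback types standing in for the embedder and the DB writer.
typedef EmbedBatch = Future<List<List<double>>> Function(List<String> chunks);
typedef SaveBatch = Future<void> Function(
    List<String> chunks, List<List<double>> vectors);

/// Embeds and stores [chunks] one batch at a time so that only a single
/// batch of embeddings is referenced at any moment. Returns the chunk count.
Future<int> ingestInBatches(
  List<String> chunks, {
  required EmbedBatch embed,
  required SaveBatch save,
  int batchSize = 8, // placeholder; not the real kIngestionBatchSize
  Duration chunkDelay = const Duration(milliseconds: 10),
  void Function(int done, int total)? onProgress,
}) async {
  var done = 0;
  for (var start = 0; start < chunks.length; start += batchSize) {
    final end = math.min(start + batchSize, chunks.length);
    final batch = chunks.sublist(start, end);
    final vectors = await embed(batch); // embed this batch only
    await save(batch, vectors); // persist before moving on
    done = end;
    onProgress?.call(done, chunks.length);
    // Yield between batches; the batch's vectors go out of scope here.
    if (end < chunks.length) await Future<void>.delayed(chunkDelay);
  }
  return done;
}

Future<void> main() async {
  final chunks = List.generate(20, (i) => 'chunk $i');
  var savedBatches = 0;
  final total = await ingestInBatches(
    chunks,
    // Fake embedder: one-dimensional vector holding the chunk length.
    embed: (batch) async => [for (final c in batch) [c.length.toDouble()]],
    save: (batch, vectors) async {
      savedBatches++;
    },
    onProgress: (done, total) => print('$done/$total'),
  );
  print('stored $total chunks in $savedBatches batches');
}
```

Saving each batch before embedding the next is what keeps peak memory proportional to the batch size rather than to the document size; the delay between batches only trades a little throughput for responsiveness.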
Implementation
Future<SourceAddResult> addSourceWithChunking(
  String content, {
  String? metadata,
  String? name,
  String? filePath,
  ChunkingStrategy? strategy,
  Duration? chunkDelay,
  void Function(int done, int total)? onProgress,
}) async {
  final effectiveStrategy = strategy ??
      (filePath != null &&
              (filePath.endsWith('.md') || filePath.endsWith('.markdown'))
          ? ChunkingStrategy.markdown
          : ChunkingStrategy.recursive);

  // Hand the document body to Rust exactly once. `prepareSourceIngestion`
  // performs the source-row INSERT, claim, duplicate decision, chunker run,
  // and stages chunk content in a Rust-resident IngestSession, so neither
  // the chunk content nor the full document round-trips back to Dart.
  final prepared = await rust_ingest.prepareSourceIngestion(
    collectionId: collectionId,
    content: content,
    metadata: metadata,
    name: name ?? filePath,
    strategy: _toIngestStrategy(effectiveStrategy),
    maxChars: maxChunkChars,
    overlapChars: overlapChars,
  );

  return _runPreparedIngestion(
    prepared,
    chunkDelay: chunkDelay,
    onProgress: onProgress,
  );
}