addSourceWithChunking method - SourceRagService class - source_rag_service library

Add a source document with automatic chunking and embedding.

The document is processed in three steps:

It is split into chunks based on file type (auto-detected from filePath).
Each chunk is embedded in streaming micro-batches of kIngestionBatchSize chunks at a time.
The source and its chunks are stored in the database incrementally.

If strategy is not given, the chunking strategy is detected from the filePath extension (matched case-sensitively):

.md, .markdown → Markdown-aware chunking (preserves headers, code blocks)
Other files, or no filePath → Default recursive chunking
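
The extension check can be sketched as a standalone function. This is a minimal sketch that mirrors the conditional in the implementation. The enum is a local stand-in for the library's ChunkingStrategy (only the two values referenced in the source are reproduced), and pickStrategy is a hypothetical helper name, not part of the library.

```dart
// Local stand-in for the library's ChunkingStrategy; only the two values
// that appear in the implementation are reproduced here.
enum ChunkingStrategy { markdown, recursive }

/// Hypothetical helper mirroring the selection logic: an explicit strategy
/// wins; otherwise the file extension decides.
ChunkingStrategy pickStrategy({String? filePath, ChunkingStrategy? strategy}) {
  if (strategy != null) return strategy;
  final isMarkdown = filePath != null &&
      (filePath.endsWith('.md') || filePath.endsWith('.markdown'));
  return isMarkdown ? ChunkingStrategy.markdown : ChunkingStrategy.recursive;
}
```

Under this logic, pickStrategy(filePath: 'notes.md') selects markdown, while a path such as 'README.MD' falls back to recursive because String.endsWith compares case-sensitively.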

Memory safety: The method uses a streaming micro-batch pipeline instead of loading all embeddings into memory at once. Each batch of kIngestionBatchSize chunks is embedded, saved to the database, and then released, which keeps memory usage flat regardless of document size.

chunkDelay controls how long the pipeline yields between batches (default: 10ms). The pause gives the garbage collector a chance to run and helps prevent thermal throttling on mobile devices.
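
A micro-batch loop of this kind could look roughly like the following. This is an illustrative sketch, not the library's code: ingestInBatches, embedBatch, and saveBatch are hypothetical names, and the batch size is passed as a parameter because the actual value of kIngestionBatchSize is not shown on this page. The 10 ms default matches the documented chunkDelay default.

```dart
import 'dart:math' as math;

/// Hypothetical sketch of a streaming micro-batch pipeline.
/// [embedBatch] and [saveBatch] are caller-supplied stand-ins for the
/// embedding model and the database write.
Future<void> ingestInBatches(
  List<String> chunks, {
  required int batchSize,
  required Future<List<List<double>>> Function(List<String> batch) embedBatch,
  required Future<void> Function(
          List<String> batch, List<List<double>> vectors)
      saveBatch,
  Duration chunkDelay = const Duration(milliseconds: 10),
  void Function(int done, int total)? onProgress,
}) async {
  if (batchSize <= 0) {
    throw ArgumentError.value(batchSize, 'batchSize', 'must be positive');
  }
  for (var start = 0; start < chunks.length; start += batchSize) {
    final end = math.min(start + batchSize, chunks.length);
    final batch = chunks.sublist(start, end);

    // Only the current batch's embeddings are referenced at any point;
    // they become unreachable once the save completes.
    final vectors = await embedBatch(batch);
    await saveBatch(batch, vectors);
    onProgress?.call(end, chunks.length);

    // Yield to the event loop between batches (skipped after the last one).
    if (end < chunks.length) {
      await Future<void>.delayed(chunkDelay);
    }
  }
}
```

Because each batch is embedded and saved before the next one starts, peak memory in this sketch depends on batchSize rather than on the total number of chunks.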

Implementation

Future<SourceAddResult> addSourceWithChunking(
  String content, {
  String? metadata,
  String? name,
  String? filePath,
  ChunkingStrategy? strategy,
  Duration? chunkDelay,
  void Function(int done, int total)? onProgress,
}) async {
  final effectiveStrategy = strategy ??
      (filePath != null &&
              (filePath.endsWith('.md') || filePath.endsWith('.markdown'))
          ? ChunkingStrategy.markdown
          : ChunkingStrategy.recursive);

  // Hand the document body to Rust exactly once. `prepareSourceIngestion`
  // performs the source-row INSERT, claim, duplicate decision, chunker run,
  // and stages chunk content in a Rust-resident IngestSession, so neither
  // the chunk content nor the full document round-trips back to Dart.
  final prepared = await rust_ingest.prepareSourceIngestion(
    collectionId: collectionId,
    content: content,
    metadata: metadata,
    name: name ?? filePath,
    strategy: _toIngestStrategy(effectiveStrategy),
    maxChars: maxChunkChars,
    overlapChars: overlapChars,
  );

  return _runPreparedIngestion(
    prepared,
    chunkDelay: chunkDelay,
    onProgress: onProgress,
  );
}