MCP Ingest
A document ingestion pipeline for MakeMind. Converts supported input formats into a standardized NormalizedDocument, with chunking, OCR/ASR support, and fragment extraction.
Features
- Universal input — files, raw bytes, URLs, streams.
- Format coverage — HTML, XML, YAML, JSON, Markdown, archives, plus pluggable handlers for PDF, audio, image (via OCR/ASR ports).
- Chunking — configurable strategies for downstream embedding / retrieval.
- Fragment extraction — semantic fragment splitting.
- Pluggable OCR / ASR ports — bridge to your provider of choice.
- Pipeline composition — IngestPipeline.defaults() for common cases, fully customizable for advanced use.
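To give a feel for what a chunking strategy does before text reaches an embedding or retrieval step, here is a minimal, standalone sketch of fixed-size chunking with overlap. It is not mcp_ingest's implementation or API: chunkText, size, and overlap are hypothetical names chosen for illustration, and the package's own strategies may split on different boundaries.

```dart
// Hypothetical helper, NOT part of mcp_ingest: splits text into fixed-size
// windows that overlap by a few characters so context is not lost at the
// chunk boundaries. Sizes are counted in UTF-16 code units (String.length).
List<String> chunkText(String text, {int size = 200, int overlap = 20}) {
  if (size <= 0) {
    throw ArgumentError.value(size, 'size', 'must be positive');
  }
  if (overlap < 0 || overlap >= size) {
    throw ArgumentError.value(overlap, 'overlap', 'must be in [0, size)');
  }

  final chunks = <String>[];
  final step = size - overlap; // how far each new window advances

  for (var start = 0; start < text.length; start += step) {
    final end = start + size < text.length ? start + size : text.length;
    chunks.add(text.substring(start, end));
    if (end == text.length) break; // last window reached the end of the text
  }
  return chunks;
}

void main() {
  final chunks = chunkText('abcdefghij', size: 4, overlap: 1);
  print(chunks); // prints [abcd, defg, ghij]
}
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of more chunks to embed. The right trade-off depends on the downstream retriever.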
Quick Start
import 'package:mcp_ingest/mcp_ingest.dart';

Future<void> main() async {
  // Default pipeline for common cases.
  final pipeline = IngestPipeline.defaults();

  // Ingest a local file with the default options.
  final result = await pipeline.ingest(
    IngestInput.fromFile('document.pdf'),
    IngestOptions.defaults,
  );
  print('Extracted ${result.chunks.length} chunks');
}
With OCR/ASR plugins:
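// tesseractOcrPlugin and whisperAsrPlugin are placeholders for your own
// OCR and ASR port implementations.
final pipeline = IngestPipeline(
  ocrPort: tesseractOcrPlugin,
  asrPort: whisperAsrPlugin,
);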
License
MIT — see LICENSE.
Libraries
- mcp_ingest: MCP Ingest, a document ingestion pipeline for MakeMind.