MCP Ingest

A document ingestion pipeline for MakeMind. Converts supported input formats into a standardized NormalizedDocument, with chunking, OCR/ASR support, and fragment extraction.

Features

  • Universal input — files, raw bytes, URLs, streams.
  • Format coverage — HTML, XML, YAML, JSON, Markdown, archives, plus pluggable handlers for PDF, audio, image (via OCR/ASR ports).
  • Chunking — configurable strategies for downstream embedding / retrieval.
  • Fragment extraction — semantic fragment splitting.
  • Pluggable OCR / ASR ports — bridge to your provider of choice.
  • Pipeline composition — IngestPipeline.defaults() for common cases, fully customizable for advanced use.
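
To make the chunking idea concrete, here is a minimal sketch of one common strategy, fixed-size chunks with overlap, written in plain Dart. It is an illustration only: chunkText and its size and overlap parameters are hypothetical names invented for this example and are not part of the mcp_ingest API, whose actual strategies may work differently.

```dart
// Hypothetical helper, NOT part of mcp_ingest: splits text into fixed-size
// chunks where each chunk repeats the last `overlap` characters of the
// previous one, so context is not lost at chunk boundaries.
List<String> chunkText(String text, {int size = 200, int overlap = 50}) {
  if (size <= 0) {
    throw ArgumentError.value(size, 'size', 'must be positive');
  }
  if (overlap < 0 || overlap >= size) {
    throw ArgumentError.value(overlap, 'overlap', 'must be in [0, size)');
  }

  final chunks = <String>[];
  final step = size - overlap; // how far the window advances each time

  for (var start = 0; start < text.length; start += step) {
    final end = start + size < text.length ? start + size : text.length;
    chunks.add(text.substring(start, end));
    if (end == text.length) break; // the window reached the end of the text
  }
  return chunks;
}

void main() {
  // A 10-character input, 4-character chunks, 1 character of overlap.
  print(chunkText('abcdefghij', size: 4, overlap: 1));
  // prints [abcd, defg, ghij]
}
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of producing more chunks for the same input, which is the usual trade-off when tuning chunks for embedding or retrieval.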

Quick Start

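import 'package:mcp_ingest/mcp_ingest.dart';

Future<void> main() async {
  // Build a pipeline with the default handlers.
  final pipeline = IngestPipeline.defaults();

  // Ingest a local file using the default options.
  final result = await pipeline.ingest(
    IngestInput.fromFile('document.pdf'),
    IngestOptions.defaults,
  );

  print('Extracted ${result.chunks.length} chunks');
}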

With OCR/ASR plugins:

final pipeline = IngestPipeline(
  ocrPort: tesseractOcrPlugin,
  asrPort: whisperAsrPlugin,
);

License

MIT — see LICENSE.
