mcp_ingest 0.1.0

Convert any file format to a standardized document structure for the MCP knowledge system.

MCP Ingest #

A document ingestion pipeline for MakeMind. Converts any file format into a standardized NormalizedDocument with chunking, OCR/ASR support, and fragment extraction.

Features #

  • Universal input — files, raw bytes, URLs, streams.
  • Format coverage — HTML, XML, YAML, JSON, Markdown, archives, plus pluggable handlers for PDF, audio, image (via OCR/ASR ports).
  • Chunking — configurable strategies for downstream embedding / retrieval.
  • Fragment extraction — semantic fragment splitting.
  • Pluggable OCR / ASR ports — bridge to your provider of choice.
  • Pipeline composition — IngestPipeline.defaults() for common cases, fully customizable for advanced use.
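
To make the chunking feature concrete, here is a minimal sketch of one common strategy: fixed-size windows with overlap. It is an illustration only and does not use or reflect the mcp_ingest API; the function name chunkText and its size and overlap parameters are hypothetical, and the strategies the package actually ships may differ.

```dart
/// Splits [text] into windows of at most [size] UTF-16 code units, where each
/// window repeats the last [overlap] code units of the previous one.
///
/// Hypothetical helper for illustration; not part of the mcp_ingest API.
List<String> chunkText(String text, {int size = 500, int overlap = 50}) {
  if (size <= 0) {
    throw ArgumentError.value(size, 'size', 'must be positive');
  }
  if (overlap < 0 || overlap >= size) {
    throw ArgumentError.value(overlap, 'overlap', 'must be in [0, size)');
  }

  final chunks = <String>[];
  final step = size - overlap; // how far the window advances each iteration

  for (var start = 0; start < text.length; start += step) {
    // Clamp the window so the last chunk stops at the end of the text.
    final end = start + size < text.length ? start + size : text.length;
    chunks.add(text.substring(start, end));
    if (end == text.length) break; // the window already covers the tail
  }
  return chunks;
}
```

The overlap keeps text that straddles a boundary present in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated content. Because the sketch counts UTF-16 code units, it can split a surrogate pair; a production chunker would more likely split on sentence or token boundaries.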

Quick Start #

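import 'package:mcp_ingest/mcp_ingest.dart';

Future<void> main() async {
  // Build a pipeline with the default configuration.
  final pipeline = IngestPipeline.defaults();

  // Ingest a local file using the default options.
  final result = await pipeline.ingest(
    IngestInput.fromFile('document.pdf'),
    IngestOptions.defaults,
  );

  print('Extracted ${result.chunks.length} chunks');
}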

With OCR/ASR plugins:

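// tesseractOcrPlugin and whisperAsrPlugin are placeholders for the OCR/ASR
// port implementations that you supply.
final pipeline = IngestPipeline(
  ocrPort: tesseractOcrPlugin,
  asrPort: whisperAsrPlugin,
);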

Support #

License #

MIT — see LICENSE.

Publisher

unverified uploader

Topics

#ingest #mcp #document #parsing #normalization

Dependencies

archive, async, crypto, html, markdown, mcp_bundle, mcp_fact_graph, mime, path, pool, stream_transform, uuid, xml, yaml
