kreuzberg 1.0.0
High-performance document intelligence library: extract text, metadata, and tables from 97+ formats, including PDF, DOCX, images, and email.
1.0.0
- Initial release candidate
- Document extraction (text, metadata, tables) from 97+ formats
- OCR via Tesseract, PaddleOCR, and VLM backends
- HTML-to-Markdown conversion
- PDF rendering
- Code intelligence via tree-sitter (248 languages)
- MIME type detection (118+ extensions)
- LLM-powered structured extraction
- Batch document processing
- Embeddings generation via ONNX Runtime