kreuzcrawl 0.3.0 copy "kreuzcrawl: ^0.3.0" to clipboard
kreuzcrawl: ^0.3.0 copied to clipboard

High-performance web crawling engine

kreuzcrawl #

Dart / Flutter bindings for kreuzcrawl — a high-performance Rust web crawling engine. Built with flutter_rust_bridge for isolate-safe Future APIs and Flutter-native type mapping.

What This Package Provides #

  • Same crawler as every binding — one Rust engine behind Python, Node.js, Ruby, Go, Java, .NET, PHP, Elixir, Dart, Kotlin Android, Swift, Zig, WASM, and C FFI.
  • Structured scrape output — HTML, Markdown, metadata, links, assets, response headers, and extraction warnings with consistent field names.
  • Crawl controls — depth, page limits, concurrency, URL filters, robots/sitemap handling, rate limits, and partial failure reporting.
  • Rendering path — optional browser rendering for JavaScript-heavy pages; direct HTTP path for fast static pages.
  • Dart package — flutter_rust_bridge Future APIs for Dart and Flutter.

Installation #

dart pub add kreuzcrawl

Agent plugin #

The kreuzcrawl plugin is available via the kreuzberg-dev/plugins marketplace.

/plugin marketplace add kreuzberg-dev/plugins
/plugin install kreuzcrawl@kreuzberg

Works with Claude Code, Codex, Cursor, Gemini CLI, Factory Droid, GitHub Copilot CLI, and opencode. See the marketplace README for harness-specific install instructions.

Quick Start #

import 'package:kreuzcrawl/kreuzcrawl.dart';
import 'package:kreuzcrawl/src/kreuzcrawl_bridge_generated/frb_generated.dart'
    show RustLib;

Future<void> main() async {
  await RustLib.init();

  // Simplest case: scrape a single page with default settings.
  final engine = await KreuzcrawlBridge.createEngine();
  final result = await KreuzcrawlBridge.scrape(engine, 'https://example.com/');
  print('Title: ${result.metadata.title ?? ''}');
  print('Status: ${result.statusCode}');
  print('Links found: ${result.links.length}');

  // Crawl from a seed URL, limited to one hop and a handful of pages.
  final crawlConfig = await createCrawlConfigFromJson(
    json: r'{"max_depth":1,"max_pages":5}',
  );
  final crawlEngine = await KreuzcrawlBridge.createEngine(config: crawlConfig);
  final crawlResult = await KreuzcrawlBridge.crawl(
    crawlEngine,
    'https://en.wikipedia.org/wiki/Web_scraping',
  );
  print('Pages crawled: ${crawlResult.pages.length}');
}

API Reference #

Full API documentation is available at docs.kreuzcrawl.kreuzberg.dev.

Key functions:

  • create_engine(config?) — Create a crawl engine with optional configuration
  • scrape(engine, url) — Scrape a single URL
  • crawl(engine, url) — Crawl a website following links
  • map_urls(engine, url) — Discover all pages on a site
  • batch_scrape(engine, urls) — Scrape multiple URLs concurrently
  • batch_crawl(engine, urls) — Crawl multiple seed URLs concurrently

Contributing #

Contributions are welcome! Please see our Contributing Guide for details.

Part of Kreuzberg.dev #

  • Kreuzberg — document intelligence: text, tables, metadata from 91+ formats with optional OCR.
  • Kreuzberg Cloud — managed extraction API with SDKs, dashboards, and observability.
  • kreuzcrawl — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
  • html-to-markdown — fast, lossless HTML→Markdown engine.
  • liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
  • tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
  • alef — the polyglot binding generator that produces every per-language binding across the 5 polyglot repos.
  • Discord — community, roadmap, announcements.

License #

This project is licensed under Elastic License 2.0.