HTML to Markdown Rust
A high-performance Dart package that converts HTML to Markdown using Rust's html-to-markdown-rs library via FFI (Foreign Function Interface).
This package provides fast, memory-efficient HTML to Markdown conversion with support for complex HTML structures including headings, paragraphs, lists, links, images, tables, and more.
Features
- High Performance: Rust-powered conversion for optimal speed
- Comprehensive HTML Support: Handles most common HTML elements
  - Headings (<h1> through <h6>)
  - Paragraphs (<p>)
  - Lists (ordered and unordered)
  - Links (<a>) with automatic Markdown syntax
  - Images (<img>) with alt text
  - Bold (<strong>, <b>) and italic (<em>, <i>)
  - Code (<code>, <pre>)
  - Blockquotes (<blockquote>)
  - Tables (<table>, <tr>, <td>, <th>)
  - Horizontal rules (<hr>)
  - Line breaks (<br>)
- Memory Safe: Proper memory management with automatic cleanup
- Null Safe Dart API: Follows Dart's null safety principles
- Cross Platform: Works on macOS, Linux, and Windows
Requirements
- Dart SDK: ^3.10.1 or higher
- Rust toolchain: Required for building the native library (automatically handled by native_toolchain_rust)
Installation
Add this to your package's pubspec.yaml:
dependencies:
html_to_markdown_rust: ^0.1.2
Then run:
dart pub get
Building
The native Rust library is built automatically when you run your Dart application. The native_toolchain_rust package handles this process.
If you need to rebuild the native library manually:
cd rust
cargo build --release
Usage
Basic Conversion
Convert a simple HTML string to Markdown:
import 'package:html_to_markdown_rust/html_to_markdown_rust.dart';

void main() {
  final html = '<h1>Hello World</h1><p>This is a test.</p>';
  final markdown = htmlToMarkdown(html);
  print(markdown);
  // Output: # Hello World
  //         This is a test.
}
Complex HTML
Convert more complex HTML structures:
import 'package:html_to_markdown_rust/html_to_markdown_rust.dart';

void main() {
  final html = '''
    <div class="container">
      <h2>Features</h2>
      <ul>
        <li>Fast conversion</li>
        <li>Memory efficient</li>
        <li>Easy to use</li>
      </ul>
      <p>Check out our <a href="https://example.com">website</a>!</p>
    </div>
  ''';
  final markdown = htmlToMarkdown(html);
  print(markdown);
  // Output: ## Features
  //         - Fast conversion
  //         - Memory efficient
  //         - Easy to use
  //         Check out our [website](https://example.com)!
}
Error Handling
The conversion function throws an exception if the conversion fails:
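import 'package:html_to_markdown_rust/html_to_markdown_rust.dart';

void main() {
  final html = '<p>Some <strong>HTML</strong> input</p>';
  try {
    final markdown = htmlToMarkdown(html);
    print(markdown);
  } on Exception catch (e) {
    print('Conversion failed: $e');
  }
}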
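When a failed conversion should not abort the caller, the try/catch above can be wrapped in a small helper. The following is a minimal sketch, not part of the package: tryConvert is a hypothetical name, and the converter is passed in as a parameter so the snippet runs without the native library. In an application you would pass htmlToMarkdown.

```dart
/// Hypothetical helper (not part of html_to_markdown_rust): runs [convert]
/// on [html] and returns null instead of throwing when conversion fails.
String? tryConvert(String html, String Function(String) convert) {
  try {
    return convert(html);
  } on Exception catch (e) {
    // Report the failure and let the caller decide on a fallback.
    print('Conversion failed: $e');
    return null;
  }
}

void main() {
  // Stand-in converter that always fails, used only to exercise the error path.
  String failing(String _) => throw FormatException('simulated failure');

  final markdown = tryConvert('<p>Hello</p>', failing) ?? '(no output)';
  print(markdown);
}
```

Returning null keeps the failure visible in the type (String?) while leaving the choice of fallback to the caller.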
API Reference
htmlToMarkdown(String html)
Converts an HTML string to Markdown.
Parameters:
- html - The HTML string to convert
Returns:
- String - The converted Markdown
Throws:
- Exception - If the conversion fails
How it Works
This package uses Dart's FFI (Foreign Function Interface) to call Rust functions that perform the HTML to Markdown conversion:
- The Dart function htmlToMarkdown() converts the input HTML string to a UTF-8 C string
- The Rust function htm_convert() processes the HTML using the html-to-markdown-rs library
- The result is returned as a C string pointer
- Dart converts the result back to a Dart String
- The Rust memory is freed using htm_free_string()
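The steps above can be sketched in Dart roughly as follows. This is an illustration under stated assumptions, not the package's actual source: the function shapes for htm_convert() and htm_free_string() are guessed (the real ones come from the generated bindings), treating a null pointer as failure is an assumption, and the native calls are injected as parameters so the flow can be run with Dart stand-ins. It depends on package:ffi for toNativeUtf8(), toDartString(), and malloc.

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

// Assumed shapes of the two native functions (illustrative only; the real
// signatures are defined by the generated bindings in bindings.g.dart).
typedef Convert = Pointer<Utf8> Function(Pointer<Utf8> html);
typedef FreeString = void Function(Pointer<Utf8> ptr);

/// Sketch of the round trip, with the native calls injected so the flow
/// can be followed without the Rust library.
String convertWith(Convert convert, FreeString freeString, String html) {
  // Step 1: copy the Dart string into native memory as a NUL-terminated
  // UTF-8 C string.
  final input = html.toNativeUtf8();
  try {
    // Steps 2-3: the native side converts and returns a C string pointer.
    final result = convert(input);
    if (result == nullptr) {
      // Assumption: a null pointer signals failure.
      throw Exception('conversion returned a null pointer');
    }
    try {
      // Step 4: copy the result back into a Dart String.
      return result.toDartString();
    } finally {
      // Step 5: hand the result buffer back to the side that allocated it.
      freeString(result);
    }
  } finally {
    // The input buffer was allocated on the Dart side (malloc), so it is
    // released here rather than by the native library.
    malloc.free(input);
  }
}

void main() {
  // Dart stand-ins for htm_convert() and htm_free_string().
  Pointer<Utf8> fakeConvert(Pointer<Utf8> html) =>
      '# ${html.toDartString()}'.toNativeUtf8();
  void fakeFree(Pointer<Utf8> ptr) => malloc.free(ptr);

  print(convertWith(fakeConvert, fakeFree, 'Hello')); // prints "# Hello"
}
```

The nested try/finally blocks ensure both buffers are released even if copying the result fails. Each buffer is freed by the allocator that created it, which is why the result goes back through the native free function instead of malloc.free.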
Architecture
Dart Layer (lib/)
├── html_to_markdown_rust.dart # Public API entry point
├── src/
│ ├── html_to_markdown.dart # Conversion logic with FFI
│ ├── codec.dart # Codec utilities
│ └── bindings.g.dart # Generated FFI bindings
Rust Layer (rust/)
├── src/
│ └── lib.rs # FFI interface functions
├── Cargo.toml # Rust dependencies
└── build.rs # Build configuration
Development
Running Tests
# Run all tests
dart test
# Run with coverage
dart test --coverage=coverage
Regenerating FFI Bindings
If you modify the Rust API, regenerate the bindings:
dart run ffigen
Code Generation
After modifying the Rust code, rebuild the native library:
# The native library will be built automatically when you run your app
# or you can manually trigger it with:
dart run build_runner build
Performance
This package is optimized for performance:
- Conversion happens in Rust for maximum speed
- Minimal memory overhead with proper cleanup
- Efficient string handling via FFI
Benchmarks (on typical HTML documents):
- Simple documents: < 1ms
- Medium documents (10KB): ~2-5ms
- Large documents (100KB): ~10-20ms
Benchmark Results
We've benchmarked html_to_markdown_rust against the popular html2md package:
┌──────────────┬───────────┬───────────────────────┬───────────┬──────────┐
│ Test Case    │ HTML Size │ html_to_markdown_rust │ html2md   │ Speedup  │
│              │           │ (time/op)             │ (time/op) │          │
├──────────────┼───────────┼───────────────────────┼───────────┼──────────┤
│ Simple HTML  │ 66 B      │ 26.39 μs              │ 104.61 μs │ 3.96x ↑  │
│ Complex HTML │ 1.2 KB    │ 343.92 μs             │ 1.51 ms   │ 4.38x ↑  │
│ Nested HTML  │ 978 B     │ 288.24 μs             │ 1.08 ms   │ 3.76x ↑  │
│ Large HTML   │ 37.1 KB   │ 8.66 ms               │ 132.53 ms │ 15.31x ↑ │
└──────────────┴───────────┴───────────────────────┴───────────┴──────────┘
📊 Summary Statistics:
─────────────────────────────────────────────────────────────────────
Average Speedup: 6.85x faster
Geometric Mean Speedup: 5.62x faster
Total HTML Size Benchmarked: 39.3 KB
Number of Benchmarks: 4
─────────────────────────────────────────────────────────────────────
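The two summary figures can be reproduced from the Speedup column. The short sketch below (helper names are illustrative) computes the arithmetic mean and the geometric mean of the four speedups. The geometric mean is less sensitive to the single large value from the Large HTML case, which is why it comes out lower than the plain average.

```dart
import 'dart:math' as math;

/// Arithmetic mean of [values].
double arithmeticMean(List<double> values) =>
    values.reduce((a, b) => a + b) / values.length;

/// Geometric mean of [values]: the n-th root of their product, computed
/// through logarithms so long lists do not overflow.
double geometricMean(List<double> values) {
  final logSum = values.map(math.log).reduce((a, b) => a + b);
  return math.exp(logSum / values.length);
}

void main() {
  // Speedup column from the benchmark table.
  const speedups = [3.96, 4.38, 3.76, 15.31];

  // Prints "Average speedup: 6.85x".
  print('Average speedup: ${arithmeticMean(speedups).toStringAsFixed(2)}x');
  // Prints "Geometric mean speedup: 5.62x".
  print('Geometric mean speedup: '
      '${geometricMean(speedups).toStringAsFixed(2)}x');
}
```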
Running Benchmarks
You can run the benchmarks yourself to see the performance on your system:
cd benchmark
dart pub get
dart run main.dart
For more details, see the benchmark/README.md file.
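For a quick measurement inside your own project, a timing loop along these lines may be enough. It is a rough sketch, not the code in the benchmark directory: microsPerOp, its parameters, and the stand-in converter are all hypothetical, and a Stopwatch loop gives only approximate numbers.

```dart
/// Hypothetical micro-benchmark helper: returns the mean time per call of
/// [convert] on [html], in microseconds.
double microsPerOp(
  String Function(String) convert,
  String html, {
  int warmup = 100,
  int iterations = 1000,
}) {
  // Warm up first so JIT compilation does not distort the measurement.
  for (var i = 0; i < warmup; i++) {
    convert(html);
  }
  final stopwatch = Stopwatch()..start();
  for (var i = 0; i < iterations; i++) {
    convert(html);
  }
  stopwatch.stop();
  return stopwatch.elapsedMicroseconds / iterations;
}

void main() {
  // Stand-in converter that strips tags; replace it with htmlToMarkdown
  // to measure the package itself.
  String standIn(String html) => html.replaceAll(RegExp(r'<[^>]+>'), '');

  final html = '<h1>Hello World</h1><p>This is a test.</p>';
  final perOp = microsPerOp(standIn, html);
  print('${perOp.toStringAsFixed(2)} μs/op');
}
```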
Limitations
- The conversion follows the behavior of the underlying html-to-markdown-rs crate
- Some advanced HTML features may not be fully supported
- Custom HTML tags are typically removed during conversion
- JavaScript and CSS are not processed (as expected for Markdown output)
Troubleshooting
Build Errors
If you encounter build errors related to Rust:
- Ensure Rust is installed: rustc --version
- Update the Rust toolchain: rustup update
- Clean and rebuild: dart clean && dart pub get
Runtime Errors
If conversion fails:
- Ensure the input is valid UTF-8 encoded HTML
- Check for memory issues with very large HTML documents
- Review the error message for specific failure causes
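When the HTML arrives as raw bytes (from a file or a network response, for example), one way to guarantee well-formed input is to decode it leniently before converting. The sketch below uses utf8.decode from dart:convert with allowMalformed: true, which replaces invalid byte sequences with U+FFFD instead of throwing. decodeHtmlBytes is a hypothetical helper name, not part of this package.

```dart
import 'dart:convert';

/// Decodes [bytes] as UTF-8, replacing malformed sequences with U+FFFD
/// instead of throwing a FormatException.
String decodeHtmlBytes(List<int> bytes) =>
    utf8.decode(bytes, allowMalformed: true);

void main() {
  // '<p>' followed by the invalid byte 0xFF and then '</p>'.
  final bytes = [0x3C, 0x70, 0x3E, 0xFF, 0x3C, 0x2F, 0x70, 0x3E];

  final html = decodeHtmlBytes(bytes);
  print(html.contains('\uFFFD')); // true: the bad byte was replaced
  // `html` can now be passed to htmlToMarkdown().
}
```

If silently replacing bad bytes is not acceptable, omit allowMalformed and handle the resulting FormatException instead.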
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Development Workflow
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This package is released under the MIT License. See the LICENSE file for details.
Credits
- Built with html-to-markdown-rs Rust crate
- Uses Dart's FFI for native interop
- Hooks for build process: hooks
- Native library building handled by native_toolchain_rust
Support
For issues, questions, or contributions:
- GitHub Issues: https://github.com/shigomany/html_to_markdown_rust/issues
- Documentation: https://github.com/shigomany/html_to_markdown_rust