reader_mode 0.2.0
A Dart port of Mozilla's Readability.js content extraction library.
Readability #
A Dart port of Mozilla's Readability.js - extract readable content from any web page.
Requirements #
- Dart SDK >=3.3.0 <4.0.0
Installation #
As a Dart Package #
Add to your pubspec.yaml:
dependencies:
  reader_mode: ^0.2.0
As a CLI Tool #
Download pre-built binaries from Releases, or build from source:
make build-cli
./build/readability --help
As a JavaScript Library #
Compile to JavaScript for use in browsers or Node.js:
make build-js
# Output: build/readability.js
Usage #
Dart Package #
import 'package:reader_mode/reader_mode.dart';
void main() {
  final html = '<html>...</html>';

  // Parse with the default JSDOMParser
  final article = parse(html, baseUri: 'https://example.com');
  if (article != null) {
    print('Title: ${article.title}');
    print('Author: ${article.byline}');
    print('Content: ${article.textContent}');
  }
}
Check if a Page is Readable
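import 'package:html/parser.dart' as html_parser;
import 'package:reader_mode/reader_mode.dart';

void main() {
  final html = '<html>...</html>';
  // Cheap pre-check on the parsed document before running full extraction
  final doc = html_parser.parse(html);
  if (isProbablyReaderable(doc)) {
    final article = parse(html);
    // ...
  }
}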
Configuration Options
// All options are named parameters on parse()
final article = parse(
  html,
  parser: ParserType.jsdom,       // or ParserType.html
  baseUri: 'https://example.com',
  debug: false,                   // Enable debug logging to stdout
  charThreshold: 500,             // Minimum content length
  maxElemsToParse: 0,             // Element limit (0 = unlimited)
  keepClasses: false,             // Preserve CSS classes
);
Custom Logger
// Pass a custom logger callback for debug messages
final article = parse(
  html,
  logger: (message) => myLogger.debug(message),
  debug: true, // <- without this, logger won't be called
);
Article Properties
| Property | Type | Description |
|---|---|---|
| `title` | `String` | Article title |
| `content` | `String` | HTML content |
| `textContent` | `String` | Plain text content |
| `excerpt` | `String` | Short description |
| `byline` | `String?` | Author name |
| `siteName` | `String?` | Site name |
| `lang` | `String?` | Language code |
| `publishedTime` | `String?` | Publication date |
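Several of these properties are nullable, so callers need a fallback when a page has no author, site name, or date. The following is a minimal sketch of one way to handle that, using only the `parse()` call and the property names listed above; the `describe` helper and its fallback strings are hypothetical and not part of the library.

```dart
import 'package:reader_mode/reader_mode.dart';

/// Hypothetical helper (not part of reader_mode): builds a one-line summary,
/// substituting placeholder text when optional metadata is missing.
String describe(String title, String? byline, String? siteName) {
  final author = byline ?? 'unknown author';
  final site = siteName ?? 'unknown site';
  return '$title, by $author ($site)';
}

void main() {
  final html = '<html>...</html>';
  final article = parse(html, baseUri: 'https://example.com');
  if (article == null) {
    print('No article content found.');
    return;
  }

  // title is non-nullable; byline and siteName may be null.
  print(describe(article.title, article.byline, article.siteName));

  // lang and publishedTime are also nullable, so provide defaults.
  final lang = article.lang ?? 'not specified';
  final published = article.publishedTime ?? 'not specified';
  print('Language: $lang');
  print('Published: $published');
}
```

Keeping the fallback logic in a small function that takes plain values makes it easy to check without running the extractor.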
Command Line #
# Extract from file
readability article.html
# Extract from URL
readability https://example.com/article
# Read from stdin
curl -s https://example.com | readability -
# Output as JSON
readability --json article.html
# Metadata only
readability --metadata article.html
JavaScript (Browser/Node.js) #
After compiling with make build-js:
import { parse, isProbablyReaderable } from './readability.js';
const html = `
  <html>
    <head><title>My Article</title></head>
    <body>
      <article>
        <h1>Hello World</h1>
        <p>This is the main content of the article...</p>
      </article>
    </body>
  </html>
`;

// Quick check if content is worth parsing
if (isProbablyReaderable(html)) {
  const article = parse(html, { baseUri: 'https://example.com' });
  if (article) {
    console.log('Title:', article.title);
    console.log('Author:', article.byline);
    console.log('Content:', article.textContent);
  }
}
TypeScript definitions are included in build/readability.d.ts.
Alternative Parsers #
The library supports two HTML parsers via the parser parameter:
// JSDOMParser (default, recommended)
final article = parse(html, parser: ParserType.jsdom);
// html package parser (pure Dart)
final article = parse(html, parser: ParserType.html);
| Parser | Accuracy | Speed | Use Case |
|---|---|---|---|
| JSDOMParser | Highest | Fast | Production, compatibility |
| html package | High | Moderate | Pure Dart preference |
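To see how the two parsers behave on your own pages, you can run both on the same input and compare the results. This is a rough sketch, assuming only the `parse()` function and the two `ParserType` values shown above; the `describeRun` helper is hypothetical, and a single timed run is noisy, so treat the numbers as indicative only.

```dart
import 'package:reader_mode/reader_mode.dart';

/// Hypothetical helper (not part of reader_mode): formats one result line.
String describeRun(Object parser, int chars, Duration elapsed) =>
    '$parser: $chars characters in ${elapsed.inMilliseconds} ms';

void main() {
  final html = '<html>...</html>';

  for (final parser in [ParserType.jsdom, ParserType.html]) {
    final stopwatch = Stopwatch()..start();
    final article = parse(html, parser: parser);
    stopwatch.stop();

    // parse() may return null, so count zero characters in that case.
    final chars = article?.textContent.length ?? 0;
    print(describeRun(parser, chars, stopwatch.elapsed));
  }
}
```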
Contributing #
See CONTRIBUTING.md for development setup and guidelines.
License #
Different parts of this project are covered by different licenses:
- Apache License 2.0 - Main library code
- Mozilla Public License 2.0 - JSDOMParser (ported from Mozilla)
Both licenses are open source and permit commercial use. Keeping JSDOMParser under the MPL preserves compatibility with Mozilla's original code, while the rest of the library is available under the more permissive Apache 2.0 terms.
What This Means #
- You can use this library in commercial and open source projects
- If you distribute modified versions of the MPL-licensed files (JSDOMParser), those modifications must be made available under the MPL
- The rest of the library can be used under Apache 2.0 terms
See the LICENSE file for full details.
Based on Mozilla Readability.js by Arc90 Inc and Mozilla.