Readability

A Dart port of Mozilla's Readability.js - extract readable content from any web page.

Requirements

  • Dart SDK >=3.3.0 <4.0.0

Installation

As a Dart Package

Add to your pubspec.yaml:

dependencies:
  reader_mode: # see pub.dev for latest version

Or run:

dart pub add reader_mode

As a CLI Tool

Download pre-built binaries from Releases, or build from source:

make build-cli
./build/readability --help

As a JavaScript Library

Compile to JavaScript for use in browsers or Node.js:

make build-js
# Output: build/readability.js

Usage

Dart Package

import 'package:reader_mode/reader_mode.dart';

void main() {
  final html = '<html>...</html>';

  // Parse with default JSDOMParser
  final article = parse(html, baseUri: 'https://example.com');

  if (article != null) {
    print('Title: ${article.title}');
    print('Author: ${article.byline}');
    print('Content: ${article.textContent}');
  }
}

Check if a Page is Readable

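import 'package:html/parser.dart' as html_parser;
import 'package:reader_mode/reader_mode.dart';

void main() {
  final html = '<html>...</html>';

  // Cheap pre-check on the parsed DOM before running full extraction
  final doc = html_parser.parse(html);
  if (isProbablyReaderable(doc)) {
    final article = parse(html);
    // ...
  }
}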

Configuration Options

// All options are named parameters on parse()
final article = parse(
  html,
  parser: ParserType.jsdom,  // or ParserType.html
  baseUri: 'https://example.com',
  debug: false,              // Enable debug logging to stdout
  charThreshold: 500,        // Minimum content length
  maxElemsToParse: 0,        // Element limit (0 = unlimited)
  keepClasses: false,        // Preserve CSS classes
);

Custom Logger

// Pass a custom logger callback for debug messages
final article = parse(
  html,
  logger: (message) => myLogger.debug(message),
  debug: true, // <- without this, logger won't be called
);
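
When debug output needs to be inspected after parsing (in a test, for instance), the callback can write into a buffer instead of a logging framework. The following is a minimal sketch, assuming the logger callback receives one String per message; BufferingLogger is a hypothetical helper, not part of reader_mode, and the sample messages are made up so the snippet runs without the package:

```dart
/// Hypothetical helper (not part of reader_mode): buffers debug messages
/// so they can be inspected or forwarded after parsing.
class BufferingLogger {
  final List<String> messages = [];

  /// Matches the assumed callback shape: one String message per call.
  void log(String message) => messages.add(message);

  /// Returns only the messages containing [needle] (case-insensitive).
  List<String> matching(String needle) {
    final lower = needle.toLowerCase();
    return messages.where((m) => m.toLowerCase().contains(lower)).toList();
  }
}

void main() {
  final buffer = BufferingLogger();

  // With the package, this would be wired up as:
  //   parse(html, logger: buffer.log, debug: true);
  // Here the callback is invoked directly so the sketch runs standalone.
  buffer.log('Grabbing article');
  buffer.log('Article too short, retrying');

  print(buffer.matching('retry').length);
}
```

Passing the method tear-off buffer.log keeps the call site identical to the closure form shown above.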

Article Properties

Property      | Type    | Description
--------------|---------|-------------------
title         | String  | Article title
content       | String  | HTML content
textContent   | String  | Plain text content
excerpt       | String  | Short description
byline        | String? | Author name
siteName      | String? | Site name
lang          | String? | Language code
publishedTime | String? | Publication date
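
Since byline, siteName, lang and publishedTime are nullable, display code usually needs a fallback for each. The following is a minimal sketch of one way to handle that, assuming only the property names and types listed above; ArticleFields, nonBlank and formatHeader are hypothetical stand-ins (not part of reader_mode) so the snippet runs on its own:

```dart
// Hypothetical stand-in mirroring some of the listed properties;
// this is not the package's own article class.
class ArticleFields {
  final String title;
  final String excerpt;
  final String? byline;
  final String? siteName;
  final String? publishedTime;

  const ArticleFields({
    required this.title,
    required this.excerpt,
    this.byline,
    this.siteName,
    this.publishedTime,
  });
}

/// Returns the trimmed string, or null if it is null or blank.
String? nonBlank(String? s) {
  final trimmed = s?.trim();
  return (trimmed == null || trimmed.isEmpty) ? null : trimmed;
}

/// Builds a one-line header, skipping metadata that is null or blank.
String formatHeader(ArticleFields a) {
  final byline = nonBlank(a.byline);
  final site = nonBlank(a.siteName);
  final published = nonBlank(a.publishedTime);
  return [
    a.title,
    if (byline != null) 'by $byline',
    if (site != null) site,
    if (published != null) published,
  ].join(' | ');
}

void main() {
  const article = ArticleFields(
    title: 'Hello World',
    excerpt: 'A short description.',
    byline: 'Jane Doe',
  );
  print(formatHeader(article)); // Hello World | by Jane Doe
}
```

Copying each nullable field into a local variable lets Dart promote it to a non-nullable String inside the collection-if, so no null assertion operator is needed.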

Command Line

# Extract from file
readability article.html

# Extract from URL
readability https://example.com/article

# Read from stdin
curl -s https://example.com | readability -

# Output as JSON
readability --json article.html

# Metadata only
readability --metadata article.html
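
The CLI can also be driven from a Dart script, which may be convenient when the binary is already installed. The following is a rough sketch under several assumptions that the text above does not confirm: the binary is on PATH as readability, it exits with code 0 on success, and --json prints a single JSON object to stdout. The helper names and the extract.dart file name are hypothetical:

```dart
import 'dart:convert';
import 'dart:io';

/// Decodes CLI output, assuming `--json` prints a single JSON object.
Map<String, dynamic> decodeCliJson(String stdoutText) {
  final decoded = jsonDecode(stdoutText);
  if (decoded is! Map<String, dynamic>) {
    throw FormatException('Expected a JSON object from readability --json');
  }
  return decoded;
}

/// Runs `readability --json <input>`; assumes the binary is on PATH and
/// exits with 0 on success.
Future<Map<String, dynamic>> extractWithCli(String input) async {
  final args = ['--json', input];
  final result = await Process.run('readability', args);
  if (result.exitCode != 0) {
    throw ProcessException(
        'readability', args, result.stderr.toString(), result.exitCode);
  }
  return decodeCliJson(result.stdout.toString());
}

Future<void> main(List<String> args) async {
  if (args.isEmpty) {
    stderr.writeln('usage: dart run extract.dart <file-or-url>');
    exit(64);
  }
  final article = await extractWithCli(args.first);
  // The JSON key names are not documented here, so just list what came back.
  print(article.keys.join(', '));
}
```

The sketch deliberately avoids reading specific keys from the result, since the JSON field names are not specified in this document.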

JavaScript (Browser/Node.js)

After compiling with make build-js:

import { parse, isProbablyReaderable } from './readability.js';

const html = `
  <html>
    <head><title>My Article</title></head>
    <body>
      <article>
        <h1>Hello World</h1>
        <p>This is the main content of the article...</p>
      </article>
    </body>
  </html>
`;

// Quick check if content is worth parsing
if (isProbablyReaderable(html)) {
  const article = parse(html, { baseUri: 'https://example.com' });

  if (article) {
    console.log('Title:', article.title);
    console.log('Author:', article.byline);
    console.log('Content:', article.textContent);
  }
}

TypeScript definitions are included in build/readability.d.ts.

Alternative Parsers

The library supports two HTML parsers via the parser parameter:

// JSDOMParser (default, recommended)
final article = parse(html, parser: ParserType.jsdom);

// html package parser (pure Dart)
final article = parse(html, parser: ParserType.html);

Parser       | Accuracy | Speed    | Use Case
-------------|----------|----------|--------------------------
JSDOMParser  | Highest  | Fast     | Production, compatibility
html package | High     | Moderate | Pure Dart preference

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

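This project uses dual licensing:

  • Apache 2.0 for the library as a whole
  • MPL for the JSDOMParser files

Both licenses are open source and commercial-friendly. The dual licensing keeps the port compatible with Mozilla's original codebase while leaving most of the library under the more permissive Apache 2.0 terms.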

What This Means

  • You can use this library in commercial and open source projects
  • Modifications to MPL-licensed files (JSDOMParser) must be shared under MPL
  • The rest of the library can be used under Apache 2.0 terms

See the LICENSE file for full details.


Based on Mozilla Readability.js by Arc90 Inc and Mozilla.
