reader_mode 0.2.0

A Dart port of Mozilla's Readability.js content extraction library.

Readability

A Dart port of Mozilla's Readability.js - extract readable content from any web page.

Requirements

  • Dart SDK >=3.3.0 <4.0.0

Installation

As a Dart Package

Add to your pubspec.yaml:

dependencies:
  reader_mode: ^0.2.0

As a CLI Tool

Download pre-built binaries from the GitHub Releases page, or build from source:

make build-cli
./build/readability --help

As a JavaScript Library

Compile to JavaScript for use in browsers or Node.js:

make build-js
# Output: build/readability.js

Usage

Dart Package

import 'package:reader_mode/reader_mode.dart';

void main() {
  final html = '<html>...</html>';

  // Parse with default JSDOMParser
  final article = parse(html, baseUri: 'https://example.com');

  if (article != null) {
    print('Title: ${article.title}');
    print('Author: ${article.byline}');
    print('Content: ${article.textContent}');
  }
}

Check if a Page is Readable

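import 'package:html/parser.dart' as html_parser;
import 'package:reader_mode/reader_mode.dart';

void main() {
  final html = '<html>...</html>';

  // Cheap pre-check on the parsed DOM before running the full extraction
  final doc = html_parser.parse(html);
  if (isProbablyReaderable(doc)) {
    final article = parse(html, baseUri: 'https://example.com');
    // ...
  }
}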
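For intuition about what the pre-check measures: upstream Readability.js's `isProbablyReaderable` visits candidate nodes (`p`, `pre`, `article`, and some `div`s), ignores any whose trimmed text is shorter than `minContentLength`, adds `sqrt(length - minContentLength)` for each remaining node, and returns true as soon as the running score exceeds `minScore` (upstream defaults: 140 and 20). The sketch below reproduces only that arithmetic over plain strings. `looksReaderable` is a hypothetical helper, it skips the DOM walk, visibility checks, and class/id filtering, and it assumes the Dart port keeps the upstream defaults.

```dart
import 'dart:math';

/// Hypothetical helper: reproduces the scoring arithmetic of upstream
/// Readability.js's isProbablyReaderable over pre-extracted block texts.
/// DOM traversal, visibility checks and class/id filtering are omitted.
bool looksReaderable(
  Iterable<String> blockTexts, {
  int minContentLength = 140, // upstream default
  double minScore = 20, // upstream default
}) {
  var score = 0.0;
  for (final text in blockTexts) {
    final length = text.trim().length;
    // Short blocks (captions, navigation links) contribute nothing.
    if (length < minContentLength) continue;
    // Longer blocks add diminishing returns via the square root.
    score += sqrt(length - minContentLength);
    if (score > minScore) return true;
  }
  return false;
}
```

With these defaults, a single 540-character paragraph scores exactly sqrt(400) = 20, which is not above 20, so a second long block is needed before the check passes.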

Configuration Options

// All options are named parameters on parse()
final article = parse(
  html,
  parser: ParserType.jsdom,  // or ParserType.html
  baseUri: 'https://example.com',
  debug: false,              // Enable debug logging to stdout
  charThreshold: 500,        // Minimum content length
  maxElemsToParse: 0,        // Element limit (0 = unlimited)
  keepClasses: false,        // Preserve CSS classes
);
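The two numeric limits follow upstream Readability.js semantics: an extraction attempt is accepted once its text reaches `charThreshold` characters (shorter attempts are retried with looser heuristics), and a positive `maxElemsToParse` aborts parsing when the document has more elements than the limit, while 0 disables the check. The sketch below models just those two checks. `checkElementLimit` and `meetsCharThreshold` are hypothetical names, and it assumes the Dart port behaves like upstream; the exception type and message it uses may differ from the package's.

```dart
/// Hypothetical guard modelled on upstream Readability.js: a positive
/// maxElemsToParse caps the element count; 0 disables the check.
void checkElementLimit(int elementCount, int maxElemsToParse) {
  if (maxElemsToParse > 0 && elementCount > maxElemsToParse) {
    throw StateError(
        'Aborting parsing document; $elementCount elements found');
  }
}

/// Hypothetical acceptance test: an extraction attempt is long enough once
/// its whitespace-trimmed text reaches charThreshold characters.
bool meetsCharThreshold(String text, int charThreshold) =>
    text.trim().length >= charThreshold;
```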

Custom Logger

// Pass a custom logger callback for debug messages
final article = parse(
  html,
  logger: (message) => myLogger.debug(message),
  debug: true, // <- without this, logger won't be called
);
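The interplay between `debug` and `logger` described above can be pictured as a single gate: nothing is emitted unless `debug` is true, and when it is, messages go to the custom callback if one was given and to stdout otherwise. This is a minimal sketch of that documented behaviour, not the package's own code; `Logger` and `emitDebug` are hypothetical names, and it assumes the callback receives a `String`.

```dart
/// Hypothetical callback type; assumes debug messages are plain strings.
typedef Logger = void Function(String message);

/// Hypothetical gate mirroring the documented behaviour: silent unless
/// [debug] is true, then routed to [logger] or, if none was given, to stdout.
void emitDebug(String message, {bool debug = false, Logger? logger}) {
  if (!debug) return;
  if (logger != null) {
    logger(message);
  } else {
    print(message);
  }
}
```

Passing `captured.add` for a `List<String> captured` makes a convenient test double: it records every message when `debug` is true and stays empty otherwise.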

Article Properties

| Property      | Type    | Description        |
|---------------|---------|--------------------|
| title         | String  | Article title      |
| content       | String  | HTML content       |
| textContent   | String  | Plain text content |
| excerpt       | String  | Short description  |
| byline        | String? | Author name        |
| siteName      | String? | Site name          |
| lang          | String? | Language code      |
| publishedTime | String? | Publication date   |
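Because `byline`, `siteName`, `lang`, and `publishedTime` are nullable, display code has to supply fallbacks for them. The sketch below uses a hypothetical stand-in class, `ArticleInfo`, carrying a subset of the fields with the same names and types as the table, so it runs without the package; with the real library the same fields are read off the object returned by `parse()`. `headerLine` is likewise a hypothetical helper.

```dart
/// Hypothetical stand-in mirroring a subset of the documented Article fields.
class ArticleInfo {
  final String title;
  final String? byline;
  final String? siteName;
  final String? publishedTime;

  const ArticleInfo({
    required this.title,
    this.byline,
    this.siteName,
    this.publishedTime,
  });
}

/// Builds a one-line header, skipping any nullable field that is missing.
String headerLine(ArticleInfo a) {
  final parts = <String>[
    a.title,
    if (a.byline != null) 'by ${a.byline}',
    if (a.siteName != null) a.siteName!,
    if (a.publishedTime != null) a.publishedTime!,
  ];
  return parts.join(' | ');
}
```

Using collection `if` keeps missing metadata out of the output entirely instead of printing `null`.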

Command Line

# Extract from file
readability article.html

# Extract from URL
readability https://example.com/article

# Read from stdin
curl -s https://example.com | readability -

# Output as JSON
readability --json article.html

# Metadata only
readability --metadata article.html

JavaScript (Browser/Node.js)

After compiling with make build-js:

import { parse, isProbablyReaderable } from './readability.js';

const html = `
  <html>
    <head><title>My Article</title></head>
    <body>
      <article>
        <h1>Hello World</h1>
        <p>This is the main content of the article...</p>
      </article>
    </body>
  </html>
`;

// Quick check if content is worth parsing
if (isProbablyReaderable(html)) {
  const article = parse(html, { baseUri: 'https://example.com' });

  if (article) {
    console.log('Title:', article.title);
    console.log('Author:', article.byline);
    console.log('Content:', article.textContent);
  }
}

TypeScript definitions are included in build/readability.d.ts.

Alternative Parsers

The library supports two HTML parsers via the parser parameter:

// JSDOMParser (default, recommended)
final jsdomArticle = parse(html, parser: ParserType.jsdom);

// html package parser (pure Dart)
final htmlArticle = parse(html, parser: ParserType.html);
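
| Parser                          | Accuracy | Speed    | Use case                  |
|---------------------------------|----------|----------|---------------------------|
| JSDOMParser (ParserType.jsdom)  | Highest  | Fast     | Production, compatibility |
| html package (ParserType.html)  | High     | Moderate | Pure Dart preference      |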

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

This project uses dual licensing:

  • Apache License 2.0 - Main library code
  • Mozilla Public License 2.0 - JSDOMParser (ported from Mozilla)

Both licenses are open source and permit commercial use. Keeping JSDOMParser under the MPL preserves the license of the Mozilla code it was ported from, while the rest of the library uses the permissive Apache 2.0 license.

What This Means

  • You can use this library in commercial and open source projects
  • If you distribute modified versions of the MPL-licensed files (JSDOMParser), those files must stay under the MPL and their source must be made available
  • The rest of the library can be used under Apache 2.0 terms

See the LICENSE file for full details.


Based on Mozilla Readability.js by Arc90 Inc and Mozilla.

Publisher

mortz.dev (verified publisher)

Topics

#html #parser #readability #content-extraction #article

Dependencies

html