reader_mode library
A Dart port of Mozilla's Readability.js content extraction library.
This library extracts the main readable content from a web page, stripping away navigation, ads, and other non-content elements.
Licensing
This library is licensed under the Apache License 2.0, except for
jsdom_parser.dart, which is licensed under the Mozilla Public License v2.0.
See the LICENSE file for details.
Usage
import 'package:reader_mode/reader_mode.dart';
// Simple usage with the parse() function
final article = parse(htmlString, baseUri: 'https://example.com');
print(article?.title);
print(article?.content);
// With options (named differently so both snippets can share one scope)
final articleWithOptions = parse(
  htmlString,
  parser: ParserType.html, // Use the html package instead of JSDOMParser
  charThreshold: 1000,
  keepClasses: true,
);
Quick Readability Check
Before parsing, you can check if a page is likely readable:
import 'package:html/parser.dart' as html;
import 'package:reader_mode/reader_mode.dart';
final document = html.parse(htmlString);
if (isProbablyReaderable(document)) {
  final article = parse(htmlString);
}
Dual Parser Support
This library supports two parsers via the ParserType enum:
- ParserType.jsdom (default): A port of Mozilla's JSDOMParser
- ParserType.html: The pure-Dart html package
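As a rough sketch of how the two parsers might be combined, the helper below tries the default parser first and retries with the html package if nothing was extracted. It uses only `parse`, `ParserType`, and `Article` as documented on this page. `parseWithFallback` is a hypothetical name, not part of the library, and whether a retry ever yields a different result is an assumption that depends on the input.

```dart
import 'package:reader_mode/reader_mode.dart';

/// Hypothetical helper (not part of reader_mode): tries the default
/// JSDOMParser port first, then retries with the html package.
Article? parseWithFallback(String htmlString, {String? baseUri}) {
  // First attempt: ParserType.jsdom is the documented default.
  final primary = parse(htmlString, baseUri: baseUri);
  if (primary != null) return primary;

  // Second attempt: the pure-Dart html package.
  return parse(htmlString, parser: ParserType.html, baseUri: baseUri);
}

void main() {
  const htmlString =
      '<html><head><title>Example</title></head>'
      '<body><article><p>Some article text.</p></article></body></html>';

  final article = parseWithFallback(htmlString, baseUri: 'https://example.com');
  // parse() returns Article?, so the result may still be null.
  print(article?.title);
}
```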
Classes
- Article: Article result from Readability parsing.
- Attribute: Represents an HTML/XML attribute name-value pair.
- Comment: Represents an HTML comment node.
- Document: Represents an HTML document.
- DocumentFragment: Represents a document fragment.
- DomAttribute: Interface for an attribute.
- DomDocument: Interface for a document node.
- DomDocumentFragment: Interface for a document fragment node.
- DomElement: Interface for an HTML element node.
- DomNode: Base interface for all DOM nodes.
- DomStyle: CSS style interface for element styles.
- Element: Represents an HTML element node.
- HtmlDomAttribute: Adapter for html package attributes.
- HtmlDomDocument: Adapter for html package Document.
- HtmlDomDocumentFragment: Adapter for html package DocumentFragment.
- HtmlDomElement: Adapter for html package Element.
- HtmlDomNode: Adapter for html package Node.
- JsdomDomAttribute: Adapter for JSDOMParser Attribute.
- JsdomDomDocument: Adapter for JSDOMParser Document.
- JsdomDomDocumentFragment: Adapter for JSDOMParser DocumentFragment.
- JsdomDomElement: Adapter for JSDOMParser Element.
- JsdomDomNode: Adapter for JSDOMParser Node.
- JSDOMParser: A lightweight DOM parser that converts HTML strings to a DOM tree.
- Node: Base class for all DOM nodes.
- Readability: Main Readability parser class.
- ReadabilityOptions: Configuration options for the Readability parser.
- ReaderableOptions: Options for isProbablyReaderable.
- Style: Represents the style property of an element, backed by the style attribute.
- TextNode: Represents a text node.
Enums
- NodeType: Node type constants matching the DOM specification.
- ParserType: Parser type for HTML content extraction.
- SpecialNodeName: Special node names for non-element nodes.
Extensions
- ElementExtensions on Element: Extension methods for Element to provide common DOM operations.
Functions
- isNodeVisible(Element node) → bool - Checks whether a node is visible based on its style and attributes.
- isProbablyReaderable(Document doc, [ReaderableOptions? options]) → bool - Determines whether a document is likely to contain readable article content.
- parse(String html, {ParserType parser = ParserType.jsdom, String? baseUri, bool debug = false, ReadabilityLogger? logger, int maxElemsToParse = 0, int numTopCandidates = 5, int charThreshold = 500, List<String> classesToPreserve = const [], bool keepClasses = false, String serializer(DomElement)?, bool enableJSONLD = true, RegExp? allowedVideoRegex, double linkDensityModifier = 0}) → Article? - Parse HTML content and extract the main article.
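The named parameters of `parse()` can be combined freely. The sketch below passes a few of them and handles the nullable return value. It assumes that `charThreshold` and `classesToPreserve` carry the same meaning as the corresponding Readability.js options (minimum extracted text length, and class names kept when classes are otherwise stripped); this page does not confirm that. The HTML string and the class name `caption` are made up for illustration.

```dart
import 'package:reader_mode/reader_mode.dart';

void main() {
  // Illustrative input only; real pages are much larger.
  const htmlString =
      '<html><head><title>Example</title></head>'
      '<body><article>'
      '<p class="caption">A caption.</p>'
      '<p>Some article text.</p>'
      '</article></body></html>';

  final article = parse(
    htmlString,
    baseUri: 'https://example.com/post',
    charThreshold: 200, // documented default is 500
    classesToPreserve: const ['caption'], // hypothetical class name
  );

  // parse() returns Article?, so check for null before using the result.
  if (article == null) {
    print('No article content could be extracted.');
  } else {
    print(article.title);
    print(article.content);
  }
}
```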
Typedefs
- ReadabilityLogger = void Function(List args): Callback type for logging messages from the Readability parser.
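Any function with the shape `void Function(List args)` satisfies this typedef. The following sketch collects messages in memory instead of printing them. `LogCollector` is a hypothetical class, not part of the library, and the block deliberately avoids importing the package so that it runs on its own. Passing the callback together with `debug: true` is an assumption about when the parser emits log messages.

```dart
/// Hypothetical helper: stores each log call as one joined line.
class LogCollector {
  final List<String> lines = [];

  /// Has the documented shape `void Function(List args)`.
  void log(List args) {
    // Each argument is converted with toString() and joined by spaces.
    lines.add(args.join(' '));
  }
}

void main() {
  final collector = LogCollector();

  // With the package imported, the callback could be passed as:
  //   parse(htmlString, debug: true, logger: collector.log);
  // Here it is invoked directly to show what it records.
  collector.log(['Grabbed', 3, 'candidates']);

  print(collector.lines.single); // prints "Grabbed 3 candidates"
}
```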