Beautiful Soup Dart

A Dart-native package inspired by the Beautiful Soup 4 Python library. It provides easy ways of navigating, searching, and modifying the HTML tree.

Usage

A simple usage example:

import 'package:beautiful_soup_dart/beautiful_soup.dart';

/// 1. Parse a document String
BeautifulSoup bs = BeautifulSoup(html_doc_string);
// use BeautifulSoup.fragment(html_doc_string) if you parse only a part of an HTML document

/// 2. Navigate quickly to any element
bs.body!.a!; // navigate by tag name; use outerHtml or toString() to get the outer HTML
bs.find('p', class_: 'story'); // finds the first "p" element whose "class" attribute has the value "story"
bs.findAll('a', attrs: {'class': true}); // finds all "a" elements that define a "class" attribute, whatever its value
bs.find('', selector: '#link1'); // finds by a custom CSS selector (other parameters are ignored)
bs.find('*', id: 'link1'); // finds any element with id "link1"
bs.find('*', regex: r'^b'); // finds any element whose tag name starts with "b", for example: body, b, ...
bs.find('p', string: r'^Article #\d*'); // finds a "p" element whose text starts with "Article #[number]"
bs.find('a', attrs: {'href': 'http://example.com/elsie'}); // finds by the "href" attribute

/// 3. Perform any other actions on the navigated element
Bs4Element bs4 = bs.body!.p!; // navigate by tag name
bs4.name; // get the tag name
bs4.string; // get the text
bs4.toString(); // get the String representation of this element, same as outerHtml
bs4.innerHtml; // get the HTML inside the element
bs4.className; // get the class attribute value
bs4['class']; // get the class attribute value
bs4['class'] = 'board'; // change the class attribute value to 'board'
bs4.children; // get all of the element's child elements
bs4.replaceWith(otherBs4Element); // replace with another element
// ... and many more

Check the test folder for more examples.
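The snippets above can be combined into a small complete program. The following is a minimal sketch, not taken from the package's tests: the sample document `htmlDoc` is invented for illustration, and the code assumes that `find()` returns a nullable `Bs4Element` and that `children` and `findAll()` return lists of `Bs4Element`.

```dart
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample document, invented for this sketch.
const htmlDoc = '''
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
  <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
  <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>
''';

void main() {
  final bs = BeautifulSoup(htmlDoc);

  // Assumption: find() returns null when nothing matches.
  final story = bs.find('p', class_: 'story');
  if (story == null) {
    print('no story paragraph found');
    return;
  }

  // Walk the child elements of the matched paragraph.
  for (final child in story.children) {
    print('${child.name}: ${child['href']} -> ${child.string}');
  }

  // Collect every link in the document that defines a "class" attribute.
  final links = bs.findAll('a', attrs: {'class': true});
  print('links with a class attribute: ${links.length}');
}
```

Checking the result of `find()` for null before using it avoids the `!` operator from the shorter snippets, which throws if the element is missing.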

Table of Contents

The unlinked titles are not yet implemented.

Navigating the tree
- Going down
- Going up
  - .parent
  - .parents
- Going sideways
  - .nextSibling and .previousSibling
  - .nextSiblings and .previousSiblings
- Going back and forth
  - .nextElement and .previousElement - return the next/previous Bs4Element
  - .nextElements and .previousElements
  - .nextParsed and .previousParsed - return the next/previous parsed Node of any kind (doc comments, tags, text); use node.data to get its data as a String
  - .nextParsedAll and .previousParsedAll

Searching the tree
- findFirstAny() - returns the topmost (first) element of the parse tree, of any tag type
- findAll()
- find()
- findParents() and findParent()
- findNextSiblings() and findNextSibling()
- findPreviousSiblings() and findPreviousSibling()
- findAllNextElements() and findNextElement()
- findAllPreviousElements() and findPreviousElement()
- findNextParsedAll() and findNextParsed()
- findPreviousParsedAll() and findPreviousParsed()

Modifying the tree
- Changing tag names and attributes
- Modifying .string
- append()
- extend()
- newTag()
- insert()
- insertBefore() and insertAfter()
- clear()
- extract()
- decompose()
- replaceWith()
- wrap()
- unwrap()
- smooth()

Output
- prettify() - partial support
- .text and getText()

Other methods of the Element class from the html package can be accessed via bs4element.element.
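To illustrate a few of the navigation and output members listed above, here is a minimal sketch. The document string is invented for illustration, and the code assumes that `.parent`, `.nextSibling`, and `findNextSibling()` return nullable values (hence the null-aware operators) and that `findNextSibling()` accepts a tag name in the same way as `find()`.

```dart
import 'package:beautiful_soup_dart/beautiful_soup.dart';

void main() {
  // Hypothetical document, invented for this sketch.
  final bs = BeautifulSoup(
    '<html><body><p id="story">'
    '<a id="link1">Elsie</a><a id="link2">Lacie</a>'
    '</p></body></html>',
  );

  final first = bs.find('a', id: 'link1');
  if (first == null) return;

  // Going up: the tag name of the enclosing element.
  print(first.parent?.name);

  // Going sideways: the text of the element that follows.
  print(first.nextSibling?.string);

  // Searching sideways: the id of the next "a" sibling, if any.
  print(first.findNextSibling('a')?['id']);

  // Output: the combined text of the enclosing element.
  print(first.parent?.text);
}
```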

Features and bugs

Please file feature requests and bugs at the issue tracker or feel free to raise a PR.
