BeautifulSoup class
Beautiful Soup is a library for pulling data out of HTML files. It provides ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
How should it be used? Three easy steps.
1. parse a document
BeautifulSoup bs = BeautifulSoup(html_doc_string);
BeautifulSoup bs = BeautifulSoup.fragment(html_doc_string); // if it is just a part of html
2. navigate quickly to any element
Bs4Element? bs4 = bs.body?.p; // quickly with tags (each tag getter may return null)
Bs4Element? bs4 = bs.find('p', class_: 'story'); // finds the first "p" element whose "class" attribute has the value "story"
List<Bs4Element> links = bs.findAll('a', attrs: {'class': true}); // finds all "a" elements that define a "class" attribute, whatever its value
Bs4Element? bs4 = bs.find('', selector: '#link1'); // finds with a custom CSS selector (other parameters are ignored)
Bs4Element? bs4 = bs.find('*', id: 'link1'); // finds by id
Bs4Element? bs4 = bs.find('*', regex: r'^b'); // finds any element whose tag name starts with "b", for example: body, b, ...
Bs4Element? bs4 = bs.find('p', string: r'^Article #\d*'); // finds a "p" element whose text starts with "Article #[number]"
Bs4Element? bs4 = bs.find('a', attrs: {'href': 'http://example.com/elsie'}); // finds by "href" attribute
3. perform any actions
bs4.name; // get tag name
bs4.string; // get text
bs4.toString(); // get String representation of this element, same as outerHtml
bs4.innerHtml; // get html elements inside the element
bs4.className; // get class attribute value
bs4['class']; // get class attribute value
bs4['class'] = 'board'; // change class attribute value to 'board'
bs4.children; // get all of the element's child elements
bs4.replaceWith(otherBs4Element); // replace with other element
and many more!
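The three steps above can be combined into one small function. This is a minimal sketch, not the package's official example: the import path, the sample document, and the helper name `hrefOfElement` are assumptions made for illustration. It relies only on `find` returning a nullable `Bs4Element?`, as listed in the method signatures on this page.

```dart
// Assumption: this import path matches the beautiful_soup_dart package layout.
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample document, used only for illustration.
const sampleHtml = '''
<html><body>
<p class="story">Once upon a time there was
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>.</p>
</body></html>
''';

/// Returns the "href" of the element with the given id, or null if absent.
String? hrefOfElement(String html, String id) {
  final bs = BeautifulSoup(html); // 1. parse
  final element = bs.find('*', id: id); // 2. navigate; find may return null
  return element?['href']; // 3. act on the element
}

void main() {
  print(hrefOfElement(sampleHtml, 'link1'));
  print(hrefOfElement(sampleHtml, 'no-such-id'));
}
```

Using the null-aware index (`?[`) keeps the lookup safe when nothing matches, instead of forcing the result with `!`.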
Constructors
- BeautifulSoup(String html_doc)
- Parses a complete HTML document from the given string.
- BeautifulSoup.fragment(String html_doc)
- Parses the given string as an HTML fragment, for input that is just a part of an HTML document.
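A short sketch of choosing between the two constructors. The helper `firstParagraphText` and the import path are assumptions for illustration; the sketch only uses the `p` getter documented under Properties and the `string` getter shown in the usage steps.

```dart
// Assumption: this import path matches the beautiful_soup_dart package layout.
import 'package:beautiful_soup_dart/beautiful_soup.dart';

/// Parses [html] as a full document or as a fragment and returns the text
/// of the first <p> element, or null when there is none.
String? firstParagraphText(String html, {bool fragment = false}) {
  final bs = fragment ? BeautifulSoup.fragment(html) : BeautifulSoup(html);
  return bs.p?.string;
}

void main() {
  print(firstParagraphText('<html><body><p>Hello</p></body></html>'));
  print(firstParagraphText('<p>Hello</p>', fragment: true));
}
```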
Properties
- a → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- b → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- body → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- dl → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- doc ↔ dynamic
- Returns Document or DocumentFragment, based on what parser was used with the BeautifulSoup constructor.
getter/setter pair, inherited
- element ↔ Element?
getter/setter pair, inherited
- h1 → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- h2 → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- h3 → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- h4 → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- h5 → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- h6 → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- hashCode → int
- The hash code for this object.
no setter, inherited
- head → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- html → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- i → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- img → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- ol → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- p → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- runtimeType → Type
- A representation of the runtime type of the object.
no setter, inherited
- table → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- text → String
- Returns the text of an element.
no setter, inherited
- title → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
- ul → Bs4Element?
- Returns the first occurrence of this tag down the parse tree.
no setter, inherited
Methods
- find(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector}) → Bs4Element?
- Looks through a tag’s descendants and retrieves the first descendant that matches your filters.
inherited
- findAll(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector, int? limit}) → List<Bs4Element>
- Looks through a tag’s descendants and retrieves all descendants that match your filters.
inherited
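As an illustration of the `class_` and `limit` filters in the signatures above, the following sketch collects the text of matching list items. The sample markup, the helper `itemTexts`, and the import path are assumptions; calling `getText()` on a `Bs4Element` assumes the element type shares that method with `BeautifulSoup`.

```dart
// Assumption: this import path matches the beautiful_soup_dart package layout.
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample markup, used only for illustration.
const listHtml = '''
<ul>
  <li class="item">one</li>
  <li class="item">two</li>
  <li>three</li>
</ul>
''';

/// Collects the text of every <li> that carries class "item",
/// optionally stopping after [limit] matches.
List<String> itemTexts(String html, {int? limit}) {
  final bs = BeautifulSoup(html);
  return bs
      .findAll('li', class_: 'item', limit: limit)
      .map((element) => element.getText())
      .toList();
}

void main() {
  print(itemTexts(listHtml));
  print(itemTexts(listHtml, limit: 1));
}
```

Because `findAll` returns a (possibly empty) list rather than a nullable value, no null check is needed before iterating.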
- findAllNextElements(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector, int? limit}) → List<Bs4Element>
- These methods use nextElements to iterate over elements that come after it in the document.
inherited
- findAllPreviousElements(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector, int? limit}) → List<Bs4Element>
- These methods use previousElements to iterate over the tags and strings that came before it in the document.
inherited
- findFirstAny() → Bs4Element?
- Returns the topmost (first) element of the parse tree, of any tag type.
inherited
- findNextElement(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector}) → Bs4Element?
- These methods use nextElements to iterate over elements that come after it in the document.
inherited
- findNextParsed({RegExp? pattern, int? nodeType}) → Node?
- These methods use nextParsed to iterate over the tags, comments, strings, etc. that came after it in the document.
inherited
- findNextParsedAll({RegExp? pattern, int? nodeType, int? limit}) → List<Node>
- These methods use nextParsed to iterate over the tags, comments, strings, etc. that came after it in the document.
inherited
- findNextSibling(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector}) → Bs4Element?
- These methods use nextSiblings to iterate over the rest of an element’s siblings in the tree.
inherited
- findNextSiblings(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector, int? limit}) → List<Bs4Element>
- These methods use nextSiblings to iterate over the rest of an element’s siblings in the tree.
inherited
- findParent(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector}) → Bs4Element?
- Unlike findAll and find, which work their way down the tree looking at a tag’s descendants, this method works its way up the tree and returns the first parent that matches your filters.
inherited
- findParents(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector, int? limit}) → List<Bs4Element>
- Unlike findAll and find, which work their way down the tree looking at a tag’s descendants, this method works its way up the tree and returns all parents that match your filters.
inherited
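A hedged sketch of searching upward from an element. It assumes that `Bs4Element` exposes the same `findParent` method that this page lists for `BeautifulSoup`; the sample markup, the helper `enclosingListId`, and the import path are likewise assumptions for illustration.

```dart
// Assumption: this import path matches the beautiful_soup_dart package layout.
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample markup, used only for illustration.
const menuHtml = '<ul id="menu"><li><a href="/home">Home</a></li></ul>';

/// Finds the first <a> element, walks up to its enclosing <ul>,
/// and returns that list's "id" attribute (null if any step fails).
String? enclosingListId(String html) {
  final bs = BeautifulSoup(html);
  final link = bs.find('a'); // search downward from the root
  final list = link?.findParent('ul'); // then search upward from the link
  return list?['id'];
}

void main() {
  print(enclosingListId(menuHtml));
}
```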
- findPreviousElement(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector}) → Bs4Element?
- These methods use previousElements to iterate over the tags and strings that came before it in the document.
inherited
- findPreviousParsed({RegExp? pattern, int? nodeType}) → Node?
- These methods use previousParsed to iterate over the tags, comments, strings, etc. that came before it in the document.
inherited
- findPreviousParsedAll({RegExp? pattern, int? nodeType, int? limit}) → List<Node>
- These methods use previousParsed to iterate over the tags, comments, strings, etc. that came before it in the document.
inherited
- findPreviousSibling(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector}) → Bs4Element?
- These methods use previousSiblings to iterate over an element’s siblings that precede it in the tree.
inherited
- findPreviousSiblings(String name, {String? id, String? class_, Map<String, Object>? attrs, Pattern? regex, Pattern? string, String? selector, int? limit}) → List<Bs4Element>
- These methods use previousSiblings to iterate over an element’s siblings that precede it in the tree.
inherited
- getText({String separator = '', bool strip = false}) → String
- Returns the text of an element.
inherited
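The `separator` and `strip` parameters of `getText` can be tried with a sketch like the one below. The helper `joinedText`, the sample markup, and the import path are assumptions; the exact placement of separators and the handling of whitespace are not specified on this page, so no expected output is shown.

```dart
// Assumption: this import path matches the beautiful_soup_dart package layout.
import 'package:beautiful_soup_dart/beautiful_soup.dart';

/// Returns the document's text with pieces joined by [separator]
/// and with stripping enabled.
String joinedText(String html, {String separator = ' | '}) {
  final bs = BeautifulSoup(html);
  return bs.getText(separator: separator, strip: true);
}

void main() {
  // Hypothetical sample markup, used only for illustration.
  const html = '<div><p> first </p><p> second </p></div>';
  print(joinedText(html));
  print(BeautifulSoup(html).getText()); // defaults: no separator, no stripping
}
```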
- noSuchMethod(Invocation invocation) → dynamic
- Invoked when a nonexistent method or property is accessed.
inherited
- prettify() → String
- The method will turn a BeautifulSoup parse tree into a nicely formatted String, with a separate line for each tag and each string.
inherited
- toString() → String
- A string representation of this object.
override
Operators
- operator ==(Object other) → bool
- The equality operator.
inherited
Static Methods
- newTag(String? name, {Map<String, String>? attrs, String? string}) → Bs4Element
- Creates a new Bs4Element.
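A minimal sketch of building an element with the static `newTag` method, following the signature listed above. The helper `makeLink` and the import path are assumptions for illustration; the `name` getter and the `[]` attribute lookup are the ones shown in the usage steps at the top of this page.

```dart
// Assumption: this import path matches the beautiful_soup_dart package layout.
import 'package:beautiful_soup_dart/beautiful_soup.dart';

/// Creates a standalone <a> element with the given target and label.
Bs4Element makeLink(String href, String label) {
  return BeautifulSoup.newTag('a', attrs: {'href': href}, string: label);
}

void main() {
  final link = makeLink('http://example.com/elsie', 'Elsie');
  print(link.name); // tag name of the new element
  print(link['href']); // attribute set through attrs
  print(link.toString()); // String representation of the element
}
```

An element created this way could then be passed to `replaceWith` on an existing element, as in the usage steps.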