girasol library

Classes

AfterAfterBodyPhase
AfterAfterFramesetPhase
AfterBodyPhase
AfterFramesetPhase
AfterHeadPhase
BeforeHeadPhase
BeforeHtmlPhase
BrowserHttpClient
A browser-like HTTP client that automatically sets appropriate headers
CrawlDocument
CrawlRequest
Represents an HTTP request
CrawlResponse
Represents an HTTP response with its request
CsvItem
Base type required for any item processed by the CSV pipeline
CSVPipeline<Item extends CsvItem>
Pipeline that receives an item and saves it to a CSV file
FileDocument
FileItem
FilePipeline<Item extends FileItem>
Girasol
Orchestrates the crawlers.
HTMLDocument
HtmlParser
Parser for HTML, which generates a tree structure from a stream of (possibly malformed) characters.
InBodyPhase
InCaptionPhase
InCellPhase
InColumnGroupPhase
InForeignContentPhase
InFramesetPhase
InHeadPhase
InitialPhase
InRowPhase
InSelectInTablePhase
InSelectPhase
InTableBodyPhase
InTablePhase
InTableTextPhase
JsonDocument
JsonItem
Base type required for any item processed by the JSON pipeline
JSONPipeline<Item extends JsonItem>
Pipeline that receives an item's data and saves it to an outputFile
ParsedData<T>
A result containing data extracted from a page
ParsedEmpty
A result containing a discovered link to crawl
ParseResult
Base class for crawling results
Phase
Base class for helper objects that implement each phase of processing.
Pipeline<InputType>
A pipeline is used to interact with the scraped data
StaticWebCrawler<ParsedItem>
A crawler based on HTTP requests only. It is lightweight and fast, but it cannot run JavaScript.
TextDocument
TextPhase
WebCrawler<ParsedItem>
XmlAttribute
XML attribute node.
XmlBuilder
A builder to create XML trees with code.
XmlCDATA
XML CDATA node.
XmlComment
XML comment node.
XmlDeclaration
XML document declaration.
XmlDefaultEntityMapping
Default entity mapping for XML, HTML, and HTML5 entities.
XmlDoctype
XML doctype node.
XmlDocument
XML document node.
XmlDocumentFragment
XML document fragment node.
XmlElement
XML element node.
XmlEntityMapping
Describes the decoding and encoding of character entities.
XmlItem
Base type required for any item processed by the XML pipeline
XmlName
XML entity name.
XmlNode
Immutable abstract XML node.
XmlNullEntityMapping
Entity mapping that skips all entity conversion, both on decoding and encoding input.
XMLPipeline<Item extends XmlItem>
Pipeline that receives an item and saves it to an XML file.
XmlPrettyWriter
A visitor that writes XML nodes correctly indented and with whitespace adapted.
XmlProcessing
XML processing instruction.
XmlText
XML text node.
XmlToken
Shared tokens for XML reading and writing.
XmlWriter
A visitor that writes XML nodes exactly as they were parsed.

Enums

XmlAttributeType
Enum of the attribute quote types.
XmlNodeType
Enum of the different XML node types.

Mixins

XmlFormatException
Mixin for exceptions that follow the FormatException of Dart.
XmlHasAttributes
Mixin for nodes with attributes.
XmlHasChildren<T extends XmlNode>
Mixin for nodes with children.
XmlHasName
Mixin for all nodes with a name.
XmlHasParent<T extends XmlNode>
Mixin for nodes with a parent.
XmlHasVisitor
Mixin for classes that can be visited using an XmlVisitor.
XmlHasWriter
Mixin to serialize XML to a StringBuffer.
XmlVisitor
Basic visitor over XmlHasVisitor nodes.

Properties

defaultEntityMapping XmlEntityMapping
The entity mapping used when nothing else is specified.
getter/setter pair

Functions

getElementNameTuple(Element e) → (String, String?)
Convenience function to get the pair of namespace and localName.
parse(dynamic input, {String? encoding, bool generateSpans = false, String? sourceUrl}) → Document
Parse an HTML5 document into a tree.
parseFragment(dynamic input, {String container = 'div', String? encoding, bool generateSpans = false, String? sourceUrl}) → DocumentFragment
Parse an HTML5 document fragment into a tree.

Typedefs

TokenProccessor = Token Function(Token token)

Exceptions / Errors

ParseError
Error in parsed document.
XmlException
Abstract exception class.
XmlNodeTypeException
Exception thrown when an unsupported node type is used.
XmlParentException
Exception thrown when the parent relationship between nodes is invalid.
XmlParserException
Exception thrown when parsing of an XML document fails.
XmlTagException
Exception thrown when the end tag does not match the open tag.