A collection of functions for parsing structured data from web pages. The library supports parsing the Microdata, RDFa, and JSON-LD forms of structured data embedded in a web page.
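To give a feel for what extracting one of these formats involves, here is a minimal sketch of pulling JSON-LD out of `<script type="application/ld+json">` blocks using only `dart:convert`. It is not this library's implementation or API. The function name `extractJsonLd` is hypothetical, and the regular expression is a simplification that a real HTML parser would replace.

```dart
import 'dart:convert';

/// Hypothetical helper (not part of this library): returns the decoded JSON
/// value of every <script type="application/ld+json"> block in [html].
List<Object?> extractJsonLd(String html) {
  // Simplified matcher; a real implementation would walk a parsed DOM.
  final scriptPattern = RegExp(
    r'<script[^>]*type\s*=\s*"application/ld\+json"[^>]*>(.*?)</script>',
    caseSensitive: false,
    dotAll: true,
  );
  final results = <Object?>[];
  for (final match in scriptPattern.allMatches(html)) {
    final body = match.group(1)!.trim();
    if (body.isEmpty) continue;
    try {
      results.add(jsonDecode(body));
    } on FormatException {
      // Skip blocks whose contents are not valid JSON.
    }
  }
  return results;
}
```

Calling `extractJsonLd` on a page containing a single JSON-LD object returns a one-element list whose entry is a `Map<String, dynamic>`.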
This library has not been used in any production application yet, so it may lack features or contain bugs I haven't encountered. Please open issues on GitHub to give feedback. I am actively developing another application that uses this library and am fixing issues as I find them; these updates are often potentially breaking changes. Use at your own risk, but I would really appreciate any support and feedback!
To parse structured data from a web page:
List<StructuredData> data = await StructuredDataImporter.importUrl("path/to/website");
To parse structured data from an already-loaded web page:
List<StructuredData> data = StructuredDataParser.extract(htmlDocument);
StructuredData is a dictionary-like object, so properties are read by key:
StructuredData someData = data.first; // e.g. obtained from importUrl or extract
var value = someData["property"];
Eventually I may add wrapper classes for accessing structured data so that it is more than a glorified map.
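As an illustration of what such a wrapper could look like, here is a minimal sketch of a typed view over a decoded schema.org Recipe. It assumes the underlying data is available as a plain `Map<String, dynamic>` (a stand-in, since this sketch does not rely on StructuredData's API beyond key lookup), and the class `RecipeView` is hypothetical, not part of this library.

```dart
/// Hypothetical typed view over a decoded schema.org Recipe (not part of
/// this library). It wraps a plain map and exposes named, typed getters.
class RecipeView {
  RecipeView(this._data);

  final Map<String, dynamic> _data;

  /// The schema.org `name` property, or null if absent or not a string.
  String? get name {
    final value = _data['name'];
    return value is String ? value : null;
  }

  /// `recipeIngredient` may be a single string or a list; normalise to a
  /// list of strings, dropping any non-string entries.
  List<String> get ingredients {
    final value = _data['recipeIngredient'];
    if (value is String) return [value];
    if (value is List) return value.whereType<String>().toList();
    return const [];
  }
}
```

The design choice here is that the wrapper absorbs the looseness of structured data (missing properties, single values versus lists) so callers get predictable types instead of raw map lookups.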
- This library implements functions for parsing structured data from HTML documents. It also supports importing structured data from a given URL.