ConcurrentWebScraper class
A web scraper with concurrency control
Constructors
- ConcurrentWebScraper.new({required ProxyManager proxyManager, int maxConcurrentTasks = 5, ProxyHttpClient? httpClient, String? defaultUserAgent, Map<String, String>? defaultHeaders, int defaultTimeout = 30000, int maxRetries = 3, ScrapingLogger? logger, RobotsTxtHandler? robotsTxtHandler, StreamingHtmlParser? streamingParser, bool respectRobotsTxt = true})
  Creates a new ConcurrentWebScraper with the given parameters.
Properties
- hashCode → int
  The hash code for this object.
  no setter, inherited
- pendingTaskCount → int
  Gets the number of pending tasks.
  no setter
- runningTaskCount → int
  Gets the number of running tasks.
  no setter
- runtimeType → Type
  A representation of the runtime type of the object.
  no setter, inherited
- totalTaskCount → int
  Gets the total number of tasks (pending + running).
  no setter
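
The relationship between the three counters above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the library's implementation: `TaskQueueSketch`, `enqueue`, and `_Task` are hypothetical names, and it assumes that a higher `priority` value runs first, which this page does not specify.

```dart
import 'dart:async';

/// Hypothetical stand-in for a scheduler with a concurrency cap.
/// Not the actual ConcurrentWebScraper implementation.
class TaskQueueSketch {
  TaskQueueSketch({this.maxConcurrentTasks = 5});

  final int maxConcurrentTasks;
  final List<_Task> _pending = [];
  int _running = 0;

  int get pendingTaskCount => _pending.length;
  int get runningTaskCount => _running;
  int get totalTaskCount => pendingTaskCount + runningTaskCount;

  /// Queues [work]. Assumption: a higher [priority] is started first.
  Future<T> enqueue<T>(Future<T> Function() work, {int priority = 0}) {
    final completer = Completer<T>();
    _pending.add(_Task(priority, () async {
      try {
        completer.complete(await work());
      } catch (error, stack) {
        completer.completeError(error, stack);
      }
    }));
    _schedule();
    return completer.future;
  }

  /// Drops tasks that have not started; running tasks are unaffected.
  /// In this sketch the futures of dropped tasks never complete.
  void clearPendingTasks() => _pending.clear();

  void _schedule() {
    while (_running < maxConcurrentTasks && _pending.isNotEmpty) {
      // Pick the highest-priority task; ties keep insertion order.
      var best = 0;
      for (var i = 1; i < _pending.length; i++) {
        if (_pending[i].priority > _pending[best].priority) best = i;
      }
      final task = _pending.removeAt(best);
      _running++;
      task.run().whenComplete(() {
        _running--;
        _schedule(); // A slot is free again, so start the next task.
      });
    }
  }
}

class _Task {
  _Task(this.priority, this.run);
  final int priority;
  final Future<void> Function() run;
}

Future<void> main() async {
  final queue = TaskQueueSketch(maxConcurrentTasks: 1);
  final order = <String>[];
  Future<void> job(String name) async {
    await Future<void>.delayed(const Duration(milliseconds: 10));
    order.add(name);
  }

  final futures = [
    queue.enqueue(() => job('first')), // starts immediately
    queue.enqueue(() => job('low'), priority: 0),
    queue.enqueue(() => job('high'), priority: 10),
  ];
  print('${queue.runningTaskCount} running, '
      '${queue.pendingTaskCount} pending, '
      '${queue.totalTaskCount} total');
  await Future.wait(futures);
  print(order);
}
```

With a cap of one, the first task occupies the only slot while the other two wait, so the total is always the sum of the running and pending counts.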
Methods
- clearPendingTasks() → void
  Clears all pending tasks.
- close() → void
  Closes the web scraper.
- extractData({required String url, required String selector, String? attribute, bool asText = true, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<List<String>>
  Extracts data from a URL with priority.
- extractDataBatch({required List<String> urls, required String selector, String? attribute, bool asText = true, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, List<String>>>
  Extracts data from multiple URLs concurrently.
- extractStructuredData({required String url, required Map<String, String> selectors, Map<String, String?>? attributes, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<List<Map<String, String>>>
  Extracts structured data from a URL with priority.
- extractStructuredDataBatch({required List<String> urls, required Map<String, String> selectors, Map<String, String?>? attributes, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, List<Map<String, String>>>>
  Extracts structured data from multiple URLs concurrently.
- fetchHtml({required String url, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<String>
  Fetches HTML content from a URL with priority.
- fetchHtmlBatch({required List<String> urls, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, String>>
  Fetches HTML content from multiple URLs concurrently.
- noSuchMethod(Invocation invocation) → dynamic
  Invoked when a nonexistent method or property is accessed.
  inherited
- toString() → String
  A string representation of this object.
  inherited
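
The batch methods above share one shape: a list of URLs goes in, a map keyed by URL comes out, and an optional `onProgress(completed, total, url)` callback fires along the way. The stand-alone sketch below imitates that shape without using the library. `fetchBatchSketch` and its `fetch` parameter are hypothetical, and leaving failed URLs out of the result map is an assumption, since this page does not say how failures are reported.

```dart
import 'dart:async';

/// Hypothetical helper mirroring the shape of fetchHtmlBatch.
/// [fetch] is a stand-in for whatever actually downloads a page.
Future<Map<String, String>> fetchBatchSketch({
  required List<String> urls,
  required Future<String> Function(String url) fetch,
  void Function(int completed, int total, String url)? onProgress,
}) async {
  final results = <String, String>{};
  var completed = 0;
  await Future.wait(urls.map((url) async {
    try {
      results[url] = await fetch(url);
    } catch (_) {
      // Assumption: a failed URL is simply left out of the result map.
    } finally {
      completed++;
      onProgress?.call(completed, urls.length, url);
    }
  }));
  return results;
}

Future<void> main() async {
  final pages = await fetchBatchSketch(
    urls: ['https://example.com/a', 'https://example.com/b'],
    fetch: (url) async => '<html>$url</html>', // fake fetcher for the demo
    onProgress: (done, total, url) => print('$done/$total $url'),
  );
  print(pages.keys.toList());
}
```

The callback receives a running count rather than an index, so it can drive a progress bar even when URLs finish out of order.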
Operators
- operator ==(Object other) → bool
  The equality operator.
  inherited
Static Methods
- create({required ProxyManager proxyManager, int maxConcurrentTasks = 5, int defaultTimeout = 30000, int maxRetries = 3, bool respectRobotsTxt = true}) → Future<ConcurrentWebScraper>
  Static factory method that asynchronously creates a ConcurrentWebScraper with default components.