ConcurrentWebScraper class

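A web scraper with concurrency control. The maxConcurrentTasks parameter (default 5) sets the concurrency limit, and the task-count properties report how many tasks are pending and running.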

Constructors

ConcurrentWebScraper.new({required ProxyManager proxyManager, int maxConcurrentTasks = 5, ProxyHttpClient? httpClient, String? defaultUserAgent, Map<String, String>? defaultHeaders, int defaultTimeout = 30000, int maxRetries = 3, ScrapingLogger? logger, RobotsTxtHandler? robotsTxtHandler, StreamingHtmlParser? streamingParser, bool respectRobotsTxt = true})
Creates a new ConcurrentWebScraper with the given parameters. Only proxyManager is required; every other parameter is optional or has the default shown.

Properties

hashCode int
The hash code for this object.
no setter, inherited
pendingTaskCount int
The number of pending tasks.
no setter
runningTaskCount int
The number of running tasks.
no setter
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
totalTaskCount int
The total number of tasks (pending + running).
no setter
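
The relationship between the three task counters can be illustrated with a small stand-alone sketch. TaskLimiter below is a hypothetical class written only for this illustration; it is not the library's implementation and ignores priorities, retries, and proxies. It assumes that "pending" means queued but not yet started and "running" means started but not yet finished.

```dart
import 'dart:async';
import 'dart:collection';

/// Hypothetical limiter used only to illustrate the counters.
class TaskLimiter {
  TaskLimiter(this.maxConcurrentTasks);

  final int maxConcurrentTasks;
  final Queue<Future<void> Function()> _pending = Queue();
  int _running = 0;

  int get pendingTaskCount => _pending.length;
  int get runningTaskCount => _running;
  int get totalTaskCount => pendingTaskCount + runningTaskCount;

  /// Queues [task]; it starts as soon as a slot is free.
  Future<T> run<T>(Future<T> Function() task) {
    final completer = Completer<T>();
    _pending.add(() async {
      _running++;
      try {
        completer.complete(await task());
      } catch (error, stackTrace) {
        completer.completeError(error, stackTrace);
      } finally {
        _running--;
        _startNext();
      }
    });
    _startNext();
    return completer.future;
  }

  /// Starts queued tasks until the concurrency limit is reached.
  void _startNext() {
    while (_running < maxConcurrentTasks && _pending.isNotEmpty) {
      _pending.removeFirst()();
    }
  }
}

Future<void> main() async {
  final limiter = TaskLimiter(2);
  final futures = [
    for (var i = 0; i < 3; i++)
      limiter.run(
        () => Future.delayed(const Duration(milliseconds: 10), () => i),
      ),
  ];
  // Two tasks have started and one is still queued.
  print('${limiter.pendingTaskCount} ${limiter.runningTaskCount} '
      '${limiter.totalTaskCount}'); // prints "1 2 3"
  print(await Future.wait(futures)); // prints "[0, 1, 2]"
}
```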

Methods

clearPendingTasks() → void
Clears all pending tasks.
close() → void
Closes the web scraper.
extractData({required String url, required String selector, String? attribute, bool asText = true, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<List<String>>
Extracts data from a URL, queued with the given priority.
extractDataBatch({required List<String> urls, required String selector, String? attribute, bool asText = true, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, List<String>>>
Extracts data from multiple URLs concurrently.
extractStructuredData({required String url, required Map<String, String> selectors, Map<String, String?>? attributes, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<List<Map<String, String>>>
Extracts structured data from a URL, queued with the given priority.
extractStructuredDataBatch({required List<String> urls, required Map<String, String> selectors, Map<String, String?>? attributes, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, List<Map<String, String>>>>
Extracts structured data from multiple URLs concurrently.
fetchHtml({required String url, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<String>
Fetches HTML content from a URL, queued with the given priority.
fetchHtmlBatch({required List<String> urls, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, String>>
Fetches HTML content from multiple URLs concurrently.
noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toString() → String
A string representation of this object.
inherited
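
As a rough illustration of the batch methods, the sketch below calls fetchHtmlBatch with an onProgress callback, using only the parameters listed in the signature above. The import path and the helper name fetchAll are assumptions, not taken from this page. The comments about timeout and retries overriding the constructor defaults, and about the result map being keyed by URL, are also assumptions based on the parameter names and return type.

```dart
// Assumption: import path; use whichever package exports ConcurrentWebScraper.
import 'package:pivox/pivox.dart';

/// Hypothetical helper: fetches several pages and logs progress.
Future<Map<String, String>> fetchAll(
  ConcurrentWebScraper scraper,
  List<String> urls,
) {
  return scraper.fetchHtmlBatch(
    urls: urls,
    timeout: 10000, // presumably overrides defaultTimeout for this batch
    retries: 2, // presumably overrides maxRetries for this batch
    onProgress: (completed, total, url) {
      print('$completed/$total done, latest: $url');
    },
  );
  // The returned map is presumably keyed by URL, with the HTML as the value.
}
```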

Operators

operator ==(Object other) → bool
The equality operator.
inherited

Static Methods

create({required ProxyManager proxyManager, int maxConcurrentTasks = 5, int defaultTimeout = 30000, int maxRetries = 3, bool respectRobotsTxt = true}) → Future<ConcurrentWebScraper>
Static factory method that asynchronously creates a ConcurrentWebScraper with default components.
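
A minimal usage sketch of create follows, assuming a ProxyManager has already been configured elsewhere (its construction is not documented on this page). The import path, the function name scrapeHeadings, the example URL, and the 'h1' selector are illustrative assumptions. Because create returns a Future, the result must be awaited before the scraper is used.

```dart
// Assumption: import path; use whichever package exports ConcurrentWebScraper.
import 'package:pivox/pivox.dart';

/// Hypothetical helper: extracts the text of every h1 element on one page.
Future<List<String>> scrapeHeadings(ProxyManager proxyManager) async {
  final scraper = await ConcurrentWebScraper.create(
    proxyManager: proxyManager,
    maxConcurrentTasks: 3, // lower than the default of 5
  );
  try {
    return await scraper.extractData(
      url: 'https://example.com',
      selector: 'h1',
    );
  } finally {
    // Release the scraper's resources even if extraction fails.
    scraper.close();
  }
}
```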