ConcurrentWebScraper class - concurrent_web_scraper library

Constructors

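ConcurrentWebScraper.new({required ProxyManager proxyManager, int maxConcurrentTasks = 5, ProxyHttpClient? httpClient, String? defaultUserAgent, Map<String, String>? defaultHeaders, int defaultTimeout = 30000, int maxRetries = 3, ScrapingLogger? logger, RobotsTxtHandler? robotsTxtHandler, StreamingHtmlParser? streamingParser, bool respectRobotsTxt = true}): Creates a new ConcurrentWebScraper. Only proxyManager is required. maxConcurrentTasks caps how many tasks run at once; defaultTimeout and maxRetries apply to requests that do not pass their own timeout or retries; respectRobotsTxt controls whether robots.txt rules are honored.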
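The defaultTimeout and maxRetries parameters suggest a per-request time limit combined with retries. The library's actual retry policy (backoff, which errors are retried, whether maxRetries counts the first attempt, the unit of defaultTimeout) is not documented here, so the following is only a minimal sketch under two assumptions: timeouts are in milliseconds, and maxRetries counts extra attempts after the first. withRetries is a hypothetical helper, not part of concurrent_web_scraper.

```dart
/// Hypothetical helper, NOT part of concurrent_web_scraper.
/// Runs [task], bounding each attempt by [timeoutMs] milliseconds, and
/// retries up to [maxRetries] more times if an attempt throws or times out.
Future<T> withRetries<T>(
  Future<T> Function() task, {
  int timeoutMs = 30000, // assumption: mirrors defaultTimeout, in milliseconds
  int maxRetries = 3, // assumption: retries after the first attempt
}) async {
  Object? lastError;
  for (var attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      // Future.timeout throws a TimeoutException if the attempt is too slow.
      return await task().timeout(Duration(milliseconds: timeoutMs));
    } catch (e) {
      lastError = e; // remember the failure and try again
    }
  }
  throw StateError('All ${maxRetries + 1} attempts failed: $lastError');
}
```

A timed-out attempt is abandoned rather than cancelled: Future.timeout stops waiting but does not stop the underlying work, so a real HTTP client would also need to abort the request.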

Properties

hashCode → int: The hash code for this object.
no setter, inherited
pendingTaskCount → int: Gets the number of pending tasks
no setter
runningTaskCount → int: Gets the number of running tasks
no setter
runtimeType → Type: A representation of the runtime type of the object.
no setter, inherited
totalTaskCount → int: Gets the total number of tasks (pending + running)
no setter
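pendingTaskCount, runningTaskCount, and totalTaskCount imply a queue in which at most maxConcurrentTasks tasks run while the rest wait. The sketch below is a hypothetical model of that bookkeeping, not the library's implementation. BoundedTaskQueue is an invented name, and the sketch assumes that a higher priority value runs first (ties in submission order) and that clearPendingTasks() fails the futures of the tasks it removes; the documentation specifies neither.

```dart
import 'dart:async';
import 'dart:collection';

// One queued unit of work. Hypothetical type, not from the library.
class _Job {
  _Job(this.priority, this.sequence, this.start, this.cancel);
  final int priority;
  final int sequence; // submission order, breaks ties between equal priorities
  final Future<void> Function() start; // runs the task and settles its future
  final void Function() cancel; // fails the task's future without running it
}

/// Hypothetical bounded, prioritized queue modelling pendingTaskCount,
/// runningTaskCount and totalTaskCount. NOT the library's implementation.
class BoundedTaskQueue {
  BoundedTaskQueue({this.maxConcurrentTasks = 5});

  final int maxConcurrentTasks;

  // Assumption: higher priority first; equal priorities run in FIFO order.
  final SplayTreeSet<_Job> _pending = SplayTreeSet<_Job>((a, b) {
    final byPriority = b.priority.compareTo(a.priority);
    return byPriority != 0 ? byPriority : a.sequence.compareTo(b.sequence);
  });
  int _running = 0;
  int _nextSequence = 0;

  int get pendingTaskCount => _pending.length;
  int get runningTaskCount => _running;
  int get totalTaskCount => pendingTaskCount + runningTaskCount;

  /// Queues [task] and returns a future for its result.
  Future<T> add<T>(Future<T> Function() task, {int priority = 0}) {
    final completer = Completer<T>();
    _pending.add(_Job(
      priority,
      _nextSequence++,
      () => Future.sync(task)
          .then(completer.complete, onError: completer.completeError),
      () => completer.completeError(StateError('Task cleared before it ran')),
    ));
    _pump();
    return completer.future;
  }

  /// Assumption: cleared tasks fail with a StateError rather than hanging.
  void clearPendingTasks() {
    final cleared = _pending.toList();
    _pending.clear();
    for (final job in cleared) {
      job.cancel();
    }
  }

  // Starts pending jobs, highest priority first, while a slot is free.
  void _pump() {
    while (_running < maxConcurrentTasks && _pending.isNotEmpty) {
      final job = _pending.first;
      _pending.remove(job);
      _running++;
      job.start().whenComplete(() {
        _running--;
        _pump(); // a slot was freed: start the next pending job
      });
    }
  }
}
```

Because Dart runs each isolate on a single event loop, the counters need no locking; the only invariant is that _pump() runs whenever a task is added or finishes.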

Methods

clearPendingTasks() → void: Clears all pending tasks
close() → void: Closes the web scraper
extractData({required String url, required String selector, String? attribute, bool asText = true, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<List<String>>: Extracts the data matching selector from a single URL, as text by default or from attribute; the request is queued with the given priority.
extractDataBatch({required List<String> urls, required String selector, String? attribute, bool asText = true, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, List<String>>>: Extracts data from multiple URLs concurrently and returns the results keyed by URL; onProgress, if provided, receives the completed count, the total, and the URL.
extractStructuredData({required String url, required Map<String, String> selectors, Map<String, String?>? attributes, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<List<Map<String, String>>>: Extracts structured data from a single URL using selectors (and optional attributes); the request is queued with the given priority.
extractStructuredDataBatch({required List<String> urls, required Map<String, String> selectors, Map<String, String?>? attributes, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, List<Map<String, String>>>>: Extracts structured data from multiple URLs concurrently and returns the results keyed by URL; onProgress, if provided, receives the completed count, the total, and the URL.
fetchHtml({required String url, int priority = 0, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false}) → Future<String>: Fetches the HTML content of a single URL; the request is queued with the given priority.
fetchHtmlBatch({required List<String> urls, Map<String, String>? headers, int? timeout, int? retries, bool ignoreRobotsTxt = false, void onProgress(int completed, int total, String url)?}) → Future<Map<String, String>>: Fetches HTML content from multiple URLs concurrently and returns the pages keyed by URL; onProgress, if provided, receives the completed count, the total, and the URL.
noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed.
inherited
toString() → String: A string representation of this object.
inherited
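The batch methods (extractDataBatch, extractStructuredDataBatch, fetchHtmlBatch) return a map keyed by URL and accept an onProgress(completed, total, url) callback. The following is a minimal sketch of that pattern using a fixed pool of workers. runBatch and its fetch parameter are hypothetical; unlike the documented methods, it takes the concurrency limit as an argument, and it lets a failed URL fail the whole batch, because the library's per-URL error handling is not documented.

```dart
import 'dart:math' show min;

/// Hypothetical sketch of a batch operation with progress reporting.
/// NOT part of concurrent_web_scraper: [fetch] stands in for the per-URL work.
Future<Map<String, R>> runBatch<R>(
  List<String> urls,
  Future<R> Function(String url) fetch, {
  int maxConcurrent = 5,
  void Function(int completed, int total, String url)? onProgress,
}) async {
  final results = <String, R>{};
  var nextIndex = 0; // next URL to hand to a worker
  var completed = 0;

  // Each worker claims URLs until none are left. An isolate is
  // single-threaded, so reading and bumping nextIndex between awaits is safe.
  Future<void> worker() async {
    while (nextIndex < urls.length) {
      final url = urls[nextIndex++];
      results[url] = await fetch(url);
      completed++;
      onProgress?.call(completed, urls.length, url);
    }
  }

  // At most maxConcurrent fetches are in flight at any time.
  final workers = min(maxConcurrent, urls.length);
  await Future.wait([for (var i = 0; i < workers; i++) worker()]);
  return results;
}
```

Duplicate URLs share one map entry, so the returned map can hold fewer entries than the input list while total still counts every URL; deduplicate urls first if that matters.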

Operators

operator ==(Object other) → bool: The equality operator.
inherited

Static Methods

create({required ProxyManager proxyManager, int maxConcurrentTasks = 5, int defaultTimeout = 30000, int maxRetries = 3, bool respectRobotsTxt = true}) → Future<ConcurrentWebScraper>: Static factory method that asynchronously creates a ConcurrentWebScraper with default components; use the unnamed constructor to supply a custom httpClient, logger, robotsTxtHandler, or streamingParser.
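create returns a Future<ConcurrentWebScraper> rather than an instance, which is the usual Dart idiom when construction needs asynchronous setup: constructors, including factory constructors, cannot be async, so the work moves into a static method that awaits the setup and then calls a private constructor. The sketch uses hypothetical Scraper and Resource classes; what create actually initializes asynchronously is not documented.

```dart
/// Hypothetical dependency that needs async setup (e.g. loading config).
class Resource {
  Resource._(this.name);
  final String name;

  static Future<Resource> load(String name) async {
    // Simulated asynchronous initialization.
    await Future<void>.delayed(const Duration(milliseconds: 1));
    return Resource._(name);
  }
}

/// Hypothetical class illustrating the async static-factory idiom.
class Scraper {
  // Private constructor: callers must go through create().
  Scraper._(this.resource, this.maxConcurrentTasks);

  final Resource resource;
  final int maxConcurrentTasks;

  /// Constructors cannot be async in Dart, so async setup lives here.
  static Future<Scraper> create({int maxConcurrentTasks = 5}) async {
    final resource = await Resource.load('default');
    return Scraper._(resource, maxConcurrentTasks);
  }
}
```

Callers then write `final scraper = await Scraper.create();`, and no half-initialized instance is ever visible.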