AdvancedWebScraper class

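An advanced web scraper with proxy rotation, rate limiting, user-agent rotation, cookie handling, robots.txt support, streaming and memory-efficient HTML parsing, and a prioritized task queue with a concurrency limit.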
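Proxy rotation is commonly implemented as round-robin selection over a pool. The following is a minimal sketch of that idea, assuming simple in-order rotation with no health checks. RoundRobinRotator is hypothetical and is not the package's ProxyManager or UserAgentRotator API. It cycles through any list of items, such as proxy URLs or user-agent strings.

```dart
/// Hypothetical round-robin rotator (not the package's ProxyManager or
/// UserAgentRotator API). Returns items in order, wrapping around at the end.
class RoundRobinRotator<T> {
  RoundRobinRotator(Iterable<T> items) : _items = List<T>.unmodifiable(items) {
    if (_items.isEmpty) {
      throw ArgumentError('At least one item is required');
    }
  }

  final List<T> _items;
  int _next = 0;

  /// Returns the next item and advances the cursor, wrapping to the start.
  T next() {
    final item = _items[_next];
    _next = (_next + 1) % _items.length;
    return item;
  }
}
```

Production rotators often also skip proxies that recently failed. This sketch does not do that.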

Constructors

AdvancedWebScraper.new({required ProxyManager proxyManager, ProxyHttpClient? httpClient, EnhancedRateLimiter? rateLimiter, UserAgentRotator? userAgentRotator, CookieManager? cookieManager, RobotsTxtHandler? robotsTxtHandler, StreamingHtmlParser? streamingParser, MemoryEfficientParser? memoryEfficientParser, ScrapingTaskQueue? taskQueue, ScrapingLogger? logger, int defaultTimeout = 30000, int maxRetries = 3, bool handleCookies = true, bool followRedirects = true, bool respectRobotsTxt = true, int maxConcurrentTasks = 5})
Creates a new AdvancedWebScraper. Only proxyManager is required; all other components and settings are optional.
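The defaults defaultTimeout = 30000 and maxRetries = 3 suggest a per-request timeout, presumably in milliseconds, combined with retries. The following is a minimal sketch of that pattern under those assumptions. withRetries is a hypothetical helper and is not part of this class. It also treats maxRetries as the number of retries after the first attempt, which this page does not specify.

```dart
import 'dart:async';

/// Hypothetical helper, not part of the package: runs [action] with a
/// per-attempt timeout and retries failures with a linear backoff.
/// Assumes timeouts are in milliseconds and that [maxRetries] counts retries
/// after the first attempt (so at most maxRetries + 1 attempts in total).
Future<T> withRetries<T>(
  Future<T> Function() action, {
  int timeoutMs = 30000,
  int maxRetries = 3,
  Duration backoff = const Duration(milliseconds: 200),
}) async {
  var attempt = 0;
  while (true) {
    try {
      // Future.timeout throws a TimeoutException if the attempt is too slow.
      // It stops waiting but does not cancel the underlying work.
      return await action().timeout(Duration(milliseconds: timeoutMs));
    } catch (_) {
      if (attempt >= maxRetries) rethrow;
      attempt++;
      // Wait a little longer before each successive retry.
      await Future<void>.delayed(backoff * attempt);
    }
  }
}
```

A caller would wrap a single request, for example await withRetries(() => someRequest()), where someRequest is any function returning a Future.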

Properties

hashCode int
The hash code for this object.
no setter, inherited
pendingTaskCount int
Gets the number of pending tasks
no setter
runningTaskCount int
Gets the number of running tasks
no setter
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
totalTaskCount int
Gets the total number of tasks (pending + running)
no setter
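The three counters are related: totalTaskCount is pendingTaskCount plus runningTaskCount. The following is a minimal sketch of a queue that maintains them. It assumes that a task starts as soon as a concurrency slot is free (compare the maxConcurrentTasks setting) and that clearPendingTasks drops only queued tasks. CountingTaskQueue is hypothetical, is not the package's ScrapingTaskQueue, and ignores task priority.

```dart
import 'dart:async';
import 'dart:collection';

/// Hypothetical queue illustrating pending/running/total counts; not the
/// package's ScrapingTaskQueue. Tasks run in FIFO order (no priorities).
class CountingTaskQueue {
  CountingTaskQueue({this.maxConcurrent = 5});

  final int maxConcurrent;
  final Queue<Future<void> Function()> _pending = Queue();
  int _running = 0;

  int get pendingTaskCount => _pending.length;
  int get runningTaskCount => _running;
  int get totalTaskCount => pendingTaskCount + runningTaskCount;

  void add(Future<void> Function() task) {
    _pending.add(task);
    _pump();
  }

  /// Drops queued tasks; tasks that are already running are left to finish.
  void clearPendingTasks() => _pending.clear();

  // Starts queued tasks while there is a free concurrency slot.
  void _pump() {
    while (_running < maxConcurrent && _pending.isNotEmpty) {
      final task = _pending.removeFirst();
      _running++;
      // Errors are swallowed here to keep the sketch short; real code
      // should log or report them.
      task().catchError((Object _) {}).whenComplete(() {
        _running--;
        _pump();
      });
    }
  }
}
```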

Methods

clearPendingTasks() → void
Clears all pending tasks
close() → void
Closes the HTTP client
extractData({required String html, required String selector, String? attribute, bool asText = true}) → List<String>
Parses HTML content and extracts the data matched by a CSS selector
extractDataEfficient({required String html, required String selector, String? attribute, bool asText = true, int chunkSize = 1024 * 1024}) → List<String>
Extracts data using memory-efficient parsing for large HTML documents
extractDataStream({required String url, required String selector, String? attribute, bool asText = true, Map<String, String>? headers, int? timeout, int? retries, int priority = 0, int chunkSize = 1024 * 1024}) → Stream<String>
Extracts data from a URL using streaming for memory efficiency
extractStructuredData({required String html, required Map<String, String> selectors, Map<String, String?>? attributes}) → List<Map<String, String>>
Parses HTML content and extracts structured data using CSS selectors
extractStructuredDataEfficient({required String html, required Map<String, String> selectors, Map<String, String?>? attributes, int chunkSize = 1024 * 1024}) → List<Map<String, String>>
Extracts structured data using memory-efficient parsing for large HTML documents
extractStructuredDataStream({required String url, required Map<String, String> selectors, Map<String, String?>? attributes, Map<String, String>? headers, int? timeout, int? retries, int priority = 0, int chunkSize = 1024 * 1024}) → Stream<Map<String, String>>
Extracts structured data from a URL using streaming for memory efficiency
fetchHtml({required String url, Map<String, String>? headers, int? timeout, int? retries, int priority = 0}) → Future<String>
Fetches HTML content from the given URL
fetchHtmlBatch({required List<String> urls, Map<String, String>? headers, int? timeout, int? retries, void onProgress(int completed, int total, String url)?}) → Future<Map<String, String>>
Fetches HTML content from multiple URLs concurrently
fetchHtmlStream({required String url, Map<String, String>? headers, int? timeout, int? retries, int priority = 0}) → Future<Stream<List<int>>>
Fetches HTML content as a stream from the given URL
fetchJson({required String url, Map<String, String>? headers, int? timeout, int? retries, int priority = 0}) → Future<Map<String, dynamic>>
Fetches JSON content from the given URL
noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
submitForm({required String url, String method = 'POST', required Map<String, String> formData, Map<String, String>? headers, int? timeout, int? retries}) → Future<String>
Submits a form with the given data
toString() → String
A string representation of this object.
inherited
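fetchHtmlBatch fetches several URLs concurrently and reports progress through onProgress(completed, total, url). One common way to bound concurrency is a fixed pool of workers that pull URLs from a shared index. The following sketch uses a hypothetical fetchBatchSketch function. The fetch callback stands in for a single-URL fetch such as fetchHtml, and maxConcurrent plays the role of maxConcurrentTasks. Leaving failed URLs out of the result map is an assumption, not documented behaviour.

```dart
/// Hypothetical bounded-concurrency batch fetch; [fetch] stands in for a
/// single-URL fetch such as fetchHtml.
Future<Map<String, String>> fetchBatchSketch(
  List<String> urls,
  Future<String> Function(String url) fetch, {
  int maxConcurrent = 5,
  void Function(int completed, int total, String url)? onProgress,
}) async {
  final results = <String, String>{};
  var nextIndex = 0;
  var completed = 0;

  // Each worker repeatedly claims the next unclaimed URL. Dart runs this code
  // on a single isolate, so claiming an index cannot interleave with another
  // worker's claim.
  Future<void> worker() async {
    while (nextIndex < urls.length) {
      final url = urls[nextIndex++];
      try {
        results[url] = await fetch(url);
      } catch (_) {
        // Assumption: failed URLs are simply left out of the result map.
      }
      completed++;
      onProgress?.call(completed, urls.length, url);
    }
  }

  // Never start more workers than there are URLs.
  final workerCount = maxConcurrent < urls.length ? maxConcurrent : urls.length;
  await Future.wait(List.generate(workerCount, (_) => worker()));
  return results;
}
```

A worker pool keeps at most maxConcurrent requests in flight without having to split the URL list into fixed-size chunks up front.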

Operators

operator ==(Object other) → bool
The equality operator.
inherited

Static Methods

create({required ProxyManager proxyManager, int defaultTimeout = 30000, int maxRetries = 3, bool handleCookies = true, bool followRedirects = true, bool respectRobotsTxt = true, int maxConcurrentTasks = 5}) → Future<AdvancedWebScraper>
Static factory method that asynchronously creates an AdvancedWebScraper with default components
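Dart constructors cannot be async. A static method that returns a Future is therefore the usual way to build an object whose setup needs asynchronous work, which is presumably why create returns Future<AdvancedWebScraper>. The following hypothetical ScraperSketch class illustrates the idiom. Its fields and the _loadUserAgents step are placeholders, not the components create actually builds.

```dart
/// Hypothetical class illustrating the static async factory idiom; the real
/// create() may initialize entirely different components.
class ScraperSketch {
  // Private constructor: callers must go through create().
  ScraperSketch._(this.userAgents, this.defaultTimeout);

  final List<String> userAgents;
  final int defaultTimeout;

  /// Performs the asynchronous setup first, then returns the fully
  /// initialized object.
  static Future<ScraperSketch> create({int defaultTimeout = 30000}) async {
    final userAgents = await _loadUserAgents();
    return ScraperSketch._(userAgents, defaultTimeout);
  }

  // Placeholder for real asynchronous initialization, e.g. reading a file
  // or warming up a proxy pool.
  static Future<List<String>> _loadUserAgents() async =>
      ['ExampleBot/1.0', 'ExampleBot/2.0'];
}
```

Callers then write final scraper = await ScraperSketch.create(); and never see a half-initialized instance.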