AdvancedWebScraper class
An advanced web scraper with proxy rotation, rate limiting, user-agent rotation, cookie handling, robots.txt handling, and a concurrent task queue.
Constructors
- AdvancedWebScraper.new({required ProxyManager proxyManager, ProxyHttpClient? httpClient, EnhancedRateLimiter? rateLimiter, UserAgentRotator? userAgentRotator, CookieManager? cookieManager, RobotsTxtHandler? robotsTxtHandler, StreamingHtmlParser? streamingParser, MemoryEfficientParser? memoryEfficientParser, ScrapingTaskQueue? taskQueue, ScrapingLogger? logger, int defaultTimeout = 30000, int maxRetries = 3, bool handleCookies = true, bool followRedirects = true, bool respectRobotsTxt = true, int maxConcurrentTasks = 5})
  Creates a new AdvancedWebScraper with the given parameters.
Properties
- hashCode → int
  The hash code for this object.
  no setter, inherited
- pendingTaskCount → int
  The number of pending tasks.
  no setter
- runningTaskCount → int
  The number of running tasks.
  no setter
- runtimeType → Type
  A representation of the runtime type of the object.
  no setter, inherited
- totalTaskCount → int
  The total number of tasks (pending + running).
  no setter
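
As a rough sketch of how the three task counters relate, the helper below prints a snapshot of the queue. The import path and the helper name `logQueueState` are assumptions for illustration and are not part of the documented API; only the three getters come from the listing above.

```dart
// Assumption: hypothetical import path. Use whichever package exports
// AdvancedWebScraper in your project.
import 'package:pivox/pivox.dart';

/// Hypothetical helper: prints a snapshot of the scraper's task queue.
void logQueueState(AdvancedWebScraper scraper) {
  final pending = scraper.pendingTaskCount;
  final running = scraper.runningTaskCount;
  // Documented above as pending + running.
  final total = scraper.totalTaskCount;
  print('pending=$pending running=$running total=$total');
}
```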
Methods
- clearPendingTasks() → void
  Clears all pending tasks.
- close() → void
  Closes the HTTP client.
- extractData({required String html, required String selector, String? attribute, bool asText = true}) → List<String>
  Parses HTML content and extracts data using CSS selectors.
- extractDataEfficient({required String html, required String selector, String? attribute, bool asText = true, int chunkSize = 1024 * 1024}) → List<String>
  Extracts data using memory-efficient parsing for large HTML documents.
- extractDataStream({required String url, required String selector, String? attribute, bool asText = true, Map<String, String>? headers, int? timeout, int? retries, int priority = 0, int chunkSize = 1024 * 1024}) → Stream<String>
  Extracts data from a URL using streaming for memory efficiency.
- extractStructuredData({required String html, required Map<String, String> selectors, Map<String, String?>? attributes}) → List<Map<String, String>>
  Parses HTML content and extracts structured data using CSS selectors.
- extractStructuredDataEfficient({required String html, required Map<String, String> selectors, Map<String, String?>? attributes, int chunkSize = 1024 * 1024}) → List<Map<String, String>>
  Extracts structured data using memory-efficient parsing for large HTML documents.
- extractStructuredDataStream({required String url, required Map<String, String> selectors, Map<String, String?>? attributes, Map<String, String>? headers, int? timeout, int? retries, int priority = 0, int chunkSize = 1024 * 1024}) → Stream<Map<String, String>>
  Extracts structured data from a URL using streaming for memory efficiency.
- fetchHtml({required String url, Map<String, String>? headers, int? timeout, int? retries, int priority = 0}) → Future<String>
  Fetches HTML content from the given URL.
- fetchHtmlBatch({required List<String> urls, Map<String, String>? headers, int? timeout, int? retries, void onProgress(int completed, int total, String url)?}) → Future<Map<String, String>>
  Fetches HTML content from multiple URLs concurrently.
- fetchHtmlStream({required String url, Map<String, String>? headers, int? timeout, int? retries, int priority = 0}) → Future<Stream<List<int>>>
  Fetches HTML content as a stream from the given URL.
- fetchJson({required String url, Map<String, String>? headers, int? timeout, int? retries, int priority = 0}) → Future<Map<String, dynamic>>
  Fetches JSON content from the given URL.
- noSuchMethod(Invocation invocation) → dynamic
  Invoked when a nonexistent method or property is accessed.
  inherited
- submitForm({required String url, String method = 'POST', required Map<String, String> formData, Map<String, String>? headers, int? timeout, int? retries}) → Future<String>
  Submits a form with the given data.
- toString() → String
  A string representation of this object.
  inherited
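
The fetch and extract methods above are typically combined: fetch a page, then run selectors over the returned HTML. The following is a minimal sketch under stated assumptions. The import path, the function name `scrapeProducts`, and the CSS selectors are hypothetical, and the reading of `attributes` (a null value meaning "use the element text", a string meaning "read that attribute") is a guess based on the `Map<String, String?>` type rather than documented behavior.

```dart
// Assumption: hypothetical import path. Use whichever package exports
// AdvancedWebScraper in your project.
import 'package:pivox/pivox.dart';

/// Hypothetical helper: fetches one page and extracts one record per match.
Future<List<Map<String, String>>> scrapeProducts(
  AdvancedWebScraper scraper,
  String url,
) async {
  // fetchHtml returns the page body as a String once the request completes.
  final html = await scraper.fetchHtml(url: url, retries: 2);

  // extractStructuredData is synchronous and works on HTML already in memory.
  return scraper.extractStructuredData(
    html: html,
    // Field name -> CSS selector. These selectors are placeholders.
    selectors: {'name': '.product-name', 'link': '.product-link'},
    // Assumed semantics: null reads text, a string reads that attribute.
    attributes: {'name': null, 'link': 'href'},
  );
}
```

For very large documents, `extractStructuredDataEfficient` takes the same `html`, `selectors`, and `attributes` arguments plus a `chunkSize`, so it could be swapped in here without changing the surrounding code.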
Operators
- operator ==(Object other) → bool
  The equality operator.
  inherited
Static Methods
- create({required ProxyManager proxyManager, int defaultTimeout = 30000, int maxRetries = 3, bool handleCookies = true, bool followRedirects = true, bool respectRobotsTxt = true, int maxConcurrentTasks = 5}) → Future<AdvancedWebScraper>
  Static factory method that asynchronously creates an AdvancedWebScraper with default components.
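
Because `create` returns a `Future<AdvancedWebScraper>`, it has to be awaited before the scraper can be used. The sketch below shows one plausible lifecycle: create, stream results, then close. The import path, the function name `printHeadings`, the URL, and the selector are assumptions for illustration; the `ProxyManager` is taken as a parameter because its construction is not covered in this listing.

```dart
// Assumption: hypothetical import path. Use whichever package exports
// AdvancedWebScraper and ProxyManager in your project.
import 'package:pivox/pivox.dart';

/// Hypothetical helper: prints every <h2> match found at a placeholder URL.
Future<void> printHeadings(ProxyManager proxyManager) async {
  // Only proxyManager is required; the other parameters keep their defaults
  // unless overridden, as with maxConcurrentTasks here.
  final scraper = await AdvancedWebScraper.create(
    proxyManager: proxyManager,
    maxConcurrentTasks: 3,
  );
  try {
    // extractDataStream yields matches as a Stream<String>.
    await for (final heading in scraper.extractDataStream(
      url: 'https://example.com',
      selector: 'h2',
    )) {
      print(heading);
    }
  } finally {
    // Release the underlying HTTP client even if the stream throws.
    scraper.close();
  }
}
```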