scrapy 0.0.3


scrapy #


Scrapy: a fast, high-level web crawling and scraping framework for Dart and Flutter.

Getting started #

import 'dart:convert'; // json.decode, used by the fromJson factories below
import 'package:scrapy/scrapy.dart';
import 'package:html/parser.dart' as html;
import 'package:http/http.dart';

class Quote extends Item {
  String quote;
  Quote({this.quote});
  @override
  String toString() {
    return "Quote : { quote : $quote }";
  }

  @override
  Map<String, dynamic> toJson() => {
        "quote": quote == null ? null : quote,
      };
  factory Quote.fromJson(String str) => Quote.fromMap(json.decode(str));
  factory Quote.fromMap(Map<String, dynamic> json) => Quote(
        quote: json["quote"] == null ? null : json["quote"],
      );
}

class Quotes extends Items {
  @override
  final List<Quote> items;
  Quotes({
    this.items,
  });

  factory Quotes.fromJson(String str) => Quotes.fromMap(json.decode(str));
  factory Quotes.fromMap(Map<String, dynamic> json) => Quotes(
        items: json["items"] == null
            ? null
            : List<Quote>.from(json["items"].map((x) => Quote.fromMap(x))),
      );
}

class BlogSpider extends Spider<Quote, Quotes> {
  // Extract the text of every quote on the downloaded page.
  Stream<String> parse(Response response) async* {
    final document = html.parse(response.body);
    final nodes = document.querySelectorAll("div.quote > span.text");

    for (var node in nodes) {
      yield node.innerHtml;
    }
  }

  @override
  Stream<String> Transform(Stream<String> stream) async* {
    await for (String parsed in stream) {
      // Strip the curly quotation marks that wrap each quote on the page.
      yield parsed.substring(1, parsed.length - 1);
    }
  }

  @override
  Stream<Quote> Save(Stream<String> stream) async* {
    await for (String transformed in stream) {
      final quote = Quote(quote: transformed);
      yield quote;
    }
  }
}

Future<void> main() async {
  final spider = BlogSpider();
  spider.name = "myspider";
  spider.client = Client();
  spider.startUrls = [
    "http://quotes.toscrape.com/page/7/",
    "http://quotes.toscrape.com/page/8/",
    "http://quotes.toscrape.com/page/9/"
  ];

  final stopw = Stopwatch()..start();

  await spider.startRequests();
  await spider.saveResult();
  final elapsed = stopw.elapsed;

  print("the program took $elapsed"); // sample output: the program took 0:00:00.279733
}
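To make the data flow between the three overridden methods easier to follow, here is a minimal offline sketch that chains equivalent stages by hand. It assumes the library runs them in the order parse, then Transform, then Save (inferred from their signatures, not confirmed by the source). The names parseStage, transformStage, saveStage and runPipeline are hypothetical, the HTML parsing is replaced by a fixed list of already-extracted strings, and items are plain maps shaped like Quote.toJson() rather than Quote objects. Unlike the Transform above, this version skips strings shorter than two characters instead of throwing a RangeError.

```dart
import 'dart:async';

// Hypothetical stand-in for BlogSpider.parse: instead of parsing HTML,
// it emits a fixed list of strings so the sketch runs without a network.
Stream<String> parseStage(List<String> rawTexts) async* {
  for (final text in rawTexts) {
    yield text;
  }
}

// Hypothetical stand-in for BlogSpider.Transform: drops the first and last
// character (the curly quotes around each quote on quotes.toscrape.com).
Stream<String> transformStage(Stream<String> stream) async* {
  await for (final parsed in stream) {
    // Defensive addition: too short to carry an opening and closing mark.
    if (parsed.length < 2) continue;
    yield parsed.substring(1, parsed.length - 1);
  }
}

// Hypothetical stand-in for BlogSpider.Save: wraps each cleaned string in
// a map with the same shape as Quote.toJson().
Stream<Map<String, dynamic>> saveStage(Stream<String> stream) async* {
  await for (final transformed in stream) {
    yield {'quote': transformed};
  }
}

// Chains the stages in the assumed order and collects every item.
Future<List<Map<String, dynamic>>> runPipeline(List<String> rawTexts) {
  return saveStage(transformStage(parseStage(rawTexts))).toList();
}
```

Because each stage consumes and produces a Stream, items flow through one at a time; nothing forces the whole page to be buffered before the next stage starts.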

Example #

Here is a Flutter list view example showing the quotes we just scraped and saved to disk.

(Screenshot: the scraped quotes shown in a Flutter list view.)

Lightweight dependencies: #

  • http

TODOs #

Changelog #

0.0.3 #

  • Added a path parameter
  • Added Flutter support
  • Cleaner code, with a breaking change to functions whose names were not camelCase
  • Moved away from the dio HTTP library; the package now depends only on the http package

0.0.2 #

  • Badge

0.0.2-1 #

  • Updated README and increased the pub score

0.0.1 #

  • Initial version

example/lib/main.dart

import 'package:flutter/material.dart';
import 'package:http/http.dart';

import 'model.dart';
import 'spider.dart';
import 'storage.dart';

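void main() async {
  // Make sure the Flutter binding exists before any plugin call
  // (storage.localPath likely uses one) and before runApp().
  WidgetsFlutterBinding.ensureInitialized();
  final spider = BlogSpider();
  spider.name = "myspider";
  final storage = QuoteStorage();
  final path = await storage.localPath;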
  spider.path = "$path/data.json";
  spider.client = Client();
  spider.startUrls = [
    "http://quotes.toscrape.com/page/7/",
    "http://quotes.toscrape.com/page/8/",
    "http://quotes.toscrape.com/page/9/"
  ];

  final stopw = Stopwatch()..start();

  await spider.startRequests();
  await spider.saveResult();
  final elapsed = stopw.elapsed;

  print("the program took $elapsed");

  print(await storage.getQuotes());

  runApp(
    MaterialApp(
      title: 'Reading and Writing Files',
      home: FlutterDemo(storage: storage),
    ),
  );
}

class FlutterDemo extends StatelessWidget {
  final QuoteStorage storage;

  FlutterDemo({Key key, @required this.storage}) : super(key: key);

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: const Text('Scrapy on flutter')),
      body: Center(
        child: FutureBuilder<Quotes>(
            future: storage.getQuotes(),
            builder: (context, AsyncSnapshot<Quotes> snapshot) {
              return snapshot.hasData
                  ? ListView.builder(
                      // One card per saved quote rather than a fixed 10.
                      itemCount: snapshot.data.items?.length ?? 0,
                      itemBuilder: (context, index) {
                        final quotes = snapshot.data;
                        return Card(
                            child: Padding(
                          padding: const EdgeInsets.all(8.0),
                          child: Text(quotes.items[index].quote),
                        ));
                      },
                    )
                  : const CircularProgressIndicator();
            }),
      ),
    );
  }
}
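The example imports storage.dart, whose source is not shown. The following is a hypothetical plain-Dart stand-in for QuoteStorage, sketched under stated assumptions: the file is named data.json (matching spider.path above), its contents have the {"items": [{"quote": ...}]} shape that Quotes.fromMap reads, and the directory is passed in explicitly so the sketch runs on the Dart VM. In a Flutter app you would more likely obtain the directory from a plugin such as path_provider's getApplicationDocumentsDirectory(). To stay self-contained, getQuotes here returns List<String> instead of Quotes, and writeQuotes is a hypothetical helper added for symmetry.

```dart
import 'dart:convert';
import 'dart:io';

// Hypothetical stand-in for the example's QuoteStorage (storage.dart is
// not shown). The directory is injected instead of resolved by a plugin.
class QuoteStorage {
  QuoteStorage(this.directory);

  final Directory directory;

  // Mirrors the example's `storage.localPath` getter.
  Future<String> get localPath async => directory.path;

  // Assumption: same file name the example assigns to spider.path.
  File get _file => File('${directory.path}/data.json');

  // Writes quotes in the {"items": [{"quote": ...}]} shape read by
  // Quotes.fromMap in the Getting started code.
  Future<File> writeQuotes(List<String> quotes) {
    final payload = <String, dynamic>{
      'items': [
        for (final q in quotes) {'quote': q},
      ],
    };
    return _file.writeAsString(json.encode(payload));
  }

  // Reads the quotes back; returns an empty list if the file is missing
  // or does not have the expected shape.
  Future<List<String>> getQuotes() async {
    if (!await _file.exists()) return <String>[];
    final decoded = json.decode(await _file.readAsString());
    if (decoded is! Map) return <String>[];
    final items = decoded['items'];
    if (items is! List) return <String>[];
    return [
      for (final item in items)
        if (item is Map && item['quote'] is String) item['quote'] as String,
    ];
  }
}
```

Returning an empty list for a missing file lets a FutureBuilder render an empty list on first launch instead of surfacing a file-not-found error.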

Use this package as a library

1. Depend on it

Add this to your package's pubspec.yaml file:


dependencies:
  scrapy: ^0.0.3

2. Install it

You can install packages from the command line:

with pub:


$ pub get

with Flutter:


$ flutter pub get

Alternatively, your editor might support pub get or flutter pub get. Check the docs for your editor to learn more.

3. Import it

Now in your Dart code, you can use:


import 'package:scrapy/scrapy.dart';
  
Popularity: 75
Describes how popular the package is relative to other packages.
Health: 99
Code health derived from static analysis.
Maintenance: 86
Reflects how tidy and up-to-date the package is.
Overall: 85
Weighted score of the above.

We analyzed this package on Jul 2, 2020, and provided a score, details, and suggestions below. The analysis completed successfully using:

  • Dart: 2.8.4
  • pana: 0.13.9+1

Analysis suggestions

Package not compatible with the flutter-web runtime on web

Because of the import of dart:io via the import chain package:scrapy/scrapy.dart -> package:scrapy/src/spider.dart -> dart:io

Package not compatible with the web runtime

Because of the import of dart:io via the import chain package:scrapy/scrapy.dart -> package:scrapy/src/spider.dart -> dart:io

Health issues and suggestions

Document public APIs. (-1 points)

32 out of 32 API elements have no dartdoc comment. Providing good documentation for libraries, classes, functions, and other API elements improves code readability and helps developers find and use your API.

Maintenance suggestions

Package is pre-v0.1 release. (-10 points)

While nothing is inherently wrong with versions of 0.0.*, it might mean that the author is still experimenting with the general direction of the API.

Package is getting outdated. (-3.84 points)

The package was last published 54 weeks ago.

Dependencies

Package         Constraint        Resolved   Available
Direct dependencies
Dart SDK        >=2.0.0 <3.0.0
http            ^0.12.0+2         0.12.1
Transitive dependencies
charcode                          1.1.3
collection                        1.14.13
http_parser                       3.1.4
meta                              1.1.8      1.2.0
path                              1.7.0
pedantic                          1.9.0      1.9.1
source_span                       1.7.0
string_scanner                    1.0.5
term_glyph                        1.1.0
typed_data                        1.2.0