extract_text 0.0.2

A simple text extractor that supports PDF, DOCX, TXT, and image OCR.

extract_text #

A Flutter package to extract text from PDF, Word (.docx), TXT, and image files (OCR).
Supports Android and iOS for OCR and PDF extraction, and works with TXT and DOCX files on any Dart/Flutter platform.


Features #

  • Extract text from PDF files using read_pdf_text (Android/iOS)
  • Extract text from Word (.docx) files
  • Extract text from plain TXT files
  • Extract text from images (JPG, PNG) using Tesseract OCR (Android/iOS)
  • Simple and unified API: ExtractText.fromFile(filePath)
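
As an illustration of how a single entry point can cover the file types listed above, the sketch below classifies a path by its extension. The DocumentKind enum and kindForPath function are hypothetical names introduced here; they are not part of extract_text, and the package's actual dispatch logic may differ.

```dart
// Hypothetical helper, NOT part of extract_text: classify a path by extension.
enum DocumentKind { pdf, docx, txt, image, unsupported }

DocumentKind kindForPath(String path) {
  // Take the text after the last dot; a path with no dot (or a trailing dot)
  // has no usable extension.
  final dot = path.lastIndexOf('.');
  if (dot < 0 || dot == path.length - 1) return DocumentKind.unsupported;

  // Compare case-insensitively so "REPORT.PDF" and "report.pdf" match.
  switch (path.substring(dot + 1).toLowerCase()) {
    case 'pdf':
      return DocumentKind.pdf;
    case 'docx':
      return DocumentKind.docx;
    case 'txt':
      return DocumentKind.txt;
    case 'jpg':
    case 'jpeg':
    case 'png':
      return DocumentKind.image;
    default:
      return DocumentKind.unsupported;
  }
}
```

A check like this can be run before calling ExtractText.fromFile to reject unsupported files early with a clearer message.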

Installation #

Add the package to your Flutter project in pubspec.yaml:

dependencies:
  extract_text: ^0.0.2

Then run:

flutter pub get

Setup Instructions #

  1. Download language files

You need the Tesseract trained data (.traineddata) file for each language you want to recognize, plus a configuration file (tessdata_config.json).

From Original Resources

  2. Add the files to your Flutter project

Create an assets folder in your project, for example:

your_app/
  assets/
    tessdata/
      eng.traineddata
      mya.traineddata
      tha.traineddata
    tessdata_config.json
  lib/

  • Place all .traineddata files inside assets/tessdata/.
  • Place tessdata_config.json directly inside assets/.

  3. Declare the assets in pubspec.yaml

flutter:
  assets:
    - assets/tessdata/eng.traineddata
    - assets/tessdata/mya.traineddata
    - assets/tessdata/tha.traineddata
    - assets/tessdata_config.json
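
The contents of tessdata_config.json are not shown here. The sketch below assumes the file is a JSON object with a "files" list naming each .traineddata file, which is a common convention for Tesseract Flutter plugins; check the tesseract_ocr documentation before relying on it. Both function names are hypothetical helpers, not part of extract_text.

```dart
import 'dart:convert';

// ASSUMPTION: tessdata_config.json is a JSON object with a "files" list of
// .traineddata file names. This README does not show the file's contents.
String buildTessdataConfig(List<String> languages) {
  final files = [for (final lang in languages) '$lang.traineddata'];
  return JsonEncoder.withIndent('  ').convert({'files': files});
}

// Asset entries for pubspec.yaml, matching the folder layout described above:
// one line per language file, plus the configuration file.
List<String> assetEntries(List<String> languages) => [
      for (final lang in languages) 'assets/tessdata/$lang.traineddata',
      'assets/tessdata_config.json',
    ];
```

Generating both lists from one language list keeps the config file and the pubspec.yaml entries in step when a language is added or removed.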

Usage #

import 'package:flutter/widgets.dart';
import 'package:extract_text/extract_text.dart';

Future<void> main() async {
  // This code runs before runApp(), so initialize the Flutter bindings first.
  WidgetsFlutterBinding.ensureInitialized();

  final filePath = '/path/to/sample.pdf';

  try {
    final text = await ExtractText.fromFile(filePath);
    print('Extracted text:');
    print(text);
  } catch (e) {
    print('Error extracting text: $e');
  }
}

Supported File Types #

File Type   Extension          Notes
PDF         .pdf               Android/iOS only
Word        .docx              Works on all platforms
Text        .txt               Works on all platforms
Image       .jpg/.jpeg/.png    OCR using Tesseract (Android/iOS)

Example #

A minimal example app, suitable for the example/ folder of a package:

import 'package:flutter/material.dart';
import 'package:extract_text/extract_text.dart';

void main() {
  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(title: const Text('Extract Text Example')),
        body: Center(
          child: ElevatedButton(
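            onPressed: () async {
              try {
                // Adjust this path to a file that exists on the device.
                final text = await ExtractText.fromFile('/storage/emulated/0/Download/sample.pdf');
                print(text);
              } catch (e) {
                print('Error extracting text: $e');
              }
            },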
            child: const Text('Extract PDF Text'),
          ),
        ),
      ),
    );
  }
}

Changelog #

[0.0.1] #

  • Initial release
  • PDF, DOCX, TXT, and image (OCR) extraction

[0.0.2] - 2025-10-17 #

Added #

  • Integrated tesseract_ocr package to support OCR for image files.
  • Added support for Thai (tha) and Myanmar (mya) language files.
  • Updated package to load language files and configuration from assets.
  • Internal improvements to ExtractText and TessdataLoader for multi-language support.

Removed #

  • Removed Google ML Kit from the OCR system to simplify processing.

Changed #

  • Unified all OCR operations into one method:
    final text = await TesseractOCTExtractor.performOcr(
      imagePath: file.path,
      languages: ['mya', 'eng', 'tha'],
    );

Notes #

OCR and PDF extraction require Android or iOS.

Make sure to call WidgetsFlutterBinding.ensureInitialized() before calling ExtractText.fromFile() in main() if your code runs before runApp().

For testing on desktop or Dart CLI, only TXT and DOCX files are supported.
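
The platform rules in these notes can be written as a small guard. This is a minimal sketch: canExtract and canExtractHere are hypothetical helpers, not part of extract_text, and they only restate the rules above rather than query the package.

```dart
import 'dart:io' show Platform;

// Hypothetical guard, NOT part of extract_text. Per the notes above, PDF and
// image (OCR) extraction need Android or iOS; TXT and DOCX work everywhere.
bool canExtract(String extension, {required bool isMobile}) {
  switch (extension.toLowerCase()) {
    case 'txt':
    case 'docx':
      return true;
    case 'pdf':
    case 'jpg':
    case 'jpeg':
    case 'png':
      return isMobile;
    default:
      return false;
  }
}

// Convenience wrapper that reads the current platform from dart:io.
// dart:io is unavailable on the web, so this wrapper is for native targets.
bool canExtractHere(String extension) =>
    canExtract(extension, isMobile: Platform.isAndroid || Platform.isIOS);
```

Passing isMobile as a parameter keeps the rule testable on a desktop machine or in the Dart CLI, where only the TXT and DOCX branches return true.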

Acknowledgements #

This package uses tesseract_ocr under the hood for optical character recognition.

Special thanks to the Tesseract OCR project and its contributors for providing an open-source OCR engine.


License

MIT (license)

Dependencies

archive, flutter, path, read_pdf_text, tesseract_ocr, xml
