extract_text 0.0.2
A simple text extractor that supports PDF, DOCX, TXT, and image OCR.
extract_text #
A Flutter package to extract text from PDF, Word (.docx), TXT, and image files (OCR).
Supports Android and iOS for OCR and PDF extraction, and works with TXT and DOCX files on any Dart/Flutter platform.
Features #
- Extract text from PDF files using read_pdf_text (Android/iOS)
- Extract text from Word (.docx) files
- Extract text from plain TXT files
- Extract text from images (JPG, PNG) using Tesseract OCR
- Simple and unified API: ExtractText.fromFile(filePath)
Installation #
Add the package to your Flutter project in pubspec.yaml:
dependencies:
  extract_text: ^0.0.2
Then run:
flutter pub get
Setup Instructions #
- Download Language Files
You need the trained data files and a configuration file for Tesseract OCR.
- English (eng.traineddata)
- Myanmar (mya.traineddata)
- Thai (tha.traineddata)
- Tessdata Config (tessdata_config.json)
From Original Resources
- Add files to your Flutter project
Create an assets folder in your project, for example:
your_app/
  assets/
    tessdata/
      eng.traineddata
      mya.traineddata
      tha.traineddata
    tessdata_config.json
  lib/
- Place all .traineddata files inside assets/tessdata/.
- Place tessdata_config.json directly inside assets/.
- Declare assets in pubspec.yaml
flutter:
  assets:
    - assets/tessdata/eng.traineddata
    - assets/tessdata/mya.traineddata
    - assets/tessdata/tha.traineddata
    - assets/tessdata_config.json
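A missing or misplaced asset file is an easy mistake to make at this step. The following is a minimal sketch of a check that the declared files exist on disk. It is not part of extract_text: the names declaredAssets and missingAssets are hypothetical, and the list only mirrors the asset paths declared above.

```dart
import 'dart:io';

/// Asset paths as declared in pubspec.yaml, relative to the project root.
/// (Hypothetical constant; keep it in sync with your own pubspec.yaml.)
const declaredAssets = [
  'assets/tessdata/eng.traineddata',
  'assets/tessdata/mya.traineddata',
  'assets/tessdata/tha.traineddata',
  'assets/tessdata_config.json',
];

/// Returns every entry of [assets] that does not exist under [projectRoot].
/// (Hypothetical helper, not an extract_text API.)
List<String> missingAssets(String projectRoot, List<String> assets) {
  return assets
      .where((relative) => !File('$projectRoot/$relative').existsSync())
      .toList();
}

// Possible use from a small script run in the project root:
//   final missing = missingAssets(Directory.current.path, declaredAssets);
//   if (missing.isNotEmpty) print('Missing assets: $missing');
```

This only checks that the files are present. It does not confirm that Flutter bundles them or that Tesseract can read them.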
Usage #
import 'package:extract_text/extract_text.dart';

void main() async {
  // Ensure Flutter bindings are initialized if using in main()
  // WidgetsFlutterBinding.ensureInitialized();
  final filePath = '/path/to/sample.pdf';
  try {
    final text = await ExtractText.fromFile(filePath);
    print('Extracted text:');
    print(text);
  } catch (e) {
    print('Error extracting text: $e');
  }
}
Supported File Types #
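| File Type | Extension       | Notes                             |
|-----------|-----------------|-----------------------------------|
| PDF       | .pdf            | Android/iOS only                  |
| Word      | .docx           | Works on all platforms            |
| Text      | .txt            | Works on all platforms            |
| Image     | .jpg/.jpeg/.png | OCR using Tesseract (Android/iOS) |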
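The supported types can also be written as a small lookup, which lets an app reject a file before it calls ExtractText.fromFile. This is a sketch only: FileKind, kindOf, and requiresMobile are hypothetical names and not part of the package, and the package's own extension handling may differ.

```dart
/// How a file would be handled, following the supported file types above.
/// (Hypothetical enum, not an extract_text type.)
enum FileKind { pdf, docx, txt, image, unsupported }

/// Classifies [path] by its extension, ignoring case.
FileKind kindOf(String path) {
  final dot = path.lastIndexOf('.');
  // No extension at all, or a trailing dot with nothing after it.
  if (dot < 0 || dot == path.length - 1) return FileKind.unsupported;
  switch (path.substring(dot + 1).toLowerCase()) {
    case 'pdf':
      return FileKind.pdf;
    case 'docx':
      return FileKind.docx;
    case 'txt':
      return FileKind.txt;
    case 'jpg':
    case 'jpeg':
    case 'png':
      return FileKind.image;
    default:
      return FileKind.unsupported;
  }
}

/// True for the kinds listed as Android/iOS only (PDF and image OCR).
bool requiresMobile(FileKind kind) =>
    kind == FileKind.pdf || kind == FileKind.image;
```

For example, kindOf('/tmp/Report.PDF') returns FileKind.pdf. A desktop build could use requiresMobile to show a message instead of attempting the extraction.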
Example #
You can create a small example/ folder in your package:
import 'package:flutter/material.dart';
import 'package:extract_text/extract_text.dart';

void main() {
  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(title: const Text('Extract Text Example')),
        body: Center(
          child: ElevatedButton(
            onPressed: () async {
              final text = await ExtractText.fromFile(
                '/storage/emulated/0/Download/sample.pdf',
              );
              print(text);
            },
            child: const Text('Extract PDF Text'),
          ),
        ),
      ),
    );
  }
}
Changelog #
[0.0.1] #
- Initial release
- PDF, DOCX, TXT, and image (OCR) extraction
[0.0.2] - 2025-10-17 #
Added #
- Integrated tesseract_ocr package to support OCR for image files.
- Added support for Thai (tha) and Myanmar (mya) language files.
- Updated package to load language files and configuration from assets.
- Internal improvements to ExtractText and TessdataLoader for multi-language support.
Removed #
- Refactored the OCR system to remove Google ML Kit and simplify processing.
- Unified all OCR operations into one method:
final text = await TesseractOCTExtractor.performOcr(
  imagePath: file.path,
  languages: ['mya', 'eng', 'tha'],
);
Notes #
OCR and PDF extraction require Android or iOS.
If you call ExtractText.fromFile() in main() before runApp(), call WidgetsFlutterBinding.ensureInitialized() first.
For testing on desktop or Dart CLI, only TXT and DOCX files are supported.
Acknowledgements #
This package uses tesseract_ocr under the hood for optical character recognition.
Special thanks to the Tesseract OCR project and its contributors for providing an open-source OCR engine.