DocTextExtractor
A Flutter package for extracting text from Word (.doc, .docx), PDF, and Markdown (.md) files and from Google Docs URLs
DocTextExtractor is a lightweight Flutter package that extracts text from Word (.doc, .docx), PDF, and Markdown (.md) files and from Google Docs URLs, with offline .doc support and real filename extraction. It suits AI-driven apps such as NotteChat, where it enables document-based chat and analysis across both legacy and modern formats.
Features
- Word (.doc, .docx) Extraction: Parse legacy .doc files offline and .docx files via XML.
- PDF Extraction: Extract text from PDFs using Syncfusion.
- Google Docs Support: Download PDF exports from Google Docs URLs with real filename extraction.
- Offline Support: Process local .doc, .docx, .md, and PDF files without internet.
- Real Filename Extraction: Retrieve accurate document names from Content-Disposition headers or URLs.
- Cross-Platform: Works on iOS, Android, and web via Flutter.
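To make the "Real Filename Extraction" feature concrete, here is a minimal sketch of how a filename could be derived from a Content-Disposition header with a fallback to the URL path. The function name `filenameFrom` and the `fallback` parameter are hypothetical and only for illustration; this is not necessarily how the package implements it.

```dart
/// Hypothetical helper (not part of the package API): picks a display
/// filename from a Content-Disposition header, falling back to the last
/// non-empty segment of the URL path.
String filenameFrom(String? contentDisposition, Uri url,
    {String fallback = 'document'}) {
  if (contentDisposition != null) {
    // Extended form: filename*=UTF-8''percent-encoded-name
    final extended =
        RegExp(r"filename\*\s*=\s*UTF-8''([^;]+)", caseSensitive: false)
            .firstMatch(contentDisposition);
    if (extended != null) {
      return Uri.decodeComponent(extended.group(1)!.trim());
    }
    // Plain form: filename="name.ext" or filename=name.ext
    final plain = RegExp(r'filename\s*=\s*"?([^";]+)"?', caseSensitive: false)
        .firstMatch(contentDisposition);
    if (plain != null) {
      return plain.group(1)!.trim();
    }
  }
  // No usable header: use the last non-empty path segment of the URL.
  final segments = url.pathSegments.where((s) => s.isNotEmpty);
  return segments.isNotEmpty ? segments.last : fallback;
}
```

The extended form is checked first because, when a server sends both, it usually carries the more accurate non-ASCII name. A malformed percent-encoding would make `Uri.decodeComponent` throw, so real code may want to catch that.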
Installation
Add the package to your pubspec.yaml:
dependencies:
  doc_text_extractor: ^1.0.0
Run:
flutter pub get
Usage
Extract Text from a URL
import 'package:doc_text_extractor/doc_text_extractor.dart';

void main() async {
  final extractor = TextExtractor();
  try {
    // Extract text from a Google Docs URL
    final result = await extractor.extractText('https://docs.google.com/document/d/EXAMPLE_ID/edit');
    print('Filename: ${result['filename']}');
    print('Text: ${result['text']}');

    // Extract text from a .doc URL
    final docResult = await extractor.extractText('https://example.com/sample.doc');
    print('Filename: ${docResult['filename']}');
    print('Text: ${docResult['text']}');

    // Extract text from a .md URL
    final mdResult = await extractor.extractText('https://example.com/sample.md');
    print('Filename: ${mdResult['filename']}');
    print('Text: ${mdResult['text']}');
  } catch (e) {
    print('Error: $e');
  }
}
Extract Text from a Local File
import 'dart:io';

import 'package:doc_text_extractor/doc_text_extractor.dart';
import 'package:path_provider/path_provider.dart';

void main() async {
  final extractor = TextExtractor();
  try {
    final dir = await getTemporaryDirectory();
    final filePath = '${dir.path}/sample.pdf';
    // Assume sample.pdf exists in the temporary directory
    final result = await extractor.extractText(filePath, isUrl: false);
    print('Filename: ${result['filename']}');
    print('Text: ${result['text']}');
  } catch (e) {
    print('Error: $e');
  }
}
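When the source string comes from user input, an app may not know in advance whether it is a URL or a local path. The sketch below shows one way to decide; `isRemoteSource` is a hypothetical helper, not part of the package.

```dart
/// Hypothetical helper (not part of the package API): returns true when
/// [source] is an absolute http or https URL, false for anything else
/// (for example a local file path).
bool isRemoteSource(String source) {
  final uri = Uri.tryParse(source);
  if (uri == null) return false;
  // Dart normalizes the scheme to lowercase, so a plain comparison suffices.
  final isHttp = uri.scheme == 'http' || uri.scheme == 'https';
  return isHttp && uri.host.isNotEmpty;
}
```

The result could then be passed along as `extractor.extractText(source, isUrl: isRemoteSource(source))`, assuming `isUrl` accepts `true` as well as the `false` shown in the example above.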
Dependencies
- http: Fetches document URLs.
- syncfusion_flutter_pdf: Extracts PDF text.
- archive and xml: Parse .docx files.
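For background on the .docx case: a .docx file is a ZIP archive whose main text lives in word/document.xml, with paragraphs in `w:p` elements and text runs in `w:t` elements. The sketch below illustrates that structure on an already unzipped XML string. It deliberately uses regular expressions from the Dart core library instead of the archive and xml packages so that it stays self-contained; it is a simplification for illustration, not the package's implementation, and `docxXmlToText` is a hypothetical name.

```dart
/// Decodes the five predefined XML entities.
String _decodeEntities(String s) => s
    .replaceAll('&lt;', '<')
    .replaceAll('&gt;', '>')
    .replaceAll('&quot;', '"')
    .replaceAll('&apos;', "'")
    .replaceAll('&amp;', '&'); // last, so "&amp;lt;" becomes "&lt;", not "<"

/// Hypothetical helper: turns the contents of word/document.xml into plain
/// text, one output line per <w:p> paragraph.
/// Simplification: nested paragraphs (e.g. text boxes) are not handled.
String docxXmlToText(String documentXml) {
  // Matches <w:p> or <w:p attrs> but not <w:pPr> or self-closing <w:p .../>.
  final paragraph =
      RegExp(r'<w:p(?:\s[^>]*?[^/])?>(.*?)</w:p>', dotAll: true);
  // Matches <w:t> or <w:t attrs> but not <w:tab/> or <w:tbl>.
  final run = RegExp(r'<w:t(?:\s[^>]*)?>([^<]*)</w:t>');

  final lines = <String>[];
  for (final p in paragraph.allMatches(documentXml)) {
    // Concatenate every text run inside this paragraph.
    final text = run.allMatches(p.group(1)!).map((m) => m.group(1)!).join();
    lines.add(_decodeEntities(text));
  }
  return lines.join('\n');
}
```

A real parser handles namespaces, nesting, and malformed input far more robustly than these patterns, which is presumably why the package depends on archive and xml.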
Limitations
- Google Docs URLs must be publicly accessible or shared with export permissions.
- Large files (>10 MB) can take noticeable time to process, so consider showing a loading indicator while extraction runs.
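The first limitation follows from how Google Docs exports work: the document is fetched as a PDF over plain HTTP, so the request only succeeds if the link allows it. The sketch below shows how an editor URL could be mapped to a PDF export URL, assuming the commonly used `/export?format=pdf` endpoint. Whether the package builds the URL exactly this way is an assumption, and `googleDocsPdfExportUrl` is a hypothetical name.

```dart
/// Hypothetical helper (not part of the package API): maps a Google Docs
/// URL such as https://docs.google.com/document/d/<id>/edit to its PDF
/// export URL. Returns null if the URL does not look like a Google Doc.
Uri? googleDocsPdfExportUrl(String url) {
  final uri = Uri.tryParse(url);
  if (uri == null || uri.host != 'docs.google.com') return null;

  // Expected path shape: /document/d/<id>/...
  final s = uri.pathSegments;
  if (s.length < 3 || s[0] != 'document' || s[1] != 'd' || s[2].isEmpty) {
    return null;
  }
  return Uri.https(
      'docs.google.com', '/document/d/${s[2]}/export', {'format': 'pdf'});
}
```

For a private document the export request typically returns a sign-in page or an error instead of a PDF, which is why the sharing setting matters.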
Contributing
Contributions are welcome! Fork the repository, create a branch, and submit a pull request. Report issues on the repository's GitHub Issues page.
License
MIT License. See LICENSE for details.
Contact
- Developer: Destiny Ed
- Email: talk2destinyed@gmail.com
- Repository: https://github.com/Destiny-Ed/doc_text_extractor