doc_text_extractor 1.0.0

A Flutter package for extracting text from Word (.doc, .docx), PDF, Google Docs URLs, and Markdown (.md) files, with offline .doc and .md support and real filename extraction.

DocTextExtractor

A Flutter package for extracting text from Word (.doc, .docx), PDF, Markdown (.md), and Google Docs URLs.

DocTextExtractor is a lightweight Flutter package that extracts text from Word (.doc, .docx), PDF, Markdown (.md), and Google Docs URLs, with offline .doc support and real filename extraction. It suits AI-driven apps like NotteChat, enabling document-based chat and analysis by processing both legacy and modern formats efficiently.

Features

  • Word (.doc, .docx) Extraction: Parse legacy .doc files offline and .docx files via XML.
  • PDF Extraction: Extract text from PDFs using Syncfusion.
  • Markdown (.md) Extraction: Extract text from local and remote Markdown files.
  • Google Docs Support: Download PDF exports from Google Docs URLs with real filename extraction.
  • Offline Support: Process local .doc, .docx, .md, and PDF files without internet.
  • Real Filename Extraction: Retrieve accurate document names from Content-Disposition headers or URLs.
  • Cross-Platform: Works on iOS, Android, and web via Flutter.
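Real filename extraction relies on the Content-Disposition header, with the URL as a fallback. The sketch below shows one way to do this in plain Dart; it is not the package's own implementation, and `filenameFromResponse` is a hypothetical name. It prefers the RFC 6266 `filename*` form (assuming UTF-8), then the plain `filename` parameter, then the last URL path segment.

```dart
/// Hypothetical helper: derives a display filename from an HTTP response.
/// Order of preference: filename* (RFC 6266, UTF-8 assumed), plain
/// filename, then the last non-empty segment of the request URL.
String filenameFromResponse(String? contentDisposition, Uri url) {
  if (contentDisposition != null) {
    // Extended form, e.g. filename*=UTF-8''My%20Doc.pdf
    final extended = RegExp(r"filename\*\s*=\s*[^']*'[^']*'([^;]+)",
            caseSensitive: false)
        .firstMatch(contentDisposition);
    if (extended != null) {
      // Uri.decodeComponent percent-decodes as UTF-8.
      return Uri.decodeComponent(extended.group(1)!.trim());
    }

    // Plain form, e.g. filename="My Doc.pdf" or filename=report.pdf
    final plain = RegExp(r'filename\s*=\s*"?([^";]+)"?', caseSensitive: false)
        .firstMatch(contentDisposition);
    if (plain != null) {
      return plain.group(1)!.trim();
    }
  }

  // Fallback: last non-empty path segment, or a generic name.
  final segments = url.pathSegments.where((s) => s.isNotEmpty);
  return segments.isEmpty ? 'document' : segments.last;
}
```

Checking `filename*` first matters because servers often send both forms, and the extended one is the only one that can carry non-ASCII names reliably.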

Installation

Add the package to your pubspec.yaml:

```yaml
dependencies:
  doc_text_extractor: ^1.0.0
```

Run:

```shell
flutter pub get
```

Usage

Extract Text from a URL

```dart
import 'package:doc_text_extractor/doc_text_extractor.dart';

void main() async {
  final extractor = TextExtractor();
  try {
    // Extract text from a Google Docs URL
    final result = await extractor.extractText('https://docs.google.com/document/d/EXAMPLE_ID/edit');
    print('Filename: ${result['filename']}');
    print('Text: ${result['text']}');

    // Extract text from a .doc URL
    final docResult = await extractor.extractText('https://example.com/sample.doc');
    print('Filename: ${docResult['filename']}');
    print('Text: ${docResult['text']}');

    // Extract text from a .md URL
    final mdResult = await extractor.extractText('https://example.com/sample.md');
    print('Filename: ${mdResult['filename']}');
    print('Text: ${mdResult['text']}');
  } catch (e) {
    print('Error: $e');
  }
}
```
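A Google Docs URL points at an editor page rather than a file, so the document has to be downloaded through an export endpoint. A minimal sketch, assuming the commonly used `https://docs.google.com/document/d/<ID>/export?format=pdf` pattern; `googleDocsPdfExportUrl` is a hypothetical helper, not part of the package's API:

```dart
/// Hypothetical helper: maps a Google Docs editor URL such as
/// https://docs.google.com/document/d/<ID>/edit to its PDF export URL.
/// Returns null when the input is not a recognizable Google Docs URL.
Uri? googleDocsPdfExportUrl(String input) {
  final uri = Uri.tryParse(input);
  if (uri == null || uri.host != 'docs.google.com') return null;

  final segments = uri.pathSegments;
  if (segments.isEmpty || segments.first != 'document') return null;

  // The document ID is the segment right after "d".
  final dIndex = segments.indexOf('d');
  if (dIndex < 0 || dIndex + 1 >= segments.length) return null;
  final documentId = segments[dIndex + 1];

  // Assumed export pattern: /document/d/<ID>/export?format=pdf
  return Uri.https(
    'docs.google.com',
    '/document/d/$documentId/export',
    {'format': 'pdf'},
  );
}
```

The export request only succeeds when the document is publicly accessible or shared with export permissions.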

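Extract Text from a Local File

```dart
import 'dart:io';

import 'package:doc_text_extractor/doc_text_extractor.dart';
import 'package:flutter/widgets.dart';
import 'package:path_provider/path_provider.dart';

void main() async {
  // path_provider uses platform channels, so the Flutter binding must be
  // initialized before getTemporaryDirectory() is called from main().
  WidgetsFlutterBinding.ensureInitialized();

  final extractor = TextExtractor();
  try {
    final dir = await getTemporaryDirectory();
    final filePath = '${dir.path}/sample.pdf';

    // Fail early with a clear message if sample.pdf has not been placed there.
    if (!await File(filePath).exists()) {
      print('File not found: $filePath');
      return;
    }

    // isUrl: false reads the document from the local file system.
    final result = await extractor.extractText(filePath, isUrl: false);
    print('Filename: ${result['filename']}');
    print('Text: ${result['text']}');
  } catch (e) {
    print('Error: $e');
  }
}
```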
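Before calling `extractText(filePath, isUrl: false)`, an app may want to reject unsupported files up front. A sketch, assuming the supported set is exactly the formats named in this README; `supportedExtensionOf` is a hypothetical helper:

```dart
/// Formats named in this README; whether the package accepts other
/// extensions is not documented here, so this set is an assumption.
const supportedExtensions = {'.doc', '.docx', '.pdf', '.md'};

/// Hypothetical pre-flight check: returns the lower-cased extension of
/// [path] if it is in [supportedExtensions], otherwise null.
String? supportedExtensionOf(String path) {
  // Look only at the final path component so dots in directory names
  // (e.g. "/data/v1.2/report") are ignored. Handles / and \ separators.
  final name = path.split(RegExp(r'[/\\]')).last;
  final dot = name.lastIndexOf('.');
  if (dot <= 0) return null; // no extension, or a hidden file like ".md"
  final ext = name.substring(dot).toLowerCase();
  return supportedExtensions.contains(ext) ? ext : null;
}
```

A `null` result could trigger a user-facing message instead of letting extraction throw.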

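Dependencies

  • http: Fetches documents from URLs.
  • syncfusion_flutter_pdf: Extracts text from PDFs.
  • archive and xml: Unpack and parse .docx files (a .docx file is a ZIP archive of XML parts).
  • markdown: Processes Markdown (.md) files.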
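In a .docx file the body text lives in `word/document.xml`, where each `<w:p>` element is a paragraph and `<w:t>` elements hold runs of text. The package uses archive and xml for this; the sketch below swaps in regular expressions so it runs without dependencies, and assumes the XML string has already been read out of the ZIP. `docxXmlToText` is a hypothetical name, and the regex approach ignores tabs, line breaks, and numeric character references.

```dart
/// Hypothetical helper: extracts plain text from the contents of
/// word/document.xml (already read out of the .docx ZIP archive).
/// Regex-based simplification of real XML parsing.
String docxXmlToText(String documentXml) {
  // A non-self-closing <w:p ...> paragraph up to its closing </w:p>.
  // \b keeps <w:pPr> (paragraph properties) from matching.
  final paragraph = RegExp(r'<w:p\b[^>]*?(?<!/)>(.*?)</w:p>', dotAll: true);
  // A <w:t> text run, with or without attributes such as xml:space.
  final textRun = RegExp(r'<w:t(?:\s[^>]*)?>(.*?)</w:t>', dotAll: true);

  final lines = <String>[];
  for (final p in paragraph.allMatches(documentXml)) {
    final runs = textRun.allMatches(p.group(1)!).map((m) => m.group(1)!);
    lines.add(_unescapeXml(runs.join()));
  }
  return lines.join('\n');
}

/// Decodes the five predefined XML entities; &amp; goes last so that
/// "&amp;lt;" becomes the literal text "&lt;" rather than "<".
String _unescapeXml(String s) => s
    .replaceAll('&lt;', '<')
    .replaceAll('&gt;', '>')
    .replaceAll('&quot;', '"')
    .replaceAll('&apos;', "'")
    .replaceAll('&amp;', '&');
```

A real XML parser handles nesting and entities properly, which is why a dedicated xml dependency is the sturdier choice in production.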

Limitations

  • Google Docs URLs must be publicly accessible or shared with export permissions.
  • Large files (over 10 MB) can take noticeably longer to process; show a loading indicator while extraction runs.
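To act on the large-file note, an app can check the file size before extraction and decide whether to show a loading indicator. A sketch, assuming 10 MB means 10 × 1024 × 1024 bytes; `isLargeDocument` is a hypothetical helper:

```dart
import 'dart:io';

/// Threshold from the 10 MB figure in Limitations; reading it as
/// 10 * 1024 * 1024 bytes is an assumption.
const int largeFileBytes = 10 * 1024 * 1024;

/// Hypothetical helper: true when [file] is big enough that the UI should
/// show a loading indicator while text is extracted.
bool isLargeDocument(File file) => file.lengthSync() > largeFileBytes;
```

In a Flutter widget, a `true` result could be used to display a progress dialog before awaiting `extractText`.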

Contributing

Contributions are welcome! Fork the repository, create a branch, and submit a pull request. Report issues at GitHub Issues.

License

MIT License. See LICENSE for details.

Contact


