isOcrRequiredPdfExtractionError static method

bool isOcrRequiredPdfExtractionError(
  Object error
)

Returns true when a PDF extraction error indicates a scanned/image-only document — the kind OCR can recover.

The Rust parser surfaces a below-threshold error whose message shares the same "… fewer than N non-whitespace …" prefix across three cases. It appends the scanned-specific marker (the substring "scanned/image-only") only in the OCR-recoverable cases, so this method requires that marker in addition to the shared prefix rather than matching the prefix alone:

  • scanned/image-only PDFs with no text layer — marker present, OCR helps;
  • mixed PDFs that are scanned but also have some pages that failed to extract — marker still present, OCR recovers the scanned pages;
  • PDFs where every page failed to extract (corrupt/unsupported content) — no marker, OCR will not help.
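
The three cases above can be illustrated with a small sketch. The check is a standalone copy of the one shown under Implementation (renamed isOcrRequired here so the example runs on its own). The sample messages are hypothetical: only the prefix and the marker substring come from this page, and the rest of each message, including the threshold of 20, is an assumption made for illustration.

```dart
// Standalone copy of the documented check, so this sketch runs by itself.
bool isOcrRequired(Object error) {
  final message = error.toString();
  return message.contains('PDF text extraction returned fewer than') &&
      message.contains('scanned/image-only');
}

// Hypothetical messages. The real wording produced by the parser may differ;
// only the shared prefix and the 'scanned/image-only' marker are taken from
// the documentation.
const scannedMessage =
    'PDF text extraction returned fewer than 20 non-whitespace characters: '
    'document appears to be scanned/image-only';
const mixedMessage =
    'PDF text extraction returned fewer than 20 non-whitespace characters: '
    'document appears to be scanned/image-only; some pages failed to extract';
const allFailedMessage =
    'PDF text extraction returned fewer than 20 non-whitespace characters: '
    'every page failed to extract';

void main() {
  print(isOcrRequired(scannedMessage)); // true: marker present
  print(isOcrRequired(mixedMessage)); // true: marker still present
  print(isOcrRequired(allFailedMessage)); // false: shared prefix only
}
```

Because the error is converted with toString(), the same result holds when the message arrives wrapped in an exception object, as long as the exception's string form includes the message text.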

Implementation

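static bool isOcrRequiredPdfExtractionError(Object error) {
  final message = error.toString();
  // Shared prefix: present in all three below-threshold cases.
  return message.contains('PDF text extraction returned fewer than') &&
      // Scanned-specific marker: present only when OCR can recover text.
      message.contains('scanned/image-only');
}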
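
A caller might use this predicate to decide whether to retry a failed extraction with OCR. The following is a minimal sketch under stated assumptions, not the package's actual API: extractWithOcrFallback, extractText, and runOcr are hypothetical names introduced for illustration, and the check is again a standalone copy so the example is self-contained.

```dart
// Standalone copy of the documented check.
bool isOcrRequired(Object error) {
  final message = error.toString();
  return message.contains('PDF text extraction returned fewer than') &&
      message.contains('scanned/image-only');
}

// Hypothetical helper: tries plain text extraction first and falls back to
// OCR only when the error carries the scanned-specific marker. Any other
// error (for example, every page failing to extract) is rethrown unchanged.
Future<String> extractWithOcrFallback(
  String path, {
  required Future<String> Function(String path) extractText,
  required Future<String> Function(String path) runOcr,
}) async {
  try {
    return await extractText(path);
  } catch (error) {
    if (isOcrRequired(error)) {
      return await runOcr(path);
    }
    rethrow;
  }
}

Future<void> main() async {
  // Stub callbacks stand in for a real extractor and OCR engine.
  final text = await extractWithOcrFallback(
    'scan.pdf',
    extractText: (path) async => throw StateError(
        'PDF text extraction returned fewer than 20 non-whitespace '
        'characters (scanned/image-only)'),
    runOcr: (path) async => 'text recovered by OCR',
  );
  print(text); // text recovered by OCR
}
```

Passing the extractor and OCR step as callbacks keeps the sketch independent of any particular PDF or OCR library.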