TextPage class
Represents a parsed text page, equivalent to PyMuPDF's fitz.TextPage.
Extracts text from the page content stream using the PDF text operators. Supports output in multiple formats:
- Plain text
- Blocks (with bounding boxes)
- Words (with bounding boxes)
- Dict (nested structure)
- HTML
- XHTML
- XML
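As a rough illustration of how the four String-returning formats might be selected at run time, the sketch below dispatches on an enum. `MarkupSource`, `OutputFormat`, `extractAs`, and `FakePage` are hypothetical names introduced here so the example compiles without the library; only the four method names (`extractText`, `extractHtml`, `extractXhtml`, `extractXml`) come from this page.

```dart
// Hypothetical stand-in that mirrors four of the documented TextPage
// methods, so the sketch compiles on its own. In real code, a TextPage
// would play this role.
abstract class MarkupSource {
  String extractText();
  String extractHtml();
  String extractXhtml();
  String extractXml();
}

/// The four String-returning output formats from the list above.
enum OutputFormat { text, html, xhtml, xml }

/// Calls the extraction method that matches [format].
String extractAs(MarkupSource page, OutputFormat format) {
  switch (format) {
    case OutputFormat.text:
      return page.extractText();
    case OutputFormat.html:
      return page.extractHtml();
    case OutputFormat.xhtml:
      return page.extractXhtml();
    case OutputFormat.xml:
      return page.extractXml();
  }
}

// Fake page used only to exercise the sketch. The returned strings are
// placeholders, not the library's real output.
class FakePage implements MarkupSource {
  @override
  String extractText() => 'plain';
  @override
  String extractHtml() => 'html';
  @override
  String extractXhtml() => 'xhtml';
  @override
  String extractXml() => 'xml';
}

void main() {
  final page = FakePage();
  for (final format in OutputFormat.values) {
    print('${format.name}: ${extractAs(page, format)}');
  }
}
```

The blocks, words, and dict formats return structured types (`List<TextBlock>`, `List<TextWord>`, `TextDict`) rather than `String`, so they are left out of this dispatcher.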
Constructors
- TextPage.fromPage(Page page, PdfParser parser) - Create a TextPage by parsing the page's content stream. (factory)
Properties
Methods
- debugRuns() → List<Map<String, dynamic>> - Debug: dump text runs with positions.
- extractBlocks() → List<TextBlock> - Extract text as blocks with bounding boxes.
- extractDict({bool raw = false}) → TextDict - Extract text as a detailed dictionary.
- extractHtml() → String - Extract text as HTML.
- extractText() → String - Extract plain text from the page.
- extractWords() → List<TextWord> - Extract words with bounding boxes.
- extractXhtml() → String - Extract text as XHTML.
- extractXml() → String - Extract text as XML.
- noSuchMethod(Invocation invocation) → dynamic - Invoked when a nonexistent method or property is accessed. (inherited)
- toString() → String - A string representation of this object. (inherited)
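Since `debugRuns()` is documented only as returning `List<Map<String, dynamic>>`, a caller inspecting its output cannot rely on any particular keys. The following is a minimal sketch of a key-agnostic formatter; `describeRuns` is a hypothetical helper, and the sample map in `main` uses made-up keys purely for illustration.

```dart
/// Formats one line per run, listing key=value pairs in sorted key order.
/// No particular keys are assumed, because this page does not document them.
String describeRuns(List<Map<String, dynamic>> runs) {
  final buffer = StringBuffer();
  for (var i = 0; i < runs.length; i++) {
    final keys = runs[i].keys.toList()..sort();
    final fields = keys.map((k) => '$k=${runs[i][k]}').join(', ');
    buffer.writeln('run $i: $fields');
  }
  return buffer.toString();
}

void main() {
  // Hypothetical sample data; real entries would come from
  // TextPage.debugRuns(), whose keys may differ.
  final runs = <Map<String, dynamic>>[
    {'text': 'Hello', 'page': 1},
  ];
  print(describeRuns(runs));
}
```

Sorting the keys keeps the output stable between runs, which makes it easier to diff two dumps of the same page.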
Operators
- operator ==(Object other) → bool - The equality operator. (inherited)