docx_to_markdown library
Converts DOCX (OOXML) documents to Markdown with configurable output.
This library parses .docx files in memory and produces Markdown text.
Use it when you need to migrate Word documents to Markdown-based systems
(wikis, static site generators, documentation platforms).
Capabilities
- Content extraction: paragraphs, headings, nested lists, tables
- Image handling: extracts embedded images to a directory or asset sink
- Text formatting: bold, italic, strikethrough, underline, inline code
- Extensibility: hooks let you rewrite links, transform blocks, or inject custom logic
Quick Start
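import 'dart:io';

import 'package:docx_to_markdown/docx_to_markdown.dart';

// Read the whole .docx into memory, then convert it.
final bytes = File('document.docx').readAsBytesSync();
final converter = DocxConverter(bytes);
final markdown = await converter.convert(); // call from an async function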
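For migrations that involve more than one file, the same two calls can be wrapped in a loop. The following is a minimal sketch, not part of the library: `convertDirectory` and `markdownPathFor` are hypothetical helper names, and the code assumes `convert()` completes with a `String`.

```dart
import 'dart:io';

import 'package:docx_to_markdown/docx_to_markdown.dart';

/// Hypothetical helper: maps `report.docx` to `report.md`.
String markdownPathFor(String docxPath) =>
    '${docxPath.substring(0, docxPath.length - '.docx'.length)}.md';

/// Hypothetical helper: converts every .docx file directly inside
/// [sourceDir] and writes a .md file next to each one.
Future<void> convertDirectory(String sourceDir) async {
  final files = Directory(sourceDir).listSync().whereType<File>();
  for (final file in files) {
    if (!file.path.toLowerCase().endsWith('.docx')) continue;
    final bytes = file.readAsBytesSync();
    // Assumption: convert() completes with the Markdown text as a String.
    final markdown = await DocxConverter(bytes).convert();
    File(markdownPathFor(file.path)).writeAsStringSync(markdown);
  }
}

Future<void> main(List<String> args) async {
  await convertDirectory(args.isEmpty ? '.' : args.first);
}
```

Each file gets its own DocxConverter because the constructor takes the bytes of a single document.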
Customization
For non-default behavior, pass a DocxToMarkdownConfig:
final config = DocxToMarkdownConfig(
  flavor: MarkdownFlavor.gfm,
  extractImages: true,
  underlineMode: UnderlineMode.html,
);
final converter = DocxConverter(bytes, config: config);
final markdown = await converter.convert(imageOutputDirectory: 'images/');
See also:
- DocxConverter - the main conversion entry point
- DocxToMarkdownConfig - all configuration options
- Document - the intermediate representation if you need lower-level access
Classes
- Block
- A block-level IR node (paragraph, heading, list, table, etc.).
- CodeBlock
- A fenced code block with optional language hint.
- CodeInline
- Inline code (monospace text).
- ColorInline
- Colored text (Word w:color).
- DefinitionListBlock
- A definition list (description list) mapping terms to their definitions.
- DefinitionListItem
- A single entry in a DefinitionListBlock: a term and its definition body.
- DocNode
- Base class for all IR nodes.
- Document
- The root IR node representing a complete DOCX document.
- DocumentMetadata
- Structured document metadata parsed from docProps/core.xml and docProps/custom.xml.
- DocWarning
- A non-fatal issue detected during DOCX parsing or conversion.
- DocxConverter
- Parses DOCX bytes and renders Markdown output.
- DocxImageAsset
- An embedded DOCX image exported during conversion.
- DocxToMarkdownConfig
- Complete configuration for DOCX to Markdown conversion.
- DocxToMarkdownHooks
- Extension points for customizing parsing and rendering behavior.
- EmphInline
- Italic/emphasized text.
- FootnoteDefinition
- The content of a footnote or endnote.
- FootnoteRefInline
- A reference to a footnote definition.
- HeadingBlock
- A heading with a level and inline content.
- HighlightInline
- Highlighted text (Word w:highlight).
- HookContext
- Contextual information passed to hook functions during parsing.
- HorizontalRuleBlock
- A horizontal rule (thematic break).
- HtmlBlock
- Raw HTML to be passed through to output.
- HtmlInline
- Raw inline HTML to be passed through to output.
- ImageBlock
- A standalone image.
- ImageInline
- An inline image.
- Inline
- An inline IR node (text, link, formatting span, etc.).
- LineBreakInline
- A hard line break within a paragraph.
- LinkInline
- A hyperlink with text content.
- ListBlock
- A list (ordered or unordered) containing list items.
- ListItem
- A single item in a list.
- MathBlock
- A math equation block.
- NodeMeta
- Debug metadata attached to IR nodes.
- PageBreakBlock
- An explicit page break from the source document.
- ParagraphBlock
- A paragraph containing inline content.
- ParagraphStyleOverride
- Maps a Word paragraph style to a specific Markdown block type.
- QuoteBlock
- A block quote containing nested blocks.
- SoftBreakInline
- A soft line break (typically rendered as a space or ignored).
- SourceLocation
- Points to a location within the DOCX package for debugging and warnings.
- StrikeInline
- Strikethrough text.
- StrongInline
- Bold/strong text.
- SubInline
- Subscript text.
- SupInline
- Superscript text.
- TableBlock
- A table with rows, cells, and optional column alignments.
- TableCell
- A single cell in a table row.
- TableGrid
- The row and cell structure of a table.
- TableRow
- A single row in a table.
- TextInline
- Plain text content.
- UnderlineInline
- Underlined text.
Enums
- DefinitionListMode
- Controls how definition lists (Definition Term/Definition paragraph styles) are rendered.
- DocSeverity
- Severity levels for diagnostic messages emitted during parsing.
- HighlightMode
- Controls how highlighted runs (w:highlight) are rendered.
- ImageSizeMode
- Controls how image dimensions are encoded in Markdown output.
- LineBreakStyle
- Controls how DOCX line breaks (w:br) are rendered in Markdown.
- ListNumberFormat
- The numbering symbol style for an ordered ListBlock.
- ListTightness
- Controls spacing between list items.
- MarkdownFlavor
- The Markdown dialect to target for output.
- MetadataMode
- Controls whether document metadata (docProps/core.xml and docProps/custom.xml) is rendered ahead of the document body.
- OrderedListMarker
- Controls the symbol style used for ordered list markers.
- OrderedListNumbering
- Controls how ordered list numbers are rendered in Markdown output.
- PageBreakMode
- Controls how explicit page breaks (w:br w:type="page") are rendered.
- TableAlign
- Column alignment for Markdown tables.
- TableMode
- Controls how DOCX tables are rendered in the output.
- TextColorMode
- Controls how colored runs (w:color) are rendered.
- TrackChangesMode
- Controls how Word tracked changes (w:ins/w:del) are converted.
- UnderlineMode
- Controls how underlined text is rendered in output.
- UnknownElementPolicy
- Determines how unrecognized OOXML elements are handled during parsing.
Extensions
- InlinePlainText on Iterable<Inline>
- Extension for extracting plain text from inline nodes.
Typedefs
- BlockTransformer = Block? Function(Block block, HookContext ctx)
- Transforms or filters a parsed block before rendering.
- DocxImageAssetSink = FutureOr<String?> Function(DocxImageAsset asset)
- Receives an embedded image and returns the Markdown image source to use.
- ImageRewriter = String Function(String src, [HookContext? ctx])
- Rewrites image source paths before rendering.
- InlineTransformer = Inline? Function(Inline inlineNode, HookContext ctx)
- Transforms or filters a parsed inline element before rendering.
- LinkRewriter = String Function(String url, [HookContext? ctx])
- Rewrites hyperlink URLs before rendering.
- OmmlToLatexConverter = String? Function(String ommlXml, [HookContext? ctx])
- Converts Office Math Markup Language (OMML) to LaTeX.
- WarningHandler = void Function(DocWarning warning, HookContext ctx)
- Callback invoked when the parser emits a non-fatal warning.
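The hook typedefs above are plain function types, so any function or closure with a matching signature can serve as a hook. The sketch below shows one possible LinkRewriter that resolves relative links against a base URL. To keep it runnable without the package, it declares local stand-ins for `HookContext` and `LinkRewriter`; `rebaseLinks` and the example URLs are hypothetical. How a rewriter is registered on DocxToMarkdownHooks is not covered here.

```dart
// Local stand-ins so this sketch runs without the package. In real code,
// import 'package:docx_to_markdown/docx_to_markdown.dart' and delete these
// two declarations so the library's own types are used instead.
class HookContext {}

typedef LinkRewriter = String Function(String url, [HookContext? ctx]);

/// Hypothetical factory: returns a LinkRewriter that resolves relative
/// links against [baseUrl] and leaves absolute URLs and in-page anchors
/// unchanged.
LinkRewriter rebaseLinks(String baseUrl) {
  final base = Uri.parse(baseUrl);
  return (String url, [HookContext? ctx]) {
    if (url.startsWith('#')) return url; // in-page anchor
    final parsed = Uri.tryParse(url);
    // Unparsable input and URLs that already carry a scheme pass through.
    if (parsed == null || parsed.hasScheme) return url;
    return base.resolveUri(parsed).toString();
  };
}

void main() {
  final rewrite = rebaseLinks('https://docs.example.com/guide/');
  print(rewrite('intro.html')); // https://docs.example.com/guide/intro.html
  print(rewrite('https://dart.dev')); // https://dart.dev
  print(rewrite('#section-2')); // #section-2
}
```

ImageRewriter has the same shape (`String Function(String src, [HookContext? ctx])`), so the same closure pattern applies to image sources.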
Exceptions / Errors
- DocxPackageException
- Thrown when the .docx package is missing critical parts or cannot be read.
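A caller that accepts arbitrary input files will usually want to separate a bad package from other failures. The following is a minimal sketch under two assumptions: that DocxPackageException can surface from either the DocxConverter constructor or `convert()` (so both sit inside the `try`), and that `convert()` completes with the Markdown text. The exit codes are arbitrary choices for illustration.

```dart
import 'dart:io';

import 'package:docx_to_markdown/docx_to_markdown.dart';

Future<void> main(List<String> args) async {
  if (args.isEmpty) {
    stderr.writeln('usage: dart run convert.dart <file.docx>');
    exitCode = 64;
    return;
  }
  try {
    final bytes = File(args.first).readAsBytesSync();
    // Assumption: the exception may come from construction or conversion.
    final markdown = await DocxConverter(bytes).convert();
    stdout.write(markdown);
  } on DocxPackageException catch (e) {
    // The file was read, but it is not a usable .docx package.
    stderr.writeln('Not a readable .docx package: $e');
    exitCode = 65;
  } on FileSystemException catch (e) {
    // The file itself could not be opened or read.
    stderr.writeln('Could not read input file: ${e.message}');
    exitCode = 66;
  }
}
```

Non-fatal problems are reported separately as DocWarning values, so this handler only covers the case where conversion cannot proceed at all.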