docx_to_markdown library

Converts DOCX (OOXML) documents to Markdown with configurable output.

This library parses .docx files in memory and produces Markdown text. Use it when you need to migrate Word documents to Markdown-based systems (wikis, static site generators, documentation platforms).

Capabilities

  • Content extraction: paragraphs, headings, nested lists, tables
  • Image handling: extracts embedded images to a directory or asset sink
  • Text formatting: bold, italic, strikethrough, underline, inline code
  • Extensibility: hooks let you rewrite links, transform blocks, or inject custom logic

Quick Start

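import 'dart:io';
import 'package:docx_to_markdown/docx_to_markdown.dart';

Future<void> main() async {
  // The converter works on raw bytes, so read the whole file into memory.
  final bytes = File('document.docx').readAsBytesSync();
  final converter = DocxConverter(bytes);
  final markdown = await converter.convert();
}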

Customization

For non-default behavior, pass a DocxToMarkdownConfig:

// `bytes` holds the raw .docx contents, read as in Quick Start.
final config = DocxToMarkdownConfig(
  flavor: MarkdownFlavor.gfm, // target GitHub Flavored Markdown
  extractImages: true,
  underlineMode: UnderlineMode.html,
);
final converter = DocxConverter(bytes, config: config);
final markdown = await converter.convert(imageOutputDirectory: 'images/');

Classes

Block
A block-level IR node (paragraph, heading, list, table, etc.).
CodeBlock
A fenced code block with optional language hint.
CodeInline
Inline code (monospace text).
ColorInline
Colored text (Word w:color).
DefinitionListBlock
A definition list (description list) mapping terms to their definitions.
DefinitionListItem
A single entry in a DefinitionListBlock: a term and its definition body.
DocNode
Base class for all IR nodes.
Document
The root IR node representing a complete DOCX document.
DocumentMetadata
Structured document metadata parsed from docProps/core.xml and docProps/custom.xml.
DocWarning
A non-fatal issue detected during DOCX parsing or conversion.
DocxConverter
Parses DOCX bytes and renders Markdown output.
DocxImageAsset
An embedded DOCX image exported during conversion.
DocxToMarkdownConfig
Complete configuration for DOCX to Markdown conversion.
DocxToMarkdownHooks
Extension points for customizing parsing and rendering behavior.
EmphInline
Italic/emphasized text.
FootnoteDefinition
The content of a footnote or endnote.
FootnoteRefInline
A reference to a footnote definition.
HeadingBlock
A heading with a level and inline content.
HighlightInline
Highlighted text (Word w:highlight).
HookContext
Contextual information passed to hook functions during parsing.
HorizontalRuleBlock
A horizontal rule (thematic break).
HtmlBlock
Raw HTML to be passed through to output.
HtmlInline
Raw inline HTML to be passed through to output.
ImageBlock
A standalone image.
ImageInline
An inline image.
Inline
An inline IR node (text, link, formatting span, etc.).
LineBreakInline
A hard line break within a paragraph.
LinkInline
A hyperlink with text content.
ListBlock
A list (ordered or unordered) containing list items.
ListItem
A single item in a list.
MathBlock
A math equation block.
NodeMeta
Debug metadata attached to IR nodes.
PageBreakBlock
An explicit page break from the source document.
ParagraphBlock
A paragraph containing inline content.
ParagraphStyleOverride
Maps a Word paragraph style to a specific Markdown block type.
QuoteBlock
A block quote containing nested blocks.
SoftBreakInline
A soft line break (typically rendered as a space or ignored).
SourceLocation
Points to a location within the DOCX package for debugging and warnings.
StrikeInline
Strikethrough text.
StrongInline
Bold/strong text.
SubInline
Subscript text.
SupInline
Superscript text.
TableBlock
A table with rows, cells, and optional column alignments.
TableCell
A single cell in a table row.
TableGrid
The row and cell structure of a table.
TableRow
A single row in a table.
TextInline
Plain text content.
UnderlineInline
Underlined text.

Enums

DefinitionListMode
Controls how definition lists (Definition Term / Definition paragraph styles) are rendered.
DocSeverity
Severity levels for diagnostic messages emitted during parsing.
HighlightMode
Controls how highlighted runs (w:highlight) are rendered.
ImageSizeMode
Controls how image dimensions are encoded in Markdown output.
LineBreakStyle
Controls how DOCX line breaks (w:br) are rendered in Markdown.
ListNumberFormat
The numbering symbol style for an ordered ListBlock.
ListTightness
Controls spacing between list items.
MarkdownFlavor
The Markdown dialect to target for output.
MetadataMode
Controls whether document metadata (docProps/core.xml and docProps/custom.xml) is rendered ahead of the document body.
OrderedListMarker
Controls the symbol style used for ordered list markers.
OrderedListNumbering
Controls how ordered list numbers are rendered in Markdown output.
PageBreakMode
Controls how explicit page breaks (w:br w:type="page") are rendered.
TableAlign
Column alignment for Markdown tables.
TableMode
Controls how DOCX tables are rendered in the output.
TextColorMode
Controls how colored runs (w:color) are rendered.
TrackChangesMode
Controls how Word tracked changes (w:ins / w:del) are converted.
UnderlineMode
Controls how underlined text is rendered in output.
UnknownElementPolicy
Determines how unrecognized OOXML elements are handled during parsing.

Extensions

InlinePlainText on Iterable<Inline>
Extension for extracting plain text from inline nodes.

Typedefs

BlockTransformer = Block? Function(Block block, HookContext ctx)
Transforms or filters a parsed block before rendering.
DocxImageAssetSink = FutureOr<String?> Function(DocxImageAsset asset)
Receives an embedded image and returns the Markdown image source to use.
ImageRewriter = String Function(String src, [HookContext? ctx])
Rewrites image source paths before rendering.
InlineTransformer = Inline? Function(Inline inlineNode, HookContext ctx)
Transforms or filters a parsed inline element before rendering.
LinkRewriter = String Function(String url, [HookContext? ctx])
Rewrites hyperlink URLs before rendering.
OmmlToLatexConverter = String? Function(String ommlXml, [HookContext? ctx])
Converts Office Math Markup Language (OMML) to LaTeX.
WarningHandler = void Function(DocWarning warning, HookContext ctx)
Callback invoked when the parser emits a non-fatal warning.
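
The LinkRewriter and ImageRewriter typedefs above are plain function types, so any function with a matching shape can serve as a rewriter. The following is a minimal sketch, not the library's own code: HookContext is declared locally as an empty stand-in so the example compiles without the package, the two typedefs are copied from the listing above, and both rewrite rules (mapping .docx links to .md, prefixing images with /assets/) are hypothetical. How a rewriter is registered on DocxToMarkdownHooks is not shown on this page and is left out.

```dart
// Stand-in so the sketch compiles on its own. In real code this type comes
// from package:docx_to_markdown; its members are not shown on this page.
class HookContext {}

// Same shapes as the LinkRewriter and ImageRewriter typedefs listed above.
typedef LinkRewriter = String Function(String url, [HookContext? ctx]);
typedef ImageRewriter = String Function(String src, [HookContext? ctx]);

/// Hypothetical rule: relative links to Word files point at the converted
/// Markdown file instead. Everything else is returned unchanged.
String rewriteDocxLink(String url, [HookContext? ctx]) {
  // Anything starting with a URI scheme (https:, mailto:, ...) is left alone.
  if (RegExp(r'^[a-zA-Z][a-zA-Z0-9+.-]*:').hasMatch(url)) return url;
  // Split off a trailing "?query" or "#fragment" so only the path changes.
  final cut = url.indexOf(RegExp(r'[?#]'));
  final path = cut == -1 ? url : url.substring(0, cut);
  final suffix = cut == -1 ? '' : url.substring(cut);
  if (!path.toLowerCase().endsWith('.docx')) return url;
  return '${path.substring(0, path.length - '.docx'.length)}.md$suffix';
}

/// Hypothetical rule: serve extracted images from a site-absolute folder.
String rewriteImageSrc(String src, [HookContext? ctx]) {
  if (src.startsWith('/') || src.contains('://')) return src;
  return '/assets/$src';
}

void main() {
  // Both functions are assignable to the typedefs without adapters.
  final LinkRewriter links = rewriteDocxLink;
  final ImageRewriter images = rewriteImageSrc;
  print(links('guide.docx#setup')); // guide.md#setup
  print(images('images/figure1.png')); // /assets/images/figure1.png
}
```

Because the context parameter is optional and positional, the same function can be called with or without a HookContext, which keeps rewriters easy to unit test in isolation.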

Exceptions / Errors

DocxPackageException
Thrown when the .docx package is missing critical parts or cannot be read.
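
A .docx file is a ZIP package, so some unreadable inputs can be rejected before they reach the converter. The sketch below is an optional pre-flight check and is not part of the library: looksLikeZip is a hypothetical helper that only tests for the ZIP local-file-header signature at the start of the buffer. It is a heuristic, not a validation of the package contents.

```dart
/// Returns true when [bytes] starts with the ZIP local-file-header
/// signature "PK\x03\x04", as a normally produced .docx does.
bool looksLikeZip(List<int> bytes) {
  const signature = [0x50, 0x4B, 0x03, 0x04];
  if (bytes.length < signature.length) return false;
  for (var i = 0; i < signature.length; i++) {
    if (bytes[i] != signature[i]) return false;
  }
  return true;
}

void main() {
  // First bytes of a ZIP archive versus a legacy OLE2 .doc file.
  const zipLike = [0x50, 0x4B, 0x03, 0x04, 0x14, 0x00];
  const legacyDoc = [0xD0, 0xCF, 0x11, 0xE0];
  print(looksLikeZip(zipLike)); // true
  print(looksLikeZip(legacyDoc)); // false
}
```

A buffer that passes this check can still be a ZIP without the parts a DOCX needs, so calls to convert() should still be prepared to handle DocxPackageException.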