rumil_parsers 0.10.0

Format parsers built on Rumil: JSON, CSV, XML, TOML, YAML, Proto3, HCL, and CommonMark Markdown, plus typed AST decoders with ObjectAccessor pattern.

0.10.0

The value layer is now stack-safe (bounded by available memory rather than call-stack depth), and serializers can stream. The parser interpreter was already stack-safe; this release extends that guarantee to the operations that run after a parse (native conversion, serialization, and composed decoding). Those had been recursive on nesting depth, so a document that parsed fine could overflow the stack on the next step. All converters, serializers, and composite decoders are now iterative. The changes are additive on the public surface; the version jumps 0.8.1 → 0.10.0 to rejoin the rumil family in lockstep.

Fixed: value-layer stack safety

  • Native converters (jsonToNative, yamlToNative/resolveAnchors, tomlToNative, hclToNative, xmlToNative) now walk the AST over an explicit worklist instead of recursing, so converting a deeply-nested document cannot overflow the Dart call stack.
  • Serializers (serializeJson, serializeYaml, serializeToml, serializeXml, serializeHcl) are likewise iterative.
  • Composite decoders (jsonListOf/jsonMapOf/nullable and the YAML/TOML equivalents) drain their nesting over a worklist. The one remaining host-recursion boundary is the user build-callback of fromJsonObject/fromYamlMapping/fromTomlTable (and .map), which recurses once per schema level, not per value level, and cannot be trampolined without a breaking AstDecoder.decode signature change, so it is documented as a known boundary.
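The worklist idea behind these fixes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's implementation: Node, Leaf, ListNode, and toNative are hypothetical stand-ins for an AST and its native converter, and the code assumes Dart 3 (records, patterns, sealed classes).

```dart
// Toy AST standing in for a parsed document (hypothetical, not rumil_parsers' types).
sealed class Node {}

final class Leaf extends Node {
  final Object? value;
  Leaf(this.value);
}

final class ListNode extends Node {
  final List<Node> items;
  ListNode(this.items);
}

/// Converts [root] to plain Dart values using an explicit worklist instead of
/// recursion, so nesting depth is limited by heap size, not the call stack.
Object? toNative(Node root) {
  Object? result;
  // Each entry pairs a node with the callback that stores its converted value.
  final work = <(Node, void Function(Object?))>[(root, (v) => result = v)];
  while (work.isNotEmpty) {
    final (node, emit) = work.removeLast();
    switch (node) {
      case Leaf(:final value):
        emit(value);
      case ListNode(:final items):
        // Hand the container to the parent now; children fill it in later.
        final out = List<Object?>.filled(items.length, null);
        emit(out);
        for (var i = 0; i < items.length; i++) {
          final slot = i;
          work.add((items[i], (v) => out[slot] = v));
        }
    }
  }
  return result;
}

void main() {
  // Build a list nested 100,000 levels deep.
  Node doc = Leaf(42);
  for (var i = 0; i < 100000; i++) {
    doc = ListNode([doc]);
  }
  Object? plain = toNative(doc);
  // Unwrap iteratively; printing the nested list itself would recurse.
  var depth = 0;
  while (plain is List) {
    plain = plain.first;
    depth++;
  }
  print('depth=$depth leaf=$plain'); // depth=100000 leaf=42
}
```

A recursive converter over the same input would need one call frame per nesting level; here the pending work lives in a heap-allocated list.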

Added: streaming serialization

  • serialize{Json,Yaml,Toml,Xml,Hcl,HclValue}To(StringSink, …): each serializer now has a streaming primitive that writes into a StringSink; the existing String-returning functions are byte-for-byte-identical wrappers over it. This decouples stack-safety from output size: an indented pretty-printer emits indent × depth whitespace per level (Θ(depth²) total, inherent to pretty-printing, as in jq or JSON.stringify(_, null, 2)), and streaming to a sink keeps peak memory bounded even when the total output is large.
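The relationship between a streaming primitive and its String-returning wrapper can be illustrated with dart:core alone. The sketch below is an assumption-laden toy, not the package's serializers: writeIntsTo, intsToString, and CountingSink are hypothetical names, and only the StringSink and StringBuffer types come from the Dart SDK.

```dart
/// Streaming primitive: writes [values] as a JSON-style array into [sink].
/// Hypothetical helper; it only illustrates the sink-first design.
void writeIntsTo(StringSink sink, Iterable<int> values) {
  sink.write('[');
  var first = true;
  for (final v in values) {
    if (!first) sink.write(',');
    sink.write(v);
    first = false;
  }
  sink.write(']');
}

/// String-returning wrapper: same output, collected in a StringBuffer.
String intsToString(Iterable<int> values) {
  final buffer = StringBuffer();
  writeIntsTo(buffer, values);
  return buffer.toString();
}

/// A sink that discards text and only counts characters, so peak memory
/// stays flat no matter how much output is produced.
final class CountingSink implements StringSink {
  int length = 0;

  @override
  void write(Object? object) => length += '$object'.length;

  @override
  void writeAll(Iterable<dynamic> objects, [String separator = '']) {
    var first = true;
    for (final o in objects) {
      if (!first) write(separator);
      write(o);
      first = false;
    }
  }

  @override
  void writeCharCode(int charCode) =>
      length += String.fromCharCode(charCode).length;

  @override
  void writeln([Object? object = '']) => write('$object\n');
}

void main() {
  print(intsToString([1, 2, 3])); // [1,2,3]

  // Stream a large sequence without ever holding the full text in memory.
  final counter = CountingSink();
  writeIntsTo(counter, Iterable<int>.generate(1000000));
  print(counter.length);
}
```

Any StringSink works as a target, including an IOSink such as stdout from dart:io, which is what makes the sink-taking form the more general primitive.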

0.8.1

Two additive parsers: parseMarkdownWithFrontmatter and parseNdJson. Motivated by lambé's input pipeline (markdown frontmatter currently leaks into document body; NDJSON line splitting was hand-rolled downstream) and by rem's markdownWithFrontmatter helper, which can collapse to a thin re-export of the upstream API.

Added

  • parseMarkdownWithFrontmatter(String input) → Result<ParseError, MarkdownDocument>. Parses Markdown that may have a leading YAML frontmatter block delimited by --- lines. Returns a MarkdownDocument carrying both the optional YamlDocument frontmatter and the MdDocument body. Detection rules: the opening --- must sit at offset 0 and be followed by a newline; the closing fence is the first line containing exactly ---; CRLF is tolerated; an unclosed block falls back to plain Markdown without raising an error; an empty block (---\n---\n) yields YamlNull. YAML parse errors inside a well-formed block surface as the result's failure. parseMarkdown is byte-unchanged.

  • parseNdJson(String input, {NdJsonConfig config}) → Result<ParseError, List<JsonValue>>. Parses newline-delimited JSON (NDJSON / JSON Lines). A \r immediately before \n is stripped so CRLF input parses identically to LF. Per-line errors are accumulated as Partial rather than aborting the stream, so callers see every parsed value and every error in one pass. Error Locations reference the original input, with line/column looked up in O(log n) via the new rumil.LineIndex. parseJson is byte-unchanged.

    Strict by default. Blank lines are parse errors, matching jsonlines.org. The opt-in NdJsonConfig(lenient: true) skips blank lines for log-file consumers and stanza-style inputs. Strict mode is the right default: silently tolerating blank lines is the kind of choice that makes one parser quietly different from another and lets bugs in upstream producers go unnoticed.
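The detection and line-handling rules described for both parsers can be sketched with the Dart SDK alone. This is an illustration, not the package's code: splitFrontmatter and parseLines are hypothetical helpers, the YAML block is returned as raw text rather than parsed, jsonDecode from dart:convert stands in for the JSON parser, and treating whitespace-only lines as blank is an assumption.

```dart
import 'dart:convert';

/// Splits [input] into (frontmatter, body); frontmatter is null when no
/// well-formed block is found. Hypothetical helper, not the package API.
(String?, String) splitFrontmatter(String input) {
  // Opening fence must sit at offset 0 and be followed by LF or CRLF.
  final int start;
  if (input.startsWith('---\n')) {
    start = 4;
  } else if (input.startsWith('---\r\n')) {
    start = 5;
  } else {
    return (null, input);
  }
  var lineStart = start;
  while (true) {
    final nl = input.indexOf('\n', lineStart);
    final lineEnd = nl == -1 ? input.length : nl;
    var line = input.substring(lineStart, lineEnd);
    if (line.endsWith('\r')) line = line.substring(0, line.length - 1);
    // Closing fence: the first line that is exactly '---'.
    if (line == '---') {
      final body = nl == -1 ? '' : input.substring(nl + 1);
      return (input.substring(start, lineStart), body);
    }
    // Unclosed block: fall back to treating everything as Markdown.
    if (nl == -1) return (null, input);
    lineStart = nl + 1;
  }
}

/// Parses NDJSON, collecting values and per-line errors in one pass.
({List<Object?> values, List<String> errors}) parseLines(
  String input, {
  bool lenient = false,
}) {
  final values = <Object?>[];
  final errors = <String>[];
  final lines = input.split('\n');
  // A trailing newline leaves one empty final segment; it is not a blank line.
  final count = lines.last.isEmpty ? lines.length - 1 : lines.length;
  for (var i = 0; i < count; i++) {
    var line = lines[i];
    // Drop a '\r' left over from a CRLF line ending.
    if (line.endsWith('\r')) line = line.substring(0, line.length - 1);
    if (line.trim().isEmpty) {
      // Strict mode reports blank lines; lenient mode skips them.
      if (!lenient) errors.add('line ${i + 1}: blank line');
      continue;
    }
    try {
      values.add(jsonDecode(line));
    } on FormatException catch (e) {
      errors.add('line ${i + 1}: ${e.message}');
    }
  }
  return (values: values, errors: errors);
}

void main() {
  final (yaml, body) = splitFrontmatter('---\r\ntitle: x\r\n---\r\n# Hi\n');
  print(yaml == 'title: x\r\n' && body == '# Hi\n'); // true
  print(splitFrontmatter('---\nunclosed\n').$1); // null

  final strict = parseLines('{"a":1}\r\n\n[2\n3\n');
  print('${strict.values} ${strict.errors.length}'); // [{a: 1}, 3] 2
  final loose = parseLines('{"a":1}\r\n\n[2\n3\n', lenient: true);
  print(loose.errors.length); // 1
}
```

In the strict run the blank second line and the malformed third line are both reported, yet the valid first and fourth lines are still returned, which mirrors the accumulate-rather-than-abort behavior described above.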

0.8.0

JSON parser, principled and fast. The HCL number AST follows the same split. Five logical chunks ship together: the HCL decoder fix originally scoped as 0.7.1, a JSON AST split (JsonNumber → JsonInt | JsonDouble), the matching HCL AST split (HclNumber → HclInt | HclDouble), a JSON parser perf overhaul, and two latent correctness fixes (common.floatingPoint precision, YAML integer overflow).

Changed (breaking)

  • JsonNumber is now a sealed sum of JsonInt(int) and JsonDouble(double). The previous single-JsonNumber(double) representation flattened integer-shaped and float-shaped tokens at the AST layer, silently losing precision for integers above 2^53 and denying downstream consumers the type discrimination they need to specialize integer-vs-float paths.

    The new shape matches the discrimination already present in dart:convert (where jsonDecode returns int or double based on token shape), serde_json's Number enum (PosInt/NegInt/Float), simdjson's number_type (signed_integer/unsigned_integer/floating_point_number), and Jackson's NumericNode hierarchy. Pattern matching on JsonNumber becomes pattern matching on JsonInt or JsonDouble. Equality across the variants is false: JsonInt(1) != JsonDouble(1.0).

    Big integers exceeding Dart's int range fall back to JsonDouble, matching dart:convert's rule. Adding an explicit JsonBigInt variant is reserved for a future release if real consumers need it.

    Round-trip fidelity is improved as a side effect: parseJson('1.0') now serializes back as '1.0' rather than '1'. The source token shape is preserved.

    Decoders are tolerant of either variant — jsonInt.decode(JsonDouble) narrows via value.toInt(), jsonDouble.decode(JsonInt) widens via value.toDouble(). Documented on each decoder.

  • HclNumber is now a sealed sum of HclInt(int) and HclDouble(double), mirroring the JSON AST split. The previous single-HclNumber(num) representation forced consumers to dispatch on value is int at every read; the new shape preserves the discrimination at parse time. Integer-shaped tokens that overflow Dart's int fall back to HclDouble, matching JSON's rule. Equality across variants is false. Pattern matching on HclNumber becomes pattern matching on HclInt or HclDouble. Round-trip preserves source token shape: 1 parses as HclInt(1) and serializes as '1'; 1.0 parses as HclDouble(1.0) and serializes as '1.0' (was '1' under the flattened representation).
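The shape of a sealed integer/double split, and what pattern matching over it looks like, can be sketched as follows. Num, NumInt, NumDouble, classify, and describe are hypothetical stand-ins that only mirror the description above; they are not the package's JsonInt/JsonDouble or HclInt/HclDouble classes. The sketch leans on jsonDecode from dart:convert for token classification and assumes the native Dart VM or AOT, where int and double are distinct types (on the web all numbers are doubles).

```dart
import 'dart:convert';

// Stand-in types mirroring a sealed int/double sum (hypothetical names).
sealed class Num {}

final class NumInt extends Num {
  final int value;
  NumInt(this.value);

  @override
  bool operator ==(Object other) => other is NumInt && other.value == value;

  @override
  int get hashCode => value.hashCode;
}

final class NumDouble extends Num {
  final double value;
  NumDouble(this.value);

  @override
  bool operator ==(Object other) => other is NumDouble && other.value == value;

  @override
  int get hashCode => value.hashCode;
}

/// Classifies a JSON number token by the type dart:convert produces for it.
Num classify(String token) {
  final decoded = jsonDecode(token);
  return switch (decoded) {
    int i => NumInt(i),
    double d => NumDouble(d),
    _ => throw ArgumentError('not a JSON number: $token'),
  };
}

/// Exhaustive match: the compiler checks that both variants are handled.
String describe(Num n) => switch (n) {
      NumInt(:final value) => 'int $value',
      NumDouble(:final value) => 'double $value',
    };

void main() {
  print(describe(classify('1'))); // int 1
  print(describe(classify('1.0'))); // double 1.0
  // Equality across variants is false, even for the same numeric value.
  print(classify('1') == classify('1.0')); // false
  // An integer-shaped token beyond the 64-bit range; per the rule described
  // above it is expected to come back as the double variant.
  print(describe(classify('9223372036854775808')));
}
```

Because the hierarchy is sealed, adding a third variant later (such as a big-integer case) would turn every non-exhaustive switch into a compile-time error rather than a silent fallthrough.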

Fixed

  • HCL decoder is now consistent across N=1 vs N≥2 same-labeled blocks. hclDocToNative previously returned a single block as a non-list ({...}) and multiple blocks as a list. Now blocks always return as lists, regardless of count, using the HclBlock discriminator already present in the AST. Attributes are unchanged. Consumers that pattern-matched on result['variable'] is Map for the N=1 case must switch to result['variable'] is List (always). The previous behavior threw away structural information from the parser AST and made common Terraform patterns (one terraform, one provider, single variable) require defensive shape checks.

  • common.floatingPoint() precision. The helper previously computed value * math.pow(10, exp) for tokens with an exponent; that multiplication rounded before assembly and dropped the smallest positive subnormal (5e-324) to 0.0. Now delegates to double.parse on the captured source slice, which uses the platform's correctly-rounded conversion. YAML inherits the fix automatically since it consumes floatingPoint().

  • YAML integer overflow. _yamlInteger previously called int.parse(...) (via common.signedInt()), which throws on tokens exceeding Dart's int range. Now uses int.tryParse + fallback to YamlFloat, matching JSON's big-integer rule. Affects YAML documents with very large integer literals (e.g. 2^63 or beyond).
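Both numeric fixes can be reproduced with the Dart SDK alone. The sketch below is illustrative only: assemble and parseIntegerToken are hypothetical helpers, not the package's internals, and the integer behavior shown assumes the native Dart VM or AOT, where int is a 64-bit value.

```dart
import 'dart:math' as math;

/// Old-style assembly: mantissa times a power of ten. The power itself is
/// rounded before the multiplication, so tiny values can collapse to zero.
double assemble(num mantissa, int exp) =>
    (mantissa * math.pow(10, exp)).toDouble();

/// Integer-shaped token: an int when it fits, otherwise a double fallback.
num parseIntegerToken(String token) =>
    int.tryParse(token) ?? double.parse(token);

void main() {
  // 10^-324 underflows to 0.0, so the product is 0.0 as well.
  print(assemble(5, -324)); // 0.0

  // Parsing the source slice directly keeps the smallest subnormal.
  print(double.parse('5e-324')); // 5e-324
  print(double.parse('5e-324') == double.minPositive); // true

  // In-range integers stay ints; out-of-range ones fall back to double
  // instead of throwing the way int.parse would.
  print(parseIntegerToken('42').runtimeType); // int
  print(parseIntegerToken('9223372036854775808').runtimeType); // double
}
```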

Performance

The JSON parser is now substantially faster on every workload, with the largest wins under Wasm where the JsonInt/JsonDouble split unlocks i64-vs-f64 specialization that the flattened representation forced into a single homogeneous f64 path.

Mean time per op across 100 measured iterations after 100 warmup iterations, Linux x86_64, Dart SDK 3.11.4. Each pass was run separately on a quiet system. The full table and per-byte MB/s are in BENCHMARKS.md.

| Workload | 0.7.0 AOT | 0.8.0 AOT | AOT speedup | 0.7.0 Wasm | 0.8.0 Wasm | Wasm speedup |
|---|---|---|---|---|---|---|
| integer_heavy | 162.1 ms | 154.5 ms | 1.05× | 86.8 ms | 64.7 ms | 1.34× |
| float_heavy | 189.2 ms | 179.5 ms | 1.05× | 96.0 ms | 76.9 ms | 1.25× |
| mixed | 1368 ms | 1115 ms | 1.23× | 609.4 ms | 430.3 ms | 1.42× |

Wins come from three changes: capture-based number parsing (one allocation per token instead of a per-character interpolation chain), capture-based string runs (one substring slice in the unescaped fast path instead of O(n) per-character allocations), and elimination of the redundant leading _ws in _lex (every token paid a leading skip that the previous token's trailing skip had already consumed). The combinator architecture's affinity for Wasm codegen surfaces in the Wasm column — the mixed workload composes all four optimizations (numbers, strings, dispatch, lex) and shows the largest relative win.
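The difference between per-character assembly and capture-based parsing can be shown on a plain digit run. This is a rough sketch of the general technique, not the parser's code: digitsByAppend, digitsByCapture, and isDigit are hypothetical names.

```dart
bool isDigit(int c) => c >= 0x30 && c <= 0x39; // '0'..'9'

/// Per-character style: grows the token one character at a time, creating a
/// new intermediate string on every step.
String digitsByAppend(String src, int start) {
  var out = '';
  for (var i = start; i < src.length && isDigit(src.codeUnitAt(i)); i++) {
    out += src[i];
  }
  return out;
}

/// Capture style: advance an index over the run, then slice the source once.
String digitsByCapture(String src, int start) {
  var end = start;
  while (end < src.length && isDigit(src.codeUnitAt(end))) {
    end++;
  }
  return src.substring(start, end);
}

void main() {
  const src = '{"n":1234567890}';
  print(digitsByAppend(src, 5)); // 1234567890
  print(digitsByCapture(src, 5)); // 1234567890
}
```

Both functions return the same text; the capture variant does a single substring call regardless of token length, which is the allocation pattern the paragraph above credits for the number and string wins.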

Reproduce via rumil_bench's bench_json_perf_pass:

```shell
cd rumil_bench
dart compile exe bin/bench_json_perf_pass.dart -o /tmp/perf.aot
/tmp/perf.aot
```

For the Wasm column, see BENCHMARKS.md for the full instructions.

0.7.0

  • JSON: value-dispatch parser migrated from a 6-way Or chain to rumil's new firstCharChoice combinator. JSON values have cleanly disjoint leading chars (n, t/f, digits/-, ", [, {), so the O(1) dispatch replaces the linear scan. Bench numbers (AOT native, 6 runs): json-small 24.0 µs → 18.4 µs (-23%), json-medium 35.0 ms → 25.7 ms (-27%), json-large 429 ms → 312 ms (-27%). vs petitparser ratio improves from 13× to ~10× small / ~9× large. All RFC 8259 conformance tests pass unchanged.
  • HCL: operator precedence parser migrated from a six-layered chainl1 ladder + recursive _unary to a single pratt(...) call using rumil's new cFamilyPrecedence preset. Functionally equivalent — same operators, same binding powers, same AST. Bench numbers: hcl-config 253 µs → 225 µs (-11%), hcl-50res 10.7 ms → 9.32 ms (-13%) on AOT native. All HCL conformance tests (specsuite, fuzz corpus, terraform-provider-aws .tf files) pass unchanged.
  • Other format parsers (CSV, TOML, XML, YAML, Proto3, Markdown) are unchanged at the source level. They benefit transparently from rumil 0.7's Many(StringMatch) / SkipMany(simple) fast paths (CSV measured 10–22% faster) and from the FIRST-set Or dispatch optimization (small wins on alternation-heavy grammars).
  • Depends on rumil: ^0.7.0.
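The first-character dispatch described for the JSON value parser can be sketched without any combinator library. The code below is a hypothetical illustration of the idea, not rumil's firstCharChoice: Kind and dispatch are made-up names, and the function only picks which sub-parser would run rather than parsing anything.

```dart
enum Kind { nullLit, boolLit, number, string, array, object }

/// Chooses the value kind from the first code unit alone. Returns null when
/// no JSON value can start with that character.
Kind? dispatch(String src, int offset) {
  if (offset >= src.length) return null;
  final c = src.codeUnitAt(offset);
  if (c >= 0x30 && c <= 0x39) return Kind.number; // '0'..'9'
  return switch (c) {
    0x6E => Kind.nullLit, // 'n'
    0x74 || 0x66 => Kind.boolLit, // 't' or 'f'
    0x2D => Kind.number, // '-'
    0x22 => Kind.string, // '"'
    0x5B => Kind.array, // '['
    0x7B => Kind.object, // '{'
    _ => null,
  };
}

void main() {
  print(dispatch('[1, 2]', 0)); // Kind.array
  print(dispatch('-3.5', 0)); // Kind.number
  print(dispatch('true', 0)); // Kind.boolLit
  print(dispatch('?', 0)); // null
}
```

A linear Or chain would try each alternative in turn until one succeeds; with disjoint leading characters, a single inspection of the next code unit selects the only alternative that can match.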

0.6.0

  • Depends on rumil: ^0.6.0. Version aligned with the rumil-dart monorepo 0.6.0 release. No functional changes in this package.

0.5.0

CommonMark Markdown parser. Architecture audit. 7376 tests.

  • Markdown: 652/652 CommonMark 0.31.2 spec conformance. Typed MdNode AST with structured fields (MdHeading.level, MdLink.href, MdImage.alt) — separates parsing from rendering. Public API: parseMarkdown(String) → Result<ParseError, MdDocument>.
  • TOML: Replace throw/try-catch with Result-based error flow. Zero exceptions in the parser.
  • XML: Replace manual indexOf/substring with combinators for QName parsing, entity reference validation, and attribute value expansion.
  • Delimited: Replace while-loop field splitter and RegExp with combinator parsers.
  • All formats: Apply .capture optimization (12 sites) — each benefits from fused Capture(Many) interpreter fast path.
  • TOML: Deduplicate unicode escape parsers into parameterized _unicodeEscape(marker, count).
  • Depends on rumil ^0.5.0.

0.4.0

All parsers to spec conformance. 6724 tests, zero analyzer warnings.

  • HCL full spec: expression tower (operators, ternary, for-expressions, function calls), string templates ${expr}, heredocs <<EOF/<<-EOF, template directives %{if}/%{for}, index/splat [*]/.*, scientific notation, Unicode identifiers, parenthesized object keys, object element commas. 2760/2760 including 2717 terraform-provider-aws .tf files.
  • XML 1.0 5e: W3C conformance suite — 1506/1506. DOCTYPE/DTD parsing, external entity resolution, namespace validation, Unicode names, attribute uniqueness, -- restriction in comments.
  • Delimited overhaul: three-tier architecture (explicit config / auto-detect dialect / per-row robust), BOM stripping, ragged row policies, detectDialect(), parseDelimitedRobust(). 100 tests.
  • YAML 1.2: anchors, aliases, merge keys, block scalars, multi-document, full escape set, resolveAnchors(), YamlParseConfig. 333/333.
  • JSON: 318/318. TOML 1.1: 681/681. Proto3: 101/101.
  • Conformance test runners for all formats in test/conformance/.

0.3.1

  • Doc on ObjectBuilder constructor.
  • Depends on rumil ^0.3.0.

0.3.0

  • AST encoders + serializers for JSON, TOML, YAML, XML, CSV, Proto3, HCL.
  • AstBuilder with nativeToAst for JSON, YAML, TOML, XML, HCL.
  • Native decoders: jsonToNative, yamlToNative, tomlToNative, xmlToNative, hclToNative.
  • Shared escape utilities.
  • operator == and hashCode on all AST classes.
  • YAML indentation-based nested block parsing.
  • HCL parser (attributes, blocks, comments, references).
  • 278 tests.

0.2.0

  • Doc comments on all public API elements.
  • Depends on rumil ^0.2.0 (fail renamed to failure).

0.1.0

  • Core parser combinators: sealed Parser ADT with 26 subtypes, external interpreter, defunctionalized trampoline
  • Warth seed-growth left recursion via rule()
  • Stack-safe to 10M+ operations
  • Typed errors with source location (line, column, offset)
  • Lazy error construction via late final thunks
  • RadixNode O(m) string matching
  • Full combinator DSL: .zip(), .thenSkip(), .skipThen(), |, .map, .flatMap, .many, .sepBy, .chainl1, .chainr1, .between, .capture, .memoize
  • Format parsers: JSON (RFC 8259), CSV (RFC 4180), XML, TOML (v1.0.0), YAML (simplified 1.2), Proto3 schema
  • AST decoders for JSON, TOML, YAML with ObjectAccessor pattern
  • Formula evaluator with operator precedence via chainl1, variables, custom functions
  • Binary codec: ZigZag, LEB128 Varint, BinaryCodec with xmap + product2–product6 composition
  • build_runner codegen for @binarySerializable classes and sealed hierarchies
Publisher

verified publisher: ardaproject.org

Topics

#parser #json #toml

License

MIT (license)

Dependencies

rumil
