rumil 0.7.1
rumil: ^0.7.1 copied to clipboard
Parser combinator library for Dart with left recursion, stack-safe trampolining, typed errors, lazy error construction, and sealed ADT design.
0.7.1 #
LineIndex for O(log n) line/column resolution, and a hot/cold
split of the interpreter dispatch that lifts the format-parser suite
4–6%. Additive on the public surface — Location, existing
combinators, and the rest are unchanged.
Added #
-
LineIndex(source): builds a sorted array of every\noffset in [source] in one O(n) pass, backed by aUint32List. SubsequentlocationAt(offset)queries run in O(log n) via binary search and return a [Location] with precomputed line, column, and offset (instead of the lazy O(n) walk and O(column) backwalk that [Location] performs by default).spanAt(start, end)resolves both endpoints of a [Span] in one pair of lookups.Motivated by streaming parsers (NDJSON, multi-document YAML) that produce many errors against the same input: rendering N errors to
line:columnwas O(N·n) without an index, O(N·log n) with one. Mirrors the LineIndex from rumil-scala, including its line-terminator policy:\nis the sole terminator;\ris a regular character; callers normalize at the editor or LSP boundary before building the index. Column encoding is UTF-16 code units, matching the LSP default. -
PrecomputedLocation: an internal [Location] subtype carrying cachedlineandcolumn. Constructed only viaLineIndex.locationAt; existing code that constructsLocation(input, offset)directly continues to use the lazy walk.
Changed #
-
Location.format()now does one forward walk instead of two. The previous implementation read [Location.line] and [Location.column] separately, walking[0, offset)twice (once forward to count newlines, once backward to find the column). The fused version walks once, tracking both the newline count and the most-recent newline offset. Measured 17–21% faster on AOT across source sizes 1 KB / 10 KB / 100 KB.Behavior unchanged: same output string, same edge cases. Callers that read
.lineand.columndirectly still walk twice — no memoization (would conflict with the const constructor and the subtype-via-implementspattern). Use [LineIndex] when many positions need resolving against the same source. -
Hot/cold split of
interpretI's ~40-case switch. The hot cases stay inline (Succeed, Fail, Satisfy, StringMatch, StringChoice, Eof, GetPosition, Mapped, FlatMap, Zip, Or, Choice, FirstCharChoice, the Many/Many1/SkipMany/Capture(Many) terminal- shape fast paths, Defer). The cold cases (Optional, Attempt, LookAhead, NotFollowedBy, RecoverWith, Expect, Named, Trace, Debug, Memo, Pratt, Chainl1, Chainr1, the generic Many/Many1/ SkipMany/Capture fall-throughs) move to a separate_interpretColdfunction called from adefaultbranch.Pattern adapted from Eru's
runFast/stepColdarchitecture. ShrinksinterpretI's AOT body from 11,276 → 6,824 bytes (−39.5%), letting the optimizer specialize the hot dispatch tightly.Behavior unchanged. Format parser suite (every parser in the family, both runtimes): every benchmark faster, range 2–9% AOT, 2–6% WASM, average ~5%. The dispatch microbench in
rumil_benchshows 9–14% improvement on the composite-Many path that surfaces the hot-loop dispatch overhead.
Benchmarks (1000 ops per benchmark) #
LineIndex amortization holds across runtimes — about three orders of magnitude at 100 KB on both AOT and WASM:
| Source | AOT cached | AOT rebuilt | AOT speedup | WASM cached | WASM rebuilt | WASM speedup |
|---|---|---|---|---|---|---|
| 1 KB | 6.5 μs | 1 049 μs | 161× | 7.6 μs | 1 128 μs | 148× |
| 10 KB | 16.6 μs | 10 214 μs | 615× | 16.3 μs | 11 141 μs | 684× |
| 100 KB | 112 μs | 103 718 μs | 926× | 116 μs | 111 132 μs | 958× |
The fused Location.format() walk is a smaller polish, more visible
on AOT than WASM (which already folds some of the redundant work):
| Source | AOT fused | AOT legacy | AOT win | WASM fused | WASM legacy | WASM win |
|---|---|---|---|---|---|---|
| 1 KB | 330 μs | 390 μs | 1.18× | 444 μs | 439 μs | ~1.00× |
| 10 KB | 2 691 μs | 3 154 μs | 1.17× | 3 293 μs | 3 709 μs | 1.13× |
| 100 KB | 25 562 μs | 30 843 μs | 1.21× | 32 328 μs | 36 330 μs | 1.12× |
Reproduce from rumil_bench/:
# AOT
dart compile exe bin/bench_line_index.dart -o /tmp/bench_li
/tmp/bench_li
# WASM (Deno required)
dart compile wasm bin/bench_line_index.dart -o /tmp/bench_li.wasm
deno run --allow-read tool/run_wasm.mjs /tmp/bench_li.wasm
0.7.0 #
Pratt operator-precedence parsing, stack-safe chain combinators, operator preset, first-char dispatch. Synchronized release across all rumil-dart packages.
New combinators #
pratt(...): Top-Down Operator Precedence (TDOP) combinator. Takes an atom parser and a list ofPrattOperator<A>descriptors —InfixLeft<A>,InfixRight<A>,Prefix<A>,Postfix<A>— with binding powers and node constructors. Single-pass operator-precedence parsing in place of layeredchainl1calls.cFamilyPrecedence<A>(...): convenience preset returning the standard 15-operator C-family precedence ladder (||/&&/==/!=/comparison/additive/multiplicative + prefix-/!). Consumers pass binary/unary node constructors and the symbol parser; per-symbol customization is via dispatch on the input string.firstCharChoice<A>(...): O(1) dispatch to one of several alternatives based on the leading code unit at the current position. Map keys are strings of one or more chars (e.g.'tf'binds bothtandf,'-0123456789'binds 11 chars). Optionalfallbackruns on miss. Used in JSON's value dispatch (24–27% faster across all input sizes vs the previousOr-chain form).choice([...])auto-fuses toFirstCharChoicewhen alternatives have statically decidable, mutually disjoint leading chars (≥3 alternatives required). The introspector peels throughMapped/Named/Expect/LookAhead/Memo/Zip-left/Captureand recurses intoOr/Choice; bails onDefer/FlatMap/ opaque-predicateSatisfy. Recursive grammars that need the optimization should use the explicitfirstCharChoicebuilder.Chainl1<E, A>andChainr1<E, A>are now first-class Parser ADT cases.chainl1(p, op)andchainr1(p, op)continue to be the public builders; they construct these nodes directly.
Stack safety #
- chainl1 / chainr1 are now stack-safe to memory-only depth (was
StackOverflow at ~850 chain steps under the previous recursive
Or+FlatMapexpansion). - Pratt right-associative chains scale to 1M+ depth via an explicit operator stack.
- Pratt prefix chains (
--5,---5,!-x) compose correctly and scale to 1M+ depth.
Performance #
- FIRST-set dispatch on
Or: peeks at one character to skip doomed left branches when the leading shape is decidable (Satisfy,StringMatch,StringChoice,EofplusMapped/Zip-left/LookAheadpeeling). Many(StringMatch)/SkipMany(simple)fast paths: bypass per-iteration error-thunk allocation in repetition loops.- Pratt opTable fast path: when all infix/postfix operator symbols are literal-prefix parsers, dispatch via a code-unit-indexed table with optional word-boundary / not-followed-by-char guards.
Correctness fixes #
Named<A>/Expect<A>type erasure: previous code inferredA=dynamicand the resultingResult<E, dynamic>failed the runtime cast toResult<E, A>. Fix uses explicitinterpretI<ParseError, A>with typed pattern matching. Was dormant until consumers wrapped Named parsers inside Pratt expressions.
Trade-offs #
- chainl1/chainr1 now go through a dedicated ADT case rather than the
recursive
Or+FlatMapexpansion. Shallow-chain workloads pay ~5–7% more per iteration in dispatch overhead in exchange for the unbounded stack safety. The recommended path for new precedence-driven grammars ispratt(...)+cFamilyPrecedence, which delivers a net win on every workload measured. All three rumil consumers (rumil_expressions, rumil_parsers/HCL, lambe) have been migrated.
Documentation #
state.dartdocuments the mutability boundary:ParserStateis the one mutable object in the parsing pipeline, scoped to a singleparser.run(input)call, never escapes to user code.
Naming #
- The Pratt operator-descriptor sealed class is named
PrattOperator<A>(parent ofInfixLeft<A>/InfixRight<A>/Prefix<A>/Postfix<A>). This frees the bareOperatorname for downstream consumers that have their ownOperatortypes — notablyrumil_tokens, whereOperatoris aTokensubclass for value-computing operator characters in source code. Subclass names (InfixLeft, etc.) are unchanged.
0.6.0 #
Synchronized release across all rumil-dart packages. Additive for
rumil.
position()primitive: a zero-width parser that yields the current byte offset. Combines withZipfor span capture:position().zip(p).zip(position())produces((start, value), end)in one pass.
0.5.0 #
Interpreter optimizations and API refinements.
- Breaking:
Locationchanged fromextension typetofinal class. Line/column now computed lazily from offset — eliminates per-character write barriers. Constructor changed from named parameters toLocation(input, offset). - Breaking:
Snapshottypedef removed.ParserState.save()returnsint,restore()takesint.ParserState.line/columngetters removed — usestate.location.lineinstead. - Perf: Eliminate terminal re-boxing in trampoline (no intermediate Result allocation per terminal dispatch).
- Perf: Replace
late finalwith nullable cache (??=) inPartial/Failureerror fields — removes hidden initialization check on WasmGC. - Perf: Add
Parser.isSimpleproperty for save/restore skipping inOr,Optional,Many,SkipMany. - Perf: Fuse
Capture(Many(p))/Capture(Many1(p))in interpreter — skip intermediate list allocation. - 5-9% faster on AOT native, 30-52% faster on WasmGC across all format parser benchmarks.
0.4.0 #
- Fix:
RecoverWitheagerly evaluates error thunks at recovery time. Lazy thunks in_satisfyManyclosed overParserStateand read stale offsets when evaluated later, causingRangeErroron.errorsaccess. _satisfyManycapturesstate.currentCharinto a local before closures.
0.3.0 #
- Doc on
MemoKey.id. public_member_api_docslint enforced.- Version aligned with other rumil packages.
0.2.0 #
- Breaking:
fail()renamed tofailure()to avoid conflict withpackage:test. - Doc comments on all public API elements.
rule()doc: guidance on placement (postfix level, not top level).lexeme()doc: note about whitespace handling forchainl1operands.
0.1.0 #
- Core parser combinators: sealed Parser ADT with 26 subtypes, external interpreter, defunctionalized trampoline
- Warth seed-growth left recursion via
rule() - Stack-safe to 10M+ operations
- Typed errors with source location (line, column, offset)
- Lazy error construction via
late finalthunks - RadixNode O(m) string matching
- Full combinator DSL:
.zip(),.thenSkip(),.skipThen(),|,.map,.flatMap,.many,.sepBy,.chainl1,.chainr1,.between,.capture,.memoize - Format parsers: JSON (RFC 8259), CSV (RFC 4180), XML, TOML (v1.0.0), YAML (simplified 1.2), Proto3 schema
- AST decoders for JSON, TOML, YAML with
ObjectAccessorpattern - Formula evaluator with operator precedence via
chainl1, variables, custom functions - Binary codec: ZigZag, LEB128 Varint, BinaryCodec with
xmap+product2–product6composition - build_runner codegen for
@binarySerializableclasses and sealed hierarchies