rumil 0.7.0
rumil: ^0.7.0 copied to clipboard
Parser combinator library for Dart with left recursion, stack-safe trampolining, typed errors, lazy error construction, and sealed ADT design.
0.7.0 #
Pratt operator-precedence parsing, stack-safe chain combinators, operator preset, first-char dispatch. Synchronized release across all rumil-dart packages.
New combinators #
pratt(...): Top-Down Operator Precedence (TDOP) combinator. Takes an atom parser and a list ofPrattOperator<A>descriptors —InfixLeft<A>,InfixRight<A>,Prefix<A>,Postfix<A>— with binding powers and node constructors. Single-pass operator-precedence parsing in place of layeredchainl1calls.cFamilyPrecedence<A>(...): convenience preset returning the standard 15-operator C-family precedence ladder (||/&&/==/!=/comparison/additive/multiplicative + prefix-/!). Consumers pass binary/unary node constructors and the symbol parser; per-symbol customization is via dispatch on the input string.firstCharChoice<A>(...): O(1) dispatch to one of several alternatives based on the leading code unit at the current position. Map keys are strings of one or more chars (e.g.'tf'binds bothtandf,'-0123456789'binds 11 chars). Optionalfallbackruns on miss. Used in JSON's value dispatch (24–27% faster across all input sizes vs the previousOr-chain form).choice([...])auto-fuses toFirstCharChoicewhen alternatives have statically decidable, mutually disjoint leading chars (≥3 alternatives required). The introspector peels throughMapped/Named/Expect/LookAhead/Memo/Zip-left/Captureand recurses intoOr/Choice; bails onDefer/FlatMap/ opaque-predicateSatisfy. Recursive grammars that need the optimization should use the explicitfirstCharChoicebuilder.Chainl1<E, A>andChainr1<E, A>are now first-class Parser ADT cases.chainl1(p, op)andchainr1(p, op)continue to be the public builders; they construct these nodes directly.
Stack safety #
- chainl1 / chainr1 are now stack-safe to memory-only depth (was
StackOverflow at ~850 chain steps under the previous recursive
Or+FlatMapexpansion). - Pratt right-associative chains scale to 1M+ depth via an explicit operator stack.
- Pratt prefix chains (
--5,---5,!-x) compose correctly and scale to 1M+ depth.
Performance #
- FIRST-set dispatch on
Or: peeks at one character to skip doomed left branches when the leading shape is decidable (Satisfy,StringMatch,StringChoice,EofplusMapped/Zip-left/LookAheadpeeling). Many(StringMatch)/SkipMany(simple)fast paths: bypass per-iteration error-thunk allocation in repetition loops.- Pratt opTable fast path: when all infix/postfix operator symbols are literal-prefix parsers, dispatch via a code-unit-indexed table with optional word-boundary / not-followed-by-char guards.
Correctness fixes #
Named<A>/Expect<A>type erasure: previous code inferredA=dynamicand the resultingResult<E, dynamic>failed the runtime cast toResult<E, A>. Fix uses explicitinterpretI<ParseError, A>with typed pattern matching. Was dormant until consumers wrapped Named parsers inside Pratt expressions.
Trade-offs #
- chainl1/chainr1 now go through a dedicated ADT case rather than the
recursive
Or+FlatMapexpansion. Shallow-chain workloads pay ~5–7% more per iteration in dispatch overhead in exchange for the unbounded stack safety. The recommended path for new precedence-driven grammars ispratt(...)+cFamilyPrecedence, which delivers a net win on every workload measured. All three rumil consumers (rumil_expressions, rumil_parsers/HCL, lambe) have been migrated.
Documentation #
state.dartdocuments the mutability boundary:ParserStateis the one mutable object in the parsing pipeline, scoped to a singleparser.run(input)call, never escapes to user code.
Naming #
- The Pratt operator-descriptor sealed class is named
PrattOperator<A>(parent ofInfixLeft<A>/InfixRight<A>/Prefix<A>/Postfix<A>). This frees the bareOperatorname for downstream consumers that have their ownOperatortypes — notablyrumil_tokens, whereOperatoris aTokensubclass for value-computing operator characters in source code. Subclass names (InfixLeft, etc.) are unchanged.
0.6.0 #
Synchronized release across all rumil-dart packages. Additive for
rumil.
position()primitive: a zero-width parser that yields the current byte offset. Combines withZipfor span capture:position().zip(p).zip(position())produces((start, value), end)in one pass.
0.5.0 #
Interpreter optimizations and API refinements.
- Breaking:
Locationchanged fromextension typetofinal class. Line/column now computed lazily from offset — eliminates per-character write barriers. Constructor changed from named parameters toLocation(input, offset). - Breaking:
Snapshottypedef removed.ParserState.save()returnsint,restore()takesint.ParserState.line/columngetters removed — usestate.location.lineinstead. - Perf: Eliminate terminal re-boxing in trampoline (no intermediate Result allocation per terminal dispatch).
- Perf: Replace
late finalwith nullable cache (??=) inPartial/Failureerror fields — removes hidden initialization check on WasmGC. - Perf: Add
Parser.isSimpleproperty for save/restore skipping inOr,Optional,Many,SkipMany. - Perf: Fuse
Capture(Many(p))/Capture(Many1(p))in interpreter — skip intermediate list allocation. - 5-9% faster on AOT native, 30-52% faster on WasmGC across all format parser benchmarks.
0.4.0 #
- Fix:
RecoverWitheagerly evaluates error thunks at recovery time. Lazy thunks in_satisfyManyclosed overParserStateand read stale offsets when evaluated later, causingRangeErroron.errorsaccess. _satisfyManycapturesstate.currentCharinto a local before closures.
0.3.0 #
- Doc on
MemoKey.id. public_member_api_docslint enforced.- Version aligned with other rumil packages.
0.2.0 #
- Breaking:
fail()renamed tofailure()to avoid conflict withpackage:test. - Doc comments on all public API elements.
rule()doc: guidance on placement (postfix level, not top level).lexeme()doc: note about whitespace handling forchainl1operands.
0.1.0 #
- Core parser combinators: sealed Parser ADT with 26 subtypes, external interpreter, defunctionalized trampoline
- Warth seed-growth left recursion via
rule() - Stack-safe to 10M+ operations
- Typed errors with source location (line, column, offset)
- Lazy error construction via
late finalthunks - RadixNode O(m) string matching
- Full combinator DSL:
.zip(),.thenSkip(),.skipThen(),|,.map,.flatMap,.many,.sepBy,.chainl1,.chainr1,.between,.capture,.memoize - Format parsers: JSON (RFC 8259), CSV (RFC 4180), XML, TOML (v1.0.0), YAML (simplified 1.2), Proto3 schema
- AST decoders for JSON, TOML, YAML with
ObjectAccessorpattern - Formula evaluator with operator precedence via
chainl1, variables, custom functions - Binary codec: ZigZag, LEB128 Varint, BinaryCodec with
xmap+product2–product6composition - build_runner codegen for
@binarySerializableclasses and sealed hierarchies