lambe 0.9.0
A query language for structured data that shows you what you're working with. Shape-aware --explain, JSON Schema input, format bridges. CLI + library + MCP.
0.9.0
Closes the shape feedback loop. Declare a JSON Schema, check queries
against it, round-trip schemas with the ecosystem. Plus: richer
static analysis in --explain, line-delimited JSON input, an opt-in
CSV escape hatch for nested cells, an architectural pipe-op
consolidation, and a rumil_tokens-based REPL highlighter.
Pipe-op AST consolidation
- The 27 per-op AST classes (FilterOp, MapOp, SortOp, …) collapse into a single BuiltinPipeOp(name, args). The spec table in pipe_ops.dart is now the only place per-op behaviour lives: acceptance, shape inference, runtime evaluation, and parse arity all live on the same record. Adding or renaming a pipe op is a one-file change. As(target) keeps a dedicated AST class for its typed OutputFormat argument; it's the only custom-arity op. pipeOpInfoFor(LamExpr) recognises both BuiltinPipeOp and As.
- Source-breaking for external code that constructed pipe-op AST nodes directly. The pre-1.0 contract was already that AST classes are internals; this release enforces it. Tests that assembled MapOp(.x) etc. now write BuiltinPipeOp('map', [.x]).
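For readers who want the shape of the one-table design, here is a minimal Python sketch of the pattern, not lambé's Dart code. The record fields (arity, accepts, infer, evaluate), the string shape kinds, and the three sample ops are illustrative assumptions; op arguments are plain callables here rather than parsed expressions.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class PipeOpSpec:
    name: str
    arity: int                        # parse arity: 0 or 1 argument
    accepts: Callable[[str], bool]    # input-shape acceptance (shape kind as a string)
    infer: Callable[[str], str]       # output-shape inference
    evaluate: Callable[..., Any]      # runtime evaluation

@dataclass(frozen=True)
class BuiltinPipeOp:
    name: str
    args: tuple = ()

# One record per op; adding an op means adding one entry here.
SPECS = {
    spec.name: spec
    for spec in [
        PipeOpSpec("length", 0, lambda k: k in {"list", "map", "string"},
                   lambda k: "number", lambda v: len(v)),
        PipeOpSpec("reverse", 0, lambda k: k == "list",
                   lambda k: k, lambda v: list(reversed(v))),
        PipeOpSpec("map", 1, lambda k: k == "list",
                   lambda k: "list", lambda v, f: [f(x) for x in v]),
    ]
}

def infer_op(op: BuiltinPipeOp, kind: str) -> str:
    spec = SPECS[op.name]
    return spec.infer(kind) if spec.accepts(kind) else "any"   # widen on rejection

def eval_op(op: BuiltinPipeOp, value: Any) -> Any:
    spec = SPECS[op.name]             # one table lookup replaces a per-class switch
    if len(op.args) != spec.arity:
        raise ValueError(f"{op.name} expects {spec.arity} argument(s)")
    return spec.evaluate(value, *op.args)
```

In this sketch, parsing, inference, and evaluation all read from SPECS, so a new op is one new PipeOpSpec entry.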
REPL syntax highlighter on rumil_tokens
lib/src/readline.dart's 100-line hand-rolled tokenizer is gone. The highlighter now consumes a Token stream from the rumil_tokens LangGrammar defined in lib/src/highlight_grammar.dart. The grammar lives in lambé (not among rumil_tokens' five built-in grammars) because it's lambé-specific.
- New runtime dependency: rumil_tokens ^0.1.0.
- Pipe op names (filter, map, text, etc.) now colour as keywords (magenta). They're routed through LangGrammar.types and sourced from pipe_ops.dart's spec table, so adding a new op picks up colouring automatically.
- Highlighting is re-rendered on every keystroke. Previously the REPL used a fast path that wrote each typed character verbatim without re-tokenising, so keywords stayed plain until a later edit triggered a full redraw. Because rumil_tokens is fast enough to re-tokenise on every keystroke, the fast path was a UX bug; now filter colours on the final r, not after the next backspace.
- Visible behavioural change in the REPL: .field colours as two tokens ('.' punctuation + 'field' identifier) rather than one cyan run, and negative literals colour as a '-' operator + number rather than one yellow run. The audit judged the new behaviour more principled; the visual effect is subtle.
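The tokenisation change can be approximated with a small regex tokenizer. This Python sketch is not the rumil_tokens LangGrammar API: the token kinds, the regex, and the hard-coded PIPE_OPS set (which lambé sources from its spec table) are assumptions. It reproduces the two visible effects described above: .field yields a punctuation token plus an identifier, and a negative literal yields an operator plus a number.

```python
import re

# Hypothetical keyword set; lambé derives it from pipe_ops.dart.
PIPE_OPS = {"filter", "map", "text", "length", "sort_by"}

TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>"(?:[^"\\]|\\.)*")
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>//|==|!=|<=|>=|[-+*/<>|])
  | (?P<punct>[.()\[\]{},:])
""", re.VERBOSE)

def classify(query):
    """Return (kind, text) tokens; pipe-op identifiers are promoted to keywords."""
    tokens = []
    for m in TOKEN_RE.finditer(query):
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue
        if kind == "ident" and text in PIPE_OPS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens
```

Because the number pattern carries no sign, a leading '-' is always classified as an operator, which matches the two-token colouring for negative literals.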
REPL Tab completion: bare pipe ops inside parameterised ops
map(t<TAB> now offers text, to_entries, type, etc., where it previously offered nothing useful. Bare pipe-op names like text, length, and to_entries are legal expressions in lambé (sugar for . | op), so the completer should offer them inside map(...) / filter(...) when the user is typing a partial name without a leading '.'. Candidates are filtered by the element shape of the surrounding pipe input, the same shape-gated rule that already governed post-pipe completion. The new text op makes map(text) a useful and discoverable pattern; this change lets the completer help users find it.
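A hypothetical Python sketch of the filtering rule: prefix-match bare op names, then keep only ops whose acceptance set includes the element shape. The ACCEPTS table is an illustrative subset, not lambé's real spec, and shapes are reduced to kind strings.

```python
# Input-shape kinds each op accepts (hypothetical subset of the spec table).
ACCEPTS = {
    "length": {"list", "map", "string"},
    "sort": {"list"},
    "text": {"map", "list"},
    "to_entries": {"map"},
    "type": {"list", "map", "string", "number", "boolean", "null"},
}

def bare_op_candidates(partial, element_kind):
    """Bare op names to offer for `map(<partial><TAB>` over elements of `element_kind`.

    "any" (the SAny case) offers every prefix match, since nothing can be ruled out.
    """
    if partial.startswith("."):
        return []  # a leading '.' means field completion, handled elsewhere
    return sorted(
        name for name, kinds in ACCEPTS.items()
        if name.startswith(partial) and (element_kind == "any" or element_kind in kinds)
    )
```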
REPL Tab completion: heterogeneous lists via data sampling
.children | map(.<TAB> on a heterogeneous list (e.g. a real markdown document where children mixes headings, paragraphs, and code blocks) now offers the actual fields of the first list element instead of an empty candidate list. The static shape system correctly widens such lists to SList<SAny>, which gives completion no hints, so the completer falls back to navigating the actual data values and taking shapeOf the first element to recover a useful shape. Sampling threads through pipe ops that preserve the element family (filter, sort, sort_by, unique, unique_by, reverse), so .children | filter(.type == "heading") | map(.<TAB> works too. Completion never runs the user's query, only structural navigation, so cost stays bounded.
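The sampling fallback can be sketched in Python as plain structural navigation over the data. This is an assumption-laden illustration, not lambé's completer: paths are lists of field names, ops are names only, and the candidates are the sorted keys of the first element.

```python
# Ops that keep elements in the same family, so sampling can look through them.
FAMILY_PRESERVING = {"filter", "sort", "sort_by", "unique", "unique_by", "reverse"}

def sampled_fields(data, path, ops):
    """Field candidates for `.<path> | <ops> | map(.<TAB>`, found by sampling data.

    Navigates `path` through the actual values, skips over family-preserving
    ops without evaluating them, and returns the keys of the first list
    element. Returns [] whenever sampling cannot proceed.
    """
    value = data
    for field in path:
        if not isinstance(value, dict) or field not in value:
            return []
        value = value[field]
    if any(op not in FAMILY_PRESERVING for op in ops):
        return []
    if not isinstance(value, list) or not value or not isinstance(value[0], dict):
        return []
    return sorted(value[0])
```

Because the ops are never evaluated in this sketch, filter(.type == "heading") still samples the first child even when it is not a heading; the candidates are hints, not guarantees.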
Markdown text extraction
- text pipe op. Walks a markdown node (or list of nodes) and concatenates every prose-bearing leaf (text, code, code_block, and image.alt) in document order. Container nodes recurse through their children. html_block and html_inline are skipped, which avoids the Node.textContent trap of dragging raw HTML, scripts, and styles into "give me the text". soft_break (a paragraph wrap in the source) contributes a single space, preserving word boundaries across line wraps; hard_break (a backslash at end of line, or two trailing spaces: an explicit, author-intended break) contributes a literal '\n'. This diverges from mdast-util-to-string's empty-on-break default, trading strict precedent for the typical case of "produce readable prose". Users who want a fully flat string can post-process with a whitespace collapser. The previous recommendation, .children[0].text, is structurally wrong for non-trivial markdown (nested emphasis, inline code, links), and the existing pipe surface cannot fix that without recursion.
- First op tuned to a specific input format's vocabulary. This is the only pipe op whose eval switches on a value's type field. The behaviour is bounded to markdown's node-type vocabulary as defined in lib/src/input.dart's _nodeToNative. It does NOT authorise content-level dispatch in any other op; the spec entry carries a load-bearing comment to that effect. Prior art (XPath string(node), mdast-util-to-string) converges on the same design: a format-aware leaf primitive with hardcoded knowledge of which fields carry prose and which carry metadata. jq's .. approach drags in link.href, image.src, and code_block.language, exactly the trap this op avoids.
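A minimal Python sketch of the extraction rules listed above, assuming a hypothetical node layout: each node is a dict with a "type", containers hold "children", prose leaves keep their content under "text", and images keep "alt". lambé's actual field names come from _nodeToNative and may differ, and this sketch simply concatenates sibling nodes with no separator.

```python
PROSE_LEAVES = {"text", "code", "code_block"}
SKIPPED = {"html_block", "html_inline"}

def extract_text(node):
    """Concatenate prose-bearing leaves in document order."""
    if isinstance(node, list):
        return "".join(extract_text(n) for n in node)
    kind = node.get("type")
    if kind in SKIPPED:
        return ""                        # never leak raw HTML into the text
    if kind == "soft_break":
        return " "                       # source line wrap: keep words apart
    if kind == "hard_break":
        return "\n"                      # author-intended break
    if kind == "image":
        return node.get("alt", "")       # alt text is prose; src is metadata
    if kind in PROSE_LEAVES:
        return node.get("text", "")
    return extract_text(node.get("children", []))   # containers recurse
```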
queryNdjsonString convenience
- New queryNdjsonString(Iterable<String> lines, String expression) parses the expression once and delegates to queryNdjson. This resolves an asymmetry: the existing queryNdjson took a pre-parsed AST while every other query* function took a string.
Performance
- _normalize short-circuits canonical inputs. Map<String, Object?>, List<Object?>, and scalars round-trip through the public API without allocating a copy. Non-canonical inputs (e.g. Map<dynamic, dynamic> from some YAML decoders) are still rebuilt as before.
- End-to-end CLI is roughly 3.3× faster than 0.8.0 on parse-bound workloads. Measured on a 50k-element JSON document (1.5 MB), AOT-compiled, on a Linux x86_64 workstation with the bench harness in tool/bench/cli_bench.sh:
  - lam --print-shape big.json: 2.4 s → 732 ms (3.28×).
  - lam '.items | filter(.value > 50000) | length' big.json: 2.5 s → 742 ms (3.37×).
  Most of the win is inherited: rumil 0.7's FIRST-set Or dispatch, the firstCharChoice combinator, and the Pratt migration carried the bulk; rumil_parsers 0.8.0's JSON AST split, capture-based number/string parsing, HCL AST split, and common.dart capture rewrites carried the rest. Non-parse-bound paths benefit too: group_by on 1k records takes ~15% less time (39 ms → 33 ms) because the JSON AST split removes a per-number truncateToDouble check in jsonToNative. See tool/bench/cli_bench.sh for the harness and reproduction steps.
Documentation precision
- Six per-op behavioural details now have load-bearing docstrings: // is a null-fallback (not an error handler); the empty-list policy (first/last return null; min/max/avg throw; sum returns 0); unique distinguishes int from double by canonical encoding; duplicate keys in {a: x, a: y} follow Dart map-literal semantics (last wins); from_entries rejects non-map and non-string-key entries explicitly (previously a silent skip); type rejects non-JSON runtime values with a hint pointing at parseInput / jsonDecode.
- The from_entries change is the only behavioural one: non-map entries used to be dropped silently, and now they throw QueryError. The silent skip hid a class of bugs where upstream pipelines emit the wrong shape.
- as(fmt) bridges reference in doc/recipes.md. Documents the four canonical bridges with runnable examples: list<scalar> | as(toml/hcl) wraps as {items: ...}; scalar | as(toml/hcl) wraps as {value: ...}; map | as(csv/tsv) derives via to_entries; scalar | as(csv/tsv) composes both.
- As class doc softened to be honest about which error paths users will and won't hit. The "ambiguous bridge" runtime branch is defensive against future curation errors but unreachable with the current curated table, and the doc no longer claims otherwise. A new invariant test in shape_synthesize_test pins at most one bridge per (shape, format), so the path becomes reachable only through a deliberate change.
- syntax.md examples revert from echo … | lam '. | op' to the cleaner lam -n '… | op' form now that -n exists. Several pre-A6 examples were also silently broken: lambé object construction uses bare identifiers ({a: 1}), not JSON-string keys ({"a": 1}), so [{"key": "a"}] | from_entries was never runnable. Fixed.
- -r/--raw semantics. The man page entry now states that the option only affects top-level string scalars and is a silent no-op on structured output (objects, arrays, numbers, booleans, null). The previous wording ("Output strings without quotes") read as a pretty-print toggle and surprised users on non-string values.
- doc/non-goals.md: a new page enumerating the features lambé deliberately omits, with the lambé idiom that replaces each one. Cross-linked from README.md ("What lambé is not"), jq-to-lambe.md, and AGENTS.md. Covers Turing-completeness, recursive descent (..), try/catch, select outside filter, paths/leaf_paths/getpath/setpath, regex, range/limit/nth, .[] iteration, def/lambdas, @base64/@uri, streaming, env/$__loc__, HCL evaluation, and XML. Staying bounded is a feature; the page makes that legible.
- text op precedent. The new text pipe op (see Markdown text extraction) is the only op tuned to a specific input format's vocabulary. The spec entry carries a load-bearing dartdoc comment declaring that it is bounded to markdown's node-type vocabulary as defined in _nodeToNative and does NOT authorise content-level dispatch in any other op.
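Three of the documented policies, sketched in Python for illustration: the stricter from_entries, unique keyed by a canonical encoding, and the empty-list rules for first, min, and sum. QueryError, the {key, value} entry layout, the error messages, and order-preserving unique are assumptions of this sketch, not lambé's implementation. json.dumps stands in for the canonical encoding; it keeps 1 and 1.0 apart even though Python considers them equal.

```python
import json

class QueryError(Exception):
    pass

def from_entries(entries):
    """Strict from_entries: every entry must be a map with a string "key"."""
    out = {}
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise QueryError(f"from_entries: entry {i} is not a map")
        if not isinstance(entry.get("key"), str):
            raise QueryError(f"from_entries: entry {i} has no string key")
        out[entry["key"]] = entry.get("value")
    return out

def unique(values):
    """Order-preserving unique keyed by a canonical JSON encoding."""
    seen, out = set(), []
    for v in values:
        key = json.dumps(v, sort_keys=True)   # 1 -> "1", 1.0 -> "1.0"
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out

def first(values):
    return values[0] if values else None      # empty list: null

def min_op(values):
    if not values:
        raise QueryError("min of an empty list")  # empty list: throws
    return min(values)

def sum_op(values):
    return sum(values)                        # sum([]) == 0
```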
Bug fixes
- TSV input now honors header rows the same way CSV does. Before 0.9.0, every TSV file returned List<List<String>> because the parser passed a static defaultTsvConfig and skipped dialect detection. Now parseInput runs detectDialect for TSV with the tab delimiter forced, so files whose first row looks like headers return List<Map<String, Object?>>. --print-shape data.tsv and --print-shape data.csv now agree on logical content.
- String single-char indexing. .name[0] now returns a one-character substring instead of erroring with "Cannot index string". Slicing (.name[0:3]) already worked; the asymmetry is gone. An out-of-range index returns null (mirroring list indexing); a non-int index still throws.
- --explain writability section is suppressed when a runtime-rejection warning fires. When a pipe op's input shape is provably incompatible, the post-stage shape widens to SAny, which used to make every output format pass canWriteAs, so the explain report listed every format for a pipeline that would throw before any writer ran. Both "Writable as:" and "Not writable as:" are now suppressed; the text renderer prints a one-line note in their place, and the JSON renderer sets both keys to null.
- Heterogeneous list rendering hint. shapeOf([1, "two", true]) collapses the element type to SAny. The rendered JSON Schema now carries description: "sampled, may be heterogeneous", so --print-shape users can see that the schema reflects sampling, not a guarantee. The hint round-trips through parseJsonSchema (unknown keywords are ignored, per JSON Schema's extensibility convention).
- Empty piped stdin. Empty stdin in evaluation mode now surfaces the standard "no input" error rather than a confusing JSON parse error on the empty string.
- HCL block access is now uniform across the N=1 and N≥2 cases. Previously, querying .variable returned a single map for one variable block but a list for two or more, forcing defensive shape checks in queries. Now .variable is always a list, regardless of count. Common Terraform patterns (one terraform block, one provider, a single variable) no longer require N=1-vs-N≥2 branching. Fixed upstream in rumil_parsers 0.8.0 (the decoder uses the HclBlock discriminator already present in the AST instead of inferring shape from key collisions); lambé adopts it via a constraint bump from ^0.7.0 to ^0.8.0.
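The difference between the two decoding strategies is easy to see in a small Python sketch. Blocks are modelled as hypothetical (block_type, body) pairs; this is not the rumil_parsers decoder or its HclBlock type.

```python
def decode_blocks_old(blocks):
    """Pre-0.9.0 style: the shape depends on how many blocks share a type."""
    out = {}
    for kind, body in blocks:
        if kind not in out:
            out[kind] = body                    # first block: a bare map
        elif isinstance(out[kind], list):
            out[kind].append(body)
        else:
            out[kind] = [out[kind], body]       # key collision: promote to list
    return out

def decode_blocks(blocks):
    """0.9.0 style: every block type maps to a list, whatever the count."""
    out = {}
    for kind, body in blocks:
        out.setdefault(kind, []).append(body)
    return out
```

With the uniform decoder, a query like .variable | map(.name) works the same for one block or many.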
Dependencies
- rumil_parsers ^0.8.0. The JSON parser AST splits JsonNumber into a sealed JsonInt | JsonDouble sum. Lambé propagates the change through one schema-parser switch case, JsonInt() || JsonDouble() => 'number' in lib/src/schema/parser.dart. No user-visible behavior change at the lambé surface: downstream consumers of lambé's library API see no shape difference, because parseInput-flavored Map/List types remain canonical Dart types (the AST split is only visible when you reach into the JSON AST directly via the lambé schema layer). The HCL fix described above also rides this dependency bump (originally scoped as rumil_parsers 0.7.1; rolled into 0.8.0 alongside the AST split). See rumil_parsers/BENCHMARKS.md for the JSON parser performance wins in the 0.8.0 release; lambé queries operating on JSON inputs benefit transparently.
- Object construction accepts JSON-string keys. {name: .x} was the only spelling; {"name": .x} errored with a confusing "unexpected" message. Now both spellings produce the same map. Keys that are valid identifiers should still use the bare form (name:); keys that aren't (hyphenated, containing spaces, leading digits) use a JSON-string literal in key position: {"x-axis": .a}, {"Content-Type": "application/json"}, {"my key": 1}. Lambé's data model accepts any string as a key, and the construction grammar now matches. Interpolation ({"\(expr)": .y}) is rejected with a clear message because key position is structurally not an expression position; build dynamic keys via from_entries on a list of {key, value} maps. Shorthand {name} continues to require a bare identifier ({"name"} alone is intentionally not supported).
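A hypothetical Python sketch of the key-position rule: a bare identifier or a JSON string literal is accepted, and interpolation is rejected before decoding. The identifier regex and error messages are assumptions; lambé's grammar may accept a different identifier alphabet.

```python
import json
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def parse_key(src):
    """Return the map key for a key-position token, or raise ValueError."""
    if IDENT.match(src):
        return src                                   # bare identifier: {name: ...}
    if src.startswith('"'):
        if "\\(" in src:
            raise ValueError("interpolation is not allowed in key position; "
                             "build dynamic keys with from_entries")
        return json.loads(src)                       # JSON string literal: {"x-axis": ...}
    raise ValueError(f"invalid object key: {src}")
```

A malformed string literal surfaces as json.JSONDecodeError, which is a subclass of ValueError, so callers handle every rejection the same way.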
jq compatibility
- add is now recognized as an alias for sum, a jq idiom that matches lambé's sum exactly. _jqAliases in parser.dart is the table; entries belong there only when the jq semantics are an exact match.
- Idiom hints for column-1 jq keywords. _jqIdiomHint and _jqPipeOpHint now recognise try / try ... catch, recurse, walk, paths, leaf_paths, range, limit, nth, @csv, @tsv, and @base64. Each produces a one-line hint pointing at the lambé equivalent (or, for @base64, the explicit "not supported" signal) instead of the giant op-vocabulary dump. These join the pre-existing hints for [], ?, .., select, empty, and a stranded end.
Schemas as a first-class contract
- --schema <path> on the CLI. Threads a JSON Schema subset through both --explain inference and normal evaluation. With data, the schema validates at load time (a structural disagreement exits 1 and reports a JSON path). Without data, the schema alone seeds shape inference for design-time planning.
- Sibling auto-detect. Data at path/to/data.json picks up path/to/data.schema.json implicitly, the same convention as ndjson auto-detect.
- --print-shape on the CLI. Emits shapeOf(data) as a JSON Schema subset document, round-trippable with --schema input. The same shape-to-JSON-Schema rendering powers renderJsonSchema(shape) in the library and the MCP lambe_print_shape tool.
- --print-shape EXPR composes with the query. When given an expression, lam --print-shape '.users' data.json now returns the schema of the result of evaluating .users rather than the schema of the whole document. Before 0.9.0 the expression was silently ignored. Without data, it falls back to inferring from SAny, matching the --explain-without-data flow.
- REPL: :schema [path] and :print-shape. :schema <path> loads a schema for the session and reports agreement or disagreement with the current data. :schema (no argument) prints the active schema. :load re-validates against an active schema and warns on disagreement.
- MCP: lambe_print_shape, lambe_check, lambe_explain, plus a schema parameter on lambe_query. Agents can print a shape, validate fixtures against a schema, trace a query structurally before running it, or gate a query on schema conformance. lambe_check returns {"ok": true} or {"ok": false, "error": "..."}.
- Library surface. parseJsonSchema, renderJsonSchema, loadSchemaFromFile, loadSchemaForData, and mergeSchemaWithData are all exported from package:lambe/lambe.dart.
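The sibling naming and the load-time check can be sketched in Python. sibling_schema follows the data.json → data.schema.json convention above (how other extensions are handled is not specified here); first_disagreement is a hypothetical validator for a small subset (type, properties, required, items), and its $.a[0].b path format is an assumption rather than lambé's exact error text.

```python
from pathlib import Path

def sibling_schema(data_path):
    """path/to/data.json -> path/to/data.schema.json"""
    p = Path(data_path)
    return p.with_name(p.stem + ".schema.json")

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}

def first_disagreement(schema, value, path="$"):
    """Return a path-prefixed message for the first mismatch, or None."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        return f"{path}: expected {expected}"
    if expected == "object":
        for name in schema.get("required", []):
            if name not in value:
                return f"{path}.{name}: required field missing"
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                found = first_disagreement(sub, value[name], f"{path}.{name}")
                if found:
                    return found
    if expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            found = first_disagreement(schema["items"], item, f"{path}[{i}]")
            if found:
                return found
    return None
```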
SOptional in the shape ADT
- New sealed variant SOptional(Shape). Represents statically known optionality: populated by JSON Schema's required semantics, propagated through field access and op inference, and surfaced by the explain trace. Nested optionality collapses at construction: SOptional(SOptional(x)) is always SOptional(x).
- Acceptance predicates unwrap SOptional for op inputs. filter on SOptional<SList<T>> is accepted, with the potential absence surfaced by a runtime-rejection warning rather than a silent accept or a false reject.
- Root-level requirements (TOML/HCL MustBeMap) do NOT unwrap: an absent root can't be serialized, so users must materialize a default first. This asymmetry is deliberate.
- shapeToJson emits {"kind": "optional", "inner": ...}. renderJsonSchema flattens SOptional inside SMap fields into missing required entries (the standard JSON Schema idiom); non-field-position SOptional has no standard spelling in our subset and is flattened, with a docstring caveat.
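A minimal Python sketch of the ADT and the required-field flattening, assuming simplified variants (SAny, SScalar, SList, SMap, SOptional); it is not lambé's Dart hierarchy. Python dataclasses do not normalise in the constructor, so the collapse rule lives in a hypothetical optional() smart constructor.

```python
from dataclasses import dataclass

class Shape:
    pass

@dataclass(frozen=True)
class SAny(Shape):
    pass

@dataclass(frozen=True)
class SScalar(Shape):
    type: str                 # "string", "number", "boolean", "null"

@dataclass(frozen=True)
class SList(Shape):
    element: Shape

@dataclass(frozen=True)
class SMap(Shape):
    fields: tuple             # ((name, Shape), ...)

@dataclass(frozen=True)
class SOptional(Shape):
    inner: Shape

def optional(shape):
    """SOptional(SOptional(x)) collapses to SOptional(x)."""
    return shape if isinstance(shape, SOptional) else SOptional(shape)

def render_json_schema(shape):
    if isinstance(shape, SOptional):
        return render_json_schema(shape.inner)  # outside a field: flattened
    if isinstance(shape, SScalar):
        return {"type": shape.type}
    if isinstance(shape, SList):
        return {"type": "array", "items": render_json_schema(shape.element)}
    if isinstance(shape, SMap):
        props, required = {}, []
        for name, field in shape.fields:
            props[name] = render_json_schema(field)
            if not isinstance(field, SOptional):
                required.append(name)           # optional fields stay out of required
        return {"type": "object", "properties": props, "required": required}
    return {}                                   # SAny: no constraint
```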
Richer --explain output
Three new categories of static analysis, plus a structured output mode:
- Runtime-rejection warnings (always on). Flags pipe ops whose input shape is provably incompatible. .config | filter(.x) on a known map produces "filter rejects map<...>; this will throw at runtime". Uses the existing pipe-op acceptance predicates.
- Trivial-result warnings (opt-in via --explain-trivial). Flags sort_by, group_by, map, and unique_by calls whose argument references a field provably absent on the element shape. Opt-in because legitimate uses exist (stable no-op sort, explicit null projection).
- Structured JSON output (--explain-json). Emits the full explain report as JSON with snake_case keys (stages, warnings, writable_as, not_writable_as, flatten_cells). Warning kinds serialize as empty_filter, runtime_rejection, and trivial_result. Shapes serialize as nested {kind, ...} trees (via shapeToJson), so agents can pattern-match shape structure without re-parsing. Also surfaces in the new lambe_explain MCP tool.
- Both --explain-trivial and --explain-json imply --explain.
- New shapeToJson(Shape), renderExplainJson(ExplainReport), WarningKind enum, and ExplainWarning.kind field in the library.
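How a runtime-rejection warning interacts with the rest of the report can be sketched in Python. Shapes are reduced to kind strings, the ACCEPTS/OUTPUT/FORMATS tables are hypothetical, and the result only borrows the snake_case key names listed above; it is not the renderExplainJson schema. The writability lists become None once a rejection fires, mirroring the 0.9.0 bug fix for --explain.

```python
# Hypothetical acceptance and output tables keyed by op name.
ACCEPTS = {"filter": {"list"}, "to_entries": {"map"}, "length": {"list", "map", "string"}}
OUTPUT = {"filter": "list", "to_entries": "list", "length": "number"}
FORMATS = ["json", "csv"]

def explain(input_kind, ops):
    stages, warnings, kind = [], [], input_kind
    for op in ops:
        if kind != "any" and kind not in ACCEPTS[op]:
            warnings.append({"kind": "runtime_rejection",
                             "message": f"{op} rejects {kind}; this will throw at runtime"})
            kind = "any"          # widen: the stage's real output is unknowable
        else:
            kind = OUTPUT[op]
        stages.append({"op": op, "shape": {"kind": kind}})
    if any(w["kind"] == "runtime_rejection" for w in warnings):
        # "any" would pass every writability check, so report nothing instead.
        writable = not_writable = None
    else:
        writable = [f for f in FORMATS if f == "json" or kind in ("list", "any")]
        not_writable = [f for f in FORMATS if f not in writable]
    return {"stages": stages, "warnings": warnings,
            "writable_as": writable, "not_writable_as": not_writable}
```

The returned dict serializes directly with json.dumps, which is the kind of pattern-matchable structure the JSON mode is meant to give agents.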
--ndjson mode for line-delimited JSON input
- Each line is parsed as an independent JSON document; the query is evaluated per line with no shared state, producing one compact JSON result per line. Auto-enabled when the file extension is .ndjson or .jsonl. Stdin input streams: tail -f app.log | lam --ndjson '.level' emits each result as its line arrives.
- Fail-fast on the first malformed or unevaluable line; the error carries the line number.
- New queryNdjson(Iterable<String>, LamExpr) library function, returning a lazy Iterable<Object?>.
- Cannot be combined with --interactive, --schema, --assert, or --explain. Output is restricted to JSON (--to with anything other than json is refused).
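The per-line contract can be sketched as a lazy Python generator. The query is any callable standing in for a pre-parsed expression (the parse-once idea behind queryNdjsonString), and skipping blank lines is an assumption of this sketch, not documented lambé behaviour.

```python
import json

class NdjsonError(Exception):
    pass

def query_ndjson(lines, query):
    """Lazily apply `query` to each line, yielding one compact JSON string per line."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue                             # assumption: blank lines are skipped
        try:
            result = query(json.loads(line))     # no state shared between lines
        except Exception as exc:                 # malformed JSON or a failing query
            raise NdjsonError(f"line {number}: {exc}") from exc
        yield json.dumps(result, separators=(",", ":"))
```

Because results are yielded one at a time, earlier lines are already emitted when a later line fails, which is what lets a streaming source like tail -f print results as they arrive.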
Null input
- -n/--null-input flag. Runs a query against a null context with no input file. Useful for value computations: lam -n '[1,2,3] | unique'. Without -n, the missing-input guard fires (a typo'd filename or a missing redirect is a common footgun); the flag puts the "I have no input" intent on the command line, where it's visible in scripts and code review. The --null-input spelling matches jq exactly.
- Cannot be combined with --interactive, --ndjson, --schema, or --assert. The TTY stdin guard is unchanged.
--flatten-cells for CSV/TSV
- Opt-in escape hatch: non-scalar cells are encoded inline as JSON strings. Accepts refuse (the default, 0.8.0 behavior) or json. Under json, the shape check widens MustBeFlatList to MustBeList for csv/tsv. Round-tripping the resulting CSV back into Lambe does NOT recover structure; this is an output-side escape hatch, not a faithful encoding.
- Surfaced at the CLI (--flatten-cells), REPL (:flatten-cells), MCP (flatten_cells parameter), and as CellPolicy flattenCells on formatOutput, canWriteAs, canWriteShapeAs, requirementFor, and explain.
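A Python sketch of the two cell policies using the standard csv module. The policy strings mirror refuse and json, but encode_cell and to_csv are hypothetical helpers, not lambé's writer. Reading the output back with csv.reader returns the JSON text as a plain string, which is why the round trip does not recover structure.

```python
import csv
import io
import json

def encode_cell(value, policy="refuse"):
    """Encode one CSV cell under a flatten-cells style policy."""
    if isinstance(value, (dict, list)):
        if policy == "json":
            return json.dumps(value, separators=(",", ":"))   # inline JSON text
        raise ValueError("non-scalar cell; retry with policy='json'")
    return value

def to_csv(header, rows, policy="refuse"):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([encode_cell(v, policy) for v in row])
    return buf.getvalue()
```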
Cross-surface hints
- NotWritable.hints. When a shape mismatch has an environmental resolution (a flag, a setting, a tool parameter), the report carries a structured Hint type with label, cliFlag, replCommand, mcpParameter, and explanation. CLI, REPL, and MCP each render the form that applies to them. Agent-facing JSON carries parameter/value pairs, not CLI syntax.
- The first shipping hint covers --flatten-cells json: it fires when a CSV/TSV request is rejected under refuse but a list root is already present.
Breaking changes
- --schema flag renamed to --print-shape. In 0.8.0, --schema printed a type-name JSON summary of the data; that function moved to --print-shape. The new --schema takes a JSON Schema file path. Users scripting lam --schema data.json must change to lam --print-shape data.json. ArgParser rejects the old form because --schema now requires a value.
- --print-shape output format changed. It emits a JSON Schema subset document ({"type": "object", "properties": ..., "required": ...}) instead of the type-name-string JSON format 0.8.0 emitted ({"age": "number"}). The new output round-trips with --schema input; the old format had no round-trip path.
- MCP tool lambe_schema renamed to lambe_print_shape. Its output format also changed to JSON Schema, matching the CLI. Agents that hardcoded the old tool name get "tool not found" and a message pointing at lambe_print_shape.
- Shape ADT gained an SOptional variant. Source-breaking for external code that pattern-matches Shape without a default case (probably just Lambe itself). Exhaustive switches now need a fifth branch.
- ExplainWarning constructor gained a required kind parameter. External code constructing warnings directly must add a WarningKind. This is uncommon; the existing pattern is consuming warnings, not producing them.
Deprecated
- inferSchema(Object? value) library function. Emits type-name-string JSON (no round-trip). Use renderJsonSchema(shapeOf(value)) for JSON Schema output, or shapeOf(value) for the Shape ADT. Scheduled for removal in 1.0.
Install ergonomics
- install.sh: a one-line installer at the repo root. curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh downloads the latest lam and lam-mcp binaries for the current platform (Linux x64/arm64, macOS x64/arm64), verifies their SHA256 against a published checksums.txt, and installs them to ~/.local/bin/. No sudo, no shell rc edits. Respects the LAMBE_VERSION and LAMBE_PREFIX environment variables.
- Release workflow generates checksums.txt. .github/workflows/release.yml now publishes a combined SHA256 manifest for every release artifact as a release asset. install.sh relies on it for integrity checking; downstream package managers (a future Homebrew tap, apt/rpm) can reuse it.
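The integrity check install.sh performs can be sketched in Python with hashlib. This assumes checksums.txt uses the common sha256sum line format (hex digest, two spaces, file name); it is an illustration of the check, not the installer's shell code.

```python
import hashlib
from pathlib import Path

def parse_checksums(text):
    """Parse sha256sum-style lines: '<hex digest>  <file name>'."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            digest, name = line.split(maxsplit=1)
            table[name.lstrip("*")] = digest.lower()   # '*' marks binary mode in sha256sum output
    return table

def verify(path, checksums_text):
    """True if the file's SHA256 matches its checksums.txt entry."""
    p = Path(path)
    expected = parse_checksums(checksums_text).get(p.name)
    if expected is None:
        raise ValueError(f"{p.name} is not listed in checksums.txt")
    return hashlib.sha256(p.read_bytes()).hexdigest() == expected
```

A file missing from the manifest is treated as an error rather than a pass, so an incomplete manifest cannot silently skip verification.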
Tooling
- CHANGELOG self-validation. tool/lint_changelog.sh uses lambé itself (via --assert) to validate this file's structural invariants on every CI run: at least one H2 release entry, no duplicate H2s, the first heading is an H2, and the latest H2 matches pubspec.yaml's version. The toolchain checks itself: rumil's Markdown parser handles the input, and lambé's query model expresses the invariants. See doc/recipes.md#querying-a-changelog for the underlying queries.
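The four invariants are simple enough to state as a Python function over (level, text) heading pairs. This is a sketch of the checks, not the lambé queries in tool/lint_changelog.sh, and it assumes the newest release is the first H2, as in this file.

```python
def changelog_problems(headings, pubspec_version):
    """headings: (level, text) pairs in document order, e.g. (2, "0.9.0")."""
    problems = []
    h2 = [text for level, text in headings if level == 2]
    if not h2:
        problems.append("no H2 release entry")
    if len(h2) != len(set(h2)):
        problems.append("duplicate H2 headings")
    if headings and headings[0][0] != 2:
        problems.append("first heading is not H2")
    if h2 and h2[0] != pubspec_version:
        problems.append(f"latest H2 {h2[0]} != pubspec version {pubspec_version}")
    return problems
```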
0.8.0
Adds element-level shape checking for CSV/TSV output, union headers
across heterogeneous-keyed rows, static warnings for provably-empty
filters in --explain, and line-aware parser diagnostics.
Added
- Element-level CSV/TSV shape check via MustBeFlatList. The new requirement class walks the element shape of the outer list and accepts three forms: SList<scalar>, SList<SList<scalar>>, and SList<SMap<k:scalar, ...>>. A list of maps with list- or map-valued cells now raises OutputShapeError instead of being serialized through Dart's default toString(). MustBeList is retained as the generic list-root requirement for future format additions whose serialization tolerates any element shape. Exported from package:lambe/lambe.dart.
- Defensive cell guard in the CSV/TSV writer. For cases where shape inference loses precision (heterogeneous list elements collapse to SList<SAny>, which the shape check cannot prove incompatible), a non-scalar cell that reaches the serializer throws QueryError with a descriptive type name rather than being silently stringified.
- Union headers across heterogeneous-keyed rows. The writer previously took headers from the first row only, so [{a:1}, {b:2}] produced "a\n1\n" and the b column was silently dropped. Headers are now the union of keys across all rows, in first-seen order; rows missing a key render an empty cell. Matches pandas and Python's csv.DictWriter semantics.
- Writer and shape-check consistency matrix. test/shape_output_consistency_test.dart pins the invariant across 100 cases: canWriteAs(v, fmt) == Writable implies formatOutput does not raise OutputShapeError, and NotWritable implies it always does. It is the structural complement to pipe_ops_consistency_test.dart; writer drift fails loudly.
- --explain warnings for provably-empty filters. filter, filter_values, and filter_keys reject elements whose predicate is not == true. Two patterns make the op provably empty: a predicate whose field path doesn't exist on the element, value, or key shape; and a predicate whose inferred shape is any concrete non-boolean scalar. The explain report now carries a warnings list, and renderExplain prints it between the stages and writability sections. SBool and SAny predicates never warn: either might be true. New ExplainWarning type exported from package:lambe/lambe.dart.
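Union headers plus the defensive cell guard, sketched with Python's csv.DictWriter (restval="" supplies the empty cell for a missing key). rows_to_csv and its error text are hypothetical; lambé's writer is Dart code.

```python
import csv
import io

class QueryError(Exception):
    pass

def rows_to_csv(rows):
    # Union of keys across all rows, in first-seen order.
    headers = list(dict.fromkeys(key for row in rows for key in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=headers, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        for key, value in row.items():
            if isinstance(value, (dict, list)):   # defensive cell guard
                raise QueryError(f"cannot write a {type(value).__name__} cell in column {key!r}")
        writer.writerow(row)
    return buf.getvalue()
```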
Changed
- Parser errors are line-aware with source context. Error messages lead with line:column instead of just the column, show the offending line in a gutter-prefixed excerpt with a caret under the bad column, and include one line of context on either side for multi-line queries. The "did you mean" hint for mistyped pipe ops is preserved. Location.line was always available on Rumil's ParseError; the previous renderer assumed single-line input and rendered the caret against the full expression, which put it on the wrong visual line for any multi-line query.
- Empty-expression parse error is actionable. Running lam '' file.json previously dumped the parser's full list of expected tokens (around 30 items) and buried the actual problem. It now returns a single line: parse error: expression is empty. Whitespace-only and newline-only input get the same treatment.
- --explain CLI path uses the line-aware diagnostic. Previously it printed "Error: failed to parse query" and skipped the excerpt that --to had access to. Both paths now go through parseAst.
- requirementFor(OutputFormat.csv) and requirementFor(OutputFormat.tsv) return MustBeFlatList instead of MustBeList. Downstream code that pattern-matches "is MustBeList" on NotWritable.required will flip from true to false.
- MCP server "Error: $e" call sites rewritten to e.message. Three sites in bin/mcp_server.dart previously produced "Error: QueryError: parse error at line ..."; the QueryError: prefix duplicated the CLI's Error: prefix.
- ExplainReport constructor gains an optional warnings parameter. It defaults to const [], so existing callers that pass stages, writableAs, and notWritableAs compile and behave unchanged.
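A Python sketch of the excerpt layout described above: one line of context on each side, a right-aligned gutter, and a caret under the 1-based column. The header text and gutter characters are assumptions, not lambé's exact output.

```python
def render_excerpt(source, line, column, context=1):
    """Gutter-prefixed excerpt with a caret under (line, column), both 1-based."""
    lines = source.split("\n")
    first = max(1, line - context)
    last = min(len(lines), line + context)
    width = len(str(last))
    out = [f"parse error at {line}:{column}"]
    for n in range(first, last + 1):
        out.append(f"{n:>{width}} | {lines[n - 1]}")
        if n == line:
            out.append(f"{'':>{width}} | {' ' * (column - 1)}^")
    return "\n".join(out)
```

Computing the caret against the offending line rather than the whole expression is what keeps it under the right character in multi-line queries.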
Breaking
- Queries that relied on the silent CSV/TSV garbage output now raise OutputShapeError. Pipelines like .deps | as(csv) | as(toml) | as(csv) used to emit a CSV cell like "[{key: rumil, value: ^0.6.0}, ...]" (Dart's default List<Map>.toString()). They now raise OutputShapeError with the shape that failed the check. Convert the shape before the final as(csv) (for example, project inner lists to strings), or use a different output format.
- Writer output for heterogeneous-keyed list-of-maps changes. [{a:1}, {b:2}] now serializes to "a,b\n1,\n,2\n" instead of "a\n1\n". Row 1 renders an empty cell for b, and row 2 renders an empty cell for a. Code that produced correct output on homogeneous-keyed input is unaffected.
0.7.1
Polish release on top of 0.7.0. Error-message remediation
suggestions now surface the intent-level as(<format>) form,
aligning every bridge-offering surface (CLI, REPL, MCP, playground)
with the 0.6.0 shape story. The template that runs is unchanged,
so the composed $expression | as(csv) query produces the same
result as before.
Changed
- Error suggestions use as(<format>) as the display form. The suggestion shown in CLI errors, REPL prompts, MCP responses, and the playground is now | as(csv) / | as(toml) / etc. instead of the raw | to_entries / | {items: .} fragment. The explanation names the underlying mechanism for transparency, e.g. "Wraps each map entry as a {key, value} row (equivalent to to_entries)".
- Remediation.display and Remediation.template can now differ. The Remediation() constructor still sets display = source. A new Remediation.withDisplay() factory decouples them; it is used internally to surface as(<format>) while the runtime AST stays as the raw fragment. Callers that only read Remediation.template (e.g. through applyBridge()) see no behavior change.
- Curated template ASTs are parsed lazily on first use. The four canonical sources ({items: .}, {value: .}, to_entries, {value: .} | to_entries) are parsed once per isolate and shared across format-parameterized factories, instead of being re-parsed on every shape error.
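A hypothetical Python sketch of a curated bridge table with lazily parsed, shared templates. BRIDGES simplifies shapes to kind strings, parse_template is a stand-in for the real parser, and functools.cache plays the role of the per-isolate cache. Keying a dict by (shape, format) also makes "at most one bridge per pair" structural.

```python
from functools import cache

# Curated bridges: (shape kind, format) -> template source.
BRIDGES = {
    ("list", "toml"): "{items: .}", ("list", "hcl"): "{items: .}",
    ("scalar", "toml"): "{value: .}", ("scalar", "hcl"): "{value: .}",
    ("map", "csv"): "to_entries", ("map", "tsv"): "to_entries",
    ("scalar", "csv"): "{value: .} | to_entries",
    ("scalar", "tsv"): "{value: .} | to_entries",
}

PARSE_CALLS = []   # records real parses, to show the caching

@cache
def parse_template(source):
    """Stand-in for the real parser; each distinct source is parsed once."""
    PARSE_CALLS.append(source)
    return ("ast", source)

def remediation(shape_kind, fmt):
    source = BRIDGES.get((shape_kind, fmt))
    if source is None:
        return None
    # Display the intent-level form; run the raw fragment.
    return {"display": f"| as({fmt})", "template": parse_template(source)}
```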
0.7.0
Shape-gated tab completion, single-source-of-truth pipe-op metadata,
and inferShape correctness fixes. Builds on the 0.6.0 shape work:
the completer now uses the same shape machinery that powers
--explain and as(fmt) to hide candidates that would throw at
runtime.
Added
- Shape-gated pipe-op completion. .x | <TAB> filters the candidate list by the inferred input shape. A map input hides list-only ops (flatten, sort, sum, first); a list input hides map-only ops (filter_keys, has, map_values, to_entries). Ops that accept any input (as, type) are offered everywhere. When the inferred shape is SAny, every op is offered: rejection only happens when the op can be proven incompatible.
- Single source of truth for pipe-op metadata. lib/src/shape/pipe_ops.dart owns, for each of the 27 pipe ops, the canonical name, input-shape acceptance predicate, output-shape inference rule, and parse metadata. The parser builds its zeroArg and oneArg alternatives from this table (custom grammar like as(fmt) is still hand-written); the completer consults it for candidate filtering; inferShape dispatches pipe-op cases through it. Adding a new op with standard grammar takes a single spec entry plus an AST case (compile-enforced via the sealed LamExpr) plus an evaluator case (also compile-enforced).
- PipeOpInfo, PipeOpParseKind, pipeOpSpecs, pipeOpInfoFor, pipeOpInfoForName, acceptsInputShape, inferPipeOpShape. Exported from package:lambe/lambe.dart so tools can reason about op metadata without parsing a query. pipeOpSpecs is the iteration-friendly view, pipeOpInfoFor(astNode) resolves by AST type, and pipeOpInfoForName(str) resolves by name. The PipeOpInfo record shape may gain additional fields in future minor releases as the shape machinery evolves (e.g. richer element-level predicates, documentation strings). Callers that only need stable access should prefer the helper functions (acceptsInputShape, inferPipeOpShape, pipeOpInfoForName) over destructuring PipeOpInfo records directly.
- Consistency test matrix. test/pipe_ops_consistency_test.dart runs every pipe op against a representative value of every concrete shape kind and cross-checks the spec's accepts predicate with the evaluator's actual runtime behavior. Drift between spec and evaluator fails loudly instead of silently.
Fixed
- inferShape no longer lies on structurally incompatible input. flatten, sort, reverse, unique, filter_values, and length previously returned the input shape unchanged when given something the runtime evaluator would reject (e.g. flatten on a map). They now widen to SAny, so --explain reports the truth and downstream inference doesn't propagate impossible shapes.
- Re-assertion filter. Candidates whose text exactly matches what's already typed in [start, end) are filtered out before returning. Accepting such a candidate is a no-op on the text but moves the cursor backward, which users read as "Tab erased what I typed." Tab on fully-typed tokens is now a silent no-op.
Changed
- Parser pipe-op rules generated from the spec table. lib/src/parser.dart's _pipeOp is built by iterating pipeOpSpecs longest-name-first and dispatching on PipeOpParseKind. The hand-written alternation for the 26 non-custom ops is gone. as(fmt) remains hand-written because its grammar takes a closed keyword set.
- pipeOpNames re-exported from shape/pipe_ops.dart. The parser, the completer, and the misspelling-suggestion logic all read from the same derived list.
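Why longest-name-first matters: with ordered choice, the first alternative that matches wins, so sort would shadow sort_by. A hypothetical Python sketch with a subset of op names; a full parser would also need an identifier-boundary check, which this sketch omits.

```python
PIPE_OP_NAMES = ["sort", "sort_by", "unique", "unique_by", "filter", "filter_keys"]

# Longest first, so "sort_by" is tried before its prefix "sort".
ORDERED = sorted(PIPE_OP_NAMES, key=len, reverse=True)

def first_match(names, text):
    """Ordered choice: return the first name that `text` starts with."""
    return next((n for n in names if text.startswith(n)), None)
```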
### Breaking
- `Completions` typedef now carries an `end` field. Callers that destructured as `(:start, :candidates)` must destructure as `(:start, :end, :candidates)` and splice with `text.replaceRange(start, end, candidate)` instead of `text.replaceRange(start, cursor, candidate)`. The new field lets callers splice `[start, end)` and preserve any trailing whitespace the user typed after a complete token, which the previous `start..cursor` splice consumed.
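The two splices differ when the user has typed a space after a complete token. This minimal sketch uses only `String.replaceRange` from the Dart core library. The function names are hypothetical, and the offsets in the note are illustrative rather than captured from lambé's completer.

```dart
/// Old behaviour (hypothetical helper): replace from start up to the
/// cursor, which also swallows whitespace typed after the token.
String spliceToCursor(String text, int start, int cursor, String candidate) =>
    text.replaceRange(start, cursor, candidate);

/// New behaviour (hypothetical helper): replace only the token range
/// [start, end), leaving anything after it untouched.
String spliceRange(String text, int start, int end, String candidate) =>
    text.replaceRange(start, end, candidate);
```

Take `text = '.users '`, `start = 0`, `end = 6`, `cursor = 7`, and candidate `.users`. The first helper returns `.users` (trailing space consumed). The second returns `.users ` (space preserved).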
### Docs
- `ROADMAP.md`. Publishes the 0.7.0 / 0.8.0 / 0.9.0 plan plus explicit non-goals (no Turing-completeness, no streaming, no jq feature parity).
- Removed `PLAN_COMPLETER_WHITESPACE_FIX.md` (shipped) and `ISSUES.md` (items resolved or tracked on GitHub).
## 0.6.1
Tab completion fix: trailing whitespace in the REPL query no longer
corrupts the replacement offset. Typing `.dependencies`, a space, then
Tab now completes against `.dependencies` instead of producing
`..dependencies`.
### Fixed
- Completer: the replacement `start` offset is now correct when the query has trailing whitespace (space, tab, CR, LF, or any mixture). Previously, `.users` followed by a space and then Tab returned `start: 1` instead of `start: 0`, which caused the REPL and the arda-web playground to splice the candidate in the wrong position.
- Completer: `??`, `?.`, and `??=` were previously split across multiple tokens in the unparsed-remainder classifier. They now match as single operators before falling through.
### Changed
- Completer: unparsed-remainder classification no longer uses regex. Two small Rumil parsers (`_pipeCtx`, `_fieldTailCtx`) handle pipe-op and field-tail contexts, with `position()` for offset tracking. Whitespace handling is uniform across space, tab, CR, and LF.
- Dependencies: `rumil`, `rumil_parsers`, `rumil_expressions` bumped to `^0.6.0`. Rumil 0.6.0 adds the `position()` primitive used by the completer fix.
## 0.6.0
Shape-aware output with interactive bridging. Lambe now infers the structural shape of query results, reports incompatibilities with target output formats as structured errors, and can bridge common mismatches through a new language combinator or through interactive prompts.
### Added
- `Shape` ADT. A sealed hierarchy (`SAny`, `SNull`, `SBool`, `SNum`, `SString`, `SList`, `SMap`) describing the structural kind of a value. `shapeOf(value)` infers the shape of any JSON-shaped value in time proportional to structure depth, using bounded sampling on lists. `renderShape(shape)` produces the canonical human-readable form (`list<map<a: number, b: string>>`).
- `canWriteAs(value, format)` and `canWriteShapeAs(shape, format)`. Return a `ShapeReport` (`Writable` or `NotWritable`). The `NotWritable` case carries the mismatched shape, the format's requirement, and a list of `Remediation` records describing curated query fragments that bridge the mismatch.
- `inferShape(ast, inputShape)`. A structural interpreter over `LamExpr`. Given the shape of the value `.` refers to, returns the shape the query would produce. Every pipeline operator has a rule; the interpreter falls back to `SAny` where output cannot be determined without runtime values.
- `synthesize(from, target)` and `synthesizeWithLabels(from, target)`. Produce AST fragments (or full `Remediation` records) that bridge `from` to `target`'s shape requirement.
- `applyBridge(user, bridge)` composes a user query with a bridge fragment into a single AST via `Pipe`, avoiding string manipulation.
- `as(format)` combinator. A new pipeline operator written directly in the query language: `.users | as(toml)` produces a TOML-compatible value if exactly one curated bridge applies, and throws with the candidate list otherwise. Accepts `json`, `yaml`, `toml`, `csv`, `tsv`, `hcl`.
- `--explain` CLI flag. Prints the inferred shape at each pipe stage of a query, plus the set of output formats the final shape can be serialized as. Performs static analysis only; does not execute the query. Works with or without input data.
- Interactive suggestion prompts. When `lam --to <fmt>` would produce an `OutputShapeError` on an interactive terminal, the CLI now lists the available remediations and applies the chosen one. The REPL shows the same prompt inline and retries the query with the selected bridge.
- Structured MCP error payload. The `lambe_query` MCP tool now returns shape-mismatch errors as a JSON object with `error`, `message`, `format`, `got_shape`, `original_expression`, and a `suggestions` array (each entry with `id`, `label`, `template_text`, `apply_as`, `explanation`). Agents can respond by calling the tool again with an `apply_as` query verbatim.
- `parseAst(expression)` and `evaluateAst(ast, data)` library entry points. The existing `query(expression, data)` is now defined as `evaluateAst(parseAst(expression), data)`. Callers that parse once and evaluate against multiple inputs, or that compose a parsed AST with a remediation via `applyBridge`, should use these directly.
- `OutputShapeError` subclass of `QueryError`. Carries the structured `NotWritable` report with getters for `format`, `got`, `required`, and `suggestions`. Existing `catch (QueryError)` handlers continue to work; the new subclass is available for code that wants to render suggestions programmatically.
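Here is a minimal sketch of a sealed shape hierarchy with `shapeOf` and `renderShape`, assuming JSON-shaped Dart values (`Map<String, Object?>`, `List<Object?>`, scalars). The class names mirror the entry above, but the bodies are illustrative, not lambé's algorithm. Two details are assumptions: the sampling bound `sampleLimit`, and the merge rule that makes disagreeing list samples widen to `SAny`.

```dart
sealed class Shape {
  const Shape();
}

class SAny extends Shape { const SAny(); }
class SNull extends Shape { const SNull(); }
class SBool extends Shape { const SBool(); }
class SNum extends Shape { const SNum(); }
class SString extends Shape { const SString(); }

class SList extends Shape {
  const SList(this.element);
  final Shape element;
}

class SMap extends Shape {
  const SMap(this.fields);
  final Map<String, Shape> fields;
}

/// Assumed bound on how many list elements are inspected.
const sampleLimit = 8;

Shape shapeOf(Object? value) => switch (value) {
      null => const SNull(),
      bool _ => const SBool(),
      num _ => const SNum(),
      String _ => const SString(),
      List<Object?> l => SList(_merge(l.take(sampleLimit).map(shapeOf))),
      Map<String, Object?> m =>
        SMap({for (final e in m.entries) e.key: shapeOf(e.value)}),
      _ => const SAny(),
    };

/// Assumed merge rule: samples whose rendered forms disagree widen to any.
Shape _merge(Iterable<Shape> samples) {
  Shape? result;
  for (final s in samples) {
    if (result == null) {
      result = s;
    } else if (renderShape(result) != renderShape(s)) {
      return const SAny();
    }
  }
  return result ?? const SAny();
}

String renderShape(Shape s) => switch (s) {
      SAny() => 'any',
      SNull() => 'null',
      SBool() => 'boolean',
      SNum() => 'number',
      SString() => 'string',
      SList(:final element) => 'list<${renderShape(element)}>',
      SMap(:final fields) =>
        'map<${fields.entries.map((e) => '${e.key}: ${renderShape(e.value)}').join(', ')}>',
    };
```

With this sketch, `[{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}]` renders as `list<map<a: number, b: string>>`, the form shown in the entry. A mixed list such as `[1, 'x']` renders as `list<any>`.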
### Changed
- Completer migrated to shape-based inference. The REPL's tab completer now walks the parsed AST over a single inferred `Shape` tree rather than over a reduced value. Behaviour is unchanged (the same candidates are returned for every case). Benchmark medians are within run-to-run noise of the previous release.
- CLI error messages for unwritable output. `lam --to <fmt>` now reports shape mismatches with a short teaching message and a list of candidate bridges appended with `|`, rather than a raw runtime exception.
### Fixed
- AOT benchmark harness. `tool/bench/run.dart` gained `--aot` and `--runs N` flags. The AOT path removes JIT warmup from the measurement; the multi-run median of medians suppresses per-process noise so smaller regressions are visible.
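The median-of-medians summary takes only a few lines to compute. This is a minimal sketch that assumes each run yields a list of per-iteration timings. It illustrates the statistic only; it is not the code in `tool/bench/run.dart`, and the function names are hypothetical.

```dart
/// Median of a non-empty list; averages the two middle values when the
/// length is even.
double median(List<num> values) {
  if (values.isEmpty) throw ArgumentError('median of empty list');
  final sorted = [...values]..sort();
  final mid = sorted.length ~/ 2;
  return sorted.length.isOdd
      ? sorted[mid].toDouble()
      : (sorted[mid - 1] + sorted[mid]) / 2;
}

/// One median per process run, then the median across runs, so a single
/// noisy process cannot drag the reported figure.
double medianOfMedians(List<List<num>> runs) =>
    median(runs.map(median).toList());
```

For runs `[[1, 2, 3], [10, 20, 30, 40], [5, 100, 6]]`, the per-run medians are 2, 25, and 6, so the reported value is 6. The outlier 100 never reaches the summary.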
## 0.5.0
### Added
- `to_number` pipeline op. Parses a string as a number; existing numbers pass through unchanged. Useful for CSV and TSV cells, which are strings by default: `. | map(.price | to_number) | sum`. Throws `QueryError` on strings that do not parse.
- `type` pipeline op. Returns the runtime type of the input as a string: `"null"`, `"boolean"`, `"number"`, `"string"`, `"array"`, or `"object"`. Example: `. | filter((. | type) == "number")`.
- `query()` and `eval()` normalize input data. Maps and lists with non-canonical static types (e.g. `Map<dynamic, dynamic>` from some third-party decoders, or typed literals like `<int>[1, 2, 3]`) are recursively rebuilt as `Map<String, Object?>` and `List<Object?>` before evaluation. Previously these caused cryptic type-cast errors inside the evaluator. `queryString` skips this step since `parseInput` already produces canonical trees. Maps with non-string keys throw `QueryError` with a clear message.
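Normalization and the `type` vocabulary fit in one short sketch. It assumes the error policy described above (non-string map keys are rejected). `normalize` and `typeName` are hypothetical names, and the local `QueryError` class is a stand-in for lambé's, not its actual definition.

```dart
/// Hypothetical stand-in for lambe's QueryError.
class QueryError implements Exception {
  QueryError(this.message);
  final String message;
  @override
  String toString() => 'QueryError: $message';
}

/// Recursively rebuilds maps and lists with canonical static types.
Object? normalize(Object? value) {
  if (value is Map) {
    final out = <String, Object?>{};
    for (final entry in value.entries) {
      final key = entry.key;
      if (key is! String) {
        throw QueryError('map key $key is ${key.runtimeType}, expected String');
      }
      out[key] = normalize(entry.value);
    }
    return out;
  }
  if (value is List) return <Object?>[for (final v in value) normalize(v)];
  return value;
}

/// Runtime type names in the vocabulary of the `type` pipeline op.
String typeName(Object? value) => switch (value) {
      null => 'null',
      bool _ => 'boolean',
      num _ => 'number',
      String _ => 'string',
      List _ => 'array',
      Map _ => 'object',
      _ => throw QueryError('unsupported value: ${value.runtimeType}'),
    };
```

Rebuilding matters because a `<int>[1, 2, 3]` literal is still a `List<int>` at runtime: any code that later stores a non-int into it throws a type error. The normalized `List<Object?>` has no such restriction.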
### Performance
- REPL tab completion is now independent of dataset size. The completer
reduces the data to a shape representative (one sample per list, all map
keys preserved) before walking the partial AST, so operations like
`sort_by`, `group_by`, and `unique` no longer execute against the full data. Median completion latency at 1M records drops from ~380ms–1.2s (depending on pipeline ops) to ~1–2ms. Peak resident set size during a completion drops from hundreds of MB to the cost of the shape tree. Completion semantics are unchanged: the candidate lists are identical. Benchmark harness under `tool/bench/`.
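Here is a minimal sketch of the reduction, taking the description literally: keep every map key, keep one element per list. Keeping the first element is an assumption, and `representative` is a hypothetical name.

```dart
/// Hypothetical helper: shrinks JSON-shaped data to a representative value.
/// Every map key is preserved; each list is cut down to a single sample
/// (assumed here to be the first element).
Object? representative(Object? value) {
  if (value is Map) {
    return {
      for (final e in value.entries) e.key: representative(e.value),
    };
  }
  if (value is List) {
    return value.isEmpty
        ? <Object?>[]
        : <Object?>[representative(value.first)];
  }
  return value;
}
```

A list of any length collapses to one element, so walking the partial AST over the result no longer depends on the record count. Keeping a single sample also means keys that appear only in later elements are not seen, which is a trade-off inherent in this assumption.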
### Fixed
- `unique`, `unique_by`, and `group_by` now use structural equality on collection-valued keys. Previously these operations relied on Dart's native `==` for `List` and `Map`, which is reference equality, so `[{"a":1}, {"a":1}] | unique` returned both entries instead of one. The evaluator now canonicalizes keys via JSON with sorted map keys before insertion into the hash set/map. Scalar keys (`num`, `bool`, `String`, `null`) still deduplicate by value as before. Key order in maps no longer affects equality: `{"a":1, "b":2}` and `{"b":2, "a":1}` are treated as equal.
- `EvalException` from `rumil_expressions` is now wrapped as `QueryError` at the public API boundary (`query()` and `eval()`). Previously, type errors in the evaluator (e.g. `.x > 5` where `.x` is a string, or `null + 1`) would leak the underlying `EvalException` with a full Dart stack trace, crashing `bin/lam.dart` with exit code 255 instead of reporting a clean error with exit code 1. The REPL was not affected because it already had a catch-all handler. The docstring for `query()` already advertised `QueryError` as the evaluation error type; this brings the implementation in line with the contract.
- REPL banner now uses the actual `lambeVersion` from `_version.dart` instead of a hardcoded `v0.1.0` string.
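Here is a minimal sketch of the canonical-key idea for JSON-shaped values; `canonicalKey` and `uniqueStructural` are hypothetical names. One design choice is an assumption of this sketch: collection keys are wrapped in a Dart record (records compare structurally), so the list `[1]` cannot collide with the string `"[1]"`. Scalars are used as-is, so they still compare by value.

```dart
import 'dart:convert';

/// Rebuilds maps with keys in sorted order so that JSON encoding does not
/// depend on insertion order. Assumes normalized data (String map keys).
Object? _sortKeys(Object? value) {
  if (value is Map) {
    final keys = value.keys.cast<String>().toList()..sort();
    return {for (final k in keys) k: _sortKeys(value[k])};
  }
  if (value is List) return [for (final v in value) _sortKeys(v)];
  return value;
}

/// Hypothetical hash key: scalars as themselves, collections as a record
/// wrapping their canonical JSON text.
Object? canonicalKey(Object? value) => value is Map || value is List
    ? ('json', jsonEncode(_sortKeys(value)))
    : value;

/// Keeps the first occurrence of each structurally distinct value.
List<Object?> uniqueStructural(List<Object?> items) {
  final seen = <Object?>{};
  return [
    for (final item in items)
      if (seen.add(canonicalKey(item))) item,
  ];
}
```

Given `{'a': 1}`, `{'a': 1}`, `{'a': 1, 'b': 2}`, `{'b': 2, 'a': 1}`, `[1]`, and `'[1]'`, this keeps four values. The repeated map and the reordered map are dropped, and the list and the string stay distinct.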
### Docs
- Tagline in the library doc comment and MCP server instructions changed from "universal" to "multi-format" — accurate given the specific format set (JSON, YAML, TOML, HCL, CSV, TSV, Markdown).
- `AGENTS.md` no longer references the unimplemented `..` (recursive descent) operator in Markdown query examples. The 0.4.0 changelog noted this was removed from `AI.md`, but `AGENTS.md` was missed.
- `AI.md` and `AGENTS.md` pipeline operation lists now include `to_number` and `type`.
### Release infrastructure
- Release matrix now builds Linux ARM64 and macOS ARM64 (Apple Silicon) in addition to x64 and Windows. The MCP registry manifest covers all five platforms.
- GitHub Actions bumped: `upload-artifact` v4→v7, `download-artifact` v4→v8, `action-gh-release` v2→v3.
## 0.4.0
### Added
- Pipeline ops are now valid bare expressions with implicit `.` input. `has("k")`, `length`, `keys`, `sum`, `filter(...)`, `map(...)`, and every other pipe op can appear as standalone expressions: `has("k")` parses as sugar for `. | has("k")`. This also unblocks common shapes like `map(has("email"))`, `filter(has("k"))`, and `filter(length > 0)`. Bare ops are only consulted after the other `_atom` alternatives fail, so existing forms like `{length}` object shorthand, `.length` field access, and `"\(length)"` string interpolation keep their prior meaning.
### Breaking
- XML input/output support removed. `Format.xml`, `OutputFormat.xml`, and XML extension detection (`.xml`, `.pom`, `.csproj`, `.svg`) are gone. The XML→native projection was lossy in ways that silently produced wrong query results (repeated sibling elements collapsed under last-wins map semantics; attributes were dropped entirely). Rather than ship a footgun, XML is dropped for now. The underlying XML parser in `rumil_parsers` is unchanged and remains spec-compliant; a future lambe release may reintroduce XML with a proper projection (array-preserved siblings, attribute preservation) once the design is settled.
### MCP surface
- `output_format` parameter on the `lambe_query` MCP tool. AI agents can now request yaml/toml/csv/tsv/hcl output directly, matching the CLI's `--to` flag. Defaults to json.
- CSV and TSV exposed through the MCP surface. The library always supported them; the MCP `format` enum was missing them.
- MCP tool descriptions now document common pitfalls: `&&`/`||` for boolean logic (not `and`/`or`), bracket syntax for hyphenated keys, `has()` and other pipeline ops requiring a leading `|`, and the `[{key, values}]` shape of `group_by` output.
- Build-time version generation. `tool/gen_version.dart` reads `pubspec.yaml` and writes `lib/src/_version.dart`, which the MCP server uses to report its version. Run it after bumping the pubspec; the release workflow also runs it automatically.
- `test/doc_examples_test.dart`: AI-doc and MCP-instruction examples are now test-gated. Every `lam '...'` in AI.md and every embedded query in the MCP server's tool descriptions/instructions is parsed and evaluated against a fixture. This prevents future phantom-feature drift (e.g. LLM-drafted examples that advertise syntax the parser doesn't implement).
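The generator step can be sketched as a pure function plus a thin file wrapper. This minimal version makes two assumptions: the pubspec has an unquoted top-level `version:` line, and the generated file declares `const lambeVersion`, the constant name used in the 0.5.0 notes. `renderVersionFile` and `writeVersionFile` are hypothetical names, and the real `tool/gen_version.dart` may differ.

```dart
import 'dart:io';

/// Hypothetical helper: extracts the top-level `version:` value from
/// pubspec.yaml text (assumed unquoted) and renders a Dart source file
/// that exposes it as `lambeVersion`.
String renderVersionFile(String pubspecYaml) {
  final match =
      RegExp(r'^version:\s*(\S+)', multiLine: true).firstMatch(pubspecYaml);
  if (match == null) {
    throw StateError('pubspec.yaml has no top-level version: line');
  }
  return '// Generated file; do not edit by hand.\n'
      "const lambeVersion = '${match.group(1)}';\n";
}

/// Hypothetical wrapper: reads pubspec.yaml and writes
/// lib/src/_version.dart, both relative to [packageRoot].
void writeVersionFile([String packageRoot = '.']) {
  final pubspec = File('$packageRoot/pubspec.yaml').readAsStringSync();
  File('$packageRoot/lib/src/_version.dart')
      .writeAsStringSync(renderVersionFile(pubspec));
}
```

Keeping the parsing in a pure function lets it be tested without touching disk. The anchored, multi-line regex ignores indented `version:` keys nested under other sections.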
### Fixed
- MCP server now reports its actual version. `bin/mcp_server.dart` had hardcoded `0.1.0` since that release and was never bumped.
- Removed phantom `..` (recursive descent) references from docs. The operator was advertised in `AI.md` and the MCP server instructions as a Markdown query pattern but was never implemented. Callers who saw it would have hit parse errors.
- Fixed broken example in `AI.md`: `filter(has("resources") == false)` → `filter((. | has("resources")) == false)`. `has` is a pipeline op and cannot appear as a bare expression.
## 0.3.0
### Added
- Markdown support. CommonMark Markdown (.md, .markdown) is now a queryable input format. Parsed into a typed AST with node types like heading, paragraph, link, code_block, list, image, emphasis, etc.
- `mdToNative` public API for converting `MdDocument` to queryable Dart types
- Markdown query examples in MCP server instructions, AI.md, and AGENTS.md
### Changed
- Bumped rumil, rumil_parsers, rumil_expressions to ^0.5.0
- Rewrote `tool/manpage.dart` to use `parseMarkdown` + `parseYaml` from rumil_parsers instead of a hand-rolled parser
- 491 tests (was 465)
## 0.2.0
### Breaking
- `|` is expression composition. `PipeOp` sealed class removed. Pipeline operations are now `LamExpr` subtypes. Any expression can appear after `|`: `.users[0] | {name, age}`, `. | if .active then "yes" else "no"`.
### Improved
- Parser error messages show position pointers and contextual descriptions
- "Did you mean?" suggestions for misspelled pipeline operations
- MCP tool descriptions expanded with syntax reference and common patterns
- Expanded recipes: object projection, string interpolation, chaining patterns
### Added
- `doc/jq-to-lambe.md` migration guide
- `test/syntax_examples_test.dart` backing every example in `doc/syntax.md`
- 465 tests (was 369)
## 0.1.1
- Added `.mcp.json` for automatic MCP server discovery in AI coding assistants
- Documented MCP server setup in README
- Added query syntax guide, REPL guide, recipes, and man page to `doc/`
## 0.1.0
### Core
- Query AST: sealed `LamExpr` hierarchy (16 subtypes) + sealed `PipeOp` (24 subtypes)
- Left-recursive parser via Rumil's `rule()` + Warth seed-growth
- Operator precedence via layered `chainl1` calls
- Null propagation: navigation propagates null, computation throws on type errors
- Tolerant parsing via `.recover()` for REPL completion and multi-line detection
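Layered `chainl1` gives one parsing function per precedence level, and each function folds its operators to the left. Here is a minimal hand-rolled sketch over single-digit arithmetic. It does not reproduce Rumil's combinator signatures, and it covers only precedence layering, not Warth seed-growth; `Input`, `Parser`, `chainl1`, and the grammar are illustrative.

```dart
/// Tiny cursor over the input; whitespace is not supported in this sketch.
class Input {
  Input(this.text);
  final String text;
  int pos = 0;
  String? peek() => pos < text.length ? text[pos] : null;
}

typedef Parser<T> = T Function(Input);

/// Parses `p (op p)*` and folds left to right, so `8-3-2` is `(8-3)-2`.
Parser<int> chainl1(Parser<int> p, Map<String, int Function(int, int)> ops) {
  return (input) {
    var acc = p(input);
    while (true) {
      final op = ops[input.peek()];
      if (op == null) return acc;
      input.pos++;
      acc = op(acc, p(input));
    }
  };
}

int digit(Input input) {
  final c = input.peek();
  if (c == null || !'0123456789'.contains(c)) {
    throw FormatException('expected digit at ${input.pos}');
  }
  input.pos++;
  return int.parse(c);
}

// One layer per precedence level: * and / bind tighter than + and -.
final Parser<int> term = chainl1(digit, {
  '*': (a, b) => a * b,
  '/': (a, b) => a ~/ b,
});
final Parser<int> expr = chainl1(term, {
  '+': (a, b) => a + b,
  '-': (a, b) => a - b,
});

int evaluate(String source) {
  final input = Input(source);
  final value = expr(input);
  if (input.pos != source.length) {
    throw FormatException('unexpected input at ${input.pos}');
  }
  return value;
}
```

In this sketch, `evaluate('2+3*4')` is 14 because `term` handles `*` before `expr` sees `+`. `evaluate('8-3-2')` is 3 rather than 7 because each layer folds to the left.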
### Query Language
- Property access chains: `.users[0].address.city`
- Negative indexing: `.items[-1]`
- String key indexing: `.data["key"]`
- Slicing: `.[1:3]`, `.[:3]`, `.[2:]`, `.[:-1]`
- Arithmetic: `+`, `-`, `*`, `/`, `%`
- Comparison: `<`, `<=`, `>`, `>=`, `==`, `!=`
- Boolean logic: `&&`, `||`, `!`
- Object construction with shorthand: `{name, total: .price * .qty}`
- Conditionals: `if .age > 65 then "senior" else "active"`
- String interpolation: `"\(.name) is \(.age) years old"`
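Negative indices and open-ended slices reduce to normalizing each bound against the list length. Here is a minimal sketch that assumes Python-style semantics: negative values count from the end, missing bounds default to the ends, slice bounds are clamped, and an out-of-range single index yields null. lambé's edge-case behaviour may differ, and `indexAt` and `slice` are hypothetical names.

```dart
/// `.[i]`: negative i counts from the end; out of range yields null
/// (assumed, in line with navigation propagating null).
Object? indexAt(List<Object?> list, int i) {
  final j = i < 0 ? list.length + i : i;
  return j >= 0 && j < list.length ? list[j] : null;
}

/// `.[from:to]`: either bound may be omitted (null); negative bounds count
/// from the end; both are clamped into [0, length].
List<Object?> slice(List<Object?> list, int? from, int? to) {
  int norm(int? i, int fallback) {
    if (i == null) return fallback;
    final j = i < 0 ? list.length + i : i;
    return j < 0 ? 0 : (j > list.length ? list.length : j);
  }

  final start = norm(from, 0);
  final end = norm(to, list.length);
  return start < end ? list.sublist(start, end) : <Object?>[];
}
```

For `[10, 20, 30, 40]`, the slice `.[1:3]` gives `[20, 30]`, `.[:-1]` gives `[10, 20, 30]`, and `.[-1]` gives `40`.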
### Pipeline Operations (24)
- Filter and transform: `filter`, `map`
- Ordering: `sort`, `sort_by`, `reverse`
- Grouping: `group_by` (returns `{key, values}` structure)
- Deduplication: `unique`, `unique_by`
- Structure: `flatten`, `keys`, `values`, `length`, `first`, `last`
- Aggregation: `sum`, `avg`, `min`, `max`
- Map operations: `filter_values`, `map_values`, `filter_keys`
- Existence: `has`
- Entry conversion: `to_entries`, `from_entries`
### Multi-format I/O
- Input: JSON, YAML, TOML, HCL, XML, CSV, TSV with auto-detection
- Output: `--to json/yaml/toml/xml/csv` for format conversion
- `--schema` for data structure inference
- `--assert` for CI/CD validation (exit 0 if true, 1 if false)
### Interactive REPL (`lam -i`)
- Parser-driven tab completion on field names, pipeline operations, and inner fields
- Syntax highlighting and colorized JSON output
- Persistent history (`~/.lambe_history`) with Ctrl+R reverse search
- Multi-line input with `\` continuation and parser-driven bracket detection
- Ctrl+Left/Right word movement, Ctrl+A/E/K/U editing shortcuts
- REPL commands: `:schema`, `:to`, `:raw`, `:pretty`, `:load`, `:history`, `:help`, `:quit`
### API
- Library: `query()`, `queryJson()`, `queryString()`, `parse()`, `eval()`
- Output: `formatOutput()`, `inferSchema()`
- CLI: `lam '<expression>' [file]` with all flags
- MCP server: `lambe_query`, `lambe_schema`, `lambe_assert` tools
### Ecosystem
- `lambe_test` package with matchers: `lamWhere`, `lamEquals`, `lamMatches`, `lamHas`
- MCP server installable via `dart pub global activate lambe` → `lam-mcp`