skillscore 0.6.0
skillscore: ^0.6.0 copied to clipboard
Lint and score AI agent skills (SKILL.md) against the official Claude, Codex, and Antigravity authoring guides. Offline, deterministic CLI.
Changelog #
All notable changes to this project are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.6.0 - 2026-06-29 #
Added #
- Eval harness: new
skillscore evalcommand family for testing skill trigger accuracy without an API key.eval init <path>— scaffolds anevals.jsonfile with 20 default queries derived from the skill description.eval validate <path>— parses and validates the eval suite; enforces trigger/non-trigger balance and minimum query count.eval run <path>— runs the full suite using the offlineHeuristicEvalClient; exits 0 for pass, 1 for gate failure, 2 for usage errors.
- HeuristicEvalClient: fully offline, deterministic trigger scorer.
- Extracts trigger terms (
use whenclause), boundary terms (do not useclause), and what-terms (first sentence) from the skill description. - Boundary-exclusivity logic: only terms unique to the boundary clause penalize a query — shared nouns never cause false negatives.
- Meta-query pattern:
what is,explain,write a test for,summarise,debug why, and similar explainer queries never trigger. - Wave noise (±7%, deterministic per call index) adds realistic variance without randomness.
- Extracts trigger terms (
- B6 description truncation rule: warns when the skill description field is too long for reliable discovery.
- C1 fix hint: improved hierarchy pointer in the fix hint for conciseness rule C1.
--format jsonoutput foreval runincludesskill,passed,failed,queries, and per-query results.- QA test suite (
tool/qa_run.mjs): 20 test cases covering all three eval subcommands with PNG terminal screenshots indocs/qa/evidence/.
0.3.0 - 2026-06-14 #
Added #
- BPE token counting: every scorecard now shows the token cost of each
skill split into two scopes:
description (permanent)— loaded on every prompt for skill discoveryfull manifest (active)— loaded only when the skill is invoked
- Token counts use the cl100k_base BPE vocabulary (exact for GPT-4 and Codex; Claude estimates add 10% overhead).
--format jsonoutput gains atokenskey withgpt4andclaudeEstimatevalues for both scopes, ready for dashboards and CI.- Validated against the official Anthropic
count_tokensAPI across all 31 skills ingoogle/skills: mean actual Claude overhead +10.2%, median exactly +10.0%, range +0% to +20%.
0.2.0 - 2026-06-13 #
Added #
- Multi-path scoring: pass two or more paths in one command and get a
combined report with a summary line:
skillscore skill-a/ skill-b/ skill-c/ - Duplicate manifests are silently deduplicated, so overlapping paths (e.g. a tree root and one of its children) are each scored once.
- When multiple paths are given and one is invalid or contains no manifest, the CLI warns on stderr and scores the remaining paths rather than aborting. All paths invalid still exits with code 2.
- Usage string updated to reflect
<path> [<path> ...].
0.1.1 - 2026-06-12 #
Changed #
- Documentation only: the README now leads with a cover banner and a
terminal demo GIF (scoring the Flutter team's
flutter-add-widget-testskill at 90/A, then explaining a finding with its source guide). Images are served fromraw.githubusercontent.comso they render on pub.dev. No code or behavior changes.
0.1.0 - 2026-06-12 #
Added #
- Initial release of
skillscore. - 24 scoring rules across 7 rubric categories (A frontmatter, B description, C conciseness, D structure, E instruction quality, F hygiene, G safety penalty), each citing its source authoring guide (Anthropic, Antigravity, Codex, Flutter).
- Target profiles:
--target claude | antigravity | codex | universalimplemented as data (per-target rule activation + severity overrides). - Name-agnostic, case-insensitive skill discovery for single manifests, skill folders, and whole directory trees; BOM/CRLF tolerance; graceful handling of malformed frontmatter, binary files, and symlinks.
- Reporters:
pretty(colored, per-category bars),json(stable CI shape), andsarif(valid SARIF 2.1.0). - CI gating via
--min-scoreand--strict; exit codes 0/1/2. skillscore rulesandskillscore explain <rule-id>with rationale, fix, and source citations.