skillscore 0.1.0
skillscore: ^0.1.0 copied to clipboard
Lint and score AI agent skills (SKILL.md) against the official Claude, Codex, and Antigravity authoring guides. Offline, deterministic CLI.
skillscore — lint and score AI agent skills (SKILL.md) #
skillscore statically analyzes any AI agent skill — a SKILL.md manifest
and its folder — and produces a 0–100 quality score, a letter grade,
and a list of actionable findings, scored against the official skill
authoring guides from Anthropic (Claude), Google (Antigravity), and
OpenAI (Codex). Offline, deterministic, CI-friendly.
What is skillscore? #
skillscore is a skill linter / SKILL.md validator / agent-skill quality
checker / AI skill scorer. Agent skills are an open standard — a folder
with a SKILL.md (YAML frontmatter + Markdown body) plus optional
references/, examples/, scripts/, and assets/ — used by Claude Code,
Codex, Antigravity, Gemini CLI, and Cursor. Because an agent keeps every
skill's name and description in its context budget permanently, a vague
or malformed skill is worse than no skill. skillscore catches exactly those
problems before a skill ships.
Quickstart #
# Install
dart pub global activate skillscore
# Score a single skill (any name, any location)
skillscore path/to/SKILL.md
# Score every skill in a folder or monorepo
skillscore path/to/skills/
# Pick a target ruleset
skillscore my-skill/ --target claude
# Machine-readable output for CI / dashboards
skillscore my-skill/ --format json
# Gate CI: fail the build if any skill scores below 80
skillscore skills/ --min-score 80
Sample output (trimmed):
csv-to-xlsx (skills/spreadsheet-skill/SKILL.md)
Score: 72/100 Grade: C
A Frontmatter validity 15/15 ██████████
B Description quality 12/25 █████░░░░░
C Conciseness & token economy 10.5/15 ███████░░░
D Structure & progressive disclosure 15/15 ██████████
E Instruction quality 9/20 █████░░░░░
F Content hygiene 10/10 ██████████
G Safety & scripts no penalty
WARNING B2_description_when line 3
Description has no trigger clause saying when to use the skill.
fix: Add a trigger clause such as "Use when the user asks to ..."
Commands and flags #
skillscore <path> Score a manifest, a skill folder, or a tree of skills
skillscore rules List every rule: id, title, weight, targets, source guide
skillscore explain <rule-id> Print a rule's rationale, the fix, and its source guide
skillscore --version
skillscore --help
| Flag | Values | Default | Purpose |
|---|---|---|---|
--target |
claude | antigravity | codex | universal |
universal |
Which guide's ruleset to apply |
--format |
pretty | json | sarif |
pretty |
Output format (SARIF 2.1.0 renders in code-review tools) |
--min-score <n> |
0–100 | — | Exit non-zero if any skill scores below n |
--strict |
— | off | Treat warning-level findings as errors |
--quiet |
— | off | Print only the final score line per skill |
--no-color |
— | off | Disable ANSI colors |
Exit codes: 0 all skills meet the threshold · 1 a skill is below
--min-score, or --strict and any error/warning exists · 2 usage error
(bad path, unreadable file, invalid flag).
How is the score calculated? #
100 points are distributed across categories A–F. Each rule awards full,
partial, or zero points; partial-credit formulas are documented in each
rule's doc comment and shown by skillscore explain <id>. Category G
(safety) is a penalty of up to −15 that applies only when the skill
ships scripts or terminal commands. Profiles that exclude a rule (e.g.
--target claude excludes the Codex-specific B4) are normalized back to a
0–100 scale, so scores are comparable across targets.
Grades: A 90–100 · B 80–89 · C 70–79 · D 60–69 · F below 60.
The full rubric #
| Rule | Title | Pts | Severity | Targets | Source |
|---|---|---|---|---|---|
A1_frontmatter_present |
YAML frontmatter delimited by --- |
4 | error | all | Anthropic |
A2_name_format |
name ≤64 chars, lowercase/digits/hyphens |
4 | error | all | Anthropic |
A3_name_reserved_words |
name avoids "anthropic"/"claude" |
3 | error (claude) / info | all | Anthropic |
A4_description_present |
description present, ≤1024 chars |
4 | error | all | Anthropic |
B1_description_what |
States WHAT (opens with action verb) | 6 | warning | all | Anthropic |
B2_description_when |
States WHEN ("use when ...") | 6 | warning | all | Anthropic |
B3_third_person |
Written in third person | 5 | warning | all | Anthropic |
B4_frontloaded_triggers |
Concrete keywords in first ~60 chars | 4 | warning | codex, universal | Codex |
B5_boundary_clause |
Has a "do not use" boundary | 4 | warning (antigravity) / info | antigravity, universal | Antigravity |
C1_body_length |
Body ≤500 lines (linear to 0 at 1000) | 6 | warning | all | Anthropic |
C2_explainer_bloat |
No definitions of common knowledge | 5 | warning | all | Anthropic |
C3_excessive_optionality |
No long "or" chains | 4 | info | all | Anthropic |
D1_progressive_disclosure |
Depth split into references/examples | 5 | info | all | Anthropic |
D2_one_level_links |
Reference links one level deep | 5 | warning | all | Anthropic |
D3_reference_toc |
Long reference files have a TOC | 5 | info | all | Anthropic |
E1_anti_patterns |
States anti-patterns explicitly | 6 | warning | all | Flutter |
E2_workflow_checklist |
Checklist or numbered workflow | 5 | warning | all | Anthropic |
E3_feedback_loop |
Validate → fix → repeat loop | 5 | warning | all | Anthropic |
E4_code_example |
At least one fenced code example | 4 | warning | all | Anthropic |
F1_time_sensitive |
No date-anchored statements that rot | 4 | warning | all | Anthropic |
F2_forward_slashes |
Paths use forward slashes only | 3 | error | all | Anthropic |
F3_consistent_terminology |
No synonym mixing (conservative) | 3 | info | all | Anthropic |
G1_safety_section |
Scripts/commands need a Safety section | −8 | error | antigravity, universal | Antigravity |
G2_script_docs |
Bundled scripts are documented | −7 | warning | all | Anthropic |
Run skillscore rules for the live table and
skillscore explain <rule-id> for any rule's rationale and fix.
How do I gate CI on skill quality? #
# .github/workflows/skills.yml
- name: Lint agent skills
run: |
dart pub global activate skillscore
skillscore skills/ --min-score 80 --no-color
--format json feeds dashboards; --format sarif uploads to GitHub code
scanning so findings annotate pull requests.
FAQ #
What is an agent skill?
A folder with a SKILL.md manifest (YAML frontmatter + Markdown
instructions) that teaches an AI agent a repeatable task. Optional
subfolders hold references, examples, scripts, and assets.
Does skillscore work with Claude Code / Codex / Antigravity / Gemini CLI / Cursor?
Yes. The SKILL.md format is shared across all of them. Score against one
vendor's rules with --target, or use the default universal profile,
which a portable skill should pass everywhere.
Is it offline? Completely. skillscore makes no network calls at runtime, analyzes local files only, and is fully deterministic — the same input always produces the same score and finding order.
How do I score every skill in a monorepo?
skillscore path/to/repo/ — it walks the tree, finds every folder with a
SKILL.md (case-insensitive), and scores each one, deterministically
ordered by path.
Does my skill have to be named a certain way?
No. skillscore is name-agnostic: the frontmatter name, the folder name,
and the file name are all independent, and unusual names (including
non-ASCII folder names) are handled — though rule A2 will tell you if the
name field itself violates the official format.
What happens with malformed frontmatter? No crash: the relevant A-category errors are reported and every other rule that can still run does, so you always get a score.
How does skillscore compare to alternatives? #
- Vendor skill validators (e.g. quick checks built into agent CLIs) verify only schema validity — name format, description present. skillscore additionally scores quality: discoverability, conciseness, structure, instruction design, hygiene, and safety, with cited sources per rule.
- Generic Markdown linters (markdownlint, Vale) check prose style, not
skill semantics; they don't know what a frontmatter
descriptionmust contain for an agent to find the skill. - Asking an LLM to review your skill is non-deterministic and unsuitable for CI gates. skillscore is static, reproducible, and exits with codes designed for pipelines. The two combine well.
Library use #
skillscore is also a Dart library:
import 'package:skillscore/skillscore.dart';
void main() {
final doc = SkillParser().parseFile('my-skill/SKILL.md');
final result = Scorer(RuleRegistry()).score(doc, Target.universal);
print('${result.score}/100 ${result.grade}');
}
Contributing #
New rules are one class + one registration — see CONTRIBUTING.md for the walkthrough and the project's design principles (every rule cites its source guide, deterministic output, offline only, name-agnostic). Use the "Propose a new rule" issue template to suggest one.
License #
Apache-2.0. See CHANGELOG.md for release history.