arb/icu_message library
Inspect ICU MessageFormat strings — extract placeholder names and the
plural categories used in a {var, plural, …} expression.
Scope. This is a focused scanner, not a full ICU parser. It's designed for v1.0's two structural-check needs:
- placeholder_match: ensure every {name} in the source also appears in every translation, and no new names are introduced.
- plural_categories: when the source string is a plural expression, ensure each translation includes the CLDR categories required for its locale.
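As a rough illustration of what the two checks compare, here is a minimal Python sketch. It assumes the placeholder names and plural categories have already been extracted as sets. The function names mirror the check names, but the signatures, the message wording, and the REQUIRED_CATEGORIES table (a small hand-picked excerpt of CLDR cardinal categories) are hypothetical and not taken from this library.

```python
from typing import Dict, List, Set

# Hypothetical excerpt of CLDR cardinal plural categories per locale.
# A real check would load the full CLDR table instead.
REQUIRED_CATEGORIES: Dict[str, Set[str]] = {
    "en": {"one", "other"},
    "ru": {"one", "few", "many", "other"},
    "ja": {"other"},
}


def placeholder_match(source_names: Set[str], translation_names: Set[str]) -> List[str]:
    """Report names dropped from, or newly introduced in, a translation."""
    problems = []
    for name in sorted(source_names - translation_names):
        problems.append(f"missing placeholder {{{name}}}")
    for name in sorted(translation_names - source_names):
        problems.append(f"unexpected placeholder {{{name}}}")
    return problems


def plural_categories(locale: str, found: Set[str]) -> List[str]:
    """Report CLDR categories the locale requires but the translation lacks."""
    # Assumption: locales missing from the table only require "other".
    required = REQUIRED_CATEGORIES.get(locale, {"other"})
    return [f"missing plural category '{c}'" for c in sorted(required - found)]


print(placeholder_match({"name", "count"}, {"name", "total"}))
print(plural_categories("ru", {"one", "other"}))
```

Both functions return an empty list when the translation passes. The sketch leaves out the precondition that plural_categories only applies when the source string is itself a plural expression.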
Supported.
- Simple placeholders: Hello {name} → {name}.
- Typed placeholders: {count, number}, {date, date, yMMMd}.
- Plural expressions at any nesting depth (the scanner recurses into each branch).
- Select expressions (treated as a non-plural typed placeholder by extractPluralCategories, but their variable name is extracted by extractPlaceholders).
- ICU escapes: '' → literal '; '…' is a quoted run when the run contains {, }, |, or # (per the ICU 4.8+ "auto-quoting" rules).
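To make the supported cases concrete, the following Python sketch shows one way a scanner of this kind could work. It is not this library's implementation: the snake_case functions only echo extractPlaceholders and extractPluralCategories, and every helper name is hypothetical. The sketch makes three assumptions of its own. A quoted run starts only when the apostrophe is immediately followed by {, }, |, or #. Names are also collected from inside select branches. Explicit selectors such as =0 are not counted as categories.

```python
from typing import Iterator, Set, Tuple

SYNTAX_CHARS = "{}|#"
CLDR_CATEGORIES = {"zero", "one", "two", "few", "many", "other"}


def mask_quoted(message: str) -> str:
    """Blank out escaped text so later passes only see real ICU syntax."""
    out, i, n = [], 0, len(message)
    while i < n:
        if message.startswith("''", i):
            out.append("  ")  # '' is a literal apostrophe
            i += 2
        elif message[i] == "'" and i + 1 < n and message[i + 1] in SYNTAX_CHARS:
            # Quoted run: hide everything up to the closing apostrophe
            # (or to the end of the string if it is never closed).
            # Simplification: '' inside a quoted run is not handled.
            end = message.find("'", i + 1)
            end = n - 1 if end == -1 else end
            out.append(" " * (end - i + 1))
            i = end + 1
        else:
            out.append(message[i])  # ordinary text, including a lone apostrophe
            i += 1
    return "".join(out)


def top_level_groups(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (text_before, body) for each outermost {...} group in text."""
    depth, open_at, prev_end = 0, 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                open_at = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[prev_end:open_at], text[open_at + 1:i]
                prev_end = i + 1


def arguments(masked: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, keyword, rest) for each outermost argument."""
    for _, body in top_level_groups(masked):
        parts = [p.strip() for p in body.split(",", 2)] + ["", ""]
        yield parts[0], parts[1], parts[2]


def _names(masked: str) -> Set[str]:
    names: Set[str] = set()
    for name, keyword, rest in arguments(masked):
        if name:
            names.add(name)
        if keyword in ("plural", "select"):
            for _, branch in top_level_groups(rest):
                names |= _names(branch)  # recurse into each branch
    return names


def _categories(masked: str) -> Set[str]:
    found: Set[str] = set()
    for _, keyword, rest in arguments(masked):
        if keyword != "plural":
            continue  # select and other typed placeholders are skipped
        for before, branch in top_level_groups(rest):
            tokens = before.split()
            if tokens and tokens[-1] in CLDR_CATEGORIES:
                found.add(tokens[-1])  # "=0"-style selectors are ignored
            found |= _categories(branch)
    return found


def extract_placeholders(message: str) -> Set[str]:
    return _names(mask_quoted(message))


def extract_plural_categories(message: str) -> Set[str]:
    return _categories(mask_quoted(message))
```

A short usage example under the same assumptions:

```python
msg = "{count, plural, one {# item for {user}} other {# items for {user}}}"
print(sorted(extract_placeholders(msg)))        # ['count', 'user']
print(sorted(extract_plural_categories(msg)))   # ['one', 'other']
print(sorted(extract_placeholders("It''s '{literal}' for {real}")))  # ['real']
```

Masking quoted text first keeps the brace-matching passes simple, because they never have to reason about apostrophes.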
Out of scope for v1.0. Argument formatting beyond the name and first
keyword (e.g. detailed number skeleton inspection). If a v1.1+ check
rule needs deeper analysis, we will swap in a real ICU parser.