arb/icu_message library

Inspect ICU MessageFormat strings: extract placeholder names and the plural categories used in a {var, plural, …} expression.

Scope. This is a focused scanner, not a full ICU parser. It's designed for v1.0's two structural-check needs:

  • placeholder_match: ensure every {name} in the source also appears in every translation, and no new names are introduced.
  • plural_categories: when the source string is a plural expression, ensure each translation includes the CLDR categories required for its locale.
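
Once names and categories have been extracted, the two checks above reduce to set comparisons. The following is a minimal sketch in Python, not the library's own code: the function names and the REQUIRED_CATEGORIES table are hypothetical, and the table is a small hand-picked subset of the CLDR cardinal plural categories, where a real check would load the full CLDR data.

```python
from typing import Dict, Set, Tuple

# Illustrative subset of CLDR cardinal plural categories (assumption: a real
# check would load the complete CLDR table rather than hard-code three locales).
REQUIRED_CATEGORIES: Dict[str, Set[str]] = {
    "en": {"one", "other"},
    "ru": {"one", "few", "many", "other"},
    "ja": {"other"},
}


def placeholder_match(source: Set[str], translation: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) placeholder names for one translation."""
    missing = source - translation      # in the source but dropped by the translation
    unexpected = translation - source   # introduced by the translation
    return missing, unexpected


def missing_plural_categories(locale: str, used: Set[str]) -> Set[str]:
    """Return the categories the locale requires that the translation omits."""
    return REQUIRED_CATEGORIES[locale] - used
```

A translation passes placeholder_match when both returned sets are empty, and passes plural_categories when missing_plural_categories returns an empty set.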

Supported.

  • Simple placeholders: Hello {name}.
  • Typed placeholders: {count, number}, {date, date, yMMMd}.
  • Plural expressions at any nesting depth (the scanner recurses into each branch).
  • Select expressions (treated as a non-plural typed placeholder by extractPluralCategories, but their variable name is extracted by extractPlaceholders).
  • ICU escapes: '' → literal '; '…' → quoted run, when the run contains {, }, |, or # (per the ICU 4.8+ "auto-quoting" rules).
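
To illustrate the supported cases listed above, here is a minimal scanner sketch in Python. It is not the library's implementation, and every name in it (extract_placeholders and the underscore-prefixed helpers) is hypothetical. It assumes a quoted run begins when an apostrophe is immediately followed by {, }, | or #, which is how ICU describes the rule; the library's exact condition may differ.

```python
from typing import Iterator, Set

# Argument types whose style text holds {…} branches containing nested messages.
BRANCHING = {"plural", "select", "selectordinal"}


def _skip_quoted(text: str, i: int) -> int:
    """text[i] is an apostrophe; return the index just past the escape or quoted run."""
    if text.startswith("''", i):
        return i + 2  # doubled apostrophe: a literal '
    if i + 1 < len(text) and text[i + 1] in "{}|#":
        j = i + 1
        while j < len(text):
            if text.startswith("''", j):
                j += 2  # literal ' inside the quoted run
            elif text[j] == "'":
                return j + 1  # closing apostrophe
            else:
                j += 1
        return len(text)  # unterminated run extends to the end
    return i + 1  # lone apostrophe is ordinary text


def _matching_brace(text: str, start: int) -> int:
    """text[start] is '{'; return the index of its matching '}'."""
    depth, i = 0, start
    while i < len(text):
        ch = text[i]
        if ch == "'":
            i = _skip_quoted(text, i)
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")


def _top_level_blocks(text: str) -> Iterator[str]:
    """Yield the body of each unquoted top-level {…} block in text."""
    i = 0
    while i < len(text):
        if text[i] == "'":
            i = _skip_quoted(text, i)
        elif text[i] == "{":
            end = _matching_brace(text, i)
            yield text[i + 1:end]
            i = end + 1
        else:
            i += 1


def extract_placeholders(message: str) -> Set[str]:
    """Collect placeholder names, recursing into plural/select branches."""
    names: Set[str] = set()
    for body in _top_level_blocks(message):
        parts = body.split(",", 2)  # name, type keyword, style
        names.add(parts[0].strip())
        if len(parts) == 3 and parts[1].strip() in BRANCHING:
            for branch in _top_level_blocks(parts[2]):
                names |= extract_placeholders(branch)
    return names
```

Only the name and the first keyword of each argument are inspected, matching the scope described here; the style text is examined only to find nested branches.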

Out of scope for v1.0. Argument formatting beyond name + first keyword (e.g. detailed number skeleton inspection). If a v1.1+ check rule needs deeper analysis, we will swap in a real ICU parser.

Classes

IcuMessage