text_counter

A lightweight Dart utility for accurately counting characters and words in over 100 languages, including CJK (Chinese, Japanese, Korean), RTL (Right-to-Left) scripts like Arabic and Hebrew, and mixed-language texts.

text_counter uses Microsoft Word-compatible word counting logic, ensuring consistent and familiar results across different writing systems. This makes it ideal for applications requiring accurate text metrics, such as content editors, writing tools, and input validation systems.

✨ Features

  • ✅ Uses Microsoft Word's word counting rules:
    • Words are split by whitespace and common punctuation.
    • Hyphenated words (e.g., "state-of-the-art") are counted as a single word.
    • Numbers and symbols are treated appropriately based on context.
  • ✅ Language-aware counting strategies:
    • CJK (Character-based): Each character is counted individually (used for Chinese, Japanese, Korean, etc.).
    • Latin & RTL scripts (Word-based): Standard word-based counting using appropriate delimiters and tokenization.
  • 🔍 Automatic language/script detection for mixed-language texts.
  • ⚡ Lightweight and dependency-free: No external libraries required.
  • 🌍 Supports over 100 languages out of the box.
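
The word-splitting rules listed above can be illustrated with a short sketch. This is not the package's implementation: the function name countWordsSketch and the regular expression are assumptions chosen to mirror the description (split on whitespace and common punctuation, keep hyphenated words together).

```dart
// Illustrative sketch only; not the package's actual implementation.

/// Counts words in space-delimited text.
/// Assumption: a "word" is a maximal run of letters, digits, or combining
/// marks, optionally joined by internal hyphens or apostrophes.
int countWordsSketch(String text) {
  // \p{L}, \p{N}, and \p{M} are Unicode property escapes for letters,
  // numbers, and marks; they require `unicode: true`.
  final word = RegExp(
    r"[\p{L}\p{N}\p{M}]+(?:[-'’][\p{L}\p{N}\p{M}]+)*",
    unicode: true,
  );
  return word.allMatches(text).length;
}

void main() {
  print(countWordsSketch('Hello, world!')); // 2
  print(countWordsSketch('A state-of-the-art design.')); // 3
}
```

Because the pattern only allows a hyphen between two letter or digit runs, "state-of-the-art" stays a single word while a dash surrounded by spaces is not counted at all.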

📦 Installation

Add this to your package's pubspec.yaml:

dependencies:
  text_counter: ^0.1.0

Then run:

dart pub get

🧪 Usage

Basic Example

import 'package:text_counter/text_counter.dart';

void main() {
  print('Chinese: ${TextCounter.count("你好，世界", languageCode: "zh")}'); // 5
  print('Japanese: ${TextCounter.count("こんにちは世界", languageCode: "ja")}'); // 7
  print('Korean: ${TextCounter.count("안녕하세요 세상", languageCode: "ko")}'); // 7
  print('Arabic: ${TextCounter.count("مرحبا بالعالم", languageCode: "ar")}'); // 2
  print('Hebrew: ${TextCounter.count("שלום עולם", languageCode: "he")}'); // 2
  print('English: ${TextCounter.count("Hello world", languageCode: "en")}'); // 2

  const mixed = "Hello 你好 مرحبا こんにちは";
  print('Mixed Text "$mixed": ${TextCounter.count(mixed)}'); // 9
}

🗺️ Supported Languages

Script Type | Language Codes
CJK (Character-based) | zh, yue, ja, ko, th, hi, bn, ta, te, kn, ml, si, km, my, lo, tl, jw, su, bo, dz
RTL (Word-based) | ml, si, km, my, lo, tl, jw, su, bo, dz
Latin (Word-based) | All other ISO 639-1 language codes not listed above, including: en, de, es, fr, it, pt, nl, tr, pl, ca, sv, id, fi, vi, hi, uk, el, ms, cs, ro, da, hu, no, th, ...

If no languageCode is provided, the library automatically detects script types and applies appropriate counting rules.

🛠️ How It Works

  • For CJK languages, each ideographic or logographic character is counted individually.
  • For Latin and RTL scripts, standard word boundaries are detected using whitespace and punctuation patterns similar to those used by Microsoft Word.
  • In mixed-language texts, the counter dynamically switches between counting methods depending on the script being used.
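
As a rough illustration of how a counter might switch strategies within one string, here is a minimal sketch. The code point ranges, the delimiter set, and the names isCharCounted, isDelimiter, and countMixedSketch are all assumptions made for this example, and the package's real detection tables are likely broader. Unlike the package, this sketch never counts punctuation.

```dart
// Illustrative sketch only; the ranges and names below are assumptions,
// not the package's actual tables.

/// True for code points that this sketch counts one by one.
bool isCharCounted(int rune) =>
    (rune >= 0x4E00 && rune <= 0x9FFF) || // CJK Unified Ideographs
    (rune >= 0x3040 && rune <= 0x30FF) || // Hiragana and Katakana
    (rune >= 0xAC00 && rune <= 0xD7A3); // Hangul syllables

/// True for whitespace and a few common punctuation marks.
bool isDelimiter(int rune) =>
    String.fromCharCode(rune).trim().isEmpty ||
    '.,;:!?，。、！？'.runes.contains(rune);

int countMixedSketch(String text) {
  var count = 0;
  var inWord = false; // true while inside a space-delimited word
  for (final rune in text.runes) {
    if (isCharCounted(rune)) {
      count++; // character-based: every character counts
      inWord = false;
    } else if (isDelimiter(rune)) {
      inWord = false; // delimiters end the current word
    } else if (!inWord) {
      count++; // word-based: count once at the start of each word
      inWord = true;
    }
  }
  return count;
}

void main() {
  print(countMixedSketch('Hello 你好 مرحبا こんにちは')); // 9
}
```

Here "Hello" and "مرحبا" each contribute one word, while the two Chinese and five Japanese characters are counted individually, giving 9.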

🧩 Use Cases

  • Content management systems
  • Rich-text editors
  • Writing apps with word limits
  • Language learning platforms
  • Analytics dashboards
  • Form validation utilities

📚 API Reference

int TextCounter.count(String text, {String? languageCode});
  • text: The input string to be analyzed.
  • languageCode: Optional BCP 47 language code (e.g., "en" for English, "zh" for Chinese). If omitted, auto-detection is used.
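
One way to put this signature to work is a length check for form input. The sketch below is hypothetical: validateLength and the Counter typedef are not part of text_counter, and whitespaceCount is only a stand-in so the example runs without the package. Assuming TextCounter.count is a static method with the signature shown above, it could be passed in place of the stand-in.

```dart
// Hypothetical helper; not part of text_counter.
typedef Counter = int Function(String text, {String? languageCode});

/// Returns null when [text] fits within [limit], or an error message otherwise.
String? validateLength(
  String text,
  int limit,
  Counter count, {
  String? languageCode,
}) {
  final n = count(text, languageCode: languageCode);
  return n <= limit ? null : 'Too long: $n of $limit allowed';
}

// Stand-in counter so the sketch runs on its own. It ignores languageCode
// and splits on whitespace only.
int whitespaceCount(String text, {String? languageCode}) =>
    text.trim().isEmpty ? 0 : text.trim().split(RegExp(r'\s+')).length;

void main() {
  print(validateLength('Hello world', 5, whitespaceCount)); // null
  print(validateLength('one two three', 2, whitespaceCount)); // Too long: 3 of 2 allowed
}
```

Injecting the counter keeps the validation logic testable without depending on any particular counting rules.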

📎 License

MIT License – see LICENSE
