dart_bert_tokenizer 1.0.2

A lightweight, pure Dart implementation of the BERT WordPiece tokenizer, 100% compatible with HuggingFace tokenizers.

Changelog

1.0.2

Added

  • Project configuration files (.gitignore)
  • Updated .pubignore for cleaner package distribution

1.0.1

Added

  • Comprehensive dartdoc comments for all public APIs
  • .pubignore for cleaner package distribution

1.0.0

  • Initial release
  • Pure Dart implementation of the BERT WordPiece tokenizer
  • 100% HuggingFace tokenizers compatibility
  • Memory-efficient typed arrays (Int32List, Uint8List)
  • Single text and sentence pair encoding
  • Batch encoding (sequential and parallel with Isolates)
  • Padding and truncation support
  • Offset mapping (char-to-token, token-to-char, word-to-tokens)
  • Vocabulary access and token conversion utilities

Publisher

brodykim.work (verified publisher)

Topics

#nlp #bert #tokenizer #machine-learning #wordpiece

License

MIT
