dsbuild 0.1.0-alpha.5

Dataset preparation for LM training. Usable as a standalone tool and a library. Process conversations using StreamTransformers configured with simple YAML. Supports multithreading.
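
To use dsbuild as a library, the version constraint for this release goes under dependencies in the consuming project's pubspec.yaml. The fragment below is a minimal sketch: only the dsbuild line comes from this page, and the rest of the pubspec (name, environment, and so on) is assumed to exist already.

```yaml
# pubspec.yaml (fragment; other sections omitted)
dependencies:
  # Caret constraint on a pre-release: accepts 0.1.0-alpha.5 and later
  # versions below 0.2.0.
  dsbuild: ^0.1.0-alpha.5
```

Because alpha.4 is listed with major breaking changes, pinning the exact version instead of using a caret range may be the safer choice while the package is in alpha.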

0.1.0-alpha.5:

  • Required binary data can now be pushed to workers.
  • ConversationTransformers can now push MessageRead and ConversationRead events.
  • Fixed an unhandled exception on a missing artifact (now reported as a warning).
  • Allow FileConcatenate to create new directories and overwrite existing files.
  • ExactReplace can now use external CSV data (using the new PackedDataCache).
  • Implemented new transformers StatsCountOccurrences and StatsAddColMerge.
  • HtmlStrip can now strip elements by DOM query.

0.1.0-alpha.4:

  • Simplified API and pipeline (major breaking changes).
  • Configurable dispatching replaces readers/writers.
  • ConversationTransformer replaces Preprocessor and Postprocessor.
  • The required and artifacts config sections replace input/output descriptors.

0.1.0-alpha.3:

  • Multithreading
  • Multiple passes
  • Additional transformers

2 likes, 140 points, 26 downloads

Publisher

squish.tech (verified publisher)

Repository (GitHub)

Topics

#ai #nlp #lm

Documentation

API reference

License

MIT

Dependencies

async, bloc, crypto, csv, fast_immutable_collections, glob, html, http, logging, meta, path, yaml
