dsbuild 0.1.0-alpha.5
Dataset preparation for LM training. Usable as a standalone tool and a library. Process conversations using StreamTransformers configured with simple YAML. Supports multithreading.
0.1.0-alpha.5:
- Required binary data can now be pushed to workers.
- ConversationTransformers can now push MessageRead and ConversationRead events.
- Fix unhandled exception on missing artifact (now a warning)
- Allow FileConcatenate to create new directories and overwrite existing files.
- ExactReplace can now use external CSV data (using the new PackedDataCache)
- Implement new transformers StatsCountOccurrences and StatsAddColMerge
- HtmlStrip can now strip elements by DOM query
0.1.0-alpha.4:
- Simplified API and pipeline (major breaking changes)
- Configurable dispatching replaces readers/writers.
- ConversationTransformer replaces Preprocessor and Postprocessor.
- required and artifacts config section replaces input/output descriptors.
0.1.0-alpha.3:
- Multithreading
- Multiple passes
- Additional transformers