sherpa_onnx 1.10.23

Speech recognition, speech synthesis, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection.
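
To depend on this release from a Flutter project, the version constraint shown on this page would typically go under the dependencies section of pubspec.yaml. A minimal fragment, assuming a standard Flutter project layout (the surrounding keys are illustrative, only the sherpa_onnx line comes from this page):

```yaml
# pubspec.yaml (fragment); only the sherpa_onnx entry is taken from this page
dependencies:
  flutter:
    sdk: flutter
  sherpa_onnx: ^1.10.23
```

The caret constraint allows compatible newer versions from 1.10.23 upward, so pin an exact version instead if reproducible builds matter.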

Supported functions

| Function | Supported |
|---|---|
| Speech recognition | ✔️ |
| Speech synthesis | ✔️ |
| Speaker verification | ✔️ |
| Speaker identification | ✔️ |
| Spoken language identification | ✔️ |
| Audio tagging | ✔️ |
| Voice activity detection | ✔️ |
| Keyword spotting | ✔️ |
| Add punctuation | ✔️ |

Supported platforms

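| Architecture | Android | iOS | Windows | macOS | Linux |
|---|---|---|---|---|---|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
| x86 | ✔️ | | ✔️ | | |
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | | | | ✔️ |
| riscv64 | | | | | ✔️ |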

Supported programming languages

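| No. | Language | Supported |
|---|---|---|
| 1 | C++ | ✔️ |
| 2 | C | ✔️ |
| 3 | Python | ✔️ |
| 4 | JavaScript | ✔️ |
| 5 | Java | ✔️ |
| 6 | C# | ✔️ |
| 7 | Kotlin | ✔️ |
| 8 | Swift | ✔️ |
| 9 | Go | ✔️ |
| 10 | Dart | ✔️ |
| 11 | Rust | ✔️ |
| 12 | Pascal | ✔️ |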

For Rust support, please see sherpa-rs.

It also supports WebAssembly.

Introduction

This repository supports running the following functions locally:

  • Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
  • Text-to-speech (i.e., TTS)
  • Speaker identification
  • Speaker verification
  • Spoken language identification
  • Audio tagging
  • VAD (e.g., silero-vad)
  • Keyword spotting

on the platforms and operating systems listed under Supported platforms above, with the following APIs:

  • C++, C, Python, Go, C#
  • Java, Kotlin, JavaScript
  • Swift, Rust
  • Dart, Object Pascal

You can visit the following Huggingface spaces to try sherpa-onnx without installing anything. All you need is a browser.

| Description | URL |
|---|---|
| Speech recognition | Click me |
| Speech recognition with Whisper | Click me |
| Speech synthesis | Click me |
| Generate subtitles | Click me |
| Audio tagging | Click me |
| Spoken language identification with Whisper | Click me |

We also have spaces built using WebAssembly. They are listed below:

| Description | Huggingface space | ModelScope space |
|---|---|---|
| Voice activity detection with silero-vad | Click me | Address |
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
| Real-time speech recognition (English) | Click me | Address |
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
| VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
| VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-large | Click me | Address |
| VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-small | Click me | Address |
| Speech synthesis (English) | Click me | Address |
| Speech synthesis (German) | Click me | Address |

| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |

Real-time speech recognition

| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |

Text-to-speech

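| Description | URL | For users in China |
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |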

Note: You need to build from source for iOS.

Generating subtitles

| Description | URL | For users in China |
|---|---|---|
| Generate subtitles | Address | Click here |

| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See the multilingual Whisper ASR models under Speech recognition |
| Punctuation | Address |

How to reach us

Please see https://k2-fsa.github.io/sherpa/social-groups.html for the next-gen Kaldi WeChat and QQ discussion groups.

Topics

#speech-recognition #speech-synthesis #speaker-identification #audio-tagging #voice-activity-detection

Dependencies

ffi, flutter, sherpa_onnx_android, sherpa_onnx_ios, sherpa_onnx_linux, sherpa_onnx_macos, sherpa_onnx_windows
