azure_stt_flutter 0.0.1
azure_stt_flutter: ^0.0.1 copied to clipboard
A Flutter package for real-time Speech-to-Text transcription using Microsoft Azure Cognitive Services with BLoC/Cubit pattern. Supports Mobile, Desktop, and Web platforms
Azure STT Flutter #
A Flutter package for real-time Speech-to-Text (transcription) using Microsoft Azure Cognitive Services. This library provides a reactive, stream-based API built with the BLoC/Cubit pattern to easily integrate speech recognition into your Flutter applications.
Features #
- Real-time Transcription: Receive intermediate results (hypothesis) and finalized text as the user speaks.
- Reactive API: Built on top of
flutter_bloc/CubitandStreamfor easy state management. - Cross-Platform: Supports Mobile (iOS, Android), Desktop (macOS, Windows, Linux), and Web.
- Auto-Silence Timeout: Automatically clears the text after a configurable period of silence.
- Multi-Language: Supports all languages provided by Azure Speech Services.
- Type-Safe: Built with sound null safety and strong typing.
Getting Started #
1. Dependencies #
Add the package to your pubspec.yaml:
dependencies:
azure_stt_flutter:
path: ./ # Or git url
2. Permissions #
Android
Add the microphone permission to android/app/src/main/AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
iOS
Add the microphone usage description to ios/Runner/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to the microphone for speech recognition.</string>
macOS
Add the microphone entitlement to macos/Runner/DebugProfile.entitlements and Release.entitlements:
<key>com.apple.security.device.audio-input</key>
<true/>
Usage #
Initialization #
Initialize the AzureSpeechToText instance. You need a Subscription Key
final azureStt = AzureSpeechToText(
subscriptionKey: 'YOUR_AZURE_KEY',
region: 'westeurope',
language: 'en-US',
textClearTimeout: const Duration(seconds: 2),
);
Listening to Updates #
The library exposes a transcriptionStateStream which emits TranscriptionState updates.
StreamBuilder<TranscriptionState>(
stream: azureStt.transcriptionStateStream,
builder: (context, snapshot) {
final state = snapshot.data;
if (state == null) return SizedBox();
return Column(
children: [
// Combined text (finalized + intermediate)
Text(state.text),
// Or access them separately
// Text(state.intermediateText), // Changing hypothesis
// Text(state.finalizedText.join(' ')), // Confirmed sentences
],
);
},
)
Controls #
// Start listening
await azureStt.startListening()
// Stop listening
azureStt.stopListening()
// Check if listening
azureStt.isListening()
// Dispose when done
azureStt.dispose()
Architecture #
The library is built using the BLoC/Cubit pattern to manage the state of the transcription.
TranscriptionCubit #
The central state manager. It processes events from the Azure Service and emits TranscriptionState.
TranscriptionState #
An immutable object containing:
intermediateText: The real-time, changing text (hypothesis) that Azure sends while you are speaking.finalizedText: A list of completed sentences (phrases) that Azure has confirmed.text: A helper field that combines finalized and intermediate text for easier display.isListening: A boolean indicating if the microphone is active.
Data Flow #
- Microphone: Captures audio as a stream of bytes.
- Service:
AzureSttServicelistens to the mic stream and forwards audio chunks to Azure via WebSocket. - Azure: Sends back JSON events (Hypothesis or Phrase).
- Cubit: Processes these events and emits a new
TranscriptionState. - UI: Your app listens to the stream and rebuilds automatically.
Authentication #
The library handles authentication differently depending on the platform due to browser limitations.
Mobile & Desktop #
- Mechanism: The library uses the Subscription Key to get a short-lived Access Token from Azure.
- Connection: It connects to the Azure WebSocket URL, passing this token in the HTTP Authorization Header (
Authorization: Bearer <token>). This is the standard, secure way.
Web #
- Limitation: Standard browser WebSocket APIs do not allow setting custom HTTP headers during the handshake.
- Workaround: The library connects to the Azure WebSocket URL but passes the authentication credentials directly in the URL Query Parameters.
- Security Note: Because query parameters can potentially be logged, using the Token Fetcher approach (generating tokens on your backend) is highly recommended for Web deployments to avoid exposing your long-lived Subscription Key.
Development #
Setup #
# Install Git hooks and activate melos
make all
# Or run separately
make install-hooks
make activate-melos
# Bootstrap the project (install dependencies)
melos bootstrap
Testing #
# Run all tests
melos test
# Run tests with coverage
melos test_with_coverage
# Run a single test file
flutter test test/azure_stt_flutter_test.dart --dart-define=TESTING=true
Code Quality #
# Run all quality checks (format, analyze, test)
melos quality_checks
# Format code
melos format
# Analyze code
melos analyze
# Run DCM checks
melos dcm_analyze
melos dcm
# Auto-fix issues
melos fix
Key Technologies #
- State Management:
flutter_blocwith Cubit pattern - Equality:
equatablefor value-based state equality - Audio:
recordpackage for microphone input (16kHz, mono, PCM16) - WebSocket:
web_socket_channelfor Azure connection - HTTP:
httppackage for authentication token fetching
Project Structure #
lib/
├── azure_stt_flutter.dart # Public API entry point
└── src/ # Internal implementation
├── constants.dart # App constants
├── utils.dart # Utility classes/functions
├── cubit/ # BLoC/Cubit state management
├── models/ # Data models
├── services/ # Business logic services
└── web_socket/ # Platform-specific implementations
Contributing #
This project uses:
- Flutter SDK: >=3.10.0 <4.0.0
- FVM: For Flutter version management
- Melos: For monorepo tooling and scripts
- DCM: Dart Code Metrics for advanced static analysis
License #
This project is licensed under the MIT License - see the LICENSE file for details.