Speech Recognition Providers in Hedy

What are Speech Recognition Providers?

Hedy supports multiple speech recognition options, so you can choose between fully private local processing and cloud-based alternatives. You can switch providers at any time: use a local engine for offline sessions, and a cloud service when you want its specific features.

Getting Started

Open the Hedy app
Navigate to Settings (tap your profile icon)
Scroll to “Speech Recognition Options”
Select your preferred provider from the dropdown menu
Configure provider-specific settings if needed
Your selection takes effect in the next session

Available Providers

Hedy offers four speech recognition options, each with unique characteristics:

Local Speech Recognition (Whisper): Default option - 100% private, works offline, no usage costs. Your audio never leaves your device. Available on every platform Hedy runs on.
Local Speech Recognition (Nemotron) [Beta]: A newer on-device streaming engine with live transcripts and on-device speaker labels. You choose between an English-only mode (the fastest option) and a multilingual mode that covers a broad set of major languages. Available on every platform Hedy ships a native app for: Apple Silicon Macs, iPhone 12 (or newer), iPad Air 4 (or newer), Windows, and Android. On Apple hardware it runs on the Neural Engine and labels speakers live; on Windows and Android the labels are added at the end of the session. Requires a one-time model download (about 0.6 GB for English-only, 0.7 GB for multilingual).
Deepgram: Cloud-based service with real-time streaming and smart formatting features. Uses Nova-3, which supports dozens of languages. Hedy exposes every language Nova-3 offers, so you can transcribe meetings in any supported language without switching providers. Requires your own API key.
OpenAI: Cloud transcription with Voice Activity Detection and automatic language detection. Hedy automatically continues long sessions past OpenAI’s 60-minute per-connection cap by rotating connections behind the scenes, so hour-plus meetings keep going without interruption. Requires your own API key.
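
As a rough illustration of the connection rotation described for OpenAI above, the sketch below splits a session into consecutive windows that each stay within a 60-minute cap. It is only a model of the idea: the function name plan_connection_windows is hypothetical, and it assumes rotation happens exactly at the cap, which may not match how Hedy actually times its reconnects.

```python
from typing import List, Tuple


def plan_connection_windows(
    session_minutes: int, cap_minutes: int = 60
) -> List[Tuple[int, int]]:
    """Split a session into (start, end) minute windows no longer than the cap.

    Hypothetical helper for illustration only. It opens no connections
    and is not Hedy's implementation.
    """
    if session_minutes < 0 or cap_minutes <= 0:
        raise ValueError("session_minutes must be >= 0 and cap_minutes > 0")
    windows: List[Tuple[int, int]] = []
    start = 0
    while start < session_minutes:
        # Each window ends at the cap or at the end of the session.
        end = min(start + cap_minutes, session_minutes)
        windows.append((start, end))
        start = end
    return windows


# A 150-minute meeting spans three connections under a 60-minute cap.
print(plan_connection_windows(150))
```

From the user's side none of this is visible; the point is only that a session longer than the cap is served by several back-to-back connections.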

Configuring Local Speech Recognition (Whisper)

When using Whisper, you can optimize for your device and needs:

For macOS Users:

Small Model: Fastest processing, recommended for Intel Macs
Regular Model: Balanced speed and accuracy for most users
Large Model: Enhanced capabilities for non-English languages (requires 1.5 GB download)

For iOS/Android Users:

Standard Model: Default option suitable for most devices
Large Model: Alternative model option (iPhone 12+ or 2024+ Android recommended)

Voice Activity Detection (VAD):

VAD automatically filters out silence and background noise to improve transcription quality. This feature is enabled by default for Whisper.

Enable/Disable: Toggle VAD on or off based on your recording environment
Sensitivity: Adjust from “High Sensitivity” (captures more speech, including quieter sounds) to “Maximum Filtering” (only captures clear speech, filters more background noise)
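
To give a feel for what the sensitivity setting trades off, here is a minimal energy-threshold sketch. It is not the VAD that Hedy or Whisper uses, which this article does not describe; the function names, frame size, and threshold values are all assumptions chosen for illustration. A low threshold behaves like “High Sensitivity” (quiet frames are kept), and a high threshold behaves like “Maximum Filtering” (only loud, clear frames are kept).

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_frames(
    samples: Sequence[float], frame_size: int, threshold: float
) -> List[int]:
    """Return the indices of frames whose energy reaches the threshold."""
    kept: List[int] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) >= threshold:
            kept.append(start // frame_size)
    return kept


# Three toy frames: silence, a quiet voice, a clear voice.
audio = [0.0] * 4 + [0.05] * 4 + [0.5] * 4

high_sensitivity = speech_frames(audio, frame_size=4, threshold=0.02)
maximum_filtering = speech_frames(audio, frame_size=4, threshold=0.2)
print(high_sensitivity, maximum_filtering)
```

With the low threshold both voiced frames are kept; with the high threshold only the clear one survives, which is why stronger filtering can drop quiet speakers in exchange for less background noise.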

Transcript Speed Settings:

Slower: Waits for complete sentences before displaying
Normal: Balanced speed and display timing
Faster: Near real-time display with more frequent updates

Configuring Local Speech Recognition (Nemotron)

Nemotron is currently in Beta. It transcribes entirely on-device and shows live transcripts as you talk. It’s available on every platform Hedy ships a native app for: iOS, iPadOS, macOS, Windows, and Android. On Apple hardware it runs on the Neural Engine.

Device requirements on Apple hardware:

Apple Silicon Mac (M1 or newer)
iPhone 12 family or newer
iPad Air 4 or newer

English-only or multilingual:

In the provider dropdown, Nemotron appears as two choices, so you can pick the one that matches your meetings:

Local Speech Recognition (Nemotron English Only): streaming English transcription, the fastest option.
Local Speech Recognition (Nemotron Multilingual): on-device streaming across a broad set of major languages, for when you need more than English.

Both run fully on-device, and both identify language from the audio rather than from your meeting language setting.

First-time setup:

Select Local Speech Recognition (Nemotron English Only) or (Nemotron Multilingual) from the provider dropdown
Tap Download Nemotron model (about 0.6 GB for English-only, 0.7 GB for multilingual) - we recommend Wi-Fi
Once the download finishes, Nemotron is used automatically in your next session

Speaker labels and the temporary audio cache:

Nemotron labels who’s speaking: live on Apple hardware, and at the end of the session on Windows and Android. To make those speaker labels more accurate, Hedy keeps each session’s audio in a temporary on-device cache while it processes, then deletes it. This audio stays on your device. The setting, Temporary audio cache (Nemotron), is on by default; you can turn it off in Hedy’s settings, though leaving it on gives Nemotron the best speaker attribution.

Setting Up Cloud Providers

Deepgram Setup:

Create an account at console.deepgram.com
Generate an API key from your dashboard
In Hedy Settings, select Deepgram from the dropdown
Paste your API key and tap “Test” to verify
Choose your model and language preferences
Set maximum session duration to control costs

OpenAI Setup:

Get your API key from platform.openai.com/api-keys
In Hedy Settings, select OpenAI from the dropdown
Enter your API key and test the connection
Choose your preferred model
Optionally enable Voice Activity Detection with adjustable sensitivity
Set maximum session duration for cost control

Choosing the Right Provider

Select based on your priorities and use case:

Privacy First: Use a local engine (Whisper or Nemotron) - audio never leaves your device
Offline Use: All local engines work without internet
Cloud Features: Deepgram and OpenAI offer cloud-based processing
Voice Detection: Whisper and OpenAI include Voice Activity Detection features
Smart Formatting: Deepgram offers automatic formatting options
No Usage Costs: Local engines (Whisper, Nemotron) have no per-minute charges
Faster On-Device Transcription: Nemotron (Beta) typically delivers a lower-latency transcript than Whisper
Multilingual On-Device Streaming: Nemotron Multilingual (Beta) gives you on-device transcription across a broad set of languages
Maximum Language Coverage On-Device: For non-European languages, prefer Whisper Large or Nemotron Multilingual
Fully Private Analysis: On macOS (Apple Silicon) or Windows, you can pair local speech recognition with Local AI Processing to keep both transcription and AI analysis fully on-device.
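
The priorities above can be read as a simple filter. The sketch below encodes only the traits this article lists for each provider and returns the ones that match a set of requirements. The Provider structure and the shortlist function are hypothetical, written for illustration; they are not part of Hedy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Provider:
    name: str
    local: bool             # on-device: private, offline, no per-minute charges
    vad: bool               # listed in this article as offering VAD
    smart_formatting: bool  # listed in this article as offering smart formatting


# Traits taken from this article; the data structure is an assumption.
PROVIDERS = [
    Provider("Whisper", local=True, vad=True, smart_formatting=False),
    Provider("Nemotron", local=True, vad=False, smart_formatting=False),
    Provider("Deepgram", local=False, vad=False, smart_formatting=True),
    Provider("OpenAI", local=False, vad=True, smart_formatting=False),
]


def shortlist(
    offline: bool = False, vad: bool = False, smart_formatting: bool = False
) -> List[str]:
    """Return the names of providers that satisfy every requested trait."""
    matches = []
    for p in PROVIDERS:
        if offline and not p.local:
            continue
        if vad and not p.vad:
            continue
        if smart_formatting and not p.smart_formatting:
            continue
        matches.append(p.name)
    return matches


print(shortlist(offline=True))             # local engines only
print(shortlist(offline=True, vad=True))   # local engine with VAD
```

An empty result, for example when asking for both offline use and smart formatting, means no single provider covers the combination and you would switch providers between sessions instead.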

Cost Considerations

Understanding the cost implications of each provider:

Local Speech Recognition (Whisper): Free - no usage charges
Local Speech Recognition (Nemotron): Free - no usage charges (one-time model download, about 0.6-0.7 GB)
Deepgram: Pay-per-minute pricing (check current rates on their dashboard)
OpenAI: Usage-based pricing (check current rates on their platform)

The maximum session duration setting helps prevent accidental overnight recordings and manage API costs.
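
To see how the maximum session duration bounds spending, the following sketch multiplies a per-minute rate by the cap. The rate used here is a made-up placeholder, not a real Deepgram or OpenAI price, and the function name is hypothetical; substitute the current rate from your provider's dashboard.

```python
from decimal import Decimal


def worst_case_session_cost(
    rate_per_minute: Decimal, max_session_minutes: int
) -> Decimal:
    """Upper bound on one session's cost if it runs until the cap."""
    if rate_per_minute < 0 or max_session_minutes < 0:
        raise ValueError("rate and duration must be non-negative")
    return rate_per_minute * max_session_minutes


# Placeholder rate for illustration only; check your provider's pricing.
example_rate = Decimal("0.01")
print(worst_case_session_cost(example_rate, 120))
```

Decimal is used instead of float so that small currency amounts multiply without rounding surprises.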

Best Practices

Start with Local Speech Recognition (Whisper) to familiarize yourself with the feature, then try Nemotron if your device is supported
Test cloud providers with short recordings before important sessions
Monitor your API usage on provider dashboards to track costs
Use different providers for different scenarios based on your needs
Switch to local when traveling or in areas with limited internet
Set appropriate maximum session durations (60-120 minutes for typical meetings)

Troubleshooting

API Key Not Working

Ensure you copied the complete key without spaces
Verify your account has available credits
Check that the API key has the necessary permissions
Try regenerating the key from the provider’s dashboard
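
The first check above, a key copied with stray spaces, is easy to miss by eye. As a small sketch, the hypothetical helper below reports whitespace problems in a pasted string. It assumes only that a valid key contains no whitespace; it knows nothing about any provider's actual key format and does not contact any service.

```python
from typing import List


def key_paste_problems(pasted: str) -> List[str]:
    """Describe whitespace problems in a pasted API key, if any."""
    trimmed = pasted.strip()
    if not trimmed:
        return ["key is empty"]
    problems: List[str] = []
    if pasted != trimmed:
        problems.append("leading or trailing whitespace")
    if any(ch.isspace() for ch in trimmed):
        problems.append("whitespace inside the key")
    return problems


# Placeholder strings, not real keys.
print(key_paste_problems("abc123"))
print(key_paste_problems(" abc123\n"))
```

An empty list means the paste looks clean, so the remaining checks in the list above (credits, permissions, regenerating the key) are the next things to try.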

Connection Test Failed

Check your internet connection stability
Verify that your firewall isn’t blocking WebSocket connections
Ensure your API key is active and has sufficient quota
Wait a moment and try again (temporary service issues)
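
“Wait a moment and try again” amounts to retrying with a growing pause between attempts. The sketch below shows that pattern in general terms. The function retry_check and its parameters are hypothetical, and the check it runs is supplied by the caller; it is not how Hedy's “Test” button is implemented.

```python
import time
from typing import Callable


def retry_check(
    check: Callable[[], bool],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run check() up to `attempts` times, doubling the wait after each failure.

    `sleep` is injectable so the waiting can be replaced when experimenting.
    """
    for attempt in range(attempts):
        if check():
            return True
        # Pause only if another attempt will follow.
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False
```

Doubling the delay gives a temporary service issue time to clear without repeatedly hitting the provider in quick succession.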

Transcription Issues

For Whisper: Try a different model size
For Whisper on Windows: If transcription lags far behind the conversation, check the GPU settings for slow transcription
For specialized terms, names, and acronyms: Add them via the custom vocabulary feature
For Nemotron: Use the English Only mode for English meetings; for other languages, use the Multilingual mode or switch to Whisper with the language set explicitly
For Cloud: Check internet connection stability
Ensure your microphone is properly configured
Minimize background noise during recording

Settings Not Saving

Wait for the “Saved” indicator to appear
Don’t switch screens while saving
Restart the app if issues persist
Ensure you have a stable internet connection

Your API keys are stored securely in your device’s encrypted keychain and never transmitted to Hedy’s servers. For maximum privacy with sensitive conversations, always use a local engine (Whisper or Nemotron).