Fix Poor Transcription Quality

If Hedy’s transcripts are full of errors — missed words, wrong proper nouns, garbled phrases — the cause is rarely the AI model itself. It’s almost always one of: poor audio capture environment, the wrong microphone, the wrong language setting, or a provider that doesn’t match your use case. Here’s how to diagnose and fix each, ranked by how often each is the culprit.

First, Confirm the Basics

Before changing anything, check these:

  • Is the Meeting/Class Language set to the language you’re actually speaking? Check Settings > Profile > Language Preferences. The default speech recognition provider (Whisper) does not auto-detect language; it transcribes assuming the language you configured. If the configured language and the spoken language don’t match, nearly every word will be wrong. See Transcript Came Out in the Wrong Language.

  • Is the right microphone selected? Check Settings > Sessions > Microphone Settings. If you accidentally chose a Bluetooth headset that’s powered off or a USB mic that’s unplugged, Hedy is recording silence and the transcript will be garbage.

Most “low quality” complaints are one of these two settings, not anything technical.

Improve the Audio Environment

Hedy doesn’t apply any client-side noise suppression, automatic gain control, or echo cancellation. The audio that goes into transcription is essentially what your microphone picks up. Cleaner audio in = cleaner transcript out.

  • Get the microphone closer to the speakers. For in-person meetings, a phone placed in the middle of a small table works for 4-5 people. For a large room, 8+ people, or noisy environments, use multiple devices or a dedicated conference mic.

  • Reduce background noise. Fans, air conditioning, kitchen appliances, traffic, and other people talking in the background all degrade accuracy. Close doors and windows. Turn off the fan if possible.

  • Avoid recording laptop speaker output with the laptop’s own microphone. If you’re trying to capture a meeting or video that’s playing through laptop speakers (e.g., a video on YouTube), use the system audio capture features instead. See Hedy Isn’t Capturing Other Participants in Virtual Meetings.

  • Don’t talk over each other. Overlapping speech is the hardest case for any speech recognition. Hedy’s diarization tries to separate speakers, but if multiple people speak at once, accuracy drops sharply.

Choose the Right Speech Recognition Provider

Hedy supports four speech recognition providers — two local, two cloud. You can see and change them at Settings > Speech & AI > Speech Recognition Options.

| Provider | Type | Best for | Trade-off |
|---|---|---|---|
| Local Speech Recognition (Whisper), default | Local | Privacy-sensitive use, working offline, broad language support | Slower than cloud on integrated graphics; uses configured meeting language (no auto-detect) |
| Local Speech Recognition (Parakeet) [Beta] | Local (Apple Silicon Macs and supported iPhone/iPad models) | Faster real-time transcription for English and major European languages | Beta; narrower language list than Whisper; may misidentify similar languages |
| Deepgram (requires your own API key) | Cloud | Cloud accuracy, multi-language auto-detect, large meetings | Requires Deepgram account and API key; not local |
| OpenAI (requires your own API key) | Cloud | Cloud accuracy, language auto-detection | Requires OpenAI account and API key; not local |

If you’re using the default Whisper provider and accuracy isn’t good enough, try the following in order, depending on your situation:

  • On Apple Silicon Macs or supported iPhone/iPad models, for English or major European languages: try Parakeet. It runs on Apple’s Neural Engine and is often faster and more accurate than Whisper for real-time English transcription. It’s still beta — watch for “similar language” misidentification (e.g., German vs. Dutch).

  • For multi-language meetings, accented speech, or noisy environments: try Deepgram (multi-language auto-detect) or OpenAI (auto-detect). Both require you to bring your own API key, but they typically outperform local models on hard audio.

  • If you need to stay offline or fully private and Whisper is slow on your hardware: see Fix Slow Transcription on Windows (GPU Settings) for the Windows-specific GPU acceleration fix, or move to Parakeet if you’re on Apple Silicon.
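If it helps to see the guidance above as a decision rule, here is a minimal sketch. It is only an illustration of the order described in this article: the `Situation` class, its field names, and `suggest_provider` are hypothetical and are not part of Hedy or any Hedy API.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical description of a user's setup (not a Hedy data structure)."""
    apple_silicon_or_supported_ios: bool = False
    english_or_major_european: bool = False
    hard_audio: bool = False       # multi-language, accented, or noisy
    must_stay_local: bool = False  # offline or privacy-sensitive use


def suggest_provider(s: Situation) -> str:
    """Suggest what to try next when default Whisper accuracy is not good enough."""
    # Parakeet first: local, and covers English and major European languages.
    if s.apple_silicon_or_supported_ios and s.english_or_major_european:
        return "Parakeet (beta)"
    # Cloud providers for hard audio, if leaving the device is acceptable.
    if s.hard_audio and not s.must_stay_local:
        return "Deepgram or OpenAI (own API key required)"
    # Otherwise stay on Whisper and look at hardware acceleration.
    return "Whisper (check GPU settings on Windows)"


print(suggest_provider(Situation(hard_audio=True)))
```

The checks run in the same order as the list above, so a setup that matches more than one bullet gets the earliest suggestion.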

Use Custom Vocabulary for Proper Nouns

If Hedy mis-transcribes names, technical terms, product names, or industry jargon, add them to Custom Vocabulary.

  1. Open Hedy’s Settings

  2. Go to Personalization > Custom Vocabulary > Manage Vocabulary Terms

  3. Enter each term in “Enter a custom term…” and tap Add

  4. Make sure Enable Custom Vocabulary is on

Custom Vocabulary feeds directly into the local Whisper transcription as a prompt, helping it recognize and spell domain-specific terms correctly. It also helps the transcript cleanup step (which runs across all providers, including Parakeet, Deepgram, and OpenAI) catch and fix mistakes.

Note: Custom Vocabulary has its strongest direct effect when you’re using the local Whisper provider. For Parakeet, Deepgram, and OpenAI, the cleanup step still benefits from your vocabulary list, but the speech recognizer itself doesn’t receive it as a prompt.
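To make the "vocabulary as a prompt" idea concrete, here is a rough sketch of how a term list could be turned into a single prompt string. This is an assumption for illustration only: Hedy's actual prompt format is not documented here, and `build_vocabulary_prompt` and its `max_chars` budget are hypothetical.

```python
def build_vocabulary_prompt(terms: list[str], max_chars: int = 400) -> str:
    """Join unique, non-empty terms into one comma-separated prompt string.

    max_chars is an illustrative budget, not a documented Hedy or Whisper limit.
    """
    seen: set[str] = set()
    kept: list[str] = []
    length = 0
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        added = len(cleaned) + (2 if kept else 0)  # 2 accounts for ", "
        if length + added > max_chars:
            break  # stop once the budget is used up
        seen.add(key)
        kept.append(cleaned)
        length += added
    return ", ".join(kept)


print(build_vocabulary_prompt(["Hedy", " Deepgram ", "hedy", "Parakeet"]))
```

The practical takeaway is the same as for the settings screen: short, distinct, correctly spelled terms are more useful than a long list with duplicates.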

For a longer guide on building a good vocabulary list, see Custom Vocabulary Guide.

Fix Microphone Hardware Issues

If audio quality is degrading mid-session or only certain speakers come through, the hardware is suspect:

  • Bluetooth headsets often degrade as battery drops or when range increases. See AirPods and Bluetooth Headphones Cutting Out.

  • USB microphones can suffer from cable issues. Try a different USB port or a different cable.

  • Built-in laptop mics are fine for one or two people sitting close to the keyboard. They’re not great for conference rooms.

  • Phones inside cases or under fabric can sound muffled.

A quick test: record a short voice memo with the same microphone in Voice Memos / Recorder / a similar simple app. If that recording sounds bad, the problem is the mic — not Hedy.
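If you would rather measure than listen, the sketch below checks whether a test recording is effectively silent. It assumes you have exported the recording as a 16-bit PCM WAV file (many voice memo apps save other formats by default), and the `0.01` threshold is an arbitrary illustration, not a Hedy setting. The function names are hypothetical.

```python
import math
import struct
import wave


def rms_level(path: str) -> float:
    """Return the RMS amplitude of a 16-bit PCM WAV file, scaled to 0.0-1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frames[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count) / 32768.0


def looks_silent(path: str, threshold: float = 0.01) -> bool:
    """Flag recordings whose overall level is below an illustrative threshold."""
    return rms_level(path) < threshold
```

A near-zero level usually points to a disconnected or wrongly selected microphone rather than a transcription problem. A healthy level does not prove the audio is clean, so still listen to the sample.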

Audio Format Hedy Uses

For reference, Hedy captures audio at 16 kHz, mono, 16-bit PCM, the standard format for speech recognition. This format goes directly to local Whisper and Deepgram. For OpenAI Realtime, Hedy resamples to 24 kHz before sending (OpenAI’s required format). These formats are fine for speech, but a 16 kHz sample rate keeps only frequencies below 8 kHz, so music and other high-fidelity audio lose detail. Don’t expect great results trying to transcribe songs.
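For the curious, here is what those numbers work out to, along with a minimal sketch of 16 kHz to 24 kHz resampling. The sample rates come from this article. The linear interpolation is a simple stand-in chosen for illustration: how Hedy actually resamples is not described here, and the function names are hypothetical.

```python
from array import array

CAPTURE_RATE_HZ = 16_000  # capture rate stated in this article
OPENAI_RATE_HZ = 24_000   # rate sent to OpenAI Realtime, per this article
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono


def bytes_per_second(rate_hz: int) -> int:
    """Raw PCM data rate for mono 16-bit audio at the given sample rate."""
    return rate_hz * BYTES_PER_SAMPLE * CHANNELS


def resample_linear(samples: array, src_rate: int, dst_rate: int) -> array:
    """Resample signed 16-bit mono samples by linear interpolation."""
    if len(samples) == 0:
        return array("h")
    out_len = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = array("h")
    for i in range(out_len):
        pos = i * src_rate / dst_rate   # position in the source signal
        left = min(int(pos), last)
        right = min(left + 1, last)     # clamp at the final sample
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


one_second = array("h", [0] * CAPTURE_RATE_HZ)
print(bytes_per_second(CAPTURE_RATE_HZ))                               # 32000
print(len(resample_linear(one_second, CAPTURE_RATE_HZ, OPENAI_RATE_HZ)))  # 24000
```

At 32,000 bytes per second, one minute of captured audio is about 1.9 MB of raw PCM. Resampling to 24 kHz produces more samples per second but does not add back any detail above 8 kHz.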

When to Escalate

If you’ve checked all the above and accuracy is still poor:

  1. Note the specific kind of error (wrong words, missed sections, wrong speaker attribution, total garbage)

  2. Capture a 30-second sample where the error happens

  3. Contact us through the chat widget with the sample and your provider/language setup

We can usually identify whether it’s environmental, configuration, or a provider issue.

Still having trouble? Contact us through the chat widget with your provider, your Meeting/Class Language setting, your device model, and a sample where the issue is visible.