[Dev Notes] Speed Up Whisper on Mac with mlx-whisper
OpenAI’s Whisper is a solid local model for transcribing audio to text, but on macOS its out-of-the-box performance is painfully slow: on my M1 MacBook Pro, transcribing 1 minute of audio took over 5 minutes. The root cause is that the standard Python implementation, built on PyTorch, runs on the CPU by default and doesn’t use the Apple Silicon GPU at all, which is a real shame.
There are a few Whisper-based alternatives for Mac. I tried faster-whisper and whisper.cpp, but found them either tedious to set up (compiling C++ is no fun) or not noticeably faster. I eventually landed on mlx-whisper.
mlx-whisper is a Whisper implementation built on MLX, Apple’s open-source array framework for Apple Silicon. It runs the model on the M-series GPU through Metal and takes advantage of the chips’ unified memory, which makes it significantly faster than the standard Python version.
Installation
Requirements
- Mac with Apple Silicon (M1 / M2 / M3 / M4)
- Homebrew (brew) and Python 3.11 or later installed
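Before going further, a quick preflight script can confirm these requirements. This is a minimal sketch using only the standard library; check_requirements is a hypothetical helper name, and it assumes you run it with the same Python you plan to use for the virtual environment. MLX needs a native arm64 Python: an x86_64 Python running under Rosetta reports x86_64 here.

```python
import platform
import shutil
import sys


def check_requirements(min_python=(3, 11)):
    """Return pass/fail flags for the requirements listed above (hypothetical helper)."""
    return {
        # Native Apple Silicon Python reports Darwin / arm64;
        # an x86_64 build running under Rosetta reports x86_64 instead.
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        "python_version": sys.version_info[:2] >= min_python,
        "homebrew": shutil.which("brew") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:15} {'OK' if ok else 'MISSING'}")
```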
Step 1: Install the audio processing dependency
mlx-whisper relies on ffmpeg for audio decoding. Install it via Homebrew first:
```bash
brew install ffmpeg
```
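Whisper models work on 16 kHz mono audio, so whatever you feed in (mp3, m4a, wav and so on) has to be decoded and resampled first, and that is the job ffmpeg does here. Below is a standard-library sketch of that step, modeled on the ffmpeg command used by the original openai-whisper load_audio; I’m assuming mlx-whisper’s own loader works much the same way, so treat this as an illustration rather than its exact code.

```python
import array
import shutil
import subprocess
import sys

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono input


def ffmpeg_decode_command(path, sample_rate=SAMPLE_RATE):
    """ffmpeg argv that decodes any audio file to raw 16-bit mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-i", path,
        "-threads", "0",
        "-f", "s16le",            # raw little-endian 16-bit samples, no container
        "-ac", "1",               # downmix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),  # resample to the model's rate
        "-",                      # write to stdout
    ]


def pcm16_to_float(raw):
    """Convert little-endian int16 PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; s16le needs a swap on big-endian hosts
    return [s / 32768.0 for s in samples]


def load_audio(path):
    """Decode an audio file with ffmpeg and return mono float samples at SAMPLE_RATE."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it with `brew install ffmpeg`")
    proc = subprocess.run(ffmpeg_decode_command(path), capture_output=True, check=True)
    return pcm16_to_float(proc.stdout)
```

load_audio("audio.mp3") returns a plain Python list of samples, so len(samples) / SAMPLE_RATE gives the clip length in seconds. A list is fine for short clips; real implementations keep the samples in a NumPy or MLX array.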
Step 2: Set up a Python virtual environment
Create an isolated virtual environment to keep things clean, then install the package:
```bash
python3.11 -m venv .venv
source .venv/bin/activate
pip install mlx-whisper
```
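After pip finishes, it can be worth confirming from inside the virtual environment that both mlx-whisper and its mlx dependency are installed and that MLX picks the GPU. A minimal sketch using the standard library’s importlib.metadata plus mlx.core.default_device() from the MLX API; on an M-series Mac the default device is normally the GPU. The helper names are hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def mlx_default_device():
    """Return MLX's default device as a string, or None if MLX can't be imported."""
    try:
        import mlx.core as mx  # pulled in as a dependency of mlx-whisper
    except ImportError:
        return None
    return str(mx.default_device())


if __name__ == "__main__":
    for name in ("mlx-whisper", "mlx"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
    print(f"MLX default device: {mlx_default_device()}")
```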
Step 3: Start transcribing
There are two ways to use mlx-whisper: the command line (CLI) or the Python API.
Command Line (CLI)
```bash
# Simplest usage: defaults to the whisper-tiny model
mlx_whisper audio.mp3

# Specify a model (recommended for Chinese/multilingual content)
mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo

# Output as an SRT subtitle file
mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo -f srt
```
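The output file (e.g. audio.txt or audio.srt) is named after the input file and, by default, written to the current working directory, which is the audio’s folder only if you run the command from there. Use --output-dir to save it somewhere else.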
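To see what an SRT file contains, or to produce one from the Python API, here is a minimal formatter over the segments list returned by mlx_whisper.transcribe (each segment carries start and end in seconds plus text). It is a standalone sketch of the SRT format, not mlx-whisper’s own writer, so spacing and rounding may differ slightly from what -f srt produces; both function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render Whisper-style segments (dicts with start/end/text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

With a result from the Python API, Path("audio.srt").write_text(segments_to_srt(result["segments"]), encoding="utf-8") writes the file (Path from pathlib).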
Python API
```python
import mlx_whisper

# Basic usage
result = mlx_whisper.transcribe("audio.mp3")
print(result["text"])

# Specify a model
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
)
print(result["text"])

# Word-level timestamps
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
    word_timestamps=True,
)

# Detailed segment info
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s -> {segment['end']:.1f}s] {segment['text']}")
```
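The word-level example requests word timestamps but never prints them. With word_timestamps=True, each segment also carries a words list; assuming each entry has word, start and end keys (the layout openai-whisper uses, plus a probability score), a small hypothetical helper can print them:

```python
def format_words(result):
    """Yield one line per word from a transcribe() result made with word_timestamps=True."""
    for segment in result["segments"]:
        # Segments only include a "words" list when word timestamps were requested.
        for word in segment.get("words", []):
            yield f"[{word['start']:6.2f}s -> {word['end']:6.2f}s] {word['word'].strip()}"
```

Usage: for line in format_words(result): print(line)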
The model is downloaded automatically from Hugging Face on first run and cached locally for subsequent use.
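The larger models run to several gigabytes, so it helps to know where they land. mlx-whisper fetches them through huggingface_hub, whose cache defaults to ~/.cache/huggingface/hub and can be moved with the HF_HUB_CACHE or HF_HOME environment variables. A sketch that mirrors that lookup order (simplified: it ignores legacy variable names) and measures a cached model’s size; both helpers are hypothetical.

```python
import os
from pathlib import Path


def hf_model_cache_dir(repo_id):
    """Folder where huggingface_hub keeps a cached model repo (simplified lookup)."""
    if os.environ.get("HF_HUB_CACHE"):
        hub = Path(os.environ["HF_HUB_CACHE"])
    else:
        if os.environ.get("HF_HOME"):
            hf_home = Path(os.environ["HF_HOME"])
        else:
            cache_root = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
            hf_home = Path(cache_root) / "huggingface"
        hub = hf_home / "hub"
    # Cached repos are stored as models--<org>--<name>.
    return hub.expanduser() / ("models--" + repo_id.replace("/", "--"))


def dir_size_bytes(path):
    """Total size of regular files under path; snapshot symlinks are skipped so blobs count once."""
    path = Path(path)
    if not path.exists():
        return 0
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file() and not p.is_symlink())


if __name__ == "__main__":
    folder = hf_model_cache_dir("mlx-community/whisper-large-v3-turbo")
    print(f"{folder}: {dir_size_bytes(folder) / 1e9:.2f} GB")
```

Deleting that folder frees the space; the model is simply downloaded again on the next run.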
Model Comparison
Here are the main models available from mlx-community, with actual download sizes:
| Model | Download Size | Chinese Accuracy | Speed | Best For |
|---|---|---|---|---|
| mlx-community/whisper-tiny-mlx | 74 MB | Poor | Fastest | Quick drafts, English only |
| mlx-community/whisper-small-mlx | 481 MB | Fair | Fast | Primarily English content |
| mlx-community/whisper-medium-mlx | 1.5 GB | Good | Medium | Mixed Chinese/English |
| mlx-community/whisper-large-v3-mlx | 3.1 GB | Excellent | Slower | Chinese, multilingual, highest accuracy |
| mlx-community/whisper-large-v3-turbo | 1.6 GB | Excellent | Fast | Chinese, multilingual, speed + accuracy ⭐ |
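To check the Speed column on your own machine, a handy number is the real-time factor (RTF): processing time divided by audio duration, so anything below 1 is faster than real time. By that measure, 1 minute of audio taking over 5 minutes means an RTF above 5. Below is a small timing harness, written as a sketch: compare_models is a hypothetical helper that accepts any function called as transcribe(path, path_or_hf_repo=model), matching how mlx_whisper.transcribe is called earlier, and it makes one untimed warm-up call per model so the first-run download and weight loading aren’t counted.

```python
import time


def real_time_factor(transcribe_fn, audio_path, audio_seconds):
    """Time one call and return (elapsed_seconds, rtf), where rtf = elapsed / audio length."""
    start = time.perf_counter()
    transcribe_fn(audio_path)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / audio_seconds


def compare_models(transcribe, models, audio_path, audio_seconds, warmup=True):
    """Return {model: rtf} for each model; an RTF below 1.0 is faster than real time."""
    results = {}
    for model in models:
        def run(path, model=model):
            return transcribe(path, path_or_hf_repo=model)

        if warmup:
            run(audio_path)  # first call may download and load weights, so don't time it
        _, rtf = real_time_factor(run, audio_path, audio_seconds)
        results[model] = rtf
    return results
```

For a one-minute clip: compare_models(mlx_whisper.transcribe, ["mlx-community/whisper-small-mlx", "mlx-community/whisper-large-v3-turbo"], "audio.mp3", 60.0). The warm-up roughly doubles the total run time.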
Recommendation: For Chinese content, go with mlx-community/whisper-large-v3-turbo. It’s a pruned and fine-tuned version of large-v3 (the decoder is cut from 32 layers to 4), so its accuracy stays close to large-v3 while decoding is much faster, which makes it the best balance of speed and quality.
Full model list: Hugging Face mlx-community Whisper collection
Conclusion
mlx-whisper is the most hassle-free way to run Whisper on a Mac with Apple Silicon — a few commands to install, no C++ compilation, no extra configuration, and it just works. In practice, pairing it with the large-v3-turbo model gives excellent Chinese transcription accuracy at a fraction of the original wait time.
If you have any audio-to-text needs on your Mac, mlx-whisper is well worth trying.