[Dev Notes] Speed Up Whisper on Mac with mlx-whisper


OpenAI’s Whisper is a solid local model for transcribing audio to text, but on macOS the out-of-the-box performance is painfully slow: on my MacBook Pro M1, transcribing 1 minute of audio took over 5 minutes. The root cause is that the reference PyTorch implementation falls back to the CPU when no CUDA device is available, so the Apple Silicon GPU sits idle, which is a real shame.

There are a few Whisper-based alternatives for Mac. I tried faster-whisper and whisper.cpp, but found them either tedious to set up (compiling C++ is no fun) or not noticeably faster. I eventually landed on mlx-whisper.

mlx-whisper is a Whisper implementation built on Apple’s official MLX framework. MLX runs the model on the GPU of M-series chips through Metal and takes advantage of their unified memory (it does not target the Neural Engine), making it significantly faster than the reference PyTorch version.

Installation

Requirements

  1. Mac with Apple Silicon (M1 / M2 / M3 / M4)
  2. Homebrew (brew) and Python 3.11 or later installed

Step 1: Install the audio processing dependency

mlx-whisper relies on ffmpeg for audio decoding. Install it via Homebrew first:

brew install ffmpeg

Step 2: Set up a Python virtual environment

Create an isolated virtual environment to keep things clean, then install the package:

python3.11 -m venv .venv
source .venv/bin/activate
pip install mlx-whisper
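
Before moving on, it can help to confirm the requirements listed above. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites is made up for this note and is not part of mlx-whisper. One caveat: a Python interpreter running under Rosetta reports x86_64, so the Apple Silicon check would come back False there.

```python
import platform
import shutil
import sys


def check_prerequisites() -> dict:
    """Return a mapping of prerequisite name -> whether it is satisfied."""
    return {
        # Apple Silicon Macs report "darwin" and "arm64" to a native Python.
        "apple_silicon": sys.platform == "darwin" and platform.machine() == "arm64",
        # The requirements above ask for Python 3.11 or later.
        "python_3_11_plus": sys.version_info >= (3, 11),
        # mlx-whisper decodes audio through the ffmpeg binary on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated virtual environment so the Python version check reflects the interpreter that will actually run mlx-whisper.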

Step 3: Start transcribing

There are two ways to use mlx-whisper: the command line (CLI) or the Python API.

Command Line (CLI)

# Simplest usage — defaults to the whisper-tiny model
mlx_whisper audio.mp3

# Specify a model (recommended for Chinese/multilingual content)
mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo

# Output as an SRT subtitle file
mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo -f srt

The output file (e.g. audio.txt or audio.srt) is written to the current working directory by default.
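
For a whole folder of recordings, the CLI calls above can be scripted. This is a rough sketch, not something that ships with mlx-whisper: build_command, transcribe_folder, and the AUDIO_SUFFIXES set are names invented here, and the only flags used are the ones shown above (--model and -f). With dry_run=True it only returns the commands so you can inspect them first.

```python
import subprocess
from pathlib import Path

# Assumption: extend this set to match the audio formats you actually have.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}


def build_command(audio_path, model="mlx-community/whisper-large-v3-turbo", fmt="srt"):
    """Build the same mlx_whisper invocation shown in the CLI examples."""
    return ["mlx_whisper", str(audio_path), "--model", model, "-f", fmt]


def transcribe_folder(folder, dry_run=True):
    """Build (and optionally run) one mlx_whisper command per audio file."""
    commands = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip non-audio files
        command = build_command(path)
        commands.append(command)
        if not dry_run:
            # check=True raises if mlx_whisper exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Calling transcribe_folder("recordings", dry_run=False) would then process every matching file in a (hypothetical) recordings folder one after another.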

Python API

import mlx_whisper

# Basic usage
result = mlx_whisper.transcribe("audio.mp3")
print(result["text"])

# Specify a model
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo"
)
print(result["text"])

# Word-level timestamps
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
    word_timestamps=True
)
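# Segment-level timing, followed by the per-word timing added by word_timestamps=True
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s -> {segment['end']:.1f}s] {segment['text']}")
    for word in segment.get("words", []):
        print(f"    {word['start']:.2f}s -> {word['end']:.2f}s {word['word']}")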

The model is downloaded automatically from Hugging Face on first run and cached locally for subsequent use.
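
If you are using the Python API and still want subtitles, the start, end, and text fields of each segment are enough to build an SRT file by hand. The helpers below are a small sketch in plain Python; srt_timestamp and segments_to_srt are names made up for this note, not mlx-whisper functions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm form used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn a list of segment dicts (start, end, text) into SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        text = segment["text"].strip()
        # Each cue: running number, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

With a transcription result in hand, something like open("audio.srt", "w", encoding="utf-8").write(segments_to_srt(result["segments"])) would save the subtitles.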

Model Comparison

Here are the main models available from mlx-community, with actual download sizes:

| Model | Download Size | Chinese | Speed | Best For |
| --- | --- | --- | --- | --- |
| mlx-community/whisper-tiny-mlx | 74 MB | Poor | Fastest | Quick drafts, English only |
| mlx-community/whisper-small-mlx | 481 MB | Fair | Fast | Primarily English content |
| mlx-community/whisper-medium-mlx | 1.5 GB | Good | Medium | Mixed Chinese/English |
| mlx-community/whisper-large-v3-mlx | 3.1 GB | Excellent | Slower | Chinese, multilingual, highest accuracy |
| mlx-community/whisper-large-v3-turbo | 1.6 GB | Excellent | Fast | Chinese, multilingual, speed + accuracy ⭐ |

Recommendation: For Chinese content, go with mlx-community/whisper-large-v3-turbo. It is a pruned and fine-tuned version of large-v3 (4 decoder layers instead of 32) with accuracy close to large-v3 but much faster inference, which makes it the best balance of speed and quality.
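
The speed ratings above are relative, so it may be worth timing the models on your own machine and audio. Here is a minimal sketch that reports the real-time factor (processing time divided by audio length); real_time_factor and time_model are hypothetical helper names, and the transcribe function is passed in as an argument so the timing logic stays independent of mlx-whisper itself.

```python
import time


def real_time_factor(audio_seconds: float, elapsed_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 beats real time."""
    return elapsed_seconds / audio_seconds


def time_model(transcribe, audio_path, repo, audio_seconds):
    """Time one transcription run and return its real-time factor.

    `transcribe` is expected to accept the path plus a path_or_hf_repo
    keyword, as mlx_whisper.transcribe does in the examples above.
    """
    start = time.perf_counter()
    transcribe(audio_path, path_or_hf_repo=repo)
    elapsed = time.perf_counter() - start
    return real_time_factor(audio_seconds, elapsed)
```

Usage would look like time_model(mlx_whisper.transcribe, "audio.mp3", "mlx-community/whisper-large-v3-turbo", audio_seconds=60). Note that the first run for each model also includes the download from Hugging Face, so time a second run for a fair comparison. For reference, 1 minute of audio taking 5 minutes corresponds to a real-time factor of 5.0.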

Full model list: Hugging Face mlx-community Whisper collection

Conclusion

mlx-whisper is the most hassle-free way to run Whisper on a Mac with Apple Silicon — a few commands to install, no C++ compilation, no extra configuration, and it just works. In practice, pairing it with the large-v3-turbo model gives excellent Chinese transcription accuracy at a fraction of the original wait time.

If you have any audio-to-text needs on your Mac, mlx-whisper is well worth trying.
