[Dev Notes] Fast audio-to-subtitle transcription on a Mac with mlx-whisper

April 2, 2026


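For turning audio into subtitles, OpenAI's Whisper models give good recognition quality when run locally. On macOS, however, the stock implementation is very slow: on my M1 MacBook Pro, a 1-minute audio file took 5 minutes to process. The main reason is that it doesn't make direct use of the M-series chip's hardware acceleration, which is a real shame.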

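There are several other Whisper-based acceleration options for macOS. I tried projects such as faster-whisper and whisper.cpp, but I found the setup cumbersome (some even require compiling C++) and the results weren't noticeably better. I ended up choosing mlx-whisper.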

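mlx-whisper is a Whisper implementation built on MLX, Apple's official machine-learning framework for Apple silicon. It runs directly on the M-series GPU through Metal, which makes it much faster than the stock Python version. MLX targets the CPU and GPU; it does not use the Neural Engine.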
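To put numbers on speed claims like the M1 figure above, it helps to compute a real-time factor: processing time divided by audio length, where lower is faster. The sketch below is a hypothetical helper set of my own (`real_time_factor`, `audio_duration` and `benchmark` are made-up names). It assumes you have finished the installation steps below, so that `ffprobe` (bundled with Homebrew's ffmpeg) is on your PATH. It only calls `mlx_whisper.transcribe` with the `path_or_hf_repo` argument used later in this post.

```python
import subprocess
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; 1.0 means exactly real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def audio_duration(path: str) -> float:
    """Ask ffprobe for the file's duration in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())


def benchmark(path: str, model: str = "mlx-community/whisper-large-v3-turbo") -> float:
    """Time one mlx_whisper.transcribe() call and return its real-time factor."""
    import mlx_whisper  # imported lazily so the helpers above work without it

    start = time.perf_counter()
    mlx_whisper.transcribe(path, path_or_hf_repo=model)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_duration(path))


# The stock-Whisper figure on an M1: 5 minutes of processing for 1 minute of audio
print(real_time_factor(5 * 60, 60))  # 5.0
```

The first `benchmark("audio.mp3")` call also includes the model download and load time. Run it twice and compare the second result with the stock version.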

Installation

Requirements

A Mac with an M-series chip (M1 / M2 / M3 / M4)
Homebrew and Python 3.11 or later installed
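Before installing anything, you can confirm both requirements with a small standard-library check. This is a hypothetical helper (`missing_requirements` is my own name). It assumes an Apple silicon Mac reports `Darwin`/`arm64` through the `platform` module. The same check catches an x86_64 Python running under Rosetta, which reports `x86_64`.

```python
import platform
import sys


def missing_requirements(system: str, machine: str, version: tuple) -> list:
    """Return a list of problems; an empty list means the basic requirements are met."""
    problems = []
    # Apple silicon reports Darwin/arm64; a Rosetta-translated Python reports x86_64
    if system != "Darwin" or machine != "arm64":
        problems.append(f"needs macOS on an M-series chip, got {system}/{machine}")
    # Compare only (major, minor) against the 3.11 minimum
    if tuple(version[:2]) < (3, 11):
        problems.append(f"needs Python 3.11+, got {version[0]}.{version[1]}")
    return problems


problems = missing_requirements(platform.system(), platform.machine(), sys.version_info)
print("Requirements look fine" if not problems else "\n".join(problems))
```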

Step 1: Install the media-processing tool

mlx-whisper relies on ffmpeg to decode audio, so install ffmpeg with Homebrew first:

```shell
brew install ffmpeg
```
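In practice, "relies on ffmpeg to decode audio" works like this. Whisper models expect 16 kHz mono samples, so the input file is piped through ffmpeg as raw 16-bit PCM and then scaled to floats. The sketch below mirrors the ffmpeg arguments used by openai-whisper's `load_audio`. I'm assuming mlx-whisper does essentially the same thing, and the helper names are hypothetical. The real implementations return a NumPy array; this one returns a plain list to stay dependency-free.

```python
import shutil
import subprocess
import sys
from array import array

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def ffmpeg_decode_command(path: str, sr: int = SAMPLE_RATE) -> list:
    """ffmpeg arguments that decode any input to raw 16-bit little-endian mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0", "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr), "-",
    ]


def pcm16_to_float(raw: bytes) -> list:
    """Convert s16le bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # s16le is little-endian; swap on big-endian hosts
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def load_audio(path: str) -> list:
    """Decode an audio file into 16 kHz mono float samples via ffmpeg."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found; run `brew install ffmpeg` first")
    out = subprocess.run(ffmpeg_decode_command(path), capture_output=True, check=True)
    return pcm16_to_float(out.stdout)
```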

Step 2: Set up a Python virtual environment

To keep your setup clean, create a dedicated virtual environment and install the package into it:

```shell
python3.11 -m venv .venv
source .venv/bin/activate
pip install mlx-whisper
```
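You can confirm that the package landed in the virtual environment rather than in your system Python with two checks. Inside a venv, `sys.prefix` differs from `sys.base_prefix`, and `importlib.metadata` can report the installed version. The function names here are my own.

```python
import importlib.metadata
import sys


def in_virtualenv() -> bool:
    """A venv created with `python -m venv` points sys.prefix away from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def installed_version(dist_name: str):
    """Return the installed distribution's version string, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


print("venv active:", in_virtualenv())
print("mlx-whisper:", installed_version("mlx-whisper") or "not installed")
```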

Step 3: Start using it

Once it's installed, you can use it in two ways: from the command line (CLI) or from a Python program.

Command line (CLI)

```shell
# Simplest usage; defaults to the whisper-tiny model
mlx_whisper audio.mp3

# Specify a model (large-v3-turbo is recommended for Chinese)
mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo

# Write the output as an .srt subtitle file
mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo -f srt
```

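Each run writes a matching output file, such as audio.txt or audio.srt. By default the file goes to the current working directory (--output-dir changes this), so run the command from the audio file's folder if you want the output next to it.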
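If you have a whole folder of recordings, a short script can drive the same CLI. This is a hypothetical batch helper that reuses only the flags shown above (`--model` and `-f`). It runs each command inside the audio file's own folder, so the output lands there whether the CLI writes to the current directory or next to the input.

```python
import subprocess
from pathlib import Path

MODEL = "mlx-community/whisper-large-v3-turbo"


def build_command(audio: Path, model: str = MODEL, fmt: str = "srt") -> list:
    """Same shape as the CLI examples: audio file, --model, and -f output format."""
    return ["mlx_whisper", str(audio), "--model", model, "-f", fmt]


def transcribe_folder(folder: str, pattern: str = "*.mp3") -> None:
    """Run the CLI once per matching file, in sorted order; a missing folder matches nothing."""
    for audio in sorted(Path(folder).glob(pattern)):
        print("transcribing", audio.name)
        # Absolute input path plus cwd=audio.parent keeps each output next to its source
        subprocess.run(build_command(audio.resolve()), cwd=audio.parent, check=True)
```

For example, `transcribe_folder("recordings")` produces one .srt file per .mp3 in that folder. Pass `pattern="*.m4a"` for other formats ffmpeg can read.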

Python

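```python
import mlx_whisper

# Basic usage
result = mlx_whisper.transcribe("audio.mp3")
print(result["text"])

# Specify a model
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
)
print(result["text"])

# Add word-level timestamps
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
    word_timestamps=True,
)
# Per-segment details: start/end times in seconds plus the segment text
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s -> {segment['end']:.1f}s] {segment['text']}")
    # With word_timestamps=True, each segment also carries a "words" list
    for word in segment.get("words", []):
        print(f"    [{word['start']:.2f}s] {word['word']}")
```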

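On the first run, the model is downloaded automatically from Hugging Face, so make sure you're online. After that it is cached locally (under ~/.cache/huggingface by default).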
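The Python API returns a dict rather than writing a file. To get .srt subtitles from `result["segments"]`, you have to format them yourself. Here is a minimal sketch that assumes each segment has the `start`, `end` (in seconds) and `text` keys used in the loop above. `srt_timestamp` and `segments_to_srt` are hypothetical names; mlx-whisper may ship its own writers, but I haven't relied on them here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn result["segments"] (dicts with start/end/text) into numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line
    return "\n".join(blocks)
```

For example, `Path("audio.srt").write_text(segments_to_srt(result["segments"]), encoding="utf-8")` (after `from pathlib import Path`) saves the subtitles next to your script.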

Common models compared

These are the main Whisper models provided by mlx-community. Pick one based on your use case:

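| Model | Download size | Chinese recognition | Speed | Best for |
|---|---|---|---|---|
| `mlx-community/whisper-tiny-mlx` | 74 MB | Fair | Fastest | Quick drafts, English-only audio |
| `mlx-community/whisper-small-mlx` | 481 MB | Acceptable | Fast | Mostly English content |
| `mlx-community/whisper-medium-mlx` | 1.5 GB | Good | Moderate | Mixed Chinese and English content |
| `mlx-community/whisper-large-v3-mlx` | 3.1 GB | Excellent | Slower | Chinese, multilingual, maximum accuracy |
| `mlx-community/whisper-large-v3-turbo` | 1.6 GB | Excellent | Fast | Chinese, multilingual, balancing speed and accuracy ⭐ |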

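Recommendation: for Chinese content, go with mlx-community/whisper-large-v3-turbo. It is a pruned version of large-v3: the decoder is cut from 32 layers to 4 and then fine-tuned. Its accuracy stays close to large-v3 while it runs much faster, so it offers the best balance of speed and quality.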
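The table boils down to a simple rule: among the models that meet your accuracy bar, take the fastest one, and break ties with the smaller download. The sketch below encodes that rule. The ratings are the table's subjective ones, turned into ordinal scores of my own choosing, and `pick_model` is a hypothetical helper. Asking for excellent Chinese accuracy lands on large-v3-turbo, matching the recommendation.

```python
from typing import NamedTuple


class Model(NamedTuple):
    repo: str
    size_mb: int  # approximate download size from the table (1.5 GB ~ 1500 MB)
    chinese: str  # Chinese recognition rating from the table
    speed: str    # relative speed from the table


# Ordinal scores for the table's qualitative ratings (higher is better)
CHINESE = {"fair": 1, "acceptable": 2, "good": 3, "excellent": 4}
SPEED = {"slower": 1, "moderate": 2, "fast": 3, "fastest": 4}

MODELS = [
    Model("mlx-community/whisper-tiny-mlx", 74, "fair", "fastest"),
    Model("mlx-community/whisper-small-mlx", 481, "acceptable", "fast"),
    Model("mlx-community/whisper-medium-mlx", 1500, "good", "moderate"),
    Model("mlx-community/whisper-large-v3-mlx", 3100, "excellent", "slower"),
    Model("mlx-community/whisper-large-v3-turbo", 1600, "excellent", "fast"),
]


def pick_model(min_chinese: str) -> str:
    """Fastest model meeting the Chinese rating; ties go to the smaller download."""
    ok = [m for m in MODELS if CHINESE[m.chinese] >= CHINESE[min_chinese]]
    return max(ok, key=lambda m: (SPEED[m.speed], -m.size_mb)).repo


print(pick_model("excellent"))  # mlx-community/whisper-large-v3-turbo
print(pick_model("fair"))       # mlx-community/whisper-tiny-mlx
```

If you only need acceptable Chinese accuracy, the rule returns whisper-small-mlx. Small and turbo tie on speed in the table, and the smaller download wins the tie.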

For the full list of models, see the Hugging Face mlx-community Whisper collection.

Conclusion

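mlx-whisper is currently the most hassle-free way to run Whisper on an M-series Mac. Installation takes only a few commands, with no C++ to compile and no extra configuration, so it works out of the box. In my own use with the large-v3-turbo model, Chinese recognition accuracy has been satisfying, and it runs several times faster than the stock Python version. No more sitting and watching a progress bar crawl to the end.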

If you also need speech-to-text on a Mac, mlx-whisper is well worth a try.
