I've been a heavy user of dictation, off and on, as my wrists have
gotten better and worse. I've mostly used the built-in Mac and
Android recognition: Mac isn't great, Android is pretty good, and
neither has improved much over the past ~5y despite large improvements
in what should be possible. OpenAI has an open speech recognition
model, Whisper, and I wanted to have a go at running it on my Mac.
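It looks like for good local performance the best version is
whisper.cpp, which is a plain C/C++ implementation with support for
Mac's ML hardware. To get this installed I needed to install Xcode
(not just the command line tools, since I needed coremlc) and then
run:
Note that both older (3.10) and newer (3.13) Python versions gave
compilation errors.
While I don't know if these are the ideal arguments, I've been using: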
By default the output is quite repetitive. For example, I dictated:
It looks like for good local performance, the best version to use is
whisper.cpp, which is a plain C/C++ implementation with support for
Mac's machine learning hardware. To get this installed, I needed to
install Xcode (not just the command line tools since I needed
coremlc), and then run a whole bunch of commands.
The output was:
It looks like for good local performance the best version
to use is whisper.cpp, which is a plain C++
to use is whisper.cpp which is a plain C/C++ implementation with
support for maximum
machine learning hardware. To get this insight into the
machine learning hardware. To get this installed, I needed to
install Xcode, not just the computer
not just the command line tools since I needed core MLC.
not just the command line tools since I needed core MLC, and
then run a whole bunch of commands.
The fix for not enough LLM these days is often more LLM, so I used
Claude Sonnet 4 to clean it up with the prompt "Could you clean up
this Whisper transcript? The things that look like repetitions
aren't, it's just that whisper does a lot of 'restating' as it updates
it's view of what was said." This gave me:
It looks like for good local performance, the best version to use is
whisper.cpp, which is a plain C/C++ implementation with support for
maximum machine learning hardware. To get this installed, I needed to
install Xcode, not just the command line tools since I needed Core ML,
and then run a whole bunch of commands.
This is very good! The only two things it seems to have gotten
'wrong' are "maximum" for "Mac's" and commas where I was thinking
parens. And neither of these is very wrong: "Mac's" comes out
verbally as "max" and "maximum" is also plausible in context; the
commas read fine, perhaps better than my parens.
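To script this step rather than paste into a chat window, one option is to wrap the prompt around the raw transcript before handing it to a model. This is only a minimal sketch: `build_cleanup_prompt` is a hypothetical helper, and dropping blank lines is an assumption for illustration. Sending the result to a model is left out, since that depends on which client you use.

```python
# The cleanup prompt quoted above, kept verbatim.
CLEANUP_PROMPT = (
    "Could you clean up this Whisper transcript? The things that look like "
    "repetitions aren't, it's just that whisper does a lot of 'restating' "
    "as it updates it's view of what was said."
)


def build_cleanup_prompt(transcript: str) -> str:
    """Return the cleanup prompt followed by the raw transcript.

    Hypothetical helper: blank lines are dropped and each remaining line is
    stripped (an assumption), but restated lines are deliberately kept so
    the model can see how the transcription was revised.
    """
    lines = [line.strip() for line in transcript.splitlines() if line.strip()]
    return CLEANUP_PROMPT + "\n\n" + "\n".join(lines)


# Example using the first few lines of the raw output shown earlier.
raw = """\
It looks like for good local performance the best version
to use is whisper.cpp, which is a plain C++
to use is whisper.cpp which is a plain C/C++ implementation with
"""
prompt = build_cleanup_prompt(raw)
print(prompt)
```

The restated lines are passed through untouched on purpose: the model does the merging, and this wrapper only saves retyping the prompt.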
I set this up a couple of weeks ago, and have generally been finding
it quite useful.