<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Speech-to-Text on Tarragon</title><link>https://tarrragon.github.io/blog/tags/speech-to-text/</link><description>Recent content in Speech-to-Text on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/speech-to-text/index.xml" rel="self" type="application/rss+xml"/><item><title>Hands-on: Installing whisper.cpp for Speech-to-Text</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/whisper-setup/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/whisper-setup/</guid><description>&lt;p>This post documents installing &lt;code>whisper.cpp&lt;/code> on an Apple Silicon Mac and verifying English speech-to-text. Reasons for choosing whisper.cpp over &lt;code>openai-whisper&lt;/code> (the Python version):&lt;/p>
&lt;ul>
&lt;li>Pure C++ implementation; the Metal backend runs directly on the Apple Silicon GPU.&lt;/li>
&lt;li>Ships as a Homebrew bottle: a single &lt;code>brew install&lt;/code>, with no Python environment or torch wheels needed.&lt;/li>
&lt;li>The binary is named &lt;code>whisper-cli&lt;/code>; it is CLI-first and easy to drop into a shell pipeline.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>&lt;strong>Verified on&lt;/strong>: 2026-05-12
&lt;strong>whisper-cpp version&lt;/strong>: 1.8.4
&lt;strong>Demo model&lt;/strong>: &lt;code>ggml-tiny.en.bin&lt;/code> (78 MB, English-only, the smallest usable model)
&lt;strong>Measured&lt;/strong>: 7 seconds of audio transcribed in 484 ms with Metal GPU acceleration&lt;/p>&lt;/blockquote>
&lt;h2 id="前置設定">Prerequisites&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Item&lt;/th>
 &lt;th>Check command&lt;/th>
 &lt;th>Expected&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Homebrew&lt;/td>
 &lt;td>&lt;code>brew --version&lt;/code>&lt;/td>
 &lt;td>4.x&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>ffmpeg&lt;/td>
 &lt;td>&lt;code>which ffmpeg&lt;/code>&lt;/td>
 &lt;td>&lt;code>/opt/homebrew/bin/ffmpeg&lt;/code> (if missing: &lt;code>brew install ffmpeg&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Disk space&lt;/td>
 &lt;td>&lt;code>df -h ~&lt;/code>&lt;/td>
 &lt;td>At least 200 MB (whisper-cli plus the 78 MB &lt;code>tiny.en&lt;/code> model; larger models need more, see the model table)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;code>ffmpeg&lt;/code> is required for this walkthrough: whisper.cpp works on 16 kHz mono audio, and ffmpeg is what converts other formats to 16 kHz mono WAV here.&lt;/p>
&lt;h2 id="安裝-whisper-cpp">Install whisper-cpp&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">brew install whisper-cpp&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Homebrew installs:&lt;/p>
&lt;ul>
&lt;li>the &lt;code>whisper-cli&lt;/code> binary into &lt;code>/opt/homebrew/bin/&lt;/code>&lt;/li>
&lt;li>the shared &lt;code>ggml&lt;/code> library into &lt;code>/opt/homebrew/Cellar/ggml/&lt;/code>&lt;/li>
&lt;li>BLAS / Metal backends, matched to Apple Silicon automatically&lt;/li>
&lt;/ul>
&lt;p>Verify that the binary works:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">which whisper-cli
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1"># /opt/homebrew/bin/whisper-cli&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">whisper-cli --help 2&amp;gt;&lt;span class="p">&amp;amp;&lt;/span>&lt;span class="m">1&lt;/span> &lt;span class="p">|&lt;/span> head -5&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The first run prints Metal initialization messages:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">ggml_metal_library_init: using embedded metal library
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">ggml_metal_library_init: loaded in 6.883 sec&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The first Metal library load is slow (about 7 seconds); later runs use the cache and start much faster.&lt;/p>
&lt;h2 id="下載-model">Download a Model&lt;/h2>
&lt;p>whisper-cpp manages model files separately from the original OpenAI release, so you download the GGML-format file yourself:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p ~/.whisper-models
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ~/.whisper-models
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">curl -L -o ggml-tiny.en.bin &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> &lt;span class="s2">&amp;#34;https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Available models compared (larger files give better quality but run slower):&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Model&lt;/th>
 &lt;th>Size&lt;/th>
 &lt;th>Best for&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>ggml-tiny.en.bin&lt;/code>&lt;/td>
 &lt;td>78 MB&lt;/td>
 &lt;td>English; minimal verification, acceptable quality&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-base.en.bin&lt;/code>&lt;/td>
 &lt;td>148 MB&lt;/td>
 &lt;td>English; common starting point&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-small.en.bin&lt;/code>&lt;/td>
 &lt;td>488 MB&lt;/td>
 &lt;td>English; the sweet spot for daily use&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-medium.en.bin&lt;/code>&lt;/td>
 &lt;td>1.5 GB&lt;/td>
 &lt;td>English; quality-sensitive work&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-small.bin&lt;/code>&lt;/td>
 &lt;td>488 MB&lt;/td>
 &lt;td>Multilingual (including Chinese)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-large-v3.bin&lt;/code>&lt;/td>
 &lt;td>3.1 GB&lt;/td>
 &lt;td>Multilingual; best quality, slowest&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;code>tiny.en&lt;/code> is used here &lt;strong>only to verify the install path&lt;/strong>; for real day-to-day use, start from &lt;code>small.en&lt;/code>.&lt;/p>
&lt;p>Verify the download:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">ls -lh ~/.whisper-models/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1"># expect ggml-tiny.en.bin, roughly 78 MB&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="跑第一次轉錄">Run the First Transcription&lt;/h2>
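&lt;p>You need a short test clip. One option is to generate it with the macOS built-in &lt;code>say&lt;/code> command, then convert it with ffmpeg to the format whisper.cpp expects (16 kHz mono WAV):&lt;/p></description><content:encoded><![CDATA[<p>This post documents installing <code>whisper.cpp</code> on an Apple Silicon Mac and verifying English speech-to-text. Reasons for choosing whisper.cpp over <code>openai-whisper</code> (the Python version):</p>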
<ul>
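<li>Pure C++ implementation; the Metal backend runs directly on the Apple Silicon GPU.</li>
<li>Ships as a Homebrew bottle: a single <code>brew install</code>, with no Python environment or torch wheels needed.</li>
<li>The binary is named <code>whisper-cli</code>; it is CLI-first and easy to drop into a shell pipeline.</li>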
</ul>
<blockquote>
<p><strong>Verified on</strong>: 2026-05-12
<strong>whisper-cpp version</strong>: 1.8.4
<strong>Demo model</strong>: <code>ggml-tiny.en.bin</code> (78 MB, English-only, the smallest usable model)
<strong>Measured</strong>: 7 seconds of audio transcribed in 484 ms with Metal GPU acceleration</p></blockquote>
<h2 id="前置設定">Prerequisites</h2>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Check command</th>
          <th>Expected</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Homebrew</td>
          <td><code>brew --version</code></td>
          <td>4.x</td>
      </tr>
      <tr>
          <td>ffmpeg</td>
          <td><code>which ffmpeg</code></td>
          <td><code>/opt/homebrew/bin/ffmpeg</code> (if missing: <code>brew install ffmpeg</code>)</td>
      </tr>
      <tr>
          <td>Disk space</td>
          <td><code>df -h ~</code></td>
          <td>At least 200 MB (whisper-cli plus the 78 MB <code>tiny.en</code> model; larger models need more, see the model table)</td>
      </tr>
  </tbody>
</table>
<p><code>ffmpeg</code> is required for this walkthrough: whisper.cpp works on 16 kHz mono audio, and ffmpeg is what converts other formats to 16 kHz mono WAV here.</p>
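<p>The checks in the table can be bundled into one preflight function. This is a minimal sketch and not part of the original setup: the names <code>need_cmd</code>, <code>preflight</code>, and <code>MIN_FREE_MB</code> are illustrative assumptions, and the 200 MB threshold is simply the figure from the table.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# Preflight sketch: need_cmd, preflight and MIN_FREE_MB are illustrative names.
MIN_FREE_MB=200

# Succeeds when the given command is on PATH.
need_cmd() {
  command -v "$1" &gt;/dev/null 2&gt;&amp;1
}

preflight() {
  local missing=0 cmd free_mb
  for cmd in brew ffmpeg; do
    if ! need_cmd "$cmd"; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is free space.
  free_mb=$(df -Pk "$HOME" | awk 'NR==2 {print int($4 / 1024)}')
  if [ "$free_mb" -lt "$MIN_FREE_MB" ]; then
    echo "low disk: ${free_mb} MB free, want ${MIN_FREE_MB} MB"
    missing=1
  fi
  return "$missing"
}</code></pre></div><p>Calling <code>preflight &amp;&amp; echo ready</code> prints one line per problem and only prints <code>ready</code> when nothing is missing.</p>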
<h2 id="安裝-whisper-cpp">Install whisper-cpp</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew install whisper-cpp</span></span></code></pre></div><p>Homebrew installs:</p>
<ul>
<li>the <code>whisper-cli</code> binary into <code>/opt/homebrew/bin/</code></li>
<li>the shared <code>ggml</code> library into <code>/opt/homebrew/Cellar/ggml/</code></li>
<li>BLAS / Metal backends, matched to Apple Silicon automatically</li>
</ul>
<p>Verify that the binary works:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">which whisper-cli
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># /opt/homebrew/bin/whisper-cli</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl">whisper-cli --help 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">|</span> head -5</span></span></code></pre></div><p>The first run prints Metal initialization messages:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">ggml_metal_library_init: using embedded metal library
</span></span><span class="line"><span class="ln">2</span><span class="cl">ggml_metal_library_init: loaded in 6.883 sec</span></span></code></pre></div><p>The first Metal library load is slow (about 7 seconds); later runs use the cache and start much faster.</p>
<h2 id="下載-model">Download a Model</h2>
<p>whisper-cpp manages model files separately from the original OpenAI release, so you download the GGML-format file yourself:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p ~/.whisper-models
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/.whisper-models
</span></span><span class="line"><span class="ln">3</span><span class="cl">curl -L -o ggml-tiny.en.bin <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin&#34;</span></span></span></code></pre></div><p>Available models compared (larger files give better quality but run slower):</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Size</th>
          <th>Best for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ggml-tiny.en.bin</code></td>
          <td>78 MB</td>
          <td>English; minimal verification, acceptable quality</td>
      </tr>
      <tr>
          <td><code>ggml-base.en.bin</code></td>
          <td>148 MB</td>
          <td>English; common starting point</td>
      </tr>
      <tr>
          <td><code>ggml-small.en.bin</code></td>
          <td>488 MB</td>
          <td>English; the sweet spot for daily use</td>
      </tr>
      <tr>
          <td><code>ggml-medium.en.bin</code></td>
          <td>1.5 GB</td>
          <td>English; quality-sensitive work</td>
      </tr>
      <tr>
          <td><code>ggml-small.bin</code></td>
          <td>488 MB</td>
          <td>Multilingual (including Chinese)</td>
      </tr>
      <tr>
          <td><code>ggml-large-v3.bin</code></td>
          <td>3.1 GB</td>
          <td>Multilingual; best quality, slowest</td>
      </tr>
  </tbody>
</table>
<p><code>tiny.en</code> is used here <strong>only to verify the install path</strong>; for real day-to-day use, start from <code>small.en</code>.</p>
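<p>Switching models later means repeating the same <code>curl</code> call with a different file name, so a small helper can save typing. This is a sketch under one assumption: that every model in the table sits in the same Hugging Face repo under the <code>ggml-NAME.bin</code> pattern used by the <code>tiny.en</code> URL. The helper names <code>model_url</code> and <code>fetch_model</code> are made up for illustration and are not part of whisper.cpp.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Sketch: model_url and fetch_model are illustrative helpers, not part of whisper.cpp.
MODEL_DIR="$HOME/.whisper-models"
BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# Map a short name (tiny.en, small, large-v3, ...) to its download URL.
# Assumes every model follows the ggml-NAME.bin naming used by tiny.en.
model_url() {
  echo "$BASE_URL/ggml-$1.bin"
}

# Download a model unless the file already exists.
fetch_model() {
  local dest="$MODEL_DIR/ggml-$1.bin"
  mkdir -p "$MODEL_DIR"
  if [ -f "$dest" ]; then
    echo "already present: $dest"
  else
    curl -L --fail -o "$dest" "$(model_url "$1")"
  fi
}</code></pre></div><p>For example, <code>fetch_model small.en</code> would fetch the daily-use model into the same directory; <code>--fail</code> makes curl stop on an HTTP error instead of saving an error page as the model file.</p>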
<p>Verify the download:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh ~/.whisper-models/
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># expect ggml-tiny.en.bin, roughly 78 MB</span></span></span></code></pre></div><h2 id="跑第一次轉錄">Run the First Transcription</h2>
<p>You need a short test clip. One option is to generate it with the macOS built-in <code>say</code> command, then convert it with ffmpeg to the format whisper.cpp expects (16 kHz mono WAV):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /tmp
</span></span><span class="line"><span class="ln">2</span><span class="cl">say -o sample.aiff -v Samantha <span class="s2">&#34;Hello world. This is a test of the whisper transcription system. It should produce accurate text from this short audio clip.&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">ffmpeg -loglevel error -y -i sample.aiff -ar <span class="m">16000</span> -ac <span class="m">1</span> sample.wav</span></span></code></pre></div><p><code>-ar 16000 -ac 1</code> produces whisper.cpp's standard input format: 16 kHz, mono, with ffmpeg's WAV output defaulting to 16-bit PCM. Whisper models were trained on 16 kHz audio, so input at a different sample rate reduces accuracy.</p>
<p>Transcribe:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/sample.wav</span></span></code></pre></div><p>Expected output (with timestamps):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[00:00:00.000 --&gt; 00:00:03.980]   Hello World, this is a test of the whisper transcription system.
</span></span><span class="line"><span class="ln">2</span><span class="cl">[00:00:03.980 --&gt; 00:00:06.980]   It should produce accurate text from this short audio clip.
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl">whisper_print_timings:     load time =    39.88 ms
</span></span><span class="line"><span class="ln">5</span><span class="cl">whisper_print_timings:   encode time =   220.01 ms
</span></span><span class="line"><span class="ln">6</span><span class="cl">whisper_print_timings:    total time =   484.08 ms</span></span></code></pre></div><p>Key observations:</p>
<ul>
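<li><strong>484 ms</strong> to process 7 seconds of audio, roughly 14x real time.</li>
<li>The transcript matches the source text apart from capitalization and punctuation (<code>world</code> became <code>World</code>, and the first period became a comma).</li>
<li>The output includes timestamps, which can be used for subtitle alignment.</li>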
</ul>
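<p>The real-time factor is just audio length divided by processing time. A tiny helper makes it easy to redo the arithmetic for other clips or models; <code>speedup</code> is an illustrative name, and the inputs below are the figures from the run above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Sketch: speedup is an illustrative helper name.
# Usage: speedup AUDIO_SECONDS TOTAL_MS
speedup() {
  awk -v secs="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", secs / (ms / 1000) }'
}

speedup 7 484.08   # prints 14.5</code></pre></div><p>Here 7 / 0.48408 is about 14.46, which is where the "roughly 14x" figure comes from.</p>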
<p>To get plain text without timestamps:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/sample.wav -nt
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># -nt is short for --no-timestamps</span></span></span></code></pre></div><h2 id="常用選項">Common Options</h2>
<table>
  <thead>
      <tr>
          <th>Option</th>
          <th>Effect</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>-l zh</code></td>
          <td>Set the language (Chinese here); for multilingual models, not needed with English-only models</td>
      </tr>
      <tr>
          <td><code>-otxt</code></td>
          <td>Also write a .txt file (plain text, no timestamps)</td>
      </tr>
      <tr>
          <td><code>-osrt</code></td>
          <td>Also write an .srt subtitle file</td>
      </tr>
      <tr>
          <td><code>-ovtt</code></td>
          <td>Also write a .vtt subtitle file</td>
      </tr>
      <tr>
          <td><code>-of OUT</code></td>
          <td>Set the output file name prefix</td>
      </tr>
      <tr>
          <td><code>-t N</code></td>
          <td>Use N threads (defaults to the smaller of 4 and the CPU core count)</td>
      </tr>
      <tr>
          <td><code>-pp</code></td>
          <td>Print progress while processing; worth turning on for long audio</td>
      </tr>
  </tbody>
</table>
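<p>These options combine naturally into a batch job. The sketch below converts each input file to 16 kHz mono WAV and writes one <code>.srt</code> per file. The helper names <code>stem</code> and <code>batch_subtitles</code>, the use of <code>/tmp</code> for intermediate files, and the choice of the <code>small.en</code> model are all assumptions made for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Sketch: stem and batch_subtitles are illustrative helpers.
MODEL="$HOME/.whisper-models/ggml-small.en.bin"

# Strip the directory and the extension: /a/b/talk.mp3 becomes talk
stem() {
  local name
  name="$(basename "$1")"
  echo "${name%.*}"
}

# Usage: batch_subtitles FILE...
batch_subtitles() {
  local src wav
  for src in "$@"; do
    wav="/tmp/$(stem "$src").wav"
    # Convert to the 16 kHz mono 16-bit PCM format used throughout this post.
    ffmpeg -loglevel error -y -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || continue
    # -osrt writes PREFIX.srt, -of sets PREFIX, -pp prints progress.
    whisper-cli -m "$MODEL" -f "$wav" -osrt -of "$(stem "$src")" -pp
  done
}</code></pre></div><p>Running <code>batch_subtitles talks/*.mp3</code> would leave the <code>.srt</code> files in the current directory; a file that ffmpeg cannot convert is skipped so it does not stop the batch.</p>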
<p>Common combinations in practice:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># generate subtitles</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-small.en.bin <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  -f input.wav <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  -osrt <span class="se">\
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="se"></span>  -of output_subtitle
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># transcribe Chinese</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-small.bin <span class="se">\
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="se"></span>  -f speech.wav <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  -l zh</span></span></code></pre></div><h2 id="跟其他工具串接">Chaining with Other Tools</h2>
<p>whisper-cli writes the transcript to stdout as plain text, which makes it easy to pipe into other tools:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Feed the transcript straight to an LLM for a summary.</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"># jq (an extra dependency) JSON-escapes the transcript so newlines and quotes stay valid JSON.</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-small.en.bin -f meeting.wav -nt \
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  | jq -Rs &#39;{
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      model: &#34;gemma3:1b&#34;,
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      messages: [
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        {role: &#34;system&#34;, content: &#34;Summarize the meeting transcript in 5 bullet points.&#34;},
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        {role: &#34;user&#34;, content: .}
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">      ]
</span></span><span class="line"><span class="ln">10</span><span class="cl">    }&#39; \
</span></span><span class="line"><span class="ln">11</span><span class="cl">  | curl -s http://localhost:11434/v1/chat/completions \
</span></span><span class="line"><span class="ln">12</span><span class="cl">    -H &#34;Content-Type: application/json&#34; \
</span></span><span class="line"><span class="ln">13</span><span class="cl">    -d @-</span></span></code></pre></div><p>This pipeline hands off to <a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama</a> to complete a speech → text → summary flow that runs entirely locally, with no cloud API.</p>
<h2 id="常見坑">Common Pitfalls</h2>
<h3 id="audio-file-not-found--format-error">"audio file not found / format error"</h3>
<p>Check that the file has been converted to 16 kHz mono:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ffprobe input.wav 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">|</span> grep -E <span class="s2">&#34;Stream|Audio&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># the Audio line should include: pcm_s16le, 16000 Hz, mono</span></span></span></code></pre></div><p>If the format differs, convert with ffmpeg:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ffmpeg -i input.mp3 -ar <span class="m">16000</span> -ac <span class="m">1</span> -c:a pcm_s16le output.wav</span></span></code></pre></div><h3 id="model-載入慢">Slow Model Loading</h3>
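<p>The first Metal library initialization takes about 7 seconds while the macOS Metal compiler compiles and caches shaders. Later runs are much faster.</p>
<p>If every run is slow, check whether the Metal cache path (<code>~/Library/Caches/...</code>) has a permissions problem.</p>
<h3 id="中文--多語言準確度差">Poor Accuracy on Chinese / Multilingual Audio</h3>
<p>Make sure the model name does not carry the <code>.en</code> suffix: <code>.en</code> models are trained on English only and will hallucinate when fed Chinese. For Chinese, use <code>ggml-small.bin</code>, <code>ggml-medium.bin</code>, or <code>ggml-large-v3.bin</code> (no <code>.en</code>).</p>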
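<p>This mistake is easy to catch before any audio is processed. The sketch below refuses to run a Chinese transcription against an English-only model, judging purely by the <code>.en.bin</code> file name suffix. <code>is_english_only</code> and <code>transcribe_zh</code> are illustrative names, not whisper.cpp features.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Sketch: is_english_only and transcribe_zh are illustrative helper names.
# Succeeds when the model file name carries the .en suffix (English-only model).
is_english_only() {
  case "$(basename "$1")" in
    *.en.bin) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage: transcribe_zh MODEL_PATH WAV_PATH
transcribe_zh() {
  local model="$1" wav="$2"
  if is_english_only "$model"; then
    echo "error: $model is English-only; use a multilingual model such as ggml-small.bin" &gt;&amp;2
    return 1
  fi
  whisper-cli -m "$model" -f "$wav" -l zh
}</code></pre></div><p>The check only looks at the file name, so a renamed model file would slip through; it is a guard against typos, not a real inspection of the model.</p>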
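<h3 id="output-拼錯字">Misspelled Output</h3>
<p>The Whisper tiny and base models are less accurate on audio with unclear articulation, heavy noise, or strong accents. Switching to small or medium usually fixes it.</p>
<h2 id="完整-round-trip-驗證">Full Round-Trip Verification</h2>
<p>Verify the full Whisper + Piper TTS loop:</p>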





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># generate a WAV with Piper</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;Hello world test.&#34;</span> <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/out.wav
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># transcribe it back to text with Whisper</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/out.wav -nt
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># expect: Hello world test.</span></span></span></code></pre></div><p>If both commands run, the whole STT / TTS pipeline works. Without Piper, any 16 kHz mono WAV will do for verification (the macOS built-in <code>say -o sample.aiff</code> plus an ffmpeg conversion, or a sample clip pulled from Hugging Face); Piper is not required.</p>
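<p>As the first transcription showed, Whisper may change capitalization and punctuation, so an exact string comparison can fail even when the round trip worked. A minimal sketch of a looser comparison follows. <code>normalize</code>, <code>same_text</code>, and <code>check_round_trip</code> are illustrative names, and the model path simply follows this post's layout.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Sketch: normalize, same_text and check_round_trip are illustrative helper names.
# Lowercase, drop punctuation, and collapse whitespace so that
# "Hello World," and "hello world." compare equal.
normalize() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -d '[:punct:]' \
    | tr -s '[:space:]' ' ' \
    | sed 's/^ //; s/ $//'
}

same_text() {
  [ "$(normalize "$1")" = "$(normalize "$2")" ]
}

# Usage: check_round_trip "expected text" /path/to/file.wav
check_round_trip() {
  local expected="$1" wav="$2" actual
  actual="$(whisper-cli -m "$HOME/.whisper-models/ggml-tiny.en.bin" -f "$wav" -nt 2&gt;/dev/null)"
  same_text "$expected" "$actual"
}</code></pre></div><p>For example, <code>check_round_trip "Hello world test." /tmp/out.wav &amp;&amp; echo OK</code> prints <code>OK</code> when the transcript matches up to case and punctuation.</p>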
<p>Related chapters: the full hands-on series is listed in the <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on chapter index</a>; where local LLM plus speech sits in terms of privacy and data flow is covered in <a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 Privacy and Data Flow Principles</a>; for troubleshooting with the three-layer method, see <a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 Troubleshooting Methodology</a>.</p>
<h2 id="何時這篇會過時">When This Post Will Go Stale</h2>
<ul>
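<li><code>brew install whisper-cpp</code> is unlikely to change as the install method in the near term.</li>
<li>The GGML model location (Hugging Face <code>ggerganov/whisper.cpp</code>) is stable; it is the maintainer's official repo.</li>
<li>Model versions will be updated (for example large-v3 to a future large-v4), but the flow of downloading a GGML file and feeding a WAV to whisper-cli stays the same.</li>
<li>The Metal backend is enabled automatically with no configuration; newer Apple Silicon GPUs will keep improving performance without affecting the interface.</li>
</ul>
<p>If brew fails when you read this, check the whisper.cpp GitHub release notes; for new model versions, see the file list in the Hugging Face <code>ggerganov/whisper.cpp</code> repo.</p>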
]]></content:encoded></item></channel></rss>