<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Piper on Tarragon</title><link>https://tarrragon.github.io/blog/tags/piper/</link><description>Recent content in Piper on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/piper/index.xml" rel="self" type="application/rss+xml"/><item><title>Hands-on: Installing Piper TTS for Text-to-Speech</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/</guid><description>&lt;p>This post covers installing Piper TTS, using it to synthesize English speech, and transcribing that audio back to text with Whisper as a round-trip check. Why Piper rather than a cloud TTS (OpenAI / ElevenLabs):&lt;/p>
</description><content:encoded><![CDATA[<p>This post covers installing Piper TTS, using it to synthesize English speech, and transcribing that audio back to text with Whisper as a round-trip check. Why Piper rather than a cloud TTS (OpenAI / ElevenLabs):</p>
<ul>
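<li>Fully local, so the privacy boundary is clean.</li>
<li>Runs on the ONNX runtime and works on Apple Silicon without needing a GPU.</li>
<li>Small models (low quality ~17-65 MB, medium ~50 MB, high ~125 MB), which suits minimal validation.</li>
<li>CLI-first: text goes in on stdin and WAV comes out to stdout or a file, so it is easy to chain into a pipeline.</li>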
</ul>
<blockquote>
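<p><strong>Verified on</strong>: 2026-05-12
<strong>Piper version</strong>: installed via pip (run <code>pip3 show piper-tts</code> to see the exact version)
<strong>Demo voice</strong>: <code>en_US-lessac-low.onnx</code> (63 MB, English female voice, low quality)
<strong>Measured</strong>: about 4 seconds of speech synthesized in under 1 second. The quality is good enough for everyday use.</p></blockquote>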
<h2 id="前置設定">Prerequisites</h2>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Check command</th>
          <th>Expected</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Python</td>
          <td><code>python3 --version</code></td>
          <td>3.11+</td>
      </tr>
      <tr>
          <td>pip</td>
          <td><code>pip3 --version</code></td>
          <td>25+</td>
      </tr>
      <tr>
          <td>Disk space</td>
          <td><code>df -h ~</code></td>
          <td>At least 200 MB (Piper + one voice)</td>
      </tr>
  </tbody>
</table>
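<p>The checks in the table can be combined into a single function. The sketch below is a minimal, hypothetical helper: <code>version_ge</code> and <code>check_prereqs</code> are names made up for this post. It assumes bash, <code>python3</code>, <code>pip3</code> and a POSIX <code>df</code> are on <code>PATH</code>, and the thresholds are copied from the table.</p>

```bash
# Sketch: automate the prerequisite checks from the table above.
# Assumes bash, python3, pip3 and a POSIX df; thresholds are copied from the table.

# version_ge A B: exit 0 if dotted version A >= B (numeric, component by component)
version_ge() {
  local IFS=. i x y
  local -a a b
  a=($1)   # split "3.12.4" on dots
  b=($2)
  i=0
  while [ "$i" -lt "${#a[@]}" ] || [ "$i" -lt "${#b[@]}" ]; do
    x=${a[i]:-0}; x=${x%%[!0-9]*}; x=${x:-0}   # a missing component counts as 0
    y=${b[i]:-0}; y=${y%%[!0-9]*}; y=${y:-0}   # a non-numeric suffix is dropped
    if [ "$((10#$x))" -gt "$((10#$y))" ]; then return 0; fi
    if [ "$((10#$x))" -lt "$((10#$y))" ]; then return 1; fi
    i=$((i + 1))
  done
  return 0   # all components equal
}

# check_prereqs: print one status line per row of the table
check_prereqs() {
  local py pip free_kb
  py=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
  pip=$(pip3 --version | awk '{print $2}')               # "pip 25.0.1 from ..." -> 25.0.1
  free_kb=$(df -Pk "$HOME" | awk 'NR == 2 {print $4}')   # available space in 1 KiB blocks
  if version_ge "$py" 3.11; then echo "python $py: ok"; else echo "python $py: need 3.11+"; fi
  if version_ge "$pip" 25; then echo "pip $pip: ok"; else echo "pip $pip: need 25+"; fi
  if [ "${free_kb:-0}" -ge 204800 ]; then echo "disk: ok"; else echo "disk: less than 200 MB free"; fi
}
```

<p>After sourcing the functions, run <code>check_prereqs</code>. It prints one line per row, reporting either ok or the requirement that is not met. The comparison is numeric per component, which is enough for version strings like <code>3.12.4</code> or <code>25.0.1</code>.</p>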
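<p>Like Whisper, Piper keeps the runtime separate from the model. You install the runtime first and then download a voice.</p>
<h2 id="安裝-piper">Install Piper</h2>
<p>At the time of writing, <code>piper-tts</code> has no Homebrew formula, so install it with pip:</p>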





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">pip3 install piper-tts --break-system-packages</span></span></code></pre></div><p><code>PEP 668</code> defines the "externally managed environment" marker. Homebrew's Python uses it, as do other distributions, to stop pip from modifying an interpreter that another package manager owns. <code>--break-system-packages</code> is the bypass flag: it skips that check and installs anyway. A cleaner approach is a venv:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 -m venv ~/.piper-venv
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">source</span> ~/.piper-venv/bin/activate
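</span></span><span class="line"><span class="ln">3</span><span class="cl">pip install piper-tts</span></span></code></pre></div><p>The catch is that <code>piper</code> is only on <code>PATH</code> while the venv is activated. Otherwise you have to call it by its full path, <code>~/.piper-venv/bin/piper</code>. This demo uses <code>--break-system-packages</code> to keep things simple. For real use, prefer a venv or pipx (<code>pipx install piper-tts</code>).</p>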
<p>Check that the binary is on PATH:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">which piper
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># /opt/homebrew/bin/piper (if pip3 comes from Homebrew Python)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># or ~/Library/Python/3.x/bin/piper (if pip3 comes from the system Python)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl">piper --help <span class="p">|</span> head -10</span></span></code></pre></div><p>If <code>which piper</code> finds nothing, check which of the two bin directories above actually contains <code>piper</code>, then add that directory to <code>PATH</code>.</p>
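<p>A small hypothetical helper can do that search for you. <code>find_piper_dir</code> is not part of Piper; it prints the first candidate directory that holds an executable <code>piper</code>. The candidate list is an assumption: the two locations above plus the <code>~/.piper-venv</code> venv from the previous section. <code>python3 -m site --user-base</code> prints the per-user base directory, which on macOS is typically <code>~/Library/Python/3.x</code>.</p>

```bash
# Hypothetical helper (not part of Piper): print the first argument directory
# that contains an executable named "piper"; return 1 if none does.
find_piper_dir() {
  local d
  for d in "$@"; do
    if [ -x "$d/piper" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  return 1
}

# Example: Homebrew bin, the per-user bin of the current python3, and the venv.
# if dir=$(find_piper_dir /opt/homebrew/bin "$(python3 -m site --user-base)/bin" "$HOME/.piper-venv/bin"); then
#   export PATH="$dir:$PATH"
# fi
```

<p>To make the <code>PATH</code> change persist, put the <code>export</code> line in your shell profile, for example <code>~/.zshrc</code>.</p>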
<h2 id="下載-voice-model">Download a Voice Model</h2>
<p>Piper voice models use the ONNX format. Each voice is a pair of files: <code>.onnx</code> (the model weights) and <code>.onnx.json</code> (metadata, including the sample rate and phoneme map).</p>
<p>Download both from the Hugging Face <code>rhasspy/piper-voices</code> repo:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p ~/.piper-voices
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/.piper-voices
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># English female voice, low quality (small, fast)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">curl -L -o en_US-lessac-low.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/low/en_US-lessac-low.onnx&#34;</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">curl -L -o en_US-lessac-low.onnx.json <span class="se">\
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/low/en_US-lessac-low.onnx.json&#34;</span></span></span></code></pre></div><p>Available voice quality levels:</p>
<table>
  <thead>
      <tr>
          <th>Quality</th>
          <th>Size</th>
          <th>Use case</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>x_low</code></td>
          <td>&lt; 20 MB</td>
          <td>Tiny, noticeably worse quality, for constrained environments</td>
      </tr>
      <tr>
          <td><code>low</code></td>
          <td>17-65 MB</td>
          <td>Fast, rough quality, good for prototypes</td>
      </tr>
      <tr>
          <td><code>medium</code></td>
          <td>50-100 MB</td>
          <td>Balanced, for everyday use</td>
      </tr>
      <tr>
          <td><code>high</code></td>
          <td>100-200 MB</td>
          <td>Best quality, somewhat slower synthesis</td>
      </tr>
  </tbody>
</table>
<p>Language / region coverage (partial):</p>
<table>
  <thead>
      <tr>
          <th>Locale</th>
          <th>Example voices</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>en_US</code></td>
          <td>lessac、ryan、amy、libritts</td>
      </tr>
      <tr>
          <td><code>en_GB</code></td>
          <td>alan、cori、jenny</td>
      </tr>
      <tr>
          <td><code>zh_CN</code></td>
          <td>huayan (Beijing-accented Mandarin)</td>
      </tr>
      <tr>
          <td><code>ja_JP</code> (community)</td>
          <td>Few</td>
      </tr>
      <tr>
          <td><code>de_DE</code> / <code>fr_FR</code> / <code>es_ES</code>, etc.</td>
          <td>Several each</td>
      </tr>
  </tbody>
</table>
<p>The full list is in <a href="https://github.com/rhasspy/piper">VOICES.md</a> in the piper GitHub repo. The voice files themselves are hosted in <code>rhasspy/piper-voices</code> on Hugging Face.</p>
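<p>Both example URLs in this post share the same path layout (<code>lang/locale/name/quality/voice.ext</code>), so a voice pair can be downloaded by name alone. The sketch below is hypothetical. It assumes that layout also holds for the voice you pick, and it pins the same <code>v1.0.0</code> tag used above. It also adds curl's <code>-f</code> flag, so an HTTP error fails the download instead of saving the error page as <code>.onnx</code>.</p>

```bash
# Hypothetical helpers: build the Hugging Face URL for a voice name and fetch
# both files of the pair. The path layout (lang/locale/name/quality/voice.ext)
# is inferred from the example URLs in this post; v1.0.0 is the tag used above.
piper_voice_url() {
  local voice=$1 ext=${2:-onnx}
  local locale=${voice%%-*}    # en_US-lessac-low -> en_US
  local quality=${voice##*-}   # -> low
  local name=${voice#*-}       # -> lessac-low
  name=${name%-*}              # -> lessac
  local lang=${locale%%_*}     # -> en
  printf 'https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/%s/%s/%s/%s/%s.%s\n' \
    "$lang" "$locale" "$name" "$quality" "$voice" "$ext"
}

fetch_piper_voice() {
  local voice=$1 dir=${2:-$HOME/.piper-voices} ext
  mkdir -p "$dir" || return 1
  for ext in onnx onnx.json; do
    # -f: fail on HTTP errors instead of saving the error page; -L: follow redirects
    curl -fL -o "$dir/$voice.$ext" "$(piper_voice_url "$voice" "$ext")" || return 1
  done
}
```

<p>For example, <code>fetch_piper_voice en_US-lessac-medium</code> would download the medium voice used in the option examples further down.</p>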
<p>Verify the download:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh ~/.piper-voices/
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># en_US-lessac-low.onnx       63M</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># en_US-lessac-low.onnx.json  4.9K</span></span></span></code></pre></div><h2 id="跑第一次合成">Run a First Synthesis</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;Hello from Piper TTS, this is a synthesized voice test.&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/piper-out.wav</span></span></code></pre></div><p>Notes:</p>
<ul>
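<li>Text arrives on stdin, which is Piper's standard input method.</li>
<li><code>-m</code>: path to the voice model <code>.onnx</code>. Piper looks for the matching <code>.onnx.json</code> in the same directory.</li>
<li><code>-f</code>: output WAV path. What happens without <code>-f</code> depends on the release. Older releases write the WAV to stdout, which you can pipe into <code>aplay</code> on Linux for immediate playback; check <code>piper --help</code> for your version. On macOS, <code>afplay</code> only plays a file path, not a pipe.</li>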
</ul>
<p>Expected output:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh /tmp/piper-out.wav
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 128 KB</span></span></span></code></pre></div><p>Check the WAV format:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">file /tmp/piper-out.wav
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl">ffprobe -loglevel error -show_format /tmp/piper-out.wav <span class="p">|</span> grep duration
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># duration=3.984000</span></span></span></code></pre></div><p>The output is 16-bit PCM, 16 kHz, mono. This is the input format <a href="/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper</a> expects, so the file can go straight into a round trip.</p>
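<p>You can cross-check the file size against the duration with a little arithmetic. A canonical PCM WAV is a 44-byte header plus <em>seconds × sample rate × channels × bytes per sample</em>. The helper below is a hypothetical sketch that assumes a plain 44-byte header with no extra chunks.</p>

```bash
# Hypothetical helper: expected size in bytes of a canonical PCM WAV file,
# i.e. a 44-byte header plus seconds * rate * channels * (bits / 8).
# Assumes no extra header chunks; defaults match the lessac-low output above.
wav_expected_bytes() {
  local seconds=$1 rate=${2:-16000} channels=${3:-1} bits=${4:-16}
  # round seconds * rate to a whole number of sample frames before multiplying
  awk -v s="$seconds" -v r="$rate" -v c="$channels" -v b="$bits" \
    'BEGIN { printf "%d\n", int(s * r + 0.5) * c * (b / 8) + 44 }'
}
```

<p>For the run above, 3.984 s × 16,000 Hz × 1 channel × 2 bytes + 44 = 127,532 bytes. That is about 128 kB, which matches the size listed earlier. <code>ls -lh</code> counts in 1024-byte units, so it may show roughly 125K instead.</p>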
<p>Play it back to check (<code>afplay</code> is macOS-only):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">afplay /tmp/piper-out.wav</span></span></code></pre></div><h2 id="常用選項">Common Options</h2>
<table>
  <thead>
      <tr>
          <th>Option</th>
          <th>Effect</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>-m MODEL</code></td>
          <td>Path to the voice model <code>.onnx</code> (required)</td>
      </tr>
      <tr>
          <td><code>-c CONFIG</code></td>
          <td>Path to the metadata JSON (defaults to the matching <code>.onnx.json</code>)</td>
      </tr>
      <tr>
          <td><code>-i FILE</code></td>
          <td>Input text file (instead of stdin)</td>
      </tr>
      <tr>
          <td><code>-f OUTPUT</code></td>
          <td>Output WAV path</td>
      </tr>
      <tr>
          <td><code>-d DIR</code></td>
          <td>Output directory (splits into separate files when there are several sentences)</td>
      </tr>
      <tr>
          <td><code>--length-scale FACTOR</code></td>
          <td>Speed (&lt; 1 is faster, &gt; 1 is slower; default 1.0)</td>
      </tr>
      <tr>
          <td><code>--volume FACTOR</code></td>
          <td>Volume multiplier (0.0-1.0)</td>
      </tr>
      <tr>
          <td><code>-s SPEAKER</code></td>
          <td>Select a speaker in a multi-speaker model (e.g. libritts)</td>
      </tr>
      <tr>
          <td><code>--cuda</code></td>
          <td>Use CUDA (NVIDIA GPUs only; not applicable on Apple Silicon, so leave it off)</td>
      </tr>
  </tbody>
</table>
<p>Typical uses:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Synthesize from a text file</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">piper -m ~/.piper-voices/en_US-lessac-low.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  -i article.txt <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  -f narration.wav
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># One file per sentence (uses the medium voice, so download it first)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">piper -m ~/.piper-voices/en_US-lessac-medium.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="se"></span>  -i script.txt <span class="se">\
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="se"></span>  -d ~/audio-output/ <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  --output-dir-naming text
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># Slow narration (for language learners)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">piper -m ~/.piper-voices/en_US-lessac-low.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  --length-scale 1.4 <span class="se">\
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="se"></span>  -f slow.wav <span class="o">&lt;&lt;&lt;</span> <span class="s2">&#34;Slowly read this sentence.&#34;</span></span></span></code></pre></div><h2 id="round-trip-驗證">Round-Trip Verification</h2>
<p>Confirm that the whole TTS + STT chain works end to end:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 1. Piper TTS: text → WAV</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;The quick brown fox jumps over the lazy dog.&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/test.wav
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 2. Whisper STT: WAV → text</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/test.wav -nt</span></span></code></pre></div><p>Whisper's transcript should be close to the original text, though case and punctuation may differ slightly. A successful round trip shows that:</p>
<ul>
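<li>Piper's output format (16 kHz mono WAV for this voice) meets Whisper's input requirements.</li>
<li>Piper's English pronunciation is clear enough for Whisper's English model to transcribe it accurately.</li>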
</ul>
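<p>To make "close to the original" checkable in a script, one option is to normalize both strings before comparing them: lowercase, strip punctuation, collapse whitespace. This is a hypothetical sketch. Exact equality after normalization is strict, so a single misheard word still counts as a failure.</p>

```bash
# Hypothetical round-trip check: compare the original text with Whisper's
# transcript after lowercasing, removing punctuation and collapsing whitespace.
normalize_text() {
  tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# roundtrip_match EXPECTED ACTUAL: exit 0 if both normalize to the same string
roundtrip_match() {
  local want got
  want=$(printf '%s' "$1" | normalize_text)
  got=$(printf '%s' "$2" | normalize_text)
  [ "$want" = "$got" ]
}
```

<p>This usage reuses the commands above and assumes <code>whisper-cli</code> prints only the transcript on stdout. <code>roundtrip_match "The quick brown fox jumps over the lazy dog." "$(whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/test.wav -nt 2>/dev/null)"</code> exits 0 when the transcript matches.</p>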
<h2 id="跟-llm-串接llm-說話的-minimal-pipeline">Connecting an LLM: a minimal "LLM speaks" pipeline</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 1. The LLM generates an answer (Ollama on its default port 11434)</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="nv">ANSWER</span><span class="o">=</span><span class="k">$(</span>curl -s http://localhost:11434/v1/chat/completions <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  -H <span class="s2">&#34;Content-Type: application/json&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s1">    &#34;model&#34;: &#34;gemma3:1b&#34;,
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s1">    &#34;messages&#34;: [{&#34;role&#34;:&#34;user&#34;,&#34;content&#34;:&#34;Tell me a one-sentence joke.&#34;}],
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s1">    &#34;stream&#34;: false
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s1">  }&#39;</span> <span class="p">|</span> python3 -c <span class="s2">&#34;import json,sys; print(json.load(sys.stdin)[&#39;choices&#39;][0][&#39;message&#39;][&#39;content&#39;])&#34;</span><span class="k">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 2. Piper reads the answer aloud</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="nv">$ANSWER</span><span class="s2">&#34;</span> <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/llm-says.wav
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># 3. Play it</span>
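</span></span><span class="line"><span class="ln">14</span><span class="cl">afplay /tmp/llm-says.wav</span></span></code></pre></div><p>Three shell steps cover the whole "local LLM tells a joke" pipeline, with no cloud services and no discrete GPU. This assumes Ollama is running locally with <code>gemma3:1b</code> already pulled.</p>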
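<p>LLM replies often contain Markdown, such as <code>**bold**</code>, backticks or <code>-</code> bullets, and a TTS voice may read these out literally. A hypothetical cleanup filter between the LLM and Piper can strip the most common markers. The character set below is an assumption, and it also removes underscores from identifiers like <code>snake_case</code>.</p>

```bash
# Hypothetical filter: strip common Markdown markers from LLM output before TTS.
# Removes * _ ` # everywhere and a leading "-" or "+" bullet, then squeezes spaces.
tts_clean() {
  sed -e 's/[*_`#]//g' \
      -e 's/^[[:space:]]*[-+][[:space:]]\{1,\}//' \
      -e 's/[[:space:]]\{2,\}/ /g'
}
```

<p>Here it is inserted into step 2 above: <code>echo "$ANSWER" | tts_clean | piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/llm-says.wav</code>.</p>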
<h2 id="常見坑">Common Pitfalls</h2>
<h3 id="中文--多語言">Chinese / Multilingual</h3>
<p><code>en_US-lessac-low</code> is an English voice, so Chinese input comes out mispronounced. For Chinese, download <code>zh_CN-huayan-*</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -L -o ~/.piper-voices/zh_CN-huayan-medium.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/zh/zh_CN/huayan/medium/zh_CN-huayan-medium.onnx&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">curl -L -o ~/.piper-voices/zh_CN-huayan-medium.onnx.json <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/zh/zh_CN/huayan/medium/zh_CN-huayan-medium.onnx.json&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;你好，這是 Piper TTS 的中文測試。&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="se"></span>  <span class="p">|</span> piper -m ~/.piper-voices/zh_CN-huayan-medium.onnx -f /tmp/zh-out.wav</span></span></code></pre></div><p>The <code>zh_CN</code> voice speaks with a Beijing Mandarin accent.</p>
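<p>If one script has to handle both languages, it can pick the voice per input. The sketch below uses a deliberately crude, hypothetical heuristic: any byte outside printable ASCII sends the text to the Chinese voice. Accented Latin text or a tab character would also be misrouted, so treat this only as a starting point.</p>

```bash
# Hypothetical helper: pick a voice with a crude heuristic. Any byte outside
# printable ASCII (space through ~) routes the text to the Chinese voice.
pick_voice() {
  if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    printf '%s\n' "$HOME/.piper-voices/zh_CN-huayan-medium.onnx"
  else
    printf '%s\n' "$HOME/.piper-voices/en_US-lessac-low.onnx"
  fi
}
```

<p>For example: <code>text="你好"; echo "$text" | piper -m "$(pick_voice "$text")" -f /tmp/out.wav</code>.</p>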
<h3 id="--break-system-packages-警告"><code>--break-system-packages</code> warning</h3>
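<p>Homebrew's Python, like other interpreters that adopt PEP 668, refuses a plain <code>pip install</code> by default and reports an <code>externally-managed-environment</code> error. The safe options are a venv or pipx. If you would rather not set up a venv, <code>--break-system-packages</code> lets the install go through anyway. In the long run, move to a venv so the system Python stays clean.</p>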
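<p>You can check for the PEP 668 marker yourself to see whether your <code>python3</code> is affected before installing. Per PEP 668, an interpreter is externally managed when a file named <code>EXTERNALLY-MANAGED</code> exists in its <code>sysconfig</code> stdlib directory, and installers ignore the marker inside a virtual environment. The function name is hypothetical.</p>

```bash
# Sketch of the PEP 668 check. An interpreter counts as "externally managed"
# when EXTERNALLY-MANAGED exists in its stdlib directory; the marker is
# ignored inside a virtual environment (sys.prefix != sys.base_prefix).
# Exit status: 0 = externally managed, 1 = not (or running in a venv).
is_externally_managed() {
  "${1:-python3}" -c '
import os, sys, sysconfig
if sys.prefix != sys.base_prefix:
    sys.exit(1)
marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
sys.exit(0 if os.path.isfile(marker) else 1)
'
}
```

<p>For example, <code>is_externally_managed ~/.piper-venv/bin/python</code> should exit 1, because the marker is ignored inside a venv.</p>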
<h3 id="voice-quality-不夠">Voice quality not good enough</h3>
<p><code>low</code>-quality voices are fine for validation and prototypes; for real use, pick <code>medium</code> or <code>high</code>. On longer passages, a low-quality voice sounds mechanical and noticeably less natural.</p>
<h3 id="sample-rate-mismatch">Sample rate mismatch</h3>
<p>The voice metadata (<code>audio.sample_rate</code> in the <code>.onnx.json</code>) sets the output sample rate, and it differs between voices (usually 22050 or 16000). Whisper expects 16000, so if Piper outputs 22050 you may need to downsample with ffmpeg:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ffmpeg -i piper-out.wav -ar <span class="m">16000</span> -ac <span class="m">1</span> -c:a pcm_s16le piper-out-16k.wav</span></span></code></pre></div><p><code>en_US-lessac-low</code> is already 16 kHz, so it does not need this step.</p>
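<p>Instead of remembering each voice's sample rate, a script can read <code>audio.sample_rate</code> from the voice's <code>.onnx.json</code> and resample only when needed. This is a hypothetical sketch that assumes <code>python3</code> and <code>ffmpeg</code> are installed. In Piper's published voices, medium/high qualities are typically 22050 Hz and low/x_low are 16000 Hz.</p>

```bash
# Hypothetical helpers: read audio.sample_rate from a voice's .onnx.json and
# produce a 16 kHz mono 16-bit WAV for whisper.cpp only when it is needed.
voice_sample_rate() {
  python3 -c 'import json, sys
with open(sys.argv[1]) as f:
    print(json.load(f)["audio"]["sample_rate"])' "$1"
}

# to_whisper_wav WAV JSON [OUT]: print the path of a 16 kHz version of WAV
to_whisper_wav() {
  local wav=$1 json=$2 out=${3:-${1%.wav}-16k.wav} rate
  rate=$(voice_sample_rate "$json") || return 1
  if [ "$rate" -eq 16000 ]; then
    printf '%s\n' "$wav"   # already 16 kHz: use the file as-is
  else
    ffmpeg -loglevel error -y -i "$wav" -ar 16000 -ac 1 -c:a pcm_s16le "$out" || return 1
    printf '%s\n' "$out"
  fi
}
```

<p>For example, <code>to_whisper_wav /tmp/piper-out.wav ~/.piper-voices/en_US-lessac-low.onnx.json</code> returns the original path unchanged, because that voice is already 16 kHz.</p>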
<h2 id="何時這篇會過時">When This Post Will Go Stale</h2>
<ul>
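<li>The <code>pip install piper-tts</code> install method may change (perhaps toward standalone binary releases), but the ONNX model + CLI invocation pattern should stay stable.</li>
<li>The voice model format (ONNX) is an open, widely supported model format. New qualities and locales may be added over time, and existing voices are unlikely to be deprecated.</li>
<li>The Hugging Face <code>rhasspy/piper-voices</code> repo is published by the project maintainer, so it is likely to remain the canonical source for voices.</li>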
</ul>
<p>If <code>pip install</code> fails when you read this, check the <a href="https://github.com/rhasspy/piper">piper GitHub</a> repo for the current install instructions. For the voice list, see the piper-voices repo.</p>
<p>Related hands-on chapters: the full series is listed in the <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on chapter index</a>. The other half of the voice round trip is <a href="/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper STT</a>. Cross-service lifecycle and memory management are covered in <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a>.</p>
]]></content:encoded></item></channel></rss>