<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Hands-On on Tarragon</title><link>https://tarrragon.github.io/blog/tags/hands-on/</link><description>Recent content in Hands-On on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Wed, 01 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/hands-on/index.xml" rel="self" type="application/rss+xml"/><item><title>Case Study：customer support agent 從 task decomposition 到 eval</title><link>https://tarrragon.github.io/blog/llm/04-applications/hands-on/customer-support-case-study/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/04-applications/hands-on/customer-support-case-study/</guid><description>&lt;p>本案例的責任是把模組四前面所有原理章節串成一個端到端的設計過程、示範&lt;strong>遇到實際 LLM 應用任務時、設計反射動作的順序&lt;/strong>。每段都標出引用哪章原理、讓讀者看到 principle 章節怎麼落到具體工作。&lt;/p>
&lt;p>用作走查的任務：PM 交派「做一個 customer support agent、能處理用戶查詢、必要時自動完成操作（如改地址）。」本案例聚焦「改地址」這個高頻 query type 走完整流程。&lt;/p>
&lt;h2 id="本案例的設計反射">本案例的設計反射&lt;/h2>
&lt;p>整個流程分七階段：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>觀察人類工作流&lt;/strong>：訪談、決定 task decomposition&lt;/li>
&lt;li>&lt;strong>典範定位&lt;/strong>：哪段該 deterministic、哪段該 fuzzy&lt;/li>
&lt;li>&lt;strong>工作流設計&lt;/strong>：每個 step 選對應的 LLM / tool / RAG / HITL 形態&lt;/li>
&lt;li>&lt;strong>協議跟自主度決定&lt;/strong>：是 single agent / multi-call / multi-agent&lt;/li>
&lt;li>&lt;strong>Trace instrumentation&lt;/strong>：哪些資訊要記&lt;/li>
&lt;li>&lt;strong>Eval 設計&lt;/strong>：先選座標、再選工具&lt;/li>
&lt;li>&lt;strong>Iteration loop&lt;/strong>：error analysis → 修哪一層 → 看 metric 收斂&lt;/li>
&lt;/ol>
&lt;p>初次設計 LLM 應用時最常省略階段 1、2、5、6、直接跳到階段 3 開始寫 prompt——這條路會走進「prompt 改了 20 版、無法判讀有沒有變好」的迭代無收斂。本案例強調的是設計反射動作的順序、不是寫 prompt 技巧。&lt;/p>
&lt;h2 id="階段-1觀察人類工作流">階段 1：觀察人類工作流&lt;/h2>
&lt;p>PM 給的任務描述是「處理用戶查詢」、但「查詢」涵蓋的範圍可能很大。第一個反射動作是&lt;strong>坐在客服旁邊觀察兩天&lt;/strong>、不是打開 IDE。&lt;/p>
&lt;p>實際做的事：&lt;/p>
&lt;ul>
&lt;li>統計收到的 query 類型分佈（退款 / 改地址 / 查詢訂單狀態 / 抱怨 / 開放問題各佔多少）。&lt;/li>
&lt;li>看每類 query 的 human resolution 流程（哪幾步、要查哪些系統、要遵守哪些 policy）。&lt;/li>
&lt;li>看哪幾類 query 是 high volume + low complexity（最值得自動化）、哪幾類是 low volume + high complexity（自動化 ROI 差）。&lt;/li>
&lt;li>記下 human 在哪些 step 卡住、哪些 step 反覆需要查同樣資料。&lt;/li>
&lt;/ul>
&lt;p>訪談結束、你得到一張 task decomposition map。本案例假設聚焦在「用戶請求改地址」這個高頻 query type：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">User: 「我搬家了、訂單編號 #12345、新地址是 ___」
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">1. 解析意圖 + 抽取訊息（訂單編號、新地址）
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">2. 查訂單狀態（已出貨？未出貨？已送達？）
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">3. 查 policy（這個訂單狀態 + user tier 能不能改地址？）
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">4. 若可：執行改地址（呼叫物流 / 庫存 API）
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">5. 若不可：解釋為什麼、給替代方案
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">6. 草擬回覆 email、發出&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>引用原理：這個 decomposition 本身對應 &lt;a href="https://tarrragon.github.io/blog/llm/00-foundations/deterministic-vs-fuzzy-engineering/" data-link-title="0.8 Deterministic vs Fuzzy Engineering：軟體設計典範的位移" data-link-desc="傳統 deterministic 軟體跟 fuzzy LLM 軟體在資料、邏輯、分解、實驗成本四個維度的根本差異、以及哪段該 deterministic、哪段該 fuzzy 的決策框架">0.8 fuzzy engineering&lt;/a>（&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/deterministic-vs-fuzzy/" data-link-title="Deterministic vs Fuzzy engineering" data-link-desc="LLM 軟體 vs 傳統軟體在資料 / 邏輯 / 行為一致性 / 實驗成本四維度的典範差異、決定哪段該包 guardrail">deterministic-vs-fuzzy&lt;/a> 卡）的「先分解任務、再判讀每段該 deterministic 還是 fuzzy」。&lt;/p>
&lt;h2 id="階段-2典範定位">階段 2：典範定位&lt;/h2>
&lt;p>對每個 step 做典範定位（deterministic / fuzzy）：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Step&lt;/th>
 &lt;th>典範&lt;/th>
 &lt;th>為什麼&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>1. 解析意圖 + 抽取訊息&lt;/td>
 &lt;td>Fuzzy&lt;/td>
 &lt;td>自由文字 input、需要 LLM 理解&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>2. 查訂單狀態&lt;/td>
 &lt;td>Deterministic&lt;/td>
 &lt;td>結構化 query（給 order_id、回 status）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>3. 查 policy&lt;/td>
 &lt;td>Deterministic&lt;/td>
 &lt;td>規則可窮舉、policy as code&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>4. 執行改地址&lt;/td>
 &lt;td>Deterministic&lt;/td>
 &lt;td>API call、有 schema 跟錯誤碼&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>5. 解釋 / 給替代方案&lt;/td>
 &lt;td>Fuzzy&lt;/td>
 &lt;td>要寫人話、要 tailored to 情境&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>6. 草擬 email + 發出&lt;/td>
 &lt;td>Fuzzy（草擬）+ Deterministic（發送）&lt;/td>
 &lt;td>寫 email 是 fuzzy、發 API call 是 deterministic&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>判讀的重點是&lt;strong>邊界各歸各位&lt;/strong>：規則跟政策走 code、人話跟意圖解析走 LLM。&lt;/p></description><content:encoded><![CDATA[<p>本案例的責任是把模組四前面所有原理章節串成一個端到端的設計過程、示範<strong>遇到實際 LLM 應用任務時、設計反射動作的順序</strong>。每段都標出引用哪章原理、讓讀者看到 principle 章節怎麼落到具體工作。</p>
<p>用作走查的任務：PM 交派「做一個 customer support agent、能處理用戶查詢、必要時自動完成操作（如改地址）。」本案例聚焦「改地址」這個高頻 query type 走完整流程。</p>
<h2 id="本案例的設計反射">本案例的設計反射</h2>
<p>整個流程分七階段：</p>
<ol>
<li><strong>觀察人類工作流</strong>：訪談、決定 task decomposition</li>
<li><strong>典範定位</strong>：哪段該 deterministic、哪段該 fuzzy</li>
<li><strong>工作流設計</strong>：每個 step 選對應的 LLM / tool / RAG / HITL 形態</li>
<li><strong>協議跟自主度決定</strong>：是 single agent / multi-call / multi-agent</li>
<li><strong>Trace instrumentation</strong>：哪些資訊要記</li>
<li><strong>Eval 設計</strong>：先選座標、再選工具</li>
<li><strong>Iteration loop</strong>：error analysis → 修哪一層 → 看 metric 收斂</li>
</ol>
<p>初次設計 LLM 應用時最常省略階段 1、2、5、6、直接跳到階段 3 開始寫 prompt——這條路會走進「prompt 改了 20 版、無法判讀有沒有變好」的迭代無收斂。本案例強調的是設計反射動作的順序、不是寫 prompt 技巧。</p>
<h2 id="階段-1觀察人類工作流">階段 1：觀察人類工作流</h2>
<p>PM 給的任務描述是「處理用戶查詢」、但「查詢」涵蓋的範圍可能很大。第一個反射動作是<strong>坐在客服旁邊觀察兩天</strong>、不是打開 IDE。</p>
<p>實際做的事：</p>
<ul>
<li>統計收到的 query 類型分佈（退款 / 改地址 / 查詢訂單狀態 / 抱怨 / 開放問題各佔多少）。</li>
<li>看每類 query 的 human resolution 流程（哪幾步、要查哪些系統、要遵守哪些 policy）。</li>
<li>看哪幾類 query 是 high volume + low complexity（最值得自動化）、哪幾類是 low volume + high complexity（自動化 ROI 差）。</li>
<li>記下 human 在哪些 step 卡住、哪些 step 反覆需要查同樣資料。</li>
</ul>
<p>訪談結束、你得到一張 task decomposition map。本案例假設聚焦在「用戶請求改地址」這個高頻 query type：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">User: 「我搬家了、訂單編號 #12345、新地址是 ___」
</span></span><span class="line"><span class="ln">2</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">3</span><span class="cl">1. 解析意圖 + 抽取訊息（訂單編號、新地址）
</span></span><span class="line"><span class="ln">4</span><span class="cl">2. 查訂單狀態（已出貨？未出貨？已送達？）
</span></span><span class="line"><span class="ln">5</span><span class="cl">3. 查 policy（這個訂單狀態 + user tier 能不能改地址？）
</span></span><span class="line"><span class="ln">6</span><span class="cl">4. 若可：執行改地址（呼叫物流 / 庫存 API）
</span></span><span class="line"><span class="ln">7</span><span class="cl">5. 若不可：解釋為什麼、給替代方案
</span></span><span class="line"><span class="ln">8</span><span class="cl">6. 草擬回覆 email、發出</span></span></code></pre></div><p>引用原理：這個 decomposition 本身對應 <a href="/blog/llm/00-foundations/deterministic-vs-fuzzy-engineering/" data-link-title="0.8 Deterministic vs Fuzzy Engineering：軟體設計典範的位移" data-link-desc="傳統 deterministic 軟體跟 fuzzy LLM 軟體在資料、邏輯、分解、實驗成本四個維度的根本差異、以及哪段該 deterministic、哪段該 fuzzy 的決策框架">0.8 fuzzy engineering</a>（<a href="/blog/llm/knowledge-cards/deterministic-vs-fuzzy/" data-link-title="Deterministic vs Fuzzy engineering" data-link-desc="LLM 軟體 vs 傳統軟體在資料 / 邏輯 / 行為一致性 / 實驗成本四維度的典範差異、決定哪段該包 guardrail">deterministic-vs-fuzzy</a> 卡）的「先分解任務、再判讀每段該 deterministic 還是 fuzzy」。</p>
<h2 id="階段-2典範定位">階段 2：典範定位</h2>
<p>對每個 step 做典範定位（deterministic / fuzzy）：</p>
<table>
  <thead>
      <tr>
          <th>Step</th>
          <th>典範</th>
          <th>為什麼</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1. 解析意圖 + 抽取訊息</td>
          <td>Fuzzy</td>
          <td>自由文字 input、需要 LLM 理解</td>
      </tr>
      <tr>
          <td>2. 查訂單狀態</td>
          <td>Deterministic</td>
          <td>結構化 query（給 order_id、回 status）</td>
      </tr>
      <tr>
          <td>3. 查 policy</td>
          <td>Deterministic</td>
          <td>規則可窮舉、policy as code</td>
      </tr>
      <tr>
          <td>4. 執行改地址</td>
          <td>Deterministic</td>
          <td>API call、有 schema 跟錯誤碼</td>
      </tr>
      <tr>
          <td>5. 解釋 / 給替代方案</td>
          <td>Fuzzy</td>
          <td>要寫人話、要 tailored to 情境</td>
      </tr>
      <tr>
          <td>6. 草擬 email + 發出</td>
          <td>Fuzzy（草擬）+ Deterministic（發送）</td>
          <td>寫 email 是 fuzzy、發 API call 是 deterministic</td>
      </tr>
  </tbody>
</table>
<p>判讀的重點是<strong>邊界各歸各位</strong>：規則跟政策走 code、人話跟意圖解析走 LLM。</p>
<ul>
<li>Policy check 寫成 code（如「user tier + 訂單狀態 → 能否改地址」是 deterministic 規則）。對應反例：把規則塞進 prompt 讓 LLM 判斷、會偶爾跳過規則或誤判 tier。</li>
<li>「能不能做」這類 yes/no 走規則。對應反例：用 LLM 算判斷、debug 困難且非確定性。</li>
<li>「Helpful 的回覆」走 LLM 寫。對應反例：在 code 內 hard-code 模板、變成僵化的客服機器人腔。</li>
</ul>
<p>最容易混的邊界在 step 6：「草擬 email」是 fuzzy（要寫人話、tailor to 情境）、「發送 email」是 deterministic（呼叫 API、處理錯誤碼）。把這兩件事拆開、草擬可以 retry / 改 prompt 不影響發送邏輯、發送有結構化 error 不被 LLM hallucinate 蓋過。Step 4「執行改地址」也類似：tool call 本身 deterministic、但是否該 call 的判讀回到 step 3 的 policy check。</p>
<p>引用原理：<a href="/blog/llm/00-foundations/deterministic-vs-fuzzy-engineering/" data-link-title="0.8 Deterministic vs Fuzzy Engineering：軟體設計典範的位移" data-link-desc="傳統 deterministic 軟體跟 fuzzy LLM 軟體在資料、邏輯、分解、實驗成本四個維度的根本差異、以及哪段該 deterministic、哪段該 fuzzy 的決策框架">0.8 fuzzy engineering</a> 的「哪段該 deterministic / 哪段該 fuzzy」決策框架、特別是反模式「邊界用錯」段。</p>
<h2 id="階段-3工作流設計">階段 3：工作流設計</h2>
<p>對每個 step 選對應的工具：</p>
<table>
  <thead>
      <tr>
          <th>Step</th>
          <th>設計選擇</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1. 解析意圖 + 抽取訊息</td>
          <td>Vanilla LLM call + structured output（output 強制 JSON schema：intent / order_id / new_address）</td>
      </tr>
      <tr>
          <td>2. 查訂單狀態</td>
          <td>Tool call → 內部 order API</td>
      </tr>
      <tr>
          <td>3. 查 policy</td>
          <td>Tool call → policy engine（純 deterministic、不過 LLM）</td>
      </tr>
      <tr>
          <td>4. 執行改地址</td>
          <td>Tool call → logistics API、寫操作前要 pre-act HITL（高風險 + 不可逆）</td>
      </tr>
      <tr>
          <td>5. 解釋 / 給替代方案</td>
          <td>LLM call + few-shot（從 case 庫 retrieve「類似情境怎麼解釋」、配 RAG）</td>
      </tr>
      <tr>
          <td>6. 草擬 email + 發出</td>
          <td>LLM call 寫 email + structured output 含 subject/body、發送透過 email API</td>
      </tr>
  </tbody>
</table>
<p>兩個容易選錯的 step 展開：</p>
<p><strong>Step 1 為何要 structured output、不是純 prompt 解析</strong>：抽取結果要餵 step 2-4 的 deterministic tool、order_id 抽錯就整個流程斷。純 prompt 描述「請輸出 JSON」是弱保證、structured output / constrained decoding 是強保證（見 <a href="/blog/llm/03-theoretical-foundations/constrained-decoding-internals/" data-link-title="3.10 Constrained decoding 內部：grammar mask 跟性能取捨" data-link-desc="Constrained decoding 的內部運作：token mask 計算、JSON schema / regex / CFG 三種 grammar、XGrammar pre-compile 機制、性能反而加速">3.10 constrained decoding 內部</a>）。Trade-off：強格式可能犧牲表達彈性、但這個 step 不需要彈性、要的是可靠。</p>
<p><strong>Step 5 為何配 RAG 而非純 few-shot</strong>：客服 case 涵蓋多種情境（訂單已出貨 / 已送達 / VIP / 一般 user / 不同國家 policy）、固定 few-shot 範例 cover 不全。RAG 從歷史 case 庫即時 retrieve 最相似的解釋範例、屬於 <a href="/blog/llm/04-applications/prompt-techniques-landscape/" data-link-title="4.0 Prompt 技術光譜：手法分類、取捨、組合模式" data-link-desc="Zero-shot / few-shot、chain-of-thought、role / template、reflection 等 prompt 技術的分類與取捨、何時 stack 何時不要 stack、跟 fine-tune / RAG / chaining 的邊界">4.0 prompt 技術光譜</a> context 軸的 retrieval-augmented prompting。</p>
<p>引用原理：</p>
<ul>
<li>Step 1 的 structured output → <a href="/blog/llm/04-applications/application-protocols/" data-link-title="4.6 應用層協議：function calling / structured output / MCP" data-link-desc="三個常被混為一談的概念：模型能力、sampling 約束、server 協議，三者的層級差異與組合方式">4.6 應用層協議</a></li>
<li>Step 2-4 的 tool 設計 → <a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 tool use</a></li>
<li>Step 4 的 pre-act HITL → <a href="/blog/llm/04-applications/human-ai-collaboration/" data-link-title="4.5 人機協作拓樸：何時人介入、怎麼介入" data-link-desc="Centaur vs Cyborg 工作模式、jagged frontier、HITL 三種觸發時機（pre-act / mid-stream / post-hoc）、確認流程的設計避免橡皮圖章化">4.5 人機協作拓樸</a> pre-act 段。對比講座 Workera appeal 是 post-hoc、本案例選 pre-act 是因為改地址不可逆 + 物流影響大、必須在執行前審</li>
<li>Step 5 的 RAG → <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a> + <a href="/blog/llm/04-applications/prompt-techniques-landscape/" data-link-title="4.0 Prompt 技術光譜：手法分類、取捨、組合模式" data-link-desc="Zero-shot / few-shot、chain-of-thought、role / template、reflection 等 prompt 技術的分類與取捨、何時 stack 何時不要 stack、跟 fine-tune / RAG / chaining 的邊界">4.0 prompt 技術光譜</a> context 軸</li>
</ul>
<h2 id="階段-4協議跟自主度決定">階段 4：協議跟自主度決定</h2>
<p>這個工作流的控制流是線性的（1→2→3→4→5→6）、有條件分支（step 3 結果決定走 4 還是 5）、但每步順序固定。判讀：</p>
<p><strong>該用什麼結構</strong>：</p>
<ul>
<li><strong>不適用 Multi-agent</strong>：步驟順序固定、角色差異不大、orchestration overhead 純增。</li>
<li><strong>不適用 Single agent loop（model 自決下一步）</strong>：本案例假設 single-turn / 短多 turn、步驟順序明確、不需要 agent 自決。若 user 互動多輪 + turn 數不固定（如 user 中途補資訊、改主意、追問）、可考慮 agent loop。</li>
<li><strong>採用 Multi-call pipeline + router</strong>：寫成 deterministic pipeline、step 3 後有 router 分流。</li>
</ul>
<p>引用原理：</p>
<ul>
<li><a href="/blog/llm/04-applications/multi-agent-topology/" data-link-title="4.8 Multi-Agent 拓樸：flat / hierarchical / agent-as-tool" data-link-desc="從 multi-call workflow 走到 multi-agent system 的判讀、flat vs hierarchical 拓樸、agent-as-tool 的 MCP 視角、specialization 跟 orchestration overhead 的取捨">4.8 multi-agent 拓樸</a> 的「先 multi-call、不夠再 multi-agent」反射</li>
<li><a href="/blog/llm/04-applications/workflow-patterns/" data-link-title="4.7 Workflow 編排模式" data-link-desc="Pipeline / router / parallel / reflection：多 LLM call 組合的四種基本模式與退化條件">4.7 workflow patterns</a> 的 pipeline + router 模式</li>
<li><a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 agent 架構</a> 的「先 single-call、不夠再 agent」反射</li>
</ul>
<p><strong>自主度</strong>：</p>
<ul>
<li>Step 1（parse）、5（解釋）、6（草擬 email）：full auto。</li>
<li>Step 2、3（查訂單、查 policy）：full auto（read-only）。</li>
<li>Step 4（執行改地址）：pre-act HITL（高風險 + 不可逆）、有 diff show、user 可以 reject。</li>
<li>Step 6（發 email）：可選 pre-act HITL（看公司風格、保守版要審 email、激進版自動發）。</li>
</ul>
<h2 id="階段-5trace-instrumentation">階段 5：Trace Instrumentation</h2>
<p>工作流上線前、先設計要記哪些資訊。<strong>Eval 跟 debug 都靠 trace、沒 trace 後面什麼都做不了</strong>。</p>
<p>每個 step 要記：</p>
<table>
  <thead>
      <tr>
          <th>欄位</th>
          <th>為什麼</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Input（完整）</td>
          <td>Debug 時要重現</td>
      </tr>
      <tr>
          <td>Output（完整）</td>
          <td>比對預期、做 regression set</td>
      </tr>
      <tr>
          <td>Latency</td>
          <td>找 bottleneck</td>
      </tr>
      <tr>
          <td>Token cost</td>
          <td>算成本</td>
      </tr>
      <tr>
          <td>Step name + version</td>
          <td>追蹤是哪個版本的 prompt / tool</td>
      </tr>
      <tr>
          <td>Decision branch</td>
          <td>Step 3 的 router 走哪邊</td>
      </tr>
      <tr>
          <td>Error（若有）</td>
          <td>結構化 error、不是 string</td>
      </tr>
  </tbody>
</table>
<p>整段 trace 要綁同一個 conversation_id、可以後面 join 起來看完整流程。</p>
<p>引用原理：<a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">4.20 LLM tracing</a>。</p>
<h2 id="階段-6eval-設計">階段 6：Eval 設計</h2>
<p>先選座標、再選工具。對本案例的每個 eval 需求、用 <a href="/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 三軸座標</a> 定位。下面列的 threshold 數字（95%、80%、≥4 等）是 illustrative、實際數字隨產品 baseline、user 容忍度、業務代價而定、不是通用標準。</p>
<h3 id="eval-1step-1-抽取準不準">Eval 1：Step 1 抽取準不準</h3>
<ul>
<li><strong>三軸</strong>：Objective（有 ground truth）+ Component（測單 step）+ Quantitative（accuracy）。</li>
<li><strong>工具</strong>：寫 100 個有標註的 query、跑 step 1、看 extraction accuracy（order_id 對 + new_address 對的比例）。</li>
<li><strong>Threshold</strong>：&lt; 95% 不上線。</li>
</ul>
<h3 id="eval-2step-2-4-tool-call-行為正確">Eval 2：Step 2-4 tool call 行為正確</h3>
<ul>
<li><strong>三軸</strong>：Objective + Component + Quantitative。</li>
<li><strong>工具</strong>：mock API、給 step 2-4 各 50 個 case、看 tool call 參數對不對、返回值處理對不對。</li>
<li><strong>Threshold</strong>：100%（這是 deterministic 行為、不該有錯）。</li>
</ul>
<h3 id="eval-3step-5-解釋品質">Eval 3：Step 5 解釋品質</h3>
<ul>
<li><strong>三軸</strong>：Subjective（沒有單一正解）+ Component + Quantitative。</li>
<li><strong>工具</strong>：LLM-as-judge with rubric（clarity / helpfulness / tone）、scale 1-5、aggregate average。</li>
<li><strong>Threshold</strong>：average ≥ 4、no 1-2 比例 &lt; 5%。</li>
</ul>
<h3 id="eval-4step-6-email-品質">Eval 4：Step 6 email 品質</h3>
<ul>
<li><strong>三軸</strong>：Subjective + Component + Quantitative + 加 Qualitative human review。</li>
<li><strong>工具</strong>：LLM judge 給分 + 每週抽 20 封 human review、看是否有 hallucinate 承諾、是否符合公司 tone。</li>
<li><strong>Threshold</strong>：judge 平均 ≥ 4、human review 沒有 critical issue。</li>
</ul>
<h3 id="eval-5e2e-success-rate">Eval 5：E2E success rate</h3>
<ul>
<li><strong>三軸</strong>：Objective + End-to-end + Quantitative。</li>
<li><strong>工具</strong>：跑 200 個 representative case、看「完整完成 + user 沒申訴」的比例。</li>
<li><strong>Threshold</strong>：≥ 85% baseline、降到 &lt; 80% alert。</li>
</ul>
<h3 id="eval-6user-滿意度">Eval 6：User 滿意度</h3>
<ul>
<li><strong>三軸</strong>：Subjective + End-to-end + Quantitative。</li>
<li><strong>工具</strong>：每次互動結束顯示 thumbs up/down + optional 留言、追蹤 weekly。</li>
<li><strong>Threshold</strong>：thumbs up rate &gt; 80%、appeal rate &lt; 5%。</li>
</ul>
<h3 id="eval-7failure-mode-pattern持續做">Eval 7：Failure mode pattern（持續做）</h3>
<ul>
<li><strong>三軸</strong>：Objective / Subjective + End-to-end + Qualitative。</li>
<li><strong>工具</strong>：每週讀 50 個 sampled traces + 100% 讀 failure / appeal traces、找 emerging pattern。</li>
<li><strong>產出</strong>：bug ticket、prompt 修改 hypothesis、policy 補強 hypothesis。</li>
</ul>
<p>引用原理：</p>
<ul>
<li>三軸座標 → <a href="/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 eval design framework</a></li>
<li>LLM judge rubric → <a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">4.21 LLM-as-Judge</a></li>
<li>Trace 接 eval → <a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">4.20 LLM tracing</a></li>
</ul>
<h2 id="階段-7iteration-loop">階段 7：Iteration Loop</h2>
<p>上線後、不是「等出問題」、是<strong>持續 iteration</strong>。典型 iteration cycle：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Production trace + eval result
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">[Error analysis：找 emerging pattern]
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">   Hypothesis：哪一層有問題？
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">   ├── Prompt 層 → 改 prompt → A/B test → 看 eval 收斂
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">   ├── Tool 層   → 改 tool / schema → 跑 component eval → 收斂
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">   ├── RAG 層    → 改 chunking / query rewriting → 跑 [retrieval recall](/llm/knowledge-cards/retrieval-recall/) → 收斂
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">   ├── Policy 層 → 改 deterministic rule → 跑 step 3 component eval → 收斂
</span></span><span class="line"><span class="ln">10</span><span class="cl">   └── Model 層  → 換 model → 跑全 eval set → 收斂
</span></span><span class="line"><span class="ln">11</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">12</span><span class="cl">[改動進 production]
</span></span><span class="line"><span class="ln">13</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">14</span><span class="cl">[Frozen baseline 留著、新版本跟它比、漂移看得見]</span></span></code></pre></div><p>判讀「該改哪一層」的反射：</p>
<table>
  <thead>
      <tr>
          <th>失敗訊號</th>
          <th>該改的層</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Step 1 抽錯訊息</td>
          <td>Prompt / structured output schema</td>
      </tr>
      <tr>
          <td>Tool call 參數錯</td>
          <td>Prompt 內 tool description / few-shot</td>
      </tr>
      <tr>
          <td>Tool 跑掛</td>
          <td>Tool 實作（不是 LLM 問題）</td>
      </tr>
      <tr>
          <td>RAG retrieve 不到相關案例</td>
          <td>Chunking / embedding / query rewriting</td>
      </tr>
      <tr>
          <td>Policy judgment 錯</td>
          <td>Deterministic rule（不是 LLM 問題）</td>
      </tr>
      <tr>
          <td>Email tone 不對</td>
          <td>Prompt（role / few-shot）</td>
      </tr>
      <tr>
          <td>Email hallucinate 承諾</td>
          <td>Output validator（不只是 prompt）</td>
      </tr>
      <tr>
          <td>整體 latency 太高</td>
          <td>找 trace bottleneck、可能要 cache / 並行</td>
      </tr>
  </tbody>
</table>
<p>引用原理：</p>
<ul>
<li>Prompt 跟 model 層的失敗診斷 → <a href="/blog/llm/04-applications/prompt-techniques-landscape/" data-link-title="4.0 Prompt 技術光譜：手法分類、取捨、組合模式" data-link-desc="Zero-shot / few-shot、chain-of-thought、role / template、reflection 等 prompt 技術的分類與取捨、何時 stack 何時不要 stack、跟 fine-tune / RAG / chaining 的邊界">4.0 prompt 技術光譜</a> systematic vs random error</li>
<li>整體 fuzzy / deterministic 邊界判讀 → <a href="/blog/llm/00-foundations/deterministic-vs-fuzzy-engineering/" data-link-title="0.8 Deterministic vs Fuzzy Engineering：軟體設計典範的位移" data-link-desc="傳統 deterministic 軟體跟 fuzzy LLM 軟體在資料、邏輯、分解、實驗成本四個維度的根本差異、以及哪段該 deterministic、哪段該 fuzzy 的決策框架">0.8</a></li>
</ul>
<h2 id="五個容易遺漏的設計反射">五個容易遺漏的設計反射</h2>
<p>實務上常常省略這五個反射動作、走進無收斂迭代：</p>
<h3 id="反射一先觀察再開-ide">反射一：先觀察、再開 IDE</h3>
<p>階段 1 的價值是把 task decomposition 跟真實人類工作流對齊。沒這層對齊、寫出來的 prompt 跟 tool 拆法跟 reality 偏離、三天後重做。階段 1 的兩天比階段 3 的兩週值得。對應反例：「我先寫個 prompt 試試」、跳過觀察直接寫 code。</p>
<h3 id="反射二policy-寫成-codellm-只解析意圖">反射二：Policy 寫成 code、LLM 只解析意圖</h3>
<p>判斷類規則（user tier、訂單狀態、可否操作）走 deterministic code、LLM 只負責「user 想做什麼」這層意圖抽取。這條邊界讓 debug 容易、規則更新不用 prompt iteration。對應反例：「LLM、請判斷這個訂單能不能改地址、規則如下：&hellip;」——把判斷塞進 prompt、debug 困難、規則漂移無從追蹤。對應 <a href="/blog/llm/00-foundations/deterministic-vs-fuzzy-engineering/" data-link-title="0.8 Deterministic vs Fuzzy Engineering：軟體設計典範的位移" data-link-desc="傳統 deterministic 軟體跟 fuzzy LLM 軟體在資料、邏輯、分解、實驗成本四個維度的根本差異、以及哪段該 deterministic、哪段該 fuzzy 的決策框架">0.8</a> 的「邊界用錯」反模式。</p>
<h3 id="反射三trace-是-day-1-設計">反射三：Trace 是 day-1 設計</h3>
<p>從第一天就把 input / output / latency / token / step name / decision branch / error 進 trace、綁同一個 conversation_id。Eval 跟 debug 都靠 trace、沒 trace 後面什麼都做不了。對應反例：「先讓系統跑起來、之後再加 trace」——出 bug 時 debug 從零開始、production trace 不可回溯。</p>
<h3 id="反射四deterministic-行為用-deterministic-check">反射四：Deterministic 行為用 deterministic check</h3>
<p>有 ground truth 的行為（抽取對不對、API 參數對不對、JSON schema 合不合）用 Python 函數驗證、判斷成本低、精度高。LLM judge 留給沒 ground truth 的 subjective 行為。對應反例：用 LLM judge 測「step 1 抽取對不對」——cost 翻倍、精度反而不如 deterministic check。對應 <a href="/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13</a> 軸誤選一。</p>
<h3 id="反射五保留-frozen-baseline">反射五：保留 frozen baseline</h3>
<p><a href="/blog/llm/knowledge-cards/frozen-baseline/" data-link-title="Frozen baseline" data-link-desc="Eval 系統中固定特定 prompt &#43; model 當長期對照、讓行為漂移可見的標準作法">Frozen baseline</a> 是把某個特定 prompt + 特定 model 跑 production 一段時間後 freeze 起來、每次新版本都跟它比、漂移看得見。對應反例：每次只跟「上一版」比、半年後累積漂移完全不可見、「整體變好了沒」無從回答。</p>
<h2 id="跟其他章節的對應總表">跟其他章節的對應總表</h2>
<p>本案例每階段引用的原理章節彙整：</p>
<table>
  <thead>
      <tr>
          <th>階段</th>
          <th>引用章節</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1. 觀察人類工作流</td>
          <td><a href="/blog/llm/00-foundations/deterministic-vs-fuzzy-engineering/" data-link-title="0.8 Deterministic vs Fuzzy Engineering：軟體設計典範的位移" data-link-desc="傳統 deterministic 軟體跟 fuzzy LLM 軟體在資料、邏輯、分解、實驗成本四個維度的根本差異、以及哪段該 deterministic、哪段該 fuzzy 的決策框架">0.8 fuzzy engineering</a></td>
      </tr>
      <tr>
          <td>2. 典範定位</td>
          <td><a href="/blog/llm/00-foundations/deterministic-vs-fuzzy-engineering/" data-link-title="0.8 Deterministic vs Fuzzy Engineering：軟體設計典範的位移" data-link-desc="傳統 deterministic 軟體跟 fuzzy LLM 軟體在資料、邏輯、分解、實驗成本四個維度的根本差異、以及哪段該 deterministic、哪段該 fuzzy 的決策框架">0.8 fuzzy engineering</a></td>
      </tr>
      <tr>
          <td>3. 工作流設計（prompt / tool / RAG / HITL）</td>
          <td><a href="/blog/llm/04-applications/prompt-techniques-landscape/" data-link-title="4.0 Prompt 技術光譜：手法分類、取捨、組合模式" data-link-desc="Zero-shot / few-shot、chain-of-thought、role / template、reflection 等 prompt 技術的分類與取捨、何時 stack 何時不要 stack、跟 fine-tune / RAG / chaining 的邊界">4.0</a>、<a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1</a>、<a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3</a>、<a href="/blog/llm/04-applications/human-ai-collaboration/" data-link-title="4.5 人機協作拓樸：何時人介入、怎麼介入" data-link-desc="Centaur vs Cyborg 工作模式、jagged frontier、HITL 三種觸發時機（pre-act / mid-stream / post-hoc）、確認流程的設計避免橡皮圖章化">4.5</a></td>
      </tr>
      <tr>
          <td>4. 結構決定（multi-call vs agent vs multi-agent）</td>
          <td><a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4</a>、<a href="/blog/llm/04-applications/workflow-patterns/" data-link-title="4.7 Workflow 編排模式" data-link-desc="Pipeline / router / parallel / reflection：多 LLM call 組合的四種基本模式與退化條件">4.7</a>、<a href="/blog/llm/04-applications/multi-agent-topology/" data-link-title="4.8 Multi-Agent 拓樸：flat / hierarchical / agent-as-tool" data-link-desc="從 multi-call workflow 走到 multi-agent system 的判讀、flat vs hierarchical 拓樸、agent-as-tool 的 MCP 視角、specialization 跟 orchestration overhead 的取捨">4.8</a></td>
      </tr>
      <tr>
          <td>5. Trace instrumentation</td>
          <td><a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">4.20 LLM tracing</a></td>
      </tr>
      <tr>
          <td>6. Eval 設計</td>
          <td><a href="/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 eval framework</a>、<a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14</a>、<a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">4.21</a></td>
      </tr>
      <tr>
          <td>7. Iteration loop</td>
          <td><a href="/blog/llm/04-applications/prompt-techniques-landscape/" data-link-title="4.0 Prompt 技術光譜：手法分類、取捨、組合模式" data-link-desc="Zero-shot / few-shot、chain-of-thought、role / template、reflection 等 prompt 技術的分類與取捨、何時 stack 何時不要 stack、跟 fine-tune / RAG / chaining 的邊界">4.0 prompt 光譜</a> systematic vs random error 段</td>
      </tr>
  </tbody>
</table>
<h2 id="下一步">下一步</h2>
<p>返回：<a href="/blog/llm/04-applications/" data-link-title="模組四：LLM 應用層原理" data-link-desc="Prompt 技術光譜、RAG、tool use、agent、應用層協議、人機協作、multi-agent、workflow 編排、eval 設計：跨工具不變的概念地圖">模組四首頁</a>、或回到 <a href="/blog/llm/04-applications/hands-on/" data-link-title="4.x Hands-on：端到端案例" data-link-desc="把模組四的所有原理串成具體 case study：從 task decomposition、workflow 設計、eval 設計到 iteration loop">hands-on 索引</a>。</p>
]]></content:encoded></item><item><title>Hands-on：安裝 ComfyUI + SDXL base</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/</guid><description>&lt;p>本篇紀錄裝 ComfyUI 跟 Stable Diffusion XL base 模型、在 Apple Silicon Mac 上跑通最小 text-to-image 流程。ComfyUI 是 2026 年 Apple Silicon 跑 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/diffusion/" data-link-title="Diffusion" data-link-desc="產圖用的生成式 AI 架構：跟寫 code 用的 Transformer 是不同路線">Diffusion&lt;/a> 最主流的選擇——節點式工作流（拖拉節點連線、像 visual programming、每個節點負責一段運算）、跨平台、Python 環境、容易客製化。Draw Things（Mac 原生 GUI）更簡單、但 ComfyUI 接 workflow 跟 custom node 的能力強很多。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>ComfyUI&lt;/strong>：main branch、shallow clone
&lt;strong>示範模型&lt;/strong>：Stable Diffusion XL base 1.0（6.5 GB、&lt;code>stabilityai/stable-diffusion-xl-base-1.0&lt;/code>）
&lt;strong>Python&lt;/strong>：3.14（venv 隔離、不污染系統）&lt;/p>&lt;/blockquote>
&lt;h2 id="前置設定">前置設定&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>項目&lt;/th>
 &lt;th>檢查指令&lt;/th>
 &lt;th>預期&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Git&lt;/td>
 &lt;td>&lt;code>which git&lt;/code>&lt;/td>
 &lt;td>&lt;code>/usr/bin/git&lt;/code> 或 brew 版&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Python 3.10+&lt;/td>
 &lt;td>&lt;code>python3 --version&lt;/code>&lt;/td>
 &lt;td>3.10 ~ 3.14 都可、本 demo 用 3.14&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>磁碟空間&lt;/td>
 &lt;td>&lt;code>df -h ~&lt;/code>&lt;/td>
 &lt;td>至少 15 GB（runtime 3 GB + SDXL 6.5 GB + cache）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/unified-memory/" data-link-title="Unified Memory Architecture" data-link-desc="Apple Silicon 讓 CPU / GPU / NE 共用同一塊記憶體：跑大模型的優勢來源">統一記憶體&lt;/a>&lt;/td>
 &lt;td>&lt;code>system_profiler SPHardwareDataType | grep Memory&lt;/code>&lt;/td>
 &lt;td>至少 16 GB、推薦 32 GB+&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>ComfyUI 在 Apple Silicon 跑 Diffusion 用 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/gpu-compute-backend/" data-link-title="GPU Compute Backend" data-link-desc="GPU 加速計算的底層 API 介面（CUDA / ROCm / Vulkan / Metal / SYCL）、決定推論軟體能否用 GPU 跑得快">MPS（Metal Performance Shaders）backend&lt;/a>、不需要 NVIDIA CUDA。但跑 SDXL 至少要 12 GB 統一記憶體留給 model + activation、16 GB Mac 跟其他 app 一起會吃緊。&lt;/p>
&lt;h2 id="clone-comfyui">Clone ComfyUI&lt;/h2>
&lt;p>放在 &lt;code>~/Projects/&lt;/code> 下、跟其他 dev project 同層：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ~/Projects
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">git clone --depth &lt;span class="m">1&lt;/span> https://github.com/comfyanonymous/ComfyUI.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ComfyUI&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>--depth 1&lt;/code> 只拉最新 commit、不拉全部歷史、省幾百 MB。要追歷史 / submit PR 才需要 full clone。&lt;/p>
&lt;p>ComfyUI 目錄結構（核心部分）：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">ComfyUI/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">├── main.py # 啟動 entry point
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">├── server.py # HTTP server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">├── nodes.py # 內建節點實作
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">├── custom_nodes/ # 第三方 / 客製節點放這
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">├── models/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">│ ├── checkpoints/ # SD / SDXL 主 model 檔放這
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">│ ├── loras/ # LoRA 微調權重
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">│ ├── vae/ # VAE 模型
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">│ ├── controlnet/ # ControlNet 模型
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">│ └── ...
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">├── output/ # 生成的圖
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">├── input/ # 拖進 ComfyUI 的圖片
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">└── requirements.txt&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="建-venv--裝-dependencies">建 venv + 裝 dependencies&lt;/h2>
&lt;p>ComfyUI requirements 含 PyTorch、numpy、PIL、safetensors、einops 等、套件多、版本敏感。用 venv 隔離：&lt;/p></description><content:encoded><![CDATA[<p>本篇紀錄裝 ComfyUI 跟 Stable Diffusion XL base 模型、在 Apple Silicon Mac 上跑通最小 text-to-image 流程。ComfyUI 是 2026 年 Apple Silicon 跑 <a href="/blog/llm/knowledge-cards/diffusion/" data-link-title="Diffusion" data-link-desc="產圖用的生成式 AI 架構：跟寫 code 用的 Transformer 是不同路線">Diffusion</a> 最主流的選擇——節點式工作流（拖拉節點連線、像 visual programming、每個節點負責一段運算）、跨平台、Python 環境、容易客製化。Draw Things（Mac 原生 GUI）更簡單、但 ComfyUI 接 workflow 跟 custom node 的能力強很多。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>ComfyUI</strong>：main branch、shallow clone
<strong>示範模型</strong>：Stable Diffusion XL base 1.0（6.5 GB、<code>stabilityai/stable-diffusion-xl-base-1.0</code>）
<strong>Python</strong>：3.14（venv 隔離、不污染系統）</p></blockquote>
<h2 id="前置設定">前置設定</h2>
<table>
  <thead>
      <tr>
          <th>項目</th>
          <th>檢查指令</th>
          <th>預期</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Git</td>
          <td><code>which git</code></td>
          <td><code>/usr/bin/git</code> 或 brew 版</td>
      </tr>
      <tr>
          <td>Python 3.10+</td>
          <td><code>python3 --version</code></td>
          <td>3.10 ~ 3.14 都可、本 demo 用 3.14</td>
      </tr>
      <tr>
          <td>磁碟空間</td>
          <td><code>df -h ~</code></td>
          <td>至少 15 GB（runtime 3 GB + SDXL 6.5 GB + cache）</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/knowledge-cards/unified-memory/" data-link-title="Unified Memory Architecture" data-link-desc="Apple Silicon 讓 CPU / GPU / NE 共用同一塊記憶體：跑大模型的優勢來源">統一記憶體</a></td>
          <td><code>system_profiler SPHardwareDataType | grep Memory</code></td>
          <td>至少 16 GB、推薦 32 GB+</td>
      </tr>
  </tbody>
</table>
<p>ComfyUI 在 Apple Silicon 跑 Diffusion 用 <a href="/blog/llm/knowledge-cards/gpu-compute-backend/" data-link-title="GPU Compute Backend" data-link-desc="GPU 加速計算的底層 API 介面（CUDA / ROCm / Vulkan / Metal / SYCL）、決定推論軟體能否用 GPU 跑得快">MPS（Metal Performance Shaders）backend</a>、不需要 NVIDIA CUDA。但跑 SDXL 至少要 12 GB 統一記憶體留給 model + activation、16 GB Mac 跟其他 app 一起會吃緊。</p>
<h2 id="clone-comfyui">Clone ComfyUI</h2>
<p>放在 <code>~/Projects/</code> 下、跟其他 dev project 同層：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects
</span></span><span class="line"><span class="ln">2</span><span class="cl">git clone --depth <span class="m">1</span> https://github.com/comfyanonymous/ComfyUI.git
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">cd</span> ComfyUI</span></span></code></pre></div><p><code>--depth 1</code> 只拉最新 commit、不拉全部歷史、省幾百 MB。要追歷史 / submit PR 才需要 full clone。</p>
<p>ComfyUI 目錄結構（核心部分）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">ComfyUI/
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">├── main.py              # 啟動 entry point
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">├── server.py            # HTTP server
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">├── nodes.py             # 內建節點實作
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">├── custom_nodes/        # 第三方 / 客製節點放這
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">├── models/
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">│   ├── checkpoints/     # SD / SDXL 主 model 檔放這
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">│   ├── loras/           # LoRA 微調權重
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">│   ├── vae/             # VAE 模型
</span></span><span class="line"><span class="ln">10</span><span class="cl">│   ├── controlnet/      # ControlNet 模型
</span></span><span class="line"><span class="ln">11</span><span class="cl">│   └── ...
</span></span><span class="line"><span class="ln">12</span><span class="cl">├── output/              # 生成的圖
</span></span><span class="line"><span class="ln">13</span><span class="cl">├── input/               # 拖進 ComfyUI 的圖片
</span></span><span class="line"><span class="ln">14</span><span class="cl">└── requirements.txt</span></span></code></pre></div><h2 id="建-venv--裝-dependencies">建 venv + 裝 dependencies</h2>
<p>ComfyUI requirements 含 PyTorch、numpy、PIL、safetensors、einops 等、套件多、版本敏感。用 venv 隔離：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 -m venv venv
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">4</span><span class="cl">python --version  <span class="c1"># 確認在 venv 內</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">pip install --upgrade pip</span></span></code></pre></div><p>裝 dependencies：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">pip install -r requirements.txt</span></span></code></pre></div><p>實測時間：10-15 分鐘（torch + 各種 dep）、首次跑會編譯部分 C extension。完成後預期看到：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Successfully installed Mako-... MarkupSafe-... Pillow-... PyOpenGL-... ...
</span></span><span class="line"><span class="ln">2</span><span class="cl">  torch-... torchvision-... torchaudio-... ...
</span></span><span class="line"><span class="ln">3</span><span class="cl">  safetensors-... transformers-... ...</span></span></code></pre></div><p>驗證 PyTorch + MPS：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python -c <span class="s2">&#34;import torch; print(&#39;torch:&#39;, torch.__version__, &#39;mps:&#39;, torch.backends.mps.is_available())&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># torch: 2.x.x mps: True</span></span></span></code></pre></div><p><code>mps: True</code> 表示 Apple Silicon GPU 加速可用。</p>
<h2 id="下載-sdxl-base-模型">下載 SDXL base 模型</h2>
<p>SDXL base 約 6.5 GB、是 Stable Diffusion XL 的基礎 model。從 Hugging Face 拉到 ComfyUI 的 <code>models/checkpoints/</code>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p ~/Projects/ComfyUI/models/checkpoints
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI/models/checkpoints
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># -L 跟 redirect、--continue-at - 支援中斷後重續、避免 6.5 GB 重下</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">curl -L --continue-at - -o sd_xl_base_1.0.safetensors <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors?download=true&#34;</span></span></span></code></pre></div><p>下載時間視網速、10-30 分鐘 broadband 都正常。網路中斷時重跑同一個指令、<code>--continue-at -</code> 會從中斷處續傳、不用重下 6.5 GB。完成後：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh sd_xl_base_1.0.safetensors
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 6.5 GB</span></span></span></code></pre></div><p>可選的進階模型：</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>大小</th>
          <th>用途</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>SDXL base 1.0</td>
          <td>6.5 GB</td>
          <td>基礎、本 demo 用</td>
      </tr>
      <tr>
          <td>SDXL refiner 1.0</td>
          <td>6.1 GB</td>
          <td>跟 base 配對、提升細節</td>
      </tr>
      <tr>
          <td>SD 1.5</td>
          <td>4.0 GB</td>
          <td>較小、生態最成熟（很多 LoRA）</td>
      </tr>
      <tr>
          <td>Flux.1 schnell</td>
          <td>12 GB</td>
          <td>2024+ 最強開源 SD 級</td>
      </tr>
      <tr>
          <td>Flux.1 dev</td>
          <td>24 GB</td>
          <td>Flux 完整版、品質最佳</td>
      </tr>
  </tbody>
</table>
<p>SDXL 6.5 GB 是「能驗證 + 不過大」的甜蜜點。再小可以選 SD 1.5（4 GB）、跑 Flux 要 24 GB 磁碟 + 16 GB+ 統一記憶體。</p>
<h2 id="啟動-comfyui-server">啟動 ComfyUI Server</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">3</span><span class="cl">python main.py</span></span></code></pre></div><p>預期輸出：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[Prompt Server] Starting ComfyUI...
</span></span><span class="line"><span class="ln">2</span><span class="cl">Total VRAM 32768 MB, total RAM 32768 MB
</span></span><span class="line"><span class="ln">3</span><span class="cl">pytorch version: 2.x.x
</span></span><span class="line"><span class="ln">4</span><span class="cl">Set vram state to: SHARED
</span></span><span class="line"><span class="ln">5</span><span class="cl">Device: mps
</span></span><span class="line"><span class="ln">6</span><span class="cl">Using sub quadratic attention for cross-attention
</span></span><span class="line"><span class="ln">7</span><span class="cl">...
</span></span><span class="line"><span class="ln">8</span><span class="cl">Starting server
</span></span><span class="line"><span class="ln">9</span><span class="cl">To see the GUI go to: http://127.0.0.1:8188</span></span></code></pre></div><p>Apple Silicon 統一記憶體被 PyTorch 報成 VRAM 是預期、不是 bug：mps backend 把整個統一記憶體當成「GPU 可見記憶體」、所以 32GB Mac 顯示 <code>Total VRAM 32768 MB</code>。實際使用上 ComfyUI、其他 app 跟系統共用同一塊。</p>
<p>關鍵驗證：</p>
<ul>
<li><code>Device: mps</code> → Apple Silicon GPU 啟用</li>
<li><code>Starting server</code> + <code>http://127.0.0.1:8188</code> → server 跑了</li>
</ul>
<p>開瀏覽器到 <code>http://127.0.0.1:8188</code>、看到節點式 UI 就成功。第一次開啟會載入預設 workflow（一個簡單 text-to-image）。</p>
<p>要對外暴露（讓 LAN 內其他機器連）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python main.py --listen 0.0.0.0 --port <span class="m">8188</span></span></span></code></pre></div><p>跟 <a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流</a> 提的一樣、<code>0.0.0.0</code> 等於暴露給整個區網、家用 OK 公共網路要小心。</p>
<h2 id="跑第一張圖">跑第一張圖</h2>
<p>ComfyUI 預設 workflow 是 text-to-image：</p>
<ol>
<li><strong>CheckpointLoader 節點</strong>：選 <code>sd_xl_base_1.0.safetensors</code>。</li>
<li><strong>CLIPTextEncode（Prompt）節點</strong>：輸入 prompt、例如 <code>a photograph of a cat sitting on a wooden chair, natural lighting</code>。</li>
<li><strong>CLIPTextEncode（Negative）節點</strong>：輸入 negative prompt、例如 <code>blurry, low quality, artifacts</code>。</li>
<li><strong>EmptyLatentImage 節點</strong>：設定 1024×1024（SDXL 最佳尺寸）。</li>
<li><strong>KSampler 節點</strong>：steps=20、cfg=7、sampler=<code>euler</code> 或 <code>dpmpp_2m</code>。</li>
<li><strong>VAEDecode 節點</strong>：把 latent 轉成 RGB image。</li>
<li><strong>SaveImage 節點</strong>：存到 <code>output/</code>。</li>
</ol>
<p>點右側 panel 的 <code>Queue Prompt</code>、開始生成。</p>
<p>實測時間（M4 Pro 32GB、SDXL base、1024×1024、MPS backend）：</p>
<table>
  <thead>
      <tr>
          <th>Steps</th>
          <th>第一張（含 model 載入）</th>
          <th>後續同 model</th>
          <th>備註</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>15</td>
          <td>約 100-110 秒</td>
          <td>約 30-40 秒</td>
          <td>本驗證實測 106s（含載入）</td>
      </tr>
      <tr>
          <td>20</td>
          <td>約 130-150 秒</td>
          <td>約 40-60 秒</td>
          <td>ComfyUI 預設值</td>
      </tr>
      <tr>
          <td>30</td>
          <td>約 200 秒</td>
          <td>約 80 秒</td>
          <td>品質更高、邊際效益小</td>
      </tr>
  </tbody>
</table>
<p>16GB Mac 跑 SDXL：每張 60-180 秒、可能會降頻。</p>
<p>生成完成後在 <code>output/</code> 看到 PNG 檔（如 <code>comfyui-test_00001_.png</code>）。</p>
<h2 id="用-rest-api-直接生成不開瀏覽器">用 REST API 直接生成（不開瀏覽器）</h2>
<p>GUI 適合互動探索、自動化要走 REST API。完整 script 在 <code>scripts/comfyui-test/generate.py</code>、實際驗證指令：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/blog
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/comfyui-test/generate.py --steps <span class="m">15</span></span></span></code></pre></div><p>腳本流程：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">build_workflow</span><span class="p">(</span><span class="n">prompt_text</span><span class="p">,</span> <span class="n">neg_text</span><span class="p">,</span> <span class="n">steps</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="s2">&#34;3&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;seed&#34;</span><span class="p">:</span> <span class="mi">42</span><span class="p">,</span> <span class="s2">&#34;steps&#34;</span><span class="p">:</span> <span class="n">steps</span><span class="p">,</span> <span class="s2">&#34;cfg&#34;</span><span class="p">:</span> <span class="mf">7.0</span><span class="p">,</span> <span class="s2">&#34;sampler_name&#34;</span><span class="p">:</span> <span class="s2">&#34;euler&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">                         <span class="s2">&#34;scheduler&#34;</span><span class="p">:</span> <span class="s2">&#34;normal&#34;</span><span class="p">,</span> <span class="s2">&#34;denoise&#34;</span><span class="p">:</span> <span class="mf">1.0</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">                         <span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s2">&#34;positive&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;6&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">                         <span class="s2">&#34;negative&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;7&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s2">&#34;latent_image&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;5&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;KSampler&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="s2">&#34;4&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;ckpt_name&#34;</span><span class="p">:</span> <span class="s2">&#34;sd_xl_base_1.0.safetensors&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;CheckpointLoaderSimple&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="s2">&#34;5&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;width&#34;</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span> <span class="s2">&#34;height&#34;</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span> <span class="s2">&#34;batch_size&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;EmptyLatentImage&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="s2">&#34;6&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="n">prompt_text</span><span class="p">,</span> <span class="s2">&#34;clip&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">1</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;CLIPTextEncode&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="s2">&#34;7&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="n">neg_text</span><span class="p">,</span> <span class="s2">&#34;clip&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">1</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;CLIPTextEncode&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="s2">&#34;8&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;samples&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;3&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s2">&#34;vae&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">2</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;VAEDecode&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="s2">&#34;9&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;filename_prefix&#34;</span><span class="p">:</span> <span class="s2">&#34;comfyui-test&#34;</span><span class="p">,</span> <span class="s2">&#34;images&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;8&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;SaveImage&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="p">}</span></span></span></code></pre></div><p><strong>workflow JSON 結構解釋</strong>：</p>
<ul>
<li><strong>每個 key（&ldquo;3&rdquo;、&ldquo;4&rdquo;、…）是節點 ID</strong>。任意整數字串、只要在 workflow 內唯一即可。</li>
<li><strong><code>class_type</code></strong>：節點類型（KSampler、CheckpointLoaderSimple、CLIPTextEncode 等）、ComfyUI 內建。</li>
<li><strong><code>inputs</code></strong>：節點參數。標量值（如 <code>1024</code>、<code>&quot;euler&quot;</code>）直接寫；連到別的節點輸出用 <code>[node_id, output_index]</code> 形式。</li>
<li><strong><code>[&quot;4&quot;, 0]</code></strong> 表示「節點 4 的第 0 個 output」。CheckpointLoaderSimple 有三個 output：<code>model</code>（0）、<code>clip</code>（1）、<code>vae</code>（2）、所以 <code>[&quot;4&quot;, 0]</code> 是 model、<code>[&quot;4&quot;, 1]</code> 是 clip、<code>[&quot;4&quot;, 2]</code> 是 vae。</li>
</ul>
<p><strong>每個節點做什麼</strong>：</p>
<ul>
<li><strong>4 CheckpointLoaderSimple</strong>：載 SDXL safetensors、輸出 model / clip / vae 三個東西。是整條 graph 的根。</li>
<li><strong>5 EmptyLatentImage</strong>：建一張 1024×1024 的空白 latent tensor（不是 RGB 圖、是 4-channel latent space tensor）。SDXL 的 「畫布」。</li>
<li><strong>6 CLIPTextEncode (positive)</strong>：把 prompt 文字用 CLIP text encoder 轉成 conditioning vector。</li>
<li><strong>7 CLIPTextEncode (negative)</strong>：同上、但是 negative prompt（要 avoid 的特徵）。</li>
<li><strong>3 KSampler</strong>：核心 denoising loop。15-30 個 step、把 latent 從噪聲變成跟 conditioning 對齊的 latent。</li>
<li><strong>8 VAEDecode</strong>：把 latent 用 VAE 解碼成 RGB 圖（1024×1024×3）。</li>
<li><strong>9 SaveImage</strong>：寫 PNG 到 <code>output/</code> 目錄、檔名 prefix <code>comfyui-test</code>。</li>
</ul>
<p><strong>為什麼 graph 結構這樣</strong>：</p>
<ul>
<li><strong>為什麼 model / clip / vae 從同一個 checkpoint 拿</strong>：SDXL 設計上三個元件互相 train、必須同源。從不同 checkpoint 拿會造成生成品質崩。</li>
<li><strong>為什麼 EmptyLatentImage 不直接接 KSampler、要設 batch_size</strong>：保留 batch 維度、未來要 batch generation（一次生 4 張）改 <code>batch_size: 4</code> 就好、其他節點不用改。</li>
<li><strong>為什麼 sampler 用 <code>euler</code>、scheduler 用 <code>normal</code></strong>：最簡單的組合、SDXL base 上品質可預測。其他選項（<code>dpmpp_2m</code>、<code>karras</code> scheduler 等）品質可能更好但效果各模型不同。</li>
<li><strong>為什麼 cfg=7.0</strong>：classifier-free guidance scale。SDXL 的標準預設、太低（&lt; 3）模型忽略 prompt、太高（&gt; 12）過 saturated。</li>
<li><strong>為什麼 seed=42</strong>：固定 seed 讓結果可重現。每次跑同 prompt 同 seed 同 model 結果完全一樣——是調 prompt / 比較 model 的必要條件。</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">workflow</span> <span class="o">=</span> <span class="n">build_workflow</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">prompt</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">neg</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">steps</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">client_id</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="o">.</span><span class="n">uuid4</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">resp</span> <span class="o">=</span> <span class="n">http_post_json</span><span class="p">(</span><span class="s2">&#34;/prompt&#34;</span><span class="p">,</span> <span class="p">{</span><span class="s2">&#34;prompt&#34;</span><span class="p">:</span> <span class="n">workflow</span><span class="p">,</span> <span class="s2">&#34;client_id&#34;</span><span class="p">:</span> <span class="n">client_id</span><span class="p">})</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">prompt_id</span> <span class="o">=</span> <span class="n">resp</span><span class="p">[</span><span class="s2">&#34;prompt_id&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="n">history</span> <span class="o">=</span> <span class="n">http_get_json</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;/history/</span><span class="si">{</span><span class="n">prompt_id</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="k">if</span> <span class="n">prompt_id</span> <span class="ow">in</span> <span class="n">history</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="n">outputs</span> <span class="o">=</span> <span class="n">history</span><span class="p">[</span><span class="n">prompt_id</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;outputs&#34;</span><span class="p">,</span> <span class="p">{})</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="n">img</span> <span class="o">=</span> <span class="n">outputs</span><span class="p">[</span><span class="s2">&#34;9&#34;</span><span class="p">][</span><span class="s2">&#34;images&#34;</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="n">qs</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">parse</span><span class="o">.</span><span class="n">urlencode</span><span class="p">({</span><span class="s2">&#34;filename&#34;</span><span class="p">:</span> <span class="n">img</span><span class="p">[</span><span class="s2">&#34;filename&#34;</span><span class="p">],</span> <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;output&#34;</span><span class="p">})</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">blob</span> <span class="o">=</span> <span class="n">http_get_bytes</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;/view?</span><span class="si">{</span><span class="n">qs</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">Path</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">out</span><span class="p">)</span><span class="o">.</span><span class="n">write_bytes</span><span class="p">(</span><span class="n">blob</span><span class="p">)</span></span></span></code></pre></div><p><strong>每段做什麼</strong>：</p>
<ol>
<li><strong><code>client_id = str(uuid.uuid4())</code></strong>：每個 client 識別碼。ComfyUI 用 client_id 把 progress events 路由給正確 WebSocket subscriber。本 demo 用 polling、client_id 隨意產生即可。</li>
<li><strong><code>POST /prompt</code></strong>：送 workflow + client_id、server 回 <code>prompt_id</code>（這次 job 的 UUID）。Server 把 workflow 丟進 internal queue、立刻 return、不會等 generation。</li>
<li><strong><code>while True: time.sleep(2); GET /history/{prompt_id}</code></strong>：polling 等 job 完成。完成的 job 才會出現在 <code>/history</code> 裡（執行中 / queued 都不算）。</li>
<li><strong><code>if prompt_id in history</code></strong>：完成判讀——history 內出現該 prompt_id 表示 generation 結束。</li>
<li><strong><code>outputs[&quot;9&quot;][&quot;images&quot;][0]</code></strong>：節點 9 (SaveImage) 的輸出、含 <code>filename</code>、<code>subfolder</code>、<code>type</code> 等資訊。</li>
<li><strong><code>/view?filename=...&amp;type=output</code></strong>：拿生成的 PNG bytes。<code>type=output</code> 是 ComfyUI 的內部 dir 標記（區分 output / input / temp）。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 polling 而不是 WebSocket</strong>：WebSocket 要 subscribe events、處理 connection lifecycle、邏輯複雜。Polling 兩行解決、對教學 demo 夠用。Production 自動化系統建議用 WebSocket、知道每個 progress event。</li>
<li><strong>為什麼 <code>time.sleep(2)</code></strong>：太短（&lt; 1s）對 server 造成不必要 polling；太長（&gt; 5s）感知延遲明顯。2 秒是 demo 友善平衡。</li>
<li><strong>為什麼用 prompt_id 而不是 client_id 查 history</strong>：一個 client 可能送多個 job、prompt_id 唯一識別 job。client_id 主要用 WebSocket 訂閱、不是 history query 主鍵。</li>
<li><strong>為什麼 <code>Path(args.out).write_bytes(blob)</code></strong>：PNG 是 binary、用 <code>write_bytes</code> 直接寫；改用 <code>open(...).write()</code> 的 text mode 會在編碼轉換時破壞檔案內容。</li>
</ul>
<p><strong>實測</strong>：M4 Pro 32GB、prompt 「a photograph of an orange cat sitting on a wooden chair, soft natural lighting, detailed fur」、15 steps、cfg=7、euler+normal sampler、seed=42 → 106 秒生成 1024×1024 PNG、1.65 MB。</p>
<h2 id="comfyui-的-rest-api-形狀無-openai-相容層">ComfyUI 的 REST API 形狀（無 OpenAI 相容層）</h2>
<p>ComfyUI 沒提供 OpenAI 相容 API、它的 API 是自己的 REST + WebSocket：</p>
<ul>
<li><code>POST /prompt</code>：丟一個 workflow JSON、回傳 job id。</li>
<li><code>GET /history/{prompt_id}</code>：查看生成結果。</li>
<li><code>GET /view?filename=X</code>：拿生成的圖。</li>
<li>WebSocket：訂閱 job progress events。</li>
</ul>
<p>API 形狀跟 Diffusion 任務匹配、跟 LLM 的 <code>/chat/completions</code> 完全不同——這正是 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 章節</a> 提到「Diffusion 跟 Transformer 工具鏈互不通用」的具體展現。Ollama / LM Studio 對接 Continue.dev 的 OpenAI 相容路徑、跟 ComfyUI 接 SDXL 是完全平行的兩條路。</p>
<h2 id="常用-custom-nodes">常用 Custom Nodes</h2>
<p>ComfyUI 的核心功能來自 custom nodes、社群維護。最常用：</p>
<table>
  <thead>
      <tr>
          <th>Custom Node</th>
          <th>功能</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>ComfyUI-Manager</td>
          <td>管理其他 custom node、安裝 / 更新</td>
      </tr>
      <tr>
          <td>ComfyUI-Impact-Pack</td>
          <td>物件偵測、masking、inpainting</td>
      </tr>
      <tr>
          <td>ComfyUI-AnimateDiff</td>
          <td>影片動畫生成</td>
      </tr>
      <tr>
          <td>ComfyUI-ControlNet-Aux</td>
          <td>ControlNet preprocessor</td>
      </tr>
      <tr>
          <td>ComfyUI-IPAdapter-plus</td>
          <td>圖像 reference embedding</td>
      </tr>
  </tbody>
</table>
<p>安裝方式（透過 ComfyUI-Manager）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI/custom_nodes
</span></span><span class="line"><span class="ln">2</span><span class="cl">git clone https://github.com/ltdrdata/ComfyUI-Manager.git
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># 重啟 ComfyUI、UI 多一個 Manager 按鈕、之後用 Manager 裝其他 node</span></span></span></code></pre></div><h2 id="常見坑">常見坑</h2>
<h3 id="python-版本太新torch-沒-wheel">Python 版本太新、torch 沒 wheel</h3>
<p>PyTorch 對最新 Python（3.13、3.14）的 wheel 發布有 lag、可能 <code>pip install -r requirements.txt</code> 跑 build from source 慢 + 失敗。退到 Python 3.11 / 3.12：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew install python@3.11
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3.11 -m venv venv
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">4</span><span class="cl">pip install -r requirements.txt</span></span></code></pre></div><h3 id="mps-false跑在-cpu-上"><code>mps: False</code>、跑在 CPU 上</h3>
<p>確認 PyTorch 是 Apple Silicon 版本（不是 x86_64 emulation）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python -c <span class="s2">&#34;import platform; print(platform.machine())&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># arm64 ← 正確；x86_64 ← 走 Rosetta、要重裝</span></span></span></code></pre></div><p>如果是 x86_64、表示 venv 用了 Intel Python。重建 venv：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">deactivate
</span></span><span class="line"><span class="ln">2</span><span class="cl">rm -rf venv
</span></span><span class="line"><span class="ln">3</span><span class="cl">arch -arm64 python3 -m venv venv</span></span></code></pre></div><h3 id="記憶體不夠推論時-crash">記憶體不夠、推論時 crash</h3>
<p>SDXL 在 16 GB Mac 上吃緊、可能 swap 或 crash。緩解：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 降解析度</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">python main.py --normalvram   <span class="c1"># 預設、~12 GB</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">python main.py --lowvram      <span class="c1"># 較省、~8 GB、慢</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">python main.py --novram       <span class="c1"># 極省、~4 GB、極慢、實用上界</span></span></span></code></pre></div><p>或換 SD 1.5（4 GB checkpoint）、記憶體需求 &lt; SDXL 的一半。</p>
<h3 id="workflow-json-載入失敗">Workflow JSON 載入失敗</h3>
<p>ComfyUI workflow 是 JSON 描述節點 + 連線。如果是別人分享的 workflow、可能用了你沒裝的 custom node。錯誤訊息會列出缺哪些 node、用 ComfyUI-Manager 補裝。</p>
<h3 id="port-8188-被佔">Port 8188 被佔</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">lsof -i :8188
</span></span><span class="line"><span class="ln">2</span><span class="cl">python main.py --port <span class="m">8189</span>  <span class="c1"># 改 port</span></span></span></code></pre></div><h2 id="跟-llm-stack-並存">跟 LLM stack 並存</h2>
<p>ComfyUI 用 port 8188、跟 Ollama (11434) / LM Studio (1234) 完全不撞、可同時跑。實務配置：</p>
<table>
  <thead>
      <tr>
          <th>服務</th>
          <th>Port</th>
          <th>用途</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ollama</td>
          <td>11434</td>
          <td>寫 code、對話</td>
      </tr>
      <tr>
          <td>ComfyUI</td>
          <td>8188</td>
          <td>產圖</td>
      </tr>
      <tr>
          <td>LM Studio</td>
          <td>1234</td>
          <td>探索新 LLM</td>
      </tr>
      <tr>
          <td>Open WebUI</td>
          <td>3000</td>
          <td>ChatGPT 風格瀏覽器介面</td>
      </tr>
  </tbody>
</table>
<p>各服務獨立、不干擾、可以一台 Mac 跑全部（看記憶體預算）。</p>
<h2 id="何時這篇會過時">何時這篇會過時</h2>
<ul>
<li>ComfyUI 主分支 API 短期內穩定（大量社群依賴）。</li>
<li>SDXL base 1.0 不會消失、但會被新版本（SDXL 1.1、Flux 等）取代——「下載 .safetensors 放 models/checkpoints/」流程不變。</li>
<li>MPS backend 持續優化、效能會提升、但介面不變。</li>
<li>Python 版本相容性會持續演化、<code>pip install -r requirements.txt</code> 偶爾要降版 Python。</li>
</ul>
<p>讀的時候若 pip install 失敗、看 ComfyUI GitHub issues 跟 PyTorch release notes 對應的 Python 版本。</p>
<p>跟其他 hands-on 章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、跨服務的 lifecycle / 記憶體管理見 <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a>、ComfyUI 跟 Ollama 同台跑的記憶體預算規劃見 <a href="/blog/llm/00-foundations/hardware-memory-budget/" data-link-title="0.5 Apple Silicon 記憶體預算" data-link-desc="記憶體決定能跑什麼，Q4 量化下的可運作模型對照與系統保留">0.5 Apple Silicon 記憶體預算</a>。</p>
]]></content:encoded></item><item><title>Case Study：Blog 語意搜尋從 pickle 到 production</title><link>https://tarrragon.github.io/blog/llm/04-applications/hands-on/blog-vector-search/</link><pubDate>Wed, 01 Jul 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/04-applications/hands-on/blog-vector-search/</guid><description>&lt;p>本案例記錄一個技術 blog（2,738 篇 markdown、24,216 chunks）的語意搜尋工具從 demo 到 production 的完整過程。每段標出對應 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/vector-storage-engineering/" data-link-title="4.22 RAG storage 工程：從 pickle 到 vector database 的選型判讀" data-link-desc="RAG storage backend 選型：規模到哪個階段該從 in-memory 升級到 vector DB、dependency chain 如何收窄選項">4.22 RAG storage 工程&lt;/a> 的哪個判讀步驟，讓讀者看到原理章的框架怎麼落到具體決策。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>實測日期&lt;/strong>：2026-07-01
&lt;strong>環境&lt;/strong>：macOS Apple Silicon、Ollama 0.7.x、&lt;code>nomic-embed-text&lt;/code>（768 維）
&lt;strong>Corpus&lt;/strong>：&lt;code>content/&lt;/code> 全量 2,738 個 markdown 檔、24,216 chunks
&lt;strong>前置 demo&lt;/strong>：&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">rag-demo&lt;/a>（pickle、463 chunks）&lt;/p>&lt;/blockquote>
&lt;h3 id="讀法建議">讀法建議&lt;/h3>
&lt;p>本案例用 Go 重寫了 RAG storage 層，Go 實作細節佔不少篇幅。依你的背景選讀法：&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Python 開發者、想選自己專案的 storage 方案&lt;/strong>：先跳到「通用可複製流程」（語言無關的五步驟）→「四方案 benchmark」→「二次選型評估」（結論/理由/前提三層框架），這三段跨語言可遷移。Go 實作段（架構、效能優化）可 skim。&lt;/li>
&lt;li>&lt;strong>Go 開發者、想做類似工具&lt;/strong>：從頭讀，每段都跟你相關。&lt;/li>
&lt;li>&lt;strong>只想看選型框架、不管實作&lt;/strong>：直接跳「二次選型評估」。&lt;/li>
&lt;/ul>
&lt;h2 id="從-demo-到-production-的重寫動機">從 demo 到 production 的重寫動機&lt;/h2>
&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">rag-demo&lt;/a> 用 Python pickle 跑通了 RAG 概念驗證：71 篇 → 463 chunks → pickle 儲存 → cosine retrieval → Ollama 生成。概念層完全正確（4.1 的 retrieval + augmentation 骨架），但作為這個 blog 的日常工具有三個&lt;strong>專案特有的&lt;/strong>限制：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>工具鏈語言不同&lt;/strong>：blog 的核心工具是 Go（lint / fmt / cards），加 Python dependency 讓其他維護者 clone 後多一步環境設定。Python 專案不會有這個問題 — pickle 綁 Python 對 Python 專案是優點而非缺點。&lt;/li>
&lt;li>&lt;strong>只索引部分 corpus&lt;/strong>：rag-demo 只跑 &lt;code>content/llm/&lt;/code>（71 篇），blog 全量有 2,738 篇、24 個 section。&lt;/li>
&lt;li>&lt;strong>Demo 定位&lt;/strong>：ingest.py / query.py 是教學程式碼，不是維護工具（沒有 status、沒有 section filter）。&lt;/li>
&lt;/ol>
&lt;p>這是一次&lt;strong>完整重寫&lt;/strong>、不是漸進升級 — rag-demo 的 Python 程式碼不會被修改或遷移，而是用 Go 重新實作相同的 RAG pipeline（chunk → embed → store → search）、保留相同的概念架構。rag-demo 作為教學 demo 繼續存在。&lt;/p>
&lt;p>升級目標：一個跟 &lt;code>mdtools&lt;/code> 同級的 Go CLI 工具，能對全量 content 做語意搜尋，其他維護者 clone 後 &lt;code>go build&lt;/code> 即可用。完整原始碼在 &lt;code>scripts/blogsearch/&lt;/code>。&lt;/p></description><content:encoded><![CDATA[<p>本案例記錄一個技術 blog（2,738 篇 markdown、24,216 chunks）的語意搜尋工具從 demo 到 production 的完整過程。每段標出對應 <a href="/blog/llm/04-applications/vector-storage-engineering/" data-link-title="4.22 RAG storage 工程：從 pickle 到 vector database 的選型判讀" data-link-desc="RAG storage backend 選型：規模到哪個階段該從 in-memory 升級到 vector DB、dependency chain 如何收窄選項">4.22 RAG storage 工程</a> 的哪個判讀步驟，讓讀者看到原理章的框架怎麼落到具體決策。</p>
<blockquote>
<p><strong>實測日期</strong>：2026-07-01
<strong>環境</strong>：macOS Apple Silicon、Ollama 0.7.x、<code>nomic-embed-text</code>（768 維）
<strong>Corpus</strong>：<code>content/</code> 全量 2,738 個 markdown 檔、24,216 chunks
<strong>前置 demo</strong>：<a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">rag-demo</a>（pickle、463 chunks）</p></blockquote>
<h3 id="讀法建議">讀法建議</h3>
<p>本案例用 Go 重寫了 RAG storage 層，Go 實作細節佔不少篇幅。依你的背景選讀法：</p>
<ul>
<li><strong>Python 開發者、想選自己專案的 storage 方案</strong>：先跳到「通用可複製流程」（語言無關的五步驟）→「四方案 benchmark」→「二次選型評估」（結論/理由/前提三層框架），這三段跨語言可遷移。Go 實作段（架構、效能優化）可 skim。</li>
<li><strong>Go 開發者、想做類似工具</strong>：從頭讀，每段都跟你相關。</li>
<li><strong>只想看選型框架、不管實作</strong>：直接跳「二次選型評估」。</li>
</ul>
<h2 id="從-demo-到-production-的重寫動機">從 demo 到 production 的重寫動機</h2>
<p><a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">rag-demo</a> 用 Python pickle 跑通了 RAG 概念驗證：71 篇 → 463 chunks → pickle 儲存 → cosine retrieval → Ollama 生成。概念層完全正確（4.1 的 retrieval + augmentation 骨架），但作為這個 blog 的日常工具有三個<strong>專案特有的</strong>限制：</p>
<ol>
<li><strong>工具鏈語言不同</strong>：blog 的核心工具是 Go（lint / fmt / cards），加 Python dependency 讓其他維護者 clone 後多一步環境設定。Python 專案不會有這個問題 — pickle 綁 Python 對 Python 專案是優點而非缺點。</li>
<li><strong>只索引部分 corpus</strong>：rag-demo 只跑 <code>content/llm/</code>（71 篇），blog 全量有 2,738 篇、24 個 section。</li>
<li><strong>Demo 定位</strong>：ingest.py / query.py 是教學程式碼，不是維護工具（沒有 status、沒有 section filter）。</li>
</ol>
<p>這是一次<strong>完整重寫</strong>、不是漸進升級 — rag-demo 的 Python 程式碼不會被修改或遷移，而是用 Go 重新實作相同的 RAG pipeline（chunk → embed → store → search）、保留相同的概念架構。rag-demo 作為教學 demo 繼續存在。</p>
<p>升級目標：一個跟 <code>mdtools</code> 同級的 Go CLI 工具，能對全量 content 做語意搜尋，其他維護者 clone 後 <code>go build</code> 即可用。完整原始碼在 <code>scripts/blogsearch/</code>。</p>
<h2 id="選型過程對應-422-演化階梯--工程約束">選型過程（對應 4.22 演化階梯 + 工程約束）</h2>
<h3 id="第一軸規模判讀">第一軸：規模判讀</h3>
<p>全量 content 產生 24,216 chunks（原本估計 ~1,500）。按 4.22 判讀樹，24K 落在「10K-100K → HNSW 或 brute-force」區間。預估 vs 實際的 16 倍落差揭露一個教訓：<strong>估計 chunk 數不能用篇數乘以常數</strong>，要看每篇的實際長度跟 chunking 策略。</p>
<h3 id="第二軸工程約束本專案特有">第二軸：工程約束（本專案特有）</h3>
<p>以下四個 constraint 反映<strong>這個 blog 專案的偏好</strong>、不是通用判準。換一組 constraint 會篩出完全不同的方案 — Python 專案不會有「Go 單 binary」constraint、已有 Docker 的團隊不會排斥外部 server。讀者套用時應先列出自己專案的 constraint、不是照搬這張表。</p>
<table>
  <thead>
      <tr>
          <th>Constraint</th>
          <th>砍掉什麼</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Go 單 binary</td>
          <td>Python-only 方案（pickle / FAISS）</td>
      </tr>
      <tr>
          <td>不要 CGo</td>
          <td>sqlite-vec（需要 <code>mattn/go-sqlite3</code>）</td>
      </tr>
      <tr>
          <td>不要外部 server</td>
          <td>Qdrant / Weaviate / Pinecone</td>
      </tr>
      <tr>
          <td>Ollama 原生</td>
          <td>OpenAI / Cohere embedding（多一個 API key）</td>
      </tr>
  </tbody>
</table>
<p>剩餘選項：<strong>Go + flat file + brute-force</strong>。</p>
<h3 id="第三軸延遲容忍">第三軸：延遲容忍</h3>
<p>CLI 工具、每天用幾次、不是 API server。&lt; 500ms 可接受。</p>
<p>結論：選階段二（flat file），brute-force cosine。</p>
<h2 id="實作架構">實作架構</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">scripts/blogsearch/
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">├── main.go                     # CLI: ingest / query / status
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">├── cmd/
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">│   ├── ingest.go               # walk content/ → chunk → embed → store
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">│   ├── query.go                # load → embed query → cosine top-K → lazy load text
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">│   └── status.go               # index stats
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">└── internal/
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    ├── chunk/chunk.go           # paragraph-aware markdown chunking
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    ├── embed/embed.go           # Ollama HTTP API wrapper
</span></span><span class="line"><span class="ln">10</span><span class="cl">    ├── search/search.go         # brute-force cosine similarity
</span></span><span class="line"><span class="ln">11</span><span class="cl">    └── store/store.go           # 三檔案 binary store</span></span></code></pre></div><h3 id="日常使用">日常使用</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 語意搜尋</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">./bin/blogsearch query <span class="s2">&#34;retry 策略&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 只搜特定 section</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">./bin/blogsearch query -section backend <span class="s2">&#34;connection pool 設定&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># 查 index 狀態</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">./bin/blogsearch status</span></span></code></pre></div><h3 id="storage-格式三檔案分離">Storage 格式（三檔案分離）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">.blogsearch/
</span></span><span class="line"><span class="ln">2</span><span class="cl">├── vectors.bin    # float32 binary（70.9 MB）— bulk read + unsafe.Slice 零拷貝
</span></span><span class="line"><span class="ln">3</span><span class="cl">├── meta.json      # compact metadata 不含 text（7.3 MB）
</span></span><span class="line"><span class="ln">4</span><span class="cl">└── texts.bin      # length-prefixed chunk text（19.2 MB）— top-K 才 lazy load</span></span></code></pre></div><p>分離 text 的設計理由：query 時只需要 vectors + metadata 做 cosine search（78 MB），top-K 結果才從 texts.bin 按 offset 讀取 5 筆 text。省掉 19 MB 的 JSON 解析。</p>
<h2 id="效能優化歷程">效能優化歷程</h2>
<h3 id="初版95-秒">初版：9.5 秒</h3>
<p>初版用逐 4-byte Read 載入 vectors.bin（17.5M 次 <code>f.Read(buf)</code>），加上 27MB 的 index.json（含所有 chunk text）一次 JSON 解析。</p>
<h3 id="優化版034-秒28x">優化版：0.34 秒（28x）</h3>
<p>三項改動：</p>
<table>
  <thead>
      <tr>
          <th>改動</th>
          <th>從</th>
          <th>到</th>
          <th>效果</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>vectors.bin 讀法</td>
          <td>逐 4-byte Read</td>
          <td><code>os.ReadFile</code> + <code>unsafe.Slice</code></td>
          <td>I/O call 17.5M → 1</td>
      </tr>
      <tr>
          <td>metadata 格式</td>
          <td>含 text（27 MB）</td>
          <td>不含 text（7.3 MB）</td>
          <td>JSON parse 快 4x</td>
      </tr>
      <tr>
          <td>text 載入</td>
          <td>全量</td>
          <td>top-K lazy load（只讀 5 筆）</td>
          <td>省 19 MB 讀取</td>
      </tr>
  </tbody>
</table>
<p>瓶頸分析：0.34 秒裡、embedding API call（Ollama）約 77ms、file I/O + JSON parse 約 200ms、cosine 計算約 50ms。cosine 計算只佔 15%。</p>
<h2 id="通用可複製流程抽掉-goblog">通用可複製流程（抽掉 Go/blog）</h2>
<p>本案例的 Go 實作細節（<code>unsafe.Slice</code>、<code>os.ReadFile</code>）是語言特定的、但背後的流程步驟跨語言通用：</p>
<ol>
<li><strong>Walk corpus</strong>：遞迴掃描目標目錄的所有文件（markdown / code / 任意文字）</li>
<li><strong>Chunk</strong>：段落感知分割、soft token cap、保留語意邊界（原理見 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 Chunking</a>）</li>
<li><strong>Embed</strong>：對每個 chunk 呼叫 embedding API（本地 Ollama 或 cloud API），得到固定維度向量</li>
<li><strong>Store</strong>：向量 + metadata + text 分離存檔（binary vectors / compact JSON / lazy-load text）</li>
<li><strong>Search</strong>：embed query → brute-force cosine → top-K → lazy load text for display</li>
</ol>
<p>Python 實作同流程只是把第 4 步的 binary 檔換成 pickle / FAISS index / SQLite DB、第 5 步的 cosine 換成 numpy / FAISS / sqlite-vec query。Node.js / Rust 同理。</p>
<p>關鍵優化原則也跨語言：「分離向量與文字、query 時只載入向量、top-K 才載入文字」讓 I/O 量從 ~98MB 降到 ~78MB、JSON parse 從 27MB 降到 7MB。這個原則用什麼語言實作都有效。</p>
<h2 id="四方案同-corpus-benchmark">四方案同 corpus Benchmark</h2>
<p>用同一個 corpus（24,216 chunks、768 維、nomic-embed-text）比較四種 storage 方案。Benchmark 腳本在 <code>scripts/blogsearch-bench/bench.py</code>。</p>
<h3 id="前置依賴">前置依賴</h3>
<p>Benchmark 腳本讀 Go 工具產生的 index（<code>.blogsearch/</code> 下的 <code>vectors.bin</code> + <code>meta.json</code>）。完整指令鏈：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> scripts/blogsearch <span class="o">&amp;&amp;</span> go build -o ../../bin/blogsearch .   <span class="c1"># build Go 工具</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ollama serve <span class="p">&amp;</span>                                                  <span class="c1"># 啟動 Ollama</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">ollama pull nomic-embed-text                                    <span class="c1"># pull embedding model</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">./bin/blogsearch ingest -content content -out .blogsearch       <span class="c1"># 建 index（~4 分鐘）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">uv run --with sqlite-vec --with faiss-cpu --with numpy <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  scripts/blogsearch-bench/bench.py --index .blogsearch         <span class="c1"># 跑 benchmark</span></span></span></code></pre></div><p>若無 Go 環境，可用自己的 Python embedding 腳本產生相同格式的 <code>vectors.bin</code>（little-endian float32、n × dim 連續排列）+ <code>meta.json</code>（<code>{&quot;dim&quot;: 768, &quot;count&quot;: n, &quot;metas&quot;: [...]}</code>），benchmark 腳本只讀這兩個檔案、不依賴 Go binary 本身。Corpus 格式無硬性要求，任何目錄下的 <code>.md</code> 檔案都可索引。</p>
<h3 id="方法論">方法論</h3>
<ul>
<li><strong>Embedding</strong>：四方案共用同一組 embedding（從 Go index 載入），排除 embedding model 差異</li>
<li><strong>Query</strong>：同一句 query（&ldquo;RAG storage 選型&rdquo;），跑 5 次取 median</li>
<li><strong>Ingest 時間</strong>：只計 storage 操作（不含 embedding），Go 方案含 embedding 不可分離故標 —</li>
<li><strong>環境</strong>：macOS Apple Silicon、Python 3.12、Go 1.25</li>
</ul>
<h3 id="結果">結果</h3>
<table>
  <thead>
      <tr>
          <th>方案</th>
          <th>Ingest（純 storage）</th>
          <th>Query（median）</th>
          <th>Index 大小</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Go + flat file</td>
          <td>—</td>
          <td>151ms</td>
          <td>97.4 MB</td>
      </tr>
      <tr>
          <td>Python sqlite-vec</td>
          <td>2.9s</td>
          <td>19ms</td>
          <td>75.3 MB</td>
      </tr>
      <tr>
          <td>Python FAISS flat</td>
          <td>40ms</td>
          <td>1.8ms</td>
          <td>in-memory</td>
      </tr>
      <tr>
          <td>Python FAISS HNSW</td>
          <td>23.3s</td>
          <td>0.5ms</td>
          <td>in-memory</td>
      </tr>
  </tbody>
</table>
<h3 id="三個關鍵發現">三個關鍵發現</h3>
<p><strong>延遲瓶頸在 I/O 和實作、不在演算法</strong>。Go flat file 的 151ms 裡、cosine 計算約 50ms、file I/O 約 100ms。FAISS flat 用 numpy BLAS 做同樣的 brute-force cosine、純計算 1.8ms — 計算層差約 28 倍（Go pure loop vs BLAS 向量化指令），加上 I/O 後端到端差 84 倍。</p>
<p><strong>HNSW 的 query 加速在此規模 ROI 低</strong>。FAISS HNSW query 0.5ms vs flat 1.8ms、每次省 1.3ms。但 HNSW build 要 23.3s。每天查 100 次、要 179 天才回本 build 成本（23.3s ÷ 0.13s/天）。4.22 的判讀結論（「此規模 brute-force 夠用」）被數據驗證。</p>
<p><strong>sqlite-vec 的 19ms 是「DB overhead 換功能」</strong>。比 FAISS flat 慢 10 倍、但多了 SQL metadata filter、transaction 保護、disk persistence。對「需要 filter 但不想維運 server」的場景有意義。</p>
<h3 id="讀數據的注意事項">讀數據的注意事項</h3>
<ul>
<li>Go 151ms 含 file I/O（每次 query 重載 78MB）；如果做 daemon mode（常駐、載入一次），query 會降到 ~50ms（純 cosine + overhead）</li>
<li>FAISS 數字是 in-memory baseline（index 已載入），不含 index 檔案的載入時間</li>
<li>sqlite-vec 數字含 disk I/O（每次 query 從 SQLite 讀取），是 persistent storage 的真實代價</li>
<li>四方案都不含 Ollama embedding call 時間（~77ms），實際端到端延遲要加上</li>
</ul>
<h2 id="二次選型評估同結論理由鏈翻轉">二次選型評估：同結論、理由鏈翻轉</h2>
<p>Benchmark 數據出來後，80 倍效能差距讓原始選型（Go + flat file）受到質疑：「是否該換 Python + FAISS 或 sqlite-vec？」重新用 WRAP 框架評估，結論相同（維持 Go），但理由鏈完全不同。</p>
<h3 id="第一次選型的理由事前">第一次選型的理由（事前）</h3>
<p>「Go 工具鏈統一（mdtools 是 Go）+ 單 binary 分發（clone 後 <code>go build</code> 即可）。」</p>
<h3 id="實測推翻的前提">實測推翻的前提</h3>
<table>
  <thead>
      <tr>
          <th>原始假設</th>
          <th>實測</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Corpus ~1,500 chunks</td>
          <td>24,216 chunks（16 倍）</td>
      </tr>
      <tr>
          <td>Brute-force &lt; 10ms</td>
          <td>Go 151ms（I/O 瓶頸、不是計算）</td>
      </tr>
      <tr>
          <td>語言效能差異不大</td>
          <td>Go pure cosine vs numpy BLAS 差 80 倍</td>
      </tr>
      <tr>
          <td>「工具鏈統一」很重要</td>
          <td>mdtools（pre-commit、延遲敏感）跟 blogsearch（手動 CLI、每天幾次）使用模式不同，強制統一語言是用「同一棟建築」邏輯要求「不同用途房間用同一種建材」</td>
      </tr>
  </tbody>
</table>
<p>第一次的理由鏈幾乎全數被推翻。如果只看理由，應該換方案。</p>
<h3 id="第二次選型的理由事後">第二次選型的理由（事後）</h3>
<p>重新評估時加入三個第一次沒有的變數：</p>
<p><strong>端到端延遲 vs in-memory benchmark</strong>。84 倍是端到端的數字（Go 151ms 含 I/O vs FAISS 1.8ms in-memory）。但 FAISS 從 disk 載入 index 也要 ~100-200ms，端到端差距縮小到 2 倍。sqlite-vec 是唯一不需要全量載入的方案（disk-based HNSW、端到端 19ms），差距從「84 倍」變成「8 倍」。</p>
<p><strong>使用頻率決定 ROI</strong>。CLI 工具、每天 ~10 次手動 query。每次省 130ms（151 vs 19），一天省 1.3 秒。重寫投入 2-3 小時，回本時間 ≈ 19 年。注意這個計算對頻率極敏感：每天 100 次（如被整合進 MCP server 當 agent 工具）回本縮短到 1.9 年、每天 1000 次則 69 天。上方 HNSW ROI 也用每天 100 次計算 — 兩處頻率假設不同是因為比較對象不同（HNSW build 成本 vs 語言重寫成本），但讀者套到自己場景時應先確定自己的查詢頻率。</p>
<p><strong>Ingest 瓶頸在 Ollama API、跟語言無關</strong>。~4 分鐘的 ingest 裡、embedding API call 佔 95% 以上。換 Python 不會改善 ingest 速度。</p>
<h3 id="維持的理由是痛點不存在">維持的理由是「痛點不存在」</h3>
<p>維持 Go 的理由是<strong>改善的絕對收益太小、投入回不了本</strong> — 151ms 對 CLI 使用模式不構成痛點，與「Go 好」或「工具鏈統一」無關。</p>
<h3 id="這個翻轉的教學意義">這個翻轉的教學意義</h3>
<p>正確的結論配錯誤的理由是脆弱的。第一次 WRAP 的結論（選 Go）在當時是對的，但理由鏈（工具鏈統一、&lt; 10ms）被實測推翻後，如果不重新建立正確的理由鏈，下次環境變動（比如 blogsearch 從 CLI 變成 API server）就會用已失效的理由做出錯誤判斷。</p>
<p>判讀工具選型時，要區分三層：</p>
<ol>
<li><strong>結論</strong>：選什麼方案</li>
<li><strong>理由</strong>：為什麼選（可能被推翻）</li>
<li><strong>前提</strong>：理由依賴的假設（規模、使用模式、效能數字）</li>
</ol>
<p>前提變了、理由就要重建，即使結論沒變。寫進決策紀錄時，三層都要記 — 只記結論的話，下次重新評估時沒有判讀基礎。</p>
<p>區分「正當理由重建」跟「動機性推理」（先有結論再找理由）的判準：新理由是否在看到數據之前也能成立？本例的「130ms 對 CLI 不痛」在實測前也成立（CLI 使用模式本來就低頻），所以是正當重建。如果新理由只能在看到特定數字之後才講得通（如「151ms 剛好在 200ms 閾值內」——但閾值是事後設的），就是 post-hoc rationalization。</p>
<h3 id="觸發換方案的訊號">觸發換方案的訊號</h3>
<table>
  <thead>
      <tr>
          <th>訊號</th>
          <th>門檻</th>
          <th>動作</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Query 延遲不可接受</td>
          <td>&gt; 500ms</td>
          <td>先加 mmap（最小改動）</td>
      </tr>
      <tr>
          <td>使用模式改變</td>
          <td>從 CLI 變 API server</td>
          <td>換 Python sqlite-vec</td>
      </tr>
      <tr>
          <td>查詢頻率跳增</td>
          <td>被整合進 MCP server / agent 工具</td>
          <td>評估 daemon mode 或換 sqlite-vec</td>
      </tr>
      <tr>
          <td>Corpus 規模跳增</td>
          <td>&gt; 50K chunks</td>
          <td>重跑 benchmark</td>
      </tr>
      <tr>
          <td>需要原生 metadata filter</td>
          <td>code filter 維護成本過高</td>
          <td>換 Python sqlite-vec</td>
      </tr>
  </tbody>
</table>
<h2 id="embedding-model-選型對應-412-constraint-優先序">Embedding model 選型（對應 4.12 constraint 優先序）</h2>
<p>選 <code>nomic-embed-text</code> 的理由鏈：</p>
<ol>
<li><strong>Ollama 原生支援</strong>：<code>ollama pull</code> 一行、不需要額外 Python library 或 API key</li>
<li><strong>體積小</strong>：274 MB、跟 chat model 共用記憶體不打架</li>
<li><strong>已有驗證基線</strong>：rag-demo 用同一個模型跑過 463 chunks、retrieval 命中率確認可用</li>
<li><strong>768 維 sweet spot</strong>：24K chunks × 768 dim × 4 bytes = 70.9 MB，brute-force 可行</li>
</ol>
<p>未來如果 CJK retrieval 品質不夠（目前可用但未做系統性評估），<code>multilingual-e5-large</code> 或 <code>bge-m3</code> 是備選。換模型只需改 <code>embed.go</code> 的 Model 變數 + 重新 <code>blogsearch ingest</code>（4.22 的「四層可替換」設計）。</p>
<h2 id="cjk-混合-chunking-觀察">CJK 混合 Chunking 觀察</h2>
<p>Blog 內容是繁體中文 + 英文術語混合。Chunking 策略沿用 rag-demo 的 paragraph-aware split（空白行切段、soft token cap 400）。</p>
<p>Token 估算用 <code>len(s) / 2</code> 的 heuristic（CJK 字元多算一次）。不精確但 chunking 只需要粗略估算。跟 tokenizer 精確計算的差異在 ±20%、對 chunking 品質影響小於 chunk 邊界選擇的影響。</p>
<p>實際觀察：24,216 chunks 的 retrieval 品質在語意搜尋場景（「哪些文章跟 retry 有關」「RAG storage 選型」）表現良好。keyword 精確搜尋場景（「找 RFC 7807」）表現較弱 — 這是 embedding-only retrieval 的已知限制（見 4.1 的語意 vs 字面相似度對比），未來可加 BM25 做 hybrid search。</p>
<h2 id="跟其他章節的對應">跟其他章節的對應</h2>
<table>
  <thead>
      <tr>
          <th>本案例的段落</th>
          <th>對應原理章節</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>選型過程</td>
          <td><a href="/blog/llm/04-applications/vector-storage-engineering/" data-link-title="4.22 RAG storage 工程：從 pickle 到 vector database 的選型判讀" data-link-desc="RAG storage backend 選型：規模到哪個階段該從 in-memory 升級到 vector DB、dependency chain 如何收窄選項">4.22 演化階梯 + 工程約束</a></td>
      </tr>
      <tr>
          <td>二次選型評估</td>
          <td><a href="/blog/llm/04-applications/vector-storage-engineering/" data-link-title="4.22 RAG storage 工程：從 pickle 到 vector database 的選型判讀" data-link-desc="RAG storage backend 選型：規模到哪個階段該從 in-memory 升級到 vector DB、dependency chain 如何收窄選項">4.22 同 corpus 實測比較</a></td>
      </tr>
      <tr>
          <td>Embedding 選型</td>
          <td><a href="/blog/llm/04-applications/embedding-model-internals/" data-link-title="4.12 Embedding model 內部：訓練、選型、in-domain fine-tune" data-link-desc="Embedding model 怎麼訓練（contrastive learning &#43; hard negative mining）、怎麼挑（MTEB / 大小 / domain）、何時該自己 fine-tune">4.12 實務選型 constraint 優先序</a></td>
      </tr>
      <tr>
          <td>Chunking</td>
          <td><a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 Chunking 策略對比</a></td>
      </tr>
      <tr>
          <td>Benchmark 方法論</td>
          <td><a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14 Benchmarking 方法論</a></td>
      </tr>
      <tr>
          <td>Storage 格式設計</td>
          <td><a href="/blog/llm/04-applications/artifact-management/" data-link-title="4.10 衍生產物管理原理：什麼進 git、什麼不該" data-link-desc="LLM 應用的 source / derived / external 三類產物對應 git / build cache / registry、與 production 部署的 reproducibility / cost / share 取捨">4.10 衍生產物管理</a></td>
      </tr>
      <tr>
          <td>Retrieval 品質</td>
          <td><a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 Retrieval 失敗根因</a></td>
      </tr>
  </tbody>
</table>
]]></content:encoded></item><item><title>Hands-on：安裝 whisper.cpp 做語音轉文字</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/whisper-setup/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/whisper-setup/</guid><description>&lt;p>本篇紀錄在 Apple Silicon Mac 上裝 &lt;code>whisper.cpp&lt;/code> 並驗證英文語音轉文字。選 whisper.cpp 而非 &lt;code>openai-whisper&lt;/code>（Python 版）的理由：&lt;/p>
&lt;ul>
&lt;li>純 C++ 實作、Metal backend 直接吃 Apple Silicon GPU。&lt;/li>
&lt;li>Homebrew bottle、&lt;code>brew install&lt;/code> 一行裝完、不需要 Python 環境跟 torch wheel。&lt;/li>
&lt;li>Binary 名稱是 &lt;code>whisper-cli&lt;/code>、CLI-first、整合到 shell pipeline 容易。&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>whisper-cpp 版本&lt;/strong>：1.8.4
&lt;strong>示範模型&lt;/strong>：&lt;code>ggml-tiny.en.bin&lt;/code>（78 MB、英文專用、最小可用）
&lt;strong>實測&lt;/strong>：7 秒音訊 484ms 轉錄、用 Metal GPU 加速&lt;/p>&lt;/blockquote>
&lt;h2 id="前置設定">前置設定&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>項目&lt;/th>
 &lt;th>檢查指令&lt;/th>
 &lt;th>預期&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Homebrew&lt;/td>
 &lt;td>&lt;code>brew --version&lt;/code>&lt;/td>
 &lt;td>4.x&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>ffmpeg&lt;/td>
 &lt;td>&lt;code>which ffmpeg&lt;/code>&lt;/td>
 &lt;td>&lt;code>/opt/homebrew/bin/ffmpeg&lt;/code>（沒有：&lt;code>brew install ffmpeg&lt;/code>）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>磁碟空間&lt;/td>
 &lt;td>&lt;code>df -h ~&lt;/code>&lt;/td>
 &lt;td>至少 200 MB（whisper-cli + 1 個 small model）&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;code>ffmpeg&lt;/code> 是必要的——whisper-cli 接受多種音訊格式、但實際內部會先轉成 16kHz mono WAV、ffmpeg 是這個轉換的依賴。&lt;/p>
&lt;h2 id="安裝-whisper-cpp">安裝 whisper-cpp&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">brew install whisper-cpp&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Homebrew 會裝：&lt;/p>
&lt;ul>
&lt;li>&lt;code>whisper-cli&lt;/code> binary 到 &lt;code>/opt/homebrew/bin/&lt;/code>&lt;/li>
&lt;li>&lt;code>ggml&lt;/code> 共用 lib 到 &lt;code>/opt/homebrew/Cellar/ggml/&lt;/code>&lt;/li>
&lt;li>BLAS / Metal backend 自動配對 Apple Silicon&lt;/li>
&lt;/ul>
&lt;p>驗證 binary 可用：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">which whisper-cli
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1"># /opt/homebrew/bin/whisper-cli&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">whisper-cli --help 2&amp;gt;&lt;span class="p">&amp;amp;&lt;/span>&lt;span class="m">1&lt;/span> &lt;span class="p">|&lt;/span> head -5&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>第一次跑會看到 Metal 初始化訊息：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">ggml_metal_library_init: using embedded metal library
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">ggml_metal_library_init: loaded in 6.883 sec&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>第一次 Metal lib 載入慢（~7 秒）、後續會 cache、變很快。&lt;/p>
&lt;h2 id="下載-model">下載 Model&lt;/h2>
&lt;p>whisper-cpp 跟 OpenAI 原版分離管理 model file、要自己下載 GGML 格式：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p ~/.whisper-models
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ~/.whisper-models
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">curl -L -o ggml-tiny.en.bin &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> &lt;span class="s2">&amp;#34;https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>可用 model 比較（大小越大、品質越好、速度越慢）：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Model&lt;/th>
 &lt;th>大小&lt;/th>
 &lt;th>適合場景&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>ggml-tiny.en.bin&lt;/code>&lt;/td>
 &lt;td>78 MB&lt;/td>
 &lt;td>英文、最小驗證、品質可接受&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-base.en.bin&lt;/code>&lt;/td>
 &lt;td>148 MB&lt;/td>
 &lt;td>英文、常用入門&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-small.en.bin&lt;/code>&lt;/td>
 &lt;td>488 MB&lt;/td>
 &lt;td>英文、daily use 甜蜜點&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-medium.en.bin&lt;/code>&lt;/td>
 &lt;td>1.5 GB&lt;/td>
 &lt;td>英文、品質敏感&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-small.bin&lt;/code>&lt;/td>
 &lt;td>488 MB&lt;/td>
 &lt;td>多語言（含中文）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ggml-large-v3.bin&lt;/code>&lt;/td>
 &lt;td>3.1 GB&lt;/td>
 &lt;td>多語言、最佳品質、跑得最慢&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>選 &lt;code>tiny.en&lt;/code> 是因為&lt;strong>只驗證安裝路徑&lt;/strong>、實際日常用要 &lt;code>small.en&lt;/code> 起跳。&lt;/p>
&lt;p>驗證下載：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">ls -lh ~/.whisper-models/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1"># 應該看到 78 MB 的 ggml-tiny.en.bin&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="跑第一次轉錄">跑第一次轉錄&lt;/h2>
&lt;p>需要一段測試音訊。可以用 macOS 內建 &lt;code>say&lt;/code> 生成、再用 ffmpeg 轉成 whisper.cpp 需要的格式（16kHz mono WAV）：&lt;/p></description><content:encoded><![CDATA[<p>本篇紀錄在 Apple Silicon Mac 上裝 <code>whisper.cpp</code> 並驗證英文語音轉文字。選 whisper.cpp 而非 <code>openai-whisper</code>（Python 版）的理由：</p>
<ul>
<li>純 C++ 實作、Metal backend 直接吃 Apple Silicon GPU。</li>
<li>Homebrew bottle、<code>brew install</code> 一行裝完、不需要 Python 環境跟 torch wheel。</li>
<li>Binary 名稱是 <code>whisper-cli</code>、CLI-first、整合到 shell pipeline 容易。</li>
</ul>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>whisper-cpp 版本</strong>：1.8.4
<strong>示範模型</strong>：<code>ggml-tiny.en.bin</code>（78 MB、英文專用、最小可用）
<strong>實測</strong>：7 秒音訊 484ms 轉錄、用 Metal GPU 加速</p></blockquote>
<h2 id="前置設定">前置設定</h2>
<table>
  <thead>
      <tr>
          <th>項目</th>
          <th>檢查指令</th>
          <th>預期</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Homebrew</td>
          <td><code>brew --version</code></td>
          <td>4.x</td>
      </tr>
      <tr>
          <td>ffmpeg</td>
          <td><code>which ffmpeg</code></td>
          <td><code>/opt/homebrew/bin/ffmpeg</code>（沒有：<code>brew install ffmpeg</code>）</td>
      </tr>
      <tr>
          <td>磁碟空間</td>
          <td><code>df -h ~</code></td>
          <td>至少 200 MB（whisper-cli + 1 個 small model）</td>
      </tr>
  </tbody>
</table>
<p><code>ffmpeg</code> 是必要的——whisper-cli 接受多種音訊格式、但實際內部會先轉成 16kHz mono WAV、ffmpeg 是這個轉換的依賴。</p>
<h2 id="安裝-whisper-cpp">安裝 whisper-cpp</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew install whisper-cpp</span></span></code></pre></div><p>Homebrew 會裝：</p>
<ul>
<li><code>whisper-cli</code> binary 到 <code>/opt/homebrew/bin/</code></li>
<li><code>ggml</code> 共用 lib 到 <code>/opt/homebrew/Cellar/ggml/</code></li>
<li>BLAS / Metal backend 自動配對 Apple Silicon</li>
</ul>
<p>驗證 binary 可用：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">which whisper-cli
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># /opt/homebrew/bin/whisper-cli</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl">whisper-cli --help 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">|</span> head -5</span></span></code></pre></div><p>第一次跑會看到 Metal 初始化訊息：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">ggml_metal_library_init: using embedded metal library
</span></span><span class="line"><span class="ln">2</span><span class="cl">ggml_metal_library_init: loaded in 6.883 sec</span></span></code></pre></div><p>第一次 Metal lib 載入慢（~7 秒）、後續會 cache、變很快。</p>
<h2 id="下載-model">下載 Model</h2>
<p>whisper-cpp 跟 OpenAI 原版分離管理 model file、要自己下載 GGML 格式：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p ~/.whisper-models
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/.whisper-models
</span></span><span class="line"><span class="ln">3</span><span class="cl">curl -L -o ggml-tiny.en.bin <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin&#34;</span></span></span></code></pre></div><p>可用 model 比較（大小越大、品質越好、速度越慢）：</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>大小</th>
          <th>適合場景</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ggml-tiny.en.bin</code></td>
          <td>78 MB</td>
          <td>英文、最小驗證、品質可接受</td>
      </tr>
      <tr>
          <td><code>ggml-base.en.bin</code></td>
          <td>148 MB</td>
          <td>英文、常用入門</td>
      </tr>
      <tr>
          <td><code>ggml-small.en.bin</code></td>
          <td>488 MB</td>
          <td>英文、daily use 甜蜜點</td>
      </tr>
      <tr>
          <td><code>ggml-medium.en.bin</code></td>
          <td>1.5 GB</td>
          <td>英文、品質敏感</td>
      </tr>
      <tr>
          <td><code>ggml-small.bin</code></td>
          <td>488 MB</td>
          <td>多語言（含中文）</td>
      </tr>
      <tr>
          <td><code>ggml-large-v3.bin</code></td>
          <td>3.1 GB</td>
          <td>多語言、最佳品質、跑得最慢</td>
      </tr>
  </tbody>
</table>
<p>選 <code>tiny.en</code> 是因為<strong>只驗證安裝路徑</strong>、實際日常用要 <code>small.en</code> 起跳。</p>
<p>驗證下載：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh ~/.whisper-models/
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 應該看到 78 MB 的 ggml-tiny.en.bin</span></span></span></code></pre></div><h2 id="跑第一次轉錄">跑第一次轉錄</h2>
<p>需要一段測試音訊。可以用 macOS 內建 <code>say</code> 生成、再用 ffmpeg 轉成 whisper.cpp 需要的格式（16kHz mono WAV）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /tmp
</span></span><span class="line"><span class="ln">2</span><span class="cl">say -o sample.aiff -v Samantha <span class="s2">&#34;Hello world. This is a test of the whisper transcription system. It should produce accurate text from this short audio clip.&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">ffmpeg -loglevel error -y -i sample.aiff -ar <span class="m">16000</span> -ac <span class="m">1</span> sample.wav</span></span></code></pre></div><p><code>-ar 16000 -ac 1</code> 是 whisper.cpp 的標準輸入規格（16 kHz、單聲道、16-bit PCM）。Whisper 模型訓練時用這個 sample rate、輸入不符會降低準確度。</p>
<p>轉錄：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/sample.wav</span></span></code></pre></div><p>預期輸出（含時間軸）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[00:00:00.000 --&gt; 00:00:03.980]   Hello World, this is a test of the whisper transcription system.
</span></span><span class="line"><span class="ln">2</span><span class="cl">[00:00:03.980 --&gt; 00:00:06.980]   It should produce accurate text from this short audio clip.
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl">whisper_print_timings:     load time =    39.88 ms
</span></span><span class="line"><span class="ln">5</span><span class="cl">whisper_print_timings:   encode time =   220.01 ms
</span></span><span class="line"><span class="ln">6</span><span class="cl">whisper_print_timings:    total time =   484.08 ms</span></span></code></pre></div><p>關鍵觀察：</p>
<ul>
<li><strong>484ms</strong> 處理 7 秒音訊、約 14x 即時速度。</li>
<li>轉錄結果跟原文一致（除了 <code>world</code> 大寫變 <code>World</code>）。</li>
<li>含時間軸（time stamps）、可以做 subtitle / 字幕對齊。</li>
</ul>
<p>要拿不含時間軸的純文字：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/sample.wav -nt
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># -nt 是 --no-timestamps</span></span></span></code></pre></div><h2 id="常用選項">常用選項</h2>
<table>
  <thead>
      <tr>
          <th>選項</th>
          <th>作用</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>-l zh</code></td>
          <td>指定語言（中文）；多語言 model 用、單語 model 用不到</td>
      </tr>
      <tr>
          <td><code>-otxt</code></td>
          <td>同時輸出 .txt 檔（純文字、無時間軸）</td>
      </tr>
      <tr>
          <td><code>-osrt</code></td>
          <td>同時輸出 .srt 字幕檔</td>
      </tr>
      <tr>
          <td><code>-ovtt</code></td>
          <td>同時輸出 .vtt 字幕檔</td>
      </tr>
      <tr>
          <td><code>-of OUT</code></td>
          <td>設定輸出檔名 prefix</td>
      </tr>
      <tr>
          <td><code>-t N</code></td>
          <td>用 N 個 thread（預設用 CPU 核心數）</td>
      </tr>
      <tr>
          <td><code>-pp</code></td>
          <td>print progress（顯示處理進度條、跑長音訊時開）</td>
      </tr>
  </tbody>
</table>
<p>實務常用組合：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 字幕生成</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-small.en.bin <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  -f input.wav <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  -osrt <span class="se">\
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="se"></span>  -of output_subtitle
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 中文轉錄</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-small.bin <span class="se">\
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="se"></span>  -f speech.wav <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  -l zh</span></span></code></pre></div><h2 id="跟其他工具串接">跟其他工具串接</h2>
<p>Whisper-cli 的 stdout 是純文字、容易串 pipeline：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 轉錄結果直接餵給 LLM 摘要</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-small.en.bin -f meeting.wav -nt <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  <span class="p">|</span> curl -s http://localhost:11434/v1/chat/completions <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>    -H <span class="s2">&#34;Content-Type: application/json&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="se"></span>    -d @- <span class="s">&lt;&lt;EOF
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">{
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  &#34;model&#34;: &#34;gemma3:1b&#34;,
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  &#34;messages&#34;: [
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">    {&#34;role&#34;: &#34;system&#34;, &#34;content&#34;: &#34;Summarize the meeting transcript in 5 bullet points.&#34;},
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">    {&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;$(cat)&#34;}
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">  ]
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">EOF</span></span></span></code></pre></div><p>這個 pipeline 串接到 <a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama</a> 完成「語音 → 文字 → 摘要」流程、整條本地、無雲端 API。</p>
<h2 id="常見坑">常見坑</h2>
<h3 id="audio-file-not-found--format-error">「audio file not found / format error」</h3>
<p>確認 ffmpeg 已轉成 16kHz mono：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ffprobe input.wav 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">|</span> grep -E <span class="s2">&#34;Stream|Audio&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 應該看到：Audio: pcm_s16le, 16000 Hz, mono</span></span></span></code></pre></div><p>不是這個規格就用 ffmpeg 轉：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ffmpeg -i input.mp3 -ar <span class="m">16000</span> -ac <span class="m">1</span> -c:a pcm_s16le output.wav</span></span></code></pre></div><h3 id="model-載入慢">Model 載入慢</h3>
<p>第一次 Metal lib 初始化要 ~7 秒、是 macOS Metal compiler 在 cache shader。後續快很多。</p>
<p>如果每次都慢、看是否 Metal cache 路徑（<code>~/Library/Caches/...</code>）有權限問題。</p>
<h3 id="中文--多語言準確度差">中文 / 多語言準確度差</h3>
<p>確認 model 不是 <code>.en</code> 後綴：<code>.en</code> model 只訓練英文、餵中文會 hallucinate。中文要用 <code>ggml-small.bin</code>、<code>ggml-medium.bin</code>、<code>ggml-large-v3.bin</code>（沒 <code>.en</code>）。</p>
<h3 id="output-拼錯字">Output 拼錯字</h3>
<p>Whisper tiny / base model 對非母音清晰、噪音多、口音重的音訊準確度差。換 small 或 medium 通常解決。</p>
<h2 id="完整-round-trip-驗證">完整 round-trip 驗證</h2>
<p>驗證 Whisper + Piper TTS 完整迴圈：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Piper 生成 WAV</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;Hello world test.&#34;</span> <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/out.wav
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Whisper 轉回文字</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/out.wav -nt
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># 應該回：Hello world test.</span></span></span></code></pre></div><p>兩個都跑得起來表示整條 STT / TTS pipeline 工作。沒裝 Piper 的場景：用任何 16kHz 單聲道 WAV 都能驗證（macOS 內建 <code>say -o sample.aiff</code> + ffmpeg 轉檔、或從 Hugging Face 拉個 sample 音訊）、不一定要用 Piper。</p>
<p>跟其他章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、本地 LLM 加 speech 在隱私 / 資料流上的位置見 <a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流原理</a>、排錯走三層方法論見 <a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 排錯方法論</a>。</p>
<h2 id="何時這篇會過時">何時這篇會過時</h2>
<ul>
<li><code>brew install whisper-cpp</code> 安裝方式短期內不會變。</li>
<li>GGML model 路徑（Hugging Face <code>ggerganov/whisper.cpp</code>）穩定、是 maintainer 官方 repo。</li>
<li>模型版本會更新（large-v3 → large-v4 等）、但「下載 GGML、用 whisper-cli 餵 WAV」流程不變。</li>
<li>Metal backend 自動啟用、不需配置——Apple Silicon GPU 演化會持續增進效能但不影響介面。</li>
</ul>
<p>讀的時候若 brew 跑失敗、查 whisper.cpp GitHub release notes；模型新版本看 Hugging Face <code>ggerganov/whisper.cpp</code> repo 列表。</p>
]]></content:encoded></item><item><title>Hands-on：安裝 Piper TTS 做文字轉語音</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/</guid><description>&lt;p>本篇紀錄裝 Piper TTS 並用它合成英文語音、再用 Whisper 轉回文字做 round-trip 驗證。選 Piper 而非雲端 TTS（OpenAI / ElevenLabs）的理由：&lt;/p>
&lt;ul>
&lt;li>完全本地、隱私邊界乾淨。&lt;/li>
&lt;li>ONNX runtime、Apple Silicon 跑得動、不依賴 GPU。&lt;/li>
&lt;li>模型小（low quality ~17-65 MB、medium ~50 MB、high ~125 MB）、適合 minimal 驗證。&lt;/li>
&lt;li>CLI-first、stdin 餵文字、stdout 或檔案輸出 WAV、容易串 pipeline。&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>Piper 版本&lt;/strong>：透過 pip 安裝
&lt;strong>示範 voice&lt;/strong>：&lt;code>en_US-lessac-low.onnx&lt;/code>（63 MB、英文女聲、low quality）
&lt;strong>實測&lt;/strong>：4 秒文字合成 &amp;lt; 1 秒、品質夠日常用&lt;/p>&lt;/blockquote>
&lt;h2 id="前置設定">前置設定&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>項目&lt;/th>
 &lt;th>檢查指令&lt;/th>
 &lt;th>預期&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Python&lt;/td>
 &lt;td>&lt;code>python3 --version&lt;/code>&lt;/td>
 &lt;td>3.11+&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>pip&lt;/td>
 &lt;td>&lt;code>pip3 --version&lt;/code>&lt;/td>
 &lt;td>25+&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>磁碟空間&lt;/td>
 &lt;td>&lt;code>df -h ~&lt;/code>&lt;/td>
 &lt;td>至少 200 MB（Piper + 一個 voice）&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Piper 跟 Whisper 一樣分離 binary 跟 model：先裝 runtime、再下載 voice。&lt;/p>
&lt;h2 id="安裝-piper">安裝 Piper&lt;/h2>
&lt;p>&lt;code>piper-tts&lt;/code> 沒有 Homebrew formula、用 pip 裝：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">pip3 install piper-tts --break-system-packages&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>PEP 668&lt;/code> 是 macOS / Homebrew Python 的 external-management 機制、保護系統 Python 不被 pip 安裝污染；&lt;code>--break-system-packages&lt;/code> 是 bypass flag、跳過該檢查直接裝。比較乾淨的做法是用 venv：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">python3 -m venv ~/.piper-venv
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">source&lt;/span> ~/.piper-venv/bin/activate
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">pip install piper-tts&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>但裝完 PATH 要指到 venv 的 piper、稍麻煩。本 demo 用 &lt;code>--break-system-packages&lt;/code> 簡化。實際生產建議用 venv 或 pipx。&lt;/p>
&lt;p>驗證 binary 在 PATH：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">which piper
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1"># /opt/homebrew/bin/piper（若 pip3 來自 Homebrew Python）&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="c1"># 或 ~/Library/Python/3.x/bin/piper（若 pip3 來自系統 Python）&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">piper --help &lt;span class="p">|&lt;/span> head -10&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>which piper&lt;/code> 找不到時、檢查兩個 bin 目錄哪邊有檔案、把該目錄加進 &lt;code>PATH&lt;/code>。&lt;/p>
&lt;h2 id="下載-voice-model">下載 Voice Model&lt;/h2>
&lt;p>Piper 用 ONNX 格式的 voice model、每個 voice 是一對 &lt;code>.onnx&lt;/code>（model 權重）+ &lt;code>.onnx.json&lt;/code>（metadata、含採樣率、phoneme map）。&lt;/p>
&lt;p>從 Hugging Face &lt;code>rhasspy/piper-voices&lt;/code> repo 拉：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p ~/.piper-voices
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ~/.piper-voices
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="c1"># 英文女聲、low quality（小、快）&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">curl -L -o en_US-lessac-low.onnx &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> &lt;span class="s2">&amp;#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/low/en_US-lessac-low.onnx&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">curl -L -o en_US-lessac-low.onnx.json &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> &lt;span class="s2">&amp;#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/low/en_US-lessac-low.onnx.json&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>可用 voice quality 等級：&lt;/p></description><content:encoded><![CDATA[<p>本篇紀錄裝 Piper TTS 並用它合成英文語音、再用 Whisper 轉回文字做 round-trip 驗證。選 Piper 而非雲端 TTS（OpenAI / ElevenLabs）的理由：</p>
<ul>
<li>完全本地、隱私邊界乾淨。</li>
<li>ONNX runtime、Apple Silicon 跑得動、不依賴 GPU。</li>
<li>模型小（low quality ~17-65 MB、medium ~50 MB、high ~125 MB）、適合 minimal 驗證。</li>
<li>CLI-first、stdin 餵文字、stdout 或檔案輸出 WAV、容易串 pipeline。</li>
</ul>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>Piper 版本</strong>：透過 pip 安裝
<strong>示範 voice</strong>：<code>en_US-lessac-low.onnx</code>（63 MB、英文女聲、low quality）
<strong>實測</strong>：4 秒文字合成 &lt; 1 秒、品質夠日常用</p></blockquote>
<h2 id="前置設定">前置設定</h2>
<table>
  <thead>
      <tr>
          <th>項目</th>
          <th>檢查指令</th>
          <th>預期</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Python</td>
          <td><code>python3 --version</code></td>
          <td>3.11+</td>
      </tr>
      <tr>
          <td>pip</td>
          <td><code>pip3 --version</code></td>
          <td>25+</td>
      </tr>
      <tr>
          <td>磁碟空間</td>
          <td><code>df -h ~</code></td>
          <td>至少 200 MB（Piper + 一個 voice）</td>
      </tr>
  </tbody>
</table>
<p>Piper 跟 Whisper 一樣分離 binary 跟 model：先裝 runtime、再下載 voice。</p>
<h2 id="安裝-piper">安裝 Piper</h2>
<p><code>piper-tts</code> 沒有 Homebrew formula、用 pip 裝：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">pip3 install piper-tts --break-system-packages</span></span></code></pre></div><p><code>PEP 668</code> 是 macOS / Homebrew Python 的 external-management 機制、保護系統 Python 不被 pip 安裝污染；<code>--break-system-packages</code> 是 bypass flag、跳過該檢查直接裝。比較乾淨的做法是用 venv：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 -m venv ~/.piper-venv
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">source</span> ~/.piper-venv/bin/activate
</span></span><span class="line"><span class="ln">3</span><span class="cl">pip install piper-tts</span></span></code></pre></div><p>但裝完 PATH 要指到 venv 的 piper、稍麻煩。本 demo 用 <code>--break-system-packages</code> 簡化。實際生產建議用 venv 或 pipx。</p>
<p>驗證 binary 在 PATH：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">which piper
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># /opt/homebrew/bin/piper（若 pip3 來自 Homebrew Python）</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># 或 ~/Library/Python/3.x/bin/piper（若 pip3 來自系統 Python）</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl">piper --help <span class="p">|</span> head -10</span></span></code></pre></div><p><code>which piper</code> 找不到時、檢查兩個 bin 目錄哪邊有檔案、把該目錄加進 <code>PATH</code>。</p>
<h2 id="下載-voice-model">下載 Voice Model</h2>
<p>Piper 用 ONNX 格式的 voice model、每個 voice 是一對 <code>.onnx</code>（model 權重）+ <code>.onnx.json</code>（metadata、含採樣率、phoneme map）。</p>
<p>從 Hugging Face <code>rhasspy/piper-voices</code> repo 拉：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p ~/.piper-voices
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/.piper-voices
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 英文女聲、low quality（小、快）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">curl -L -o en_US-lessac-low.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/low/en_US-lessac-low.onnx&#34;</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">curl -L -o en_US-lessac-low.onnx.json <span class="se">\
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/low/en_US-lessac-low.onnx.json&#34;</span></span></span></code></pre></div><p>可用 voice quality 等級：</p>
<table>
  <thead>
      <tr>
          <th>Quality</th>
          <th>大小</th>
          <th>用途</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>low</code></td>
          <td>17-65 MB</td>
          <td>快、品質粗糙、適合 prototype</td>
      </tr>
      <tr>
          <td><code>medium</code></td>
          <td>50-100 MB</td>
          <td>平衡、日常用</td>
      </tr>
      <tr>
          <td><code>high</code></td>
          <td>100-200 MB</td>
          <td>品質佳、合成略慢</td>
      </tr>
      <tr>
          <td><code>x_low</code></td>
          <td>&lt; 20 MB</td>
          <td>極小、品質明顯差、適合受限環境</td>
      </tr>
  </tbody>
</table>
<p>語言 / 地區覆蓋（部分）：</p>
<table>
  <thead>
      <tr>
          <th>Locale</th>
          <th>Voice 範例</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>en_US</code></td>
          <td>lessac、ryan、amy、libritts</td>
      </tr>
      <tr>
          <td><code>en_GB</code></td>
          <td>alan、cori、jenny</td>
      </tr>
      <tr>
          <td><code>zh_CN</code></td>
          <td>huayan（北京話）</td>
      </tr>
      <tr>
          <td><code>ja_JP</code>（社群）</td>
          <td>較少</td>
      </tr>
      <tr>
          <td><code>de_DE</code> / <code>fr_FR</code> / <code>es_ES</code> 等</td>
          <td>各有多個</td>
      </tr>
  </tbody>
</table>
<p>完整清單在 <code>rhasspy/piper-voices</code> 的 <a href="https://github.com/rhasspy/piper">VOICES.md</a>。</p>
<p>驗證下載：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh ~/.piper-voices/
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># en_US-lessac-low.onnx       63M</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># en_US-lessac-low.onnx.json  4.9K</span></span></span></code></pre></div><h2 id="跑第一次合成">跑第一次合成</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;Hello from Piper TTS, this is a synthesized voice test.&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/piper-out.wav</span></span></code></pre></div><p>說明：</p>
<ul>
<li>文字從 stdin 進、是 Piper 的標準輸入方式。</li>
<li><code>-m</code>：voice model <code>.onnx</code> path。Piper 自動找同目錄的 <code>.onnx.json</code>。</li>
<li><code>-f</code>：output WAV path。不指定的話直接寫 stdout（可以 pipe 到 <code>aplay</code> / <code>afplay</code> 即時播放）。</li>
</ul>
<p>預期輸出：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh /tmp/piper-out.wav
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 128 KB</span></span></span></code></pre></div><p>驗證 WAV 規格：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">file /tmp/piper-out.wav
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl">ffprobe -loglevel error -show_format /tmp/piper-out.wav <span class="p">|</span> grep duration
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># duration=3.984000</span></span></span></code></pre></div><p>16-bit PCM、16 kHz mono——跟 <a href="/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper</a> 期望的輸入規格一致、可以直接 round-trip。</p>
<p>播放確認：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">afplay /tmp/piper-out.wav</span></span></code></pre></div><h2 id="常用選項">常用選項</h2>
<table>
  <thead>
      <tr>
          <th>選項</th>
          <th>作用</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>-m MODEL</code></td>
          <td>voice model <code>.onnx</code> 路徑（必填）</td>
      </tr>
      <tr>
          <td><code>-c CONFIG</code></td>
          <td>metadata json 路徑（預設自動找同名 <code>.onnx.json</code>）</td>
      </tr>
      <tr>
          <td><code>-i FILE</code></td>
          <td>輸入文字檔（替代 stdin）</td>
      </tr>
      <tr>
          <td><code>-f OUTPUT</code></td>
          <td>輸出 WAV 路徑</td>
      </tr>
      <tr>
          <td><code>-d DIR</code></td>
          <td>輸出目錄（多句時自動分檔）</td>
      </tr>
      <tr>
          <td><code>--length-scale FACTOR</code></td>
          <td>速度調整（&lt; 1 加速、&gt; 1 減速、預設 1.0）</td>
      </tr>
      <tr>
          <td><code>--volume FACTOR</code></td>
          <td>音量調整（0.0-1.0）</td>
      </tr>
      <tr>
          <td><code>-s SPEAKER</code></td>
          <td>多 speaker model 選 speaker（如 libritts）</td>
      </tr>
      <tr>
          <td><code>--cuda</code></td>
          <td>用 CUDA（Apple Silicon 用不到、留 default）</td>
      </tr>
  </tbody>
</table>
<p>典型應用：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 從文字檔合成</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">piper -m ~/.piper-voices/en_US-lessac-low.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  -i article.txt <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  -f narration.wav
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># 多句子分檔</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">piper -m ~/.piper-voices/en_US-lessac-medium.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="se"></span>  -i script.txt <span class="se">\
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="se"></span>  -d ~/audio-output/ <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  --output-dir-naming text
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># 慢速朗讀（學習用）</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">piper -m ~/.piper-voices/en_US-lessac-low.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  --length-scale 1.4 <span class="se">\
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="se"></span>  -f slow.wav <span class="o">&lt;&lt;&lt;</span> <span class="s2">&#34;Slowly read this sentence.&#34;</span></span></span></code></pre></div><h2 id="round-trip-驗證">Round-Trip 驗證</h2>
<p>確認 TTS + STT 整條串得起來：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 1. Piper TTS：文字 → WAV</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;The quick brown fox jumps over the lazy dog.&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/test.wav
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 2. Whisper STT：WAV → 文字</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">whisper-cli -m ~/.whisper-models/ggml-tiny.en.bin -f /tmp/test.wav -nt</span></span></code></pre></div><p>預期 Whisper 回應接近原文字（可能大小寫 / 標點稍變）。Round-trip 成功表示：</p>
<ul>
<li>Piper 輸出格式（16kHz mono WAV）符合 Whisper 輸入需求。</li>
<li>兩個模型對英文的訓練分佈相容。</li>
</ul>
<h2 id="跟-llm-串接llm-說話的-minimal-pipeline">跟 LLM 串接：「LLM 說話」的 minimal pipeline</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 1. LLM 生成回答</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="nv">ANSWER</span><span class="o">=</span><span class="k">$(</span>curl -s http://localhost:11434/v1/chat/completions <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  -H <span class="s2">&#34;Content-Type: application/json&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s1">    &#34;model&#34;: &#34;gemma3:1b&#34;,
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s1">    &#34;messages&#34;: [{&#34;role&#34;:&#34;user&#34;,&#34;content&#34;:&#34;Tell me a one-sentence joke.&#34;}],
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s1">    &#34;stream&#34;: false
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s1">  }&#39;</span> <span class="p">|</span> python3 -c <span class="s2">&#34;import json,sys; print(json.load(sys.stdin)[&#39;choices&#39;][0][&#39;message&#39;][&#39;content&#39;])&#34;</span><span class="k">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 2. Piper 把回答念出來</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="nv">$ANSWER</span><span class="s2">&#34;</span> <span class="p">|</span> piper -m ~/.piper-voices/en_US-lessac-low.onnx -f /tmp/llm-says.wav
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># 3. 播放</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">afplay /tmp/llm-says.wav</span></span></code></pre></div><p>三行 shell 完成「Local LLM 講笑話」整條 pipeline、無雲端、無 GPU。</p>
<h2 id="常見坑">常見坑</h2>
<h3 id="中文--多語言">中文 / 多語言</h3>
<p><code>en_US-lessac-low</code> 是英文 voice、餵中文會發音怪。中文要下載 <code>zh_CN-huayan-*</code>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -L -o ~/.piper-voices/zh_CN-huayan-medium.onnx <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/zh/zh_CN/huayan/medium/zh_CN-huayan-medium.onnx&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">curl -L -o ~/.piper-voices/zh_CN-huayan-medium.onnx.json <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/zh/zh_CN/huayan/medium/zh_CN-huayan-medium.onnx.json&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;你好，這是 Piper TTS 的中文測試。&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="se"></span>  <span class="p">|</span> piper -m ~/.piper-voices/zh_CN-huayan-medium.onnx -f /tmp/zh-out.wav</span></span></code></pre></div><p>zh_CN 預設是北京話腔調。</p>
<h3 id="--break-system-packages-警告"><code>--break-system-packages</code> 警告</h3>
<p>macOS 系統 Python 3.13+ 預設禁止 pip 直接裝。安全做法用 venv 或 pipx；不想搞 venv 就用 <code>--break-system-packages</code> flag（會跳警告但能裝）。長期建議遷到 venv、避免污染系統 Python。</p>
<h3 id="voice-quality-不夠">Voice quality 不夠</h3>
<p><code>low</code> quality 的 voice 適合驗證 / prototype、實際用 <code>medium</code> 或 <code>high</code>。低品質 voice 在長段文字會聽起來機械、自然度差。</p>
<h3 id="sample-rate-mismatch">Sample rate mismatch</h3>
<p>Voice metadata（<code>.onnx.json</code> 內 <code>sample_rate</code>）決定輸出 sample rate、不同 voice 可能不同（多數 22050 或 16000）。Whisper 期望 16000、若 Piper 輸出 22050、可能需要 ffmpeg 降採樣：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ffmpeg -i piper-out.wav -ar <span class="m">16000</span> piper-out-16k.wav</span></span></code></pre></div><p><code>en_US-lessac-low</code> 本來就是 16k、沒這問題。</p>
<h2 id="何時這篇會過時">何時這篇會過時</h2>
<ul>
<li><code>pip install piper-tts</code> 安裝方式可能演化（轉純 binary release？）、但 ONNX model + CLI invocation 形式應該穩定。</li>
<li>Voice model 格式（ONNX）是 web 通用標準、未來增加 quality / locale、現有 voice 不會被 deprecate。</li>
<li>Hugging Face <code>rhasspy/piper-voices</code> repo 是 maintainer 官方、不會消失。</li>
</ul>
<p>讀的時候若 pip install 失敗、查 <a href="https://github.com/rhasspy/piper">piper GitHub</a> 最新 install 路徑；voice 列表看 piper-voices repo。</p>
<p>跟其他 hands-on 章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、語音 round-trip 對接見 <a href="/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper STT</a>、跨服務 lifecycle 與記憶體管理見 <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a>。</p>
]]></content:encoded></item><item><title>Hands-on：用 blog content 當 corpus 跑 RAG</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/</guid><description>&lt;p>本篇把 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &amp;#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理&lt;/a> 的概念落到一個能跑的最小實作：用本 blog 的 &lt;code>content/llm/&lt;/code> 當 corpus、Ollama 的 &lt;code>nomic-embed-text&lt;/code> 做 embedding、&lt;code>gemma3:1b&lt;/code> 做生成、兩個 Python 檔案完成 ingest + query 整條鏈。實作刻意保持 minimal、為的是把每一段都看清楚、跟原理對應。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：macOS、Ollama 0.23.2、&lt;code>nomic-embed-text&lt;/code>、&lt;code>gemma3:1b&lt;/code>
&lt;strong>Corpus&lt;/strong>：本 blog 的 &lt;code>content/llm/&lt;/code>、71 個 markdown 檔
&lt;strong>結果&lt;/strong>：22 秒索引 463 個 chunk、retrieval 命中率好、generation 受 1B 模型能力限制——剛好示範「retrieval 跟 generation 各自會失敗」的兩段式失敗模式&lt;/p>&lt;/blockquote>
&lt;h2 id="前置設定">前置設定&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>項目&lt;/th>
 &lt;th>來源 / 指令&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Ollama 跑著&lt;/td>
 &lt;td>見 &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &amp;#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Embedding 模型&lt;/td>
 &lt;td>&lt;code>ollama pull nomic-embed-text&lt;/code>（274 MB、768 維）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Chat 模型&lt;/td>
 &lt;td>&lt;code>ollama pull gemma3:1b&lt;/code>（815 MB）。能力弱但夠驗證流程；上 31B 級才能拿到「真正能用」的 answer 品質&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Python&lt;/td>
 &lt;td>3.11+（標準 lib &lt;code>urllib&lt;/code> / &lt;code>pickle&lt;/code> 即可、不需要外部依賴）&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="驗證-embedding-api-可用">驗證 embedding API 可用&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">curl -s http://localhost:11434/api/embeddings &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="s1">&amp;#39;{&amp;#34;model&amp;#34;:&amp;#34;nomic-embed-text&amp;#34;,&amp;#34;prompt&amp;#34;:&amp;#34;hello world&amp;#34;}&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> &lt;span class="p">|&lt;/span> python3 -c &lt;span class="s2">&amp;#34;import json,sys; r=json.load(sys.stdin); print(&amp;#39;dim:&amp;#39;, len(r[&amp;#39;embedding&amp;#39;]))&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>逐項說明：&lt;/p>
&lt;ul>
&lt;li>&lt;code>curl -s&lt;/code>：&lt;code>-s&lt;/code> 是 silent 模式、不顯示下載進度條（不然會混進 stdout、後面 python parse 會炸）。&lt;/li>
&lt;li>&lt;code>http://localhost:11434/api/embeddings&lt;/code>：用 Ollama &lt;strong>原生&lt;/strong> embedding endpoint。也有 &lt;code>/v1/embeddings&lt;/code>（OpenAI 相容）、但原生回應結構較簡（直接 &lt;code>{&amp;quot;embedding&amp;quot;: [...]}&lt;/code>、不是 OpenAI 那種 &lt;code>{&amp;quot;data&amp;quot;: [{&amp;quot;embedding&amp;quot;: [...]}]}&lt;/code> 巢狀）。本 demo 用原生、parse 更直接。&lt;/li>
&lt;li>&lt;code>-d '{&amp;quot;model&amp;quot;:&amp;quot;...&amp;quot;,&amp;quot;prompt&amp;quot;:&amp;quot;...&amp;quot;}'&lt;/code>：JSON payload。&lt;code>model&lt;/code> 是 Ollama tag、&lt;code>prompt&lt;/code> 是要 embed 的文字。&lt;/li>
&lt;li>&lt;code>python3 -c &amp;quot;...&amp;quot;&lt;/code>：stdin 接 curl 輸出、parse JSON、印 embedding 長度。&lt;/li>
&lt;li>&lt;strong>為什麼測 &lt;code>dim: 768&lt;/code>&lt;/strong>：&lt;code>nomic-embed-text&lt;/code> 模型架構決定 embedding 維度是 768。每次 embed 任何文字都會回固定 768 維向量、是 retrieval 的基本資料形狀。看到 &lt;code>dim: 768&lt;/code> 表示：API 通了、模型載入了、輸出形狀對。&lt;/li>
&lt;/ul>
&lt;h2 id="設計取捨">設計取捨&lt;/h2>
&lt;p>實作前先對齊 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &amp;#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理&lt;/a> 提的設計取捨、決定每段怎麼做：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>取捨點&lt;/th>
 &lt;th>本 demo 的選擇&lt;/th>
 &lt;th>Trade-off&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Chunking 粒度&lt;/td>
 &lt;td>段落感知 + 軟 token cap（~400 token）&lt;/td>
 &lt;td>簡單、保留段落邊界；不做語意 chunking&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Embedding 模型&lt;/td>
 &lt;td>&lt;code>nomic-embed-text&lt;/code>（768 維）&lt;/td>
 &lt;td>主流、Ollama 內建、英文為主；中文混合場景仍可運作&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>向量儲存&lt;/td>
 &lt;td>Python pickle 檔&lt;/td>
 &lt;td>463 chunks 用 in-memory 完全夠；何時該換見 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/vector-storage-engineering/" data-link-title="4.22 RAG storage 工程：從 pickle 到 vector database 的選型判讀" data-link-desc="RAG storage backend 選型：規模到哪個階段該從 in-memory 升級到 vector DB、dependency chain 如何收窄選項">4.22 RAG storage 工程&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Retrieval&lt;/td>
 &lt;td>Cosine similarity、top-K&lt;/td>
 &lt;td>無 hybrid、無 re-ranker；夠驗證、品質受 embedding 限制&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Generation&lt;/td>
 &lt;td>&lt;code>gemma3:1b&lt;/code> 純 Ollama OpenAI 相容 API&lt;/td>
 &lt;td>1B 模型能力弱、會編造；用來示範 retrieval 跟 generation 兩段分離&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>這些選擇都對應到 4.0 章節的「會變的部分」清單——可預期半年後 embedding 模型有新選擇、chunking 有更好策略、re-ranker 變主流。但骨架（retrieval + augmentation 兩段式）不變。&lt;/p></description><content:encoded><![CDATA[<p>本篇把 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a> 的概念落到一個能跑的最小實作：用本 blog 的 <code>content/llm/</code> 當 corpus、Ollama 的 <code>nomic-embed-text</code> 做 embedding、<code>gemma3:1b</code> 做生成、兩個 Python 檔案完成 ingest + query 整條鏈。實作刻意保持 minimal、為的是把每一段都看清楚、跟原理對應。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：macOS、Ollama 0.23.2、<code>nomic-embed-text</code>、<code>gemma3:1b</code>
<strong>Corpus</strong>：本 blog 的 <code>content/llm/</code>、71 個 markdown 檔
<strong>結果</strong>：22 秒索引 463 個 chunk、retrieval 命中率好、generation 受 1B 模型能力限制——剛好示範「retrieval 跟 generation 各自會失敗」的兩段式失敗模式</p></blockquote>
<h2 id="前置設定">前置設定</h2>
<table>
  <thead>
      <tr>
          <th>項目</th>
          <th>來源 / 指令</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ollama 跑著</td>
          <td>見 <a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝</a></td>
      </tr>
      <tr>
          <td>Embedding 模型</td>
          <td><code>ollama pull nomic-embed-text</code>（274 MB、768 維）</td>
      </tr>
      <tr>
          <td>Chat 模型</td>
          <td><code>ollama pull gemma3:1b</code>（815 MB）。能力弱但夠驗證流程；上 31B 級才能拿到「真正能用」的 answer 品質</td>
      </tr>
      <tr>
          <td>Python</td>
          <td>3.11+（標準 lib <code>urllib</code> / <code>pickle</code> 即可、不需要外部依賴）</td>
      </tr>
  </tbody>
</table>
<h3 id="驗證-embedding-api-可用">驗證 embedding API 可用</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -s http://localhost:11434/api/embeddings <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{&#34;model&#34;:&#34;nomic-embed-text&#34;,&#34;prompt&#34;:&#34;hello world&#34;}&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  <span class="p">|</span> python3 -c <span class="s2">&#34;import json,sys; r=json.load(sys.stdin); print(&#39;dim:&#39;, len(r[&#39;embedding&#39;]))&#34;</span></span></span></code></pre></div><p>逐項說明：</p>
<ul>
<li><code>curl -s</code>：<code>-s</code> 是 silent 模式、不顯示下載進度條（不然會混進 stdout、後面 python parse 會炸）。</li>
<li><code>http://localhost:11434/api/embeddings</code>：用 Ollama <strong>原生</strong> embedding endpoint。也有 <code>/v1/embeddings</code>（OpenAI 相容）、但原生回應結構較簡（直接 <code>{&quot;embedding&quot;: [...]}</code>、不是 OpenAI 那種 <code>{&quot;data&quot;: [{&quot;embedding&quot;: [...]}]}</code> 巢狀）。本 demo 用原生、parse 更直接。</li>
<li><code>-d '{&quot;model&quot;:&quot;...&quot;,&quot;prompt&quot;:&quot;...&quot;}'</code>：JSON payload。<code>model</code> 是 Ollama tag、<code>prompt</code> 是要 embed 的文字。</li>
<li><code>python3 -c &quot;...&quot;</code>：stdin 接 curl 輸出、parse JSON、印 embedding 長度。</li>
<li><strong>為什麼測 <code>dim: 768</code></strong>：<code>nomic-embed-text</code> 模型架構決定 embedding 維度是 768。每次 embed 任何文字都會回固定 768 維向量、是 retrieval 的基本資料形狀。看到 <code>dim: 768</code> 表示：API 通了、模型載入了、輸出形狀對。</li>
</ul>
<h2 id="設計取捨">設計取捨</h2>
<p>實作前先對齊 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a> 提的設計取捨、決定每段怎麼做：</p>
<table>
  <thead>
      <tr>
          <th>取捨點</th>
          <th>本 demo 的選擇</th>
          <th>Trade-off</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Chunking 粒度</td>
          <td>段落感知 + 軟 token cap（~400 token）</td>
          <td>簡單、保留段落邊界；不做語意 chunking</td>
      </tr>
      <tr>
          <td>Embedding 模型</td>
          <td><code>nomic-embed-text</code>（768 維）</td>
          <td>主流、Ollama 內建、英文為主；中文混合場景仍可運作</td>
      </tr>
      <tr>
          <td>向量儲存</td>
          <td>Python pickle 檔</td>
          <td>463 chunks 用 in-memory 完全夠；何時該換見 <a href="/blog/llm/04-applications/vector-storage-engineering/" data-link-title="4.22 RAG storage 工程：從 pickle 到 vector database 的選型判讀" data-link-desc="RAG storage backend 選型：規模到哪個階段該從 in-memory 升級到 vector DB、dependency chain 如何收窄選項">4.22 RAG storage 工程</a></td>
      </tr>
      <tr>
          <td>Retrieval</td>
          <td>Cosine similarity、top-K</td>
          <td>無 hybrid、無 re-ranker；夠驗證、品質受 embedding 限制</td>
      </tr>
      <tr>
          <td>Generation</td>
          <td><code>gemma3:1b</code> 純 Ollama OpenAI 相容 API</td>
          <td>1B 模型能力弱、會編造；用來示範 retrieval 跟 generation 兩段分離</td>
      </tr>
  </tbody>
</table>
<p>這些選擇都對應到 4.0 章節的「會變的部分」清單——可預期半年後 embedding 模型有新選擇、chunking 有更好策略、re-ranker 變主流。但骨架（retrieval + augmentation 兩段式）不變。</p>
<h2 id="ingest把-corpus-變索引">Ingest：把 corpus 變索引</h2>
<p>完整檔案：<code>scripts/rag-demo/ingest.py</code>（本 repo 下）。三段 function：切 chunk、embed、走訪 + 持久化。</p>
<h3 id="1-slice_markdown段落感知的-chunk-切割">1. <code>slice_markdown</code>：段落感知的 chunk 切割</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">slice_markdown</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">soft_token_cap</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">400</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">paragraphs</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s2">&#34;\n\s*\n&#34;</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span> <span class="k">if</span> <span class="n">p</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">chunks</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">buf</span><span class="p">,</span> <span class="n">buf_len</span> <span class="o">=</span> <span class="p">[],</span> <span class="mi">0</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">paragraphs</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="n">plen</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">p</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>  <span class="c1"># char-count / 2 ≈ token (CJK + English heuristic)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="k">if</span> <span class="n">buf</span> <span class="ow">and</span> <span class="n">buf_len</span> <span class="o">+</span> <span class="n">plen</span> <span class="o">&gt;</span> <span class="n">soft_token_cap</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">            <span class="n">chunks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">&#34;</span><span class="se">\n\n</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">            <span class="n">buf</span><span class="p">,</span> <span class="n">buf_len</span> <span class="o">=</span> <span class="p">[],</span> <span class="mi">0</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="n">buf</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="n">buf_len</span> <span class="o">+=</span> <span class="n">plen</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">if</span> <span class="n">buf</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="n">chunks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">&#34;</span><span class="se">\n\n</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">return</span> <span class="n">chunks</span></span></span></code></pre></div><p><strong>每段做什麼</strong>：</p>
<ol>
<li><strong><code>re.split(r&quot;\n\s*\n&quot;, text)</code></strong>：用「空白行」當分隔符切段落。<code>\n\s*\n</code> 比 <code>\n\n</code> 寬一點、允許中間有 whitespace（空白、tab）。Markdown 段落的標準分隔是空白行、這個 regex 捕捉所有段落邊界。</li>
<li><strong><code>[p.strip() for ... if p.strip()]</code></strong>：每段去除前後空白、過濾掉純空段落。</li>
<li><strong><code>buf, buf_len = [], 0</code></strong>：累積一個正在構建的 chunk。<code>buf</code> 是段落 list、<code>buf_len</code> 是該 chunk 的 token 累計估算。</li>
<li><strong><code>plen = len(p) / 2</code></strong>：估算這段的 token 數。</li>
<li><strong><code>if buf and buf_len + plen &gt; soft_token_cap</code></strong>：「greedy pack」邏輯——如果加上這段就會超過 cap、把目前 buffer flush 成一個 chunk、再開新 buffer 裝這段。</li>
<li><strong><code>if buf: chunks.append(...)</code></strong>：迴圈結束後、最後一個 buffer 還沒 flush、補上。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 paragraph-aware、不是固定 token cap</strong>：<a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a> 提的 chunking 設計取捨——固定 token cap 容易切過句子或段落中間、語意被截斷。Paragraph-aware 切在自然邊界、保留段落內語意完整。</li>
<li><strong>為什麼 <code>soft</code> token cap（軟限制）而不是硬切</strong>：硬切會把一個 800-token 段落切成兩半；軟切讓「目前 chunk + 下一段超過 cap」時 flush 目前 chunk、下一段獨立成新 chunk（即使超過 cap 也保留段落完整）。代價：個別 chunk 可能超過 cap、retrieval 拿到的塊較大、但內容完整。</li>
<li><strong>為什麼 <code>len(p) / 2</code> 估 token</strong>：英文約 4 字元 / token、中文約 1.5 字元 / token、混合平均 / 2 在兩種場景都合理。要精確用 tokenizer（如 <code>tiktoken</code>）、但 demo 不需要——這個 heuristic 在 ±20% 內、夠用來做 chunking 決策。</li>
<li><strong>為什麼 <code>\n\n</code>.join(buf)`</strong>：flush 成 chunk 時、段落間保留空白行分隔、讀者看到 chunk 仍是合法 markdown 結構、不是平鋪文字。</li>
</ul>
<h3 id="2-embed呼叫-ollama-embedding-api">2. <code>embed</code>：呼叫 Ollama embedding API</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">def</span> <span class="nf">embed</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="n">payload</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">({</span><span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="s2">&#34;nomic-embed-text&#34;</span><span class="p">,</span> <span class="s2">&#34;prompt&#34;</span><span class="p">:</span> <span class="n">text</span><span class="p">})</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="n">req</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">Request</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="s2">&#34;http://localhost:11434/api/embeddings&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="n">data</span><span class="o">=</span><span class="n">payload</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">        <span class="n">headers</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;Content-Type&#34;</span><span class="p">:</span> <span class="s2">&#34;application/json&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">    <span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">req</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">60</span><span class="p">)</span> <span class="k">as</span> <span class="n">resp</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl">        <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">resp</span><span class="o">.</span><span class="n">read</span><span class="p">())[</span><span class="s2">&#34;embedding&#34;</span><span class="p">]</span></span></span></code></pre></div><p><strong>每行做什麼</strong>：</p>
<ol>
<li><strong><code>payload = json.dumps(...).encode()</code></strong>：把 dict 轉成 JSON 字串、再 encode 成 bytes。HTTP body 必須是 bytes、不能直接傳 str。</li>
<li><strong><code>urllib.request.Request(...)</code></strong>：建立 HTTP request 物件。沒寫 <code>method</code> 預設是 GET、但有 <code>data</code> 參數會自動變 POST。</li>
<li><strong><code>headers={&quot;Content-Type&quot;: &quot;application/json&quot;}</code></strong>：告訴 server payload 是 JSON。少了這個、Ollama 可能 parse 不出 body。</li>
<li><strong><code>urlopen(req, timeout=60)</code></strong>：發送 request、<code>timeout=60</code> 是 socket-level timeout（連線 + 讀取總共最多 60 秒）。</li>
<li><strong><code>json.loads(resp.read())[&quot;embedding&quot;]</code></strong>：讀回應 body、parse JSON、取 <code>embedding</code> 欄位（768 維 list of float）。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼用 stdlib <code>urllib</code> 而不是 <code>requests</code></strong>：完全沒有外部 dependency、<code>urllib</code> 是 Python stdlib 內建。<code>requests</code> 較友善但要 <code>pip install</code>、本 demo 想 minimal。</li>
<li><strong>為什麼 timeout=60</strong>：embed 一段文字通常 &lt; 200ms、60 秒夠 buffer 意外（首次 model 載入記憶體可能 5-10 秒）。設無限會在 Ollama 掛掉時整個 script hang。</li>
<li><strong>為什麼 <code>/api/embeddings</code>、不是 <code>/v1/embeddings</code></strong>：兩者都可。原生 endpoint 回應結構平、parse 直接（<code>r[&quot;embedding&quot;]</code>）；OpenAI 相容回應較巢狀（<code>r[&quot;data&quot;][0][&quot;embedding&quot;]</code>）。對 demo、寫法簡單較重要。</li>
</ul>
<h3 id="3-走訪--持久化">3. 走訪 + 持久化</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">md_files</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">content_root</span><span class="o">.</span><span class="n">rglob</span><span class="p">(</span><span class="s2">&#34;*.md&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">records</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">for</span> <span class="n">md</span> <span class="ow">in</span> <span class="n">md_files</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">text</span> <span class="o">=</span> <span class="n">md</span><span class="o">.</span><span class="n">read_text</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="s2">&#34;utf-8&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">text</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s2">&#34;^---\n.*?\n---\n&#34;</span><span class="p">,</span> <span class="s2">&#34;&#34;</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">count</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">DOTALL</span><span class="p">)</span>  <span class="c1"># 去掉 frontmatter</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="n">chunks</span> <span class="o">=</span> <span class="n">slice_markdown</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">for</span> <span class="n">j</span><span class="p">,</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">chunks</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="n">vec</span> <span class="o">=</span> <span class="n">embed</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="n">records</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="s2">&#34;source&#34;</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">md</span><span class="o">.</span><span class="n">relative_to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">content_root</span><span class="o">.</span><span class="n">parent</span><span class="p">)),</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="s2">&#34;chunk_index&#34;</span><span class="p">:</span> <span class="n">j</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="n">chunk</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="s2">&#34;embedding&#34;</span><span class="p">:</span> <span class="n">vec</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="p">})</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&#34;scripts/rag-demo/index.pkl&#34;</span><span class="p">,</span> <span class="s2">&#34;wb&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">pickle</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">records</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span></span></span></code></pre></div><p><strong>每段做什麼</strong>：</p>
<ol>
<li><strong><code>args.content_root.rglob(&quot;*.md&quot;)</code></strong>：recursive glob、回 <code>Path</code> iterator、找出 <code>content_root</code> 下所有 <code>.md</code> 檔（含子目錄）。</li>
<li><strong><code>sorted(...)</code></strong>：排序、讓每次 ingest 順序穩定（git diff 比較友善、retrieval 結果可重現）。</li>
<li><strong><code>text.read_text(encoding=&quot;utf-8&quot;)</code></strong>：讀檔、明確指定 UTF-8（中文 markdown 必要、否則 macOS / Linux 預設可能不一致）。</li>
<li><strong><code>re.sub(r&quot;^---\n.*?\n---\n&quot;, &quot;&quot;, text, count=1, flags=re.DOTALL)</code></strong>：去掉 Hugo frontmatter。
<ul>
<li><code>^---\n</code>：開頭 <code>---\n</code>。</li>
<li><code>.*?</code>：non-greedy match、配到下一個 <code>---</code> 就停。</li>
<li><code>\n---\n</code>：closing fence。</li>
<li><code>count=1</code>：只 strip 第一個（檔案中可能有其他 <code>---</code> 是水平分隔線、不要誤殺）。</li>
<li><code>flags=re.DOTALL</code>：讓 <code>.</code> 也匹配換行符（預設 <code>.</code> 不匹配 <code>\n</code>、規 frontmatter 跨行就吃不到）。</li>
</ul>
</li>
<li><strong><code>records.append({...})</code></strong>：每個 chunk 一個 record、含 source path、chunk index、原文、embedding。</li>
<li><strong><code>md.relative_to(args.content_root.parent)</code></strong>：把絕對 path 變成 <code>llm/00-foundations/xxx.md</code> 形式、retrieval 顯示時短、跨機器可移植。</li>
<li><strong><code>pickle.dump(records, f)</code></strong>：把整個 records list 序列化到 binary 檔。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼要 strip frontmatter</strong>：Frontmatter 是 <code>title</code>、<code>date</code>、<code>tags</code> 等 metadata、不是文章正文。embed 進去會稀釋向量語意（讓「date」「2026-05-11」等 keyword 影響相似度計算）。Strip 後 embedding 只 capture 內容語意。</li>
<li><strong>為什麼 records 是 list of dict 而不是 numpy array</strong>：兩個原因。(1) 每個 record 含 source / chunk_index / text / embedding 四種異質欄位、numpy 處理不直接。(2) 463 chunks 規模、純 Python list 跑 cosine 也只是毫秒級、不需要 vectorize。十萬 chunk 以上才考慮 numpy array + batched dot product。</li>
<li><strong>為什麼 pickle 而不是 JSON</strong>：embedding 是 768-float list、JSON 序列化會把每個 float 變成 ASCII 字串（每個 ~20 bytes）、檔案大很多、parse 也慢。Pickle 是 binary format、保留原本資料結構、檔案小、loader 快。代價：pickle 有 Python 版本相依、跨語言不能讀——但本 demo 索引只給自家 query.py / mcp_server.py 用、可接受。</li>
<li><strong>為什麼存 <code>text</code> 跟 <code>embedding</code>、不只 embedding</strong>：retrieval 要回 chunk 原文給 LLM 看、不能只有 source path（不然每次 query 還要再讀檔）。這裡的 corpus 檔案就是 <a href="/blog/llm/knowledge-cards/retrieval-source/" data-link-title="Retrieval Source" data-link-desc="RAG 從哪個 corpus、index、tool 或外部系統取回內容，決定來源可信度、freshness、權限與引用責任">retrieval source</a>；Pickle 多存原文成本低（~100 byte / chunk）、查詢時方便很多。</li>
</ul>
<h3 id="跑-ingest">跑 ingest</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/blog
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/rag-demo/ingest.py</span></span></code></pre></div><ul>
<li><code>cd ~/Projects/blog</code>：切到 repo 根、讓相對路徑 <code>content/llm</code> 對得到 corpus、<code>scripts/rag-demo/index.pkl</code> 對得到 output 位置。</li>
<li><code>python3 scripts/rag-demo/ingest.py</code>：跑 ingest script、預設讀 <code>content/llm/</code>、寫 <code>scripts/rag-demo/index.pkl</code>。</li>
</ul>
<p>實測輸出：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Found 71 markdown files under content/llm
</span></span><span class="line"><span class="ln">2</span><span class="cl">  [10/71] 86 chunks in 4.5s
</span></span><span class="line"><span class="ln">3</span><span class="cl">  [20/71] 181 chunks in 8.6s
</span></span><span class="line"><span class="ln">4</span><span class="cl">  ...
</span></span><span class="line"><span class="ln">5</span><span class="cl">  [70/71] 461 chunks in 22.2s
</span></span><span class="line"><span class="ln">6</span><span class="cl">Wrote 463 records to scripts/rag-demo/index.pkl (22.3s)</span></span></code></pre></div><p>463 chunks、22 秒、平均 ~21 chunks/sec。瓶頸是 sequential API call、用 async / batch 能快 5-10 倍、但這個量級不值得。</p>
<h2 id="queryretrieval--augmentation--generation">Query：retrieval + augmentation + generation</h2>
<p>完整檔案：<code>scripts/rag-demo/query.py</code>。三段。</p>
<h3 id="1-cosine-similarity--top-k-retrieval">1. Cosine similarity + top-K retrieval</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">cosine</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">dot</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">x</span> <span class="o">*</span> <span class="n">y</span> <span class="k">for</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">na</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">x</span> <span class="o">*</span> <span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">a</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">nb</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">y</span> <span class="o">*</span> <span class="n">y</span> <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">b</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">return</span> <span class="n">dot</span> <span class="o">/</span> <span class="p">(</span><span class="n">na</span> <span class="o">*</span> <span class="n">nb</span><span class="p">)</span> <span class="k">if</span> <span class="n">na</span> <span class="ow">and</span> <span class="n">nb</span> <span class="k">else</span> <span class="mf">0.0</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">def</span> <span class="nf">retrieve</span><span class="p">(</span><span class="n">records</span><span class="p">,</span> <span class="n">query_vec</span><span class="p">,</span> <span class="n">top_k</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="n">scored</span> <span class="o">=</span> <span class="p">[(</span><span class="n">cosine</span><span class="p">(</span><span class="n">query_vec</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;embedding&#34;</span><span class="p">]),</span> <span class="n">r</span><span class="p">)</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">records</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">scored</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">return</span> <span class="n">scored</span><span class="p">[:</span><span class="n">top_k</span><span class="p">]</span></span></span></code></pre></div><p><strong>每行做什麼</strong>：</p>
<ol>
<li><strong><code>dot = sum(x * y for x, y in zip(a, b))</code></strong>：兩個向量的內積（dot product）。<code>zip(a, b)</code> 把兩個 list 對位配對、generator expression 算每對相乘、sum 加起來。</li>
<li><strong><code>na = math.sqrt(sum(x * x for x in a))</code></strong>：a 的 L2 norm（歐氏範數）—— <code>sqrt(x1² + x2² + ... + xn²)</code>。</li>
<li><strong><code>nb = math.sqrt(sum(y * y for y in b))</code></strong>：b 的 L2 norm。</li>
<li><strong><code>return dot / (na * nb) if na and nb else 0.0</code></strong>：cosine = dot / (||a|| × ||b||)。三元運算子防 zero division——若任一向量是零向量、na 或 nb 為 0、回 0.0 而不是 crash。</li>
<li><strong><code>scored = [(cosine(query_vec, r[&quot;embedding&quot;]), r) for r in records]</code></strong>：對每個 record 算相似度、組成 (score, record) tuple 的 list。</li>
<li><strong><code>scored.sort(key=lambda x: x[0], reverse=True)</code></strong>：按 score 從大到小排序。<code>key=lambda x: x[0]</code> 取 tuple 第一個元素（score）當排序 key。</li>
<li><strong><code>return scored[:top_k]</code></strong>：取前 K 個。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 cosine 而不是純 dot product</strong>：純 dot product 受向量長度影響——長向量自動拿高分、跟「相似度」無關。Cosine 把向量正規化到單位長度、純看方向、是「語意相似」的標準衡量。語意相似 embedding 應該方向相近、長度差異不重要。</li>
<li><strong>為什麼用 <code>math.sqrt</code> 而不是 <code>**0.5</code></strong>：兩者數學等價、但 <code>math.sqrt</code> 用 C-level 實作、CPython 中比 Python 級 <code>**0.5</code> 快幾倍。對 463 chunks 影響不大、但 production scale 會放大差異——習慣寫 <code>math.sqrt</code> 的好。</li>
<li><strong>為什麼 <code>if na and nb else 0.0</code></strong>：防禦性程式設計。理論上 embedding 不會是零向量（模型架構保證有非零權重）、但邊界情況（空輸入、API 出錯回 placeholder）可能出現、避免 ZeroDivisionError 整個 query 失敗。回 0.0 表示「無法判斷相似度」、retrieval 排序時自然排到最後。</li>
<li><strong>為什麼 sort 全部、不用 heap</strong>：463 records、Python sort 是 O(n log n)、毫秒級。<code>heapq.nlargest(top_k, ...)</code> 是 O(n log k)、在 k=4、n=463 上實測幾乎沒差。十萬 record 以上才看到顯著差別。</li>
<li><strong>為什麼用 list of tuple、不用 numpy</strong>：跟 ingest 同樣的理由——小規模不需要 vectorize、純 Python 清楚。</li>
</ul>
<h3 id="2-建-augmented-prompt">2. 建 augmented prompt</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">context_blocks</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="k">for</span> <span class="n">score</span><span class="p">,</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">retrieved</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">context_blocks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="sa">f</span><span class="s2">&#34;[來源：</span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s1">&#39;source&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2">#chunk</span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s1">&#39;chunk_index&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2"> 相似度：</span><span class="si">{</span><span class="n">score</span><span class="si">:</span><span class="s2">.3f</span><span class="si">}</span><span class="s2">]</span><span class="se">\n</span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s1">&#39;text&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">system</span> <span class="o">=</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="s2">&#34;你是一個技術文件問答助手。&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="s2">&#34;依下方 context 內容回答問題、不要編造 context 外的事實。&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="s2">&#34;若 context 不足以回答、明確說『資料不足』。&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="s2">&#34;回答末尾列出引用的來源 path。&#34;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">user</span> <span class="o">=</span> <span class="s2">&#34;## Context</span><span class="se">\n\n</span><span class="s2">&#34;</span> <span class="o">+</span> <span class="s2">&#34;</span><span class="se">\n\n</span><span class="s2">---</span><span class="se">\n\n</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">context_blocks</span><span class="p">)</span> <span class="o">+</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="se">\n\n</span><span class="s2">## Question</span><span class="se">\n\n</span><span class="si">{</span><span class="n">question</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">system</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">user</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="p">]</span></span></span></code></pre></div><p><strong>每行做什麼</strong>：</p>
<ol>
<li><strong><code>f&quot;[來源：{...} 相似度：{score:.3f}]\n{r['text']}&quot;</code></strong>：每個 retrieved chunk 加 header 標明出處跟相似度、再接原文。<code>:.3f</code> 是 score 格式化到三位小數。</li>
<li><strong><code>&quot;\n\n---\n\n&quot;.join(context_blocks)</code></strong>：用 <code>---</code> 水平分隔線分隔各 chunk、視覺上清楚。</li>
<li><strong><code>{&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: system}</code></strong>：system message 給 LLM 設定角色 + 約束。</li>
<li><strong><code>{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: user}</code></strong>：user message 含 context 跟 question、是 LLM 實際讀的內容。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 <a href="/blog/llm/knowledge-cards/system-prompt/" data-link-title="System Prompt" data-link-desc="LLM application 中由開發者預設、不直接顯示給使用者的指令層、定義模型的角色、行為規範、輸出格式">system prompt</a> 約束四件事</strong>（角色、忠於 context、資料不足時明說、引用來源）：
<ul>
<li><strong>角色</strong>：「技術文件問答助手」框定模型行為、減少 off-topic 回應。</li>
<li><strong>忠於 context</strong>：對抗 RAG 最常見的失敗模式——LLM 看到 context 但用自己訓練的 knowledge 補完、結果跟 corpus 不一致。明確要求 follow context 能降低（雖然不能完全消除、見實測 1）。</li>
<li><strong>資料不足時明說</strong>：避免 LLM「硬要回答」造成 hallucination。對 weak model 這條 follow 度差、但對 large model 有效。</li>
<li><strong>引用來源</strong>：traceability。讀者能回查 corpus、驗證模型答案。</li>
</ul>
</li>
<li><strong>為什麼 <code>## Context</code> / <code>## Question</code> 結構</strong>：用 markdown heading 結構幫助 LLM 區分「我要讀什麼」「我要回答什麼」。比平鋪文字穩定（即使對小模型）。</li>
<li><strong>為什麼把 retrieved chunks 全塞 user message、不分開</strong>：MCP / function calling 的更現代做法是把 retrieved 結果做成 tool response、模型主動 call retrieval tool。本 demo 不引入 tool use、直接塞 prompt 較單純——能說明 RAG 核心（augmentation）不必牽扯 tool use。</li>
</ul>
<h3 id="3-呼叫-chat-completions">3. 呼叫 chat completions</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">def</span> <span class="nf">chat</span><span class="p">(</span><span class="n">messages</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="n">payload</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">({</span><span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="n">model</span><span class="p">,</span> <span class="s2">&#34;messages&#34;</span><span class="p">:</span> <span class="n">messages</span><span class="p">,</span> <span class="s2">&#34;stream&#34;</span><span class="p">:</span> <span class="kc">False</span><span class="p">})</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="n">req</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">Request</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="s2">&#34;http://localhost:11434/v1/chat/completions&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="n">data</span><span class="o">=</span><span class="n">payload</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">        <span class="n">headers</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;Content-Type&#34;</span><span class="p">:</span> <span class="s2">&#34;application/json&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">    <span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">req</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">180</span><span class="p">)</span> <span class="k">as</span> <span class="n">resp</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl">        <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">resp</span><span class="o">.</span><span class="n">read</span><span class="p">())[</span><span class="s2">&#34;choices&#34;</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s2">&#34;message&#34;</span><span class="p">][</span><span class="s2">&#34;content&#34;</span><span class="p">]</span></span></span></code></pre></div><p><strong>每行做什麼</strong>：</p>
<ol>
<li><strong><code>json.dumps({&quot;model&quot;: ..., &quot;messages&quot;: ..., &quot;stream&quot;: False}).encode()</code></strong>：構造 OpenAI 相容 chat completions request body。<code>stream: False</code> 讓 server 等生成完再一次回、不要 SSE 串流。</li>
<li><strong><code>/v1/chat/completions</code></strong>：OpenAI 相容 endpoint、跟雲端 OpenAI 完全同樣 schema。</li>
<li><strong><code>timeout=180</code></strong>：3 分鐘、給長 context + 慢模型空間。</li>
<li><strong><code>[&quot;choices&quot;][0][&quot;message&quot;][&quot;content&quot;]</code></strong>：parse OpenAI 標準 response 結構、取第一個 choice 的 content。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 <code>stream: False</code></strong>：demo 要把完整 answer 印出、不需要 incremental display。<code>stream: True</code> 要寫 SSE parser、複雜。Production 互動式 UI 才需要 streaming。</li>
<li><strong>為什麼 timeout=180、不是 60</strong>：1B 模型 + 4 個 retrieved chunks 的 context、prefill 可能要 5-30 秒、生成 100-500 token 又要 5-20 秒、保守設 3 分鐘。embed function 用 60 是因為 embedding 是純 forward pass、單一 token 量級操作、不需要這麼長。</li>
<li><strong>為什麼 <code>/v1/...</code> 而不是 <code>/api/...</code></strong>：chat completions 走 OpenAI 相容 endpoint、生態都用這個格式（Continue.dev、Cursor、各家 SDK）。embedding 用 <code>/api/...</code> 是因為原生 schema 簡單；chat 用 <code>/v1/...</code> 是因為 message-based 結構是 OpenAI 標準、跨工具互通。</li>
</ul>
<h2 id="實測結果retrieval-對generation-弱">實測結果：retrieval 對、generation 弱</h2>
<h3 id="測試-1什麼是-mtp為什麼對寫-code-場景特別有效">測試 1：「什麼是 MTP？為什麼對寫 code 場景特別有效？」</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/rag-demo/query.py --show-retrieved <span class="s2">&#34;什麼是 MTP？為什麼對寫 code 場景特別有效？&#34;</span></span></span></code></pre></div><p><code>--show-retrieved</code> 是個 flag、開啟後在 stderr 印 retrieved chunks 跟 score、答案還是進 stdout。是 debug 跟教學用、不會影響 LLM 看到的 prompt。</p>
<p>Retrieval：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">0.870  llm/knowledge-cards/transformer.md#chunk2
</span></span><span class="line"><span class="ln">2</span><span class="cl">0.825  llm/03-theoretical-foundations/sampling-and-decoding.md#chunk8
</span></span><span class="line"><span class="ln">3</span><span class="cl">0.782  llm/knowledge-cards/ttft.md#chunk1
</span></span><span class="line"><span class="ln">4</span><span class="cl">0.771  llm/knowledge-cards/mtp.md#chunk2</span></span></code></pre></div><p>四個 chunk 都跟問題相關、相似度合理。MTP 卡確實被命中（雖然不是 top-1、是因為 transformer.md 該段提到 MTP）。</p>
<p>Generation（1B 模型）：</p>
<blockquote>
<p>MTP 僅指使用 Ollama 進行 Coding 模型訓練與部署、它是一種系統性的方式&hellip;
來源：<a href="https://llm.dev/mti/">llm.dev</a></p></blockquote>
<p><strong>錯</strong>：1B 模型編造了「MTP 僅指使用 Ollama」這個事實（不對、MTP 是 Google 為 Gemma 釋出的、跟 Ollama 沒直接關係）、來源 URL 也是 hallucination。</p>
<h3 id="測試-2mcp-跟-function-calling-有什麼差別">測試 2：「MCP 跟 function calling 有什麼差別？」</h3>
<p>Retrieval：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">0.721  llm/04-applications/application-protocols.md#chunk2
</span></span><span class="line"><span class="ln">2</span><span class="cl">0.704  llm/04-applications/application-protocols.md#chunk1
</span></span><span class="line"><span class="ln">3</span><span class="cl">0.702  llm/04-applications/application-protocols.md#chunk0
</span></span><span class="line"><span class="ln">4</span><span class="cl">0.693  llm/knowledge-cards/function-calling.md#chunk1</span></span></code></pre></div><p>完美命中——4.3 應用層協議章節三個 chunk + function-calling 卡。</p>
<p>Generation：模型把幾段重複拼接、framing 跟原文有出入、但比測試 1 好（因為 context 涵蓋直接答案）。</p>
<h2 id="觀察跟原理對應">觀察跟原理對應</h2>
<p>這個 demo 剛好示範 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a> 提的兩段式失敗模式：</p>
<table>
  <thead>
      <tr>
          <th>階段</th>
          <th>表現</th>
          <th>原因</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Retrieval</td>
          <td>命中率好、找到對的 chunks</td>
          <td><code>nomic-embed-text</code> 對技術文件覆蓋好、cosine 對短 query 也 OK</td>
      </tr>
      <tr>
          <td>Generation</td>
          <td>內容有時編造、不忠於 context、來源亂寫</td>
          <td><code>gemma3:1b</code> 模型容量不足以可靠 follow system prompt</td>
      </tr>
  </tbody>
</table>
<p>換 31B+ 模型 generation 會改善很多——這也是 4.0 章節提到「retrieval 跟下游 LLM 訓練分佈不一致」會放大失敗的具體例子。寫 RAG 系統時、generation 失敗不一定是「retrieval 沒給對 context」、可能是「模型不夠強」。</p>
<h2 id="何時這份-demo-會過時">何時這份 demo 會過時</h2>
<ul>
<li><strong>Ollama API 形狀</strong>：短期內不會變（生態都依賴）。</li>
<li><strong><code>nomic-embed-text</code> / <code>gemma3:1b</code> 具體 tag</strong>：預期會被新模型取代、但 retrieval + augmentation 結構不變。</li>
<li><strong>Chunking heuristic</strong>：簡單 char-count / 2 很粗、半年後若有便宜的 token counter 直接接會更準。</li>
<li><strong>Pickle 儲存</strong>：production 場景建議換 vector DB、本 demo 是教學用。</li>
</ul>
<p>實作換代時、保留 ingest / retrieve / augment / generate 四段、各段內部換工具即可——這四段是 RAG 的骨架、跨工具世代不變。</p>
<h2 id="跑這個-demo-的指令總結">跑這個 demo 的指令總結</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 一次性建索引（每次 corpus 變動才需要重建）</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/Projects/blog
</span></span><span class="line"><span class="ln">3</span><span class="cl">python3 scripts/rag-demo/ingest.py</span></span></code></pre></div><ul>
<li><code>cd</code>：切到 repo 根、relative path 對得到。</li>
<li><code>python3 ingest.py</code>：跑索引、預設讀 <code>content/llm/</code>、寫 <code>scripts/rag-demo/index.pkl</code>。每次 corpus 變動才需要重跑、不變的話 index 就一直用。</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 查詢（任意次）</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/rag-demo/query.py --show-retrieved <span class="s2">&#34;你的問題&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">python3 scripts/rag-demo/query.py --top-k <span class="m">5</span> --model gemma3:1b <span class="s2">&#34;問題&#34;</span></span></span></code></pre></div><ul>
<li><code>--show-retrieved</code>：教學 / debug 用、列 retrieved chunks 跟 score 到 stderr。</li>
<li><code>--top-k 5</code>：取 top 5 instead of 預設 4。chunks 越多 context 越長、TTFT 越久、但訊息越完整。</li>
<li><code>--model gemma3:1b</code>：指定 chat model。換 <code>gemma3:4b</code>、<code>gemma4:31b-coding-mtp-bf16</code> 等 generation 品質會大幅改善。</li>
</ul>
<p>完整 source 在 <code>scripts/rag-demo/</code> 下、200 行 Python、無外部 dependency。</p>
<p>跟其他 hands-on 章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、把 retrieval 包成 MCP server 暴露給 LLM application 見 <a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo</a>、RAG + MCP 同跑的記憶體 / 程序預算見 <a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG + MCP resource footprint</a>、術語見 <a href="/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG</a> 跟 <a href="/blog/llm/knowledge-cards/embedding-model/" data-link-title="Embedding Model" data-link-desc="把文字轉成向量的模型：用於 codebase 索引與語意搜尋">embedding model</a>。</p>
]]></content:encoded></item><item><title>Hands-on：用 blog content 寫一個最小 MCP server</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/mcp-demo/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/mcp-demo/</guid><description>&lt;p>本篇把 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/application-protocols/" data-link-title="4.6 應用層協議：function calling / structured output / MCP" data-link-desc="三個常被混為一談的概念：模型能力、sampling 約束、server 協議，三者的層級差異與組合方式">4.6 應用層協議&lt;/a> 的 MCP 概念落到一個可跑的最小實作：用 stdio JSON-RPC 暴露兩個 tool（&lt;code>search_blog&lt;/code>、&lt;code>read_chunk&lt;/code>）、客戶端 spawn server 跟它對話、驗證 protocol initialize / tools/list / tools/call / error 四個基本流程。實作刻意只用 Python stdlib、不依賴 MCP SDK、為的是把 wire protocol 看清楚、跟 4.3 的「server 協議層」framing 對應。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：Python 3.11+、stdlib only（json / subprocess / urllib）
&lt;strong>依賴&lt;/strong>：RAG demo 的 &lt;code>index.pkl&lt;/code>（&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">見 RAG demo&lt;/a>）
&lt;strong>協議版本&lt;/strong>：MCP &lt;code>2025-03-26&lt;/code>&lt;/p>&lt;/blockquote>
&lt;h2 id="mcp-是什麼層的東西">MCP 是什麼層的東西&lt;/h2>
&lt;p>回顧 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/application-protocols/" data-link-title="4.6 應用層協議：function calling / structured output / MCP" data-link-desc="三個常被混為一談的概念：模型能力、sampling 約束、server 協議，三者的層級差異與組合方式">4.6 應用層協議&lt;/a> 的層級劃分：&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Function calling&lt;/strong>：模型訓練建立的能力（模型層）。&lt;/li>
&lt;li>&lt;strong>Structured output&lt;/strong>：sampling 階段約束（推論層）。&lt;/li>
&lt;li>&lt;strong>MCP&lt;/strong>：LLM application ↔ 外部 tool server 的協議（架構層）。&lt;/li>
&lt;/ul>
&lt;p>MCP 不管「模型怎麼呼叫工具」、它管「工具怎麼被暴露給 application」。本 demo 寫的是 server 端：server 不知道是哪個 LLM 在用它、不假設客戶端用 function calling 還是 structured output、它只專注「把 tool 透過 JSON-RPC 暴露出去」。&lt;/p>
&lt;p>這跟 &lt;a href="https://tarrragon.github.io/blog/llm/00-foundations/openai-compatible-api/" data-link-title="0.3 OpenAI 相容 API" data-link-desc="為什麼幾乎所有本地 LLM 工具不用改就能切到本地：背後是同一套 API 形狀">OpenAI 相容 API&lt;/a> 的設計哲學一致：定義最小可用標準、讓生態繞著標準長。&lt;/p>
&lt;h2 id="前置設定">前置設定&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>項目&lt;/th>
 &lt;th>來源&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Ollama + &lt;code>nomic-embed-text&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &amp;#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>RAG index（&lt;code>index.pkl&lt;/code>）&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">RAG demo&lt;/a> 跑過 &lt;code>ingest.py&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Python&lt;/td>
 &lt;td>3.11+&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>不需要安裝 MCP SDK——本 demo 手寫 JSON-RPC 處理、為了 inspection 透明度。Production server 建議改用 &lt;a href="https://github.com/modelcontextprotocol">官方 SDK&lt;/a>（Python / TypeScript 都有）、處理 framing、capability negotiation、transport edge cases。&lt;/p></description><content:encoded><![CDATA[<p>本篇把 <a href="/blog/llm/04-applications/application-protocols/" data-link-title="4.6 應用層協議：function calling / structured output / MCP" data-link-desc="三個常被混為一談的概念：模型能力、sampling 約束、server 協議，三者的層級差異與組合方式">4.6 應用層協議</a> 的 MCP 概念落到一個可跑的最小實作：用 stdio JSON-RPC 暴露兩個 tool（<code>search_blog</code>、<code>read_chunk</code>）、客戶端 spawn server 跟它對話、驗證 protocol initialize / tools/list / tools/call / error 四個基本流程。實作刻意只用 Python stdlib、不依賴 MCP SDK、為的是把 wire protocol 看清楚、跟 4.3 的「server 協議層」framing 對應。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：Python 3.11+、stdlib only（json / subprocess / urllib）
<strong>依賴</strong>：RAG demo 的 <code>index.pkl</code>（<a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">見 RAG demo</a>）
<strong>協議版本</strong>：MCP <code>2025-03-26</code></p></blockquote>
<h2 id="mcp-是什麼層的東西">MCP 是什麼層的東西</h2>
<p>回顧 <a href="/blog/llm/04-applications/application-protocols/" data-link-title="4.6 應用層協議：function calling / structured output / MCP" data-link-desc="三個常被混為一談的概念：模型能力、sampling 約束、server 協議，三者的層級差異與組合方式">4.6 應用層協議</a> 的層級劃分：</p>
<ul>
<li><strong>Function calling</strong>：模型訓練建立的能力（模型層）。</li>
<li><strong>Structured output</strong>：sampling 階段約束（推論層）。</li>
<li><strong>MCP</strong>：LLM application ↔ 外部 tool server 的協議（架構層）。</li>
</ul>
<p>MCP 不管「模型怎麼呼叫工具」、它管「工具怎麼被暴露給 application」。本 demo 寫的是 server 端：server 不知道是哪個 LLM 在用它、不假設客戶端用 function calling 還是 structured output、它只專注「把 tool 透過 JSON-RPC 暴露出去」。</p>
<p>這跟 <a href="/blog/llm/00-foundations/openai-compatible-api/" data-link-title="0.3 OpenAI 相容 API" data-link-desc="為什麼幾乎所有本地 LLM 工具不用改就能切到本地：背後是同一套 API 形狀">OpenAI 相容 API</a> 的設計哲學一致：定義最小可用標準、讓生態繞著標準長。</p>
<h2 id="前置設定">前置設定</h2>
<table>
  <thead>
      <tr>
          <th>項目</th>
          <th>來源</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ollama + <code>nomic-embed-text</code></td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝</a></td>
      </tr>
      <tr>
          <td>RAG index（<code>index.pkl</code>）</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a> 跑過 <code>ingest.py</code></td>
      </tr>
      <tr>
          <td>Python</td>
          <td>3.11+</td>
      </tr>
  </tbody>
</table>
<p>不需要安裝 MCP SDK——本 demo 手寫 JSON-RPC 處理、為了 inspection 透明度。Production server 建議改用 <a href="https://github.com/modelcontextprotocol">官方 SDK</a>（Python / TypeScript 都有）、處理 framing、capability negotiation、transport edge cases。</p>
<h2 id="mcp-協議的最小子集">MCP 協議的最小子集</h2>
<p><a href="/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP server</a> 要 handle 的核心 method：</p>
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>角色</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>initialize</code></td>
          <td>Client 跟 server 握手、交換 protocol version + capability</td>
      </tr>
      <tr>
          <td><code>notifications/initialized</code></td>
          <td>Client 通知 handshake 完成（notification、無 response）</td>
      </tr>
      <tr>
          <td><code>tools/list</code></td>
          <td>Client 問 server 有哪些 tool</td>
      </tr>
      <tr>
          <td><code>tools/call</code></td>
          <td>Client 呼叫某 tool、傳 arguments</td>
      </tr>
  </tbody>
</table>
<p>四個 method 之外、還可以暴露 resources / prompts / sampling、本 demo 只做 tools。</p>
<h2 id="server-實作">Server 實作</h2>
<p>完整檔案：<code>scripts/mcp-demo/blog_mcp_server.py</code>、約 150 行。</p>
<h3 id="主迴圈讀-stdin分派-method寫-stdout">主迴圈：讀 stdin、分派 method、寫 stdout</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[blog-mcp-demo] starting, index=</span><span class="si">{</span><span class="n">INDEX_PATH</span><span class="si">}</span><span class="s2">, tools=</span><span class="si">{</span><span class="nb">list</span><span class="p">(</span><span class="n">TOOLS</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdin</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="n">line</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="k">if</span> <span class="ow">not</span> <span class="n">line</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">            <span class="k">continue</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">            <span class="n">msg</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">except</span> <span class="n">json</span><span class="o">.</span><span class="n">JSONDecodeError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;  parse error: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="k">continue</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="n">method</span> <span class="o">=</span> <span class="n">msg</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;method&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="n">rid</span> <span class="o">=</span> <span class="n">msg</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;id&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="n">params</span> <span class="o">=</span> <span class="n">msg</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;params&#34;</span><span class="p">,</span> <span class="p">{})</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;  → </span><span class="si">{</span><span class="n">method</span><span class="si">}</span><span class="s2"> (id=</span><span class="si">{</span><span class="n">rid</span><span class="si">}</span><span class="s2">)&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">if</span> <span class="n">method</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">HANDLERS</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">            <span class="n">respond</span><span class="p">(</span><span class="n">rid</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;code&#34;</span><span class="p">:</span> <span class="o">-</span><span class="mi">32601</span><span class="p">,</span> <span class="s2">&#34;message&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;Method not found: </span><span class="si">{</span><span class="n">method</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">})</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">            <span class="k">continue</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="n">handler</span> <span class="o">=</span> <span class="n">HANDLERS</span><span class="p">[</span><span class="n">method</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="k">if</span> <span class="n">handler</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">            <span class="k">continue</span>  <span class="c1"># notification, no response expected</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">            <span class="n">result</span> <span class="o">=</span> <span class="n">handler</span><span class="p">(</span><span class="n">params</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="n">respond</span><span class="p">(</span><span class="n">rid</span><span class="p">,</span> <span class="n">result</span><span class="o">=</span><span class="n">result</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">        <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">            <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;  ✗ handler error: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">            <span class="n">respond</span><span class="p">(</span><span class="n">rid</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;code&#34;</span><span class="p">:</span> <span class="o">-</span><span class="mi">32000</span><span class="p">,</span> <span class="s2">&#34;message&#34;</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">)})</span></span></span></code></pre></div><p><strong>每段做什麼</strong>：</p>
<ol>
<li><strong><code>log(...)</code> 開機訊息</strong>：印到 stderr（不是 stdout）、讓人類能看到 server 啟動了、什麼 tools 可用。stdout 完全保留給 JSON-RPC 用。</li>
<li><strong><code>for line in sys.stdin</code></strong>：MCP 的 stdio transport 是 line-delimited JSON—— 每個 message 一行、<code>\n</code> 結束。Python 的 file iteration 自動按行切。</li>
<li><strong><code>line.strip()</code> + <code>if not line</code></strong>：空行 skip（不是 protocol error、只是 idle）。</li>
<li><strong><code>json.loads(line)</code></strong> with <code>try / except</code>：parse 失敗（malformed input）不 crash、log error 繼續下一行。Protocol 訊息該是合法 JSON、parse error 表示 client 出錯。</li>
<li><strong><code>msg.get(&quot;method&quot;)</code> / <code>msg.get(&quot;id&quot;)</code> / <code>msg.get(&quot;params&quot;, {})</code></strong>：JSON-RPC 2.0 標準三個欄位。<code>get</code> 而不是 <code>[]</code>、避免 KeyError；params 預設空 dict、後面 handler 可以安全 <code>.get(&quot;xxx&quot;)</code>。</li>
<li><strong><code>if method not in HANDLERS: respond(rid, error={&quot;code&quot;: -32601, ...})</code></strong>：未知 method 回標準 JSON-RPC error <code>-32601</code>（Method not found）。Client 知道這個 method 不能用、但 server 不死。</li>
<li><strong><code>if handler is None: continue</code></strong>：notification（如 <code>notifications/initialized</code>）對應的 handler 是 <code>None</code>、不該回 response。</li>
<li><strong><code>try: result = handler(params); respond(rid, result=result)</code></strong>：呼叫 handler、把結果回給 client。</li>
<li><strong><code>except Exception as e: ... respond(rid, error={&quot;code&quot;: -32000, ...})</code></strong>：handler 內部錯誤回 <code>-32000</code>（generic server error）。確保 server 任何時候都不 crash、即使工具 bug 也讓 client 拿到 error response。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼用 line-delimited JSON、不是 length-prefixed</strong>：MCP spec 規定 stdio transport 是 newline-delimited。length-prefixed 是 LSP 的做法、解析複雜（要先讀 Content-Length header 再讀 N bytes）；newline-delimited 用 <code>for line in sys.stdin</code> 一行解決。</li>
<li><strong>為什麼 stderr 不能寫 stdout</strong>：stdio transport 的 invariant——stdout 是 protocol channel、只能寫 JSON-RPC message。任何 stray print() / debug output 進 stdout、會被 client parse JSON 時炸（「multiple JSON values on one line」或 invalid JSON）。所有 log / debug / progress message 必須走 stderr。寫錯這條 server 看起來不工作、debug 很久才找到。</li>
<li><strong>為什麼 dispatch 用 dict-of-handlers 而不是 if/elif chain</strong>：擴充性。加新 method 只要往 <code>HANDLERS</code> dict 加一項、不用改 main loop。也讓 dispatch logic 跟 method 實作分離、容易測試。</li>
<li><strong>為什麼每個 handler 都用 try/except 包</strong>：「single point of failure」設計——任何 handler 例外不影響其他 method。Server 應該是 long-running daemon、不能因為一個 tool bug 死掉。</li>
<li><strong>為什麼 errors 用 JSON-RPC error code 而不是 HTTP-style status</strong>：JSON-RPC 2.0 標準。<code>-32700</code> parse error、<code>-32600</code> invalid request、<code>-32601</code> method not found、<code>-32602</code> invalid params、<code>-32603</code> internal error、<code>-32000</code> to <code>-32099</code> 留給應用層自訂。</li>
</ul>
<h3 id="工具search_blog">工具：search_blog</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">tool_search_blog</span><span class="p">(</span><span class="n">query</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">top_k</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">5</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">records</span> <span class="o">=</span> <span class="n">load_index</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">q_vec</span> <span class="o">=</span> <span class="n">embed</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">scored</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="p">((</span><span class="n">cosine</span><span class="p">(</span><span class="n">q_vec</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;embedding&#34;</span><span class="p">]),</span> <span class="n">r</span><span class="p">)</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">records</span><span class="p">),</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">)[:</span><span class="n">top_k</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">results</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="s2">&#34;source&#34;</span><span class="p">:</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;source&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="s2">&#34;chunk_index&#34;</span><span class="p">:</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;chunk_index&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="s2">&#34;score&#34;</span><span class="p">:</span> <span class="nb">round</span><span class="p">(</span><span class="n">score</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">            <span class="s2">&#34;preview&#34;</span><span class="p">:</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;text&#34;</span><span class="p">][:</span><span class="mi">160</span><span class="p">]</span> <span class="o">+</span> <span class="p">(</span><span class="s2">&#34;...&#34;</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s2">&#34;text&#34;</span><span class="p">])</span> <span class="o">&gt;</span> <span class="mi">160</span> <span class="k">else</span> <span class="s2">&#34;&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">for</span> <span class="n">score</span><span class="p">,</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">scored</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="k">return</span> <span class="p">{</span><span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;text&#34;</span><span class="p">,</span> <span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">ensure_ascii</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">)}]}</span></span></span></code></pre></div><p><strong>每段做什麼</strong>：</p>
<ol>
<li><strong><code>records = load_index()</code></strong>：lazy load <code>index.pkl</code>、第一次 call 載入記憶體、後續直接用 cached。Server 啟動時 lazy load 而不是 import 時 load、讓 server 即使在 Ollama 還沒起 / index 不存在時也能 boot（之後 call 才會報 error）。</li>
<li><strong><code>q_vec = embed(query)</code></strong>：把 query 轉成 768 維向量、呼叫 Ollama embedding API、跟 RAG demo 的 <code>embed</code> 是同一個 function。</li>
<li><strong><code>sorted((...) for r in records, key=lambda x: x[0], reverse=True)[:top_k]</code></strong>：generator expression + sorted 一次完成「算分 → 排序 → 取 top-K」。</li>
<li><strong><code>results = [{...} for score, r in scored]</code></strong>：把 top-K 整理成 client 友善的 dict 結構、含 source、chunk_index、score、preview（前 160 字 + 省略號）。</li>
<li><strong><code>{&quot;content&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: json.dumps(...)}]}</code></strong>：MCP <code>tools/call</code> 標準 response 格式——<code>content</code> 是 array、每個元素 type + payload。<code>type: &quot;text&quot;</code> 是文字 content、<code>text</code> 是實際內容（這裡是 JSON 字串、讓 LLM 可以 parse）。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 generator expression 而非 list comprehension</strong>：<code>(... for r in records)</code> 是 generator、<code>sorted</code> 直接消費、不會在記憶體中建中間 list。對 463 records 影響不大、但展現 memory-efficient pattern。</li>
<li><strong>為什麼 preview 切到 160 字</strong>：兩件事的平衡——讓 LLM 看到的 search result 短（不淹沒 LLM 的 context）、但夠判讀（160 中文字約 80 token、能看出 chunk 是不是相關）。如果 LLM 要完整內容、再 call <code>read_chunk</code>。</li>
<li><strong>為什麼回傳 JSON 字串、不是 nested object</strong>：MCP <code>content</code> 規定每個 element 是 <code>{type, payload}</code>、<code>type: &quot;text&quot;</code> 的 <code>text</code> 必須是 string、不能直接放 nested object。要傳結構化資料、就把它 <code>json.dumps</code> 成字串。LLM 看到後可以自己 parse。</li>
<li><strong>為什麼 <code>ensure_ascii=False</code></strong>：預設 <code>json.dumps</code> 把非 ASCII 字元（如中文）轉成 <code>\uXXXX</code>、難讀。<code>ensure_ascii=False</code> 直接輸出 UTF-8、LLM 也能直接讀懂、節省 token 數（一個中文字 1 token vs 6 token 的 <code>中</code>）。</li>
<li><strong>為什麼 <code>round(score, 4)</code></strong>：score 是 float、原始可能是 <code>0.7497284598827362</code>、長且無意義。<code>round(score, 4)</code> 保留 4 位小數、<code>0.7497</code>、夠精確、wire size 短。</li>
</ul>
<h3 id="工具read_chunk">工具：read_chunk</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">def</span> <span class="nf">tool_read_chunk</span><span class="p">(</span><span class="n">source</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">chunk_index</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="n">records</span> <span class="o">=</span> <span class="n">load_index</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">records</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="k">if</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;source&#34;</span><span class="p">]</span> <span class="o">==</span> <span class="n">source</span> <span class="ow">and</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;chunk_index&#34;</span><span class="p">]</span> <span class="o">==</span> <span class="n">chunk_index</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">            <span class="k">return</span> <span class="p">{</span><span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;text&#34;</span><span class="p">,</span> <span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;text&#34;</span><span class="p">]}]}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">        <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;text&#34;</span><span class="p">,</span> <span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;Not found: </span><span class="si">{</span><span class="n">source</span><span class="si">}</span><span class="s2">#chunk</span><span class="si">{</span><span class="n">chunk_index</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">}],</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">        <span class="s2">&#34;isError&#34;</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl">    <span class="p">}</span></span></span></code></pre></div><p><strong>每段做什麼</strong>：</p>
<ol>
<li><strong><code>for r in records: if r[&quot;source&quot;] == source and r[&quot;chunk_index&quot;] == chunk_index: return ...</code></strong>：linear scan 找匹配的 record、找到回完整 text。</li>
<li><strong>找不到時 <code>return {... &quot;isError&quot;: True}</code></strong>：MCP 標準的「tool 內部失敗」訊號。<code>isError: True</code> 告訴 client「這個 tool call 失敗了」、<code>content</code> 內是 human-readable error message。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 linear scan 而不是 dict lookup</strong>：可以改用 <code>{(source, chunk_index): record}</code> dict 變 O(1)。但 463 records 的 linear scan 是 &lt; 1ms、optimize 不值得。Production 跟 vector DB 整合時、retrieval 系統自帶 indexing。</li>
<li><strong>為什麼 <code>isError: True</code> 而不是 JSON-RPC error</strong>：分兩種錯誤：
<ul>
<li><strong>Protocol error</strong>：method 不存在、params 不合法、JSON parse 失敗——回 JSON-RPC <code>error</code> 物件。</li>
<li><strong>Tool semantic error</strong>：method OK、params OK、但 tool 邏輯上不能 complete（找不到資料、外部 service down）——回 normal response 加 <code>isError: True</code>。
MCP 設計這層分離、讓 client / LLM 區分「我做錯了」（協議層）跟「資料不存在」（語意層）。Production 設計工具時要仔細區分。</li>
</ul>
</li>
</ul>
<h3 id="tool-描述用-json-schema">Tool 描述用 JSON Schema</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">TOOLS</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="s2">&#34;search_blog&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="s2">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Semantic search over blog content. Returns top-K relevant chunks with source paths.&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="s2">&#34;inputSchema&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">            <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">            <span class="s2">&#34;properties&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">                <span class="s2">&#34;query&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span><span class="p">,</span> <span class="s2">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Natural language query&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">                <span class="s2">&#34;top_k&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;integer&#34;</span><span class="p">,</span> <span class="s2">&#34;default&#34;</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="s2">&#34;minimum&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">&#34;maximum&#34;</span><span class="p">:</span> <span class="mi">20</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="s2">&#34;required&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;query&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="s2">&#34;fn&#34;</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">args</span><span class="p">:</span> <span class="n">tool_search_blog</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="s2">&#34;query&#34;</span><span class="p">],</span> <span class="n">args</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;top_k&#34;</span><span class="p">,</span> <span class="mi">5</span><span class="p">)),</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="s2">&#34;read_chunk&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="s2">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Read the full text of a specific chunk by source path and chunk index.&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="s2">&#34;inputSchema&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">            <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">            <span class="s2">&#34;properties&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">                <span class="s2">&#34;source&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span><span class="p">,</span> <span class="s2">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Markdown file path relative to content/&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">                <span class="s2">&#34;chunk_index&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;integer&#34;</span><span class="p">,</span> <span class="s2">&#34;minimum&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">            <span class="s2">&#34;required&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;source&#34;</span><span class="p">,</span> <span class="s2">&#34;chunk_index&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">        <span class="s2">&#34;fn&#34;</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">args</span><span class="p">:</span> <span class="n">tool_read_chunk</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="s2">&#34;source&#34;</span><span class="p">],</span> <span class="n">args</span><span class="p">[</span><span class="s2">&#34;chunk_index&#34;</span><span class="p">]),</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><strong>每個 field 角色</strong>：</p>
<ol>
<li><strong><code>description</code></strong>：給 LLM 看的、解釋這個 tool 解什麼問題。LLM 看 description 決定何時 call。<strong>這是模型 follow tool 的最主要訊號</strong>——寫得清晰具體、模型用得對。</li>
<li><strong><code>inputSchema</code></strong>：JSON Schema、描述 tool 接受的參數結構。LLM application 用這個 schema 約束 LLM 生成「合法的呼叫」。</li>
<li><strong><code>properties</code></strong>：每個參數的型別 + 約束。</li>
<li><strong><code>required</code></strong>：必填參數清單。LLM 漏掉時、client 端可以 reject、不會浪費 round-trip。</li>
<li><strong><code>default</code></strong>：可選參數的預設值。傳的時候不給、tool 就用 default。</li>
<li><strong><code>minimum</code> / <code>maximum</code></strong>：數值約束。<code>top_k</code> 設 1-20 是因為 &lt; 1 沒意義、&gt; 20 浪費 retrieval。</li>
<li><strong><code>fn</code></strong>：實際 dispatch 用的 callable。本 demo 用 lambda 把 <code>args</code> dict 轉成 positional / keyword call。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 description 要具體</strong>：LLM 看 description 決定 call 時機。「search the blog」對 LLM 來說太模糊（搜什麼？找什麼？）、改成「Semantic search over blog content. Returns top-K relevant chunks with source paths.」明確描述輸入跟輸出形狀、LLM 能判讀「使用者問技術問題時該 call 這個」。</li>
<li><strong>為什麼 schema 用 JSON Schema、不是自訂格式</strong>：JSON Schema 是 web 標準、所有 LLM application 都認識、跨 framework 可移植。也是 <a href="/blog/llm/knowledge-cards/function-calling/" data-link-title="Function Calling" data-link-desc="模型訓練階段建立的「呼叫工具」能力：知道何時該呼叫、傳什麼參數">function calling</a> 跟 <a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">Tool use 原理</a> 的 schema 描述語言。</li>
<li><strong>為什麼 <code>required</code> 跟 <code>default</code> 兩個機制</strong>：對 LLM 看的 prompt 越清楚越好。<code>required</code> 告訴 LLM「不傳這個會錯」、<code>default</code> 告訴 LLM「可不傳、預設值是 X」。沒分清的話、LLM 可能總是傳所有參數、雜訊多。</li>
<li><strong>為什麼 <code>fn</code> 用 lambda 包</strong>：實際 tool function 是 positional args、但 client 送的是 dict。lambda 把 dict 拆成 function call 的 args。也方便將來如果 tool function signature 變、只要改 lambda 不用改 dispatcher。</li>
</ul>
<h2 id="client-實作測試用">Client 實作（測試用）</h2>
<p>完整檔案：<code>scripts/mcp-demo/test_client.py</code>。實際 production 用 Claude Desktop / Cursor 等 MCP-capable application。本 demo 寫一個 stdio client、模擬 application 行為：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">proc</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="p">[</span><span class="n">sys</span><span class="o">.</span><span class="n">executable</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">SERVER</span><span class="p">)],</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">stdin</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">stdout</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">stderr</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="n">text</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="n">bufsize</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">def</span> <span class="nf">send</span><span class="p">(</span><span class="n">method</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">rid</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">msg</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&#34;jsonrpc&#34;</span><span class="p">:</span> <span class="s2">&#34;2.0&#34;</span><span class="p">,</span> <span class="s2">&#34;method&#34;</span><span class="p">:</span> <span class="n">method</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">if</span> <span class="n">params</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="n">msg</span><span class="p">[</span><span class="s2">&#34;params&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">params</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">if</span> <span class="n">rid</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="n">msg</span><span class="p">[</span><span class="s2">&#34;id&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">rid</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span> <span class="o">+</span> <span class="s2">&#34;</span><span class="se">\n</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="k">if</span> <span class="n">rid</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="k">return</span> <span class="kc">None</span>  <span class="c1"># notification</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">line</span> <span class="o">=</span> <span class="n">proc</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">line</span><span class="p">)</span></span></span></code></pre></div><p><strong>每個參數做什麼</strong>：</p>
<ol>
<li><strong><code>subprocess.Popen([sys.executable, str(SERVER)], ...)</code></strong>：spawn server 當 child process。用 <code>sys.executable</code> 確保用同一個 Python interpreter（避免 venv 跟系統 Python 混用）。</li>
<li><strong><code>stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE</code></strong>：三條 pipe 都接到 client、讓我們能讀寫 server 的 stdio。</li>
<li><strong><code>text=True</code></strong>：自動處理 str ↔ bytes 編碼、直接讀寫字串、不用手動 encode/decode。預設是 binary mode。</li>
<li><strong><code>bufsize=1</code></strong>：line buffering、每寫一行就 flush。沒這個的話、Python 預設 block buffering（4KB 才 flush）、client 寫的 message server 看不到、整個卡住。</li>
<li><strong><code>proc.stdin.write(json.dumps(msg) + &quot;\n&quot;)</code></strong>：寫 JSON 訊息、結尾加 <code>\n</code>（line-delimited）。</li>
<li><strong><code>proc.stdin.flush()</code></strong>：強制立刻送出。即使有 <code>bufsize=1</code>、明確 flush 是好習慣、避免任何 buffer 累積。</li>
<li><strong><code>if rid is None: return None</code></strong>：notification 不該等 response。</li>
<li><strong><code>line = proc.stdout.readline()</code> + <code>json.loads(line)</code></strong>：讀一行 response、parse。</li>
</ol>
<p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong>為什麼 stdio 而不是 socket / HTTP</strong>：MCP stdio transport 的主要場景是「application spawn server」(Claude Desktop 開 Python 進程當 MCP server)。Stdio 自然形成 1-to-1 ownership、不需要 port allocation、不需要 auth。HTTP transport 也存在、用在 multi-client 場景。</li>
<li><strong>為什麼 <code>bufsize=1</code> 這麼關鍵</strong>：Python 預設 stdio buffer 4KB。如果 server / client 任一邊寫了 short message 但沒 fill 4KB、message 不會被另一邊看到、protocol 卡死。看起來是 hang、debug 困難。<code>bufsize=1</code> 強制 line buffering、解決這個 deadlock。</li>
<li><strong>為什麼 <code>text=True</code></strong>：JSON-RPC 都是文字、binary mode 要手動 <code>.encode()</code> / <code>.decode()</code>、增加複雜度。<code>text=True</code> 自動處理 UTF-8。</li>
</ul>
<h2 id="跑通整條流程">跑通整條流程</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/blog
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/mcp-demo/test_client.py</span></span></code></pre></div><ul>
<li><code>cd ~/Projects/blog</code>：切到 repo 根、讓 SERVER 路徑相對解析正確。</li>
<li><code>python3 scripts/mcp-demo/test_client.py</code>：跑 test client、它會 spawn server 跟它對話。</li>
</ul>
<p>預期看到五個階段：</p>
<h3 id="1-initialize握手">1. initialize（握手）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="err">===</span> <span class="mi">1</span><span class="err">.</span> <span class="err">initialize</span> <span class="err">===</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;jsonrpc&#34;</span><span class="p">:</span> <span class="s2">&#34;2.0&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="nt">&#34;id&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="nt">&#34;result&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nt">&#34;protocolVersion&#34;</span><span class="p">:</span> <span class="s2">&#34;2025-03-26&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nt">&#34;capabilities&#34;</span><span class="p">:</span> <span class="p">{</span><span class="nt">&#34;tools&#34;</span><span class="p">:</span> <span class="p">{}},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nt">&#34;serverInfo&#34;</span><span class="p">:</span> <span class="p">{</span><span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;blog-mcp-demo&#34;</span><span class="p">,</span> <span class="nt">&#34;version&#34;</span><span class="p">:</span> <span class="s2">&#34;0.1.0&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><strong>Protocol 意義</strong>：</p>
<ul>
<li><code>protocolVersion</code>：server 支援的 MCP 版本。Client 要 negotiate（自己 cap 較新時要 downgrade）。</li>
<li><code>capabilities.tools: {}</code>：server 宣告「我支援 tools 功能」、空 object 表示沒額外 sub-feature。Client 拿到後知道可以 call <code>tools/list</code>。</li>
<li><code>serverInfo</code>：server 識別資訊、給 client 顯示用（debug、logging）。</li>
<li><code>id: 1</code>：對應 client 送的 request id、讓 client 知道這個 response 是哪個 request 的。</li>
</ul>
<h3 id="2-toolslist">2. tools/list</h3>
<p>Server 回兩個 tool 的完整 schema：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;tools&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">      <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;search_blog&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      <span class="nt">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Semantic search over blog content...&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="nt">&#34;inputSchema&#34;</span><span class="p">:</span> <span class="p">{</span><span class="err">...JSON</span> <span class="err">Schema...</span><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">      <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;read_chunk&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">      <span class="nt">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Read the full text of a specific chunk...&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">      <span class="nt">&#34;inputSchema&#34;</span><span class="p">:</span> <span class="p">{</span><span class="err">...</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">  <span class="p">]</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><strong>Protocol 意義</strong>：這個輸出就是 LLM application 會塞給 LLM 的 tool 描述。LLM application 把這份 schema 用 <a href="/blog/llm/knowledge-cards/function-calling/" data-link-title="Function Calling" data-link-desc="模型訓練階段建立的「呼叫工具」能力：知道何時該呼叫、傳什麼參數">function calling</a> 機制給模型看、模型決定何時呼叫、傳什麼參數。Server 跟模型之間靠這層 schema 對齊、模型不直接呼叫 server、是經 application 中介。</p>
<h3 id="3-toolscall-search_blog">3. tools/call: search_blog</h3>
<p>Client 送：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;method&#34;</span><span class="p">:</span> <span class="s2">&#34;tools/call&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nt">&#34;params&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;search_blog&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="nt">&#34;arguments&#34;</span><span class="p">:</span> <span class="p">{</span><span class="nt">&#34;query&#34;</span><span class="p">:</span> <span class="s2">&#34;什麼是 KV cache？&#34;</span><span class="p">,</span> <span class="nt">&#34;top_k&#34;</span><span class="p">:</span> <span class="mi">3</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="p">},</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">  <span class="nt">&#34;id&#34;</span><span class="p">:</span> <span class="mi">3</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>params</code> 包兩件事：</p>
<ul>
<li><code>name</code>：要 call 的 tool 名（matches <code>tools/list</code> 內某個 tool）。</li>
<li><code>arguments</code>：實際傳給 tool 的 dict、結構符合該 tool 的 <code>inputSchema</code>。</li>
</ul>
<p>Server 回 cosine 搜尋結果（preview）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">[</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="p">{</span><span class="nt">&#34;source&#34;</span><span class="p">:</span> <span class="s2">&#34;llm/00-foundations/hardware-memory-budget.md&#34;</span><span class="p">,</span> <span class="nt">&#34;chunk_index&#34;</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="nt">&#34;score&#34;</span><span class="p">:</span> <span class="mf">0.7497</span><span class="p">,</span> <span class="nt">&#34;preview&#34;</span><span class="p">:</span> <span class="s2">&#34;| Context 長度 | KV cache 估算...&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="p">{</span><span class="nt">&#34;source&#34;</span><span class="p">:</span> <span class="s2">&#34;llm/00-foundations/why-llm-feels-slow.md&#34;</span><span class="p">,</span> <span class="nt">&#34;chunk_index&#34;</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="nt">&#34;score&#34;</span><span class="p">:</span> <span class="mf">0.7212</span><span class="p">,</span> <span class="nt">&#34;preview&#34;</span><span class="p">:</span> <span class="s2">&#34;...&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  <span class="p">{</span><span class="nt">&#34;source&#34;</span><span class="p">:</span> <span class="s2">&#34;llm/03-theoretical-foundations/attention-mechanism.md&#34;</span><span class="p">,</span> <span class="nt">&#34;chunk_index&#34;</span><span class="p">:</span> <span class="mi">7</span><span class="p">,</span> <span class="nt">&#34;score&#34;</span><span class="p">:</span> <span class="mf">0.7176</span><span class="p">,</span> <span class="nt">&#34;preview&#34;</span><span class="p">:</span> <span class="s2">&#34;...&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">]</span></span></span></code></pre></div><p>實測命中合理——KV cache 相關段落都被找到。</p>
<h3 id="4-toolscall-read_chunk">4. tools/call: read_chunk</h3>
<p>Client 用 search 拿到的 source + chunk_index、call <code>read_chunk</code> 拿完整內容：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;method&#34;</span><span class="p">:</span> <span class="s2">&#34;tools/call&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;params&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;read_chunk&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nt">&#34;arguments&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="nt">&#34;source&#34;</span><span class="p">:</span> <span class="s2">&#34;llm/00-foundations/hardware-memory-budget.md&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">      <span class="nt">&#34;chunk_index&#34;</span><span class="p">:</span> <span class="mi">5</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Server 回該 chunk 的完整 markdown 文字。這實現了「search → read」的兩段流程——避免 search 一次就把所有 chunk 完整內容塞給 LLM（context 暴炸）、讓 LLM 自己看 preview 決定要 deep dive 哪個。</p>
<h3 id="5-錯誤路徑">5. 錯誤路徑</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="err">===</span> <span class="mi">5</span><span class="err">.</span> <span class="err">unknown</span> <span class="err">method</span> <span class="err">(error</span> <span class="err">path)</span> <span class="err">===</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="p">{</span><span class="nt">&#34;jsonrpc&#34;</span><span class="p">:</span> <span class="s2">&#34;2.0&#34;</span><span class="p">,</span> <span class="nt">&#34;id&#34;</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="nt">&#34;error&#34;</span><span class="p">:</span> <span class="p">{</span><span class="nt">&#34;code&#34;</span><span class="p">:</span> <span class="mi">-32601</span><span class="p">,</span> <span class="nt">&#34;message&#34;</span><span class="p">:</span> <span class="s2">&#34;Method not found: does/not/exist&#34;</span><span class="p">}}</span></span></span></code></pre></div><p><code>-32601</code> 是 JSON-RPC 標準 error code for unknown method。Server 對未知 method 回標準 error、不 crash。Client 知道這個 method 不能用、繼續其他操作。</p>
<h2 id="跟-claude-desktop--cursor-整合">跟 Claude Desktop / Cursor 整合</h2>
<p>把這個 server 接到實際 MCP-capable application：</p>
<h3 id="claude-desktop">Claude Desktop</h3>
<p>編輯 <code>~/Library/Application Support/Claude/claude_desktop_config.json</code>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;mcpServers&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nt">&#34;blog-search&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">      <span class="nt">&#34;command&#34;</span><span class="p">:</span> <span class="s2">&#34;/path/to/python3&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">      <span class="nt">&#34;args&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;&lt;absolute-path-to-blog&gt;/scripts/mcp-demo/blog_mcp_server.py&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><strong>每個 field 做什麼</strong>：</p>
<ul>
<li><code>mcpServers</code>：MCP server 註冊表、key 是任意名稱（client 識別用）。</li>
<li><code>command</code>：spawn 用的 executable path。要寫絕對路徑、Claude Desktop 啟動時的 PATH 可能不含 <code>python3</code>。</li>
<li><code>args</code>：傳給 command 的 args list。第一個是 script path。</li>
</ul>
<p><strong>為什麼這樣設計</strong>：Claude Desktop 啟動時讀這個 config、對每個 server 用 <code>subprocess.spawn(command, args)</code> 起 child process、用 stdio 跟它對話。跟本 demo 的 <code>test_client.py</code> 做的事完全一樣、只是改成 GUI application 而已。</p>
<p>重啟 Claude Desktop 後、在對話框問「用 search_blog 找 KV cache 相關段落」、Claude 會自動 call tool 並用結果回答。</p>
<h3 id="cursor">Cursor</h3>
<p><code>.cursor/mcp.json</code>（per-project）或全域設定類似結構。具體欄位看當下版本文件。</p>
<p>兩種整合的共通點：<strong>MCP server 自己不變</strong>、只要 application 端配置 path 跟 args、整合就完成。這正是 4.3 章節 N×M → N+M 的具體展現——本 server 不為任何特定 application 客製化、就能被多個 application 接到。</p>
<h2 id="觀察跟原理對應">觀察跟原理對應</h2>
<p>回到 <a href="/blog/llm/04-applications/application-protocols/" data-link-title="4.6 應用層協議：function calling / structured output / MCP" data-link-desc="三個常被混為一談的概念：模型能力、sampling 約束、server 協議，三者的層級差異與組合方式">4.6 應用層協議</a> 的三層 framing：</p>
<table>
  <thead>
      <tr>
          <th>層級</th>
          <th>本 demo 是否實作</th>
          <th>怎麼實作</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>模型能力</td>
          <td>不在本 demo 範圍</td>
          <td>LLM application 自己決定用 GPT/Claude/Gemma</td>
      </tr>
      <tr>
          <td>Sampling 約束</td>
          <td>不在本 demo 範圍</td>
          <td>application + 推論伺服器配合</td>
      </tr>
      <tr>
          <td>Server 協議</td>
          <td><strong>本 demo 焦點</strong></td>
          <td>JSON-RPC over stdio + tools/list / tools/call</td>
      </tr>
  </tbody>
</table>
<p>這個分離正是 MCP 的核心收益：server 寫好之後、用什麼 LLM 跟它互動跟 server 無關。換掉 LLM、換掉 application、server code 完全不動。</p>
<h2 id="何時這份-demo-會過時">何時這份 demo 會過時</h2>
<ul>
<li><strong>MCP protocol version</strong>：目前用 <code>2025-03-26</code>、未來會更新、但「server 暴露 tool 給 application」的 framing 不變。</li>
<li><strong>JSON-RPC 細節</strong>：可能 transport 形式增加（HTTP / WebSocket）、stdio 不會消失。</li>
<li><strong>Tool 描述格式</strong>：JSON Schema 是 web 通用標準、不會被換掉。</li>
</ul>
<p>實作換代時、可以把手寫 JSON-RPC 換成官方 SDK、tool 內部邏輯（embedding / cosine / pickle）依需求換、但 protocol 骨架（initialize / tools/list / tools/call）會保留。</p>
<h2 id="跑這個-demo-的指令總結">跑這個 demo 的指令總結</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 前置：確認 Ollama 跑著、index.pkl 存在</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ollama list <span class="p">|</span> grep nomic-embed-text
</span></span><span class="line"><span class="ln">3</span><span class="cl">ls scripts/rag-demo/index.pkl</span></span></code></pre></div><ul>
<li><code>ollama list</code>：列已下載 model、<code>grep</code> 過濾出 embedding model。沒看到表示要先 <code>ollama pull nomic-embed-text</code>。</li>
<li><code>ls scripts/rag-demo/index.pkl</code>：確認 RAG ingest 跑過、index 存在。沒看到要先跑 <code>python3 scripts/rag-demo/ingest.py</code>。</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 自動測試 MCP server</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/mcp-demo/test_client.py</span></span></code></pre></div><ul>
<li>跑 test_client、spawn server、依序送 5 個 request 驗證 protocol。stdout 印 protocol 對話、stderr 印 server log。看到全部 5 階段 OK 就成功。</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 手動跟 server 互動（看 protocol 原始 wire format）</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/mcp-demo/blog_mcp_server.py
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># 然後手打：{&#34;jsonrpc&#34;:&#34;2.0&#34;,&#34;id&#34;:1,&#34;method&#34;:&#34;initialize&#34;,&#34;params&#34;:{}}</span></span></span></code></pre></div><ul>
<li>直接 invoke server、它讀 stdin 等 request。手打 JSON-RPC 訊息、看 server 回。是學 protocol 最直接的方式——你會看到 wire format 真實長相、跟自動 client 包裝後不一樣。</li>
</ul>
<p>完整 source 在 <code>scripts/mcp-demo/</code>、約 250 行 Python、stdlib only。</p>
<p>跟其他 hands-on 章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、本 demo 依賴的索引由 <a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a> ingest 產生、MCP + RAG 同跑的記憶體 / 程序預算見 <a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG + MCP resource footprint</a>、術語見 <a href="/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP</a>。</p>
]]></content:encoded></item><item><title>Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/permission-boundary/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/permission-boundary/</guid><description>&lt;p>「Ollama 自己改檔案要不要 sudo？」「叫它寫 &lt;code>rm -rf&lt;/code> 會直接刪嗎？」這類問題的答案來自一個根本事實：&lt;strong>LLM 是 pure function、文字進、文字出、本身沒任何 file system / shell / network 副作用&lt;/strong>。改檔案、刪檔案、發網路請求、執行 shell command——全部由 &lt;strong>wrapper 或人類&lt;/strong>做。LLM 「以為」自己做了什麼、跟實際發生什麼是兩件事。&lt;/p>
&lt;p>本篇用四組對照實驗證明這個事實、再展開 wrapper 三檔審查粒度的設計取捨。這跟 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 副作用範圍設計&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent 跟人類審查的協作模型&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流原理&lt;/a> 三個原則章節對應、實作層的權限與供應鏈判讀對應 &lt;a href="https://tarrragon.github.io/blog/llm/06-security/tool-use-permission-model/" data-link-title="6.2 tool use 與 MCP server 的權限模型" data-link-desc="個人 dev 場景下 tool use / MCP server 的副作用權限：檔案系統 / shell / 網路存取邊界、第三方 MCP 信任、副作用的可逆性">6.2 tool use 與 MCP server 的權限模型&lt;/a> 跟 &lt;a href="https://tarrragon.github.io/blog/llm/06-security/model-supply-chain-trust/" data-link-title="6.0 模型供應鏈與信任邊界" data-link-desc="個人 dev 用本地 LLM 時的模型權重來源信任：GGUF 完整性、Hugging Face / Ollama registry 信任、量化版本污染、檔案完整性檢查">6.0 模型供應鏈與信任邊界&lt;/a>。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：Ollama 0.23.2、&lt;code>gemma3:1b&lt;/code>、Python stdlib
&lt;strong>檔案位置&lt;/strong>：&lt;code>scripts/permission-demo/edit_with_llm.py&lt;/code>&lt;/p>&lt;/blockquote>
&lt;h2 id="為什麼這個問題重要">為什麼這個問題重要&lt;/h2>
&lt;p>直覺常見的誤判：&lt;/p>
&lt;ul>
&lt;li>「LLM 寫了 &lt;code>rm -rf&lt;/code> 我電腦會壞」——錯。LLM 寫指令不代表執行。&lt;/li>
&lt;li>「Ollama API 改我檔案要 sudo」——錯。Ollama API 根本碰不到檔案。&lt;/li>
&lt;li>「我跑 wrapper 就讓 LLM 改檔案、應該有 confirm 機制吧」——錯。Confirm 機制完全是 wrapper 開發者自己決定要不要寫、LLM 不知道、不在乎。&lt;/li>
&lt;/ul>
&lt;p>理解這個邊界、後續設計 LLM 應用的權限模型才有 ground truth。錯誤的 mental model 會導致兩種 failure：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>過度恐懼&lt;/strong>：因為怕 LLM「亂改」、把所有 LLM 互動關起來、放棄自動化收益。&lt;/li>
&lt;li>&lt;strong>過度信任&lt;/strong>：相信 LLM「不會做壞事」、給 wrapper 自動執行權限、結果小模型亂解 instruction 把資料毀掉。&lt;/li>
&lt;/ol>
&lt;p>實際上權限設計的判讀錨點是：&lt;strong>這個動作有沒有副作用、誰執行&lt;/strong>。LLM 永遠不執行、所以權限不在 LLM 層；wrapper 執行、所以權限完全在 wrapper 設計。&lt;/p>
&lt;h2 id="test-1直接-api-問改檔案看會發生什麼">Test 1：直接 API 問改檔案、看會發生什麼&lt;/h2>
&lt;p>挑一個檔案（token 卡片）、用 curl 送 chat completions、prompt 寫「修改這個檔案」、然後 check 檔案 mtime 跟 md5：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="c1"># 修改前 snapshot&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">stat -f &lt;span class="s2">&amp;#34;%m %N&amp;#34;&lt;/span> content/llm/knowledge-cards/token.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">md5 -q content/llm/knowledge-cards/token.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="c1"># 用 system prompt「假裝你有 file 權限」、user 直接指明路徑&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">curl -s http://localhost:11434/v1/chat/completions &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> -H &lt;span class="s2">&amp;#34;Content-Type: application/json&amp;#34;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="s1">&amp;#39;{
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s1"> &amp;#34;model&amp;#34;:&amp;#34;gemma3:1b&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s1"> &amp;#34;messages&amp;#34;:[
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s1"> {&amp;#34;role&amp;#34;:&amp;#34;system&amp;#34;,&amp;#34;content&amp;#34;:&amp;#34;You can modify files. The user provides a file. You modify it.&amp;#34;},
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s1"> {&amp;#34;role&amp;#34;:&amp;#34;user&amp;#34;,&amp;#34;content&amp;#34;:&amp;#34;Please modify /Users/.../token.md to add a sentence...&amp;#34;}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s1"> ],
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s1"> &amp;#34;stream&amp;#34;:false
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s1"> }&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="c1"># 修改後 snapshot&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">stat -f &lt;span class="s2">&amp;#34;%m %N&amp;#34;&lt;/span> content/llm/knowledge-cards/token.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">md5 -q content/llm/knowledge-cards/token.md&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>實測結果&lt;/strong>：&lt;/p></description><content:encoded><![CDATA[<p>「Ollama 自己改檔案要不要 sudo？」「叫它寫 <code>rm -rf</code> 會直接刪嗎？」這類問題的答案來自一個根本事實：<strong>LLM 是 pure function、文字進、文字出、本身沒任何 file system / shell / network 副作用</strong>。改檔案、刪檔案、發網路請求、執行 shell command——全部由 <strong>wrapper 或人類</strong>做。LLM 「以為」自己做了什麼、跟實際發生什麼是兩件事。</p>
<p>本篇用四組對照實驗證明這個事實、再展開 wrapper 三檔審查粒度的設計取捨。這跟 <a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 副作用範圍設計</a>、<a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent 跟人類審查的協作模型</a>、<a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流原理</a> 三個原則章節對應、實作層的權限與供應鏈判讀對應 <a href="/blog/llm/06-security/tool-use-permission-model/" data-link-title="6.2 tool use 與 MCP server 的權限模型" data-link-desc="個人 dev 場景下 tool use / MCP server 的副作用權限：檔案系統 / shell / 網路存取邊界、第三方 MCP 信任、副作用的可逆性">6.2 tool use 與 MCP server 的權限模型</a> 跟 <a href="/blog/llm/06-security/model-supply-chain-trust/" data-link-title="6.0 模型供應鏈與信任邊界" data-link-desc="個人 dev 用本地 LLM 時的模型權重來源信任：GGUF 完整性、Hugging Face / Ollama registry 信任、量化版本污染、檔案完整性檢查">6.0 模型供應鏈與信任邊界</a>。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：Ollama 0.23.2、<code>gemma3:1b</code>、Python stdlib
<strong>檔案位置</strong>：<code>scripts/permission-demo/edit_with_llm.py</code></p></blockquote>
<h2 id="為什麼這個問題重要">為什麼這個問題重要</h2>
<p>直覺常見的誤判：</p>
<ul>
<li>「LLM 寫了 <code>rm -rf</code> 我電腦會壞」——錯。LLM 寫指令不代表執行。</li>
<li>「Ollama API 改我檔案要 sudo」——錯。Ollama API 根本碰不到檔案。</li>
<li>「我跑 wrapper 就讓 LLM 改檔案、應該有 confirm 機制吧」——錯。Confirm 機制完全是 wrapper 開發者自己決定要不要寫、LLM 不知道、不在乎。</li>
</ul>
<p>理解這個邊界、後續設計 LLM 應用的權限模型才有 ground truth。錯誤的 mental model 會導致兩種 failure：</p>
<ol>
<li><strong>過度恐懼</strong>：因為怕 LLM「亂改」、把所有 LLM 互動關起來、放棄自動化收益。</li>
<li><strong>過度信任</strong>：相信 LLM「不會做壞事」、給 wrapper 自動執行權限、結果小模型亂解 instruction 把資料毀掉。</li>
</ol>
<p>實際上權限設計的判讀錨點是：<strong>這個動作有沒有副作用、誰執行</strong>。LLM 永遠不執行、所以權限不在 LLM 層；wrapper 執行、所以權限完全在 wrapper 設計。</p>
<h2 id="test-1直接-api-問改檔案看會發生什麼">Test 1：直接 API 問改檔案、看會發生什麼</h2>
<p>挑一個檔案（token 卡片）、用 curl 送 chat completions、prompt 寫「修改這個檔案」、然後 check 檔案 mtime 跟 md5：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 修改前 snapshot</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">stat -f <span class="s2">&#34;%m %N&#34;</span> content/llm/knowledge-cards/token.md
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">md5 -q content/llm/knowledge-cards/token.md
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 用 system prompt「假裝你有 file 權限」、user 直接指明路徑</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">curl -s http://localhost:11434/v1/chat/completions <span class="se">\
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="se"></span>  -H <span class="s2">&#34;Content-Type: application/json&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s1">    &#34;model&#34;:&#34;gemma3:1b&#34;,
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s1">    &#34;messages&#34;:[
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s1">      {&#34;role&#34;:&#34;system&#34;,&#34;content&#34;:&#34;You can modify files. The user provides a file. You modify it.&#34;},
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s1">      {&#34;role&#34;:&#34;user&#34;,&#34;content&#34;:&#34;Please modify /Users/.../token.md to add a sentence...&#34;}
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s1">    ],
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s1">    &#34;stream&#34;:false
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s1">  }&#39;</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="c1"># 修改後 snapshot</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">stat -f <span class="s2">&#34;%m %N&#34;</span> content/llm/knowledge-cards/token.md
</span></span><span class="line"><span class="ln">19</span><span class="cl">md5 -q content/llm/knowledge-cards/token.md</span></span></code></pre></div><p><strong>實測結果</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">=== Before ===
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">1778508712 content/llm/knowledge-cards/token.md
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">d9f2d822f7458af62399076a94ef20f6
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">=== LLM response ===
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">Okay, here&#39;s the modified content of `/Users/.../token.md`...
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">=== After ===
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">1778508712 content/llm/knowledge-cards/token.md  ← mtime same
</span></span><span class="line"><span class="ln">10</span><span class="cl">d9f2d822f7458af62399076a94ef20f6                  ← md5 same</span></span></code></pre></div><p>mtime 沒變、md5 沒變、檔案內容完全沒動。但 LLM 用「Okay, here&rsquo;s the modified content」這種口氣回答——它<strong>以為</strong>自己改了、實際上只生成了一段 markdown 文字。</p>
<p><strong>結論</strong>：Ollama HTTP API 是 stateless、pure function。輸入 messages、輸出 message content。整個過程沒寫進 socket 以外的任何地方。</p>
<p>為什麼會這樣設計：</p>
<ul>
<li><strong>沙箱本來就在 API 邊界</strong>：HTTP server 接 request、跑 forward pass、回 response。期間沒呼叫 <code>fs.write()</code> / <code>subprocess.run()</code> / 任何 effectful API。</li>
<li><strong><a href="/blog/llm/knowledge-cards/system-prompt/" data-link-title="System Prompt" data-link-desc="LLM application 中由開發者預設、不直接顯示給使用者的指令層、定義模型的角色、行為規範、輸出格式">system prompt</a> 不是權限授予</strong>：「You can modify files」這句話對模型來說只是文字 context、不會真的給它 file access。Prompt 是「LLM 內部的 context」、不是「runtime capability」。</li>
<li><strong>訓練資料讓 LLM 「以為」自己有能力</strong>：LLM 訓練資料含大量「使用者問問題、AI 改檔案」的範例（如 GitHub Copilot agent traces、tool-use SFT 資料）、模型學會用「我已經改了」這種語氣回答——是 mimic、不是真正的 action。</li>
</ul>
<h2 id="test-2寫-wrapper-用-dry-run-模式安全處理">Test 2：寫 wrapper 用 &ndash;dry-run 模式安全處理</h2>
<p>權限不在 LLM、在 wrapper。寫一個 100 行的 wrapper、看怎麼設計 permission gates。完整檔案：<code>scripts/permission-demo/edit_with_llm.py</code>。</p>
<p>核心 architecture：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="c1"># 1. 讀檔（wrapper 用自己的 fs 權限）</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">original</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">file</span><span class="o">.</span><span class="n">read_text</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="s2">&#34;utf-8&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="c1"># 2. 送 LLM、拿回提議的新內容</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="n">response</span> <span class="o">=</span> <span class="n">chat</span><span class="p">([</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;You modify text files. Output ONLY ...&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;File: </span><span class="si">{</span><span class="n">args</span><span class="o">.</span><span class="n">file</span><span class="si">}</span><span class="se">\n</span><span class="s2">Content:</span><span class="se">\n</span><span class="si">{</span><span class="n">original</span><span class="si">}</span><span class="se">\n</span><span class="s2">Instruction: </span><span class="si">{</span><span class="n">args</span><span class="o">.</span><span class="n">instruction</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">])</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">new_content</span> <span class="o">=</span> <span class="n">extract_code_block</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="c1"># 3. Diff（純讀、永遠 safe、不需 gate）</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">diff</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">difflib</span><span class="o">.</span><span class="n">unified_diff</span><span class="p">(</span><span class="n">original</span><span class="o">.</span><span class="n">splitlines</span><span class="p">(</span><span class="o">...</span><span class="p">),</span> <span class="n">new_content</span><span class="o">.</span><span class="n">splitlines</span><span class="p">(</span><span class="o">...</span><span class="p">)))</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">writelines</span><span class="p">(</span><span class="n">diff</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="c1"># 4. PERMISSION GATE：wrapper 決定要不要 apply</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">auto</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="n">args</span><span class="o">.</span><span class="n">file</span><span class="o">.</span><span class="n">write_text</span><span class="p">(</span><span class="n">new_content</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="k">elif</span> <span class="n">args</span><span class="o">.</span><span class="n">confirm</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="k">if</span> <span class="nb">input</span><span class="p">(</span><span class="s2">&#34;Apply? [y/N] &#34;</span><span class="p">)</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s2">&#34;y&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">            <span class="n">args</span><span class="o">.</span><span class="n">file</span><span class="o">.</span><span class="n">write_text</span><span class="p">(</span><span class="n">new_content</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">else</span><span class="p">:</span>  <span class="c1"># --dry-run，預設</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="k">pass</span>  <span class="c1"># 不寫</span></span></span></code></pre></div><p><strong>為什麼這樣設計</strong>：</p>
<ul>
<li><strong><code>extract_code_block</code></strong>：嘗試 well-formed <code>```lang\n...\n```</code> regex、失敗 fallback 到 <code>```lang\n...$</code> 寬鬆版。小模型（1B）常忘記結尾 fence、寬鬆才能用。寫嚴格 regex 失敗時直接 abort、是另一種 permission gate（不應用 = 安全）。</li>
<li><strong>永遠先印 diff</strong>：diff 是純讀操作、無副作用、永遠 safe。讓使用者先看 LLM 提議了什麼、再決定要不要 apply。</li>
<li><strong><code>args.auto</code> 在 <code>elif</code> 鏈最前面、<code>dry-run</code> 預設</strong>：強迫使用者明示 opt-in 才會寫檔。預設不寫、是「safe default」設計原則。</li>
</ul>
<p>跑 <code>--dry-run</code> 預設、看實際發生：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;把開頭第一段最後加一句『Token 是 embedding 的輸入單位』&#34;</span></span></span></code></pre></div><p>實測輸出（1B 模型）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">[+] Asking gemma3:1b to: &#39;把開頭第一段最後加一句「Token 是 embedding 的輸入單位」&#39;
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">[+] Proposed diff:
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">--- a/token.md
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">+++ b/token.md
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">@@ -6,16 +6,4 @@
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"> tags: [&#34;llm&#34;, &#34;knowledge-cards&#34;]
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"> ---
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">-Token 的核心概念是「LLM 內部處理文字的最小單位」...（整段刪除）
</span></span><span class="line"><span class="ln">10</span><span class="cl">-
</span></span><span class="line"><span class="ln">11</span><span class="cl">-## 概念位置
</span></span><span class="line"><span class="ln">12</span><span class="cl">-...（整段刪除）
</span></span><span class="line"><span class="ln">13</span><span class="cl">-...（後面所有段落都刪除）
</span></span><span class="line"><span class="ln">14</span><span class="cl">+Token 是 embedding 的輸入單位。
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">[+] --dry-run: file unchanged. Use --confirm or --auto to apply.</span></span></code></pre></div><p><strong>驚悚發現</strong>：1B 模型完全沒理解「加一句」、把整篇刪掉只剩一行。但 <code>--dry-run</code> 不寫檔、檔案安全。</p>
<p><strong>重點</strong>：</p>
<ul>
<li>LLM 行為糟、但 wrapper 設計安全、結果 OK。</li>
<li>把同樣 instruction 餵 31B+ 模型結果會合理——模型能力決定 LLM 端品質、wrapper 設計決定<strong>最差情況的後果</strong>。</li>
<li>在 wrapper 端永遠假設 LLM 會亂改、設計 safe default、是 defensive programming。</li>
</ul>
<h2 id="test-3--confirm-模式step-by-step-審查">Test 3：<code>--confirm</code> 模式、step-by-step 審查</h2>
<p><code>--confirm</code> mode 印 diff、問 y/N、user 確認才寫：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;加一句說明&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  --confirm</span></span></code></pre></div><p>互動流程：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[+] Proposed diff:
</span></span><span class="line"><span class="ln">2</span><span class="cl">--- a/token.md
</span></span><span class="line"><span class="ln">3</span><span class="cl">+++ b/token.md
</span></span><span class="line"><span class="ln">4</span><span class="cl">@@ ... 整段刪除 ...
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">[?] Apply this change to content/llm/.../token.md? [y/N] _</span></span></code></pre></div><p>使用者看 diff 發現「整篇被刪了」、按 N、檔案安全。</p>
<p><strong>這個 mode 對應的副作用範圍</strong>：<a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 工具的副作用範圍設計</a> 提的 spectrum：</p>
<table>
  <thead>
      <tr>
          <th>等級</th>
          <th>副作用</th>
          <th>適合 mode</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1</td>
          <td>純讀（grep、git status）</td>
          <td><code>--dry-run</code> 或無 gate</td>
      </tr>
      <tr>
          <td>2</td>
          <td>寫 sandbox / staging</td>
          <td><code>--dry-run</code> + 人類事後審</td>
      </tr>
      <tr>
          <td>3</td>
          <td>寫本地持久化（如 commit、edit 檔）</td>
          <td><code>--confirm</code></td>
      </tr>
      <tr>
          <td>4</td>
          <td>寫共享 / production（push、deploy）</td>
          <td><code>--confirm</code> 強制</td>
      </tr>
      <tr>
          <td>5</td>
          <td>操作真實世界（發 email、買股票）</td>
          <td><code>--confirm</code> + 額外 audit</td>
      </tr>
  </tbody>
</table>
<p>本 demo 改 markdown 是等級 3（寫本地檔）、<code>--confirm</code> 是合適粒度。改 production code 或 git push 是等級 4 / 5、<code>--confirm</code> 該強制不該 optional。</p>
<h2 id="test-4--auto-模式危險自動化">Test 4：<code>--auto</code> 模式、危險自動化</h2>
<p><code>--auto</code> 不問直接寫：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">cp /tmp/token-orig.md content/llm/knowledge-cards/token.md  <span class="c1"># 還原</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;加一句說明&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="se"></span>  --auto</span></span></code></pre></div><p>實測：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[!] --auto mode: writing without confirmation
</span></span><span class="line"><span class="ln">2</span><span class="cl">[+] wrote content/llm/knowledge-cards/token.md</span></span></code></pre></div><p>檔案內容變成：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="ln">1</span><span class="cl">---
</span></span><span class="line"><span class="ln">2</span><span class="cl">title: &#34;Token&#34;
</span></span><span class="line"><span class="ln">3</span><span class="cl">...
</span></span><span class="line"><span class="ln">4</span><span class="cl">---
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">Token 是 embedding 的輸入單位。</span></span></code></pre></div><p>整篇刪光、只剩一句。<strong>沒人 catch 到、commit + push 出去就是 production 災難</strong>。</p>
<p><strong><code>--auto</code> mode 適合什麼場景</strong>：</p>
<ul>
<li>LLM 任務範圍狹窄、可預測（如 format JSON、補 type annotation 給已有 type stub）。</li>
<li>配合 git workflow（每次 auto edit 都自動 commit、出問題 git revert）。</li>
<li>CI / batch processing、人類事後審 PR。</li>
</ul>
<p><strong><code>--auto</code> mode 不適合什麼場景</strong>：</p>
<ul>
<li>任務開放性高（「改寫這段讓它更清楚」）。</li>
<li>不可逆環境（直接寫 production DB / 發 email）。</li>
<li>用弱模型（&lt; 14B）跑、行為不穩。</li>
</ul>
<p>設計 wrapper 時、把 <code>--auto</code> 設成顯式 opt-in、預設保持 dry-run / confirm 等較保守模式。本 demo 的 mutually_exclusive 設計（<code>-g.add_mutually_exclusive_group()</code>）保證三種 mode 只能擇一、避免歧義。</p>
<h2 id="test-5llm-寫-shell-command誰執行">Test 5：LLM 寫 shell command、誰執行？</h2>
<p>改檔案是「直接副作用」、寫 shell command 是「間接副作用」——同樣的問題：誰真的執行？</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -s http://localhost:11434/v1/chat/completions <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  -H <span class="s2">&#34;Content-Type: application/json&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s1">    &#34;model&#34;:&#34;gemma3:1b&#34;,
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s1">    &#34;messages&#34;:[{&#34;role&#34;:&#34;user&#34;,&#34;content&#34;:&#34;Give me a single shell command to find and delete all .log files in my home directory.&#34;}],
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s1">    &#34;stream&#34;:false
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s1">  }&#39;</span> <span class="p">|</span> python3 -c <span class="s2">&#34;import json,sys; print(json.load(sys.stdin)[&#39;choices&#39;][0][&#39;message&#39;][&#39;content&#39;])&#34;</span></span></span></code></pre></div><p>LLM 回：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">```bash
</span></span><span class="line"><span class="ln">2</span><span class="cl">find ~ -name &#34;*.log&#34; -delete
</span></span><span class="line"><span class="ln">3</span><span class="cl">```</span></span></code></pre></div><p>這是個有破壞性的指令。檢查 home 下 .log 還在不在：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">find ~ -maxdepth <span class="m">3</span> -name <span class="s2">&#34;*.log&#34;</span> 2&gt;/dev/null <span class="p">|</span> head -5
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># /Users/tarragon/.npm/_logs/2026-05-11T15_33_34_348Z-debug-0.log</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># /Users/tarragon/.npm/_logs/2026-05-11T11_58_08_827Z-debug-0.log</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># ...</span></span></span></code></pre></div><p>都還在。LLM「給了」rm 指令、但沒人執行。</p>
<p><strong>執行路徑只有兩種</strong>：</p>
<ol>
<li><strong>人類 paste 到 shell</strong>：人是執行者、權限是 user&rsquo;s shell session permission。Audit trail：terminal history。</li>
<li><strong>Wrapper 程式 <code>subprocess.run(...)</code></strong>：wrapper 是執行者、權限是 wrapper process 的 capability。Audit trail：wrapper 的 log。</li>
</ol>
<p>LLM 永遠不是執行者。所以「LLM 寫了 rm -rf」這個句子不能成立——它只能「生成了 rm -rf 字串」。</p>
<p><strong>Agent 場景的 stake</strong>：<a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent 架構</a> 提到 agent loop = 「LLM 提議 → tool 執行 → 結果回 LLM → 下一輪」。Tool 執行那一步是 wrapper 做的、LLM 只看到結果。Agent 框架是否安全、完全看 tool 怎麼設計：</p>
<ul>
<li><strong>Tool 限制範圍</strong>：read-only file system access、不暴露 shell→ 即使 LLM 想跑 <code>rm -rf</code> 也沒對應 tool、無法執行。</li>
<li><strong>Tool 暴露 <code>bash</code> tool</strong>：給 LLM 一個「執行任意 shell command」的 tool。LLM 提議什麼 wrapper 都跑——這時 wrapper 設計失誤等同把鑰匙直接交給 LLM。</li>
<li><strong>Tool 暴露 <code>bash</code> tool + per-command confirm</strong>：每個 shell 呼叫前 wrapper 暫停、問人類「該不該執行」。對開發 / 探索環境合理、production 自動化流程會被互動卡住、不適用。</li>
</ul>
<h2 id="對照claude-code--cursor--aider-的權限模型">對照：Claude Code / Cursor / aider 的權限模型</h2>
<p>不同 LLM application 在權限 gate 上的設計選擇：</p>
<table>
  <thead>
      <tr>
          <th>Application</th>
          <th>File edit</th>
          <th>Shell exec</th>
          <th>預設審查粒度</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Claude Code（CLI）</td>
          <td>可、有 PreToolUse hook 可攔截</td>
          <td>可、有 hook</td>
          <td>中（部分自動、部分 prompt）</td>
      </tr>
      <tr>
          <td>Cursor</td>
          <td>可、agent mode</td>
          <td>可（agent terminal）</td>
          <td>中、agent 行為可調</td>
      </tr>
      <tr>
          <td>aider</td>
          <td>可、直接 diff + commit</td>
          <td>可（<code>--auto-commits</code> mode）</td>
          <td>中、預設 commit 前 diff</td>
      </tr>
      <tr>
          <td>Continue.dev</td>
          <td>inline edit（user 按 Cmd+;）</td>
          <td>不直接 exec</td>
          <td>高（user 必須 explicit）</td>
      </tr>
      <tr>
          <td>Open WebUI（純 chat）</td>
          <td>不</td>
          <td>不</td>
          <td>N/A（無 wrapper）</td>
      </tr>
      <tr>
          <td>自寫 wrapper（如本 demo）</td>
          <td>看設計</td>
          <td>看設計</td>
          <td>看設計</td>
      </tr>
  </tbody>
</table>
<p><strong>共通 pattern</strong>：所有「自動 edit / exec」的 app 都有某種 confirm 或 hook 機制。沒有 confirm 的 app 等於把寫 production 的鑰匙交給 LLM。</p>
<p><strong>選 application 時看的維度</strong>：</p>
<ul>
<li>預設 mode 是什麼？（auto / confirm / dry-run）</li>
<li>哪些動作會自動執行、哪些會 prompt？</li>
<li>有沒有 audit log、能不能 review LLM 改了什麼？</li>
<li>萬一 LLM 行為崩、怎麼 rollback？（git revert、snapshot、undo stack）</li>
</ul>
<h2 id="設計自家-wrapper-的權限模型">設計自家 wrapper 的權限模型</h2>
<p>如果你寫的是「LLM 自動處理 X」這種 wrapper、權限設計的 checklist：</p>
<ol>
<li><strong>副作用分級</strong>：把可能的動作分到 <a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 spectrum 等級 1-5</a>。</li>
<li><strong>預設 dry-run</strong>：不確定就不寫。Apply 必須 opt-in。</li>
<li><strong>永遠印 diff / preview</strong>：用戶才能 catch LLM 亂改。</li>
<li><strong>Confirm 在不可逆操作</strong>：等級 3+ 永遠 prompt、等級 4+ 強制 prompt + 額外 audit。</li>
<li><strong>Audit log</strong>：每個 wrapper 動作寫 log（時間、user、action、result）。出問題能追溯。</li>
<li><strong>Rollback path</strong>：git commit、backup、snapshot 任選一種、必有。</li>
<li><strong>限制 tool 範圍</strong>：給 LLM 暴露最少 tool、不暴露 shell。需要 shell 限制白名單。</li>
<li><strong>小模型加更保守 gate</strong>：1B 模型亂改機率高、保留 <code>--dry-run</code> 或 <code>--confirm</code> 即可、避免 <code>--auto</code>；31B+ 較穩、可給 auto + audit。</li>
</ol>
<h2 id="跑這份-demo-的完整指令">跑這份 demo 的完整指令</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 前置：Ollama 跑著、gemma3:1b 已 pull</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ollama list <span class="p">|</span> grep gemma3:1b
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># 備份要測試的檔案</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">cp content/llm/knowledge-cards/token.md /tmp/token-orig.md
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># Mode 1：dry-run（預設、最安全）</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;加一句說明&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># Mode 2：confirm（互動審查、適合中等風險）</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;加一句說明&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="se"></span>  --confirm
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"># Mode 3：auto（無確認、危險、僅 batch 用）</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;加一句說明&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="se"></span>  --auto
</span></span><span class="line"><span class="ln">23</span><span class="cl">
</span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="c1"># 還原</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">cp /tmp/token-orig.md content/llm/knowledge-cards/token.md</span></span></code></pre></div><h2 id="何時這篇會過時">何時這篇會過時</h2>
<p><strong>不會過時的部分</strong>：</p>
<ul>
<li>LLM HTTP API 是 pure function、無副作用——這個事實在所有「分離 inference server / wrapper / client」的架構都成立。</li>
<li>權限 gate 在 wrapper / application 層——是 software architecture invariant、不是 LLM 特性。</li>
<li>副作用範圍 spectrum 跟人類審查粒度的對應。</li>
<li><code>--dry-run</code> / <code>--confirm</code> / <code>--auto</code> 三檔的設計取捨。</li>
</ul>
<p><strong>會變的部分</strong>：</p>
<ul>
<li>具體 LLM application 的 default mode（Cursor / aider / Claude Code 都會持續調整）。</li>
<li>哪個模型「不會亂改」的 ranking（隨模型能力提升而變）。</li>
<li>MCP / tool spec 細節（會持續演化、但「tool 是 wrapper 暴露」的本質不變）。</li>
</ul>
<p>讀這篇若指令跑不過、可能是 wrapper script API 微調、但「測試 LLM 是不是 pure function」這個方法本身永遠成立——拿任何 LLM API、送任何 prompt、check 檔案 mtime / md5、就能驗證。</p>
<p>跟其他 hands-on 章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、副作用範圍 spectrum 原理見 <a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 Tool use 原理</a>、Agent loop 跟人類審查的協作見 <a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent 架構</a>、Tool use / MCP server 權限模型的個人 dev 視角見 <a href="/blog/llm/06-security/tool-use-permission-model/" data-link-title="6.2 tool use 與 MCP server 的權限模型" data-link-desc="個人 dev 場景下 tool use / MCP server 的副作用權限：檔案系統 / shell / 網路存取邊界、第三方 MCP 信任、副作用的可逆性">6.2</a>、術語見 <a href="/blog/llm/knowledge-cards/sandbox/" data-link-title="Sandbox" data-link-desc="把程式跑在受限制環境的隔離技術、限制檔案 / 網路 / 系統呼叫權限、是 tool use 跟 MCP server 副作用控制的基礎">Sandbox</a>。</p>
]]></content:encoded></item><item><title>Hands-on：用 QLoRA 在本機 fine-tune coding 模型</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/local-fine-tuning/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/local-fine-tuning/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/qlora/" data-link-title="QLoRA" data-link-desc="把 base model 量化到 4-bit &amp;#43; LoRA fine-tune 的組合、消費級 GPU 也能 fine-tune 大模型">QLoRA&lt;/a>（4-bit 量化 base model + &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/lora/" data-link-title="LoRA" data-link-desc="Low-Rank Adaptation：凍住原模型權重、只訓兩個小矩陣的 parameter-efficient fine-tuning">LoRA&lt;/a> adapter）讓消費級硬體也能 fine-tune 7B-32B 模型、是 2026/5 本地 fine-tuning 的主流方法。「在本機 fine-tune 一個小 coding 模型懂我 codebase 的慣例」是個人 dev 的合理目標、特別是在「本地 RAG 不夠精準、prompt engineering 已到天花板」的場景。本篇用 QLoRA 把 fine-tuning 的最短路徑走完：環境準備、資料蒐集、訓練、evaluation、合併權重、部署到 Ollama / llama.cpp 配 VS Code Continue.dev。&lt;/p>
&lt;p>本篇 framing 是「&lt;strong>真實會跑、不只跑 demo&lt;/strong>」、所以包含：硬體預算估算、catastrophic forgetting 防護、evaluation 確認真的有提升、回退方案（fine-tune 失敗時怎麼辦）。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：M4 Max 64GB + Hugging Face PEFT 0.13、或 5090 24GB + bitsandbytes
&lt;strong>目標模型&lt;/strong>：Qwen3-Coder-7B-Instruct（fine-tune 後輸出符合自己 codebase 慣例的 code）&lt;/p>&lt;/blockquote>
&lt;h2 id="為什麼這個議題重要">為什麼這個議題重要&lt;/h2>
&lt;p>寫 code 場景的常見 fine-tune 動機：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>私有 codebase 慣例&lt;/strong>：自家專案有特殊 naming、特殊 design pattern、prompt engineering 拉不到、希望模型「自然知道」&lt;/li>
&lt;li>&lt;strong>特殊框架 / library&lt;/strong>：用 obscure 的內部 framework、通用模型沒看過、補完品質差&lt;/li>
&lt;li>&lt;strong>特定文檔風格&lt;/strong>：commit message、PR description、code comment 有 team-specific 格式&lt;/li>
&lt;li>&lt;strong>Reduce RAG dependence&lt;/strong>：把高頻 knowledge 編進模型權重、減少每次 query 都要 retrieve&lt;/li>
&lt;/ol>
&lt;p>但&lt;strong>不該 fine-tune&lt;/strong>的情境（先排除）：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>新增世界知識&lt;/strong>：fine-tune 不擅長加新事實、用 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &amp;#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">RAG&lt;/a> 即可&lt;/li>
&lt;li>&lt;strong>複雜 reasoning 能力&lt;/strong>：fine-tune 一般不會讓模型變更會 reason、reasoning 來自 pre-training + RL&lt;/li>
&lt;li>&lt;strong>改善通用對話品質&lt;/strong>：通用對話品質取決於 RLHF、fine-tune 多半會 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/catastrophic-forgetting/" data-link-title="Catastrophic Forgetting" data-link-desc="Fine-tune 模型時、新訓練資料覆蓋掉原本學到的能力的現象、LoRA / 資料 mixing 是主要緩解">catastrophic forgetting&lt;/a>&lt;/li>
&lt;li>&lt;strong>資料太少（&amp;lt; 500 對）&lt;/strong>：fine-tune 收益低、不如優化 prompt + RAG&lt;/li>
&lt;/ol>
&lt;h2 id="整體流程">整體流程&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">1. 硬體預算估算 → 知道能跑哪個 size 的 base model
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">2. 蒐集 fine-tune 資料 → 50-5000 對 (prompt, response)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">3. 環境準備 → Python + bitsandbytes / PEFT / transformers
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">4. 跑 QLoRA 訓練 → 1-3 epochs、看 loss 趨勢
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">5. Evaluation → 在 held-out set + 通用 benchmark 都跑
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">6. Merge LoRA → base → 得到合併權重 .safetensors
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">7. Convert → GGUF → 用 llama.cpp convert 工具
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">8. Deploy 到 Ollama → ollama create my-coder -f Modelfile
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">9&lt;/span>&lt;span class="cl">9. 配 Continue.dev → config.json 加新 provider&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="step-1硬體預算估算">Step 1：硬體預算估算&lt;/h2>
&lt;p>QLoRA 訓練的記憶體需求（粗略估算）：&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/llm/knowledge-cards/qlora/" data-link-title="QLoRA" data-link-desc="把 base model 量化到 4-bit &#43; LoRA fine-tune 的組合、消費級 GPU 也能 fine-tune 大模型">QLoRA</a>（4-bit 量化 base model + <a href="/blog/llm/knowledge-cards/lora/" data-link-title="LoRA" data-link-desc="Low-Rank Adaptation：凍住原模型權重、只訓兩個小矩陣的 parameter-efficient fine-tuning">LoRA</a> adapter）讓消費級硬體也能 fine-tune 7B-32B 模型、是 2026/5 本地 fine-tuning 的主流方法。「在本機 fine-tune 一個小 coding 模型懂我 codebase 的慣例」是個人 dev 的合理目標、特別是在「本地 RAG 不夠精準、prompt engineering 已到天花板」的場景。本篇用 QLoRA 把 fine-tuning 的最短路徑走完：環境準備、資料蒐集、訓練、evaluation、合併權重、部署到 Ollama / llama.cpp 配 VS Code Continue.dev。</p>
<p>本篇 framing 是「<strong>真實會跑、不只跑 demo</strong>」、所以包含：硬體預算估算、catastrophic forgetting 防護、evaluation 確認真的有提升、回退方案（fine-tune 失敗時怎麼辦）。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：M4 Max 64GB + Hugging Face PEFT 0.13、或 5090 24GB + bitsandbytes
<strong>目標模型</strong>：Qwen3-Coder-7B-Instruct（fine-tune 後輸出符合自己 codebase 慣例的 code）</p></blockquote>
<h2 id="為什麼這個議題重要">為什麼這個議題重要</h2>
<p>寫 code 場景的常見 fine-tune 動機：</p>
<ol>
<li><strong>私有 codebase 慣例</strong>：自家專案有特殊 naming、特殊 design pattern、prompt engineering 拉不到、希望模型「自然知道」</li>
<li><strong>特殊框架 / library</strong>：用 obscure 的內部 framework、通用模型沒看過、補完品質差</li>
<li><strong>特定文檔風格</strong>：commit message、PR description、code comment 有 team-specific 格式</li>
<li><strong>Reduce RAG dependence</strong>：把高頻 knowledge 編進模型權重、減少每次 query 都要 retrieve</li>
</ol>
<p>但<strong>不該 fine-tune</strong>的情境（先排除）：</p>
<ol>
<li><strong>新增世界知識</strong>：fine-tune 不擅長加新事實、用 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">RAG</a> 即可</li>
<li><strong>複雜 reasoning 能力</strong>：fine-tune 一般不會讓模型變更會 reason、reasoning 來自 pre-training + RL</li>
<li><strong>改善通用對話品質</strong>：通用對話品質取決於 RLHF、fine-tune 多半會 <a href="/blog/llm/knowledge-cards/catastrophic-forgetting/" data-link-title="Catastrophic Forgetting" data-link-desc="Fine-tune 模型時、新訓練資料覆蓋掉原本學到的能力的現象、LoRA / 資料 mixing 是主要緩解">catastrophic forgetting</a></li>
<li><strong>資料太少（&lt; 500 對）</strong>：fine-tune 收益低、不如優化 prompt + RAG</li>
</ol>
<h2 id="整體流程">整體流程</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. 硬體預算估算       → 知道能跑哪個 size 的 base model
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. 蒐集 fine-tune 資料 → 50-5000 對 (prompt, response)
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. 環境準備           → Python + bitsandbytes / PEFT / transformers
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. 跑 QLoRA 訓練      → 1-3 epochs、看 loss 趨勢
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. Evaluation         → 在 held-out set + 通用 benchmark 都跑
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. Merge LoRA → base  → 得到合併權重 .safetensors
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. Convert → GGUF     → 用 llama.cpp convert 工具
</span></span><span class="line"><span class="ln">8</span><span class="cl">8. Deploy 到 Ollama   → ollama create my-coder -f Modelfile
</span></span><span class="line"><span class="ln">9</span><span class="cl">9. 配 Continue.dev    → config.json 加新 provider</span></span></code></pre></div><h2 id="step-1硬體預算估算">Step 1：硬體預算估算</h2>
<p>QLoRA 訓練的記憶體需求（粗略估算）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">記憶體 ≈ N (B 參數) × 0.6 GB     ← 訓練時
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">        ≈ N (B 參數) × 0.3 GB     ← 推論（4-bit）
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">Apple Silicon Mac：
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  M4 Pro 24GB → 訓 7B 可、訓 14B 緊
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  M4 Pro 36GB → 訓 7B 寬鬆、訓 14B 可
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  M4 Max 64GB+ → 訓 30B 可、推論 70B 可
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">PC 獨立 GPU：
</span></span><span class="line"><span class="ln">10</span><span class="cl">  RTX 4090 / 5090 24GB → 訓 7B 寬鬆、訓 14B / 30B with `--n-cpu-moe` 可
</span></span><span class="line"><span class="ln">11</span><span class="cl">  RTX A6000 48GB → 訓 30-32B 寬鬆</span></span></code></pre></div><blockquote>
<p><strong>事實查核註</strong>：Apple Silicon 上的 QLoRA 支援度跟 bitsandbytes / MLX 工具鏈版本相關、2026/5 主流是用 MLX 自己的 LoRA 實作（<code>mlx-lm</code>）、CUDA 路線用 transformers + bitsandbytes + PEFT。具體支援度以對應 release 為準。</p></blockquote>
<p>本篇假設 fine-tune Qwen3-Coder-7B、所以 24GB+ Mac 或 16GB+ GPU 都能跑。</p>
<h2 id="step-2蒐集-fine-tune-資料">Step 2：蒐集 fine-tune 資料</h2>
<p>最關鍵的 step。資料品質決定 fine-tune 成敗。</p>
<h3 id="資料格式典型-sft-format">資料格式（典型 SFT format）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">[</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nt">&#34;instruction&#34;</span><span class="p">:</span> <span class="s2">&#34;用我們 codebase 的慣例寫一個 REST endpoint 處理 user signup&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nt">&#34;input&#34;</span><span class="p">:</span> <span class="s2">&#34;需求：accept email + password、回 JWT&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="nt">&#34;output&#34;</span><span class="p">:</span> <span class="s2">&#34;// 完整符合我們慣例的 code...&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="p">},</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">  <span class="err">...</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">]</span></span></span></code></pre></div><p>或對話格式（ChatML）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">[</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">      <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;你是我們 codebase 的 coding assistant&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">      <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;...&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">      <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;...&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">]</span></span></span></code></pre></div><h3 id="資料來源">資料來源</h3>
<table>
  <thead>
      <tr>
          <th>來源</th>
          <th>取得方式</th>
          <th>品質</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>過往 commit 的「good code」</td>
          <td>從 main branch 抽函式 + git log message</td>
          <td>中（人工挑）</td>
      </tr>
      <tr>
          <td>Code review 通過的 PR diff</td>
          <td>從 GitHub API 抽 merged PR</td>
          <td>高</td>
      </tr>
      <tr>
          <td>內部 wiki 跟 design docs</td>
          <td>轉成 Q&amp;A 對</td>
          <td>中</td>
      </tr>
      <tr>
          <td>Synthetic data：用大模型生</td>
          <td>給雲端旗艦 prompt「以這個 codebase 風格寫 X」</td>
          <td>中（要 review）</td>
      </tr>
      <tr>
          <td>Pair programming 紀錄</td>
          <td>自己跟 IDE 互動的 log</td>
          <td>高（最貼近真實使用）</td>
      </tr>
  </tbody>
</table>
<h3 id="資料量門檻">資料量門檻</h3>
<table>
  <thead>
      <tr>
          <th>資料量</th>
          <th>預期效果</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>&lt; 50 對</td>
          <td>通常無感、不如優化 prompt + RAG</td>
      </tr>
      <tr>
          <td>50-500 對</td>
          <td>開始有 in-domain 效果、但易 forgetting</td>
      </tr>
      <tr>
          <td>500-5000 對</td>
          <td>顯著效果、QLoRA fine-tune 甜蜜點</td>
      </tr>
      <tr>
          <td>5000+ 對</td>
          <td>邊際收益遞減、開始接近 full fine-tune 效果</td>
      </tr>
  </tbody>
</table>
<h3 id="資料-mixing防-catastrophic-forgetting">資料 mixing（防 <a href="/blog/llm/knowledge-cards/catastrophic-forgetting/" data-link-title="Catastrophic Forgetting" data-link-desc="Fine-tune 模型時、新訓練資料覆蓋掉原本學到的能力的現象、LoRA / 資料 mixing 是主要緩解">catastrophic forgetting</a>）</h3>
<p>訓練 batch 內 mix 通用資料、避免 fine-tune 把通用能力洗掉：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">80% in-domain data（你的 codebase 範例）
</span></span><span class="line"><span class="ln">2</span><span class="cl">20% 通用 instruction data（如 Alpaca、ShareGPT subset）</span></span></code></pre></div><p>通用 data 可從 Hugging Face datasets 抓（如 <code>tatsu-lab/alpaca</code>、<code>teknium/OpenHermes-2.5</code>）。</p>
<h2 id="step-3環境準備">Step 3：環境準備</h2>
<h3 id="apple-silicon-mac用-mlx">Apple Silicon Mac（用 MLX）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># MLX 是 Apple 的 ML framework、原生支援 Apple Silicon</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">pip install mlx mlx-lm
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 或用 conda（推薦）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">conda create -n llm-ft <span class="nv">python</span><span class="o">=</span>3.11
</span></span><span class="line"><span class="ln">6</span><span class="cl">conda activate llm-ft
</span></span><span class="line"><span class="ln">7</span><span class="cl">pip install mlx-lm</span></span></code></pre></div><h3 id="pccuda--transformers--bitsandbytes">PC（CUDA + transformers + bitsandbytes）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 安裝 CUDA 12.x（依 GPU 驅動）</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># Python 套件</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">pip install torch transformers peft bitsandbytes accelerate datasets trl</span></span></code></pre></div><h2 id="step-4跑-qlora-訓練">Step 4：跑 QLoRA 訓練</h2>
<h3 id="apple-siliconmlx方式">Apple Silicon（MLX）方式</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 把 base model 下載到本機</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">huggingface-cli download Qwen/Qwen3-Coder-7B-Instruct <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  --local-dir ~/models/qwen3-coder-7b
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 把資料整理成 JSONL（一行一筆）</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># data/train.jsonl、data/valid.jsonl</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># 跑 LoRA fine-tune（MLX 內建 4-bit）</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">mlx_lm.lora <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  --train <span class="se">\
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="se"></span>  --model ~/models/qwen3-coder-7b <span class="se">\
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="se"></span>  --data data/ <span class="se">\
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="se"></span>  --batch-size <span class="m">4</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  --lora-layers <span class="m">16</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="se"></span>  --iters <span class="m">1000</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="se"></span>  --learning-rate 1e-4 <span class="se">\
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="se"></span>  --steps-per-eval <span class="m">100</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="se"></span>  --adapter-path ./adapters</span></span></code></pre></div><h3 id="pccuda方式">PC（CUDA）方式</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># train.py（簡化版）</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">from</span> <span class="nn">transformers</span> <span class="kn">import</span> <span class="n">AutoTokenizer</span><span class="p">,</span> <span class="n">AutoModelForCausalLM</span><span class="p">,</span> <span class="n">TrainingArguments</span><span class="p">,</span> <span class="n">BitsAndBytesConfig</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">from</span> <span class="nn">peft</span> <span class="kn">import</span> <span class="n">LoraConfig</span><span class="p">,</span> <span class="n">get_peft_model</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kn">from</span> <span class="nn">trl</span> <span class="kn">import</span> <span class="n">SFTTrainer</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kn">from</span> <span class="nn">datasets</span> <span class="kn">import</span> <span class="n">load_dataset</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 4-bit 量化載入 base</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">bnb_config</span> <span class="o">=</span> <span class="n">BitsAndBytesConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">load_in_4bit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">bnb_4bit_quant_type</span><span class="o">=</span><span class="s2">&#34;nf4&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">bnb_4bit_compute_dtype</span><span class="o">=</span><span class="s2">&#34;bfloat16&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">AutoModelForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="s2">&#34;Qwen/Qwen3-Coder-7B-Instruct&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="n">quantization_config</span><span class="o">=</span><span class="n">bnb_config</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"># LoRA 配置</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="n">lora_config</span> <span class="o">=</span> <span class="n">LoraConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">r</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">lora_alpha</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="n">target_modules</span><span class="o">=</span><span class="p">[</span><span class="s2">&#34;q_proj&#34;</span><span class="p">,</span> <span class="s2">&#34;v_proj&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="n">lora_dropout</span><span class="o">=</span><span class="mf">0.05</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="n">task_type</span><span class="o">=</span><span class="s2">&#34;CAUSAL_LM&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">get_peft_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">lora_config</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="c1"># 資料</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="n">dataset</span> <span class="o">=</span> <span class="n">load_dataset</span><span class="p">(</span><span class="s2">&#34;json&#34;</span><span class="p">,</span> <span class="n">data_files</span><span class="o">=</span><span class="s2">&#34;data/train.jsonl&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">
</span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="c1"># 訓練</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="n">training_args</span> <span class="o">=</span> <span class="n">TrainingArguments</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="n">output_dir</span><span class="o">=</span><span class="s2">&#34;./checkpoints&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">    <span class="n">learning_rate</span><span class="o">=</span><span class="mf">1e-4</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">    <span class="n">num_train_epochs</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">    <span class="n">per_device_train_batch_size</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">    <span class="n">gradient_accumulation_steps</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">    <span class="n">save_steps</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">    <span class="n">logging_steps</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">    <span class="n">optim</span><span class="o">=</span><span class="s2">&#34;paged_adamw_8bit&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">    <span class="n">bf16</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="n">trainer</span> <span class="o">=</span> <span class="n">SFTTrainer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">    <span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="n">args</span><span class="o">=</span><span class="n">training_args</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">    <span class="n">train_dataset</span><span class="o">=</span><span class="n">dataset</span><span class="p">[</span><span class="s2">&#34;train&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">    <span class="n">max_seq_length</span><span class="o">=</span><span class="mi">2048</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="n">trainer</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="n">trainer</span><span class="o">.</span><span class="n">save_model</span><span class="p">(</span><span class="s2">&#34;./adapters&#34;</span><span class="p">)</span></span></span></code></pre></div><p>關鍵超參數的判讀邏輯：</p>
<table>
  <thead>
      <tr>
          <th>參數</th>
          <th>預設</th>
          <th>怎麼調</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>r</code>（LoRA rank）</td>
          <td>16</td>
          <td>小 dataset（&lt; 1000 對）可降到 8、大 dataset 升到 32 / 64</td>
      </tr>
      <tr>
          <td><code>lora_alpha</code></td>
          <td>32（通常 = 2 × r）</td>
          <td>增大會放大 LoRA 影響、太大易 catastrophic forgetting</td>
      </tr>
      <tr>
          <td><code>target_modules</code></td>
          <td>q_proj, v_proj</td>
          <td>8B+ 模型可加 k_proj + o_proj 提品質、加 ffn 是進階</td>
      </tr>
      <tr>
          <td><code>lora_dropout</code></td>
          <td>0.05</td>
          <td>dataset 小時加大（0.1）防 overfit</td>
      </tr>
      <tr>
          <td><code>num_train_epochs</code></td>
          <td>2</td>
          <td>1-3 是常見範圍、看 validation loss 何時開始升</td>
      </tr>
      <tr>
          <td><code>per_device_train_batch_size</code></td>
          <td>4</td>
          <td>視 GPU 記憶體；不夠用 <code>gradient_accumulation_steps</code> 補</td>
      </tr>
      <tr>
          <td><code>learning_rate</code></td>
          <td>1e-4</td>
          <td>LoRA 適合較大 lr（vs full fine-tune 的 1e-5）、初值可 1e-4 ~ 5e-4</td>
      </tr>
  </tbody>
</table>
<h3 id="看-training-loss-趨勢">看 training loss 趨勢</h3>
<p>訓練過程中、loss 應該：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Initial：~2.5（cross-entropy on next-token）
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">1/4 訓練：降到 ~1.5
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">1/2 訓練：降到 ~1.0
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">3/4 訓練：降到 ~0.7
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">末段：穩定在 ~0.5
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">警示訊號：
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">- Loss 不降（≈ 2.0+ 持平） → lr 太小、或資料品質差、或 base 跟資料分佈完全不合
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">- Loss 降到 &lt; 0.1 → over-fit、validation loss 應該已升、stop training
</span></span><span class="line"><span class="ln">10</span><span class="cl">- Loss 出 NaN → lr 太大、降 lr 重來</span></span></code></pre></div><h2 id="step-5evaluation">Step 5：Evaluation</h2>
<p>訓練完不能只看 training loss、要實測：</p>
<h3 id="1-held-out-test-set你自己的-in-domain-資料">1. Held-out test set（你自己的 in-domain 資料）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 拿 valid.jsonl 跑、看模型輸出 vs expected</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 用 BLEU / ROUGE / 或 LLM-as-judge 評分</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">mlx_lm.generate <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  --model ~/models/qwen3-coder-7b <span class="se">\
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="se"></span>  --adapter ./adapters <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  --prompt <span class="s2">&#34;&lt;test prompt from valid.jsonl&gt;&#34;</span></span></span></code></pre></div><h3 id="2-通用-benchmark防-catastrophic-forgetting">2. 通用 benchmark（防 catastrophic forgetting）</h3>
<p>跑通用 HumanEval、看分數有沒有崩：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 用 lm-evaluation-harness</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">git clone https://github.com/EleutherAI/lm-evaluation-harness
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">cd</span> lm-evaluation-harness
</span></span><span class="line"><span class="ln">4</span><span class="cl">pip install -e .
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">lm_eval --model hf <span class="se">\
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="se"></span>  --model_args <span class="nv">pretrained</span><span class="o">=</span>~/models/qwen3-coder-7b,peft<span class="o">=</span>./adapters <span class="se">\
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="se"></span>  --tasks humaneval <span class="se">\
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="se"></span>  --batch_size <span class="m">8</span></span></span></code></pre></div><p>判讀：</p>
<ul>
<li>HumanEval 從 75% → 75%：通用能力保留、in-domain 提升、成功</li>
<li>HumanEval 從 75% → 55%：catastrophic forgetting、要重新 fine-tune（用 LoRA + 資料 mixing 加強）</li>
</ul>
<h3 id="3-自己工作流測試最重要">3. 自己工作流測試（最重要）</h3>
<p>實際在 Continue.dev 用幾天、看：</p>
<ul>
<li>In-domain 任務輸出是否確實貼近 codebase 慣例</li>
<li>通用 coding 任務（如「寫一個 helper function」）是否仍 OK</li>
<li>對話流暢度有沒有變差</li>
<li>出現怪行為的頻率</li>
</ul>
<h2 id="step-6合併-lora-跟-base-model">Step 6：合併 LoRA 跟 base model</h2>
<p>訓練完得到 adapter（小檔、&lt; 100MB）。要用於日常推論、通常 merge 進 base：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># MLX 方式</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">mlx_lm.fuse <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  --model ~/models/qwen3-coder-7b <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  --adapter-path ./adapters <span class="se">\
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="se"></span>  --save-path ~/models/qwen3-coder-7b-mycodebase
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># PEFT 方式</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">python -c <span class="s2">&#34;
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s2">from peft import AutoPeftModelForCausalLM
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s2">import torch
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s2">model = AutoPeftModelForCausalLM.from_pretrained(&#39;./adapters&#39;, torch_dtype=torch.bfloat16)
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s2">merged = model.merge_and_unload()
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s2">merged.save_pretrained(&#39;./merged-model&#39;)
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s2">&#34;</span></span></span></code></pre></div><h2 id="step-7convert-成-gguf給-ollama--llamacpp-用">Step 7：Convert 成 GGUF（給 Ollama / llama.cpp 用）</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 安裝 llama.cpp</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">git clone https://github.com/ggml-org/llama.cpp
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="nb">cd</span> llama.cpp
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">pip install -r requirements.txt
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># Convert HF → GGUF</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">python convert_hf_to_gguf.py ~/models/qwen3-coder-7b-mycodebase <span class="se">\
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="se"></span>  --outfile ~/models/qwen3-coder-7b-mycodebase.gguf
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 量化（可選、Q4_K_M 是甜蜜點）</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">./llama-quantize <span class="se">\
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="se"></span>  ~/models/qwen3-coder-7b-mycodebase.gguf <span class="se">\
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="se"></span>  ~/models/qwen3-coder-7b-mycodebase-Q4_K_M.gguf <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  Q4_K_M</span></span></code></pre></div><h2 id="step-8deploy-到-ollama">Step 8：Deploy 到 Ollama</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 寫 Modelfile</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">cat &gt; ~/models/Modelfile-mycodebase <span class="s">&lt;&lt;EOF
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">FROM ~/models/qwen3-coder-7b-mycodebase-Q4_K_M.gguf
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">TEMPLATE &#34;&#34;&#34;&lt;|im_start|&gt;system
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">{{ .System }}&lt;|im_end|&gt;
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">&lt;|im_start|&gt;user
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">{{ .Prompt }}&lt;|im_end|&gt;
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">&lt;|im_start|&gt;assistant
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">PARAMETER temperature 0.3
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">PARAMETER top_p 0.9
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">PARAMETER num_ctx 32768
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">EOF</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="c1"># 註冊到 Ollama</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">ollama create mycodebase-coder -f ~/models/Modelfile-mycodebase
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"># 測試</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">ollama run mycodebase-coder <span class="s2">&#34;寫一個 user signup endpoint&#34;</span></span></span></code></pre></div><h2 id="step-9配-continuedev">Step 9：配 Continue.dev</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// ~/.continue/config.json 加：
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;models&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      <span class="nt">&#34;title&#34;</span><span class="p">:</span> <span class="s2">&#34;My Codebase Coder&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="nt">&#34;provider&#34;</span><span class="p">:</span> <span class="s2">&#34;ollama&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">      <span class="nt">&#34;model&#34;</span><span class="p">:</span> <span class="s2">&#34;mycodebase-coder&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">      <span class="nt">&#34;apiBase&#34;</span><span class="p">:</span> <span class="s2">&#34;http://localhost:11434&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="c1">// ... 既有 models
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"></span>  <span class="p">]</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>VS Code restart 後、Continue panel 下拉就能切換。</p>
<h2 id="失敗模式跟回退">失敗模式跟回退</h2>
<h3 id="失敗-1訓練-loss-不降">失敗 1：訓練 loss 不降</h3>
<p>可能原因：</p>
<ul>
<li>資料品質差 → 人工 review 50 對、看 instruction-response 是否真有對應</li>
<li>資料 token 太短 → 多數 &lt; 100 token、模型學不到複雜 pattern</li>
<li>lr 太小 → 試 lr 5e-4</li>
</ul>
<p>回退：把資料品質提升、或放棄 fine-tune 用 RAG。</p>
<h3 id="失敗-2humaneval-大幅下降catastrophic-forgetting">失敗 2：HumanEval 大幅下降（catastrophic forgetting）</h3>
<p>緩解：</p>
<ul>
<li>加入 20% 通用 data mixing、重訓</li>
<li>降低 epochs（從 3 → 1）</li>
<li>降低 LoRA rank（從 16 → 8）</li>
</ul>
<h3 id="失敗-3in-domain-test-進步但日常用感覺沒變">失敗 3：In-domain test 進步、但日常用感覺沒變</h3>
<p>可能原因：</p>
<ul>
<li>Test set 跟真實工作流分佈不符</li>
<li>Prompt template 在訓練跟推論不一致</li>
</ul>
<p>緩解：實際在 Continue.dev 跑 1-2 週、看真實效果再判斷。</p>
<h3 id="失敗-4訓練爆-oom">失敗 4：訓練爆 OOM</h3>
<p>緩解：</p>
<ul>
<li>降 batch size（4 → 2 → 1）</li>
<li>加 gradient_accumulation_steps（保持 effective batch size）</li>
<li>用更小的 LoRA rank</li>
<li>換更小的 base model（7B → 3B）</li>
</ul>
<h2 id="何時不該繼續-fine-tune-路線">何時不該繼續 fine-tune 路線</h2>
<p>跑完一次 fine-tune 評估後、若：</p>
<ol>
<li><strong>In-domain 提升 &lt; 10%</strong>：相對成本（時間 + 維護）不划算、用 RAG</li>
<li><strong>Catastrophic forgetting &gt; 10%</strong>：跟其他能力 trade-off 不值得</li>
<li><strong>資料量不夠（&lt; 500 對）</strong>：RAG 比 fine-tune 更有效</li>
<li><strong>工作流變化快（codebase 慣例每月變）</strong>：fine-tune 過時得快、RAG 更靈活</li>
</ol>
<h2 id="跟其他模組的關係">跟其他模組的關係</h2>
<ul>
<li>原理層的 LoRA 設計見 <a href="/blog/llm/knowledge-cards/lora/" data-link-title="LoRA" data-link-desc="Low-Rank Adaptation：凍住原模型權重、只訓兩個小矩陣的 parameter-efficient fine-tuning">LoRA 卡片</a> 跟 <a href="/blog/llm/knowledge-cards/qlora/" data-link-title="QLoRA" data-link-desc="把 base model 量化到 4-bit &#43; LoRA fine-tune 的組合、消費級 GPU 也能 fine-tune 大模型">QLoRA 卡片</a></li>
<li>Catastrophic forgetting 跟整體 alignment 議題見 <a href="/blog/llm/03-theoretical-foundations/training-pipeline/" data-link-title="3.4 訓練流程：pre-train → SFT → RLHF" data-link-desc="LLM 的三階段訓練：預訓練、指令微調、人類反饋強化學習；各階段目標與最新替代方案">3.4 訓練流程</a></li>
<li>Fine-tune 後的模型評估見 <a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14 Benchmarking</a></li>
<li>隱私 / 供應鏈面：fine-tune 後 model 怎麼分享（給 team / 上 HuggingFace）見 <a href="/blog/llm/06-security/model-supply-chain-trust/" data-link-title="6.0 模型供應鏈與信任邊界" data-link-desc="個人 dev 用本地 LLM 時的模型權重來源信任：GGUF 完整性、Hugging Face / Ollama registry 信任、量化版本污染、檔案完整性檢查">6.0 模型供應鏈</a></li>
<li>跟 RAG 的取捨見 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a> 的「RAG vs Fine-tuning vs Long Context」段</li>
</ul>
]]></content:encoded></item><item><title>Hands-on：跨資料夾風格 follow 任務的模型對比</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/instruction-following-test/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/instruction-following-test/</guid><description>&lt;p>本篇是個讓本地 LLM 在「&lt;strong>讀兩個資料夾、學風格、寫新章節&lt;/strong>」任務上自我評估的實驗。任務本身內容無關緊要（隨便挑了一份私人創作資料夾）、要看的是&lt;strong>不同模型在 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/instruction-following/" data-link-title="Instruction Following" data-link-desc="模型遵守任務範圍、格式、限制與停止條件的能力，是評估 instruction-tuned 模型能否落地的核心訊號">instruction following&lt;/a> / format consistency / 篇幅控制三個維度的差距&lt;/strong>。&lt;/p>
&lt;p>實驗跑了四個本地模型對比：&lt;/p>
&lt;ul>
&lt;li>&lt;code>gemma3:1b&lt;/code>（815 MB、舊代 / 小）&lt;/li>
&lt;li>&lt;code>gemma3:4b&lt;/code>（3.3 GB、舊代 / 中）&lt;/li>
&lt;li>&lt;code>qwen3:8b&lt;/code>（5.2 GB、跨家族 / 大）&lt;/li>
&lt;li>&lt;code>gemma4:e4b&lt;/code>（9.6 GB、新代 / 中、bf16）&lt;/li>
&lt;/ul>
&lt;p>對應 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent 架構&lt;/a> 「規劃能力是雲端旗艦的明顯強項、本地小模型的明顯弱項」這條觀察、用具體 structural metrics 驗證、並揭示**「最新世代 + 較大 size」未必比「跨家族 / 較強訓練」勝出**。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：Ollama 0.23.2、Apple Silicon、&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/gpu-compute-backend/" data-link-title="GPU Compute Backend" data-link-desc="GPU 加速計算的底層 API 介面（CUDA / ROCm / Vulkan / Metal / SYCL）、決定推論軟體能否用 GPU 跑得快">MPS backend&lt;/a>
&lt;strong>任務&lt;/strong>：讀資料夾 A（風格參考、5 章已寫完）+ 資料夾 B（同類型、5 章已寫完、需寫 v06）→ 為 B 生成 v06
&lt;strong>評估方式&lt;/strong>：純 structural metrics、不評論內容品質&lt;/p>&lt;/blockquote>
&lt;h2 id="任務設計">任務設計&lt;/h2>
&lt;p>兩個資料夾結構：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">A/ B/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">├── README.md ├── README.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">├── v01_XXX.md ├── v01_XXX.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">├── v02_XXX.md ├── v02_XXX.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">├── v03_XXX.md ├── v03_XXX.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">├── v04_XXX.md ├── v04_XXX.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">└── v05_XXX.md └── v05_XXX.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl"> └── v06_XXX.md ← 要生成&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>兩個資料夾用&lt;strong>不同 markdown 格式&lt;/strong>：&lt;/p>
&lt;ul>
&lt;li>A 風格：&lt;code># 標題&lt;/code>（H1）+ &lt;code>## 場景設定&lt;/code> 段 + 結尾 &lt;code>**【本章結束】**&lt;/code>&lt;/li>
&lt;li>B 風格：&lt;code>## v0X｜&amp;lt;主題&amp;gt;（&amp;lt;角色1&amp;gt;×&amp;lt;角色2&amp;gt;）&lt;/code>（H2）+ 直接敘事、無結尾 marker&lt;/li>
&lt;/ul>
&lt;p>LLM 看完 A + B 後、要寫 B 的 v06——&lt;strong>必須 follow B 的格式、不是 A 的&lt;/strong>。是個 format discrimination 測試。&lt;/p>
&lt;h2 id="評估維度">評估維度&lt;/h2>
&lt;p>純 structural、不涉內容：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>維度&lt;/th>
 &lt;th>測法&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>篇幅控制&lt;/td>
 &lt;td>char count、跟 B 既有 v01-v05 平均比&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>段落結構&lt;/td>
 &lt;td>paragraph count、avg paragraph char&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Markdown heading&lt;/td>
 &lt;td>H1 / H2 count、是否寫對 v06 title 格式&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>結尾 marker&lt;/td>
 &lt;td>是否誤加 A 風格的「&lt;strong>【本章結束】&lt;/strong>」&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>角色 fidelity&lt;/td>
 &lt;td>提到 B 兩個主角名次數（太少 = 內容偏離）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>跨資料夾串戲&lt;/td>
 &lt;td>提到 A 資料夾角色名次數（contamination）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>對話 follow&lt;/td>
 &lt;td>「對話行」（行首是 &lt;code>「&lt;/code>）數量、跟 baseline 比&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>生成時間&lt;/td>
 &lt;td>從送 prompt 到收完整 response&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>不評估的：&lt;/p></description><content:encoded><![CDATA[<p>本篇是個讓本地 LLM 在「<strong>讀兩個資料夾、學風格、寫新章節</strong>」任務上自我評估的實驗。任務本身內容無關緊要（隨便挑了一份私人創作資料夾）、要看的是<strong>不同模型在 <a href="/blog/llm/knowledge-cards/instruction-following/" data-link-title="Instruction Following" data-link-desc="模型遵守任務範圍、格式、限制與停止條件的能力，是評估 instruction-tuned 模型能否落地的核心訊號">instruction following</a> / format consistency / 篇幅控制三個維度的差距</strong>。</p>
<p>實驗跑了四個本地模型對比：</p>
<ul>
<li><code>gemma3:1b</code>（815 MB、舊代 / 小）</li>
<li><code>gemma3:4b</code>（3.3 GB、舊代 / 中）</li>
<li><code>qwen3:8b</code>（5.2 GB、跨家族 / 大）</li>
<li><code>gemma4:e4b</code>（9.6 GB、新代 / 中、bf16）</li>
</ul>
<p>對應 <a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent 架構</a> 「規劃能力是雲端旗艦的明顯強項、本地小模型的明顯弱項」這條觀察、用具體 structural metrics 驗證、並揭示**「最新世代 + 較大 size」未必比「跨家族 / 較強訓練」勝出**。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：Ollama 0.23.2、Apple Silicon、<a href="/blog/llm/knowledge-cards/gpu-compute-backend/" data-link-title="GPU Compute Backend" data-link-desc="GPU 加速計算的底層 API 介面（CUDA / ROCm / Vulkan / Metal / SYCL）、決定推論軟體能否用 GPU 跑得快">MPS backend</a>
<strong>任務</strong>：讀資料夾 A（風格參考、5 章已寫完）+ 資料夾 B（同類型、5 章已寫完、需寫 v06）→ 為 B 生成 v06
<strong>評估方式</strong>：純 structural metrics、不評論內容品質</p></blockquote>
<h2 id="任務設計">任務設計</h2>
<p>兩個資料夾結構：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">A/                          B/
</span></span><span class="line"><span class="ln">2</span><span class="cl">├── README.md               ├── README.md
</span></span><span class="line"><span class="ln">3</span><span class="cl">├── v01_XXX.md              ├── v01_XXX.md
</span></span><span class="line"><span class="ln">4</span><span class="cl">├── v02_XXX.md              ├── v02_XXX.md
</span></span><span class="line"><span class="ln">5</span><span class="cl">├── v03_XXX.md              ├── v03_XXX.md
</span></span><span class="line"><span class="ln">6</span><span class="cl">├── v04_XXX.md              ├── v04_XXX.md
</span></span><span class="line"><span class="ln">7</span><span class="cl">└── v05_XXX.md              └── v05_XXX.md
</span></span><span class="line"><span class="ln">8</span><span class="cl">                            └── v06_XXX.md  ← 要生成</span></span></code></pre></div><p>兩個資料夾用<strong>不同 markdown 格式</strong>：</p>
<ul>
<li>A 風格：<code># 標題</code>（H1）+ <code>## 場景設定</code> 段 + 結尾 <code>**【本章結束】**</code></li>
<li>B 風格：<code>## v0X｜&lt;主題&gt;（&lt;角色1&gt;×&lt;角色2&gt;）</code>（H2）+ 直接敘事、無結尾 marker</li>
</ul>
<p>LLM 看完 A + B 後、要寫 B 的 v06——<strong>必須 follow B 的格式、不是 A 的</strong>。是個 format discrimination 測試。</p>
<h2 id="評估維度">評估維度</h2>
<p>純 structural、不涉內容：</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>測法</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>篇幅控制</td>
          <td>char count、跟 B 既有 v01-v05 平均比</td>
      </tr>
      <tr>
          <td>段落結構</td>
          <td>paragraph count、avg paragraph char</td>
      </tr>
      <tr>
          <td>Markdown heading</td>
          <td>H1 / H2 count、是否寫對 v06 title 格式</td>
      </tr>
      <tr>
          <td>結尾 marker</td>
          <td>是否誤加 A 風格的「<strong>【本章結束】</strong>」</td>
      </tr>
      <tr>
          <td>角色 fidelity</td>
          <td>提到 B 兩個主角名次數（太少 = 內容偏離）</td>
      </tr>
      <tr>
          <td>跨資料夾串戲</td>
          <td>提到 A 資料夾角色名次數（contamination）</td>
      </tr>
      <tr>
          <td>對話 follow</td>
          <td>「對話行」（行首是 <code>「</code>）數量、跟 baseline 比</td>
      </tr>
      <tr>
          <td>生成時間</td>
          <td>從送 prompt 到收完整 response</td>
      </tr>
  </tbody>
</table>
<p>不評估的：</p>
<ul>
<li>內容品質、文筆好壞</li>
<li>敘事邏輯是否合理</li>
<li>角色塑造是否生動</li>
</ul>
<p>純 structural 評估的好處是 reproducible、不需 reviewer 主觀判斷、可自動跑。</p>
<h2 id="baselineb-既有-v01-v05-的-metrics">Baseline：B 既有 v01-v05 的 metrics</h2>
<p>B 資料夾 5 個既有章節的平均：</p>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Average</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>char count</td>
          <td>~933</td>
      </tr>
      <tr>
          <td>paragraph count</td>
          <td>~32</td>
      </tr>
      <tr>
          <td>avg paragraph chars</td>
          <td>~29</td>
      </tr>
      <tr>
          <td>dialogue lines</td>
          <td>~7</td>
      </tr>
      <tr>
          <td>H1 used</td>
          <td>0（全部用 H2）</td>
      </tr>
      <tr>
          <td>H2 used</td>
          <td>1</td>
      </tr>
      <tr>
          <td>結尾「<strong>【本章結束】</strong>」</td>
          <td>全部 False</td>
      </tr>
      <tr>
          <td>Cross leak</td>
          <td>全部 0</td>
      </tr>
      <tr>
          <td>主角名提及（合計）</td>
          <td>~60</td>
      </tr>
  </tbody>
</table>
<p>這是 LLM 該模仿的目標。</p>
<h2 id="四個模型的結果">四個模型的結果</h2>
<p>四個 model 跑同樣 prompt、同樣輸入內容。</p>
<h3 id="對比表">對比表</h3>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>Baseline</th>
          <th><code>gemma3:1b</code></th>
          <th><code>gemma3:4b</code></th>
          <th><code>qwen3:8b</code></th>
          <th><code>gemma4:e4b</code></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>模型大小</strong></td>
          <td>—</td>
          <td>815 MB</td>
          <td>3.3 GB</td>
          <td>5.2 GB</td>
          <td>9.6 GB（bf16）</td>
      </tr>
      <tr>
          <td><strong>發布世代</strong></td>
          <td>—</td>
          <td>Gemma 3</td>
          <td>Gemma 3</td>
          <td>Qwen 3</td>
          <td><strong>Gemma 4（2026/4）</strong></td>
      </tr>
      <tr>
          <td>char count</td>
          <td>~933</td>
          <td>4324（4.6×）</td>
          <td>1330</td>
          <td><strong>951（1.02×）</strong></td>
          <td>679</td>
      </tr>
      <tr>
          <td>paragraph count</td>
          <td>~32</td>
          <td>145</td>
          <td>29</td>
          <td><strong>36</strong></td>
          <td>11</td>
      </tr>
      <tr>
          <td>avg paragraph chars</td>
          <td>~29</td>
          <td>30</td>
          <td>46</td>
          <td><strong>26</strong></td>
          <td>62</td>
      </tr>
      <tr>
          <td>H1 = 0</td>
          <td>符合</td>
          <td>不符（1）</td>
          <td>符合</td>
          <td>符合</td>
          <td>不符（1）</td>
      </tr>
      <tr>
          <td>H2 = 1</td>
          <td>符合</td>
          <td>不符（0）</td>
          <td>符合</td>
          <td>符合</td>
          <td>不符（3）</td>
      </tr>
      <tr>
          <td>v06 title 格式</td>
          <td>—</td>
          <td>不符</td>
          <td>符合</td>
          <td>符合</td>
          <td>不符</td>
      </tr>
      <tr>
          <td>結尾 marker</td>
          <td>False</td>
          <td>符合</td>
          <td>符合</td>
          <td>符合</td>
          <td>符合</td>
      </tr>
      <tr>
          <td>Cross leak</td>
          <td>0</td>
          <td>無（0）</td>
          <td>無（0）</td>
          <td>無（0）</td>
          <td>無（0）</td>
      </tr>
      <tr>
          <td>dialogue lines</td>
          <td>~7</td>
          <td>4</td>
          <td><strong>0</strong></td>
          <td><strong>7</strong></td>
          <td>0</td>
      </tr>
      <tr>
          <td>主角名提及（合計）</td>
          <td>~60</td>
          <td>286</td>
          <td>24</td>
          <td><strong>27</strong></td>
          <td><strong>0</strong></td>
      </tr>
      <tr>
          <td><strong>通過項目</strong></td>
          <td>—</td>
          <td><strong>2 / 7</strong></td>
          <td><strong>6 / 7</strong></td>
          <td><strong>7 / 7</strong></td>
          <td><strong>1 / 7</strong></td>
      </tr>
      <tr>
          <td>生成時間</td>
          <td>—</td>
          <td>41.8s</td>
          <td>36.5s</td>
          <td>97.5s</td>
          <td>43.5s</td>
      </tr>
  </tbody>
</table>
<h3 id="各模型觀察">各模型觀察</h3>
<p><strong><code>gemma3:1b</code>（815 MB）</strong>：</p>
<ul>
<li>篇幅 4.6× 失控、段落數 4.5× 超標、用 H1 而不是 H2。</li>
<li>顯示 1B 模型對「2000-3000 字」這種 numeric instruction 沒有有效執行能力、會一直生成到 context 限制。</li>
<li>但 cross leak 0、結尾 marker 也沒誤加——「不要 X」這類 negative instruction follow 較成功。</li>
</ul>
<p><strong><code>gemma3:4b</code>（3.3 GB）</strong>：</p>
<ul>
<li>篇幅 / 段落 / heading 結構全 OK、明顯比 1B 大幅改善。</li>
<li><strong>dialogue lines = 0</strong>：完全沒寫對話、整篇純敘事。表示 4B 抓到字面 structural feature、但沒抓到「對話 driven 敘事」這個 stylistic feature。</li>
<li>主角名提及 24 次（baseline ~60）—內容偏短、提及次數偏低、但比例合理。</li>
</ul>
<p><strong><code>qwen3:8b</code>（5.2 GB、跨家族）</strong>：</p>
<ul>
<li><strong>唯一 7/7 全 pass 的模型</strong>——篇幅完美匹配（951 vs ~933）、段落數合理（36 vs ~32）、heading 對、對話 7 行完全等於 baseline。</li>
<li>跨家族 + 大一級的組合表現質變，比同家族下一級的 4B 模型大幅提升。</li>
<li>代價：生成時間 97.5s、約是 4B 模型的 2.7×。</li>
</ul>
<p><strong><code>gemma4:e4b</code>（9.6 GB、新代）</strong>：</p>
<ul>
<li><strong>驚人的 1/7、最差表現</strong>——比 1B 還少通過項目。</li>
<li><strong>主角名提及 0</strong>：完全沒寫角色名、純抽象敘述「某一方」「另一方」。</li>
<li><strong>dialogue 0</strong>：沒對話。</li>
<li><strong>生成內容是「劇情大綱建議」而非實際章節</strong>：含「劇情核心思路」「預計情緒強度」「寫作切入點建議」等 meta-text。</li>
<li>輸出末尾「<strong>（此為結構化建議、等待具體的指令後、將會生成與風格一致的劇情內容。）</strong>」——明示它把 prompt 理解成「給建議框架、等下一步」。</li>
</ul>
<h3 id="strict-prompt-retest揭示-internal-alignment">Strict prompt retest：揭示 internal alignment</h3>
<p>懷疑 1/7 可能是「prompt 不夠強硬」、用 strict prompt 重跑 <code>gemma4:e4b</code>。Strict 加了八條規則、明示：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">- 直接從 `## v06｜...` 開頭、不寫前言
</span></span><span class="line"><span class="ln">2</span><span class="cl">- 絕對不可寫「劇情核心思路」「預計情緒強度」「寫作切入點」等 meta-text
</span></span><span class="line"><span class="ln">3</span><span class="cl">- 必須直接寫敘事內容、含對話、動作、感受描寫
</span></span><span class="line"><span class="ln">4</span><span class="cl">- 強制提到角色名多次、不要用「某一方」「另一人」抽象稱呼
</span></span><span class="line"><span class="ln">5</span><span class="cl">- ...</span></span></code></pre></div><p>Strict prompt 結果：</p>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>原 prompt</th>
          <th>strict prompt</th>
          <th>變化</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>char count</td>
          <td>679</td>
          <td>660</td>
          <td>相同量級</td>
      </tr>
      <tr>
          <td>H1 = 0</td>
          <td>不符（1）</td>
          <td>符合</td>
          <td><strong>改善</strong></td>
      </tr>
      <tr>
          <td>H2 = 1</td>
          <td>不符（3）</td>
          <td>符合</td>
          <td><strong>改善</strong></td>
      </tr>
      <tr>
          <td>v06 title 格式</td>
          <td>不符</td>
          <td>符合</td>
          <td><strong>改善</strong></td>
      </tr>
      <tr>
          <td>meta-text 出現</td>
          <td>有</td>
          <td>無</td>
          <td><strong>改善</strong></td>
      </tr>
      <tr>
          <td>dialogue lines</td>
          <td>0</td>
          <td>3</td>
          <td><strong>改善</strong></td>
      </tr>
      <tr>
          <td><strong>主角名提及</strong></td>
          <td><strong>0</strong></td>
          <td><strong>0</strong></td>
          <td><strong>未改善</strong></td>
      </tr>
      <tr>
          <td><strong>通過項目</strong></td>
          <td><strong>1 / 7</strong></td>
          <td><strong>4 / 7</strong></td>
          <td><strong>+3</strong></td>
      </tr>
  </tbody>
</table>
<p>從 1/7 → 4/7、prompt 強化明顯有用。但<strong>主角名提及兩次都 0</strong>、即使 strict prompt 明示「強制提到角色名」、模型仍用「兩人」「彼此」「對方」抽象稱呼。</p>
<p>這比「模型不會 follow」更精確、是兩個層次的 follow 差別：</p>
<ul>
<li><strong>Surface level instruction</strong>（heading 格式、不要 meta-text、要對話）：model 願意 follow strict prompt。</li>
<li><strong>Semantic level instruction</strong>（在這個情境用具名角色）：model 有 <strong>internal alignment 抗拒</strong>、即使 prompt 明示也不 follow。</li>
</ul>
<p>Gemma 4 e4b 是 device-deployable edge variant、RLHF 可能特別針對「敏感情境下的人物識別」做 alignment。這個 alignment 比 prompt-level instruction follow 更深、是 hard line、不能用 prompt engineering 繞過。</p>
<h2 id="關鍵觀察">關鍵觀察</h2>
<h3 id="model-size-不是唯一因素訓練-alignment-更重要">Model size 不是唯一因素、訓練 alignment 更重要</h3>
<p>最反直覺的結果：</p>
<ul>
<li><code>gemma4:e4b</code>（9.6 GB、最新世代）原 prompt 通過 <strong>1/7</strong>、strict prompt 通過 <strong>4/7</strong>。</li>
<li><code>gemma3:4b</code>（3.3 GB、舊一代）通過 <strong>6/7</strong>。</li>
<li><code>qwen3:8b</code>（5.2 GB、跨家族）通過 <strong>7/7</strong>。</li>
</ul>
<p>「最大 + 最新」不等於「最好 follow instruction」。在這個任務上、ranking 是：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">qwen3:8b &gt; gemma3:4b &gt; gemma3:1b ≈ gemma4:e4b (strict) &gt; gemma4:e4b (default)</span></span></code></pre></div><p>可能因素：</p>
<ol>
<li><strong>訓練資料分佈差異</strong>：Qwen 系列訓練資料含大量中文、對中文 instruction follow 更穩。</li>
<li><strong>Edge variant 的 alignment 設計</strong>：<code>gemma4:e4b</code> 是 device-deployable edge variant、RLHF 可能特別在敏感情境用 conservative output。Strict prompt 能改善 surface-level（heading、meta-text、對話）、但 semantic-level（具名角色）有 hard line 不能繞過。</li>
<li><strong>跨家族效應 &gt; 跨代效應</strong>：Qwen vs Gemma（不同家族）比 Gemma 3 vs Gemma 4（同家族跨代）影響更大。</li>
</ol>
<h3 id="兩層-instruction-follow">兩層 instruction follow</h3>
<p><code>gemma4:e4b</code> 的 strict prompt retest 揭示一個重要區分：</p>
<ul>
<li><strong>Surface-level instruction</strong>（heading 格式、不要 meta-text、要對話）：可以用 strict prompt 改善、prompt engineering 有效。</li>
<li><strong>Semantic-level alignment</strong>（特定情境的角色處理、敏感主題的表述方式）：是 RLHF 階段建立的 hard line、prompt engineering 繞不過。</li>
</ul>
<p>設計應用時要意識：<strong>「LLM follow 不了 instruction」可能不是能力問題、是 alignment 問題</strong>。模型訓練時被刻意 align 不做某些事、即使 prompt 明示也不會做。發現這種情況、改換 model（或 less-aligned variant）會比繼續調 prompt 更省時間。</p>
<h3 id="最新世代的標籤可能誤導">「最新世代」的標籤可能誤導</h3>
<p>Gemma 4 是 2026/4/2 才發布的最新代、size 也夠大、但在這個 instruction following 任務上<strong>輸給 6 個月前發布的 Gemma 3 4b</strong>。</p>
<p>設計應用 / 選模型時、實測對自己 task 的表現比「最新 / 最大」標籤可靠。Benchmark ranking（如 LMSYS Chatbot Arena）反映平均表現、未必 reflect 你的 narrow 任務。本實驗示範了「自己跑一次」比「看 benchmark」更可靠的判讀方法。</p>
<h3 id="structural-feature-跟-stylistic-feature-兩層">Structural feature 跟 stylistic feature 兩層</h3>
<p>跨四個模型一致觀察：</p>
<ul>
<li><strong>Structural feature</strong>（heading level、結尾 marker、不要 cross leak）：所有模型多少都抓到。</li>
<li><strong>Stylistic feature</strong>（對話 driven 敘事、篇幅精準）：差異極大、Qwen3 8B 完美、其他三個都有明顯失分。</li>
</ul>
<p>這對應 <a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent</a> 的「規劃 vs 字面 follow」差距——字面 instruction 容易、stylistic mimic 困難。寫應用時、預期 follow「形式約束」（output JSON、結尾 signature）跟 follow「風格約束」（用簡潔口吻、bullet 而非段落）兩種 instruction 的成功率不同。</p>
<h3 id="cross-pairing-leak全-0">Cross-pairing leak：全 0</h3>
<p>四個模型 cross leak 都 0——表示「不要混角色」這個 instruction 兩個都 follow 成功。可能因素：</p>
<ul>
<li>角色名是名詞、模型 generation 時容易 constrain。</li>
<li>Prompt 已明示「為 B 寫」、模型沒被 A 角色名干擾。</li>
</ul>
<p>如果改成模糊 instruction（「混合 A、B 風格」）、leak 可能會出現——本實驗沒涵蓋這個 case。</p>
<h3 id="生成時間size--時間">生成時間：size ≠ 時間</h3>
<p>四個模型的生成時間：</p>
<table>
  <thead>
      <tr>
          <th>模型</th>
          <th>size</th>
          <th>時間</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>gemma3:1b</td>
          <td>815 MB</td>
          <td>41.8s</td>
      </tr>
      <tr>
          <td>gemma3:4b</td>
          <td>3.3 GB</td>
          <td>36.5s</td>
      </tr>
      <tr>
          <td>qwen3:8b</td>
          <td>5.2 GB</td>
          <td><strong>97.5s</strong></td>
      </tr>
      <tr>
          <td>gemma4:e4b</td>
          <td>9.6 GB</td>
          <td>43.5s</td>
      </tr>
  </tbody>
</table>
<p>意外發現：</p>
<ol>
<li><strong>1B 比 4B 慢</strong>：因為 1B 生成 4324 字、4B 生成 1330 字、總 token 量決定總時間、不是 model size。</li>
<li><strong>qwen3:8b 慢 2.7×</strong>：8B 的 forward pass 較慢、加上 generation 量級正常、總時間最長。</li>
<li><strong>gemma4:e4b 跟 1B 相近</strong>：generation 短（679 字）、抵消 model 較大的開銷。</li>
</ol>
<p><a href="/blog/llm/knowledge-cards/tokens-per-second/" data-link-title="Tokens Per Second" data-link-desc="LLM 每秒能生成幾個 token：生字速度的標準量化指標">tokens per second</a> 跟 total latency 是兩件事——decode 速度快但生成太多 token、未必更快完成任務。</p>
<h2 id="對寫應用的啟示">對寫應用的啟示</h2>
<ol>
<li><strong>「最新最大」≠ 「最好 follow」</strong>：選模型實測自己 task、benchmark / size 只是輔助訊號。</li>
<li><strong>本地小模型（&lt; 3B）做需要 follow 結構規則的任務、要嚴格驗證</strong>：用 structural metrics 自動 check、目視判斷模型「看起來有做到」的可靠度低。</li>
<li><strong>Edge variant 可能有 special behavior</strong>：device-deployable variant 可能 RLHF 偏向 conservative、不一定適合所有任務。</li>
<li><strong>跨家族對比比同家族升 size 收益大</strong>：Qwen3 8B vs Gemma3 4B 比 Gemma3 4B vs Gemma3 1B 改善更明顯。</li>
<li><strong>「形式跟風格」分開驗證</strong>：應用層的 validation 分維度 score、比一次評全部更可解讀。</li>
</ol>
<h2 id="跑這個實驗的-framework">跑這個實驗的 framework</h2>
<p>通用流程（不放具體 script、會綁定 corpus 內容）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. 準備兩個資料夾、A 是風格參考、B 是 work-in-progress
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. 寫 helper script 把兩個資料夾完整內容 + 任務說明做成 prompt
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. 跑多個 model 各一次（同 prompt、不同 model）
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. 對輸出計算 structural metrics（char count、paragraph、heading、dialogue lines）
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. 跟 B 既有章節的 baseline metrics 對比
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. 列通過 / 失敗矩陣</span></span></code></pre></div><p>關鍵設計選擇：</p>
<ul>
<li><strong>A 跟 B 風格故意不一樣</strong>：才能驗證 LLM 是否分辨「該 follow 哪個」。</li>
<li><strong>不評估內容品質</strong>：純 structural 評估 reproducible、不需 reviewer 主觀判斷。</li>
<li><strong>baseline 用既有章節算</strong>：B 自己的 v01-v05 是「正確答案」的 reference。</li>
<li><strong>跑多個跨家族 / 跨世代 / 跨 size 模型</strong>：避免「只測一個就下結論」的偏差。</li>
</ul>
<h2 id="何時這份對比會過時">何時這份對比會過時</h2>
<ul>
<li><strong>具體模型 ranking</strong>：新模型發布後 ranking 會變、特別是新版 Gemma 4 / Qwen 4 / Llama 4 等推出時。</li>
<li><strong>「Gemma 4 edge 表現差」這個觀察</strong>：可能隨後續 fine-tune 或新版改善。</li>
</ul>
<p><strong>不會過時的部分</strong>：</p>
<ul>
<li>Model size 不是 instruction following 的唯一因素——這個現象在所有 LLM 都存在。</li>
<li>Structural vs stylistic 兩層 follow 難度不同。</li>
<li>跨家族對比比同家族升 size 收益大、這個現象可能持續。</li>
<li>純 metrics-based 評估比主觀判斷可重現。</li>
<li>「自己跑一次」比「看 benchmark」更可靠的判讀邏輯。</li>
</ul>
<p>未來想擴展、可以加入更多維度（如反向 retrieval：把生成內容當 query、看能不能找回原資料夾；或 perplexity-based 評估）。</p>
<p>跟其他 hands-on 章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、選模型的優先序策略見 <a href="/blog/llm/01-local-llm-services/model-selection-priority/" data-link-title="1.4 寫 code 場景的模型選型優先順序" data-link-desc="Gemma 4 31B MTP → Qwen3-Coder 30B → Qwen3 14B → gpt-oss 20B 的取捨與適用情境">Model selection priority</a>、模型 tag 命名規則見 <a href="/blog/llm/knowledge-cards/model-tag/" data-link-title="Model Tag" data-link-desc="Ollama 等推論伺服器用來定位特定模型版本的命名規則">Model tag</a>、跑多模型的記憶體預算見 <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a>。</p>
]]></content:encoded></item><item><title>Hands-on：LLM 運行中 + 結束的資源管理</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/resource-management/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/resource-management/</guid><description>&lt;p>跑本地 LLM 的核心 invariant 跟雲端不一樣：&lt;strong>Mac 是 shared resource、不是 dedicated GPU&lt;/strong>。雲端 inference server 跑進 dedicated container、結束 instance 自然回收所有資源；本地&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/inference-server/" data-link-title="Inference Server" data-link-desc="載入模型權重、處理 prompt、產生 token 的常駐 process">推論伺服器&lt;/a>跑在你日常用的 Mac、跟 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/unified-memory/" data-link-title="Unified Memory Architecture" data-link-desc="Apple Silicon 讓 CPU / GPU / NE 共用同一塊記憶體：跑大模型的優勢來源">統一記憶體&lt;/a> 共享同一塊容量，忘記管理會 silently 吃光 RAM、磁碟、port、最後讓系統變慢甚至 swap。&lt;/p>
&lt;p>本篇紀錄三個 dimension（RAM / 磁碟 / port）的觀察工具跟釋放姿勢、對比 Ollama 跟 ComfyUI 兩種典型 lifecycle、加上實測釋放數字。對應 &lt;a href="https://tarrragon.github.io/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流原理&lt;/a>「每個 hop 都要 audit」這條思維——資源管理也是 hop 級的 audit、不是「裝完就忘」。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：macOS 14、Apple Silicon、Ollama 0.23.2、ComfyUI 0.21.0、SDXL base 1.0&lt;/p>&lt;/blockquote>
&lt;h2 id="為什麼這事重要">為什麼這事重要&lt;/h2>
&lt;p>雲端 inference：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Container start → load model → serve requests → container stop → 所有 RAM / 磁碟 / port 自動回收&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>本地 inference：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">brew services start → load model on demand → serve → ??? → 你忘記 stop
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> → RAM / 磁碟一直被佔
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> → 下次重開機才釋放&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>具體會踩到的問題：&lt;/p>
&lt;ul>
&lt;li>&lt;strong>RAM&lt;/strong>：18 GB SDXL 模型載入後不會自動卸、即使 ComfyUI idle、Python process 仍占 RAM&lt;/li>
&lt;li>&lt;strong>磁碟&lt;/strong>：&lt;code>ollama pull&lt;/code> 累積、&lt;code>~/.ollama/models/blobs&lt;/code> 半年可長到 50 GB+、不主動清不會減&lt;/li>
&lt;li>&lt;strong>Port&lt;/strong>：上次 crash 的 &lt;code>ollama serve&lt;/code> 進程沒乾淨清、port 11434 還占著、下次啟動報「address already in use」&lt;/li>
&lt;li>&lt;strong>GPU / Metal&lt;/strong>：模型載入後 Metal context 佔住、跟其他 GPU-using app（影片剪輯、遊戲）競爭&lt;/li>
&lt;/ul>
&lt;h2 id="三個-dimension--觀察工具">三個 dimension + 觀察工具&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>觀察指令&lt;/th>
 &lt;th>看什麼&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>RAM&lt;/td>
 &lt;td>&lt;code>vm_stat | head -5&lt;/code>&lt;/td>
 &lt;td>Pages free（每 page 16 KB）、空閒越多越好&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>RAM（per process）&lt;/td>
 &lt;td>Activity Monitor 或 &lt;code>ps aux | sort -k6 -rn | head&lt;/code>&lt;/td>
 &lt;td>哪個 process 佔最多記憶體&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>磁碟&lt;/td>
 &lt;td>&lt;code>df -h ~ | tail -1&lt;/code>&lt;/td>
 &lt;td>系統 volume 剩餘&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>磁碟（per dir）&lt;/td>
 &lt;td>&lt;code>du -sh ~/.ollama/models/blobs&lt;/code>&lt;/td>
 &lt;td>LLM models 累積量&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Port&lt;/td>
 &lt;td>&lt;code>lsof -i :11434&lt;/code>&lt;/td>
 &lt;td>誰在 listen 該 port&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Process&lt;/td>
 &lt;td>&lt;code>ps aux | grep -i ollama | grep -v grep&lt;/code>&lt;/td>
 &lt;td>Ollama / ComfyUI / Python 跑哪幾個&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Ollama loaded models&lt;/td>
 &lt;td>&lt;code>ollama ps&lt;/code>&lt;/td>
 &lt;td>哪些 model 在 RAM、size、idle timer&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>實測：剛 kill 完 ComfyUI（SDXL + Python venv）後、&lt;code>vm_stat&lt;/code> 看到 free pages 從 619K 變 1090K（每 page 16 KB）、約 &lt;strong>+7.5 GB RAM 釋放&lt;/strong>——這就是 SDXL + ComfyUI process 一直占的記憶體量。&lt;/p></description><content:encoded><![CDATA[<p>跑本地 LLM 的核心 invariant 跟雲端不一樣：<strong>Mac 是 shared resource、不是 dedicated GPU</strong>。雲端 inference server 跑進 dedicated container、結束 instance 自然回收所有資源；本地<a href="/blog/llm/knowledge-cards/inference-server/" data-link-title="Inference Server" data-link-desc="載入模型權重、處理 prompt、產生 token 的常駐 process">推論伺服器</a>跑在你日常用的 Mac、跟 <a href="/blog/llm/knowledge-cards/unified-memory/" data-link-title="Unified Memory Architecture" data-link-desc="Apple Silicon 讓 CPU / GPU / NE 共用同一塊記憶體：跑大模型的優勢來源">統一記憶體</a> 共享同一塊容量，忘記管理會 silently 吃光 RAM、磁碟、port、最後讓系統變慢甚至 swap。</p>
<p>本篇紀錄三個 dimension（RAM / 磁碟 / port）的觀察工具跟釋放姿勢、對比 Ollama 跟 ComfyUI 兩種典型 lifecycle、加上實測釋放數字。對應 <a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流原理</a>「每個 hop 都要 audit」這條思維——資源管理也是 hop 級的 audit、不是「裝完就忘」。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：macOS 14、Apple Silicon、Ollama 0.23.2、ComfyUI 0.21.0、SDXL base 1.0</p></blockquote>
<h2 id="為什麼這事重要">為什麼這事重要</h2>
<p>雲端 inference：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Container start → load model → serve requests → container stop → 所有 RAM / 磁碟 / port 自動回收</span></span></code></pre></div><p>本地 inference：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">brew services start → load model on demand → serve → ??? → 你忘記 stop
</span></span><span class="line"><span class="ln">2</span><span class="cl">                                              → RAM / 磁碟一直被佔
</span></span><span class="line"><span class="ln">3</span><span class="cl">                                              → 下次重開機才釋放</span></span></code></pre></div><p>具體會踩到的問題：</p>
<ul>
<li><strong>RAM</strong>：18 GB SDXL 模型載入後不會自動卸、即使 ComfyUI idle、Python process 仍占 RAM</li>
<li><strong>磁碟</strong>：<code>ollama pull</code> 累積、<code>~/.ollama/models/blobs</code> 半年可長到 50 GB+、不主動清不會減</li>
<li><strong>Port</strong>：上次 crash 的 <code>ollama serve</code> 進程沒乾淨清、port 11434 還占著、下次啟動報「address already in use」</li>
<li><strong>GPU / Metal</strong>：模型載入後 Metal context 佔住、跟其他 GPU-using app（影片剪輯、遊戲）競爭</li>
</ul>
<h2 id="三個-dimension--觀察工具">三個 dimension + 觀察工具</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>觀察指令</th>
          <th>看什麼</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>RAM</td>
          <td><code>vm_stat | head -5</code></td>
          <td>Pages free（每 page 16 KB）、空閒越多越好</td>
      </tr>
      <tr>
          <td>RAM（per process）</td>
          <td>Activity Monitor 或 <code>ps aux | sort -k6 -rn | head</code></td>
          <td>哪個 process 佔最多記憶體</td>
      </tr>
      <tr>
          <td>磁碟</td>
          <td><code>df -h ~ | tail -1</code></td>
          <td>系統 volume 剩餘</td>
      </tr>
      <tr>
          <td>磁碟（per dir）</td>
          <td><code>du -sh ~/.ollama/models/blobs</code></td>
          <td>LLM models 累積量</td>
      </tr>
      <tr>
          <td>Port</td>
          <td><code>lsof -i :11434</code></td>
          <td>誰在 listen 該 port</td>
      </tr>
      <tr>
          <td>Process</td>
          <td><code>ps aux | grep -i ollama | grep -v grep</code></td>
          <td>Ollama / ComfyUI / Python 跑哪幾個</td>
      </tr>
      <tr>
          <td>Ollama loaded models</td>
          <td><code>ollama ps</code></td>
          <td>哪些 model 在 RAM、size、idle timer</td>
      </tr>
  </tbody>
</table>
<p>實測：剛 kill 完 ComfyUI（SDXL + Python venv）後、<code>vm_stat</code> 看到 free pages 從 619K 變 1090K（每 page 16 KB）、約 <strong>+7.5 GB RAM 釋放</strong>——這就是 SDXL + ComfyUI process 一直占的記憶體量。</p>
<h2 id="ollama-的-lifecycleauto-unload-模式">Ollama 的 lifecycle（auto-unload 模式）</h2>
<p>Ollama 走「按需 load / idle unload」設計：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">brew services start ollama          → daemon 啟動、沒 model 載入、RAM 占用 ~200 MB
</span></span><span class="line"><span class="ln">2</span><span class="cl">                                     port 11434 listening
</span></span><span class="line"><span class="ln">3</span><span class="cl">ollama run gemma3:4b &#34;hello&#34;        → 把 model 載入 RAM (~4-5 GB)
</span></span><span class="line"><span class="ln">4</span><span class="cl">                                     立刻 generate response
</span></span><span class="line"><span class="ln">5</span><span class="cl">                                     model 留在 RAM
</span></span><span class="line"><span class="ln">6</span><span class="cl">(idle 5 分鐘、無新 request)         → Ollama 自動 unload model
</span></span><span class="line"><span class="ln">7</span><span class="cl">                                     RAM 釋放、daemon 仍跑著
</span></span><span class="line"><span class="ln">8</span><span class="cl">ollama run gemma3:4b &#34;next&#34;         → 重新 load model（~5-10 秒）、generate
</span></span><span class="line"><span class="ln">9</span><span class="cl">brew services stop ollama           → daemon 結束、port 釋放</span></span></code></pre></div><p><strong>關鍵參數 <code>OLLAMA_KEEP_ALIVE</code></strong>（環境變數、預設 <code>5m</code>）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 看當前 loaded models</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># NAME         ID              SIZE      PROCESSOR    UNTIL</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># gemma3:4b    a2af6cc3eb7f    5.5 GB    100% Metal   4 minutes from now</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># 啟動時調 keep_alive（持續佔 RAM 直到 ollama 重啟）</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="nv">OLLAMA_KEEP_ALIVE</span><span class="o">=</span>-1 brew services restart ollama
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># 啟動時讓 model 用完立即 unload</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nv">OLLAMA_KEEP_ALIVE</span><span class="o">=</span><span class="m">0</span> brew services restart ollama</span></span></code></pre></div><p>選 keep_alive 的 trade-off：</p>
<table>
  <thead>
      <tr>
          <th>設定</th>
          <th>RAM 占用</th>
          <th>首字延遲</th>
          <th>適合場景</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>0</code></td>
          <td>最低（generate 完立即釋放）</td>
          <td>高（每次都重 load）</td>
          <td>偶爾用、RAM 緊張</td>
      </tr>
      <tr>
          <td><code>5m</code>（預設）</td>
          <td>中（活躍用占住、閒 5 分鐘後釋放）</td>
          <td>低（活躍期不重 load）</td>
          <td>大多場景</td>
      </tr>
      <tr>
          <td><code>-1</code></td>
          <td>高（永久占住）</td>
          <td>最低</td>
          <td>整天頻繁用、RAM 充裕</td>
      </tr>
  </tbody>
</table>
<p><strong>主動 unload 指令</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 把 idle 的 model 立刻從 RAM 卸掉、但 daemon 仍跑</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">curl -s http://localhost:11434/api/generate <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{&#34;model&#34;: &#34;gemma3:4b&#34;, &#34;keep_alive&#34;: 0}&#39;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 或關掉整個 daemon</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">brew services stop ollama</span></span></code></pre></div><h2 id="comfyui-的-lifecycle持續占用模式">ComfyUI 的 lifecycle（持續占用模式）</h2>
<p>ComfyUI 走完全不同模式：<strong>model 載入後一直在 RAM、直到 server process 結束</strong>。沒有 auto-unload 機制。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">python main.py                      → ComfyUI server start、port 8188 listening
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">                                     RAM ~3 GB（Python venv + 框架）
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">第一次 Queue Prompt (用 SDXL)        → 載入 sd_xl_base_1.0.safetensors (~6 GB)
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">                                     RAM 跳到 ~9-10 GB
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">                                     generate 完成、model 留在 RAM
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">連續多張生成                          → 維持 ~9-10 GB、沒 unload
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">idle 1 小時                          → 仍 ~9-10 GB（沒 timer）
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">切到 ControlNet workflow             → 多載 ControlNet model (~2 GB)、ComfyUI 自動 swap
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">                                     RAM 暫升、SD 部分可能被 evict 到 disk
</span></span><span class="line"><span class="ln">10</span><span class="cl">Ctrl+C / pkill                       → process 結束、RAM 完全釋放</span></span></code></pre></div><p>要釋放 ComfyUI 占的 RAM、<strong>唯一方法是結束 server</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 找 PID</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ps aux <span class="p">|</span> grep <span class="s2">&#34;ComfyUI/main.py&#34;</span> <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># 優雅關（讓它 cleanup）</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">pkill -INT -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 強制 kill（如果上面沒反應、最多等 5 秒再強制）</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">pkill -KILL -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 確認 port 釋放</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">lsof -i :8188 <span class="p">|</span> head -3</span></span></code></pre></div><p>實測：M4 Pro 32GB、SDXL base 載入後 ComfyUI process 占 ~8 GB RAM；<code>pkill -9</code> 後 <code>vm_stat</code> 顯示 free pages 增加 ~470K page（<strong>7.5 GB 釋放</strong>）。</p>
<h3 id="為什麼-ollama-跟-comfyui-設計不同">為什麼 Ollama 跟 ComfyUI 設計不同</h3>
<table>
  <thead>
      <tr>
          <th>因素</th>
          <th>Ollama 設計</th>
          <th>ComfyUI 設計</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>主要使用模式</td>
          <td>API 服務、IDE plugin 透過 HTTP 用</td>
          <td>互動 GUI、user 連續調 prompt</td>
      </tr>
      <tr>
          <td>Model 切換頻率</td>
          <td>高（不同任務換不同 model）</td>
          <td>低（一次 session 通常一個 model）</td>
      </tr>
      <tr>
          <td>User 期待的 latency</td>
          <td>低首字延遲（IDE 補完場景）</td>
          <td>高 throughput（連續生圖）</td>
      </tr>
      <tr>
          <td>結論</td>
          <td>Auto-unload 釋 RAM 給其他 model</td>
          <td>持續載入避免重複 load 浪費</td>
      </tr>
  </tbody>
</table>
<p>兩種設計都 valid、適合不同使用模式。理解差異後就知道 ComfyUI 一直占 RAM「不是 bug」、是設計選擇。</p>
<h2 id="跟其他本地-server-對比">跟其他本地 server 對比</h2>
<table>
  <thead>
      <tr>
          <th>Server</th>
          <th>Auto-unload</th>
          <th>主動 unload 指令</th>
          <th>占 RAM 觀察</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ollama</td>
          <td>有（5 分鐘 idle）</td>
          <td><code>keep_alive: 0</code> 或 stop daemon</td>
          <td><code>ollama ps</code></td>
      </tr>
      <tr>
          <td>LM Studio</td>
          <td>無（GUI 主動關閉 model 才釋）</td>
          <td>GUI Eject Model</td>
          <td>Activity Monitor</td>
      </tr>
      <tr>
          <td>llama.cpp <code>llama-server</code></td>
          <td>無</td>
          <td>kill process</td>
          <td><code>lsof -i :8080</code></td>
      </tr>
      <tr>
          <td>ComfyUI</td>
          <td>無</td>
          <td>kill process</td>
          <td><code>ps aux | grep ComfyUI</code></td>
      </tr>
      <tr>
          <td>oMLX</td>
          <td>有（per model 可配）</td>
          <td>API endpoint</td>
          <td>server log</td>
      </tr>
  </tbody>
</table>
<p><strong>結論</strong>：只有 Ollama 跟 oMLX 內建 auto-unload、其他都要手動釋放。GUI server（LM Studio）通常給 user 一個「Eject」按鈕、CLI server 通常要 kill process。</p>
<h2 id="標準釋放程序">標準釋放程序</h2>
<p>寫 code 完一天結束、要釋放所有資源、按下表順序操作：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 1. 確認當前狀態（記下要還回去多少 RAM）</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">vm_stat <span class="p">|</span> head -3
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">df -h ~ <span class="p">|</span> tail -1
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">ps aux <span class="p">|</span> grep -E <span class="s2">&#34;ollama|ComfyUI|llama-server&#34;</span> <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 2. 釋放當前載入的 LLM models（Ollama）</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># 或保留 daemon、只 unload model：</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># curl -s http://localhost:11434/api/generate -d &#39;{&#34;model&#34;: &#34;&lt;your model&gt;&#34;, &#34;keep_alive&#34;: 0}&#39;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># 3. 結束 ComfyUI / 其他 GUI server</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">pkill -INT -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">14</span><span class="cl">pkill -INT -f <span class="s2">&#34;llama-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">15</span><span class="cl">sleep <span class="m">5</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1"># 強制（如果上面沒清乾淨）</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">pkill -KILL -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">18</span><span class="cl">pkill -KILL -f <span class="s2">&#34;llama-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"># 4. 驗證所有 port 釋放</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">lsof -i :11434 -i :1234 -i :8080 -i :8188 -i :8000 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">|</span> head
</span></span><span class="line"><span class="ln">22</span><span class="cl">
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="c1"># 5. 確認釋放量</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">vm_stat <span class="p">|</span> head -3
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1"># free pages 該明顯增加</span></span></span></code></pre></div><h3 id="容易出錯的釋放方式">容易出錯的「釋放方式」</h3>
<ul>
<li><strong><code>killall Python</code></strong>：會 kill 所有 Python process、包括其他 dev tool（如 jupyter、Django）。用 <code>pkill -f &quot;ComfyUI/main.py&quot;</code> 等明確 pattern。</li>
<li><strong><code>rm -rf ~/.ollama</code></strong>：會清掉所有 model registry、下次要重 pull 全部 model。Cleanup 用 <code>ollama rm &lt;model&gt;</code> 才精準。</li>
<li><strong><code>brew uninstall ollama</code></strong>：直接卸載 Ollama 本身、過 reinstall 麻煩。Stop service 就夠。</li>
<li><strong>重開機釋放</strong>：work 但太重、會中斷其他工作。用 process-level 操作即可。</li>
</ul>
<h2 id="磁碟長期累積管理">磁碟長期累積管理</h2>
<p>Models 一旦 <code>pull</code> 進 <code>~/.ollama/models/blobs</code>、不主動 <code>rm</code> 不會減少。半年累積可長到 50 GB+。</p>
<p>Ollama models 只是磁碟大戶之一。整台 Mac 突然被吃光、要從哪裡查起的全機診斷順序（先排除快照浮動、再用實際佔用值逐層找大戶），見 <a href="/blog/other/macos-%E7%A3%81%E7%A2%9F%E7%A9%BA%E9%96%93%E8%A2%AB%E5%90%83%E5%85%89%E7%9A%84%E8%A8%BA%E6%96%B7%E6%B5%81%E7%A8%8B/" data-link-title="macOS 磁碟空間被吃光的診斷流程" data-link-desc="Mac 空間莫名歸零、清 cache 沒救、或空間掉了又回來時的排查順序。避開 sparse 假大小和本地快照浮動的誤判。含 disk-report 腳本。">macOS 磁碟空間診斷流程</a>——那篇的佔用大戶表也會把 ollama 列為其中一項、再連回本篇的專屬清理 idiom。</p>
<h3 id="觀察累積">觀察累積</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Ollama models 總占用</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">du -sh ~/.ollama/models/blobs
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># 4.1G    /Users/tarragon/.ollama/models/blobs</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 逐 model 看大小</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">ollama list
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># NAME                       ID              SIZE      MODIFIED</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># gemma4:e4b                 c6eb396dbd59    9.6 GB    Less than a second ago</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># nomic-embed-text:latest    0a109f422b47    274 MB    3 hours ago</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"># ComfyUI checkpoints 累積</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">du -sh ~/.ollama ~/Projects/ComfyUI/models 2&gt;/dev/null
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># 4.2G    /Users/tarragon/.ollama</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="c1"># 7.0G    /Users/tarragon/Projects/ComfyUI/models</span></span></span></code></pre></div><h3 id="清理策略">清理策略</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 刪掉很久沒用的 model</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ollama rm &lt;model-tag&gt;
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># 一次清掉所有 Ollama models（保留 daemon）</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">ollama list <span class="p">|</span> tail -n +2 <span class="p">|</span> awk <span class="s1">&#39;{print $1}&#39;</span> <span class="p">|</span> xargs -I <span class="o">{}</span> ollama rm <span class="o">{}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 看 ComfyUI checkpoints 哪些可清</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">ls -lh ~/Projects/ComfyUI/models/checkpoints/
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 手動刪不要的 .safetensors（小心、不能 undo）</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">rm ~/Projects/ComfyUI/models/checkpoints/&lt;old-model&gt;.safetensors</span></span></code></pre></div><h3 id="磁碟管理-idiom">磁碟管理 idiom</h3>
<p>定期（每月或磁碟剩 &lt; 20% 時）做：</p>
<ol>
<li><code>du -sh ~/.ollama ~/Projects/ComfyUI/models</code> 看當前累積</li>
<li><code>ollama list</code> 看哪些 model 沒在用（看 <code>MODIFIED</code> 欄、太舊的考慮刪）</li>
<li>刪實驗用的 model、保留 daily-driver</li>
<li>ComfyUI checkpoints 同樣 review</li>
</ol>
<h2 id="port--process-排錯">Port / Process 排錯</h2>
<h3 id="啟動報address-already-in-use">啟動報「address already in use」</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 找誰占</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">lsof -i :11434
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># COMMAND  PID  USER   ...   NAME</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># ollama   xxx  ...    ...   TCP localhost:11434 (LISTEN)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># 看是不是 zombie process</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">ps aux <span class="p">|</span> grep <span class="k">$(</span>lsof -ti :11434 <span class="p">|</span> head -1<span class="k">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># 清掉</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nb">kill</span> -9 <span class="k">$(</span>lsof -ti :11434<span class="k">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># 或重啟 service（會自動清舊 instance）</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">brew services restart ollama</span></span></code></pre></div><h3 id="ollama-daemon-掛了不知道">Ollama daemon 掛了不知道</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 健康檢查</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">curl -s http://localhost:11434/api/version
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># 沒回應、看 service 狀態</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">brew services list <span class="p">|</span> grep ollama
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 沒在跑、重啟</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">brew services start ollama
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 看 log</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">tail -50 /opt/homebrew/var/log/ollama.log</span></span></code></pre></div><h3 id="comfyui-看似跑著但-queue-不動">ComfyUI 看似跑著但 Queue 不動</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 看 stdout / stderr log</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">tail -30 /tmp/comfyui.log  <span class="c1"># 如果啟動時 redirect 到 log</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 看是不是 GPU / Metal stuck（極少見、但 SDXL 大量並發可能踩到）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 解法：kill + 重啟</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">pkill -9 -f <span class="s2">&#34;ComfyUI/main.py&#34;</span></span></span></code></pre></div><p>完整排錯流程跟「先確認哪一層壞」見 <a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 排錯方法論</a>。</p>
<h2 id="觀察記憶體佔用實測對照">觀察記憶體佔用：實測對照</h2>
<p>跑這幾步紀錄 baseline → load model → kill 的 RAM 變化：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Baseline</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># Pages free:                              1090076.   ← ~17 GB free</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 啟動 Ollama + load 4B model</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">brew services start ollama
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">ollama run gemma3:4b <span class="s2">&#34;hello&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># NAME       SIZE     PROCESSOR    UNTIL</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># gemma3:4b  5.5 GB   100% Metal   4 minutes from now</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># Pages free:                               750000.   ← 跌 ~5 GB（model 載入）</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="c1"># 額外啟動 ComfyUI + load SDXL</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">nohup python main.py &gt; /tmp/comfyui.log 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="c1"># 在 GUI 上 Queue Prompt 跑一次 SDXL generation</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="c1"># Pages free:                               280000.   ← 再跌 ~7.5 GB（SDXL 載入 + Python venv）</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1"># kill 全部</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln">23</span><span class="cl">pkill -9 -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">sleep <span class="m">3</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="c1"># Pages free:                              1090000.   ← 回到 baseline</span></span></span></code></pre></div><p>每 page 16 KB、所以 free pages 數字 × 16 KB = 實際 free RAM bytes。</p>
<h2 id="自動化釋放launchd--shell-alias">自動化釋放：launchd / shell alias</h2>
<p>寫個 shell function 一鍵 cleanup：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 加進 ~/.zshrc</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">llm-cleanup<span class="o">()</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Stopping Ollama...&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  brew services stop ollama 2&gt;/dev/null
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Killing ComfyUI...&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  pkill -INT -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  sleep <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  pkill -KILL -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Killing other model servers...&#34;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">  pkill -KILL -f <span class="s2">&#34;llama-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">13</span><span class="cl">  pkill -KILL -f <span class="s2">&#34;lm-studio-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Verifying ports...&#34;</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">  <span class="k">for</span> p in <span class="m">11434</span> <span class="m">1234</span> <span class="m">8080</span> <span class="m">8188</span> 8000<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    lsof -i :<span class="nv">$p</span> 2&gt;/dev/null <span class="p">|</span> head -2
</span></span><span class="line"><span class="ln">18</span><span class="cl">  <span class="k">done</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Free RAM:&#34;</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">  vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="o">}</span></span></span></code></pre></div><p>完事打 <code>llm-cleanup</code> 一鍵釋放、不用記每個 process 怎麼 kill。</p>
<h2 id="何時這篇會過時">何時這篇會過時</h2>
<p><strong>不會過時的部分</strong>：</p>
<ul>
<li>RAM / 磁碟 / port 三個 dimension 是長期 invariant、用什麼 LLM server 都成立。</li>
<li>「Mac 是 shared resource、需要主動管理」這個 framing。</li>
<li>Ollama 跟 ComfyUI 兩種典型 lifecycle 對比（auto-unload vs persistent）。</li>
<li>觀察工具（<code>vm_stat</code>、<code>lsof</code>、<code>ps</code>、<code>du</code>、Activity Monitor）是 macOS 系統 API、不會 deprecate。</li>
<li>標準釋放程序、自動化 shell function 模式。</li>
</ul>
<p><strong>會變的部分</strong>：</p>
<ul>
<li>具體 model size / RAM 占用數字（隨模型架構演化）。</li>
<li><code>OLLAMA_KEEP_ALIVE</code> 等具體環境變數名（Ollama API 演化）。</li>
<li>ComfyUI 可能加 auto-unload feature（社群有 issue 在討論）。</li>
</ul>
<p>讀的時候若指令跑不過、先 <code>--help</code> 看當前版本 flag；釋放 RAM 的「kill process」這個機制本身永遠成立。</p>
<h2 id="跟其他-hands-on-章節的關係">跟其他 hands-on 章節的關係</h2>
<ul>
<li><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝</a>：介紹 <code>brew services start/stop</code>、本篇延伸 lifecycle 細節</li>
<li><a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI 安裝</a>：介紹 ComfyUI 啟動、本篇延伸 RAM 占用 + 釋放</li>
<li><a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 排錯方法論</a>：用三層架構定位故障、本篇是 lifecycle 視角的補完</li>
<li><a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流原理</a>：「每個 hop 都要 audit」延伸到資源層</li>
</ul>
<p>整體心法：本地 LLM 工作流跟雲端不一樣、要主動管理 lifecycle、不能裝完就忘。</p>
]]></content:encoded></item><item><title>Hands-on：用本地 LLM 跑 judge harness（最小可行版）</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/local-llm-judge-harness/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/local-llm-judge-harness/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">4.21 LLM-as-judge&lt;/a> 寫的是原理。本篇用 Ollama / LM Studio 在本地跑一個最小可行的 judge harness、對自己工作流的真實案例做 systematic eval。隱私敏感場景特別合用 — eval 資料（user query、agent output、可能含 PII）不需要送雲端。&lt;/p>
&lt;p>本篇 framing 是「&lt;strong>真的能跑、不只跑 demo&lt;/strong>」、所以包含：硬體預算估算、judge model 選型、bias 緩解、calibration 流程、跟 production trace 串接的延伸；術語對應 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-Judge&lt;/a> 與 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-tracing/" data-link-title="LLM Tracing" data-link-desc="把 LLM 應用的每次 LLM call / tool call / memory op 編成結構化 span、用 OpenTelemetry GenAI semantic conventions 標準化">LLM Tracing&lt;/a>。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：M4 Max 64GB / 或 24GB+ VRAM PC + Ollama
&lt;strong>Judge model&lt;/strong>：DeepSeek-R1-Distill-Qwen-32B 或 QwQ-32B（reasoning model 當 judge 更穩）&lt;/p>&lt;/blockquote>
&lt;h2 id="為什麼用本地-llm-當-judge">為什麼用本地 LLM 當 judge&lt;/h2>
&lt;p>跟雲端 judge（GPT-5 / Claude 4）對比：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>維度&lt;/th>
 &lt;th>本地 judge&lt;/th>
 &lt;th>雲端 judge&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Cost&lt;/td>
 &lt;td>0（電費）&lt;/td>
 &lt;td>$0.001-0.01 per item&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>隱私&lt;/td>
 &lt;td>完全本地、eval 資料不出機器&lt;/td>
 &lt;td>送雲端、依政策&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Latency&lt;/td>
 &lt;td>視硬體、reasoning model 30B 約 30-60s&lt;/td>
 &lt;td>API call 5-30s&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>品質上限&lt;/td>
 &lt;td>本地 30B reasoning 接近 2024 雲端中段&lt;/td>
 &lt;td>雲端旗艦上限高&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>大量 batch&lt;/td>
 &lt;td>慢但 zero cost&lt;/td>
 &lt;td>快但 cost 累積&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>判讀：&lt;/p>
&lt;ul>
&lt;li>&lt;strong>大量 production trace eval（千筆以上）+ 隱私敏感&lt;/strong> → 本地 judge&lt;/li>
&lt;li>&lt;strong>少量 high-stake eval（&amp;lt; 50 筆）&lt;/strong> → 雲端旗艦 judge&lt;/li>
&lt;li>&lt;strong>A/B test 快速 iterate&lt;/strong> → 雲端（latency 重要）&lt;/li>
&lt;/ul>
&lt;h2 id="硬體預算">硬體預算&lt;/h2>
&lt;p>Judge model 選擇看硬體：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>硬體&lt;/th>
 &lt;th>適合 judge model&lt;/th>
 &lt;th>預期 latency / item&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>M4 Pro 24GB / 4090 16GB&lt;/td>
 &lt;td>Qwen2.5-32B Q4 或 DeepSeek-R1-Distill-14B&lt;/td>
 &lt;td>30-60s&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>M4 Pro 36GB&lt;/td>
 &lt;td>DeepSeek-R1-Distill-Qwen-32B Q4&lt;/td>
 &lt;td>60-120s&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>M4 Max 48-64GB / 5090 24GB&lt;/td>
 &lt;td>QwQ-32B 或 DeepSeek-R1-Distill-Qwen-32B Q6&lt;/td>
 &lt;td>60-180s（含 reasoning trace）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>M4 Max 128GB / 多卡 PC&lt;/td>
 &lt;td>Llama 3.3 70B 或 Qwen3-72B&lt;/td>
 &lt;td>120-300s&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>注意：reasoning model 的 thinking trace 拉長 latency、跑大量 batch 要規劃時間（100 item × 60s = 100 min）。&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">4.21 LLM-as-judge</a> 寫的是原理。本篇用 Ollama / LM Studio 在本地跑一個最小可行的 judge harness、對自己工作流的真實案例做 systematic eval。隱私敏感場景特別合用 — eval 資料（user query、agent output、可能含 PII）不需要送雲端。</p>
<p>本篇 framing 是「<strong>真的能跑、不只跑 demo</strong>」、所以包含：硬體預算估算、judge model 選型、bias 緩解、calibration 流程、跟 production trace 串接的延伸；術語對應 <a href="/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-Judge</a> 與 <a href="/blog/llm/knowledge-cards/llm-tracing/" data-link-title="LLM Tracing" data-link-desc="把 LLM 應用的每次 LLM call / tool call / memory op 編成結構化 span、用 OpenTelemetry GenAI semantic conventions 標準化">LLM Tracing</a>。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：M4 Max 64GB / 或 24GB+ VRAM PC + Ollama
<strong>Judge model</strong>：DeepSeek-R1-Distill-Qwen-32B 或 QwQ-32B（reasoning model 當 judge 更穩）</p></blockquote>
<h2 id="為什麼用本地-llm-當-judge">為什麼用本地 LLM 當 judge</h2>
<p>跟雲端 judge（GPT-5 / Claude 4）對比：</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>本地 judge</th>
          <th>雲端 judge</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cost</td>
          <td>0（電費）</td>
          <td>$0.001-0.01 per item</td>
      </tr>
      <tr>
          <td>隱私</td>
          <td>完全本地、eval 資料不出機器</td>
          <td>送雲端、依政策</td>
      </tr>
      <tr>
          <td>Latency</td>
          <td>視硬體、reasoning model 30B 約 30-60s</td>
          <td>API call 5-30s</td>
      </tr>
      <tr>
          <td>品質上限</td>
          <td>本地 30B reasoning 接近 2024 雲端中段</td>
          <td>雲端旗艦上限高</td>
      </tr>
      <tr>
          <td>大量 batch</td>
          <td>慢但 zero cost</td>
          <td>快但 cost 累積</td>
      </tr>
  </tbody>
</table>
<p>判讀：</p>
<ul>
<li><strong>大量 production trace eval（千筆以上）+ 隱私敏感</strong> → 本地 judge</li>
<li><strong>少量 high-stake eval（&lt; 50 筆）</strong> → 雲端旗艦 judge</li>
<li><strong>A/B test 快速 iterate</strong> → 雲端（latency 重要）</li>
</ul>
<h2 id="硬體預算">硬體預算</h2>
<p>Judge model 選擇看硬體：</p>
<table>
  <thead>
      <tr>
          <th>硬體</th>
          <th>適合 judge model</th>
          <th>預期 latency / item</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>M4 Pro 24GB / 4090 16GB</td>
          <td>Qwen2.5-32B Q4 或 DeepSeek-R1-Distill-14B</td>
          <td>30-60s</td>
      </tr>
      <tr>
          <td>M4 Pro 36GB</td>
          <td>DeepSeek-R1-Distill-Qwen-32B Q4</td>
          <td>60-120s</td>
      </tr>
      <tr>
          <td>M4 Max 48-64GB / 5090 24GB</td>
          <td>QwQ-32B 或 DeepSeek-R1-Distill-Qwen-32B Q6</td>
          <td>60-180s（含 reasoning trace）</td>
      </tr>
      <tr>
          <td>M4 Max 128GB / 多卡 PC</td>
          <td>Llama 3.3 70B 或 Qwen3-72B</td>
          <td>120-300s</td>
      </tr>
  </tbody>
</table>
<p>注意：reasoning model 的 thinking trace 拉長 latency、跑大量 batch 要規劃時間（100 item × 60s = 100 min）。</p>
<p><strong>何時不適合用本地 judge</strong>：</p>
<ol>
<li><strong>硬體低於 M4 Pro 24GB / 4090 16GB</strong>（如 M1/M2 16GB、無獨立 GPU PC）：跑 32B reasoning model 太緊、強行跑會 swap、latency 爆 5-10×。改用 14B instruct model（如 Qwen2.5-14B Q4）作 judge、或直接走雲端 judge</li>
<li><strong>Batch × latency &gt; 你可接受的等待時間</strong>：100 item × 60s/item = 100 min；500 item × 120s = 17 hr。預估超過 4 hr 時改雲端 batch API</li>
<li><strong>eval 任務太 nuanced</strong>：細粒度倫理 / 法律 / 高 stake 判讀、本地 32B distill 能力不夠、用雲端旗艦 judge 或人工 review</li>
<li><strong>calibration 階段</strong>：第一次跑、要快速 iterate rubric、雲端 judge latency 短（5-30s）更適合 iterate</li>
</ol>
<h2 id="整體流程">整體流程</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. 蒐集 eval dataset    → JSONL：每行一個 (input, output) 待評
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. 設計 rubric         → 評分維度、scale、明確 anti-pattern
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. 寫 judge prompt     → 4 段式（task / input-output / rubric / format）
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. 跑 harness          → 對每筆 input call judge、parse JSON output
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. Aggregate 結果      → 算平均分數、找 outlier、看 reasoning
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. Calibration（可選）  → 跟 human eval 比對、調 rubric
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. 跟 production trace 串接 → 定期跑 production sample</span></span></code></pre></div><h2 id="step-1蒐集-eval-dataset">Step 1：蒐集 eval dataset</h2>
<p>JSONL format（每行一筆）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span><span class="nt">&#34;id&#34;</span><span class="p">:</span> <span class="s2">&#34;001&#34;</span><span class="p">,</span> <span class="nt">&#34;input&#34;</span><span class="p">:</span> <span class="s2">&#34;用 Python 寫 fibonacci function&#34;</span><span class="p">,</span> <span class="nt">&#34;output&#34;</span><span class="p">:</span> <span class="s2">&#34;def fib(n):\n    if n &lt;= 1:\n        return n\n    return fib(n-1) + fib(n-2)&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="p">{</span><span class="nt">&#34;id&#34;</span><span class="p">:</span> <span class="s2">&#34;002&#34;</span><span class="p">,</span> <span class="nt">&#34;input&#34;</span><span class="p">:</span> <span class="s2">&#34;解釋這段 code 在做什麼：[code]&#34;</span><span class="p">,</span> <span class="nt">&#34;output&#34;</span><span class="p">:</span> <span class="s2">&#34;這段 code 實作了 ...&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">{</span><span class="nt">&#34;id&#34;</span><span class="p">:</span> <span class="s2">&#34;003&#34;</span><span class="p">,</span> <span class="nt">&#34;input&#34;</span><span class="p">:</span> <span class="s2">&#34;[bug 描述]&#34;</span><span class="p">,</span> <span class="nt">&#34;output&#34;</span><span class="p">:</span> <span class="s2">&#34;[suggested fix]&#34;</span><span class="p">}</span></span></span></code></pre></div><p>來源：</p>
<ul>
<li>過往 Continue.dev / Cursor 跟 LLM 的對話 log</li>
<li>Production agent 的 trace（手動 export 或 LangSmith / Phoenix dump）</li>
<li>自己 hand-craft 30-100 個典型 case</li>
</ul>
<p>放在 <code>data/eval.jsonl</code>。</p>
<h2 id="step-2設計-rubric">Step 2：設計 rubric</h2>
<p>依任務類型設計、coding 任務的範例 rubric：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">評分維度：
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">1. Correctness（程式碼能否運作、邏輯是否正確）：1-5
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">2. Style（是否符合 codebase convention、習慣命名）：1-5
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">3. Completeness（是否完整解決 user request）：1-5
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">評分規則：
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">- 5：完美無瑕、可直接 merge
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">- 4：小修可用、整體正確
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">- 3：方向正確、需 substantial 修改
</span></span><span class="line"><span class="ln">10</span><span class="cl">- 2：部分對、主要邏輯有錯
</span></span><span class="line"><span class="ln">11</span><span class="cl">- 1：完全錯、誤導使用者
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl">明確不加分（緩解 verbosity bias）：
</span></span><span class="line"><span class="ln">14</span><span class="cl">- 冗長 / verbose（同樣正確的短答 = 長答）
</span></span><span class="line"><span class="ln">15</span><span class="cl">- 道歉 / 開場白
</span></span><span class="line"><span class="ln">16</span><span class="cl">- 「我希望這有幫助」這類禮貌話
</span></span><span class="line"><span class="ln">17</span><span class="cl">- 過多 markdown 修飾（不加分）</span></span></code></pre></div><h2 id="step-3judge-prompt-模板">Step 3：Judge prompt 模板</h2>
<p>寫成 file <code>prompts/judge.txt</code>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">你是 LLM 輸出品質評估員、要評估 coding assistant 對使用者請求的回答品質。
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">重要：請保持公正、忽略風格偏好、聚焦在實質品質。
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">User request:
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">{input}
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">Assistant response:
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">{output}
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">評分維度（每維 1-5、加總用 overall）：
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">1. Correctness：程式碼能否運作、邏輯正確
</span></span><span class="line"><span class="ln">13</span><span class="cl">   5: 完美無瑕
</span></span><span class="line"><span class="ln">14</span><span class="cl">   4: 小修可用
</span></span><span class="line"><span class="ln">15</span><span class="cl">   3: 方向正確、需 substantial 修改
</span></span><span class="line"><span class="ln">16</span><span class="cl">   2: 部分對、主要邏輯有錯
</span></span><span class="line"><span class="ln">17</span><span class="cl">   1: 完全錯
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl">2. Style：符合 codebase convention
</span></span><span class="line"><span class="ln">20</span><span class="cl">   1-5 同 scale
</span></span><span class="line"><span class="ln">21</span><span class="cl">
</span></span><span class="line"><span class="ln">22</span><span class="cl">3. Completeness：完整解決 user request
</span></span><span class="line"><span class="ln">23</span><span class="cl">   1-5 同 scale
</span></span><span class="line"><span class="ln">24</span><span class="cl">
</span></span><span class="line"><span class="ln">25</span><span class="cl">明確不加分項：
</span></span><span class="line"><span class="ln">26</span><span class="cl">- 冗長 / verbose（同樣正確的短答 = 長答）
</span></span><span class="line"><span class="ln">27</span><span class="cl">- 道歉 / 開場白
</span></span><span class="line"><span class="ln">28</span><span class="cl">- 「我希望這有幫助」這類禮貌話
</span></span><span class="line"><span class="ln">29</span><span class="cl">- 過多 markdown 修飾
</span></span><span class="line"><span class="ln">30</span><span class="cl">
</span></span><span class="line"><span class="ln">31</span><span class="cl">請依下列 JSON 輸出（不要加額外文字、不要 markdown code fence）：
</span></span><span class="line"><span class="ln">32</span><span class="cl">{
</span></span><span class="line"><span class="ln">33</span><span class="cl">  &#34;correctness&#34;: &lt;1-5&gt;,
</span></span><span class="line"><span class="ln">34</span><span class="cl">  &#34;style&#34;: &lt;1-5&gt;,
</span></span><span class="line"><span class="ln">35</span><span class="cl">  &#34;completeness&#34;: &lt;1-5&gt;,
</span></span><span class="line"><span class="ln">36</span><span class="cl">  &#34;reasoning&#34;: &#34;&lt;簡短解釋、&lt; 100 字&gt;&#34;,
</span></span><span class="line"><span class="ln">37</span><span class="cl">  &#34;overall&#34;: &lt;1-5&gt;
</span></span><span class="line"><span class="ln">38</span><span class="cl">}</span></span></code></pre></div><h2 id="step-4跑-harness">Step 4：跑 harness</h2>
<p>Python 最小可行版：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># judge_harness.py</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">import</span> <span class="nn">json</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">import</span> <span class="nn">requests</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">JUDGE_MODEL</span> <span class="o">=</span> <span class="s2">&#34;deepseek-r1:32b&#34;</span>  <span class="c1"># 或 qwq:32b</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">OLLAMA_URL</span> <span class="o">=</span> <span class="s2">&#34;http://localhost:11434/v1/chat/completions&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">def</span> <span class="nf">load_dataset</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Load JSONL eval dataset.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="k">return</span> <span class="p">[</span><span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">line</span><span class="p">)</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">def</span> <span class="nf">load_prompt_template</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">return</span> <span class="n">Path</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">.</span><span class="n">read_text</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="k">def</span> <span class="nf">call_judge</span><span class="p">(</span><span class="n">prompt</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Call Ollama judge model、回 raw response text.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="n">resp</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="n">OLLAMA_URL</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="n">JUDGE_MODEL</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="s2">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">prompt</span><span class="p">}],</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="s2">&#34;temperature&#34;</span><span class="p">:</span> <span class="mf">0.1</span><span class="p">,</span>  <span class="c1"># judge 用低 temperature 穩定</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="s2">&#34;stream&#34;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="p">},</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">600</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="k">return</span> <span class="n">resp</span><span class="o">.</span><span class="n">json</span><span class="p">()[</span><span class="s2">&#34;choices&#34;</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s2">&#34;message&#34;</span><span class="p">][</span><span class="s2">&#34;content&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">
</span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="k">def</span> <span class="nf">parse_judge_output</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Parse judge 回的 JSON、容錯處理（reasoning model 可能加 &lt;think&gt; 標記）。&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">    <span class="c1"># 跳過 reasoning trace</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">    <span class="k">if</span> <span class="s2">&#34;&lt;/think&gt;&#34;</span> <span class="ow">in</span> <span class="n">text</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">        <span class="n">text</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&#34;&lt;/think&gt;&#34;</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="c1"># 找 JSON 區塊</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">    <span class="n">start</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">&#34;{&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">    <span class="n">end</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">rfind</span><span class="p">(</span><span class="s2">&#34;}&#34;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">    <span class="k">if</span> <span class="n">start</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="ow">or</span> <span class="n">end</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">        <span class="k">return</span> <span class="kc">None</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">        <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">text</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">])</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">    <span class="k">except</span> <span class="n">json</span><span class="o">.</span><span class="n">JSONDecodeError</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">        <span class="k">return</span> <span class="kc">None</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">
</span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="k">def</span> <span class="nf">run_harness</span><span class="p">(</span><span class="n">dataset_path</span><span class="p">,</span> <span class="n">prompt_template_path</span><span class="p">,</span> <span class="n">output_path</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">    <span class="n">dataset</span> <span class="o">=</span> <span class="n">load_dataset</span><span class="p">(</span><span class="n">dataset_path</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="n">template</span> <span class="o">=</span> <span class="n">load_prompt_template</span><span class="p">(</span><span class="n">prompt_template_path</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">
</span></span><span class="line"><span class="ln">47</span><span class="cl">    <span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">dataset</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">        <span class="n">prompt</span> <span class="o">=</span> <span class="n">template</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">item</span><span class="p">[</span><span class="s2">&#34;input&#34;</span><span class="p">],</span> <span class="n">output</span><span class="o">=</span><span class="n">item</span><span class="p">[</span><span class="s2">&#34;output&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">        <span class="n">raw</span> <span class="o">=</span> <span class="n">call_judge</span><span class="p">(</span><span class="n">prompt</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">        <span class="n">parsed</span> <span class="o">=</span> <span class="n">parse_judge_output</span><span class="p">(</span><span class="n">raw</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">
</span></span><span class="line"><span class="ln">53</span><span class="cl">        <span class="n">result</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">            <span class="s2">&#34;id&#34;</span><span class="p">:</span> <span class="n">item</span><span class="p">[</span><span class="s2">&#34;id&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">            <span class="s2">&#34;scores&#34;</span><span class="p">:</span> <span class="n">parsed</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">            <span class="s2">&#34;raw_judge_output&#34;</span><span class="p">:</span> <span class="n">raw</span><span class="p">[:</span><span class="mi">500</span><span class="p">],</span>  <span class="c1"># 保留前 500 字便於 debug</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">        <span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[</span><span class="si">{</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="s2">/</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span><span class="si">}</span><span class="s2">] id=</span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="s1">&#39;id&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2"> overall=</span><span class="si">{</span><span class="n">parsed</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;overall&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="n">parsed</span> <span class="k">else</span> <span class="s1">&#39;FAIL&#39;</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">
</span></span><span class="line"><span class="ln">61</span><span class="cl">    <span class="c1"># 寫出 JSONL</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">output_path</span><span class="p">,</span> <span class="s2">&#34;w&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">        <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">            <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="o">+</span> <span class="s2">&#34;</span><span class="se">\n</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">
</span></span><span class="line"><span class="ln">66</span><span class="cl">    <span class="c1"># Aggregate</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">    <span class="n">valid</span> <span class="o">=</span> <span class="p">[</span><span class="n">r</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">results</span> <span class="k">if</span> <span class="n">r</span><span class="p">[</span><span class="s2">&#34;scores&#34;</span><span class="p">]]</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">    <span class="k">if</span> <span class="n">valid</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">        <span class="n">avg</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s2">&#34;scores&#34;</span><span class="p">][</span><span class="s2">&#34;overall&#34;</span><span class="p">]</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">valid</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">valid</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;</span><span class="se">\n</span><span class="s2">Aggregate: </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">valid</span><span class="p">)</span><span class="si">}</span><span class="s2">/</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">results</span><span class="p">)</span><span class="si">}</span><span class="s2"> valid、avg overall = </span><span class="si">{</span><span class="n">avg</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">71</span><span class="cl">
</span></span><span class="line"><span class="ln">72</span><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&#34;__main__&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">73</span><span class="cl">    <span class="n">run_harness</span><span class="p">(</span><span class="s2">&#34;data/eval.jsonl&#34;</span><span class="p">,</span> <span class="s2">&#34;prompts/judge.txt&#34;</span><span class="p">,</span> <span class="s2">&#34;results/eval.jsonl&#34;</span><span class="p">)</span></span></span></code></pre></div><p>跑：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 先確認 judge model 已 pull</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ollama pull deepseek-r1:32b
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 跑 harness</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">python judge_harness.py</span></span></code></pre></div><h2 id="step-5aggregate-跟看-outlier">Step 5：Aggregate 跟看 outlier</h2>
<p>跑完後 results/eval.jsonl 含每筆評分跟 reasoning。看哪些是 outlier：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 找 overall &lt; 3 的 case（低分、值得 review）</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">jq <span class="s1">&#39;select(.scores.overall &lt; 3)&#39;</span> results/eval.jsonl
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 看 reasoning 找系統性問題</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">jq <span class="s1">&#39;.scores.reasoning&#39;</span> results/eval.jsonl <span class="p">|</span> sort -u</span></span></code></pre></div><p>判讀：</p>
<ul>
<li><strong>多數 score 4-5、少數 1-2</strong>：整體品質好、focus 在低分 case 找 fix</li>
<li><strong>多數 score 2-3</strong>：系統性問題、改 prompt / model / agent design</li>
<li><strong>分數分佈兩極（很多 5 很多 1）</strong>：可能是 task difficulty 分群、stratified analysis</li>
</ul>
<h2 id="step-6calibration可選但推薦">Step 6：Calibration（可選但推薦）</h2>
<p>跟 human eval 比對、確認 judge 對齊：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. 從 dataset 抽 30 個（覆蓋 difficulty / score 分佈）
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. 自己 human eval（依同樣 rubric）
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. 對比 judge 跟 human 的 overall score
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. 算 Spearman correlation
</span></span><span class="line"><span class="ln">5</span><span class="cl">   - &gt; 0.7：judge 對齊夠好、可信
</span></span><span class="line"><span class="ln">6</span><span class="cl">   - 0.5-0.7：部分問題、改 rubric
</span></span><span class="line"><span class="ln">7</span><span class="cl">   - &lt; 0.5：judge 不可信、換 model 或重寫 rubric</span></span></code></pre></div><p>低 correlation 的常見原因：</p>
<ul>
<li>Rubric 太 vague、judge 自由發揮</li>
<li>Judge model 能力不夠（換更強 judge）</li>
<li>Verbosity / position bias 沒緩解</li>
<li>Eval task 跟 judge 訓練分佈差距大</li>
</ul>
<h2 id="step-7跟-production-trace-串接延伸">Step 7：跟 production trace 串接（延伸）</h2>
<p>把 <a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">4.20 LLM tracing</a> 蒐集的 production trace export 成 JSONL、定期跑 judge：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 假設用 Langfuse self-host</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">langfuse <span class="nb">export</span> --filter <span class="s2">&#34;user_feedback=negative&#34;</span> --output traces.jsonl
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 轉成 eval format</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">python convert_trace_to_eval.py traces.jsonl &gt; data/eval-from-prod.jsonl
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># 跑 judge</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">python judge_harness.py</span></span></code></pre></div><p>這是 production quality engineering 閉環的本地版本、隱私敏感場景的 cost-free alternative。</p>
<h2 id="失敗模式">失敗模式</h2>
<ol>
<li><strong>Judge 不輸出合法 JSON</strong>：reasoning model 可能在 <code>&lt;think&gt;...&lt;/think&gt;</code> 後仍加 markdown / 解釋</li>
</ol>
<p><strong>緩解</strong>：parse 時跳 <code>&lt;think&gt;</code> 段、容錯處理、或開 <a href="/blog/llm/knowledge-cards/constrained-decoding/" data-link-title="Constrained Decoding" data-link-desc="推論時用 grammar 強制 LLM 輸出符合特定格式（JSON / regex / CFG）的 sampling 機制、把不合法 token 的機率歸零">constrained decoding</a>（llama.cpp grammar）</p>
<ol start="2">
<li><strong>Latency 太長、batch 跑不完</strong>：reasoning model 32B 每 item 60-120s、100 item 要 2 小時</li>
</ol>
<p><strong>緩解</strong>：用較小 judge model（如 Qwen2.5-32B instruct、非 reasoning）、或拆 batch 並行</p>
<ol start="3">
<li><strong>Judge bias 沒緩解</strong>：本地 judge 跟雲端 judge 都會有 verbosity / position bias</li>
</ol>
<p><strong>緩解</strong>：rubric 寫明、pairwise 換位置跑 2 次</p>
<ol start="4">
<li><strong>本地 judge 能力上限</strong>：30B distill 對 nuanced case 判讀不如雲端旗艦</li>
</ol>
<p><strong>緩解</strong>：critical case 加 spot human review、或混用本地（量大）+ 雲端（精選 sample）</p>
<h2 id="跟其他章節的關係">跟其他章節的關係</h2>
<ul>
<li>原理層的 LLM-as-judge 設計見 <a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">4.21</a></li>
<li>Production trace 串接見 <a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">4.20 tracing</a></li>
<li>Reasoning model 選型見 <a href="/blog/llm/03-theoretical-foundations/reasoning-models/" data-link-title="3.8 Reasoning models：test-time compute paradigm" data-link-desc="Chain-of-thought 從 prompting 技巧演化成訓練 paradigm、reasoning model 的內部運作、本地可跑的選項與適用任務">3.8</a></li>
<li>隱私 / 跨雲端邊界判讀見 <a href="/blog/llm/06-security/cross-cloud-local-data-boundary/" data-link-title="6.4 跨雲端 / 本地的資料邊界" data-link-desc="個人 dev 場景下混用雲端 LLM 跟本地 LLM 時的 prompt 洩漏點：Continue.dev 多 provider 設定、隱私資料流、按敏感度分流的判讀">6.4</a></li>
<li>Benchmark 跟 in-house eval 的層次見 <a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14</a></li>
</ul>
]]></content:encoded></item><item><title>Hands-on：RAG / MCP 的資源 footprint</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &amp;#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management 章&lt;/a> 講的是 Ollama / ComfyUI 等&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/inference-server/" data-link-title="Inference Server" data-link-desc="載入模型權重、處理 prompt、產生 token 的常駐 process">推論伺服器&lt;/a>的 lifecycle。但&lt;strong>跑 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP&lt;/a> 應用&lt;/strong>比單純 chat 多吃幾倍資源——&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/embedding-model/" data-link-title="Embedding Model" data-link-desc="把文字轉成向量的模型：用於 codebase 索引與語意搜尋">embedding model&lt;/a>、chat model、index 檔、subprocess、tool 邏輯——而且不同階段（ingest vs query）的瓶頸不一樣。&lt;/p>
&lt;p>本篇紀錄 &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">RAG demo&lt;/a> 跟 &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo&lt;/a> 跑起來的實測資源 footprint、提供本地多模型並存的 baseline、給寫 production 應用前的 sanity check。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：M4 Pro 32 GB、Ollama 0.23.2、Python 3.14
&lt;strong>Corpus&lt;/strong>：本 blog 的 &lt;code>content/llm/&lt;/code>、71 個 markdown 檔、463 chunks&lt;/p>&lt;/blockquote>
&lt;h2 id="各階段資源-footprint">各階段資源 footprint&lt;/h2>
&lt;p>RAG / MCP 工作流通常分三階段、各自吃不同資源：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>階段&lt;/th>
 &lt;th>主要資源消耗&lt;/th>
 &lt;th>持續時間&lt;/th>
 &lt;th>是否常駐&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>RAG ingest&lt;/strong>&lt;/td>
 &lt;td>embedding model RAM + CPU + 磁碟寫&lt;/td>
 &lt;td>one-shot（corpus 更動時跑）&lt;/td>
 &lt;td>否&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>RAG query&lt;/strong>&lt;/td>
 &lt;td>index 載入 RAM + chat model RAM + GPU&lt;/td>
 &lt;td>per-request&lt;/td>
 &lt;td>retrieval index 常駐&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>MCP server&lt;/strong>&lt;/td>
 &lt;td>subprocess 永久跑、tool 呼叫時動態載資源&lt;/td>
 &lt;td>session 內常駐&lt;/td>
 &lt;td>是&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>不同階段的瓶頸不一樣、優化目標也不同。&lt;/p>
&lt;h2 id="rag-ingest-階段one-shot-但批次密集">RAG Ingest 階段：one-shot 但批次密集&lt;/h2>
&lt;p>跑 &lt;code>python3 scripts/rag-demo/ingest.py&lt;/code> 時：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Found 71 markdown files under content/llm
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> [10/71] 86 chunks in 4.5s
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> [20/71] 181 chunks in 8.6s
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ...
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> [70/71] 461 chunks in 22.2s
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">Wrote 463 records to scripts/rag-demo/index.pkl (22.3s)&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>實測資源消耗：&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management 章</a> 講的是 Ollama / ComfyUI 等<a href="/blog/llm/knowledge-cards/inference-server/" data-link-title="Inference Server" data-link-desc="載入模型權重、處理 prompt、產生 token 的常駐 process">推論伺服器</a>的 lifecycle。但<strong>跑 <a href="/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG</a> / <a href="/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP</a> 應用</strong>比單純 chat 多吃幾倍資源——<a href="/blog/llm/knowledge-cards/embedding-model/" data-link-title="Embedding Model" data-link-desc="把文字轉成向量的模型：用於 codebase 索引與語意搜尋">embedding model</a>、chat model、index 檔、subprocess、tool 邏輯——而且不同階段（ingest vs query）的瓶頸不一樣。</p>
<p>本篇紀錄 <a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a> 跟 <a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo</a> 跑起來的實測資源 footprint、提供本地多模型並存的 baseline、給寫 production 應用前的 sanity check。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：M4 Pro 32 GB、Ollama 0.23.2、Python 3.14
<strong>Corpus</strong>：本 blog 的 <code>content/llm/</code>、71 個 markdown 檔、463 chunks</p></blockquote>
<h2 id="各階段資源-footprint">各階段資源 footprint</h2>
<p>RAG / MCP 工作流通常分三階段、各自吃不同資源：</p>
<table>
  <thead>
      <tr>
          <th>階段</th>
          <th>主要資源消耗</th>
          <th>持續時間</th>
          <th>是否常駐</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>RAG ingest</strong></td>
          <td>embedding model RAM + CPU + 磁碟寫</td>
          <td>one-shot（corpus 更動時跑）</td>
          <td>否</td>
      </tr>
      <tr>
          <td><strong>RAG query</strong></td>
          <td>index 載入 RAM + chat model RAM + GPU</td>
          <td>per-request</td>
          <td>retrieval index 常駐</td>
      </tr>
      <tr>
          <td><strong>MCP server</strong></td>
          <td>subprocess 永久跑、tool 呼叫時動態載資源</td>
          <td>session 內常駐</td>
          <td>是</td>
      </tr>
  </tbody>
</table>
<p>不同階段的瓶頸不一樣、優化目標也不同。</p>
<h2 id="rag-ingest-階段one-shot-但批次密集">RAG Ingest 階段：one-shot 但批次密集</h2>
<p>跑 <code>python3 scripts/rag-demo/ingest.py</code> 時：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Found 71 markdown files under content/llm
</span></span><span class="line"><span class="ln">2</span><span class="cl">  [10/71] 86 chunks in 4.5s
</span></span><span class="line"><span class="ln">3</span><span class="cl">  [20/71] 181 chunks in 8.6s
</span></span><span class="line"><span class="ln">4</span><span class="cl">  ...
</span></span><span class="line"><span class="ln">5</span><span class="cl">  [70/71] 461 chunks in 22.2s
</span></span><span class="line"><span class="ln">6</span><span class="cl">Wrote 463 records to scripts/rag-demo/index.pkl (22.3s)</span></span></code></pre></div><p>實測資源消耗：</p>
<table>
  <thead>
      <tr>
          <th>資源</th>
          <th>數字</th>
          <th>為什麼</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>RAM（峰值）</td>
          <td>~600 MB</td>
          <td>nomic-embed-text 模型 (274 MB) + Python runtime + 累積 records (~200 MB)</td>
      </tr>
      <tr>
          <td>磁碟寫</td>
          <td><code>index.pkl</code> ~3.7 MB</td>
          <td>463 records、每筆含 chunk text + 768-dim float embedding</td>
      </tr>
      <tr>
          <td>CPU + GPU</td>
          <td>Ollama 推 embedding、Apple Silicon Metal backend</td>
          <td>22 秒處理 463 個 chunk、平均 ~21 chunk/sec</td>
      </tr>
      <tr>
          <td>網路</td>
          <td>0</td>
          <td>完全本地推論</td>
      </tr>
  </tbody>
</table>
<p><strong>Ingest 階段的特性</strong>：</p>
<ul>
<li><strong>One-shot</strong>：corpus 不變不用重跑、index 寫一次永久用。</li>
<li><strong>吃 CPU 多於 RAM</strong>：產生 embedding 是 forward pass、瓶頸在 GPU 算力、RAM 沒太大壓力。</li>
<li><strong>磁碟寫小</strong>：每 chunk 約 8 KB（text 部分 ~5 KB + embedding 768 floats × 4 bytes = ~3 KB）、463 chunks 總共 ~3.7 MB。</li>
<li><strong>可平行</strong>：sequential <code>embed(chunk)</code> 是最慢實作、用 batching API（如果 Ollama 支援）或多 worker、能快 5-10x。</li>
</ul>
<p><strong>規模 extrapolation</strong>：</p>
<table>
  <thead>
      <tr>
          <th>Corpus 大小</th>
          <th>預估 ingest 時間</th>
          <th>index.pkl 大小</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>71 docs / 463 chunks（本 blog）</td>
          <td>22 秒</td>
          <td>3.7 MB</td>
      </tr>
      <tr>
          <td>1000 docs / ~7000 chunks（中型 codebase）</td>
          <td>~5 分鐘</td>
          <td>~55 MB</td>
      </tr>
      <tr>
          <td>10000 docs / ~70000 chunks（大型 codebase）</td>
          <td>~50 分鐘</td>
          <td>~550 MB</td>
      </tr>
      <tr>
          <td>100K docs / ~700K chunks（公司 wiki）</td>
          <td>~8 小時</td>
          <td>~5.5 GB</td>
      </tr>
  </tbody>
</table>
<p>10K docs 以上就應該考慮：</p>
<ul>
<li><a href="/blog/llm/knowledge-cards/batching/" data-link-title="Batching" data-link-desc="多 request 一起跑、攤平 model load 成本：production LLM inference 的核心優化、決定 throughput vs latency 取捨">Batching</a> embedding（單次 request 送 50 個 chunks）</li>
<li>並行 worker（Python multiprocessing、4-8 worker）</li>
<li>換 <a href="/blog/llm/knowledge-cards/vector-database/" data-link-title="Vector Database" data-link-desc="為高維向量 (embedding) 設計的儲存 &#43; 近似最近鄰 (ANN) 檢索系統：RAG 從 prototype 跨到 production 的關鍵元件">vector database</a>（避免把全部資料用 pickle 塞 RAM）</li>
</ul>
<h2 id="rag-query-階段retrieval-加-generation">RAG Query 階段：retrieval 加 generation</h2>
<p>跑 <code>python3 scripts/rag-demo/query.py --show-retrieved &quot;問題&quot;</code> 時：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Loaded 463 chunks from scripts/rag-demo/index.pkl
</span></span><span class="line"><span class="ln">2</span><span class="cl">=== Retrieved chunks ===
</span></span><span class="line"><span class="ln">3</span><span class="cl">  0.870  llm/knowledge-cards/transformer.md#chunk2
</span></span><span class="line"><span class="ln">4</span><span class="cl">  ...
</span></span><span class="line"><span class="ln">5</span><span class="cl">（LLM 生成 response）</span></span></code></pre></div><p>實測資源消耗（單次 query）：</p>
<table>
  <thead>
      <tr>
          <th>階段</th>
          <th>RAM 增量</th>
          <th>時間</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>載 index.pkl 到 RAM</td>
          <td>3.7 MB（小 corpus）/ MB 級（大 corpus）</td>
          <td>&lt; 1 秒</td>
      </tr>
      <tr>
          <td>embed query</td>
          <td>0（已載入的 nomic-embed-text）</td>
          <td>200 ms</td>
      </tr>
      <tr>
          <td>cosine over 463 chunks</td>
          <td>純 Python 計算、暫時用 ~10 MB</td>
          <td>50 ms</td>
      </tr>
      <tr>
          <td>載 chat model（gemma3:1b）</td>
          <td>~1 GB（首次）/ 0（已 cached）</td>
          <td>5-10 秒（首次）/ 0（cached）</td>
      </tr>
      <tr>
          <td>生成 response</td>
          <td>0 額外</td>
          <td>5-30 秒（看 model + prompt 長度）</td>
      </tr>
  </tbody>
</table>
<p><strong>Query 階段的特性</strong>：</p>
<ul>
<li><strong>第一次 cold start</strong>：要載 chat model 進 RAM、5-10 秒首字延遲。</li>
<li><strong>後續 query 都快</strong>：embedding model + chat model 都在 RAM、retrieval 毫秒級、只剩 generation 時間。</li>
<li><strong>RAM 占用 = embedding model + chat model + index</strong>：
<ul>
<li>463 chunks: 274 MB + chat model + 3.7 MB ≈ chat model + 280 MB</li>
<li>100K chunks: 274 MB + chat model + ~800 MB 進 RAM、加上 mmap pickle 額外開銷</li>
</ul>
</li>
<li><strong>瓶頸是 chat model</strong>：retrieval 部分快、瓶頸完全在 generation。</li>
</ul>
<p><strong>多模型並存</strong>（embedding + chat）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 看當前 RAM 占用</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># NAME                       SIZE      UNTIL</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># nomic-embed-text:latest    274 MB    4 minutes from now</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># gemma3:4b                  5.5 GB    4 minutes from now</span></span></span></code></pre></div><p>兩個 model 都載入時、Ollama RAM 占用約 6 GB。Ollama 的 <code>OLLAMA_KEEP_ALIVE</code>（預設 5 分鐘）會 idle 後分別 unload 兩個 model。</p>
<p><strong>規模 sanity check</strong>：</p>
<table>
  <thead>
      <tr>
          <th>場景</th>
          <th>RAM 需求</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>純 chat（gemma3:1b）</td>
          <td>~1 GB</td>
      </tr>
      <tr>
          <td>RAG with gemma3:1b + nomic-embed-text + 小 index</td>
          <td>~1.5 GB</td>
      </tr>
      <tr>
          <td>RAG with gemma3:4b + nomic-embed-text + 中型 index</td>
          <td>~6 GB</td>
      </tr>
      <tr>
          <td>RAG with gemma4:31b + nomic-embed-text + 大 index</td>
          <td>~20 GB</td>
      </tr>
  </tbody>
</table>
<p>跑 RAG 比 chat 額外要 ~300-1000 MB（embedding model + index）、不會太重。</p>
<h2 id="mcp-server-階段subprocess-常駐">MCP Server 階段：subprocess 常駐</h2>
<p>跑 <code>python3 scripts/mcp-demo/test_client.py</code> 時、client 會 spawn <code>blog_mcp_server.py</code> 當 child process。</p>
<p>實測：</p>
<table>
  <thead>
      <tr>
          <th>資源</th>
          <th>數字</th>
          <th>備註</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Subprocess RAM</td>
          <td>~50 MB</td>
          <td>Python runtime + index.pkl mmap</td>
      </tr>
      <tr>
          <td>stdio pipe 數量</td>
          <td>3（stdin、stdout、stderr）</td>
          <td>每 spawn 一個 server 都要 3 FD</td>
      </tr>
      <tr>
          <td>持續時間</td>
          <td>client 在跑就在跑</td>
          <td>client 結束時 SIGPIPE 自動結束 server</td>
      </tr>
  </tbody>
</table>
<p><strong>MCP server 的特性</strong>：</p>
<ul>
<li><strong>每個 client spawn 一個 server</strong>：Claude Desktop 開 5 個 MCP server、就有 5 個 Python subprocess。</li>
<li><strong>Index lazy load</strong>：本 demo <code>load_index()</code> 第一次 call 才 read pickle、之後 cached。Cold start 第一次 tool call 稍慢。</li>
<li><strong>Process lifecycle 在 client 端</strong>：client 死了、stdin EOF、server 自然結束。Client 沒清乾淨 spawn 多次就 leak process。</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 看當前所有 MCP server</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ps aux <span class="p">|</span> grep blog_mcp_server <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 如果 client crash 留下 zombie：</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">pkill -f <span class="s2">&#34;blog_mcp_server.py&#34;</span></span></span></code></pre></div><p><strong>多 MCP server 並存</strong>（如 Claude Desktop 接 git server + filesystem server + custom server）：</p>
<table>
  <thead>
      <tr>
          <th>Server</th>
          <th>RAM</th>
          <th>主要負載</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>git MCP server</td>
          <td>~30 MB</td>
          <td>shell 呼叫</td>
      </tr>
      <tr>
          <td>filesystem MCP server</td>
          <td>~30 MB</td>
          <td>fs 操作</td>
      </tr>
      <tr>
          <td>blog_mcp_server（本 demo）</td>
          <td>~50 MB（含 index）</td>
          <td>embedding + retrieval</td>
      </tr>
      <tr>
          <td>5 個 server 同時</td>
          <td>~200 MB</td>
          <td>累積</td>
      </tr>
  </tbody>
</table>
<p>200 MB 在 32 GB Mac 上不顯眼、但 16 GB Mac + 多 MCP server + 大 chat model 就可能擠到。</p>
<h2 id="rag--mcp-整合完整應用-stack">RAG + MCP 整合：完整應用 stack</h2>
<p>實際應用會疊起來：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">User 在 Claude Desktop 打字
</span></span><span class="line"><span class="ln">2</span><span class="cl">  ↓
</span></span><span class="line"><span class="ln">3</span><span class="cl">Claude Desktop (~200 MB)
</span></span><span class="line"><span class="ln">4</span><span class="cl">  ↓ MCP stdio
</span></span><span class="line"><span class="ln">5</span><span class="cl">blog_mcp_server.py (~50 MB)
</span></span><span class="line"><span class="ln">6</span><span class="cl">  ↓ HTTP /api/embeddings + /v1/chat/completions
</span></span><span class="line"><span class="ln">7</span><span class="cl">Ollama daemon (~200 MB)
</span></span><span class="line"><span class="ln">8</span><span class="cl">  ↓ load
</span></span><span class="line"><span class="ln">9</span><span class="cl">nomic-embed-text 模型 (~274 MB) + 主 chat model (~6 GB)</span></span></code></pre></div><p>整體 RAM 占用範圍：</p>
<table>
  <thead>
      <tr>
          <th>配置</th>
          <th>估算</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Minimal（gemma3:1b + 小 index）</td>
          <td>~1.7 GB</td>
      </tr>
      <tr>
          <td>Standard（gemma3:4b + 中 index）</td>
          <td>~6.5 GB</td>
      </tr>
      <tr>
          <td>Heavy（gemma4:31b + 大 index + 多 MCP server）</td>
          <td>~22 GB</td>
      </tr>
  </tbody>
</table>
<p>跟 <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">resource-management 章</a> 比、RAG / MCP 加 ~500 MB-1 GB overhead 在 chat 之上、是合理的 tradeoff（換來 retrieval + tool use 能力）。</p>
<h2 id="各資源類型的關鍵指標">各資源類型的關鍵指標</h2>
<p>整理三 dimension 的關鍵指標跟監控方式：</p>
<h3 id="ram">RAM</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 看 Ollama 載了哪些 model</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 看所有 LLM-related process</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">ps aux <span class="p">|</span> grep -E <span class="s2">&#34;ollama|comfyui|mcp&#34;</span> <span class="p">|</span> grep -v grep <span class="p">|</span> awk <span class="s1">&#39;{print $4, $11, $12, $13}&#39;</span> <span class="p">|</span> sort -rn
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># 系統整體</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">vm_stat <span class="p">|</span> head -3</span></span></code></pre></div><p><strong>告警閾值</strong>：</p>
<ul>
<li>RAM 占用 &gt; 80% 系統總量：開始考慮 unload model 或關掉 ComfyUI</li>
<li>看到 swap 增加（<code>vm_stat | grep &quot;Swapouts&quot;</code>）：已經 swap、要立刻減少 model</li>
</ul>
<h3 id="磁碟">磁碟</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Ollama models 累積</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">du -sh ~/.ollama/models
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># RAG index 累積（多個 corpus）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">du -sh scripts/rag-demo/index*.pkl 2&gt;/dev/null
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># ComfyUI checkpoints / VAE / LoRA / etc</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">du -sh ~/Projects/ComfyUI/models/*</span></span></code></pre></div><p><strong>累積評估</strong>：</p>
<ul>
<li>Ollama: 每 model 1-20 GB、半年累積容易破 50 GB</li>
<li>RAG index: 每 100K chunks ~800 MB、多 corpus 累積要管</li>
<li>ComfyUI: 每 checkpoint 4-7 GB、加 LoRA / VAE / ControlNet 等可達 50+ GB</li>
</ul>
<h3 id="process--port">Process / Port</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 一鍵 audit 所有 LLM service</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="k">for</span> p in <span class="m">11434</span> <span class="m">1234</span> <span class="m">8080</span> <span class="m">8188</span> 8000<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;=== port </span><span class="nv">$p</span><span class="s2"> ===&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  lsof -i :<span class="nv">$p</span> 2&gt;/dev/null <span class="p">|</span> head -2
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># 找 zombie subprocess（沒 parent 的 mcp server）</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">ps aux <span class="p">|</span> grep <span class="s2">&#34;mcp_server&#34;</span> <span class="p">|</span> grep -v grep</span></span></code></pre></div><p><strong>告警訊號</strong>：</p>
<ul>
<li>同 port 兩個 process listen：明顯有 zombie、要 kill</li>
<li>多個 mcp_server PPID = 1（被 reparent 到 init）：原 client 死了沒清乾淨</li>
</ul>
<h2 id="rag-應用的長期累積管理">RAG 應用的長期累積管理</h2>
<p>跑超過幾週、會累積：</p>
<table>
  <thead>
      <tr>
          <th>累積物</th>
          <th>為什麼累積</th>
          <th>怎麼清</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Multiple <code>index.pkl</code></td>
          <td>跑不同 corpus 各建 index、舊的沒刪</td>
          <td><code>find scripts -name 'index*.pkl' -mtime +30 -delete</code></td>
      </tr>
      <tr>
          <td>Ollama models</td>
          <td>試了不同 model 沒清</td>
          <td>看 <code>ollama list</code> modified 欄、<code>ollama rm</code> 不用的</td>
      </tr>
      <tr>
          <td>Python <code>__pycache__</code></td>
          <td>每次跑 script 累積</td>
          <td><code>.gitignore</code> 已包、本地 <code>find . -name __pycache__ -exec rm -rf {} +</code></td>
      </tr>
      <tr>
          <td>Embedding cache</td>
          <td>如果你寫了 embedding cache 機制</td>
          <td>各自清理策略</td>
      </tr>
  </tbody>
</table>
<p>清理 idiom：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 每月跑一次的 cleanup</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">llm-rag-cleanup<span class="o">()</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Old indexes (&gt;30 days):&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  find scripts -name <span class="s1">&#39;index*.pkl&#39;</span> -mtime +30 -ls
</span></span><span class="line"><span class="ln">5</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Ollama models (review):&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  ollama list
</span></span><span class="line"><span class="ln">7</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Python caches:&#34;</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">  find ~/Projects -name __pycache__ -type d <span class="p">|</span> head -10
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="o">}</span></span></span></code></pre></div><h2 id="跟-production-的差距預告">跟 production 的差距預告</h2>
<p>本篇紀錄的數字、是「single-user、single-machine、no concurrency」的 baseline。Production 場景多了幾個維度：</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>本地</th>
          <th>Production</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>並發 user</td>
          <td>1</td>
          <td>10-10000</td>
      </tr>
      <tr>
          <td>Index 大小</td>
          <td>&lt; 100 MB</td>
          <td>TB 級</td>
      </tr>
      <tr>
          <td>Model serving</td>
          <td>Ollama 1 process</td>
          <td>vLLM / TGI / Triton 多 worker</td>
      </tr>
      <tr>
          <td>Vector storage</td>
          <td>pickle</td>
          <td>Pinecone / Weaviate / pgvector</td>
      </tr>
      <tr>
          <td>Latency 要求</td>
          <td>秒級 OK</td>
          <td>p50 &lt; 500ms、p99 &lt; 2s</td>
      </tr>
      <tr>
          <td>Cost model</td>
          <td>一次性硬體</td>
          <td>$/request、$/token</td>
      </tr>
      <tr>
          <td>Observability</td>
          <td>tail log</td>
          <td>metrics / traces / dashboards</td>
      </tr>
      <tr>
          <td>失敗模式</td>
          <td>crash → 自己重啟</td>
          <td>99.9% uptime SLA</td>
      </tr>
  </tbody>
</table>
<p>Production 視角詳細展開見 <a href="/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 Production 部署的資源評估原理</a>。</p>
<h2 id="何時這篇會過時">何時這篇會過時</h2>
<p><strong>不會過時的部分</strong>：</p>
<ul>
<li>三階段 footprint 分類（ingest / query / server）</li>
<li>RAM / 磁碟 / process 三 dimension 的監控指令</li>
<li>多模型並存的 RAM 預估方法</li>
<li>長期累積管理 idiom</li>
</ul>
<p><strong>會變的部分</strong>：</p>
<ul>
<li>具體 RAM / 磁碟數字（隨模型架構、量化方法演化）</li>
<li><code>OLLAMA_KEEP_ALIVE</code> 等具體環境變數名</li>
<li>哪些 vector DB 主流（會持續演化）</li>
</ul>
<p>讀的時候若 RAM 占用跟本篇對不上、可能是新 model 架構效率改變、用同樣方法量自己環境的 baseline 即可。</p>
<p>跟其他 hands-on 章節的關係：完整 hands-on 系列見 <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on 章節索引</a>、實作配對見 <a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a> 跟 <a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo</a>、Ollama / ComfyUI 共用的 lifecycle 管理見 <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a>、Apple Silicon 統一記憶體預算原理見 <a href="/blog/llm/00-foundations/hardware-memory-budget/" data-link-title="0.5 Apple Silicon 記憶體預算" data-link-desc="記憶體決定能跑什麼，Q4 量化下的可運作模型對照與系統保留">0.5 記憶體預算</a>。</p>
<h2 id="跑這篇實測的指令總結">跑這篇實測的指令總結</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 1. RAG ingest 階段 RAM 量</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ollama ps  <span class="c1"># 先看 baseline</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">python3 scripts/rag-demo/ingest.py <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="nv">INGEST_PID</span><span class="o">=</span><span class="nv">$!</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">ollama ps  <span class="c1"># 看 embedding model 載入後</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">vm_stat <span class="p">|</span> head -3
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="nb">wait</span> <span class="nv">$INGEST_PID</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># 2. RAG query 階段 RAM 量</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">ollama ps  <span class="c1"># 看 idle 後 unload</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">python3 scripts/rag-demo/query.py --show-retrieved <span class="s2">&#34;test query&#34;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">ollama ps  <span class="c1"># 看 chat model 載入</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="c1"># 3. MCP server 階段 process / RAM</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">python3 scripts/mcp-demo/test_client.py <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="nv">CLIENT_PID</span><span class="o">=</span><span class="nv">$!</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">sleep <span class="m">2</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">ps aux <span class="p">|</span> grep blog_mcp_server <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="nb">wait</span> <span class="nv">$CLIENT_PID</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1"># 4. 完成釋放</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">ollama list <span class="p">|</span> tail -n +2 <span class="p">|</span> awk <span class="s1">&#39;{print $1}&#39;</span> <span class="p">|</span> xargs -I <span class="o">{}</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="se"></span>  curl -s http://localhost:11434/api/generate -d <span class="s2">&#34;{\&#34;model\&#34;:\&#34;{}\&#34;,\&#34;keep_alive\&#34;:0}&#34;</span></span></span></code></pre></div>]]></content:encoded></item><item><title>4.x Hands-on：端到端案例</title><link>https://tarrragon.github.io/blog/llm/04-applications/hands-on/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/04-applications/hands-on/</guid><description>&lt;p>本子資料夾收錄把模組四原理串起來的端到端案例。跟前面 principle-first 章節的差別：principle 章節是「跨工具不變的原理」、hands-on 是「把這些原理放在同一個任務上、走一遍完整流程」。&lt;/p>
&lt;p>讀法建議：先讀 principle 章節建立心智模型、再進 hands-on 看「實際做的時候、原理怎麼落」。&lt;/p>
&lt;h2 id="案例列表">案例列表&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>案例&lt;/th>
 &lt;th>主題&lt;/th>
 &lt;th>對應原理章節&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/hands-on/customer-support-case-study/" data-link-title="Case Study：customer support agent 從 task decomposition 到 eval" data-link-desc="把模組四原理串成端到端案例：observe → decompose → design workflow → instrument trace → design eval → iterate。每段標出引用哪章。">Customer support agent 從零到 eval&lt;/a>&lt;/td>
 &lt;td>Task decomposition → 設計 → trace → eval → iterate&lt;/td>
 &lt;td>4.0 prompt / 4.1 RAG / 4.3 tool / 4.4 agent / 4.5 HITL / 4.7 workflow / 4.13 eval / 4.20 trace&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/hands-on/blog-vector-search/" data-link-title="Case Study：Blog 語意搜尋從 pickle 到 production" data-link-desc="為 CLI 或個人工具選 RAG storage backend、或原始選型理由被 benchmark 推翻但結論不變時，如何區分結論、理由與前提">Blog 語意搜尋從 pickle 到 production&lt;/a>&lt;/td>
 &lt;td>Storage 選型 → 實作 → 效能優化 → 四方案 benchmark&lt;/td>
 &lt;td>4.1 RAG / 4.12 embedding / 4.14 benchmarking / 4.22 storage 工程&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table></description><content:encoded><![CDATA[<p>本子資料夾收錄把模組四原理串起來的端到端案例。跟前面 principle-first 章節的差別：principle 章節是「跨工具不變的原理」、hands-on 是「把這些原理放在同一個任務上、走一遍完整流程」。</p>
<p>讀法建議：先讀 principle 章節建立心智模型、再進 hands-on 看「實際做的時候、原理怎麼落」。</p>
<h2 id="案例列表">案例列表</h2>
<table>
  <thead>
      <tr>
          <th>案例</th>
          <th>主題</th>
          <th>對應原理章節</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/llm/04-applications/hands-on/customer-support-case-study/" data-link-title="Case Study：customer support agent 從 task decomposition 到 eval" data-link-desc="把模組四原理串成端到端案例：observe → decompose → design workflow → instrument trace → design eval → iterate。每段標出引用哪章。">Customer support agent 從零到 eval</a></td>
          <td>Task decomposition → 設計 → trace → eval → iterate</td>
          <td>4.0 prompt / 4.1 RAG / 4.3 tool / 4.4 agent / 4.5 HITL / 4.7 workflow / 4.13 eval / 4.20 trace</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/04-applications/hands-on/blog-vector-search/" data-link-title="Case Study：Blog 語意搜尋從 pickle 到 production" data-link-desc="為 CLI 或個人工具選 RAG storage backend、或原始選型理由被 benchmark 推翻但結論不變時，如何區分結論、理由與前提">Blog 語意搜尋從 pickle 到 production</a></td>
          <td>Storage 選型 → 實作 → 效能優化 → 四方案 benchmark</td>
          <td>4.1 RAG / 4.12 embedding / 4.14 benchmarking / 4.22 storage 工程</td>
      </tr>
  </tbody>
</table>
]]></content:encoded></item><item><title>Hands-on：本地 AI 工具實作筆記</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/</guid><description>&lt;p>本子資料夾收錄本地 AI 工具的實際安裝跟驗證紀錄。跟 1.x 原理章節的關係：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>1.x 原理章節&lt;/th>
 &lt;th>Hands-on 紀錄&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>為什麼選 Ollama&lt;/td>
 &lt;td>實際 &lt;code>brew install&lt;/code> + &lt;code>ollama pull&lt;/code> 流程&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Speculative decoding 原理&lt;/td>
 &lt;td>MTP 模型實際載入 + 速度量測&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>ComfyUI 在生態的位置&lt;/td>
 &lt;td>實際 git clone + Python 環境 + 模型路徑配置&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>本資料夾的內容&lt;strong>會隨工具版本演化&lt;/strong>：指令、目錄結構、相依套件版本都會變。寫的時間戳記在每篇開頭、版本資訊在 frontmatter。跟 1.x 原理章節的差別是「原理跨工具世代不變、實作筆記是當下這版的快照」。&lt;/p>
&lt;h2 id="章節列表">章節列表&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>章節&lt;/th>
 &lt;th>主題&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/quickstart/" data-link-title="Hands-on Quickstart：clone repo 後跑通所有 demo" data-link-desc="4 步驟跑通 RAG / MCP / permission demo 的 setup 跟驗證指令、整合 hands-on 系列所有章節的 prerequisite">Quickstart：clone repo 後跑通所有 demo&lt;/a>&lt;/td>
 &lt;td>4 步驟整合 setup、跑 RAG / MCP / permission demo、跨 hands-on 系列導讀&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &amp;#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝 + Gemma 模型&lt;/a>&lt;/td>
 &lt;td>brew install、ollama pull、curl 驗證&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &amp;#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI + Stable Diffusion XL&lt;/a>&lt;/td>
 &lt;td>git clone、Python 環境、SDXL 模型放哪&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper 語音轉文字&lt;/a>&lt;/td>
 &lt;td>&lt;code>brew install whisper-cpp&lt;/code> + Metal 加速、GGML 模型選擇、&lt;code>whisper-cli&lt;/code> + ffmpeg 驗證轉錄&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/" data-link-title="Hands-on：安裝 Piper TTS 做文字轉語音" data-link-desc="pip install piper-tts、ONNX voice model、stdin 餵文字、WAV 輸出、跟 Whisper 互為 round-trip 驗證">Piper TTS 文字轉語音&lt;/a>&lt;/td>
 &lt;td>下載 binary、voice 選擇、wav 輸出&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">RAG demo：用 blog content 當 corpus&lt;/a>&lt;/td>
 &lt;td>embedding + retrieval、串 Ollama&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP server demo：暴露 blog content&lt;/a>&lt;/td>
 &lt;td>最小 MCP server、給 LLM 用&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">權限邊界實驗：LLM 改檔案 / 寫 shell 誰執行&lt;/a>&lt;/td>
 &lt;td>LLM 是 pure function、wrapper 才是權限 gate、&lt;code>--dry-run&lt;/code> / &lt;code>--confirm&lt;/code> / &lt;code>--auto&lt;/code> 取捨&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">跨資料夾風格 follow 任務的 model size 對比&lt;/a>&lt;/td>
 &lt;td>1B vs 4B 在「讀資料夾、follow 既有格式、寫新章節」任務上的 structural metrics phase transition&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &amp;#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">LLM 運行中 + 結束的資源管理&lt;/a>&lt;/td>
 &lt;td>RAM / 磁碟 / port 三 dimension 觀察、Ollama auto-unload vs ComfyUI persistent lifecycle、實測釋放數字、自動化 cleanup shell function&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG / MCP 的資源 footprint&lt;/a>&lt;/td>
 &lt;td>RAG ingest / query / MCP server 三階段 RAM / 磁碟 / process 實測、多模型並存 RAM 衝突、長期累積管理&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="通用前置">通用前置&lt;/h2>
&lt;p>所有工具都假設你的 Mac 滿足：&lt;/p></description><content:encoded><![CDATA[<p>本子資料夾收錄本地 AI 工具的實際安裝跟驗證紀錄。跟 1.x 原理章節的關係：</p>
<table>
  <thead>
      <tr>
          <th>1.x 原理章節</th>
          <th>Hands-on 紀錄</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>為什麼選 Ollama</td>
          <td>實際 <code>brew install</code> + <code>ollama pull</code> 流程</td>
      </tr>
      <tr>
          <td>Speculative decoding 原理</td>
          <td>MTP 模型實際載入 + 速度量測</td>
      </tr>
      <tr>
          <td>ComfyUI 在生態的位置</td>
          <td>實際 git clone + Python 環境 + 模型路徑配置</td>
      </tr>
  </tbody>
</table>
<p>本資料夾的內容<strong>會隨工具版本演化</strong>：指令、目錄結構、相依套件版本都會變。寫的時間戳記在每篇開頭、版本資訊在 frontmatter。跟 1.x 原理章節的差別是「原理跨工具世代不變、實作筆記是當下這版的快照」。</p>
<h2 id="章節列表">章節列表</h2>
<table>
  <thead>
      <tr>
          <th>章節</th>
          <th>主題</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/quickstart/" data-link-title="Hands-on Quickstart：clone repo 後跑通所有 demo" data-link-desc="4 步驟跑通 RAG / MCP / permission demo 的 setup 跟驗證指令、整合 hands-on 系列所有章節的 prerequisite">Quickstart：clone repo 後跑通所有 demo</a></td>
          <td>4 步驟整合 setup、跑 RAG / MCP / permission demo、跨 hands-on 系列導讀</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝 + Gemma 模型</a></td>
          <td>brew install、ollama pull、curl 驗證</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI + Stable Diffusion XL</a></td>
          <td>git clone、Python 環境、SDXL 模型放哪</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper 語音轉文字</a></td>
          <td><code>brew install whisper-cpp</code> + Metal 加速、GGML 模型選擇、<code>whisper-cli</code> + ffmpeg 驗證轉錄</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/" data-link-title="Hands-on：安裝 Piper TTS 做文字轉語音" data-link-desc="pip install piper-tts、ONNX voice model、stdin 餵文字、WAV 輸出、跟 Whisper 互為 round-trip 驗證">Piper TTS 文字轉語音</a></td>
          <td>下載 binary、voice 選擇、wav 輸出</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo：用 blog content 當 corpus</a></td>
          <td>embedding + retrieval、串 Ollama</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP server demo：暴露 blog content</a></td>
          <td>最小 MCP server、給 LLM 用</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">權限邊界實驗：LLM 改檔案 / 寫 shell 誰執行</a></td>
          <td>LLM 是 pure function、wrapper 才是權限 gate、<code>--dry-run</code> / <code>--confirm</code> / <code>--auto</code> 取捨</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">跨資料夾風格 follow 任務的 model size 對比</a></td>
          <td>1B vs 4B 在「讀資料夾、follow 既有格式、寫新章節」任務上的 structural metrics phase transition</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">LLM 運行中 + 結束的資源管理</a></td>
          <td>RAM / 磁碟 / port 三 dimension 觀察、Ollama auto-unload vs ComfyUI persistent lifecycle、實測釋放數字、自動化 cleanup shell function</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG / MCP 的資源 footprint</a></td>
          <td>RAG ingest / query / MCP server 三階段 RAM / 磁碟 / process 實測、多模型並存 RAM 衝突、長期累積管理</td>
      </tr>
  </tbody>
</table>
<h2 id="通用前置">通用前置</h2>
<p>所有工具都假設你的 Mac 滿足：</p>
<ul>
<li>Apple Silicon Mac（M1 / M2 / M3 / M4）</li>
<li>macOS 14 (Sonoma) 或以上</li>
<li>Homebrew 安裝完成（<code>brew --version</code> 可看版本）</li>
<li>至少 16 GB 統一記憶體（24 GB+ 較順）</li>
<li>至少 20 GB 可用磁碟空間（本系列總共會佔約 15 GB）</li>
</ul>
<p>需要 Python 環境的工具（ComfyUI、Whisper）會用 venv 隔離、不污染系統 Python。</p>
<h2 id="驗證紀錄環境">驗證紀錄環境</h2>
<p>本系列的指令在以下環境驗證：</p>
<table>
  <thead>
      <tr>
          <th>項目</th>
          <th>版本</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>macOS</td>
          <td>Darwin 24.3.0（Sonoma 14.x）</td>
      </tr>
      <tr>
          <td>Homebrew</td>
          <td>由 <code>/opt/homebrew/bin/brew</code> 提供</td>
      </tr>
      <tr>
          <td>Python</td>
          <td>3.x（系統或 pyenv 都可）</td>
      </tr>
      <tr>
          <td>驗證日期</td>
          <td>2026-05-11</td>
      </tr>
  </tbody>
</table>
<p>換 Mac 規格、換 macOS 版本、半年後再讀本系列、指令可能要小調整、但<strong>前置設定的種類跟驗證步驟的結構</strong>通常不變。看到指令跑不過時、回 1.7 <a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">排錯方法論</a> 的三層架構定位、不要把錯誤訊息當絕對。</p>
]]></content:encoded></item><item><title>Firestore Distributed Counter Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/distributed-counter-lab/</link><pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/distributed-counter-lab/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線&lt;/a> 的 lab，實作 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/distributed-counter-high-frequency-write/" data-link-title="Firestore 高頻寫入與 distributed counter：單 document contention 邊界與分片計數" data-link-desc="Firestore 單一 document 有持續寫入的軟上限、高頻計數寫爆 contention 是常見事故；本文展開寫入 contention 的成因、distributed counter 分片計數的實作與讀取彙總、shard 數量與讀寫成本的取捨、五個高頻寫入踩坑，以及計數需求超過分片能處理時改走外部聚合的邊界">distributed counter 高頻寫入&lt;/a> deep article 的機制。前置環境見 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">Local emulator quickstart&lt;/a>。&lt;/p>&lt;/blockquote>
&lt;p>Firestore distributed counter lab 的核心責任是把「分片計數」從概念變成可觀察的寫入分佈與彙總結果。這個 lab 在 emulator 上建立 N 個 shard、隨機分片寫入大量 increment、檢查寫入是否均勻打散到各 shard、再讀取彙總驗證總和正確。&lt;/p>
&lt;p>本文的驗收標準是：你能跑出一個 sharded counter、看到 N 個 shard 各自累積了大致均勻的 partial count、彙總後等於總寫入次數，並理解 emulator 能驗什麼、不能驗什麼。&lt;/p>
&lt;h2 id="先講清楚-emulator-的邊界">先講清楚 emulator 的邊界&lt;/h2>
&lt;p>這個 lab 驗證的是&lt;strong>分片計數的機制正確性&lt;/strong>：寫入是否均勻分佈、彙總是否等於總和、讀取要讀幾個 document。它不驗證的是 &lt;strong>contention 本身&lt;/strong>——emulator 不強制 production 的單 document 持續寫入軟上限，所以「不分片會寫爆」這件事在 emulator 跑不出來。contention 是 production 的規模特性，要在雲端真實負載下才會出現。&lt;/p>
&lt;p>這個分界本身是要學的判讀：emulator 證明「分片計數做對了」，雲端負載測試才證明「不分片會撞牆」。把兩者混為一談會誤以為 emulator 全綠就代表 production 安全。&lt;/p>
&lt;h2 id="lab-環境">Lab 環境&lt;/h2>
&lt;p>沿用 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">quickstart&lt;/a> 的工作區與 emulator。確認 emulator 在跑（另一個 terminal）。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/firestore-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1"># 確認 emulator 已啟動：firebase emulators:start --only firestore --project demo-firestore-lab&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">FIRESTORE_EMULATOR_HOST&lt;/span>&lt;span class="o">=&lt;/span>localhost:8080&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="實作-sharded-counter">實作 sharded counter&lt;/h2>
&lt;p>counter 的核心責任是把一個邏輯計數拆成 N 個 shard document。寫入時隨機挑 shard &lt;code>increment(1)&lt;/code>，讀取時加總所有 shard。這份 script 用 admin SDK 直接對 emulator 操作。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">cat &amp;gt; counter.js &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;JS&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">const admin = require(&amp;#39;firebase-admin&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s">admin.initializeApp({ projectId: &amp;#39;demo-firestore-lab&amp;#39; });
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s">const db = admin.firestore();
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s">const FieldValue = admin.firestore.FieldValue;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s">const NUM_SHARDS = 10;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s">const counterRef = db.collection(&amp;#39;counters&amp;#39;).doc(&amp;#39;likes&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">async function createCounter() {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s"> const batch = db.batch();
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s"> for (let i = 0; i &amp;lt; NUM_SHARDS; i++) {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s"> batch.set(counterRef.collection(&amp;#39;shards&amp;#39;).doc(String(i)), { count: 0 });
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s"> await batch.commit();
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s">}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s">async function incrementOnce() {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s"> const shardId = Math.floor(Math.random() * NUM_SHARDS);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s"> await counterRef.collection(&amp;#39;shards&amp;#39;).doc(String(shardId))
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s"> .set({ count: FieldValue.increment(1) }, { merge: true });
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="s">}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">24&lt;/span>&lt;span class="cl">&lt;span class="s">async function getCount() {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">25&lt;/span>&lt;span class="cl">&lt;span class="s"> const snap = await counterRef.collection(&amp;#39;shards&amp;#39;).get();
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">26&lt;/span>&lt;span class="cl">&lt;span class="s"> let total = 0;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">27&lt;/span>&lt;span class="cl">&lt;span class="s"> const perShard = {};
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">28&lt;/span>&lt;span class="cl">&lt;span class="s"> snap.forEach((s) =&amp;gt; { perShard[s.id] = s.data().count; total += s.data().count; });
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">29&lt;/span>&lt;span class="cl">&lt;span class="s"> return { total, perShard };
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">30&lt;/span>&lt;span class="cl">&lt;span class="s">}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">31&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">32&lt;/span>&lt;span class="cl">&lt;span class="s">module.exports = { createCounter, incrementOnce, getCount, NUM_SHARDS };
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">33&lt;/span>&lt;span class="cl">&lt;span class="s">JS&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>三個設計點對應 deep article：用 &lt;code>FieldValue.increment(1)&lt;/code> 而非讀-改-寫（避開 race）；隨機選 shard 讓寫入均勻打散；讀取要讀 N 個 shard 加總（這是分片的代價）。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線</a> 的 lab，實作 <a href="/blog/backend/01-database/vendors/firestore/distributed-counter-high-frequency-write/" data-link-title="Firestore 高頻寫入與 distributed counter：單 document contention 邊界與分片計數" data-link-desc="Firestore 單一 document 有持續寫入的軟上限、高頻計數寫爆 contention 是常見事故；本文展開寫入 contention 的成因、distributed counter 分片計數的實作與讀取彙總、shard 數量與讀寫成本的取捨、五個高頻寫入踩坑，以及計數需求超過分片能處理時改走外部聚合的邊界">distributed counter 高頻寫入</a> deep article 的機制。前置環境見 <a href="/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">Local emulator quickstart</a>。</p></blockquote>
<p>Firestore distributed counter lab 的核心責任是把「分片計數」從概念變成可觀察的寫入分佈與彙總結果。這個 lab 在 emulator 上建立 N 個 shard、隨機分片寫入大量 increment、檢查寫入是否均勻打散到各 shard、再讀取彙總驗證總和正確。</p>
<p>本文的驗收標準是：你能跑出一個 sharded counter、看到 N 個 shard 各自累積了大致均勻的 partial count、彙總後等於總寫入次數，並理解 emulator 能驗什麼、不能驗什麼。</p>
<h2 id="先講清楚-emulator-的邊界">先講清楚 emulator 的邊界</h2>
<p>這個 lab 驗證的是<strong>分片計數的機制正確性</strong>：寫入是否均勻分佈、彙總是否等於總和、讀取要讀幾個 document。它不驗證的是 <strong>contention 本身</strong>——emulator 不強制 production 的單 document 持續寫入軟上限，所以「不分片會寫爆」這件事在 emulator 跑不出來。contention 是 production 的規模特性，要在雲端真實負載下才會出現。</p>
<p>這個分界本身是要學的判讀：emulator 證明「分片計數做對了」，雲端負載測試才證明「不分片會撞牆」。把兩者混為一談會誤以為 emulator 全綠就代表 production 安全。</p>
<h2 id="lab-環境">Lab 環境</h2>
<p>沿用 <a href="/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">quickstart</a> 的工作區與 emulator。確認 emulator 在跑（另一個 terminal）。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /tmp/firestore-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 確認 emulator 已啟動：firebase emulators:start --only firestore --project demo-firestore-lab</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">export</span> <span class="nv">FIRESTORE_EMULATOR_HOST</span><span class="o">=</span>localhost:8080</span></span></code></pre></div><h2 id="實作-sharded-counter">實作 sharded counter</h2>
<p>counter 的核心責任是把一個邏輯計數拆成 N 個 shard document。寫入時隨機挑 shard <code>increment(1)</code>，讀取時加總所有 shard。這份 script 用 admin SDK 直接對 emulator 操作。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; counter.js <span class="s">&lt;&lt;&#39;JS&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">const admin = require(&#39;firebase-admin&#39;);
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">admin.initializeApp({ projectId: &#39;demo-firestore-lab&#39; });
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">const db = admin.firestore();
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">const FieldValue = admin.firestore.FieldValue;
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">const NUM_SHARDS = 10;
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">const counterRef = db.collection(&#39;counters&#39;).doc(&#39;likes&#39;);
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">async function createCounter() {
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">  const batch = db.batch();
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">  for (let i = 0; i &lt; NUM_SHARDS; i++) {
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">    batch.set(counterRef.collection(&#39;shards&#39;).doc(String(i)), { count: 0 });
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">  }
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">  await batch.commit();
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">async function incrementOnce() {
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">  const shardId = Math.floor(Math.random() * NUM_SHARDS);
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">  await counterRef.collection(&#39;shards&#39;).doc(String(shardId))
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">    .set({ count: FieldValue.increment(1) }, { merge: true });
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="s">async function getCount() {
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="s">  const snap = await counterRef.collection(&#39;shards&#39;).get();
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="s">  let total = 0;
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="s">  const perShard = {};
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="s">  snap.forEach((s) =&gt; { perShard[s.id] = s.data().count; total += s.data().count; });
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="s">  return { total, perShard };
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="s">module.exports = { createCounter, incrementOnce, getCount, NUM_SHARDS };
</span></span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="s">JS</span></span></span></code></pre></div><p>三個設計點對應 deep article：用 <code>FieldValue.increment(1)</code> 而非讀-改-寫（避開 race）；隨機選 shard 讓寫入均勻打散；讀取要讀 N 個 shard 加總（這是分片的代價）。</p>
<h2 id="跑寫入並觀察分佈">跑寫入並觀察分佈</h2>
<p>driver 的核心責任是製造大量 increment、然後檢查寫入是否均勻落在各 shard。均勻分佈是分片有效的前提——若 shard 選擇有偏，熱點會在某幾個 shard 復現。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; run.js <span class="s">&lt;&lt;&#39;JS&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">const { createCounter, incrementOnce, getCount, NUM_SHARDS } = require(&#39;./counter&#39;);
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">const TOTAL_WRITES = 1000;
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">async function main() {
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  await createCounter();
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  console.log(`created ${NUM_SHARDS} shards`);
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">  // 製造 1000 次 increment
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">  const tasks = [];
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">  for (let i = 0; i &lt; TOTAL_WRITES; i++) tasks.push(incrementOnce());
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">  await Promise.all(tasks);
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">  const { total, perShard } = await getCount();
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">  console.log(&#39;per-shard counts:&#39;, perShard);
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">  console.log(`total = ${total} (expected ${TOTAL_WRITES})`);
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">  // 均勻度檢查：每個 shard 期望 ~100，看極差
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">  const counts = Object.values(perShard);
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">  const min = Math.min(...counts), max = Math.max(...counts);
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">  console.log(`min=${min} max=${max} spread=${max - min} (expected mean ~${TOTAL_WRITES / NUM_SHARDS})`);
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="s">main().then(() =&gt; process.exit(0));
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="s">JS</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">
</span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="nb">export</span> <span class="nv">FIRESTORE_EMULATOR_HOST</span><span class="o">=</span>localhost:8080
</span></span><span class="line"><span class="ln">28</span><span class="cl">node run.js</span></span></code></pre></div><p>預期輸出類似（實際數字每次隨機分佈而異）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">created 10 shards
</span></span><span class="line"><span class="ln">2</span><span class="cl">per-shard counts: { &#39;0&#39;: 98, &#39;1&#39;: 105, &#39;2&#39;: 92, ... }
</span></span><span class="line"><span class="ln">3</span><span class="cl">total = 1000 (expected 1000)
</span></span><span class="line"><span class="ln">4</span><span class="cl">min=88 max=112 spread=24 (expected mean ~100)</span></span></code></pre></div><p>兩個驗收點：<code>total</code> 等於總寫入次數（彙總正確、沒有 increment 遺失），以及各 shard 的 count 大致落在均值附近（隨機分佈均勻、沒有單一 shard 吸走大部分寫入）。</p>
<h2 id="對照實驗讀取成本隨-shard-數成長">對照實驗：讀取成本隨 shard 數成長</h2>
<p>讀取的核心代價是讀 N 個 document。把 <code>NUM_SHARDS</code> 改大（例如 100）重跑，<code>getCount</code> 要讀的 document 從 10 變 100——這就是 deep article 講的「寫入便宜了、讀取乘以 N」的取捨。在 production 這直接反映成 read 計費。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 編輯 counter.js 把 NUM_SHARDS 改為 100、重跑 run.js</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 觀察 per-shard counts 物件變成 100 個 key、getCount 讀取量 10x</span></span></span></code></pre></div><p>這個對照讓「shard 數是寫入分散與讀取成本的取捨」從文字變成可觀察：多 shard 寫入更分散（每 shard 更少），但讀取要加總更多筆。高寫入高讀取的場景該配 summary 彙總（deep article 的進階手段），而非無限加 shard。</p>
<h2 id="artifact-與驗收">Artifact 與驗收</h2>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>來源</th>
          <th>驗收</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>counter 實作</td>
          <td><code>counter.js</code></td>
          <td><code>increment</code> 分片寫入 + 彙總讀取</td>
      </tr>
      <tr>
          <td>寫入分佈</td>
          <td><code>run.js</code> output</td>
          <td>total = 寫入次數、各 shard 均勻</td>
      </tr>
      <tr>
          <td>讀寫取捨</td>
          <td>NUM_SHARDS 對照</td>
          <td>shard 數↑ → 讀取 document 數↑</td>
      </tr>
  </tbody>
</table>
<h2 id="回到-production-判讀">回到 production 判讀</h2>
<p>emulator lab 證明了機制正確，但三個 production 判讀要回雲端確認：單 document 寫入軟上限（決定 shard 數要多少）、read 計費（決定 shard 數別太多 / 要不要 summary）、shard 選擇在真實流量下是否仍均勻。把 emulator 的機制驗證當第一道關，production 的容量與成本判讀見 <a href="/blog/backend/01-database/vendors/firestore/distributed-counter-high-frequency-write/#%e5%ae%b9%e9%87%8f%e8%88%87%e8%a7%80%e6%b8%acshard-%e6%95%b8%e7%9a%84%e4%bc%b0%e7%ae%97%e8%88%87%e7%9b%a3%e6%8e%a7" data-link-title="Firestore 高頻寫入與 distributed counter：單 document contention 邊界與分片計數" data-link-desc="Firestore 單一 document 有持續寫入的軟上限、高頻計數寫爆 contention 是常見事故；本文展開寫入 contention 的成因、distributed counter 分片計數的實作與讀取彙總、shard 數量與讀寫成本的取捨、五個高頻寫入踩坑，以及計數需求超過分片能處理時改走外部聚合的邊界">deep article 的容量段</a>。</p>
<h2 id="cleanup">Cleanup</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 停 emulator（Ctrl-C）或清整個工作區</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">rm -rf /tmp/firestore-lab</span></span></code></pre></div><h2 id="引用路徑">引用路徑</h2>
<ul>
<li>上游：<a href="/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線</a></li>
<li>Deep article：<a href="/blog/backend/01-database/vendors/firestore/distributed-counter-high-frequency-write/" data-link-title="Firestore 高頻寫入與 distributed counter：單 document contention 邊界與分片計數" data-link-desc="Firestore 單一 document 有持續寫入的軟上限、高頻計數寫爆 contention 是常見事故；本文展開寫入 contention 的成因、distributed counter 分片計數的實作與讀取彙總、shard 數量與讀寫成本的取捨、五個高頻寫入踩坑，以及計數需求超過分片能處理時改走外部聚合的邊界">高頻寫入與 distributed counter</a></li>
<li>一致性邊界：<a href="/blog/backend/01-database/transaction-boundary/" data-link-title="1.3 Transaction 與一致性邊界" data-link-desc="交易邊界、isolation level、retry 策略、distributed transaction（2PC、Saga）與跨 region 強一致取捨">1.3 transaction 與一致性邊界</a></li>
<li>官方：<a href="https://firebase.google.com/docs/firestore/solutions/counters">Distributed counters</a>、<a href="https://firebase.google.com/docs/firestore/best-practices">Firestore best practices</a></li>
</ul>
]]></content:encoded></item><item><title>Firestore Hands-on 操作路線</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/</link><pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/</guid><description>&lt;p>Firestore hands-on 操作路線的核心責任是把 deep article 的機制判讀轉成可在本地演練的操作。這一層全程跑在 &lt;a href="https://firebase.google.com/docs/emulator-suite">Firebase Emulator Suite&lt;/a> 上——本地、免費、不碰雲端專案、不產生計費，讓讀者能建立資料、寫規則測試、跑分片計數，並取得 query output、測試結果與 artifact，而不只停在概念。&lt;/p>
&lt;h2 id="為什麼用-emulator">為什麼用 emulator&lt;/h2>
&lt;p>Firestore 的 client 直連模型讓「在本地驗證」變得重要：規則寫錯是資安漏洞、查詢設計錯是成本事故，這些都該在進雲端前用真實求值引擎驗過。Emulator Suite 提供與雲端一致的 Firestore 行為與 Security Rules 求值引擎，是規則測試的官方推薦環境。要留意的邊界是——emulator 模擬功能行為，但不模擬計費與部分 production 規模限制（單 document 寫入軟上限、連線天花板）。涉及成本與規模的判讀仍以雲端為準，emulator lab 會在對應處標明。&lt;/p>
&lt;h2 id="章節列表">章節列表&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>章節&lt;/th>
 &lt;th>主題&lt;/th>
 &lt;th>產出 artifact&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="local-emulator-quickstart/">Local emulator quickstart&lt;/a>&lt;/td>
 &lt;td>emulator 啟動、&lt;code>firestore.rules&lt;/code>、admin seed、query baseline&lt;/td>
 &lt;td>emulator config、seed script、query output&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="security-rules-test-lab/">Security Rules test lab&lt;/a>&lt;/td>
 &lt;td>&lt;code>@firebase/rules-unit-testing&lt;/code>、放行 / 拒絕斷言、CI 整合&lt;/td>
 &lt;td>rules 測試檔、pass / fail 結果、emulators:exec log&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="distributed-counter-lab/">Distributed counter lab&lt;/a>&lt;/td>
 &lt;td>分片計數寫入、shard 分佈、讀取彙總、contention 的 production 邊界&lt;/td>
 &lt;td>counter script、shard 分佈 output、彙總驗證&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="設計原則">設計原則&lt;/h2>
&lt;p>Firestore hands-on 章節以「進雲端前先驗」為中心。操作指令只在能產出 artifact 時出現；每篇都要回答 emulator 在哪裡跑、需要哪些 input、怎麼知道操作成功（query output / 測試斷言 / shard 分佈），以及哪些 production 特性（計費、寫入上限）emulator 不負責、要回雲端確認。&lt;/p>
&lt;h2 id="引用路徑">引用路徑&lt;/h2>
&lt;ul>
&lt;li>上游：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/" data-link-title="Firestore" data-link-desc="Firebase / Google Cloud 的 serverless document database、collection / document 模型、client 直連 &amp;#43; Security Rules、realtime listener 與 offline 同步、BaaS bundle 的資料層面">Firestore overview&lt;/a>&lt;/li>
&lt;li>Deep article：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">Security Rules 授權建模&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/distributed-counter-high-frequency-write/" data-link-title="Firestore 高頻寫入與 distributed counter：單 document contention 邊界與分片計數" data-link-desc="Firestore 單一 document 有持續寫入的軟上限、高頻計數寫爆 contention 是常見事故；本文展開寫入 contention 的成因、distributed counter 分片計數的實作與讀取彙總、shard 數量與讀寫成本的取捨、五個高頻寫入踩坑，以及計數需求超過分片能處理時改走外部聚合的邊界">distributed counter 高頻寫入&lt;/a>&lt;/li>
&lt;li>發布證據：&lt;a href="https://tarrragon.github.io/blog/backend/06-reliability/release-gate/" data-link-title="6.8 Release Gate 與變更節奏" data-link-desc="把驗證、migration、相容性納入放行判準">6.8 release gate&lt;/a>（規則測試接進 gate）&lt;/li>
&lt;li>官方：&lt;a href="https://firebase.google.com/docs/emulator-suite">Emulator Suite&lt;/a>、&lt;a href="https://firebase.google.com/docs/emulator-suite/connect_firestore">Connect to Firestore emulator&lt;/a>&lt;/li>
&lt;/ul></description><content:encoded><![CDATA[<p>Firestore hands-on 操作路線的核心責任是把 deep article 的機制判讀轉成可在本地演練的操作。這一層全程跑在 <a href="https://firebase.google.com/docs/emulator-suite">Firebase Emulator Suite</a> 上——本地、免費、不碰雲端專案、不產生計費，讓讀者能建立資料、寫規則測試、跑分片計數，並取得 query output、測試結果與 artifact，而不只停在概念。</p>
<h2 id="為什麼用-emulator">為什麼用 emulator</h2>
<p>Firestore 的 client 直連模型讓「在本地驗證」變得重要：規則寫錯是資安漏洞、查詢設計錯是成本事故，這些都該在進雲端前用真實求值引擎驗過。Emulator Suite 提供與雲端一致的 Firestore 行為與 Security Rules 求值引擎，是規則測試的官方推薦環境。要留意的邊界是——emulator 模擬功能行為，但不模擬計費與部分 production 規模限制（單 document 寫入軟上限、連線天花板）。涉及成本與規模的判讀仍以雲端為準，emulator lab 會在對應處標明。</p>
<h2 id="章節列表">章節列表</h2>
<table>
  <thead>
      <tr>
          <th>章節</th>
          <th>主題</th>
          <th>產出 artifact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="local-emulator-quickstart/">Local emulator quickstart</a></td>
          <td>emulator 啟動、<code>firestore.rules</code>、admin seed、query baseline</td>
          <td>emulator config、seed script、query output</td>
      </tr>
      <tr>
          <td><a href="security-rules-test-lab/">Security Rules test lab</a></td>
          <td><code>@firebase/rules-unit-testing</code>、放行 / 拒絕斷言、CI 整合</td>
          <td>rules 測試檔、pass / fail 結果、emulators:exec log</td>
      </tr>
      <tr>
          <td><a href="distributed-counter-lab/">Distributed counter lab</a></td>
          <td>分片計數寫入、shard 分佈、讀取彙總、contention 的 production 邊界</td>
          <td>counter script、shard 分佈 output、彙總驗證</td>
      </tr>
  </tbody>
</table>
<h2 id="設計原則">設計原則</h2>
<p>Firestore hands-on 章節以「進雲端前先驗」為中心。操作指令只在能產出 artifact 時出現；每篇都要回答 emulator 在哪裡跑、需要哪些 input、怎麼知道操作成功（query output / 測試斷言 / shard 分佈），以及哪些 production 特性（計費、寫入上限）emulator 不負責、要回雲端確認。</p>
<h2 id="引用路徑">引用路徑</h2>
<ul>
<li>上游：<a href="/blog/backend/01-database/vendors/firestore/" data-link-title="Firestore" data-link-desc="Firebase / Google Cloud 的 serverless document database、collection / document 模型、client 直連 &#43; Security Rules、realtime listener 與 offline 同步、BaaS bundle 的資料層面">Firestore overview</a></li>
<li>Deep article：<a href="/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">Security Rules 授權建模</a> / <a href="/blog/backend/01-database/vendors/firestore/distributed-counter-high-frequency-write/" data-link-title="Firestore 高頻寫入與 distributed counter：單 document contention 邊界與分片計數" data-link-desc="Firestore 單一 document 有持續寫入的軟上限、高頻計數寫爆 contention 是常見事故；本文展開寫入 contention 的成因、distributed counter 分片計數的實作與讀取彙總、shard 數量與讀寫成本的取捨、五個高頻寫入踩坑，以及計數需求超過分片能處理時改走外部聚合的邊界">distributed counter 高頻寫入</a></li>
<li>發布證據：<a href="/blog/backend/06-reliability/release-gate/" data-link-title="6.8 Release Gate 與變更節奏" data-link-desc="把驗證、migration、相容性納入放行判準">6.8 release gate</a>（規則測試接進 gate）</li>
<li>官方：<a href="https://firebase.google.com/docs/emulator-suite">Emulator Suite</a>、<a href="https://firebase.google.com/docs/emulator-suite/connect_firestore">Connect to Firestore emulator</a></li>
</ul>
]]></content:encoded></item><item><title>Firestore Local Emulator Quickstart</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/</link><pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線&lt;/a> 的基礎 lab。指令以 &lt;a href="https://firebase.google.com/docs/cli">Firebase CLI 文件&lt;/a> 與 &lt;a href="https://firebase.google.com/docs/emulator-suite">Emulator Suite 文件&lt;/a> 為準、最後檢查日 2026-06-16。&lt;/p>&lt;/blockquote>
&lt;p>Firestore local emulator quickstart 的核心責任是建立後續 Security Rules 測試與 distributed counter lab 共用的本地環境。這個 lab 把 Firestore 從抽象服務轉成可觀察的 emulator、規則檔、seed 資料與 query 結果，全程不碰雲端專案。&lt;/p>
&lt;p>本文的驗收標準是：你能在本地啟動 Firestore emulator、用 admin SDK 寫入並查詢一組 seed 資料、看到 emulator UI 裡的資料，並知道 cleanup 路徑。&lt;/p>
&lt;h2 id="lab-環境與前置">Lab 環境與前置&lt;/h2>
&lt;p>Lab 在本地資料夾跑，需要 Node.js 與 Firebase CLI。以下命令建立一個可刪除的工作區並裝好工具。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p /tmp/firestore-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/firestore-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="c1"># Firebase CLI（已裝可跳過）；用 npx 也可避免全域安裝&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">npm install -g firebase-tools
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="c1"># 本 lab 的 Node 依賴&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">npm init -y
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">9&lt;/span>&lt;span class="cl">npm install firebase-admin&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>emulator 需要 Java runtime（Firestore emulator 跑在 JVM 上）。&lt;code>java -version&lt;/code> 確認存在；缺的話先裝 JDK 再繼續。驗收 artifact 是 &lt;code>/tmp/firestore-lab&lt;/code> 工作區。&lt;/p>
&lt;h2 id="emulator-設定">Emulator 設定&lt;/h2>
&lt;p>&lt;code>firebase.json&lt;/code> 的核心責任是宣告要啟動哪些 emulator 與對應 port。這裡只開 Firestore 與 UI，不需要真實 Firebase 專案——emulator 用一個 demo project id 即可，&lt;code>demo-&lt;/code> 前綴讓 CLI 知道這是純本地、不連雲端。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">cat &amp;gt; firebase.json &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;JSON&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">{
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s"> &amp;#34;emulators&amp;#34;: {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s"> &amp;#34;firestore&amp;#34;: { &amp;#34;port&amp;#34;: 8080 },
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s"> &amp;#34;ui&amp;#34;: { &amp;#34;enabled&amp;#34;: true, &amp;#34;port&amp;#34;: 4000 }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s"> },
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s"> &amp;#34;firestore&amp;#34;: {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s"> &amp;#34;rules&amp;#34;: &amp;#34;firestore.rules&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s">JSON&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="baseline-規則">Baseline 規則&lt;/h2>
&lt;p>&lt;code>firestore.rules&lt;/code> 的核心責任是定義授權。Quickstart 先用一組明確的 owner-scoped 規則（不是 &lt;code>allow read, write: if true&lt;/code>，那是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">deep article Case 1&lt;/a> 的漏洞）。這份規則後續在 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/security-rules-test-lab/" data-link-title="Firestore Security Rules Test Lab" data-link-desc="用 @firebase/rules-unit-testing 在 emulator 上把 Security Rules 寫成自動化測試：放行 / 越權拒絕 / 未登入拒絕 / 欄位竄改拒絕四類斷言、firebase emulators:exec 在 CI 跑、把規則測試接進 release gate">Security Rules test lab&lt;/a> 會被測試覆蓋。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線</a> 的基礎 lab。指令以 <a href="https://firebase.google.com/docs/cli">Firebase CLI 文件</a> 與 <a href="https://firebase.google.com/docs/emulator-suite">Emulator Suite 文件</a> 為準、最後檢查日 2026-06-16。</p></blockquote>
<p>Firestore local emulator quickstart 的核心責任是建立後續 Security Rules 測試與 distributed counter lab 共用的本地環境。這個 lab 把 Firestore 從抽象服務轉成可觀察的 emulator、規則檔、seed 資料與 query 結果，全程不碰雲端專案。</p>
<p>本文的驗收標準是：你能在本地啟動 Firestore emulator、用 admin SDK 寫入並查詢一組 seed 資料、看到 emulator UI 裡的資料，並知道 cleanup 路徑。</p>
<h2 id="lab-環境與前置">Lab 環境與前置</h2>
<p>Lab 在本地資料夾跑，需要 Node.js 與 Firebase CLI。以下命令建立一個可刪除的工作區並裝好工具。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p /tmp/firestore-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> /tmp/firestore-lab
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Firebase CLI（已裝可跳過）；用 npx 也可避免全域安裝</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">npm install -g firebase-tools
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># 本 lab 的 Node 依賴</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">npm init -y
</span></span><span class="line"><span class="ln">9</span><span class="cl">npm install firebase-admin</span></span></code></pre></div><p>emulator 需要 Java runtime（Firestore emulator 跑在 JVM 上）。<code>java -version</code> 確認存在；缺的話先裝 JDK 再繼續。驗收 artifact 是 <code>/tmp/firestore-lab</code> 工作區。</p>
<h2 id="emulator-設定">Emulator 設定</h2>
<p><code>firebase.json</code> 的核心責任是宣告要啟動哪些 emulator 與對應 port。這裡只開 Firestore 與 UI，不需要真實 Firebase 專案——emulator 用一個 demo project id 即可，<code>demo-</code> 前綴讓 CLI 知道這是純本地、不連雲端。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; firebase.json <span class="s">&lt;&lt;&#39;JSON&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">{
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">  &#34;emulators&#34;: {
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">    &#34;firestore&#34;: { &#34;port&#34;: 8080 },
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">    &#34;ui&#34;: { &#34;enabled&#34;: true, &#34;port&#34;: 4000 }
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">  },
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  &#34;firestore&#34;: {
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">    &#34;rules&#34;: &#34;firestore.rules&#34;
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  }
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">JSON</span></span></span></code></pre></div><h2 id="baseline-規則">Baseline 規則</h2>
<p><code>firestore.rules</code> 的核心責任是定義授權。Quickstart 先用一組明確的 owner-scoped 規則（不是 <code>allow read, write: if true</code>，那是 <a href="/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">deep article Case 1</a> 的漏洞）。這份規則後續在 <a href="/blog/backend/01-database/vendors/firestore/hands-on/security-rules-test-lab/" data-link-title="Firestore Security Rules Test Lab" data-link-desc="用 @firebase/rules-unit-testing 在 emulator 上把 Security Rules 寫成自動化測試：放行 / 越權拒絕 / 未登入拒絕 / 欄位竄改拒絕四類斷言、firebase emulators:exec 在 CI 跑、把規則測試接進 release gate">Security Rules test lab</a> 會被測試覆蓋。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; firestore.rules <span class="s">&lt;&lt;&#39;RULES&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">rules_version = &#39;2&#39;;
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">service cloud.firestore {
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">  match /databases/{database}/documents {
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">    match /notes/{noteId} {
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">      allow read: if request.auth != null
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">                  &amp;&amp; resource.data.ownerId == request.auth.uid;
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">      allow create: if request.auth != null
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">                    &amp;&amp; request.resource.data.ownerId == request.auth.uid;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">      allow update, delete: if request.auth != null
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">                            &amp;&amp; resource.data.ownerId == request.auth.uid;
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">    }
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">  }
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">RULES</span></span></span></code></pre></div><h2 id="啟動-emulator">啟動 emulator</h2>
<p>啟動 emulator 的核心責任是讓本地有一個可寫可查的 Firestore。用 demo project id 啟動，emulator UI 在 <code>http://localhost:4000</code> 可看到資料。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">firebase emulators:start --only firestore --project demo-firestore-lab</span></span></code></pre></div><p>這個指令會 foreground 跑住 emulator。保持它開著，另開一個 terminal 做 seed 與 query。終端輸出會印出 Firestore emulator 的位址（預設 <code>localhost:8080</code>）與 UI 位址。</p>
<h2 id="seed-資料admin-sdk-繞過規則">Seed 資料（admin SDK 繞過規則）</h2>
<p>Seed 的核心責任是建立可重跑的測試資料。admin SDK 連到 emulator 時繞過 Security Rules（模擬後端的特權寫入），適合種資料。關鍵是設 <code>FIRESTORE_EMULATOR_HOST</code> 環境變數——有了它，admin SDK 的寫入全部導向 emulator、不需要任何雲端 credential。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; seed.js <span class="s">&lt;&lt;&#39;JS&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">const admin = require(&#39;firebase-admin&#39;);
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">admin.initializeApp({ projectId: &#39;demo-firestore-lab&#39; });
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">const db = admin.firestore();
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">async function main() {
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  await db.collection(&#39;notes&#39;).doc(&#39;n1&#39;).set({
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">    ownerId: &#39;alice&#39;, text: &#39;Alice first note&#39;, createdAt: Date.now(),
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  });
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">  await db.collection(&#39;notes&#39;).doc(&#39;n2&#39;).set({
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">    ownerId: &#39;bob&#39;, text: &#39;Bob first note&#39;, createdAt: Date.now(),
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">  });
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">  console.log(&#39;seeded 2 notes&#39;);
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">main().then(() =&gt; process.exit(0));
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">JS</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"># 在新 terminal、同 lab 目錄下</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="nb">export</span> <span class="nv">FIRESTORE_EMULATOR_HOST</span><span class="o">=</span>localhost:8080
</span></span><span class="line"><span class="ln">20</span><span class="cl">node seed.js</span></span></code></pre></div><p>預期輸出 <code>seeded 2 notes</code>。打開 <code>http://localhost:4000/firestore</code> 應看到 <code>notes</code> collection 下兩筆 document。</p>
<h2 id="query-baseline">Query baseline</h2>
<p>Query 的核心責任是確認資料可讀、access pattern 入口可用。admin SDK 同樣繞過規則，這裡驗證的是資料與查詢本身（規則的放行 / 拒絕在下一個 lab 用 client context 驗）。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; query.js <span class="s">&lt;&lt;&#39;JS&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">const admin = require(&#39;firebase-admin&#39;);
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">admin.initializeApp({ projectId: &#39;demo-firestore-lab&#39; });
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">const db = admin.firestore();
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">async function main() {
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  const snap = await db.collection(&#39;notes&#39;)
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">    .where(&#39;ownerId&#39;, &#39;==&#39;, &#39;alice&#39;).get();
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  console.log(`alice notes: ${snap.size}`);
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">  snap.forEach((d) =&gt; console.log(d.id, d.data().text));
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">main().then(() =&gt; process.exit(0));
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">JS</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="nb">export</span> <span class="nv">FIRESTORE_EMULATOR_HOST</span><span class="o">=</span>localhost:8080
</span></span><span class="line"><span class="ln">16</span><span class="cl">node query.js</span></span></code></pre></div><p>預期輸出 <code>alice notes: 1</code> 與 <code>n1 Alice first note</code>。這證明 <code>where('ownerId', '==', ...)</code> 的 access pattern 成立——它也正是 client 端要自帶、好讓 owner-scoped 規則放行的查詢條件。</p>
<h2 id="artifact-與驗收">Artifact 與驗收</h2>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>路徑 / 來源</th>
          <th>驗收</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>emulator config</td>
          <td><code>firebase.json</code></td>
          <td>Firestore + UI port 宣告</td>
      </tr>
      <tr>
          <td>規則檔</td>
          <td><code>firestore.rules</code></td>
          <td>owner-scoped、非 <code>if true</code></td>
      </tr>
      <tr>
          <td>seed 結果</td>
          <td><code>seed.js</code> output + UI</td>
          <td><code>notes/n1</code>、<code>notes/n2</code> 存在</td>
      </tr>
      <tr>
          <td>query 結果</td>
          <td><code>query.js</code> output</td>
          <td><code>alice notes: 1</code></td>
      </tr>
  </tbody>
</table>
<h2 id="cleanup">Cleanup</h2>
<p>Cleanup 的核心責任是讓 lab 可重跑。emulator 的資料在 process 結束時預設不持久化（除非設了 <code>--export-on-exit</code>），所以停掉 emulator 等於清空資料。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 停掉 emulator：在 emulator terminal 按 Ctrl-C</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 移除整個工作區</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">rm -rf /tmp/firestore-lab</span></span></code></pre></div><p>若想保留 emulator 資料跨 session，啟動時加 <code>--import=./data --export-on-exit=./data</code>；lab 預設不持久化，保持每次乾淨起步。</p>
<p>完成本篇後，下一步進 <a href="/blog/backend/01-database/vendors/firestore/hands-on/security-rules-test-lab/" data-link-title="Firestore Security Rules Test Lab" data-link-desc="用 @firebase/rules-unit-testing 在 emulator 上把 Security Rules 寫成自動化測試：放行 / 越權拒絕 / 未登入拒絕 / 欄位竄改拒絕四類斷言、firebase emulators:exec 在 CI 跑、把規則測試接進 release gate">Security Rules test lab</a>（把上面的規則寫成自動化測試）或 <a href="/blog/backend/01-database/vendors/firestore/hands-on/distributed-counter-lab/" data-link-title="Firestore Distributed Counter Lab" data-link-desc="在 emulator 上實作 distributed counter：建立 N 個 shard、隨機分片寫入、觀察 shard 分佈是否均勻、讀取彙總驗證總和正確，並說明 contention 本身是 emulator 不模擬的 production 特性">Distributed counter lab</a>。</p>
<h2 id="引用路徑">引用路徑</h2>
<ul>
<li>上游：<a href="/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線</a></li>
<li>Deep article：<a href="/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">Security Rules 授權建模</a></li>
<li>官方：<a href="https://firebase.google.com/docs/cli">Install Firebase CLI</a>、<a href="https://firebase.google.com/docs/emulator-suite/connect_firestore">Connect to Firestore emulator</a></li>
</ul>
]]></content:encoded></item><item><title>Firestore Security Rules Test Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/security-rules-test-lab/</link><pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/security-rules-test-lab/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線&lt;/a> 的 lab，實作 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">Security Rules 授權建模&lt;/a> deep article 的測試方法。前置環境見 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">Local emulator quickstart&lt;/a>。測試 API 以 &lt;a href="https://firebase.google.com/docs/rules/unit-tests">Rules unit testing 文件&lt;/a> 為準、最後檢查日 2026-06-16。&lt;/p>&lt;/blockquote>
&lt;p>Firestore Security Rules test lab 的核心責任是把授權規則變成可自動驗證的測試。規則是 client 直連模型的整個控制面，改一條就要證明沒開新洞——這個 lab 用 &lt;code>@firebase/rules-unit-testing&lt;/code> 在 emulator 上對規則跑斷言，產出可接進 CI 與 release gate 的測試 evidence。&lt;/p>
&lt;p>本文的驗收標準是：你能對一組規則寫出「放行 / 越權拒絕 / 未登入拒絕 / 欄位竄改拒絕」四類斷言、用 &lt;code>firebase emulators:exec&lt;/code> 一鍵跑完、並看到 &lt;code>assertFails&lt;/code> 確實證明該擋的有擋住。&lt;/p>
&lt;h2 id="lab-環境與依賴">Lab 環境與依賴&lt;/h2>
&lt;p>沿用 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">quickstart&lt;/a> 的工作區與 &lt;code>firebase.json&lt;/code> / &lt;code>firestore.rules&lt;/code>。再裝測試依賴。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/firestore-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">npm install --save-dev @firebase/rules-unit-testing firebase jest&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>驗收前置是 &lt;code>firestore.rules&lt;/code> 存在（quickstart 已建立 owner-scoped 規則）與 &lt;code>firebase.json&lt;/code> 宣告了 Firestore emulator。&lt;/p>
&lt;h2 id="升級規則加入欄位竄改防護">升級規則：加入欄位竄改防護&lt;/h2>
&lt;p>quickstart 的規則擋了越權讀寫，但還沒擋「owner 改自己 note 時偷改 &lt;code>ownerId&lt;/code> 把資料轉走」。先把規則升級到帶欄位白名單，讓測試有更多面向可驗。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">cat &amp;gt; firestore.rules &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;RULES&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">rules_version = &amp;#39;2&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s">service cloud.firestore {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s"> match /databases/{database}/documents {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s"> function isSignedIn() { return request.auth != null; }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s"> function ownsExisting() {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s"> return isSignedIn() &amp;amp;&amp;amp; resource.data.ownerId == request.auth.uid;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s"> function onlyChanges(fields) {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s"> return request.resource.data.diff(resource.data).affectedKeys().hasOnly(fields);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s"> match /notes/{noteId} {
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s"> allow read: if ownsExisting();
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s"> allow create: if isSignedIn()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s"> &amp;amp;&amp;amp; request.resource.data.ownerId == request.auth.uid;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s"> allow update: if ownsExisting() &amp;amp;&amp;amp; onlyChanges([&amp;#39;text&amp;#39;, &amp;#39;updatedAt&amp;#39;]);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s"> allow delete: if ownsExisting();
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="s"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="s"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">24&lt;/span>&lt;span class="cl">&lt;span class="s">}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">25&lt;/span>&lt;span class="cl">&lt;span class="s">RULES&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>onlyChanges(['text', 'updatedAt'])&lt;/code> 是這版的重點：update 只准動 &lt;code>text&lt;/code> 與 &lt;code>updatedAt&lt;/code>，碰 &lt;code>ownerId&lt;/code> 直接拒絕。下面的測試會驗證它。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線</a> 的 lab，實作 <a href="/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">Security Rules 授權建模</a> deep article 的測試方法。前置環境見 <a href="/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">Local emulator quickstart</a>。測試 API 以 <a href="https://firebase.google.com/docs/rules/unit-tests">Rules unit testing 文件</a> 為準、最後檢查日 2026-06-16。</p></blockquote>
<p>Firestore Security Rules test lab 的核心責任是把授權規則變成可自動驗證的測試。規則是 client 直連模型的整個控制面，改一條就要證明沒開新洞——這個 lab 用 <code>@firebase/rules-unit-testing</code> 在 emulator 上對規則跑斷言，產出可接進 CI 與 release gate 的測試 evidence。</p>
<p>本文的驗收標準是：你能對一組規則寫出「放行 / 越權拒絕 / 未登入拒絕 / 欄位竄改拒絕」四類斷言、用 <code>firebase emulators:exec</code> 一鍵跑完、並看到 <code>assertFails</code> 確實證明該擋的有擋住。</p>
<h2 id="lab-環境與依賴">Lab 環境與依賴</h2>
<p>沿用 <a href="/blog/backend/01-database/vendors/firestore/hands-on/local-emulator-quickstart/" data-link-title="Firestore Local Emulator Quickstart" data-link-desc="用 Firebase CLI 啟動 Firestore emulator、寫 firestore.rules、用 admin SDK seed 資料、跑 query baseline 與 cleanup，建立後續 Security Rules 與 distributed counter lab 共用的本地環境">quickstart</a> 的工作區與 <code>firebase.json</code> / <code>firestore.rules</code>。再裝測試依賴。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /tmp/firestore-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl">npm install --save-dev @firebase/rules-unit-testing firebase jest</span></span></code></pre></div><p>驗收前置是 <code>firestore.rules</code> 存在（quickstart 已建立 owner-scoped 規則）與 <code>firebase.json</code> 宣告了 Firestore emulator。</p>
<h2 id="升級規則加入欄位竄改防護">升級規則：加入欄位竄改防護</h2>
<p>quickstart 的規則擋了越權讀寫，但還沒擋「owner 改自己 note 時偷改 <code>ownerId</code> 把資料轉走」。先把規則升級到帶欄位白名單，讓測試有更多面向可驗。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; firestore.rules <span class="s">&lt;&lt;&#39;RULES&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">rules_version = &#39;2&#39;;
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">service cloud.firestore {
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">  match /databases/{database}/documents {
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">    function isSignedIn() { return request.auth != null; }
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">    function ownsExisting() {
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">      return isSignedIn() &amp;&amp; resource.data.ownerId == request.auth.uid;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">    }
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">    function onlyChanges(fields) {
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">      return request.resource.data.diff(resource.data).affectedKeys().hasOnly(fields);
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">    }
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">    match /notes/{noteId} {
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">      allow read: if ownsExisting();
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">      allow create: if isSignedIn()
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">                    &amp;&amp; request.resource.data.ownerId == request.auth.uid;
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">      allow update: if ownsExisting() &amp;&amp; onlyChanges([&#39;text&#39;, &#39;updatedAt&#39;]);
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">      allow delete: if ownsExisting();
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">    }
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s">  }
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="s">}
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="s">RULES</span></span></span></code></pre></div><p><code>onlyChanges(['text', 'updatedAt'])</code> 是這版的重點：update 只准動 <code>text</code> 與 <code>updatedAt</code>，碰 <code>ownerId</code> 直接拒絕。下面的測試會驗證它。</p>
<h2 id="寫測試四類斷言">寫測試：四類斷言</h2>
<p>測試的核心責任是覆蓋「該放行的放行、該拒絕的拒絕」。<code>initializeTestEnvironment</code> 載入規則、<code>authenticatedContext</code> 模擬登入身分、<code>assertSucceeds</code> / <code>assertFails</code> 對操作斷言。預先種資料用 <code>withSecurityRulesDisabled</code> 繞過規則。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cat &gt; rules.test.js <span class="s">&lt;&lt;&#39;JS&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">const {
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">  initializeTestEnvironment, assertFails, assertSucceeds,
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">} = require(&#39;@firebase/rules-unit-testing&#39;);
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">const { doc, getDoc, setDoc, updateDoc } = require(&#39;firebase/firestore&#39;);
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">const fs = require(&#39;fs&#39;);
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">let testEnv;
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">beforeAll(async () =&gt; {
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">  testEnv = await initializeTestEnvironment({
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">    projectId: &#39;demo-firestore-lab&#39;,
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">    firestore: { rules: fs.readFileSync(&#39;firestore.rules&#39;, &#39;utf8&#39;) },
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">  });
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">});
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">afterAll(async () =&gt; { await testEnv.cleanup(); });
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">beforeEach(async () =&gt; {
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">  await testEnv.clearFirestore();
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">  await testEnv.withSecurityRulesDisabled(async (ctx) =&gt; {
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">    await setDoc(doc(ctx.firestore(), &#39;notes/n1&#39;),
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">      { ownerId: &#39;alice&#39;, text: &#39;hi&#39;, updatedAt: 0 });
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">  });
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s">});
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="s">// 1. 放行：owner 讀自己的
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="s">test(&#39;owner reads own note&#39;, async () =&gt; {
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="s">  const db = testEnv.authenticatedContext(&#39;alice&#39;).firestore();
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="s">  await assertSucceeds(getDoc(doc(db, &#39;notes/n1&#39;)));
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="s">});
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="s">// 2. 越權拒絕：非 owner 讀別人的
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="s">test(&#39;non-owner cannot read&#39;, async () =&gt; {
</span></span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="s">  const db = testEnv.authenticatedContext(&#39;bob&#39;).firestore();
</span></span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="s">  await assertFails(getDoc(doc(db, &#39;notes/n1&#39;)));
</span></span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="s">});
</span></span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">37</span><span class="cl"><span class="s">// 3. 未登入拒絕
</span></span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="s">test(&#39;unauthenticated denied&#39;, async () =&gt; {
</span></span></span><span class="line"><span class="ln">39</span><span class="cl"><span class="s">  const db = testEnv.unauthenticatedContext().firestore();
</span></span></span><span class="line"><span class="ln">40</span><span class="cl"><span class="s">  await assertFails(getDoc(doc(db, &#39;notes/n1&#39;)));
</span></span></span><span class="line"><span class="ln">41</span><span class="cl"><span class="s">});
</span></span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="s">// 4. 欄位竄改拒絕：owner 偷改 ownerId
</span></span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="s">test(&#39;owner cannot change ownerId&#39;, async () =&gt; {
</span></span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="s">  const db = testEnv.authenticatedContext(&#39;alice&#39;).firestore();
</span></span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="s">  await assertFails(updateDoc(doc(db, &#39;notes/n1&#39;), { ownerId: &#39;bob&#39; }));
</span></span></span><span class="line"><span class="ln">47</span><span class="cl"><span class="s">});
</span></span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="s">// 4b. 正當 update 放行
</span></span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="s">test(&#39;owner can edit text&#39;, async () =&gt; {
</span></span></span><span class="line"><span class="ln">51</span><span class="cl"><span class="s">  const db = testEnv.authenticatedContext(&#39;alice&#39;).firestore();
</span></span></span><span class="line"><span class="ln">52</span><span class="cl"><span class="s">  await assertSucceeds(updateDoc(doc(db, &#39;notes/n1&#39;), { text: &#39;edited&#39;, updatedAt: 1 }));
</span></span></span><span class="line"><span class="ln">53</span><span class="cl"><span class="s">});
</span></span></span><span class="line"><span class="ln">54</span><span class="cl"><span class="s">JS</span></span></span></code></pre></div><p>四類斷言裡 <code>assertFails</code> 比 <code>assertSucceeds</code> 更重要——它證明的是攻擊路徑被擋住，正是滲透測試會打的點。每條規則至少要有「正向放行 + 至少一條拒絕」配對，光測 happy path 證明不了授權安全。</p>
<h2 id="一鍵跑emulatorsexec">一鍵跑：emulators:exec</h2>
<p>跑測試的核心責任是讓它在乾淨 emulator 上自動化執行。<code>firebase emulators:exec</code> 啟動 emulator、跑指定命令、結束後關閉——適合 CI，不需要手動開關 emulator。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">cat &gt; package.json.test <span class="s">&lt;&lt;&#39;JSON&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">{ &#34;scripts&#34;: { &#34;test:rules&#34;: &#34;jest rules.test.js&#34; } }
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">JSON</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 把 test:rules script 併進既有 package.json 後執行：</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">firebase emulators:exec --only firestore --project demo-firestore-lab <span class="s2">&#34;npx jest rules.test.js&#34;</span></span></span></code></pre></div><p>預期輸出五個測試全 pass：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">PASS  ./rules.test.js
</span></span><span class="line"><span class="ln">2</span><span class="cl">  owner reads own note (passed)
</span></span><span class="line"><span class="ln">3</span><span class="cl">  non-owner cannot read (passed)
</span></span><span class="line"><span class="ln">4</span><span class="cl">  unauthenticated denied (passed)
</span></span><span class="line"><span class="ln">5</span><span class="cl">  owner cannot change ownerId (passed)
</span></span><span class="line"><span class="ln">6</span><span class="cl">  owner can edit text (passed)
</span></span><span class="line"><span class="ln">7</span><span class="cl">
</span></span><span class="line"><span class="ln">8</span><span class="cl">Test Suites: 1 passed, 1 total
</span></span><span class="line"><span class="ln">9</span><span class="cl">Tests:       5 passed, 5 total</span></span></code></pre></div><p>（Jest 預設 reporter 每行會印一個通過標記、此處以 <code>(passed)</code> 文字呈現，實際終端輸出為工具自身格式。）</p>
<h2 id="故意改壞驗證測試有效">故意改壞驗證測試有效</h2>
<p>測試的價值在於它會抓到回歸。把規則改回 <code>allow read, write: if true</code> 再跑，應看到「越權拒絕」「未登入拒絕」「欄位竄改拒絕」三個測試 fail——這證明測試確實守在攻擊路徑上，而不是恆綠的假測試。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 暫時把規則改成全放行</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">printf</span> <span class="s2">&#34;rules_version=&#39;2&#39;;\nservice cloud.firestore{match /databases/{db}/documents{match /{d=**}{allow read,write:if true;}}}&#34;</span> &gt; firestore.rules
</span></span><span class="line"><span class="ln">3</span><span class="cl">firebase emulators:exec --only firestore --project demo-firestore-lab <span class="s2">&#34;npx jest rules.test.js&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 預期：3 個 assertFails 測試 fail（該擋的沒擋）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 驗證完改回上面的正確規則</span></span></span></code></pre></div><h2 id="artifact-與驗收">Artifact 與驗收</h2>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>來源</th>
          <th>驗收</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>規則測試檔</td>
          <td><code>rules.test.js</code></td>
          <td>四類斷言 + 正向 update</td>
      </tr>
      <tr>
          <td>測試結果</td>
          <td><code>emulators:exec</code> 輸出</td>
          <td>正確規則下全 pass</td>
      </tr>
      <tr>
          <td>回歸證明</td>
          <td>改壞後重跑</td>
          <td>3 個 assertFails 測試轉 fail</td>
      </tr>
  </tbody>
</table>
<h2 id="接進-release-gate">接進 release gate</h2>
<p>規則測試的下游責任是成為發布證據。把 <code>firebase emulators:exec ... jest</code> 接進 CI pipeline，規則變更的 PR 必須通過才能 merge——這把「規則改動沒開新洞」從人工推敲變成 gate 條件，對齊 <a href="/blog/backend/06-reliability/release-gate/" data-link-title="6.8 Release Gate 與變更節奏" data-link-desc="把驗證、migration、相容性納入放行判準">6.8 release gate</a> 的 <code>Gate decision / Checks / Stop condition</code>。授權翻譯的正確性是安全邊界，這個 gate 比一般功能測試更該設為硬性 stop condition。</p>
<h2 id="cleanup">Cleanup</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># emulators:exec 跑完會自動關 emulator；清依賴與工作區</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">rm -rf /tmp/firestore-lab</span></span></code></pre></div><h2 id="引用路徑">引用路徑</h2>
<ul>
<li>上游：<a href="/blog/backend/01-database/vendors/firestore/hands-on/" data-link-title="Firestore Hands-on 操作路線" data-link-desc="用 Firebase Emulator Suite 在本地演練 Firestore：emulator quickstart、Security Rules 單元測試、distributed counter 分片計數，全程零雲端成本、可重跑、產出可驗證 artifact">Firestore Hands-on 操作路線</a></li>
<li>Deep article：<a href="/blog/backend/01-database/vendors/firestore/security-rules-authz-modeling/" data-link-title="Firestore Security Rules 授權建模與可測試化：把規則當程式碼治理" data-link-desc="Firestore client 直連模型把整個授權控制面壓在 Security Rules 這套 DSL 裡；本文展開規則的求值模型、把授權拆成可組合 function、用 emulator 寫單元測試、五個把規則寫成資安漏洞的 production 踩坑，以及規則複雜度撞牆時把授權拉回後端的邊界">Security Rules 授權建模與可測試化</a></li>
<li>安全驗證：<a href="/blog/backend/01-database/red-team-data-layer/" data-link-title="1.5 攻擊者視角（紅隊）：資料層弱點判讀" data-link-desc="從資料存取邊界、外洩路徑與修復代價、盤點 database 的主要弱點">1.5 資料層紅隊</a></li>
<li>發布證據：<a href="/blog/backend/06-reliability/release-gate/" data-link-title="6.8 Release Gate 與變更節奏" data-link-desc="把驗證、migration、相容性納入放行判準">6.8 release gate</a></li>
<li>官方：<a href="https://firebase.google.com/docs/rules/unit-tests">Rules unit testing</a>、<a href="https://firebase.google.com/docs/emulator-suite/install_and_configure">emulators:exec</a></li>
</ul>
]]></content:encoded></item><item><title>MySQL Backup Restore Drill</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/backup-restore-drill/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/backup-restore-drill/</guid><description>&lt;p>MySQL backup restore drill 的核心責任是證明資料可以從 backup 回到可用狀態。這篇承接 &lt;a href="../../pitr-backup/">PITR / Backup&lt;/a>，用 logical dump 建立最小演練框架，並保留 physical backup / binlog PITR 的 evidence 欄位。&lt;/p>
&lt;p>本文的驗收標準是：你能產出 dump、記錄 binlog position、還原到隔離 database、跑 validation query，並寫下 RPO / RTO note。&lt;/p>
&lt;h2 id="create-backup">Create Backup&lt;/h2>
&lt;p>Create backup 的核心責任是建立可還原 artifact。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p /tmp/mysql-backup-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">mysqldump -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user -papp_pw &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> --single-transaction --routines --triggers appdb &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> &amp;gt; /tmp/mysql-backup-lab/appdb.sql&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>記錄 binlog 狀態：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u root -proot_pw -e &lt;span class="s2">&amp;#34;SHOW BINARY LOG STATUS;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>--single-transaction&lt;/code> 適合 InnoDB consistent dump。大型 production 要評估 physical backup、backup lock、replication lag 與 binlog retention。&lt;/p>
&lt;h2 id="mutate-source">Mutate Source&lt;/h2>
&lt;p>Mutate source 的核心責任是讓 restore 時間點具體化。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user -papp_pw appdb &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> -e &lt;span class="s2">&amp;#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key) VALUES (1, 777, &amp;#39;after-backup-write&amp;#39;);&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Source 現在比 backup 多一筆。這能用來討論 RPO 與 binlog PITR。&lt;/p>
&lt;h2 id="restore-isolated-database">Restore Isolated Database&lt;/h2>
&lt;p>Restore isolated database 的核心責任是避免覆蓋 source。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u root -proot_pw &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> -e &lt;span class="s2">&amp;#34;DROP DATABASE IF EXISTS appdb_restore; CREATE DATABASE appdb_restore;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u root -proot_pw appdb_restore &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> &amp;lt; /tmp/mysql-backup-lab/appdb.sql&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Validation：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u root -proot_pw appdb_restore &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT COUNT(*) FROM accounts;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT COUNT(*) FROM ledger_entries;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT a.owner_name, SUM(l.amount_cents) AS balance_cents
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">FROM accounts a JOIN ledger_entries l ON l.account_id = a.id
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">GROUP BY a.owner_name;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Validation query 要和 application smoke test 對齊。正式 drill 還要啟動 app 指向 restore database。&lt;/p>
&lt;h2 id="rpo--rto-note">RPO / RTO Note&lt;/h2>
&lt;p>RPO / RTO note 的核心責任是把演練結果轉成服務承諾。&lt;/p></description><content:encoded><![CDATA[<p>MySQL backup restore drill 的核心責任是證明資料可以從 backup 回到可用狀態。這篇承接 <a href="../../pitr-backup/">PITR / Backup</a>，用 logical dump 建立最小演練框架，並保留 physical backup / binlog PITR 的 evidence 欄位。</p>
<p>本文的驗收標準是：你能產出 dump、記錄 binlog position、還原到隔離 database、跑 validation query，並寫下 RPO / RTO note。</p>
<h2 id="create-backup">Create Backup</h2>
<p>Create backup 的核心責任是建立可還原 artifact。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p /tmp/mysql-backup-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl">mysqldump -h 127.0.0.1 -P <span class="m">33069</span> -u app_user -papp_pw <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --single-transaction --routines --triggers appdb <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  &gt; /tmp/mysql-backup-lab/appdb.sql</span></span></code></pre></div><p>記錄 binlog 狀態：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u root -proot_pw -e <span class="s2">&#34;SHOW BINARY LOG STATUS;&#34;</span></span></span></code></pre></div><p><code>--single-transaction</code> 適合 InnoDB consistent dump。大型 production 要評估 physical backup、backup lock、replication lag 與 binlog retention。</p>
<h2 id="mutate-source">Mutate Source</h2>
<p>Mutate source 的核心責任是讓 restore 時間點具體化。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user -papp_pw appdb <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  -e <span class="s2">&#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key) VALUES (1, 777, &#39;after-backup-write&#39;);&#34;</span></span></span></code></pre></div><p>Source 現在比 backup 多一筆。這能用來討論 RPO 與 binlog PITR。</p>
<h2 id="restore-isolated-database">Restore Isolated Database</h2>
<p>Restore isolated database 的核心責任是避免覆蓋 source。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u root -proot_pw <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  -e <span class="s2">&#34;DROP DATABASE IF EXISTS appdb_restore; CREATE DATABASE appdb_restore;&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u root -proot_pw appdb_restore <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  &lt; /tmp/mysql-backup-lab/appdb.sql</span></span></code></pre></div><p>Validation：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u root -proot_pw appdb_restore <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SELECT COUNT(*) FROM accounts;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SELECT COUNT(*) FROM ledger_entries;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SELECT a.owner_name, SUM(l.amount_cents) AS balance_cents
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">FROM accounts a JOIN ledger_entries l ON l.account_id = a.id
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">GROUP BY a.owner_name;
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Validation query 要和 application smoke test 對齊。正式 drill 還要啟動 app 指向 restore database。</p>
<h2 id="rpo--rto-note">RPO / RTO Note</h2>
<p>RPO / RTO note 的核心責任是把演練結果轉成服務承諾。</p>
<table>
  <thead>
      <tr>
          <th>Evidence</th>
          <th>記錄內容</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Backup time</td>
          <td>dump start / finish</td>
      </tr>
      <tr>
          <td>Binlog position</td>
          <td>file、position 或 GTID set</td>
      </tr>
      <tr>
          <td>Restore time</td>
          <td>開始 restore 到 validation 成功</td>
      </tr>
      <tr>
          <td>Data gap</td>
          <td>backup 後需要 binlog 補回的寫入</td>
      </tr>
      <tr>
          <td>Smoke test</td>
          <td>application workflow</td>
      </tr>
  </tbody>
</table>
<p>完成本篇後，binlog CDC 讀 <a href="../../binlog-cdc/">Binlog CDC</a>；PITR 策略讀 <a href="../../pitr-backup/">PITR / Backup</a>。</p>
]]></content:encoded></item><item><title>MySQL Hands-on 操作路線</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/</guid><description>&lt;p>MySQL hands-on 操作路線的核心責任是把 MySQL deep article 的設定與 failure mode 轉成可演練流程。這一層對齊 LLM &lt;code>hands-on/&lt;/code>：讀者能跑出 config、metric、validation query 與 rollback evidence。&lt;/p>
&lt;h2 id="章節列表">章節列表&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>章節&lt;/th>
 &lt;th>主題&lt;/th>
 &lt;th>產出 artifact&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="local-lab-quickstart/">Local lab quickstart&lt;/a>&lt;/td>
 &lt;td>MySQL container、sample schema、baseline workload&lt;/td>
 &lt;td>local DSN、schema log、basic metric snapshot&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="proxysql-routing-lab/">ProxySQL routing lab&lt;/a>&lt;/td>
 &lt;td>read/write split、lag-aware routing、runtime / disk config&lt;/td>
 &lt;td>ProxySQL config、routing evidence、drift note&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="online-schema-change-lab/">Online schema change lab&lt;/a>&lt;/td>
 &lt;td>gh-ost / pt-osc cutover、metadata lock、rollback&lt;/td>
 &lt;td>OSC command、cutover note、lock evidence&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="replication-failover-lab/">Replication failover lab&lt;/a>&lt;/td>
 &lt;td>GTID replica、semi-sync、Orchestrator / manual failover&lt;/td>
 &lt;td>topology map、lag evidence、failover timeline&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="backup-restore-drill/">Backup restore drill&lt;/a>&lt;/td>
 &lt;td>logical / physical backup、binlog recovery、restore validation&lt;/td>
 &lt;td>restore record、RPO / RTO evidence&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="vitess-sandbox-route/">Vitess sandbox route&lt;/a>&lt;/td>
 &lt;td>keyspace、VSchema、VTGate / VTTablet sandbox&lt;/td>
 &lt;td>sandbox topology、routing sample、shard key note&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="設計原則">設計原則&lt;/h2>
&lt;p>MySQL hands-on 章節要保留「高併發簡單 OLTP + 分片生態」的服務語言。操作章節不只給指令，也要說明 command 產出的 evidence 如何回到 replication、schema change、connection routing 或 sharding decision。&lt;/p>
&lt;h2 id="引用路徑">引用路徑&lt;/h2>
&lt;ul>
&lt;li>上游：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL overview&lt;/a>&lt;/li>
&lt;li>Deep article：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &amp;#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &amp;#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">ProxySQL Config&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/online-schema-change-tools/" data-link-title="MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑" data-link-desc="MySQL ALTER TABLE 可能鎖整張表，production 需要 online schema change 流程。gh-ost（GitHub）跟 pt-online-schema-change（Percona）都用 ghost table 解決、但底層機制完全不同：pt-osc 用 trigger 同步、gh-ost 用 binlog stream 同步。本文走兩工具機制對照表 → trigger vs binlog 各自取捨 → 配置 step-by-step → 5 production 踩雷（trigger overhead / binlog 延遲 / FK constraint / hot trigger lock / 切換瞬間 deadlock）→ 何時用哪一個">Online Schema Change Tools&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">Replication Topology&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">Vitess Sharding&lt;/a>&lt;/li>
&lt;/ul></description><content:encoded><![CDATA[<p>MySQL hands-on 操作路線的核心責任是把 MySQL deep article 的設定與 failure mode 轉成可演練流程。這一層對齊 LLM <code>hands-on/</code>：讀者能跑出 config、metric、validation query 與 rollback evidence。</p>
<h2 id="章節列表">章節列表</h2>
<table>
  <thead>
      <tr>
          <th>章節</th>
          <th>主題</th>
          <th>產出 artifact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="local-lab-quickstart/">Local lab quickstart</a></td>
          <td>MySQL container、sample schema、baseline workload</td>
          <td>local DSN、schema log、basic metric snapshot</td>
      </tr>
      <tr>
          <td><a href="proxysql-routing-lab/">ProxySQL routing lab</a></td>
          <td>read/write split、lag-aware routing、runtime / disk config</td>
          <td>ProxySQL config、routing evidence、drift note</td>
      </tr>
      <tr>
          <td><a href="online-schema-change-lab/">Online schema change lab</a></td>
          <td>gh-ost / pt-osc cutover、metadata lock、rollback</td>
          <td>OSC command、cutover note、lock evidence</td>
      </tr>
      <tr>
          <td><a href="replication-failover-lab/">Replication failover lab</a></td>
          <td>GTID replica、semi-sync、Orchestrator / manual failover</td>
          <td>topology map、lag evidence、failover timeline</td>
      </tr>
      <tr>
          <td><a href="backup-restore-drill/">Backup restore drill</a></td>
          <td>logical / physical backup、binlog recovery、restore validation</td>
          <td>restore record、RPO / RTO evidence</td>
      </tr>
      <tr>
          <td><a href="vitess-sandbox-route/">Vitess sandbox route</a></td>
          <td>keyspace、VSchema、VTGate / VTTablet sandbox</td>
          <td>sandbox topology、routing sample、shard key note</td>
      </tr>
  </tbody>
</table>
<h2 id="設計原則">設計原則</h2>
<p>MySQL hands-on 章節要保留「高併發簡單 OLTP + 分片生態」的服務語言。操作章節不只給指令，也要說明 command 產出的 evidence 如何回到 replication、schema change、connection routing 或 sharding decision。</p>
<h2 id="引用路徑">引用路徑</h2>
<ul>
<li>上游：<a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL overview</a></li>
<li>Deep article：<a href="/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">ProxySQL Config</a>、<a href="/blog/backend/01-database/vendors/mysql/online-schema-change-tools/" data-link-title="MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑" data-link-desc="MySQL ALTER TABLE 可能鎖整張表，production 需要 online schema change 流程。gh-ost（GitHub）跟 pt-online-schema-change（Percona）都用 ghost table 解決、但底層機制完全不同：pt-osc 用 trigger 同步、gh-ost 用 binlog stream 同步。本文走兩工具機制對照表 → trigger vs binlog 各自取捨 → 配置 step-by-step → 5 production 踩雷（trigger overhead / binlog 延遲 / FK constraint / hot trigger lock / 切換瞬間 deadlock）→ 何時用哪一個">Online Schema Change Tools</a>、<a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">Replication Topology</a>、<a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">Vitess Sharding</a></li>
</ul>
]]></content:encoded></item><item><title>MySQL Local Lab Quickstart</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/local-lab-quickstart/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/local-lab-quickstart/</guid><description>&lt;p>MySQL local lab quickstart 的核心責任是建立後續 ProxySQL、OSC、replication、backup 與 Vitess sandbox 共用的本地環境。這個 lab 提供可重建 MySQL instance、baseline schema、seed data 與 basic evidence。&lt;/p>
&lt;p>本文的驗收標準是：你能啟動 MySQL、套用 schema、跑 sample workload、取得 processlist / InnoDB status / table count，並能 teardown 重建。&lt;/p>
&lt;h2 id="docker-compose">Docker Compose&lt;/h2>
&lt;p>Docker Compose 的核心責任是讓 lab 環境可重建。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="nt">services&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">mysql&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">mysql:8.4&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">environment&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">MYSQL_ROOT_PASSWORD&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">root_pw&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">MYSQL_DATABASE&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">appdb&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">MYSQL_USER&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">app_user&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">MYSQL_PASSWORD&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">app_pw&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;33069:3306&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">command&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;--performance-schema=ON&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;--log-bin=mysql-bin&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;--server-id=1&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>啟動：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">docker compose up -d
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">MYSQL_PWD&lt;/span>&lt;span class="o">=&lt;/span>app_pw
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user appdb -e &lt;span class="s2">&amp;#34;SELECT VERSION();&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="baseline-schema">Baseline Schema&lt;/h2>
&lt;p>Baseline schema 的核心責任是建立可測 transaction、index、binlog 與 OSC 的模型。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user appdb &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE accounts (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s"> id BIGINT PRIMARY KEY AUTO_INCREMENT,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s"> tenant_id CHAR(36) NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s"> owner_name VARCHAR(128) NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s"> status ENUM(&amp;#39;active&amp;#39;, &amp;#39;closed&amp;#39;) NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s"> KEY idx_accounts_tenant (tenant_id)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s">) ENGINE=InnoDB;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE ledger_entries (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s"> id BIGINT PRIMARY KEY AUTO_INCREMENT,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s"> account_id BIGINT NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s"> amount_cents BIGINT NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s"> idempotency_key VARCHAR(128) NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s"> UNIQUE KEY uk_ledger_idempotency (idempotency_key),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s"> KEY idx_ledger_account_created (account_id, created_at),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s"> CONSTRAINT fk_ledger_account FOREIGN KEY (account_id) REFERENCES accounts(id)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s">) ENGINE=InnoDB;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="seed-and-evidence">Seed and Evidence&lt;/h2>
&lt;p>Seed and evidence 的核心責任是產生可重跑資料與 baseline。&lt;/p></description><content:encoded><![CDATA[<p>MySQL local lab quickstart 的核心責任是建立後續 ProxySQL、OSC、replication、backup 與 Vitess sandbox 共用的本地環境。這個 lab 提供可重建 MySQL instance、baseline schema、seed data 與 basic evidence。</p>
<p>本文的驗收標準是：你能啟動 MySQL、套用 schema、跑 sample workload、取得 processlist / InnoDB status / table count，並能 teardown 重建。</p>
<h2 id="docker-compose">Docker Compose</h2>
<p>Docker Compose 的核心責任是讓 lab 環境可重建。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">  </span><span class="nt">mysql</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">mysql:8.4</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">      </span><span class="nt">MYSQL_ROOT_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">root_pw</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">MYSQL_DATABASE</span><span class="p">:</span><span class="w"> </span><span class="l">appdb</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">MYSQL_USER</span><span class="p">:</span><span class="w"> </span><span class="l">app_user</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">      </span><span class="nt">MYSQL_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">app_pw</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;33069:3306&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;--performance-schema=ON&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;--log-bin=mysql-bin&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;--server-id=1&#34;</span></span></span></code></pre></div><p>啟動：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose up -d
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">export</span> <span class="nv">MYSQL_PWD</span><span class="o">=</span>app_pw
</span></span><span class="line"><span class="ln">3</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user appdb -e <span class="s2">&#34;SELECT VERSION();&#34;</span></span></span></code></pre></div><h2 id="baseline-schema">Baseline Schema</h2>
<p>Baseline schema 的核心責任是建立可測 transaction、index、binlog 與 OSC 的模型。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user appdb <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">CREATE TABLE accounts (
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">  id BIGINT PRIMARY KEY AUTO_INCREMENT,
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">  tenant_id CHAR(36) NOT NULL,
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">  owner_name VARCHAR(128) NOT NULL,
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">  status ENUM(&#39;active&#39;, &#39;closed&#39;) NOT NULL,
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  KEY idx_accounts_tenant (tenant_id)
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">) ENGINE=InnoDB;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">CREATE TABLE ledger_entries (
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">  id BIGINT PRIMARY KEY AUTO_INCREMENT,
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">  account_id BIGINT NOT NULL,
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">  amount_cents BIGINT NOT NULL,
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">  idempotency_key VARCHAR(128) NOT NULL,
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">  UNIQUE KEY uk_ledger_idempotency (idempotency_key),
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">  KEY idx_ledger_account_created (account_id, created_at),
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">  CONSTRAINT fk_ledger_account FOREIGN KEY (account_id) REFERENCES accounts(id)
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">) ENGINE=InnoDB;
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><h2 id="seed-and-evidence">Seed and Evidence</h2>
<p>Seed and evidence 的核心責任是產生可重跑資料與 baseline。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user appdb <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">INSERT INTO accounts(tenant_id, owner_name, status)
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">VALUES (&#39;tenant-a&#39;, &#39;Ada&#39;, &#39;active&#39;), (&#39;tenant-b&#39;, &#39;Lin&#39;, &#39;active&#39;);
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key)
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">VALUES (1, 1000, &#39;seed-ada-1&#39;), (1, -200, &#39;seed-ada-2&#39;), (2, 500, &#39;seed-lin-1&#39;);
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">SELECT a.owner_name, SUM(l.amount_cents) AS balance_cents
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">FROM accounts a JOIN ledger_entries l ON l.account_id = a.id
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">GROUP BY a.owner_name;
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Basic evidence：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user appdb -e <span class="s2">&#34;SHOW FULL PROCESSLIST;&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user appdb -e <span class="s2">&#34;SHOW TABLE STATUS;&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user appdb -e <span class="s2">&#34;SHOW ENGINE INNODB STATUS\\G&#34;</span></span></span></code></pre></div><h2 id="teardown">Teardown</h2>
<p>Teardown 的核心責任是讓 lab 可重跑。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose down -v</span></span></code></pre></div><p>完成本篇後，backup 進入 <a href="../backup-restore-drill/">Backup Restore Drill</a>；schema change 進入 <a href="../online-schema-change-lab/">Online Schema Change Lab</a>；routing 進入 <a href="../proxysql-routing-lab/">ProxySQL Routing Lab</a>。</p>
]]></content:encoded></item><item><title>MySQL Online Schema Change Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/online-schema-change-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/online-schema-change-lab/</guid><description>&lt;p>MySQL online schema change lab 的核心責任是讓讀者看到 schema change 的 metadata lock、algorithm、copy / cutover 與 validation evidence。這篇承接 &lt;a href="../../online-schema-change-tools/">Online Schema Change Tools&lt;/a> 與 &lt;a href="../../metadata-lock-deep-dive/">Metadata Lock Deep Dive&lt;/a>。&lt;/p>
&lt;p>本文的驗收標準是：你能跑一個低風險 ALTER、觀察 metadata lock、記錄 validation query，並理解 gh-ost / pt-osc 的 cutover evidence。&lt;/p>
&lt;h2 id="direct-alter-baseline">Direct ALTER Baseline&lt;/h2>
&lt;p>Direct ALTER baseline 的核心責任是先看 MySQL 原生 DDL 的行為。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user -papp_pw appdb &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts ADD COLUMN email VARCHAR(255) NULL;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">SHOW CREATE TABLE accounts\G
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>記錄 ALTER duration、algorithm、lock impact 與 table size。不同 MySQL 版本與 DDL 類型會有不同行為，production 要在 staging dry run。&lt;/p>
&lt;h2 id="metadata-lock-observation">Metadata Lock Observation&lt;/h2>
&lt;p>Metadata lock observation 的核心責任是看到 blocker。&lt;/p>
&lt;p>開 Session A：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">START&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TRANSACTION&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">accounts&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>保持 transaction 開啟。Session B 執行：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">ALTER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TABLE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">accounts&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ADD&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">COLUMN&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">note&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">VARCHAR&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">255&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Session C 查：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_SCHEMA&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_NAME&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">LOCK_TYPE&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">LOCK_STATUS&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OWNER_THREAD_ID&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">performance_schema&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">metadata_locks&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_SCHEMA&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;appdb&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>完成觀察後，Session A &lt;code>COMMIT&lt;/code>。這段 lab 展示 long transaction 如何讓 DDL 等待。&lt;/p>
&lt;h2 id="osc-frame">OSC Frame&lt;/h2>
&lt;p>OSC frame 的核心責任是理解 gh-ost / pt-online-schema-change 的證據，而非要求每個 lab 都安裝工具。&lt;/p>
&lt;p>OSC runbook 要記錄：&lt;/p>
&lt;ol>
&lt;li>Source table、ghost table、migration statement。&lt;/li>
&lt;li>Copy progress、chunk size、throttle condition。&lt;/li>
&lt;li>Replication lag / load threshold。&lt;/li>
&lt;li>Cutover pre-check：long transaction、metadata lock、traffic。&lt;/li>
&lt;li>Cutover duration 與 validation query。&lt;/li>
&lt;li>Rollback / drop ghost table policy。&lt;/li>
&lt;/ol>
&lt;p>Cutover 前最重要的是 metadata lock pre-check。工具能降低大部分 copy 風險，但最後 rename / swap 仍需要短暫鎖。&lt;/p>
&lt;h2 id="validation">Validation&lt;/h2>
&lt;p>Validation 的核心責任是證明 schema change 後資料與 query 仍正確。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user -papp_pw appdb &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT COUNT(*) FROM accounts;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT COUNT(*) FROM ledger_entries;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">EXPLAIN SELECT * FROM accounts WHERE tenant_id = &amp;#39;tenant-a&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>正式 migration 要補 row checksum、null rate、index usage、replication lag 與 application smoke test。&lt;/p></description><content:encoded><![CDATA[<p>MySQL online schema change lab 的核心責任是讓讀者看到 schema change 的 metadata lock、algorithm、copy / cutover 與 validation evidence。這篇承接 <a href="../../online-schema-change-tools/">Online Schema Change Tools</a> 與 <a href="../../metadata-lock-deep-dive/">Metadata Lock Deep Dive</a>。</p>
<p>本文的驗收標準是：你能跑一個低風險 ALTER、觀察 metadata lock、記錄 validation query，並理解 gh-ost / pt-osc 的 cutover evidence。</p>
<h2 id="direct-alter-baseline">Direct ALTER Baseline</h2>
<p>Direct ALTER baseline 的核心責任是先看 MySQL 原生 DDL 的行為。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user -papp_pw appdb <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">ALTER TABLE accounts ADD COLUMN email VARCHAR(255) NULL;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SHOW CREATE TABLE accounts\G
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>記錄 ALTER duration、algorithm、lock impact 與 table size。不同 MySQL 版本與 DDL 類型會有不同行為，production 要在 staging dry run。</p>
<h2 id="metadata-lock-observation">Metadata Lock Observation</h2>
<p>Metadata lock observation 的核心責任是看到 blocker。</p>
<p>開 Session A：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">START</span><span class="w"> </span><span class="k">TRANSACTION</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span></span></span></code></pre></div><p>保持 transaction 開啟。Session B 執行：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">note</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">255</span><span class="p">)</span><span class="w"> </span><span class="k">NULL</span><span class="p">;</span></span></span></code></pre></div><p>Session C 查：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">OBJECT_SCHEMA</span><span class="p">,</span><span class="w"> </span><span class="n">OBJECT_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">LOCK_TYPE</span><span class="p">,</span><span class="w"> </span><span class="n">LOCK_STATUS</span><span class="p">,</span><span class="w"> </span><span class="n">OWNER_THREAD_ID</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">performance_schema</span><span class="p">.</span><span class="n">metadata_locks</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">OBJECT_SCHEMA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;appdb&#39;</span><span class="p">;</span></span></span></code></pre></div><p>完成觀察後，Session A <code>COMMIT</code>。這段 lab 展示 long transaction 如何讓 DDL 等待。</p>
<h2 id="osc-frame">OSC Frame</h2>
<p>OSC frame 的核心責任是理解 gh-ost / pt-online-schema-change 的證據，而非要求每個 lab 都安裝工具。</p>
<p>OSC runbook 要記錄：</p>
<ol>
<li>Source table、ghost table、migration statement。</li>
<li>Copy progress、chunk size、throttle condition。</li>
<li>Replication lag / load threshold。</li>
<li>Cutover pre-check：long transaction、metadata lock、traffic。</li>
<li>Cutover duration 與 validation query。</li>
<li>Rollback / drop ghost table policy。</li>
</ol>
<p>Cutover 前最重要的是 metadata lock pre-check。工具能降低大部分 copy 風險，但最後 rename / swap 仍需要短暫鎖。</p>
<h2 id="validation">Validation</h2>
<p>Validation 的核心責任是證明 schema change 後資料與 query 仍正確。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user -papp_pw appdb <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SELECT COUNT(*) FROM accounts;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SELECT COUNT(*) FROM ledger_entries;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">EXPLAIN SELECT * FROM accounts WHERE tenant_id = &#39;tenant-a&#39;;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>正式 migration 要補 row checksum、null rate、index usage、replication lag 與 application smoke test。</p>
<h2 id="release-gate">Release Gate</h2>
<p>Release gate 的核心責任是形成交付 artifact。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Migration:
</span></span><span class="line"><span class="ln">2</span><span class="cl">DDL / OSC command:
</span></span><span class="line"><span class="ln">3</span><span class="cl">Table size:
</span></span><span class="line"><span class="ln">4</span><span class="cl">MDL pre-check:
</span></span><span class="line"><span class="ln">5</span><span class="cl">Duration:
</span></span><span class="line"><span class="ln">6</span><span class="cl">Validation:
</span></span><span class="line"><span class="ln">7</span><span class="cl">Rollback:
</span></span><span class="line"><span class="ln">8</span><span class="cl">Owner:</span></span></code></pre></div><p>完成本篇後，MDL 事故讀 <a href="../../metadata-lock-deep-dive/">Metadata Lock Deep Dive</a>；工具選型讀 <a href="../../online-schema-change-tools/">Online Schema Change Tools</a>。</p>
]]></content:encoded></item><item><title>MySQL ProxySQL Routing Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/proxysql-routing-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/proxysql-routing-lab/</guid><description>&lt;p>MySQL ProxySQL routing lab 的核心責任是讓讀者看到 database proxy 如何把 application query 導向不同 hostgroup。這篇承接 &lt;a href="../../proxysql-config/">ProxySQL Config&lt;/a>。&lt;/p>
&lt;p>本文的驗收標準是：你能定義 writer / reader hostgroup、建立 query rule、觀察 routing stats，並寫下 stale read 與 failover 風險。&lt;/p>
&lt;h2 id="hostgroup-model">Hostgroup Model&lt;/h2>
&lt;p>Hostgroup model 的核心責任是把 backend 分成 writer 與 reader。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">hostgroup 10: writer
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">hostgroup 20: reader&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>在單節點 lab 中，writer / reader 可以先指向同一 MySQL；正式環境應用 replica 作 reader，並搭配 replication lag guard。&lt;/p>
&lt;h2 id="query-rule">Query Rule&lt;/h2>
&lt;p>Query rule 的核心責任是示範 routing policy。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1">-- Conceptual ProxySQL admin commands. Adjust host / credential for your lab.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">INSERT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">INTO&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">mysql_query_rules&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">rule_id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">active&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">match_pattern&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">destination_hostgroup&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">apply&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">VALUES&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;^SELECT&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;.*&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">LOAD&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">MYSQL&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">QUERY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">RULES&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TO&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">RUNTIME&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="n">SAVE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">MYSQL&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">QUERY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">RULES&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TO&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">DISK&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這個規則把 &lt;code>SELECT&lt;/code> 導向 reader，其餘導向 writer。Production 要排除 &lt;code>SELECT ... FOR UPDATE&lt;/code>、transaction、read-after-write 與 session state。&lt;/p>
&lt;h2 id="routing-evidence">Routing Evidence&lt;/h2>
&lt;p>Routing evidence 的核心責任是確認 query 真的走到預期 hostgroup。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">hostgroup&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">srv_host&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">Queries&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">stats_mysql_connection_pool&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">rule_id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">hits&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">stats_mysql_query_rules&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">ORDER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">rule_id&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Evidence 要和 application log 對齊。若某個 workflow 寫後立刻讀，routing rule 要保證它走 writer 或具備 freshness policy。&lt;/p>
&lt;h2 id="failure-note">Failure Note&lt;/h2>
&lt;p>Failure note 的核心責任是記錄 proxy 常見風險。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>風險&lt;/th>
 &lt;th>控制方式&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Stale read&lt;/td>
 &lt;td>lag guard、read-after-write to writer&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Transaction split&lt;/td>
 &lt;td>transaction pinning、query rule review&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Bad regex&lt;/td>
 &lt;td>query digest / allowlist&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Backend unhealthy&lt;/td>
 &lt;td>health check、hostgroup failover&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Credential drift&lt;/td>
 &lt;td>ProxySQL user sync / secret rotation&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>完成本篇後，完整設定讀 &lt;a href="../../proxysql-config/">ProxySQL Config&lt;/a>；replica 與 failover 讀 &lt;a href="../replication-failover-lab/">Replication Failover Lab&lt;/a>。&lt;/p></description><content:encoded><![CDATA[<p>MySQL ProxySQL routing lab 的核心責任是讓讀者看到 database proxy 如何把 application query 導向不同 hostgroup。這篇承接 <a href="../../proxysql-config/">ProxySQL Config</a>。</p>
<p>本文的驗收標準是：你能定義 writer / reader hostgroup、建立 query rule、觀察 routing stats，並寫下 stale read 與 failover 風險。</p>
<h2 id="hostgroup-model">Hostgroup Model</h2>
<p>Hostgroup model 的核心責任是把 backend 分成 writer 與 reader。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">hostgroup 10: writer
</span></span><span class="line"><span class="ln">2</span><span class="cl">hostgroup 20: reader</span></span></code></pre></div><p>在單節點 lab 中，writer / reader 可以先指向同一 MySQL；正式環境應用 replica 作 reader，並搭配 replication lag guard。</p>
<h2 id="query-rule">Query Rule</h2>
<p>Query rule 的核心責任是示範 routing policy。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Conceptual ProxySQL admin commands. Adjust host / credential for your lab.
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">mysql_query_rules</span><span class="p">(</span><span class="n">rule_id</span><span class="p">,</span><span class="w"> </span><span class="n">active</span><span class="p">,</span><span class="w"> </span><span class="n">match_pattern</span><span class="p">,</span><span class="w"> </span><span class="n">destination_hostgroup</span><span class="p">,</span><span class="w"> </span><span class="n">apply</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">VALUES</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">  </span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;^SELECT&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.*&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="k">LOAD</span><span class="w"> </span><span class="n">MYSQL</span><span class="w"> </span><span class="n">QUERY</span><span class="w"> </span><span class="n">RULES</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">RUNTIME</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="n">SAVE</span><span class="w"> </span><span class="n">MYSQL</span><span class="w"> </span><span class="n">QUERY</span><span class="w"> </span><span class="n">RULES</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">DISK</span><span class="p">;</span></span></span></code></pre></div><p>這個規則把 <code>SELECT</code> 導向 reader，其餘導向 writer。Production 要排除 <code>SELECT ... FOR UPDATE</code>、transaction、read-after-write 與 session state。</p>
<h2 id="routing-evidence">Routing Evidence</h2>
<p>Routing evidence 的核心責任是確認 query 真的走到預期 hostgroup。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">hostgroup</span><span class="p">,</span><span class="w"> </span><span class="n">srv_host</span><span class="p">,</span><span class="w"> </span><span class="n">Queries</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">stats_mysql_connection_pool</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">rule_id</span><span class="p">,</span><span class="w"> </span><span class="n">hits</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">stats_mysql_query_rules</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">rule_id</span><span class="p">;</span></span></span></code></pre></div><p>Evidence 要和 application log 對齊。若某個 workflow 寫後立刻讀，routing rule 要保證它走 writer 或具備 freshness policy。</p>
<h2 id="failure-note">Failure Note</h2>
<p>Failure note 的核心責任是記錄 proxy 常見風險。</p>
<table>
  <thead>
      <tr>
          <th>風險</th>
          <th>控制方式</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Stale read</td>
          <td>lag guard、read-after-write to writer</td>
      </tr>
      <tr>
          <td>Transaction split</td>
          <td>transaction pinning、query rule review</td>
      </tr>
      <tr>
          <td>Bad regex</td>
          <td>query digest / allowlist</td>
      </tr>
      <tr>
          <td>Backend unhealthy</td>
          <td>health check、hostgroup failover</td>
      </tr>
      <tr>
          <td>Credential drift</td>
          <td>ProxySQL user sync / secret rotation</td>
      </tr>
  </tbody>
</table>
<p>完成本篇後，完整設定讀 <a href="../../proxysql-config/">ProxySQL Config</a>；replica 與 failover 讀 <a href="../replication-failover-lab/">Replication Failover Lab</a>。</p>
]]></content:encoded></item><item><title>MySQL Replication Failover Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/replication-failover-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/replication-failover-lab/</guid><description>&lt;p>MySQL replication failover lab 的核心責任是讓讀者觀察 source / replica 拓撲在 promotion 時的資料與 client route。這篇承接 &lt;a href="../../replication-topology/">Replication Topology&lt;/a> 與 &lt;a href="../../orchestrator-failover/">Orchestrator Failover&lt;/a>。&lt;/p>
&lt;p>本文的驗收標準是：你能記錄 replication status、lag、promotion timeline、client error sample、validation query 與 incident decision log。&lt;/p>
&lt;h2 id="baseline-replication">Baseline Replication&lt;/h2>
&lt;p>Baseline replication 的核心責任是先保存 source / replica 狀態。實際建立 replication 依 GTID、binlog file position、Docker topology 或 managed service 而異；本文聚焦演練 evidence。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SHOW&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">REPLICA&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">STATUS&lt;/span>&lt;span class="err">\&lt;/span>&lt;span class="k">G&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SHOW&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">BINARY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">LOG&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">STATUS&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Baseline 要記錄：&lt;/p>
&lt;ol>
&lt;li>Source host / replica host。&lt;/li>
&lt;li>GTID executed / retrieved。&lt;/li>
&lt;li>IO thread / SQL thread。&lt;/li>
&lt;li>Seconds behind source。&lt;/li>
&lt;li>Read endpoint / write endpoint。&lt;/li>
&lt;/ol>
&lt;h2 id="client-workload">Client Workload&lt;/h2>
&lt;p>Client workload 的核心責任是讓 failover 對 application 可見。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">while&lt;/span> true&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> mysql -h &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$MYSQL_WRITE_HOST&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -u app_user -papp_pw appdb &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> -e &lt;span class="s2">&amp;#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key) VALUES (1, 1, UUID());&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> sleep &lt;span class="m">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這個 synthetic workload 產生成功、timeout、duplicate、read-only 或 connection error。正式演練要避免碰 production side effect。&lt;/p>
&lt;h2 id="promotion-frame">Promotion Frame&lt;/h2>
&lt;p>Promotion frame 的核心責任是把 failover action 寫成可審查步驟。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">failover_start:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">old_source:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">candidate_replica:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">lag_before:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">promotion_method:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">accepted_data_loss:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">operator:&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Managed service、Orchestrator 或手動 promotion 都要留下同樣欄位。工具不同，決策證據一致。&lt;/p>
&lt;h2 id="validation">Validation&lt;/h2>
&lt;p>Validation 的核心責任是確認 promoted instance 可讀寫且資料符合預期。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">COUNT&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ledger_entries&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">MAX&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">created_at&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ledger_entries&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SHOW&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">VARIABLES&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">LIKE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;read_only&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SHOW&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">VARIABLES&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">LIKE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;super_read_only&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>若使用 GTID，還要比較 source / replica 的 GTID set。若有 external side effect，要用 idempotency key 做 reconciliation。&lt;/p>
&lt;h2 id="client-route">Client Route&lt;/h2>
&lt;p>Client route 的核心責任是確認 application、ProxySQL、DNS 或 secret 已指向新 writer。&lt;/p>
&lt;p>檢查項目：&lt;/p>
&lt;ol>
&lt;li>Write endpoint 是否更新。&lt;/li>
&lt;li>ProxySQL writer hostgroup 是否切換。&lt;/li>
&lt;li>Application pool 是否清掉舊連線。&lt;/li>
&lt;li>Retry 是否有 backoff。&lt;/li>
&lt;li>Read replica 是否重新掛到新 source。&lt;/li>
&lt;/ol>
&lt;p>Failover 完成標準包含資料庫 promotion 與 client route 穩定。只 promote 成功，application 仍可能寫到舊 endpoint。&lt;/p></description><content:encoded><![CDATA[<p>MySQL replication failover lab 的核心責任是讓讀者觀察 source / replica 拓撲在 promotion 時的資料與 client route。這篇承接 <a href="../../replication-topology/">Replication Topology</a> 與 <a href="../../orchestrator-failover/">Orchestrator Failover</a>。</p>
<p>本文的驗收標準是：你能記錄 replication status、lag、promotion timeline、client error sample、validation query 與 incident decision log。</p>
<h2 id="baseline-replication">Baseline Replication</h2>
<p>Baseline replication 的核心責任是先保存 source / replica 狀態。實際建立 replication 依 GTID、binlog file position、Docker topology 或 managed service 而異；本文聚焦演練 evidence。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="n">REPLICA</span><span class="w"> </span><span class="n">STATUS</span><span class="err">\</span><span class="k">G</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="nb">BINARY</span><span class="w"> </span><span class="n">LOG</span><span class="w"> </span><span class="n">STATUS</span><span class="p">;</span></span></span></code></pre></div><p>Baseline 要記錄：</p>
<ol>
<li>Source host / replica host。</li>
<li>GTID executed / retrieved。</li>
<li>IO thread / SQL thread。</li>
<li>Seconds behind source。</li>
<li>Read endpoint / write endpoint。</li>
</ol>
<h2 id="client-workload">Client Workload</h2>
<p>Client workload 的核心責任是讓 failover 對 application 可見。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">while</span> true<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  mysql -h <span class="s2">&#34;</span><span class="nv">$MYSQL_WRITE_HOST</span><span class="s2">&#34;</span> -u app_user -papp_pw appdb <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>    -e <span class="s2">&#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key) VALUES (1, 1, UUID());&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  sleep <span class="m">1</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">done</span></span></span></code></pre></div><p>這個 synthetic workload 產生成功、timeout、duplicate、read-only 或 connection error。正式演練要避免碰 production side effect。</p>
<h2 id="promotion-frame">Promotion Frame</h2>
<p>Promotion frame 的核心責任是把 failover action 寫成可審查步驟。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">failover_start:
</span></span><span class="line"><span class="ln">2</span><span class="cl">old_source:
</span></span><span class="line"><span class="ln">3</span><span class="cl">candidate_replica:
</span></span><span class="line"><span class="ln">4</span><span class="cl">lag_before:
</span></span><span class="line"><span class="ln">5</span><span class="cl">promotion_method:
</span></span><span class="line"><span class="ln">6</span><span class="cl">accepted_data_loss:
</span></span><span class="line"><span class="ln">7</span><span class="cl">operator:</span></span></code></pre></div><p>Managed service、Orchestrator 或手動 promotion 都要留下同樣欄位。工具不同，決策證據一致。</p>
<h2 id="validation">Validation</h2>
<p>Validation 的核心責任是確認 promoted instance 可讀寫且資料符合預期。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ledger_entries</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">MAX</span><span class="p">(</span><span class="n">created_at</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ledger_entries</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">VARIABLES</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">&#39;read_only&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">VARIABLES</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">&#39;super_read_only&#39;</span><span class="p">;</span></span></span></code></pre></div><p>若使用 GTID，還要比較 source / replica 的 GTID set。若有 external side effect，要用 idempotency key 做 reconciliation。</p>
<h2 id="client-route">Client Route</h2>
<p>Client route 的核心責任是確認 application、ProxySQL、DNS 或 secret 已指向新 writer。</p>
<p>檢查項目：</p>
<ol>
<li>Write endpoint 是否更新。</li>
<li>ProxySQL writer hostgroup 是否切換。</li>
<li>Application pool 是否清掉舊連線。</li>
<li>Retry 是否有 backoff。</li>
<li>Read replica 是否重新掛到新 source。</li>
</ol>
<p>Failover 完成標準包含資料庫 promotion 與 client route 穩定。只 promote 成功，application 仍可能寫到舊 endpoint。</p>
<h2 id="incident-log">Incident Log</h2>
<p>Incident log 的核心責任是把演練結果保存。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Drill id:
</span></span><span class="line"><span class="ln">2</span><span class="cl">RTO observed:
</span></span><span class="line"><span class="ln">3</span><span class="cl">RPO / accepted data loss:
</span></span><span class="line"><span class="ln">4</span><span class="cl">Client errors:
</span></span><span class="line"><span class="ln">5</span><span class="cl">Validation:
</span></span><span class="line"><span class="ln">6</span><span class="cl">Follow-up:</span></span></code></pre></div><p>完成本篇後，拓撲設計讀 <a href="../../replication-topology/">Replication Topology</a>；自動化工具讀 <a href="../../orchestrator-failover/">Orchestrator Failover</a>；routing 讀 <a href="../proxysql-routing-lab/">ProxySQL Routing Lab</a>。</p>
]]></content:encoded></item><item><title>MySQL Vitess Sandbox Route</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/vitess-sandbox-route/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/vitess-sandbox-route/</guid><description>&lt;p>MySQL Vitess sandbox route 的核心責任是讓讀者用 sandbox 理解 Vitess 如何把 MySQL 拓展成 sharded database platform。這篇承接 &lt;a href="../../vitess-sharding/">Vitess Sharding&lt;/a> 與 &lt;a href="../../migrate-to-planetscale/">MySQL to PlanetScale&lt;/a>。&lt;/p>
&lt;p>本文的驗收標準是：你能建立 sandbox、辨識 keyspace / shard / tablet / vtgate、跑基本 query，並記錄 resharding preview 的 evidence。&lt;/p>
&lt;p>官方文件路由的核心責任是固定 sandbox 指令。實作前先查 &lt;a href="https://vitess.io/docs/21.0/get-started/local/">Vitess local install docs&lt;/a>；本文最後檢查日是 2026-05-22。&lt;/p>
&lt;h2 id="concept-map">Concept Map&lt;/h2>
&lt;p>Concept map 的核心責任是先建立 Vitess vocabulary。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>概念&lt;/th>
 &lt;th>責任&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Keyspace&lt;/td>
 &lt;td>logical database / routing boundary&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Shard&lt;/td>
 &lt;td>keyrange 分片&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Tablet&lt;/td>
 &lt;td>MySQL instance + Vitess sidecar role&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>vtgate&lt;/td>
 &lt;td>application query routing endpoint&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>VSchema&lt;/td>
 &lt;td>routing、vindex、sharding metadata&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>VReplication&lt;/td>
 &lt;td>resharding / materialize workflow&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Vitess 的重點是 routing 與 resharding。Application 看到的是 vtgate；底下是多個 MySQL tablet 與 topology service。&lt;/p>
&lt;h2 id="sandbox-setup">Sandbox Setup&lt;/h2>
&lt;p>Sandbox setup 的核心責任是使用官方 sandbox 建立可丟棄環境。實際命令依 Vitess 版本調整，正式操作以 Vitess 官方文件為準。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1"># Conceptual route. Use the current Vitess examples for exact commands.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">git clone https://github.com/vitessio/vitess.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> vitess/examples/local
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">./101_initial_cluster.sh&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>啟動後要記錄：&lt;/p>
&lt;ol>
&lt;li>Vitess version。&lt;/li>
&lt;li>Keyspace name。&lt;/li>
&lt;li>Shard count。&lt;/li>
&lt;li>vtgate host / port。&lt;/li>
&lt;li>Tablet roles。&lt;/li>
&lt;/ol>
&lt;h2 id="query-through-vtgate">Query Through vtgate&lt;/h2>
&lt;p>Query through vtgate 的核心責任是確認 application 走 routing layer。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">15306&lt;/span> -u user &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">SHOW DATABASES;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">USE commerce;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">SHOW TABLES;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT * FROM product LIMIT 5;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Evidence 要包含 query result、target keyspace、vtgate endpoint 與 tablet health。Production migration 要確認 ORM / driver 對 vtgate 的相容性。&lt;/p>
&lt;h2 id="vschema-review">VSchema Review&lt;/h2>
&lt;p>VSchema review 的核心責任是理解 shard key 與 routing。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1"># Conceptual command; exact path depends on sandbox.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">cat vschema_commerce_initial.json&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>審查問題：&lt;/p>
&lt;ol>
&lt;li>哪些 table 是 sharded。&lt;/li>
&lt;li>shard key / vindex 是什麼。&lt;/li>
&lt;li>lookup vindex 是否需要維護。&lt;/li>
&lt;li>cross-shard query 是否存在。&lt;/li>
&lt;li>sequence / id generation 如何處理。&lt;/li>
&lt;/ol>
&lt;p>VSchema 是 Vitess migration 的核心設計文件。選錯 shard key 會讓跨 shard transaction、hot shard 與 resharding 成本升高。&lt;/p></description><content:encoded><![CDATA[<p>MySQL Vitess sandbox route 的核心責任是讓讀者用 sandbox 理解 Vitess 如何把 MySQL 拓展成 sharded database platform。這篇承接 <a href="../../vitess-sharding/">Vitess Sharding</a> 與 <a href="../../migrate-to-planetscale/">MySQL to PlanetScale</a>。</p>
<p>本文的驗收標準是：你能建立 sandbox、辨識 keyspace / shard / tablet / vtgate、跑基本 query，並記錄 resharding preview 的 evidence。</p>
<p>官方文件路由的核心責任是固定 sandbox 指令。實作前先查 <a href="https://vitess.io/docs/21.0/get-started/local/">Vitess local install docs</a>；本文最後檢查日是 2026-05-22。</p>
<h2 id="concept-map">Concept Map</h2>
<p>Concept map 的核心責任是先建立 Vitess vocabulary。</p>
<table>
  <thead>
      <tr>
          <th>概念</th>
          <th>責任</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Keyspace</td>
          <td>logical database / routing boundary</td>
      </tr>
      <tr>
          <td>Shard</td>
          <td>keyrange 分片</td>
      </tr>
      <tr>
          <td>Tablet</td>
          <td>MySQL instance + Vitess sidecar role</td>
      </tr>
      <tr>
          <td>vtgate</td>
          <td>application query routing endpoint</td>
      </tr>
      <tr>
          <td>VSchema</td>
          <td>routing、vindex、sharding metadata</td>
      </tr>
      <tr>
          <td>VReplication</td>
          <td>resharding / materialize workflow</td>
      </tr>
  </tbody>
</table>
<p>Vitess 的重點是 routing 與 resharding。Application 看到的是 vtgate；底下是多個 MySQL tablet 與 topology service。</p>
<h2 id="sandbox-setup">Sandbox Setup</h2>
<p>Sandbox setup 的核心責任是使用官方 sandbox 建立可丟棄環境。實際命令依 Vitess 版本調整，正式操作以 Vitess 官方文件為準。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Conceptual route. Use the current Vitess examples for exact commands.</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">git clone https://github.com/vitessio/vitess.git
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">cd</span> vitess/examples/local
</span></span><span class="line"><span class="ln">4</span><span class="cl">./101_initial_cluster.sh</span></span></code></pre></div><p>啟動後要記錄：</p>
<ol>
<li>Vitess version。</li>
<li>Keyspace name。</li>
<li>Shard count。</li>
<li>vtgate host / port。</li>
<li>Tablet roles。</li>
</ol>
<h2 id="query-through-vtgate">Query Through vtgate</h2>
<p>Query through vtgate 的核心責任是確認 application 走 routing layer。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">15306</span> -u user <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SHOW DATABASES;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">USE commerce;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SHOW TABLES;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">SELECT * FROM product LIMIT 5;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Evidence 要包含 query result、target keyspace、vtgate endpoint 與 tablet health。Production migration 要確認 ORM / driver 對 vtgate 的相容性。</p>
<h2 id="vschema-review">VSchema Review</h2>
<p>VSchema review 的核心責任是理解 shard key 與 routing。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Conceptual command; exact path depends on sandbox.</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">cat vschema_commerce_initial.json</span></span></code></pre></div><p>審查問題：</p>
<ol>
<li>哪些 table 是 sharded。</li>
<li>shard key / vindex 是什麼。</li>
<li>lookup vindex 是否需要維護。</li>
<li>cross-shard query 是否存在。</li>
<li>sequence / id generation 如何處理。</li>
</ol>
<p>VSchema 是 Vitess migration 的核心設計文件。選錯 shard key 會讓跨 shard transaction、hot shard 與 resharding 成本升高。</p>
<h2 id="resharding-preview">Resharding Preview</h2>
<p>Resharding preview 的核心責任是看見 Vitess 的主要價值與操作成本。</p>
<p>Resharding evidence 欄位：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">source shard:
</span></span><span class="line"><span class="ln">2</span><span class="cl">target shards:
</span></span><span class="line"><span class="ln">3</span><span class="cl">workflow name:
</span></span><span class="line"><span class="ln">4</span><span class="cl">copy phase duration:
</span></span><span class="line"><span class="ln">5</span><span class="cl">replication lag:
</span></span><span class="line"><span class="ln">6</span><span class="cl">cutover time:
</span></span><span class="line"><span class="ln">7</span><span class="cl">validation query:
</span></span><span class="line"><span class="ln">8</span><span class="cl">rollback:</span></span></code></pre></div><p>Resharding 是 production operation，不只是一次 migration。Runbook 要包含 throttling、lag、tablet health、cutover 與 application query validation。</p>
<h2 id="migration-decision">Migration Decision</h2>
<p>Migration decision 的核心責任是判斷何時從 MySQL 走向 Vitess / PlanetScale 類路線。</p>
<table>
  <thead>
      <tr>
          <th>訊號</th>
          <th>意義</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>單 MySQL writer 到頂</td>
          <td>需要 horizontal write scaling</td>
      </tr>
      <tr>
          <td>tenant shard boundary 清楚</td>
          <td>Vitess keyspace / shard 有機會匹配</td>
      </tr>
      <tr>
          <td>online resharding 是核心需求</td>
          <td>Vitess value 高</td>
      </tr>
      <tr>
          <td>app 缺少 routing 語意改造空間</td>
          <td>先重構 repository / query</td>
      </tr>
  </tbody>
</table>
<p>完成本篇後，設計細節讀 <a href="../../vitess-sharding/">Vitess Sharding</a>；managed route 讀 <a href="../../migrate-to-planetscale/">MySQL to PlanetScale</a>。</p>
]]></content:encoded></item><item><title>PostgreSQL Connection Pool Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/connection-pool-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/connection-pool-lab/</guid><description>&lt;p>PostgreSQL connection pool lab 的核心責任是讓讀者看到 connection pressure 如何從 application pool 傳到 PostgreSQL backend process。這篇承接 &lt;a href="../../connection-scaling/">Connection Scaling&lt;/a> 與 &lt;a href="../../pgbouncer-config/">PgBouncer Config&lt;/a>。&lt;/p>
&lt;p>本文的驗收標準是：你能比較 direct connection 與 PgBouncer transaction pooling，取得 &lt;code>pg_stat_activity&lt;/code>、PgBouncer &lt;code>SHOW POOLS&lt;/code>、latency / error sample 與 failure note。&lt;/p>
&lt;h2 id="baseline-direct-connections">Baseline Direct Connections&lt;/h2>
&lt;p>Baseline direct connections 的核心責任是先看 application 直連 PostgreSQL 時的 backend 數。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">DATABASE_URL&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT count(*) FROM pg_stat_activity WHERE datname = current_database();&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>用多個 terminal 或簡單 workload 產生 idle connection：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">for&lt;/span> i in &lt;span class="m">1&lt;/span> &lt;span class="m">2&lt;/span> &lt;span class="m">3&lt;/span> &lt;span class="m">4&lt;/span> 5&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT pg_sleep(10);&amp;#34;&lt;/span> &lt;span class="p">&amp;amp;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這一步證明每個 client session 會占用 PostgreSQL backend process。&lt;/p>
&lt;h2 id="add-pgbouncer">Add PgBouncer&lt;/h2>
&lt;p>Add PgBouncer 的核心責任是把 client connection 與 server connection 拆開。以下 compose fragment 可加入 local lab：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pgbouncer&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">edoburu/pgbouncer:latest&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">environment&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_HOST&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">postgres&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_USER&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_PASSWORD&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin_pw&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_NAME&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">appdb&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POOL_MODE&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">transaction&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">MAX_CLIENT_CONN&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">100&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DEFAULT_POOL_SIZE&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">5&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;64329:5432&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>啟動後設定 pooler URL：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">POOL_URL&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:64329/appdb?sslmode=disable&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="compare-pool-behavior">Compare Pool Behavior&lt;/h2>
&lt;p>Compare pool behavior 的核心責任是觀察 client 多、server 少的效果。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">for&lt;/span> i in &lt;span class="k">$(&lt;/span>seq &lt;span class="m">1&lt;/span> 20&lt;span class="k">)&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$POOL_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT pg_sleep(1);&amp;#34;&lt;/span> &lt;span class="p">&amp;amp;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>再進 PgBouncer admin console，實際命令依 image 設定調整：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:64329/pgbouncer?sslmode=disable&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SHOW POOLS;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>驗收重點是：client workload 增加時，PostgreSQL backend 數量被 pool size 控制，排隊發生在 pooler 層。&lt;/p>
&lt;h2 id="pool-exhaustion">Pool Exhaustion&lt;/h2>
&lt;p>Pool exhaustion 的核心責任是看過載時的錯誤與等待。&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL connection pool lab 的核心責任是讓讀者看到 connection pressure 如何從 application pool 傳到 PostgreSQL backend process。這篇承接 <a href="../../connection-scaling/">Connection Scaling</a> 與 <a href="../../pgbouncer-config/">PgBouncer Config</a>。</p>
<p>本文的驗收標準是：你能比較 direct connection 與 PgBouncer transaction pooling，取得 <code>pg_stat_activity</code>、PgBouncer <code>SHOW POOLS</code>、latency / error sample 與 failure note。</p>
<h2 id="baseline-direct-connections">Baseline Direct Connections</h2>
<p>Baseline direct connections 的核心責任是先看 application 直連 PostgreSQL 時的 backend 數。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">export</span> <span class="nv">DATABASE_URL</span><span class="o">=</span><span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT count(*) FROM pg_stat_activity WHERE datname = current_database();&#34;</span></span></span></code></pre></div><p>用多個 terminal 或簡單 workload 產生 idle connection：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">for</span> i in <span class="m">1</span> <span class="m">2</span> <span class="m">3</span> <span class="m">4</span> 5<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT pg_sleep(10);&#34;</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&#34;</span></span></span></code></pre></div><p>這一步證明每個 client session 會占用 PostgreSQL backend process。</p>
<h2 id="add-pgbouncer">Add PgBouncer</h2>
<p>Add PgBouncer 的核心責任是把 client connection 與 server connection 拆開。以下 compose fragment 可加入 local lab：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="w">  </span><span class="nt">pgbouncer</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">edoburu/pgbouncer:latest</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">      </span><span class="nt">DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">      </span><span class="nt">DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin_pw</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l">appdb</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">      </span><span class="nt">POOL_MODE</span><span class="p">:</span><span class="w"> </span><span class="l">transaction</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">      </span><span class="nt">MAX_CLIENT_CONN</span><span class="p">:</span><span class="w"> </span><span class="m">100</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">      </span><span class="nt">DEFAULT_POOL_SIZE</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;64329:5432&#34;</span></span></span></code></pre></div><p>啟動後設定 pooler URL：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">export</span> <span class="nv">POOL_URL</span><span class="o">=</span><span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:64329/appdb?sslmode=disable&#34;</span></span></span></code></pre></div><h2 id="compare-pool-behavior">Compare Pool Behavior</h2>
<p>Compare pool behavior 的核心責任是觀察 client 多、server 少的效果。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">for</span> i in <span class="k">$(</span>seq <span class="m">1</span> 20<span class="k">)</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$POOL_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT pg_sleep(1);&#34;</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&#34;</span></span></span></code></pre></div><p>再進 PgBouncer admin console，實際命令依 image 設定調整：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:64329/pgbouncer?sslmode=disable&#34;</span> -c <span class="s2">&#34;SHOW POOLS;&#34;</span></span></span></code></pre></div><p>驗收重點是：client workload 增加時，PostgreSQL backend 數量被 pool size 控制，排隊發生在 pooler 層。</p>
<h2 id="pool-exhaustion">Pool Exhaustion</h2>
<p>Pool exhaustion 的核心責任是看過載時的錯誤與等待。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">for</span> i in <span class="k">$(</span>seq <span class="m">1</span> 50<span class="k">)</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$POOL_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;BEGIN; SELECT pg_sleep(5); COMMIT;&#34;</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">done</span></span></span></code></pre></div><p>觀察：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT count(*) FROM pg_stat_activity WHERE datname = current_database();&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">psql <span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:64329/pgbouncer?sslmode=disable&#34;</span> -c <span class="s2">&#34;SHOW POOLS;&#34;</span></span></span></code></pre></div><p>Pool exhaustion 的 evidence 包含 waiting clients、timeout、application latency 與 error message。這些要接到 production alert。</p>
<h2 id="failure-note">Failure Note</h2>
<p>Failure note 的核心責任是把 lab 結果轉成 runbook。記錄三件事：</p>
<ol>
<li>Direct connection baseline backend 數。</li>
<li>PgBouncer transaction pooling 下 server connection 數。</li>
<li>Pool exhaustion 時的 latency / error / queue。</li>
</ol>
<p>若 application 使用 session state、prepared statement、temp table 或 advisory lock，還要補 transaction pooling compatibility matrix。</p>
<h2 id="下一步路由">下一步路由</h2>
<p>完成本篇後，回到 <a href="../../connection-pooler-comparison/">Connection Pooler Comparison</a> 做選型；要看 PgBouncer production 設定讀 <a href="../../pgbouncer-config/">PgBouncer Config</a>。</p>
]]></content:encoded></item><item><title>PostgreSQL HA Failover Drill</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/ha-failover-drill/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/ha-failover-drill/</guid><description>&lt;p>PostgreSQL HA failover drill 的核心責任是讓讀者觀察 primary promotion 對 application、pooler 與 incident decision 的影響。這篇承接 &lt;a href="../../patroni-ha/">Patroni HA&lt;/a> 與 &lt;a href="../../cross-region-dr/">Cross-region DR&lt;/a>。&lt;/p>
&lt;p>本文的驗收標準是：你能記錄 failover timeline、replication lag snapshot、client error sample、data validation query 與 incident decision log entry。實際觸發方式依 Patroni、managed PostgreSQL 或雲平台而異；lab 重點是 evidence。&lt;/p>
&lt;h2 id="pre-failover-baseline">Pre-Failover Baseline&lt;/h2>
&lt;p>Pre-failover baseline 的核心責任是確認 primary / standby 狀態與 client route。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_is_in_recovery&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_current_wal_lsn&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">application_name&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">state&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">sync_state&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">replay_lag&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_stat_replication&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>在 standby 查：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_is_in_recovery&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_last_wal_receive_lsn&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_last_wal_replay_lsn&lt;/span>&lt;span class="p">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Baseline 要保存 primary host、standby host、replication lag、application connection string、pooler route 與 current timeline。&lt;/p>
&lt;h2 id="client-workload">Client Workload&lt;/h2>
&lt;p>Client workload 的核心責任是讓 failover 對 application 的影響可見。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">while&lt;/span> true&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> date -u
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;INSERT INTO restore_markers(marker) VALUES (&amp;#39;failover-drill&amp;#39;) RETURNING id, created_at;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> sleep &lt;span class="m">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這個 loop 會在 failover 期間產生成功、timeout、connection reset 或 read-only error。正式演練要用 synthetic workload，避免影響真實使用者。&lt;/p>
&lt;h2 id="trigger-failover">Trigger Failover&lt;/h2>
&lt;p>Trigger failover 的核心責任是以受控方式促成 promotion。Patroni lab 可以用 &lt;code>patronictl failover&lt;/code>；managed service 則用 provider failover / reboot with failover 功能。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">failover_start_time:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">trigger_method:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">old_primary:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">candidate:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">operator:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">reason:&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Failover 觸發前要先確認這是演練，並且 workload、backup、rollback 與 stakeholder 都已對齊。&lt;/p>
&lt;h2 id="observe-promotion">Observe Promotion&lt;/h2>
&lt;p>Observe promotion 的核心責任是記錄資料庫與 client 的時間線。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>時間點&lt;/th>
 &lt;th>Evidence&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Trigger issued&lt;/td>
 &lt;td>command / provider event&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Old primary down&lt;/td>
 &lt;td>connection error / health check&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>New primary promoted&lt;/td>
 &lt;td>&lt;code>pg_is_in_recovery() = false&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Client reconnect&lt;/td>
 &lt;td>first successful write&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Pooler stable&lt;/td>
 &lt;td>pool queue / server connection&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Validation complete&lt;/td>
 &lt;td>row count / marker sequence&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Promotion timeline 要保留秒級時間戳。這是評估 RTO、client retry 與 pooler behavior 的基礎。&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL HA failover drill 的核心責任是讓讀者觀察 primary promotion 對 application、pooler 與 incident decision 的影響。這篇承接 <a href="../../patroni-ha/">Patroni HA</a> 與 <a href="../../cross-region-dr/">Cross-region DR</a>。</p>
<p>本文的驗收標準是：你能記錄 failover timeline、replication lag snapshot、client error sample、data validation query 與 incident decision log entry。實際觸發方式依 Patroni、managed PostgreSQL 或雲平台而異；lab 重點是 evidence。</p>
<h2 id="pre-failover-baseline">Pre-Failover Baseline</h2>
<p>Pre-failover baseline 的核心責任是確認 primary / standby 狀態與 client route。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_is_in_recovery</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_current_wal_lsn</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">application_name</span><span class="p">,</span><span class="w"> </span><span class="k">state</span><span class="p">,</span><span class="w"> </span><span class="n">sync_state</span><span class="p">,</span><span class="w"> </span><span class="n">replay_lag</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stat_replication</span><span class="p">;</span></span></span></code></pre></div><p>在 standby 查：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_is_in_recovery</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_last_wal_receive_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_last_wal_replay_lsn</span><span class="p">();</span></span></span></code></pre></div><p>Baseline 要保存 primary host、standby host、replication lag、application connection string、pooler route 與 current timeline。</p>
<h2 id="client-workload">Client Workload</h2>
<p>Client workload 的核心責任是讓 failover 對 application 的影響可見。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">while</span> true<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  date -u
</span></span><span class="line"><span class="ln">3</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;INSERT INTO restore_markers(marker) VALUES (&#39;failover-drill&#39;) RETURNING id, created_at;&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  sleep <span class="m">1</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">done</span></span></span></code></pre></div><p>這個 loop 會在 failover 期間產生成功、timeout、connection reset 或 read-only error。正式演練要用 synthetic workload，避免影響真實使用者。</p>
<h2 id="trigger-failover">Trigger Failover</h2>
<p>Trigger failover 的核心責任是以受控方式促成 promotion。Patroni lab 可以用 <code>patronictl failover</code>；managed service 則用 provider failover / reboot with failover 功能。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">failover_start_time:
</span></span><span class="line"><span class="ln">2</span><span class="cl">trigger_method:
</span></span><span class="line"><span class="ln">3</span><span class="cl">old_primary:
</span></span><span class="line"><span class="ln">4</span><span class="cl">candidate:
</span></span><span class="line"><span class="ln">5</span><span class="cl">operator:
</span></span><span class="line"><span class="ln">6</span><span class="cl">reason:</span></span></code></pre></div><p>Failover 觸發前要先確認這是演練，並且 workload、backup、rollback 與 stakeholder 都已對齊。</p>
<h2 id="observe-promotion">Observe Promotion</h2>
<p>Observe promotion 的核心責任是記錄資料庫與 client 的時間線。</p>
<table>
  <thead>
      <tr>
          <th>時間點</th>
          <th>Evidence</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Trigger issued</td>
          <td>command / provider event</td>
      </tr>
      <tr>
          <td>Old primary down</td>
          <td>connection error / health check</td>
      </tr>
      <tr>
          <td>New primary promoted</td>
          <td><code>pg_is_in_recovery() = false</code></td>
      </tr>
      <tr>
          <td>Client reconnect</td>
          <td>first successful write</td>
      </tr>
      <tr>
          <td>Pooler stable</td>
          <td>pool queue / server connection</td>
      </tr>
      <tr>
          <td>Validation complete</td>
          <td>row count / marker sequence</td>
      </tr>
  </tbody>
</table>
<p>Promotion timeline 要保留秒級時間戳。這是評估 RTO、client retry 與 pooler behavior 的基礎。</p>
<h2 id="data-validation">Data Validation</h2>
<p>Data validation 的核心責任是確認 failover 後資料一致性。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">restore_markers</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">marker</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;failover-drill&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">max</span><span class="p">(</span><span class="n">created_at</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">restore_markers</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">status</span><span class="p">,</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status</span><span class="p">;</span></span></span></code></pre></div><p>若 workload 有 idempotency key，還要檢查 duplicate。若外部 side effect 參與交易，例如 payment 或 queue，必須有 reconciliation query。</p>
<h2 id="pooler-and-client-behavior">Pooler and Client Behavior</h2>
<p>Pooler and client behavior 的核心責任是確認 failover 後連線能重新指向新 primary。</p>
<p>檢查項目：</p>
<ol>
<li>Application retry 是否有 backoff / jitter。</li>
<li>PgBouncer / proxy 是否清掉舊 server connection。</li>
<li>DNS / endpoint TTL 是否符合 RTO。</li>
<li>Read-only error 是否被正確分類。</li>
<li>Migration / background job 是否暫停。</li>
</ol>
<p>Failover 的完成標準包含 database promote、client reconnect 與 pooler stable。若 client 長時間連到舊 primary 或 pooler 卡住，服務仍處於 unavailable 狀態。</p>
<h2 id="incident-decision-log">Incident Decision Log</h2>
<p>Incident decision log 的核心責任是把演練變成可審查紀錄。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Incident / drill id:
</span></span><span class="line"><span class="ln">2</span><span class="cl">Decision: promote standby
</span></span><span class="line"><span class="ln">3</span><span class="cl">Reason:
</span></span><span class="line"><span class="ln">4</span><span class="cl">Accepted data loss:
</span></span><span class="line"><span class="ln">5</span><span class="cl">RTO observed:
</span></span><span class="line"><span class="ln">6</span><span class="cl">Client impact:
</span></span><span class="line"><span class="ln">7</span><span class="cl">Validation result:
</span></span><span class="line"><span class="ln">8</span><span class="cl">Follow-up:</span></span></code></pre></div><p>每次 drill 都要產生 follow-up。常見 follow-up 是調整 retry、降低 DNS TTL、補 pooler command、增加 validation query 或改善 monitoring。</p>
<h2 id="下一步路由">下一步路由</h2>
<p>完成本篇後，HA 架構讀 <a href="../../patroni-ha/">Patroni HA</a>；跨區災難復原讀 <a href="../../cross-region-dr/">Cross-region DR</a>；connection retry 與 pooler 行為讀 <a href="../connection-pool-lab/">Connection Pool Lab</a>。</p>
]]></content:encoded></item><item><title>PostgreSQL Hands-on 操作路線</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/</guid><description>&lt;p>PostgreSQL hands-on 操作路線的核心責任是把 overview 與 deep article 的判讀轉成可演練的操作流程。這一層對齊 LLM &lt;code>hands-on/&lt;/code> 的功能：讀者不只知道 PostgreSQL 的機制，也能在 local lab 或 staging 產出可驗證 artifact。&lt;/p>
&lt;h2 id="章節列表">章節列表&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>章節&lt;/th>
 &lt;th>主題&lt;/th>
 &lt;th>產出 artifact&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="local-lab-quickstart/">Local lab quickstart&lt;/a>&lt;/td>
 &lt;td>Docker Compose 啟動 PostgreSQL、建立 schema、跑 sample workload&lt;/td>
 &lt;td>local DSN、schema migration log、basic metric snapshot&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="connection-pool-lab/">Connection pool lab&lt;/a>&lt;/td>
 &lt;td>application pool → pgBouncer → PostgreSQL 的連線壓力演練&lt;/td>
 &lt;td>pool config、connection count evidence、failure note&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="pitr-restore-drill/">PITR restore drill&lt;/a>&lt;/td>
 &lt;td>base backup + WAL archive + restore target time 的恢復演練&lt;/td>
 &lt;td>restore record、RPO / RTO evidence、validation query&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="schema-migration-evidence-lab/">Schema migration evidence lab&lt;/a>&lt;/td>
 &lt;td>expand / contract migration、validation query、rollback condition&lt;/td>
 &lt;td>migration plan、row count、rollback note&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="ha-failover-drill/">HA failover drill&lt;/a>&lt;/td>
 &lt;td>Patroni / managed failover 的 application impact 演練&lt;/td>
 &lt;td>failover timeline、client error sample、decision log&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="設計原則">設計原則&lt;/h2>
&lt;p>PostgreSQL hands-on 章節只收錄能產出 evidence 的操作。純安裝指令留給官方文件；本路線要教讀者如何知道設定生效、失敗時看到什麼、以及 evidence 要交給 04 / 06 / 08 的哪個 artifact。&lt;/p>
&lt;h2 id="引用路徑">引用路徑&lt;/h2>
&lt;ul>
&lt;li>上游：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL overview&lt;/a>&lt;/li>
&lt;li>Deep article：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &amp;#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">pgBouncer Config&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &amp;#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &amp;#43; WAL archive 構成 PITR 的雙軌資料、archive_command &amp;#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &amp;#43; monitoring 整合">PITR + WAL Archiving&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &amp;#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA&lt;/a>&lt;/li>
&lt;li>跨模組：&lt;a href="https://tarrragon.github.io/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">Observability Evidence Package&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/06-reliability/migration-safety/" data-link-title="6.11 Migration Safety 與 DB Rollout" data-link-desc="把 schema migration 從一次性事件變成可逆、可漸進的 rollout 流程">Migration Safety&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/08-incident-response/incident-decision-log/" data-link-title="8.19 Incident Decision Log" data-link-desc="把事中假設、決策、證據、回退條件與責任人留下可復盤紀錄">Incident Decision Log&lt;/a>&lt;/li>
&lt;/ul></description><content:encoded><![CDATA[<p>PostgreSQL hands-on 操作路線的核心責任是把 overview 與 deep article 的判讀轉成可演練的操作流程。這一層對齊 LLM <code>hands-on/</code> 的功能：讀者不只知道 PostgreSQL 的機制，也能在 local lab 或 staging 產出可驗證 artifact。</p>
<h2 id="章節列表">章節列表</h2>
<table>
  <thead>
      <tr>
          <th>章節</th>
          <th>主題</th>
          <th>產出 artifact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="local-lab-quickstart/">Local lab quickstart</a></td>
          <td>Docker Compose 啟動 PostgreSQL、建立 schema、跑 sample workload</td>
          <td>local DSN、schema migration log、basic metric snapshot</td>
      </tr>
      <tr>
          <td><a href="connection-pool-lab/">Connection pool lab</a></td>
          <td>application pool → pgBouncer → PostgreSQL 的連線壓力演練</td>
          <td>pool config、connection count evidence、failure note</td>
      </tr>
      <tr>
          <td><a href="pitr-restore-drill/">PITR restore drill</a></td>
          <td>base backup + WAL archive + restore target time 的恢復演練</td>
          <td>restore record、RPO / RTO evidence、validation query</td>
      </tr>
      <tr>
          <td><a href="schema-migration-evidence-lab/">Schema migration evidence lab</a></td>
          <td>expand / contract migration、validation query、rollback condition</td>
          <td>migration plan、row count、rollback note</td>
      </tr>
      <tr>
          <td><a href="ha-failover-drill/">HA failover drill</a></td>
          <td>Patroni / managed failover 的 application impact 演練</td>
          <td>failover timeline、client error sample、decision log</td>
      </tr>
  </tbody>
</table>
<h2 id="設計原則">設計原則</h2>
<p>PostgreSQL hands-on 章節只收錄能產出 evidence 的操作。純安裝指令留給官方文件；本路線要教讀者如何知道設定生效、失敗時看到什麼、以及 evidence 要交給 04 / 06 / 08 的哪個 artifact。</p>
<h2 id="引用路徑">引用路徑</h2>
<ul>
<li>上游：<a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL overview</a></li>
<li>Deep article：<a href="/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">pgBouncer Config</a>、<a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PITR + WAL Archiving</a>、<a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a></li>
<li>跨模組：<a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">Observability Evidence Package</a>、<a href="/blog/backend/06-reliability/migration-safety/" data-link-title="6.11 Migration Safety 與 DB Rollout" data-link-desc="把 schema migration 從一次性事件變成可逆、可漸進的 rollout 流程">Migration Safety</a>、<a href="/blog/backend/08-incident-response/incident-decision-log/" data-link-title="8.19 Incident Decision Log" data-link-desc="把事中假設、決策、證據、回退條件與責任人留下可復盤紀錄">Incident Decision Log</a></li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL Local Lab Quickstart</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/local-lab-quickstart/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/local-lab-quickstart/</guid><description>&lt;p>PostgreSQL local lab quickstart 的核心責任是建立後續 connection、migration、backup 與 failover 演練共用的本地環境。這個 lab 提供一個可重建的 PostgreSQL instance、app-facing user、baseline schema、seed data 與 basic evidence。&lt;/p>
&lt;p>本文的驗收標準是：你能啟動本地 PostgreSQL，套用 schema，跑 sample workload，取得 &lt;code>pg_stat_activity&lt;/code> / &lt;code>pg_stat_database&lt;/code> snapshot，最後 teardown 並重建。&lt;/p>
&lt;h2 id="docker-compose">Docker Compose&lt;/h2>
&lt;p>Docker Compose 的核心責任是讓 lab 環境可重建。建立 &lt;code>docker-compose.yml&lt;/code>：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="nt">services&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">postgres&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">postgres:16&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">environment&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POSTGRES_USER&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POSTGRES_PASSWORD&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin_pw&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POSTGRES_DB&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">appdb&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;54329:5432&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">command&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;postgres&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;-c&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;log_min_duration_statement=100&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;-c&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;shared_preload_libraries=pg_stat_statements&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>啟動：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">docker compose up -d
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">DATABASE_URL&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="baseline-schema">Baseline Schema&lt;/h2>
&lt;p>Baseline schema 的核心責任是建立可測 transaction、index、lock 與 migration 的資料模型。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE accounts (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s"> id bigserial PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s"> tenant_id uuid NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s"> owner_name text NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s"> status text NOT NULL CHECK (status IN (&amp;#39;active&amp;#39;, &amp;#39;closed&amp;#39;)),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at timestamptz NOT NULL DEFAULT now()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE ledger_entries (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s"> id bigserial PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s"> account_id bigint NOT NULL REFERENCES accounts(id),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s"> amount_cents bigint NOT NULL CHECK (amount_cents &amp;lt;&amp;gt; 0),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s"> idempotency_key text NOT NULL UNIQUE,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at timestamptz NOT NULL DEFAULT now()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s">);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE INDEX idx_ledger_entries_account_created
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s">ON ledger_entries(account_id, created_at DESC);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這組 schema 後續可用於 migration、lock、PITR 與 pool lab。&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL local lab quickstart 的核心責任是建立後續 connection、migration、backup 與 failover 演練共用的本地環境。這個 lab 提供一個可重建的 PostgreSQL instance、app-facing user、baseline schema、seed data 與 basic evidence。</p>
<p>本文的驗收標準是：你能啟動本地 PostgreSQL，套用 schema，跑 sample workload，取得 <code>pg_stat_activity</code> / <code>pg_stat_database</code> snapshot，最後 teardown 並重建。</p>
<h2 id="docker-compose">Docker Compose</h2>
<p>Docker Compose 的核心責任是讓 lab 環境可重建。建立 <code>docker-compose.yml</code>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">  </span><span class="nt">postgres</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">postgres:16</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin_pw</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l">appdb</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;54329:5432&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;postgres&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;-c&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;log_min_duration_statement=100&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;-c&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;shared_preload_libraries=pg_stat_statements&#34;</span></span></span></code></pre></div><p>啟動：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose up -d
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">export</span> <span class="nv">DATABASE_URL</span><span class="o">=</span><span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&#34;</span></span></span></code></pre></div><h2 id="baseline-schema">Baseline Schema</h2>
<p>Baseline schema 的核心責任是建立可測 transaction、index、lock 與 migration 的資料模型。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">CREATE TABLE accounts (
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">  id bigserial PRIMARY KEY,
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">  tenant_id uuid NOT NULL,
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  owner_name text NOT NULL,
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  status text NOT NULL CHECK (status IN (&#39;active&#39;, &#39;closed&#39;)),
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  created_at timestamptz NOT NULL DEFAULT now()
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">);
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">CREATE TABLE ledger_entries (
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">  id bigserial PRIMARY KEY,
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">  account_id bigint NOT NULL REFERENCES accounts(id),
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">  amount_cents bigint NOT NULL CHECK (amount_cents &lt;&gt; 0),
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">  idempotency_key text NOT NULL UNIQUE,
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">  created_at timestamptz NOT NULL DEFAULT now()
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">);
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">CREATE INDEX idx_ledger_entries_account_created
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">ON ledger_entries(account_id, created_at DESC);
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>這組 schema 後續可用於 migration、lock、PITR 與 pool lab。</p>
<h2 id="seed-and-workload">Seed and Workload</h2>
<p>Seed and workload 的核心責任是產生可觀察的資料與查詢。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">INSERT INTO accounts(tenant_id, owner_name, status)
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">VALUES
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">  (&#39;00000000-0000-0000-0000-000000000001&#39;, &#39;Ada&#39;, &#39;active&#39;),
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">  (&#39;00000000-0000-0000-0000-000000000002&#39;, &#39;Lin&#39;, &#39;active&#39;);
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key)
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">SELECT 1, 100, &#39;seed-ada-&#39; || g
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">FROM generate_series(1, 100) AS g;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">SELECT a.owner_name, SUM(l.amount_cents) AS balance_cents
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">FROM accounts a
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">JOIN ledger_entries l ON l.account_id = a.id
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">GROUP BY a.owner_name;
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Sample workload 要保留 SQL 與輸出，作為後續 migration / restore validation 的 baseline。</p>
<h2 id="basic-evidence">Basic Evidence</h2>
<p>Basic evidence 的核心責任是把 lab 狀態保存成可比較 snapshot。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">SELECT current_database(), current_user, version();
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">SELECT datname, numbackends, xact_commit, xact_rollback
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">FROM pg_stat_database
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">WHERE datname = current_database();
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">SELECT pid, state, wait_event_type, query
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">FROM pg_stat_activity
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">WHERE datname = current_database();
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>這些查詢是 PostgreSQL lab 的最小 evidence。正式服務要再加入 slow query、lock wait、replica lag、backup status 與 pooler metrics。</p>
<h2 id="teardown">Teardown</h2>
<p>Teardown 的核心責任是讓 lab 可重跑。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose down -v</span></span></code></pre></div><p>重建後應能重新套用 schema 與 seed。若 lab 需要跨章節沿用資料，先用 <code>pg_dump</code> 保存 fixture，再 teardown。</p>
<h2 id="下一步路由">下一步路由</h2>
<p>完成本篇後，連線壓力進入 <a href="../connection-pool-lab/">Connection Pool Lab</a>；migration evidence 進入 <a href="../schema-migration-evidence-lab/">Schema Migration Evidence Lab</a>；backup / PITR 進入 <a href="../pitr-restore-drill/">PITR Restore Drill</a>。</p>
]]></content:encoded></item><item><title>PostgreSQL PITR Restore Drill</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/pitr-restore-drill/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/pitr-restore-drill/</guid><description>&lt;p>PostgreSQL PITR restore drill 的核心責任是證明 backup 可以還原到指定時間點。這篇承接 &lt;a href="../../pitr-wal-archiving/">PITR + WAL Archiving&lt;/a>，把備份從存在狀態推進到可恢復證據。&lt;/p>
&lt;p>本文的驗收標準是：你能記錄 base backup 時間、target time、restore duration、validation query 與 RPO / RTO note。實際命令會依 pgBackRest、Barman、cloud snapshot 或 managed service 而變；本文提供 vendor-neutral drill frame。&lt;/p>
&lt;h2 id="prepare-recovery-point">Prepare Recovery Point&lt;/h2>
&lt;p>Prepare recovery point 的核心責任是建立可辨識 transaction。先寫入一筆 marker，記錄時間。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE IF NOT EXISTS restore_markers (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s"> id bigserial PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s"> marker text NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at timestamptz NOT NULL DEFAULT clock_timestamp()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s">);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO restore_markers(marker) VALUES (&amp;#39;before-bad-change&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT id, marker, created_at FROM restore_markers ORDER BY id DESC LIMIT 1;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>把 &lt;code>created_at&lt;/code> 記為 target time。正式 drill 要用 UTC，並記錄 timezone、operator、backup set 與 WAL archive status。&lt;/p>
&lt;h2 id="create-bad-change">Create Bad Change&lt;/h2>
&lt;p>Create bad change 的核心責任是模擬需要 PITR 的錯誤。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO restore_markers(marker) VALUES (&amp;#39;bad-change-after-target&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">UPDATE accounts SET status = &amp;#39;closed&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT status, count(*) FROM accounts GROUP BY status;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這一步在 lab 中代表誤操作。Production 事故中，bad change 可能是誤刪、錯誤 batch、壞 migration 或 application bug。&lt;/p>
&lt;h2 id="restore-workflow">Restore Workflow&lt;/h2>
&lt;p>Restore workflow 的核心責任是把 backup tool 的操作轉成固定 evidence。不同工具命令不同，但流程一致：&lt;/p>
&lt;ol>
&lt;li>選定 base backup。&lt;/li>
&lt;li>設定 recovery target time。&lt;/li>
&lt;li>套用 WAL 到 target time。&lt;/li>
&lt;li>Promote restored instance。&lt;/li>
&lt;li>跑 validation query。&lt;/li>
&lt;li>啟動 application smoke test。&lt;/li>
&lt;/ol>
&lt;p>Example pseudo-runbook：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">restore_target_time = 2026-05-21T10:15:30Z
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">base_backup = latest backup before target
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">wal_archive = available through target
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">restore_path = isolated environment&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Restore 必須在隔離環境先完成。直接覆蓋 production 會讓 evidence 與 rollback 空間消失。&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL PITR restore drill 的核心責任是證明 backup 可以還原到指定時間點。這篇承接 <a href="../../pitr-wal-archiving/">PITR + WAL Archiving</a>，把備份從存在狀態推進到可恢復證據。</p>
<p>本文的驗收標準是：你能記錄 base backup 時間、target time、restore duration、validation query 與 RPO / RTO note。實際命令會依 pgBackRest、Barman、cloud snapshot 或 managed service 而變；本文提供 vendor-neutral drill frame。</p>
<h2 id="prepare-recovery-point">Prepare Recovery Point</h2>
<p>Prepare recovery point 的核心責任是建立可辨識 transaction。先寫入一筆 marker，記錄時間。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">CREATE TABLE IF NOT EXISTS restore_markers (
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">  id bigserial PRIMARY KEY,
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">  marker text NOT NULL,
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">);
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">INSERT INTO restore_markers(marker) VALUES (&#39;before-bad-change&#39;);
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">SELECT id, marker, created_at FROM restore_markers ORDER BY id DESC LIMIT 1;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>把 <code>created_at</code> 記為 target time。正式 drill 要用 UTC，並記錄 timezone、operator、backup set 與 WAL archive status。</p>
<h2 id="create-bad-change">Create Bad Change</h2>
<p>Create bad change 的核心責任是模擬需要 PITR 的錯誤。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">INSERT INTO restore_markers(marker) VALUES (&#39;bad-change-after-target&#39;);
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">UPDATE accounts SET status = &#39;closed&#39;;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SELECT status, count(*) FROM accounts GROUP BY status;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>這一步在 lab 中代表誤操作。Production 事故中，bad change 可能是誤刪、錯誤 batch、壞 migration 或 application bug。</p>
<h2 id="restore-workflow">Restore Workflow</h2>
<p>Restore workflow 的核心責任是把 backup tool 的操作轉成固定 evidence。不同工具命令不同，但流程一致：</p>
<ol>
<li>選定 base backup。</li>
<li>設定 recovery target time。</li>
<li>套用 WAL 到 target time。</li>
<li>Promote restored instance。</li>
<li>跑 validation query。</li>
<li>啟動 application smoke test。</li>
</ol>
<p>Example pseudo-runbook：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">restore_target_time = 2026-05-21T10:15:30Z
</span></span><span class="line"><span class="ln">2</span><span class="cl">base_backup = latest backup before target
</span></span><span class="line"><span class="ln">3</span><span class="cl">wal_archive = available through target
</span></span><span class="line"><span class="ln">4</span><span class="cl">restore_path = isolated environment</span></span></code></pre></div><p>Restore 必須在隔離環境先完成。直接覆蓋 production 會讓 evidence 與 rollback 空間消失。</p>
<h2 id="validation-query">Validation Query</h2>
<p>Validation query 的核心責任是確認 restore 到正確時間點。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$RESTORED_DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SELECT marker, created_at
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">FROM restore_markers
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">ORDER BY id;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SELECT status, count(*)
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">FROM accounts
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">GROUP BY status;
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>預期結果是存在 <code>before-bad-change</code>，且 <code>bad-change-after-target</code> 尚未出現。<code>accounts</code> 狀態應維持 target time 前的分布。</p>
<h2 id="rpo--rto-evidence">RPO / RTO Evidence</h2>
<p>RPO / RTO evidence 的核心責任是把 drill 結果轉成服務語言。</p>
<table>
  <thead>
      <tr>
          <th>Evidence</th>
          <th>記錄內容</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Backup timestamp</td>
          <td>使用哪份 base backup</td>
      </tr>
      <tr>
          <td>Target time</td>
          <td>要恢復到哪一秒</td>
      </tr>
      <tr>
          <td>WAL availability</td>
          <td>target time 前後 WAL 是否完整</td>
      </tr>
      <tr>
          <td>Restore duration</td>
          <td>從開始 restore 到 validation 成功</td>
      </tr>
      <tr>
          <td>Data gap</td>
          <td>target time 後需補償的 transaction</td>
      </tr>
      <tr>
          <td>Smoke test</td>
          <td>application 核心 workflow 是否可用</td>
      </tr>
  </tbody>
</table>
<p>PITR 的成功標準是資料與 application 都可用。只讓 PostgreSQL 啟動成功，還不足以交付服務。</p>
<h2 id="drill-retrospective">Drill Retrospective</h2>
<p>Drill retrospective 的核心責任是把演練缺口轉成下一步。</p>
<p>常見缺口：</p>
<ol>
<li>找不到正確 base backup。</li>
<li>WAL archive 缺段。</li>
<li>target time timezone 混亂。</li>
<li>Restore 太慢，超過 RTO。</li>
<li>Application secret / config 指不到 restored DB。</li>
<li>Validation query 缺少 business invariant。</li>
</ol>
<p>完成本篇後，跨區恢復讀 <a href="../../cross-region-dr/">Cross-region DR</a>；備份策略讀 <a href="../../pitr-wal-archiving/">PITR + WAL Archiving</a>。</p>
]]></content:encoded></item><item><title>PostgreSQL Schema Migration Evidence Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/schema-migration-evidence-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/schema-migration-evidence-lab/</guid><description>&lt;p>PostgreSQL schema migration evidence lab 的核心責任是把 schema change 轉成 release gate 可使用的 evidence。這篇承接 &lt;a href="../../online-schema-change/">Online Schema Change&lt;/a> 與 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">Database Migration Playbook&lt;/a>。&lt;/p>
&lt;p>本文的驗收標準是：你能設計 expand migration、量測 lock、跑 backfill validation、建立 contract migration 的 fail-forward / rollback 判準。&lt;/p>
&lt;h2 id="expand-migration">Expand Migration&lt;/h2>
&lt;p>Expand migration 的核心責任是先加入向後相容 schema。以下範例新增 &lt;code>accounts.email&lt;/code>，先允許 null。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">\timing on
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">BEGIN;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts ADD COLUMN email text;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">COMMIT;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>新增 nullable column 通常是低風險操作，但仍要記錄 timing 與 lock。正式服務要在低流量窗口或 staging 上先測。&lt;/p>
&lt;h2 id="lock-evidence">Lock Evidence&lt;/h2>
&lt;p>Lock evidence 的核心責任是讓 migration 的阻塞風險可見。開另一個 terminal，在 migration 前後查 lock。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT locktype, relation::regclass, mode, granted, pid
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">FROM pg_locks
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">WHERE relation IN (&amp;#39;accounts&amp;#39;::regclass, &amp;#39;ledger_entries&amp;#39;::regclass)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">ORDER BY granted, mode;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Release gate 要保存 lock mode、duration、blocked session 與 application impact。高風險 DDL 要先改成 expand / backfill / contract。&lt;/p>
&lt;h2 id="backfill-and-validation">Backfill and Validation&lt;/h2>
&lt;p>Backfill and validation 的核心責任是把資料補齊並證明結果符合 domain。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">UPDATE accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">SET email = lower(owner_name) || &amp;#39;@example.test&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">WHERE email IS NULL;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT count(*) AS missing_email
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="s">FROM accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="s">WHERE email IS NULL;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">9&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>大型表要分 batch backfill，避免 WAL、replica lag、autovacuum 與 lock 壓力。每個 batch 要記錄 row count、duration、error 與 lag。&lt;/p>
&lt;h2 id="add-constraint-safely">Add Constraint Safely&lt;/h2>
&lt;p>Add constraint safely 的核心責任是把資料驗證和 constraint 生效拆開。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">ADD CONSTRAINT accounts_email_present
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">CHECK (email IS NOT NULL) NOT VALID;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="s">VALIDATE CONSTRAINT accounts_email_present;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>NOT VALID&lt;/code> 讓 constraint 先約束新資料，再用 validation 掃既有資料。這是 PostgreSQL online migration 常用技巧。&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL schema migration evidence lab 的核心責任是把 schema change 轉成 release gate 可使用的 evidence。這篇承接 <a href="../../online-schema-change/">Online Schema Change</a> 與 <a href="/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">Database Migration Playbook</a>。</p>
<p>本文的驗收標準是：你能設計 expand migration、量測 lock、跑 backfill validation、建立 contract migration 的 fail-forward / rollback 判準。</p>
<h2 id="expand-migration">Expand Migration</h2>
<p>Expand migration 的核心責任是先加入向後相容 schema。以下範例新增 <code>accounts.email</code>，先允許 null。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">\timing on
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">BEGIN;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">ALTER TABLE accounts ADD COLUMN email text;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">COMMIT;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>新增 nullable column 通常是低風險操作，但仍要記錄 timing 與 lock。正式服務要在低流量窗口或 staging 上先測。</p>
<h2 id="lock-evidence">Lock Evidence</h2>
<p>Lock evidence 的核心責任是讓 migration 的阻塞風險可見。開另一個 terminal，在 migration 前後查 lock。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SELECT locktype, relation::regclass, mode, granted, pid
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">FROM pg_locks
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">WHERE relation IN (&#39;accounts&#39;::regclass, &#39;ledger_entries&#39;::regclass)
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">ORDER BY granted, mode;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Release gate 要保存 lock mode、duration、blocked session 與 application impact。高風險 DDL 要先改成 expand / backfill / contract。</p>
<h2 id="backfill-and-validation">Backfill and Validation</h2>
<p>Backfill and validation 的核心責任是把資料補齊並證明結果符合 domain。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">UPDATE accounts
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SET email = lower(owner_name) || &#39;@example.test&#39;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">WHERE email IS NULL;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SELECT count(*) AS missing_email
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">FROM accounts
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">WHERE email IS NULL;
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>大型表要分 batch backfill，避免 WAL、replica lag、autovacuum 與 lock 壓力。每個 batch 要記錄 row count、duration、error 與 lag。</p>
<h2 id="add-constraint-safely">Add Constraint Safely</h2>
<p>Add constraint safely 的核心責任是把資料驗證和 constraint 生效拆開。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">ALTER TABLE accounts
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">ADD CONSTRAINT accounts_email_present
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">CHECK (email IS NOT NULL) NOT VALID;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">ALTER TABLE accounts
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">VALIDATE CONSTRAINT accounts_email_present;
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p><code>NOT VALID</code> 讓 constraint 先約束新資料，再用 validation 掃既有資料。這是 PostgreSQL online migration 常用技巧。</p>
<h2 id="query-plan-evidence">Query Plan Evidence</h2>
<p>Query plan evidence 的核心責任是確認 migration 後 query 仍走正確路徑。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">EXPLAIN (ANALYZE, BUFFERS)
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SELECT *
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">FROM accounts
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">WHERE email = &#39;ada@example.test&#39;;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>若 email 查詢成為正式 path，要新增 index，並用 <code>CREATE INDEX CONCURRENTLY</code> 評估 lock 與時間。</p>
<h2 id="contract-migration">Contract Migration</h2>
<p>Contract migration 的核心責任是在 application 都改用新欄位後，收斂舊欄位或舊 constraint。Contract migration 要比 expand 更謹慎，因為 rollback 空間更小。</p>
<p>Contract release gate：</p>
<ol>
<li>所有 app version 已停止讀舊欄位 / 舊行為。</li>
<li>Backfill validation 為零缺口。</li>
<li>Query plan 與 index evidence 已保存。</li>
<li>Rollback path 是 fail-forward 或 restore，兩者擇一寫清楚。</li>
<li>PITR / backup window 符合風險。</li>
</ol>
<h2 id="release-gate-note">Release Gate Note</h2>
<p>Release gate note 的核心責任是形成可交付 artifact。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Migration: add accounts.email
</span></span><span class="line"><span class="ln">2</span><span class="cl">Expand DDL duration:
</span></span><span class="line"><span class="ln">3</span><span class="cl">Backfill rows:
</span></span><span class="line"><span class="ln">4</span><span class="cl">Validation query:
</span></span><span class="line"><span class="ln">5</span><span class="cl">Lock evidence:
</span></span><span class="line"><span class="ln">6</span><span class="cl">Query plan:
</span></span><span class="line"><span class="ln">7</span><span class="cl">Rollback / fail-forward:
</span></span><span class="line"><span class="ln">8</span><span class="cl">Owner:</span></span></code></pre></div><p>完成本篇後，複雜 migration 回到 <a href="../../online-schema-change/">Online Schema Change</a>；需要跨 DB 遷移則讀 <a href="/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">Database Migration Playbook</a>。</p>
]]></content:encoded></item><item><title>SQLite Backup Restore Drill</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/backup-restore-drill/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/backup-restore-drill/</guid><description>&lt;p>SQLite backup restore drill 的核心責任是證明單檔 database 可以被一致備份並還原。這篇承接 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/file-lifecycle-backup-boundary/" data-link-title="SQLite file lifecycle 與 backup boundary" data-link-desc="把 SQLite 單檔案正式狀態拆成 WAL、backup API、restore drill、corruption recovery 與操作責任邊界">File lifecycle / backup boundary&lt;/a>，把備份從概念轉成 artifact、validation query 與 RPO / RTO note。&lt;/p>
&lt;p>本文的驗收標準是：你能從 live &lt;code>app.db&lt;/code> 建立 backup，將它還原到隔離路徑，通過 &lt;code>integrity_check&lt;/code> 與核心查詢，並記錄 restore duration。&lt;/p>
&lt;h2 id="prepare-source">Prepare Source&lt;/h2>
&lt;p>Prepare source 的核心責任是建立一個有 WAL 與資料變化的 live database。若你已跑過 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/local-file-quickstart/" data-link-title="SQLite Local File Quickstart" data-link-desc="SQLite local .db file、schema、seed data、PRAGMA baseline、query sample 與 cleanup 的操作說明">local file quickstart&lt;/a>，可以直接沿用 &lt;code>/tmp/sqlite-lab/app.db&lt;/code>。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p /tmp/sqlite-lab/backup /tmp/sqlite-lab/restore
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/sqlite-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;PRAGMA journal_mode = WAL;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (2, 100, &amp;#39;backup-drill-1&amp;#39;, &amp;#39;2026-05-21T01:00:00Z&amp;#39;);&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這一步讓 source database 有新的資料。後續會用 backup snapshot 和 source 後續寫入做對照。&lt;/p>
&lt;h2 id="create-backup">Create Backup&lt;/h2>
&lt;p>Create backup 的核心責任是用 SQLite-aware 方法建立一致 snapshot。SQLite CLI &lt;code>.backup&lt;/code> 會透過 SQLite backup API 產出目標檔案。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;.backup &amp;#39;backup/app-backup.db&amp;#39;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">sqlite3 backup/app-backup.db &lt;span class="s2">&amp;#34;PRAGMA integrity_check;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>預期 &lt;code>integrity_check&lt;/code> 輸出 &lt;code>ok&lt;/code>。這是最小 backup evidence。&lt;/p>
&lt;p>&lt;code>VACUUM INTO&lt;/code> 也可以產出 compact copy，適合想順便整理檔案大小的情境。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;VACUUM INTO &amp;#39;backup/app-vacuum-copy.db&amp;#39;;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">sqlite3 backup/app-vacuum-copy.db &lt;span class="s2">&amp;#34;PRAGMA integrity_check;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>.backup&lt;/code> 與 &lt;code>VACUUM INTO&lt;/code> 都要在 runbook 中標明使用條件、耗時、目標路徑與失敗處理。正式環境還要記錄檔案大小、checksum 與 storage retention。&lt;/p>
&lt;h2 id="mutate-source-after-backup">Mutate Source After Backup&lt;/h2>
&lt;p>Mutate source 的核心責任是確認 backup 是時間點 snapshot。備份後對 source 寫入新資料，再用 restore 驗證 backup 保持原時間點。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (1, 777, &amp;#39;after-backup-write&amp;#39;, &amp;#39;2026-05-21T01:05:00Z&amp;#39;);&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;SELECT COUNT(*) FROM ledger_entries;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">sqlite3 backup/app-backup.db &lt;span class="s2">&amp;#34;SELECT COUNT(*) FROM ledger_entries;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Source count 應比 backup count 多一筆。這個差異讓 RPO 討論具體化：backup 只保護到它建立的時間點。&lt;/p>
&lt;h2 id="restore-isolated-copy">Restore Isolated Copy&lt;/h2>
&lt;p>Restore isolated copy 的核心責任是避免把演練和 source 混在一起。把 backup 複製到 restore path，所有 validation 都對 restore file 執行。&lt;/p></description><content:encoded><![CDATA[<p>SQLite backup restore drill 的核心責任是證明單檔 database 可以被一致備份並還原。這篇承接 <a href="/blog/backend/01-database/vendors/sqlite/file-lifecycle-backup-boundary/" data-link-title="SQLite file lifecycle 與 backup boundary" data-link-desc="把 SQLite 單檔案正式狀態拆成 WAL、backup API、restore drill、corruption recovery 與操作責任邊界">File lifecycle / backup boundary</a>，把備份從概念轉成 artifact、validation query 與 RPO / RTO note。</p>
<p>本文的驗收標準是：你能從 live <code>app.db</code> 建立 backup，將它還原到隔離路徑，通過 <code>integrity_check</code> 與核心查詢，並記錄 restore duration。</p>
<h2 id="prepare-source">Prepare Source</h2>
<p>Prepare source 的核心責任是建立一個有 WAL 與資料變化的 live database。若你已跑過 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/local-file-quickstart/" data-link-title="SQLite Local File Quickstart" data-link-desc="SQLite local .db file、schema、seed data、PRAGMA baseline、query sample 與 cleanup 的操作說明">local file quickstart</a>，可以直接沿用 <code>/tmp/sqlite-lab/app.db</code>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p /tmp/sqlite-lab/backup /tmp/sqlite-lab/restore
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> /tmp/sqlite-lab
</span></span><span class="line"><span class="ln">3</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA journal_mode = WAL;&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">sqlite3 app.db <span class="s2">&#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (2, 100, &#39;backup-drill-1&#39;, &#39;2026-05-21T01:00:00Z&#39;);&#34;</span></span></span></code></pre></div><p>這一步讓 source database 有新的資料。後續會用 backup snapshot 和 source 後續寫入做對照。</p>
<h2 id="create-backup">Create Backup</h2>
<p>Create backup 的核心責任是用 SQLite-aware 方法建立一致 snapshot。SQLite CLI <code>.backup</code> 會透過 SQLite backup API 產出目標檔案。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s2">&#34;.backup &#39;backup/app-backup.db&#39;&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 backup/app-backup.db <span class="s2">&#34;PRAGMA integrity_check;&#34;</span></span></span></code></pre></div><p>預期 <code>integrity_check</code> 輸出 <code>ok</code>。這是最小 backup evidence。</p>
<p><code>VACUUM INTO</code> 也可以產出 compact copy，適合想順便整理檔案大小的情境。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s2">&#34;VACUUM INTO &#39;backup/app-vacuum-copy.db&#39;;&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 backup/app-vacuum-copy.db <span class="s2">&#34;PRAGMA integrity_check;&#34;</span></span></span></code></pre></div><p><code>.backup</code> 與 <code>VACUUM INTO</code> 都要在 runbook 中標明使用條件、耗時、目標路徑與失敗處理。正式環境還要記錄檔案大小、checksum 與 storage retention。</p>
<h2 id="mutate-source-after-backup">Mutate Source After Backup</h2>
<p>Mutate source 的核心責任是確認 backup 是時間點 snapshot。備份後對 source 寫入新資料，再用 restore 驗證 backup 保持原時間點。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s2">&#34;INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (1, 777, &#39;after-backup-write&#39;, &#39;2026-05-21T01:05:00Z&#39;);&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 app.db <span class="s2">&#34;SELECT COUNT(*) FROM ledger_entries;&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">sqlite3 backup/app-backup.db <span class="s2">&#34;SELECT COUNT(*) FROM ledger_entries;&#34;</span></span></span></code></pre></div><p>Source count 應比 backup count 多一筆。這個差異讓 RPO 討論具體化：backup 只保護到它建立的時間點。</p>
<h2 id="restore-isolated-copy">Restore Isolated Copy</h2>
<p>Restore isolated copy 的核心責任是避免把演練和 source 混在一起。把 backup 複製到 restore path，所有 validation 都對 restore file 執行。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">cp backup/app-backup.db restore/app-restored.db
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">sqlite3 restore/app-restored.db <span class="s2">&#34;PRAGMA integrity_check;&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">sqlite3 restore/app-restored.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">.headers on
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">.mode column
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">SELECT account_id, SUM(amount_cents) AS balance_cents
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">FROM ledger_entries
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">GROUP BY account_id
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">ORDER BY account_id;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>正式 restore drill 還要啟動 application 指向 <code>restore/app-restored.db</code>，跑核心 read/write smoke test。若 application 需要 migration，也要確認 restore file 的 <code>PRAGMA user_version</code> 與 app version 相容。</p>
<h2 id="rpo--rto-note">RPO / RTO Note</h2>
<p>RPO / RTO note 的核心責任是把演練結果轉成服務承諾。RPO 是可接受資料遺失窗口，RTO 是可接受恢復時間。</p>
<table>
  <thead>
      <tr>
          <th>指標</th>
          <th>本 lab 記錄方式</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>RPO</td>
          <td>backup 建立時間到事故時間的資料差距</td>
      </tr>
      <tr>
          <td>RTO</td>
          <td>從取得 backup 到 app smoke test 成功耗時</td>
      </tr>
  </tbody>
</table>
<p>可以用 shell 的 <code>time</code> 記錄 restore duration。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">time</span> sqlite3 restore/app-restored.db <span class="s2">&#34;PRAGMA integrity_check;&#34;</span></span></span></code></pre></div><p>正式服務要把 RPO / RTO 寫進 <a href="/blog/backend/01-database/vendors/sqlite/observability-runbook/" data-link-title="SQLite Observability and Runbook" data-link-desc="SQLite production runbook、backup evidence、WAL growth、busy errors、disk usage、restore drill 與 incident route">observability / runbook</a>。</p>
<h2 id="known-gap">Known Gap</h2>
<p>Known gap 的核心責任是讓 lab 結果誠實。這個 drill 驗證 SQLite-aware backup 與 restore path；它尚未覆蓋 object storage credential、remote retention、large database restore time、encrypted disk、user device support flow 與 legal retention。</p>
<p>完成本篇後，下一步可以進入 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/wal-busy-reproduction/" data-link-title="SQLite WAL Busy Reproduction" data-link-desc="SQLite long transaction、SQLITE_BUSY、busy_timeout、checkpoint growth 與 writer queue 的操作說明">WAL busy reproduction</a> 觀察 writer boundary，或進入 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/migration-fixture-lab/" data-link-title="SQLite Migration Fixture Lab" data-link-desc="SQLite user_version、table rebuild migration、fixture snapshot、rollback note 與 CI evidence 的操作說明">migration fixture lab</a> 建立 schema change evidence。</p>
]]></content:encoded></item><item><title>SQLite D1 / Turso Preview Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/d1-turso-preview-lab/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/d1-turso-preview-lab/</guid><description>&lt;p>SQLite D1 / Turso preview lab 的核心責任是把 local SQLite 轉向 edge SQLite product 前的 compatibility gap 找出來。這篇承接 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/d1-turso-libsql-comparison/" data-link-title="SQLite D1 / Turso / libSQL Comparison" data-link-desc="Cloudflare D1、Turso、libSQL 與 local SQLite 在 edge、replication、consistency、migration 與 vendor boundary 的比較">D1 / Turso / libSQL Comparison&lt;/a> 與 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/migrate-to-d1-turso/" data-link-title="SQLite to D1 / Turso Migration" data-link-desc="SQLite 轉向 Cloudflare D1、Turso / libSQL 的 edge driver、compatibility audit、data movement 與 rollback">SQLite to D1 / Turso Migration&lt;/a>，把 edge migration 變成可回報的 query matrix。&lt;/p>
&lt;p>本文的驗收標準是：你能從 local SQLite 匯出 schema / seed，匯入 D1 或 Turso preview database，跑相同 query set，記錄 unsupported SQL、latency、error mapping 與 rollback route。&lt;/p>
&lt;h2 id="preview-scope">Preview Scope&lt;/h2>
&lt;p>Preview scope 的核心責任是把 lab 限制在 staging / preview。D1 與 Turso 都是平台產品，實際命令會依 CLI version、帳號、region 與專案設定改變；本文提供操作骨架與 evidence 格式，正式命令以官方文件為準。&lt;/p>
&lt;p>官方文件路由：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>產品&lt;/th>
 &lt;th>官方文件&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Cloudflare D1&lt;/td>
 &lt;td>&lt;a href="https://developers.cloudflare.com/d1/">Cloudflare D1 docs&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>D1 limits&lt;/td>
 &lt;td>&lt;a href="https://developers.cloudflare.com/d1/platform/limits/">Cloudflare D1 limits&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Turso&lt;/td>
 &lt;td>&lt;a href="https://docs.turso.tech/">Turso docs&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Turso embedded replicas&lt;/td>
 &lt;td>&lt;a href="https://docs.turso.tech/features/embedded-replicas/introduction">Embedded replicas&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Preview lab 要先確認資料不含 production PII。若 seed data 來自正式資料，先做 masking 或 synthetic data。&lt;/p>
&lt;h2 id="export-local-sqlite">Export Local SQLite&lt;/h2>
&lt;p>Export local SQLite 的核心責任是建立 target platform 的 seed input。沿用 &lt;code>/tmp/sqlite-lab/app.db&lt;/code> 或 migration fixture。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p /tmp/sqlite-edge-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/sqlite-edge-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">cp /tmp/sqlite-lab/app.db ./app.db
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;.schema&amp;#34;&lt;/span> &amp;gt; schema.sql
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;.dump&amp;#34;&lt;/span> &amp;gt; seed.sql
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;SELECT COUNT(*) FROM accounts;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;SELECT COUNT(*) FROM ledger_entries;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>schema.sql&lt;/code> 用來審查 DDL，&lt;code>seed.sql&lt;/code> 用來匯入 preview database。正式 migration 可能要拆 schema / data / index，並處理 target platform limits。&lt;/p>
&lt;h2 id="build-query-matrix">Build Query Matrix&lt;/h2>
&lt;p>Build query matrix 的核心責任是定義 preview 要驗證什麼。query set 要代表產品行為，而非只跑一個 &lt;code>SELECT 1&lt;/code>。&lt;/p></description><content:encoded><![CDATA[<p>SQLite D1 / Turso preview lab 的核心責任是把 local SQLite 轉向 edge SQLite product 前的 compatibility gap 找出來。這篇承接 <a href="/blog/backend/01-database/vendors/sqlite/d1-turso-libsql-comparison/" data-link-title="SQLite D1 / Turso / libSQL Comparison" data-link-desc="Cloudflare D1、Turso、libSQL 與 local SQLite 在 edge、replication、consistency、migration 與 vendor boundary 的比較">D1 / Turso / libSQL Comparison</a> 與 <a href="/blog/backend/01-database/vendors/sqlite/migrate-to-d1-turso/" data-link-title="SQLite to D1 / Turso Migration" data-link-desc="SQLite 轉向 Cloudflare D1、Turso / libSQL 的 edge driver、compatibility audit、data movement 與 rollback">SQLite to D1 / Turso Migration</a>，把 edge migration 變成可回報的 query matrix。</p>
<p>本文的驗收標準是：你能從 local SQLite 匯出 schema / seed，匯入 D1 或 Turso preview database，跑相同 query set，記錄 unsupported SQL、latency、error mapping 與 rollback route。</p>
<h2 id="preview-scope">Preview Scope</h2>
<p>Preview scope 的核心責任是把 lab 限制在 staging / preview。D1 與 Turso 都是平台產品，實際命令會依 CLI version、帳號、region 與專案設定改變；本文提供操作骨架與 evidence 格式，正式命令以官方文件為準。</p>
<p>官方文件路由：</p>
<table>
  <thead>
      <tr>
          <th>產品</th>
          <th>官方文件</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cloudflare D1</td>
          <td><a href="https://developers.cloudflare.com/d1/">Cloudflare D1 docs</a></td>
      </tr>
      <tr>
          <td>D1 limits</td>
          <td><a href="https://developers.cloudflare.com/d1/platform/limits/">Cloudflare D1 limits</a></td>
      </tr>
      <tr>
          <td>Turso</td>
          <td><a href="https://docs.turso.tech/">Turso docs</a></td>
      </tr>
      <tr>
          <td>Turso embedded replicas</td>
          <td><a href="https://docs.turso.tech/features/embedded-replicas/introduction">Embedded replicas</a></td>
      </tr>
  </tbody>
</table>
<p>Preview lab 要先確認資料不含 production PII。若 seed data 來自正式資料，先做 masking 或 synthetic data。</p>
<h2 id="export-local-sqlite">Export Local SQLite</h2>
<p>Export local SQLite 的核心責任是建立 target platform 的 seed input。沿用 <code>/tmp/sqlite-lab/app.db</code> 或 migration fixture。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p /tmp/sqlite-edge-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> /tmp/sqlite-edge-lab
</span></span><span class="line"><span class="ln">3</span><span class="cl">cp /tmp/sqlite-lab/app.db ./app.db
</span></span><span class="line"><span class="ln">4</span><span class="cl">sqlite3 app.db <span class="s2">&#34;.schema&#34;</span> &gt; schema.sql
</span></span><span class="line"><span class="ln">5</span><span class="cl">sqlite3 app.db <span class="s2">&#34;.dump&#34;</span> &gt; seed.sql
</span></span><span class="line"><span class="ln">6</span><span class="cl">sqlite3 app.db <span class="s2">&#34;SELECT COUNT(*) FROM accounts;&#34;</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">sqlite3 app.db <span class="s2">&#34;SELECT COUNT(*) FROM ledger_entries;&#34;</span></span></span></code></pre></div><p><code>schema.sql</code> 用來審查 DDL，<code>seed.sql</code> 用來匯入 preview database。正式 migration 可能要拆 schema / data / index，並處理 target platform limits。</p>
<h2 id="build-query-matrix">Build Query Matrix</h2>
<p>Build query matrix 的核心責任是定義 preview 要驗證什麼。query set 要代表產品行為，而非只跑一個 <code>SELECT 1</code>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Q1 list account balances
</span></span><span class="line"><span class="ln">2</span><span class="cl">Q2 insert ledger entry with unique idempotency key
</span></span><span class="line"><span class="ln">3</span><span class="cl">Q3 insert duplicate idempotency key and capture error
</span></span><span class="line"><span class="ln">4</span><span class="cl">Q4 foreign key violation
</span></span><span class="line"><span class="ln">5</span><span class="cl">Q5 transaction rollback
</span></span><span class="line"><span class="ln">6</span><span class="cl">Q6 pagination by created_at
</span></span><span class="line"><span class="ln">7</span><span class="cl">Q7 explain / performance sample if platform supports it</span></span></code></pre></div><p>這份 matrix 要保存 expected result。Local SQLite 先跑一次，把 row count、error category、latency baseline 記下來。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">.timer on
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SELECT a.id, a.owner_name, SUM(l.amount_cents) AS balance_cents
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">FROM accounts a
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">JOIN ledger_entries l ON l.account_id = a.id
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">GROUP BY a.id, a.owner_name
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">ORDER BY a.id;
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><h2 id="import-to-d1-preview">Import to D1 Preview</h2>
<p>Import to D1 preview 的核心責任是驗證 Cloudflare D1 workflow。以下是操作骨架，正式命令與 flags 以 Cloudflare D1 docs 和 Wrangler 版本為準。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Example shape only. Use your project naming and official Wrangler docs.</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">wrangler d1 create sqlite_edge_preview
</span></span><span class="line"><span class="ln">3</span><span class="cl">wrangler d1 execute sqlite_edge_preview --file<span class="o">=</span>seed.sql
</span></span><span class="line"><span class="ln">4</span><span class="cl">wrangler d1 execute sqlite_edge_preview --command<span class="o">=</span><span class="s2">&#34;SELECT COUNT(*) FROM accounts;&#34;</span></span></span></code></pre></div><p>D1 preview evidence 要記錄：</p>
<table>
  <thead>
      <tr>
          <th>Evidence</th>
          <th>內容</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>CLI version</td>
          <td>Wrangler version、account / project</td>
      </tr>
      <tr>
          <td>Import log</td>
          <td>duration、file size、error</td>
      </tr>
      <tr>
          <td>Query result</td>
          <td>每個 query 的 row count / error</td>
      </tr>
      <tr>
          <td>Limit hit</td>
          <td>D1 limits 是否影響 seed 或 query</td>
      </tr>
      <tr>
          <td>Rollback</td>
          <td>刪除 preview DB 或重建 seed</td>
      </tr>
  </tbody>
</table>
<p>若 seed file 太大或某些 SQL 需要改寫，就把 gap 寫進 compatibility matrix，先保留 production migration 的審查邊界。</p>
<h2 id="import-to-turso-preview">Import to Turso Preview</h2>
<p>Import to Turso preview 的核心責任是驗證 remote database、client SDK 與 embedded replica 行為。以下是操作骨架，正式命令以 Turso docs 與 CLI version 為準。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Example shape only. Use your org, group, region and official Turso docs.</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">turso db create sqlite-edge-preview
</span></span><span class="line"><span class="ln">3</span><span class="cl">turso db shell sqlite-edge-preview &lt; seed.sql
</span></span><span class="line"><span class="ln">4</span><span class="cl">turso db shell sqlite-edge-preview <span class="s2">&#34;SELECT COUNT(*) FROM accounts;&#34;</span></span></span></code></pre></div><p>Turso preview evidence 要多記 replica freshness。若使用 embedded replica，測試流程要包含 bootstrap、sync、read query、write delegation 與 sync 後 read。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">embedded replica evidence:
</span></span><span class="line"><span class="ln">2</span><span class="cl">  bootstrap duration
</span></span><span class="line"><span class="ln">3</span><span class="cl">  first read latency
</span></span><span class="line"><span class="ln">4</span><span class="cl">  write path
</span></span><span class="line"><span class="ln">5</span><span class="cl">  sync command / interval
</span></span><span class="line"><span class="ln">6</span><span class="cl">  read freshness after write</span></span></code></pre></div><p>Freshness 是 product decision。若 query matrix 只測 remote primary，仍需要追加 embedded replica 的使用者體驗驗證。</p>
<h2 id="compatibility-matrix">Compatibility Matrix</h2>
<p>Compatibility matrix 的核心責任是把 local SQLite 與 edge target 的差異留下來。建議表格欄位如下：</p>
<table>
  <thead>
      <tr>
          <th>Query / operation</th>
          <th>Local SQLite</th>
          <th>D1 preview</th>
          <th>Turso preview</th>
          <th>Decision</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Balance list</td>
          <td>pass</td>
          <td>pass / diff</td>
          <td>pass / diff</td>
          <td>keep / rewrite</td>
      </tr>
      <tr>
          <td>Unique violation</td>
          <td>error class</td>
          <td>error class</td>
          <td>error class</td>
          <td>map error</td>
      </tr>
      <tr>
          <td>FK violation</td>
          <td>error class</td>
          <td>error class</td>
          <td>error class</td>
          <td>enable / validate</td>
      </tr>
      <tr>
          <td>Transaction rollback</td>
          <td>pass</td>
          <td>pass / diff</td>
          <td>pass / diff</td>
          <td>rewrite workflow</td>
      </tr>
      <tr>
          <td>Import seed</td>
          <td>pass</td>
          <td>duration / limit</td>
          <td>duration / limit</td>
          <td>split batch</td>
      </tr>
  </tbody>
</table>
<p>Decision 欄要寫具體下一步。<code>rewrite workflow</code> 代表 application adapter 要改；<code>split batch</code> 代表 migration runbook 要改；<code>map error</code> 代表 repository error classification 要改。</p>
<h2 id="latency-and-cost-sample">Latency and Cost Sample</h2>
<p>Latency and cost sample 的核心責任是避免只看功能相容。Edge SQLite migration 的收益常來自 region latency 或 managed operation，因此 preview 要量測主要使用者區域的 read / write latency。</p>
<p>最小量測：</p>
<ol>
<li>Local baseline latency。</li>
<li>Preview target read latency。</li>
<li>Preview target write latency。</li>
<li>Error rate / retry count。</li>
<li>Estimated request / storage / egress cost。</li>
</ol>
<p>Latency sample 要搭配 freshness。快速讀到舊資料和稍慢讀到最新資料是不同產品體驗；query matrix 要標註哪個 workflow 可以接受 stale read。</p>
<h2 id="rollback-route">Rollback Route</h2>
<p>Rollback route 的核心責任是保留 local SQLite 退路。Preview lab 完成後，要能刪除 preview database、保留 local seed、重跑 local app。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA integrity_check;&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 app.db <span class="s2">&#34;SELECT COUNT(*) FROM accounts;&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">sqlite3 app.db <span class="s2">&#34;SELECT COUNT(*) FROM ledger_entries;&#34;</span></span></span></code></pre></div><p>正式 cutover 的 rollback 還要處理 target-only writes。Preview 階段應避免讓真實使用者寫入 target；若需要 shadow traffic，先用 read-only 或 synthetic write。</p>
<h2 id="completion-note">Completion Note</h2>
<p>Completion note 的核心責任是決定是否進入正式 migration。Lab 完成後應輸出四個 artifact：<code>seed.sql</code>、import log、compatibility matrix、rollback note。</p>
<p>進入正式 migration 的條件：</p>
<ol>
<li>Query matrix 主要 workflow 通過或已有 rewrite plan。</li>
<li>Platform limits 對資料量與 migration time 可接受。</li>
<li>Error mapping 已接到 repository adapter。</li>
<li>Freshness / latency 符合產品需求。</li>
<li>Export / rollback route 已演練。</li>
</ol>
<p>完成本篇後，回到 <a href="/blog/backend/01-database/vendors/sqlite/migrate-to-d1-turso/" data-link-title="SQLite to D1 / Turso Migration" data-link-desc="SQLite 轉向 Cloudflare D1、Turso / libSQL 的 edge driver、compatibility audit、data movement 與 rollback">SQLite to D1 / Turso Migration</a> 補正式 phase plan。</p>
]]></content:encoded></item><item><title>SQLite Hands-on 操作路線</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/</guid><description>&lt;p>SQLite hands-on 操作路線的核心責任是把單檔正式狀態轉成可演練流程。這一層對齊 LLM &lt;code>hands-on/&lt;/code>：讀者能建立一個 SQLite 檔案、製造 WAL / lock 訊號、跑 backup / restore、套 migration，並知道何時該升級到 server SQL 或 edge SQLite。&lt;/p>
&lt;h2 id="章節列表">章節列表&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>章節&lt;/th>
 &lt;th>主題&lt;/th>
 &lt;th>產出 artifact&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="local-file-quickstart/">Local file quickstart&lt;/a>&lt;/td>
 &lt;td>建立 &lt;code>.db&lt;/code>、schema、seed data、basic query&lt;/td>
 &lt;td>database file、schema version、query sample&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="backup-restore-drill/">Backup restore drill&lt;/a>&lt;/td>
 &lt;td>&lt;code>.backup&lt;/code> / &lt;code>VACUUM INTO&lt;/code> / restore validation&lt;/td>
 &lt;td>backup file、restore record、validation query&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="wal-busy-reproduction/">WAL busy reproduction&lt;/a>&lt;/td>
 &lt;td>long transaction、&lt;code>SQLITE_BUSY&lt;/code>、checkpoint growth&lt;/td>
 &lt;td>busy error sample、WAL size evidence&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="migration-fixture-lab/">Migration fixture lab&lt;/a>&lt;/td>
 &lt;td>&lt;code>user_version&lt;/code>、table rebuild、fixture snapshot&lt;/td>
 &lt;td>migration log、fixture DB、rollback note&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="d1-turso-preview-lab/">D1 / Turso preview lab&lt;/a>&lt;/td>
 &lt;td>local SQLite 到 edge SQLite product 的 compatibility preview&lt;/td>
 &lt;td>export / import note、compatibility gap&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="設計原則">設計原則&lt;/h2>
&lt;p>SQLite hands-on 章節要以檔案生命週期為中心。操作指令只在能產出 evidence 時出現；每篇都要回答 database file 在哪裡、sidecar file 如何處理、restore 如何驗證，以及 application release 如何知道它仍相容。&lt;/p>
&lt;h2 id="引用路徑">引用路徑&lt;/h2>
&lt;ul>
&lt;li>上游：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/" data-link-title="SQLite" data-link-desc="embedded、單檔案、test / CLI / edge 場景的標準選擇、近年因 Cloudflare D1 / Turso 等服務復興">SQLite overview&lt;/a>&lt;/li>
&lt;li>Structure：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/teaching-structure/" data-link-title="SQLite Teaching Structure" data-link-desc="SQLite 服務章節群的大綱：從 embedded formal state、WAL、backup、test fixture、local-first、edge SQLite 到遷移路由">SQLite Teaching Structure&lt;/a>&lt;/li>
&lt;li>Deep article：&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/file-lifecycle-backup-boundary/" data-link-title="SQLite file lifecycle 與 backup boundary" data-link-desc="把 SQLite 單檔案正式狀態拆成 WAL、backup API、restore drill、corruption recovery 與操作責任邊界">File lifecycle / backup boundary&lt;/a>&lt;/li>
&lt;/ul></description><content:encoded><![CDATA[<p>SQLite hands-on 操作路線的核心責任是把單檔正式狀態轉成可演練流程。這一層對齊 LLM <code>hands-on/</code>：讀者能建立一個 SQLite 檔案、製造 WAL / lock 訊號、跑 backup / restore、套 migration，並知道何時該升級到 server SQL 或 edge SQLite。</p>
<h2 id="章節列表">章節列表</h2>
<table>
  <thead>
      <tr>
          <th>章節</th>
          <th>主題</th>
          <th>產出 artifact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="local-file-quickstart/">Local file quickstart</a></td>
          <td>建立 <code>.db</code>、schema、seed data、basic query</td>
          <td>database file、schema version、query sample</td>
      </tr>
      <tr>
          <td><a href="backup-restore-drill/">Backup restore drill</a></td>
          <td><code>.backup</code> / <code>VACUUM INTO</code> / restore validation</td>
          <td>backup file、restore record、validation query</td>
      </tr>
      <tr>
          <td><a href="wal-busy-reproduction/">WAL busy reproduction</a></td>
          <td>long transaction、<code>SQLITE_BUSY</code>、checkpoint growth</td>
          <td>busy error sample、WAL size evidence</td>
      </tr>
      <tr>
          <td><a href="migration-fixture-lab/">Migration fixture lab</a></td>
          <td><code>user_version</code>、table rebuild、fixture snapshot</td>
          <td>migration log、fixture DB、rollback note</td>
      </tr>
      <tr>
          <td><a href="d1-turso-preview-lab/">D1 / Turso preview lab</a></td>
          <td>local SQLite 到 edge SQLite product 的 compatibility preview</td>
          <td>export / import note、compatibility gap</td>
      </tr>
  </tbody>
</table>
<h2 id="設計原則">設計原則</h2>
<p>SQLite hands-on 章節要以檔案生命週期為中心。操作指令只在能產出 evidence 時出現；每篇都要回答 database file 在哪裡、sidecar file 如何處理、restore 如何驗證，以及 application release 如何知道它仍相容。</p>
<h2 id="引用路徑">引用路徑</h2>
<ul>
<li>上游：<a href="/blog/backend/01-database/vendors/sqlite/" data-link-title="SQLite" data-link-desc="embedded、單檔案、test / CLI / edge 場景的標準選擇、近年因 Cloudflare D1 / Turso 等服務復興">SQLite overview</a></li>
<li>Structure：<a href="/blog/backend/01-database/vendors/sqlite/teaching-structure/" data-link-title="SQLite Teaching Structure" data-link-desc="SQLite 服務章節群的大綱：從 embedded formal state、WAL、backup、test fixture、local-first、edge SQLite 到遷移路由">SQLite Teaching Structure</a></li>
<li>Deep article：<a href="/blog/backend/01-database/vendors/sqlite/file-lifecycle-backup-boundary/" data-link-title="SQLite file lifecycle 與 backup boundary" data-link-desc="把 SQLite 單檔案正式狀態拆成 WAL、backup API、restore drill、corruption recovery 與操作責任邊界">File lifecycle / backup boundary</a></li>
</ul>
]]></content:encoded></item><item><title>SQLite Local File Quickstart</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/local-file-quickstart/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/local-file-quickstart/</guid><description>&lt;p>SQLite local file quickstart 的核心責任是建立後續 backup、WAL、migration 與 fixture lab 共用的 database file。這個 lab 把 SQLite 從抽象服務選型轉成可觀察的檔案、schema、PRAGMA、transaction 與 sidecar artifact。&lt;/p>
&lt;p>本文的驗收標準是：你能建立一個可重建的 &lt;code>app.db&lt;/code>，知道它的 schema version、journal mode、foreign key 設定、seed data 與 cleanup 路徑。&lt;/p>
&lt;h2 id="lab-directory">Lab Directory&lt;/h2>
&lt;p>Lab directory 的核心責任是把 SQLite artifact 放在隔離資料夾，避免和正式檔案混淆。以下命令建立一個可刪除的本地工作區。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mkdir -p /tmp/sqlite-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/sqlite-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">rm -f app.db app.db-wal app.db-shm&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>驗收 artifact 是 &lt;code>/tmp/sqlite-lab/app.db&lt;/code>。後續 lab 可以沿用這個路徑，也可以每次從頭建立。&lt;/p>
&lt;h2 id="baseline-schema">Baseline Schema&lt;/h2>
&lt;p>Baseline schema 的核心責任是建立一組能測 transaction、constraint、index 與 query 的小型資料模型。這裡使用 &lt;code>accounts&lt;/code> 與 &lt;code>ledger_entries&lt;/code>，因為它們能清楚展示 foreign key 與金額 invariant。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">PRAGMA journal_mode = WAL;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s">PRAGMA foreign_keys = ON;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s">PRAGMA user_version = 1;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE accounts (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s"> id INTEGER PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s"> owner_name TEXT NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s"> status TEXT NOT NULL CHECK (status IN (&amp;#39;active&amp;#39;, &amp;#39;closed&amp;#39;)),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at TEXT NOT NULL
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s">) STRICT;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE ledger_entries (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s"> id INTEGER PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s"> account_id INTEGER NOT NULL REFERENCES accounts(id),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s"> amount_cents INTEGER NOT NULL CHECK (amount_cents != 0),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s"> idempotency_key TEXT NOT NULL UNIQUE,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at TEXT NOT NULL
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s">) STRICT;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE INDEX idx_ledger_entries_account_created
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="s">ON ledger_entries(account_id, created_at);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這段 schema 的重點是明確資料合約。&lt;code>STRICT&lt;/code>、&lt;code>CHECK&lt;/code>、&lt;code>FOREIGN KEY&lt;/code> 與 &lt;code>UNIQUE&lt;/code> 讓 fixture 更接近正式資料責任，也讓後續 migration lab 有可驗證的 invariant。&lt;/p>
&lt;h2 id="seed-data">Seed Data&lt;/h2>
&lt;p>Seed data 的核心責任是建立可重跑的測試資料。每筆 ledger entry 都有 idempotency key，讓後續 edge / retry 設計可以沿用。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">PRAGMA foreign_keys = ON;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s">BEGIN;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO accounts(id, owner_name, status, created_at)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s">VALUES
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s"> (1, &amp;#39;Ada&amp;#39;, &amp;#39;active&amp;#39;, &amp;#39;2026-05-21T00:00:00Z&amp;#39;),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s"> (2, &amp;#39;Lin&amp;#39;, &amp;#39;active&amp;#39;, &amp;#39;2026-05-21T00:05:00Z&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s">VALUES
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s"> (1, 1200, &amp;#39;seed-ada-credit-1&amp;#39;, &amp;#39;2026-05-21T00:10:00Z&amp;#39;),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s"> (1, -200, &amp;#39;seed-ada-debit-1&amp;#39;, &amp;#39;2026-05-21T00:12:00Z&amp;#39;),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s"> (2, 900, &amp;#39;seed-lin-credit-1&amp;#39;, &amp;#39;2026-05-21T00:15:00Z&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s">COMMIT;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Seed 完成後先跑基本查詢。這一步確認 schema、constraint 與 index 入口都可用。&lt;/p></description><content:encoded><![CDATA[<p>SQLite local file quickstart 的核心責任是建立後續 backup、WAL、migration 與 fixture lab 共用的 database file。這個 lab 把 SQLite 從抽象服務選型轉成可觀察的檔案、schema、PRAGMA、transaction 與 sidecar artifact。</p>
<p>本文的驗收標準是：你能建立一個可重建的 <code>app.db</code>，知道它的 schema version、journal mode、foreign key 設定、seed data 與 cleanup 路徑。</p>
<h2 id="lab-directory">Lab Directory</h2>
<p>Lab directory 的核心責任是把 SQLite artifact 放在隔離資料夾，避免和正式檔案混淆。以下命令建立一個可刪除的本地工作區。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p /tmp/sqlite-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> /tmp/sqlite-lab
</span></span><span class="line"><span class="ln">3</span><span class="cl">rm -f app.db app.db-wal app.db-shm</span></span></code></pre></div><p>驗收 artifact 是 <code>/tmp/sqlite-lab/app.db</code>。後續 lab 可以沿用這個路徑，也可以每次從頭建立。</p>
<h2 id="baseline-schema">Baseline Schema</h2>
<p>Baseline schema 的核心責任是建立一組能測 transaction、constraint、index 與 query 的小型資料模型。這裡使用 <code>accounts</code> 與 <code>ledger_entries</code>，因為它們能清楚展示 foreign key 與金額 invariant。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">sqlite3 app.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">PRAGMA journal_mode = WAL;
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">PRAGMA foreign_keys = ON;
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">PRAGMA user_version = 1;
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">CREATE TABLE accounts (
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  id INTEGER PRIMARY KEY,
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  owner_name TEXT NOT NULL,
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  status TEXT NOT NULL CHECK (status IN (&#39;active&#39;, &#39;closed&#39;)),
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">  created_at TEXT NOT NULL
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">) STRICT;
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">CREATE TABLE ledger_entries (
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">  id INTEGER PRIMARY KEY,
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">  account_id INTEGER NOT NULL REFERENCES accounts(id),
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">  amount_cents INTEGER NOT NULL CHECK (amount_cents != 0),
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">  idempotency_key TEXT NOT NULL UNIQUE,
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">  created_at TEXT NOT NULL
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">) STRICT;
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">CREATE INDEX idx_ledger_entries_account_created
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">ON ledger_entries(account_id, created_at);
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>這段 schema 的重點是明確資料合約。<code>STRICT</code>、<code>CHECK</code>、<code>FOREIGN KEY</code> 與 <code>UNIQUE</code> 讓 fixture 更接近正式資料責任，也讓後續 migration lab 有可驗證的 invariant。</p>
<h2 id="seed-data">Seed Data</h2>
<p>Seed data 的核心責任是建立可重跑的測試資料。每筆 ledger entry 都有 idempotency key，讓後續 edge / retry 設計可以沿用。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">sqlite3 app.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">PRAGMA foreign_keys = ON;
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">BEGIN;
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">INSERT INTO accounts(id, owner_name, status, created_at)
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">VALUES
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  (1, &#39;Ada&#39;, &#39;active&#39;, &#39;2026-05-21T00:00:00Z&#39;),
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  (2, &#39;Lin&#39;, &#39;active&#39;, &#39;2026-05-21T00:05:00Z&#39;);
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at)
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">VALUES
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">  (1, 1200, &#39;seed-ada-credit-1&#39;, &#39;2026-05-21T00:10:00Z&#39;),
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">  (1, -200, &#39;seed-ada-debit-1&#39;, &#39;2026-05-21T00:12:00Z&#39;),
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">  (2, 900, &#39;seed-lin-credit-1&#39;, &#39;2026-05-21T00:15:00Z&#39;);
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">COMMIT;
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Seed 完成後先跑基本查詢。這一步確認 schema、constraint 與 index 入口都可用。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">.headers on
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">.mode column
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SELECT a.id, a.owner_name, SUM(l.amount_cents) AS balance_cents
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">FROM accounts a
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">JOIN ledger_entries l ON l.account_id = a.id
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">GROUP BY a.id, a.owner_name
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">ORDER BY a.id;
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>預期輸出應顯示 Ada 餘額 <code>1000</code>，Lin 餘額 <code>900</code>。</p>
<h2 id="pragma-snapshot">PRAGMA Snapshot</h2>
<p>PRAGMA snapshot 的核心責任是把連線設定變成 evidence。SQLite 的部分設定與 connection 有關，因此 lab 要明確查出當前狀態。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">.headers on
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">.mode column
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">PRAGMA journal_mode;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">PRAGMA foreign_keys;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">PRAGMA user_version;
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">PRAGMA integrity_check;
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>驗收重點如下：</p>
<table>
  <thead>
      <tr>
          <th>欄位</th>
          <th>期望結果</th>
          <th>意義</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>journal_mode</code></td>
          <td><code>wal</code></td>
          <td>後續可觀察 <code>-wal</code> sidecar</td>
      </tr>
      <tr>
          <td><code>foreign_keys</code></td>
          <td><code>1</code></td>
          <td>constraint 在連線上已啟用</td>
      </tr>
      <tr>
          <td><code>user_version</code></td>
          <td><code>1</code></td>
          <td>migration 起點清楚</td>
      </tr>
      <tr>
          <td>integrity</td>
          <td><code>ok</code></td>
          <td>database file 基本健康</td>
      </tr>
  </tbody>
</table>
<h2 id="transaction-sample">Transaction Sample</h2>
<p>Transaction sample 的核心責任是建立後續 busy / migration lab 的共同語言。SQLite transaction 成功時要同時更新資料與保護 invariant。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">PRAGMA foreign_keys = ON;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">BEGIN IMMEDIATE;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at)
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">VALUES (1, 300, &#39;manual-ada-credit-1&#39;, &#39;2026-05-21T00:20:00Z&#39;);
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">COMMIT;
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p><code>BEGIN IMMEDIATE</code> 會提早取得 write lock。這讓後續 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/wal-busy-reproduction/" data-link-title="SQLite WAL Busy Reproduction" data-link-desc="SQLite long transaction、SQLITE_BUSY、busy_timeout、checkpoint growth 與 writer queue 的操作說明">WAL busy reproduction</a> 可以直接展示 single writer boundary。</p>
<h2 id="file-artifact-check">File Artifact Check</h2>
<p>File artifact check 的核心責任是讓讀者看到 SQLite 由 <code>.db</code> 與可能存在的 sidecar 共同構成。WAL mode 可能建立 <code>-wal</code> 與 <code>-shm</code> sidecar，backup / copy / restore runbook 要理解這些檔案。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh app.db app.db-wal app.db-shm</span></span></code></pre></div><p>若 sidecar 暫時未出現，可以再寫入一筆資料或保持連線開啟。Sidecar 是否存在取決於 WAL 狀態、checkpoint 與 connection lifecycle。</p>
<h2 id="cleanup">Cleanup</h2>
<p>Cleanup 的核心責任是讓 lab 可以重跑。若要重新開始，刪除 database 與 sidecar。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">rm -f /tmp/sqlite-lab/app.db /tmp/sqlite-lab/app.db-wal /tmp/sqlite-lab/app.db-shm</span></span></code></pre></div><p>完成本篇後，下一步可以進入 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/backup-restore-drill/" data-link-title="SQLite Backup Restore Drill" data-link-desc="SQLite .backup、VACUUM INTO、restore validation、sidecar file handling 與 RPO / RTO note 的操作說明">backup restore drill</a> 或 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/wal-busy-reproduction/" data-link-title="SQLite WAL Busy Reproduction" data-link-desc="SQLite long transaction、SQLITE_BUSY、busy_timeout、checkpoint growth 與 writer queue 的操作說明">WAL busy reproduction</a>。</p>
]]></content:encoded></item><item><title>SQLite Migration Fixture Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/migration-fixture-lab/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/migration-fixture-lab/</guid><description>&lt;p>SQLite migration fixture lab 的核心責任是把 schema migration 與 test fixture 放進同一個可重建流程。這篇承接 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/schema-migration-versioning/" data-link-title="SQLite Schema Migration and Versioning" data-link-desc="SQLite schema migration、user_version、table rebuild、ALTER TABLE 限制、app release compatibility 與 migration evidence">Schema Migration / Versioning&lt;/a> 與 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/test-fixture-best-practice/" data-link-title="SQLite Test Fixture Best Practice" data-link-desc="SQLite 作為 test fixture、repository contract test、production dialect gap、seed data、fixture snapshot 與 CI evidence 的操作判準">Test Fixture Best Practice&lt;/a>，讓 migration 有版本、snapshot、validation 與 rollback note。&lt;/p>
&lt;p>本文的驗收標準是：你能建立 v1 fixture、套用 v2 migration、產生 v2 snapshot，並用 validation query 證明資料合約仍成立。&lt;/p>
&lt;h2 id="create-fixture">Create Fixture&lt;/h2>
&lt;p>Create fixture 的核心責任是建立乾淨、可重建的 source fixture。沿用 quickstart schema，或重新建立一份 fixture DB。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">mkdir -p /tmp/sqlite-fixture-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/sqlite-fixture-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">rm -f fixture-v1.db fixture-v2.db
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">sqlite3 fixture-v1.db &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s">PRAGMA foreign_keys = ON;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s">PRAGMA user_version = 1;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE accounts (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s"> id INTEGER PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s"> owner_name TEXT NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s"> status TEXT NOT NULL CHECK (status IN (&amp;#39;active&amp;#39;, &amp;#39;closed&amp;#39;)),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at TEXT NOT NULL
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s">) STRICT;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE ledger_entries (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s"> id INTEGER PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s"> account_id INTEGER NOT NULL REFERENCES accounts(id),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s"> amount_cents INTEGER NOT NULL CHECK (amount_cents != 0),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s"> idempotency_key TEXT NOT NULL UNIQUE,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at TEXT NOT NULL
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s">) STRICT;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO accounts VALUES (1, &amp;#39;Ada&amp;#39;, &amp;#39;active&amp;#39;, &amp;#39;2026-05-21T00:00:00Z&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">24&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">25&lt;/span>&lt;span class="cl">&lt;span class="s">VALUES (1, 1000, &amp;#39;fixture-v1-ada&amp;#39;, &amp;#39;2026-05-21T00:10:00Z&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">26&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這個 fixture 是 v1 source of truth。CI 可以每次從 SQL 重建，也可以保存 &lt;code>fixture-v1.db&lt;/code> 作為 binary fixture；兩者都要有版本與 checksum。&lt;/p></description><content:encoded><![CDATA[<p>SQLite migration fixture lab 的核心責任是把 schema migration 與 test fixture 放進同一個可重建流程。這篇承接 <a href="/blog/backend/01-database/vendors/sqlite/schema-migration-versioning/" data-link-title="SQLite Schema Migration and Versioning" data-link-desc="SQLite schema migration、user_version、table rebuild、ALTER TABLE 限制、app release compatibility 與 migration evidence">Schema Migration / Versioning</a> 與 <a href="/blog/backend/01-database/vendors/sqlite/test-fixture-best-practice/" data-link-title="SQLite Test Fixture Best Practice" data-link-desc="SQLite 作為 test fixture、repository contract test、production dialect gap、seed data、fixture snapshot 與 CI evidence 的操作判準">Test Fixture Best Practice</a>，讓 migration 有版本、snapshot、validation 與 rollback note。</p>
<p>本文的驗收標準是：你能建立 v1 fixture、套用 v2 migration、產生 v2 snapshot，並用 validation query 證明資料合約仍成立。</p>
<h2 id="create-fixture">Create Fixture</h2>
<p>Create fixture 的核心責任是建立乾淨、可重建的 source fixture。沿用 quickstart schema，或重新建立一份 fixture DB。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">mkdir -p /tmp/sqlite-fixture-lab
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="nb">cd</span> /tmp/sqlite-fixture-lab
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">rm -f fixture-v1.db fixture-v2.db
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">sqlite3 fixture-v1.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">PRAGMA foreign_keys = ON;
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">PRAGMA user_version = 1;
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">CREATE TABLE accounts (
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  id INTEGER PRIMARY KEY,
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">  owner_name TEXT NOT NULL,
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">  status TEXT NOT NULL CHECK (status IN (&#39;active&#39;, &#39;closed&#39;)),
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">  created_at TEXT NOT NULL
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">) STRICT;
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">CREATE TABLE ledger_entries (
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">  id INTEGER PRIMARY KEY,
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">  account_id INTEGER NOT NULL REFERENCES accounts(id),
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">  amount_cents INTEGER NOT NULL CHECK (amount_cents != 0),
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">  idempotency_key TEXT NOT NULL UNIQUE,
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">  created_at TEXT NOT NULL
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">) STRICT;
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s">INSERT INTO accounts VALUES (1, &#39;Ada&#39;, &#39;active&#39;, &#39;2026-05-21T00:00:00Z&#39;);
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at)
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="s">VALUES (1, 1000, &#39;fixture-v1-ada&#39;, &#39;2026-05-21T00:10:00Z&#39;);
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>這個 fixture 是 v1 source of truth。CI 可以每次從 SQL 重建，也可以保存 <code>fixture-v1.db</code> 作為 binary fixture；兩者都要有版本與 checksum。</p>
<h2 id="pre-migration-snapshot">Pre-Migration Snapshot</h2>
<p>Pre-migration snapshot 的核心責任是建立 rollback 起點。正式 migration 前應先保存 source DB。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 fixture-v1.db <span class="s2">&#34;.backup &#39;fixture-v1-before-migration.db&#39;&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 fixture-v1-before-migration.db <span class="s2">&#34;PRAGMA integrity_check;&#34;</span></span></span></code></pre></div><p>這份 snapshot 代表 migration 失敗時的回退點。CI log 要保留 snapshot path、schema version 與 migration id。</p>
<h2 id="apply-add-column-migration">Apply Add Column Migration</h2>
<p>Apply add column migration 的核心責任是展示低風險 schema change。先複製 v1，再套用 v2。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">cp fixture-v1.db fixture-v2.db
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 fixture-v2.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">PRAGMA foreign_keys = ON;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">BEGIN;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">ALTER TABLE accounts ADD COLUMN email TEXT;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">PRAGMA user_version = 2;
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">COMMIT;
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>驗證 schema version 與新欄位：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 fixture-v2.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">PRAGMA user_version;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">PRAGMA table_info(accounts);
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Add column 是較簡單的 migration。涉及 drop column、rename、constraint 重建或資料 reshape 時，應改用 table rebuild 策略。</p>
<h2 id="table-rebuild-example">Table Rebuild Example</h2>
<p>Table rebuild 的核心責任是展示 SQLite schema migration 的高風險路徑。以下範例把 <code>accounts.status</code> 的 allowed value 加入 <code>suspended</code>，透過新表重建 constraint。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">sqlite3 fixture-v2.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">PRAGMA foreign_keys = OFF;
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">BEGIN;
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">CREATE TABLE accounts_new (
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">  id INTEGER PRIMARY KEY,
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  owner_name TEXT NOT NULL,
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  status TEXT NOT NULL CHECK (status IN (&#39;active&#39;, &#39;closed&#39;, &#39;suspended&#39;)),
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  created_at TEXT NOT NULL,
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">  email TEXT
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">) STRICT;
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">INSERT INTO accounts_new(id, owner_name, status, created_at, email)
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">SELECT id, owner_name, status, created_at, email
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">FROM accounts;
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">DROP TABLE accounts;
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">ALTER TABLE accounts_new RENAME TO accounts;
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">PRAGMA user_version = 3;
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">COMMIT;
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">PRAGMA foreign_keys = ON;
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Table rebuild 要保存 index、trigger、view 與 FK reference。這個 lab 只有小型 schema；正式 migration 要先列出所有 dependent object。</p>
<h2 id="validation-query">Validation Query</h2>
<p>Validation query 的核心責任是證明 migration 後資料仍符合 domain invariant。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 fixture-v2.db <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">PRAGMA integrity_check;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">PRAGMA foreign_key_check;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SELECT COUNT(*) AS account_count FROM accounts;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">SELECT COUNT(*) AS ledger_count FROM ledger_entries;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SELECT SUM(amount_cents) AS total_balance FROM ledger_entries;
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">PRAGMA user_version;
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>驗收結果應包含 integrity <code>ok</code>、foreign key check 空結果、account count <code>1</code>、ledger count <code>1</code>、total balance <code>1000</code>、user version <code>3</code>。</p>
<h2 id="contract-test-hook">Contract Test Hook</h2>
<p>Contract test hook 的核心責任是讓 fixture 進入 CI。語言與 framework 可以不同，但測試要固定做三件事：開啟 FK、確認 schema version、跑 repository contract。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">test setup:
</span></span><span class="line"><span class="ln">2</span><span class="cl">  copy fixture-v2.db to temp path
</span></span><span class="line"><span class="ln">3</span><span class="cl">  open SQLite connection
</span></span><span class="line"><span class="ln">4</span><span class="cl">  execute PRAGMA foreign_keys = ON
</span></span><span class="line"><span class="ln">5</span><span class="cl">  assert PRAGMA user_version = 3
</span></span><span class="line"><span class="ln">6</span><span class="cl">  run repository contract tests</span></span></code></pre></div><p>每個 test 使用 temp copy 可以避免資料污染。需要測 concurrency 時，改用 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/wal-busy-reproduction/" data-link-title="SQLite WAL Busy Reproduction" data-link-desc="SQLite long transaction、SQLITE_BUSY、busy_timeout、checkpoint growth 與 writer queue 的操作說明">WAL busy reproduction</a>。</p>
<h2 id="rollback-note">Rollback Note</h2>
<p>Rollback note 的核心責任是把 migration 失敗時的處理寫清楚。這個 lab 的 rollback 是保留 <code>fixture-v1-before-migration.db</code>，在 migration validation 失敗時停止 release 並保存 failed DB。</p>
<p>正式 runbook 要記錄：</p>
<ol>
<li>Migration id 與 source / target <code>user_version</code>。</li>
<li>Pre-migration backup path。</li>
<li>Validation query 與結果。</li>
<li>Failed DB 保存路徑。</li>
<li>Release block / rollback 條件。</li>
</ol>
<p>完成本篇後，下一步可以讀 <a href="/blog/backend/01-database/vendors/sqlite/migrate-to-postgresql/" data-link-title="SQLite to PostgreSQL Migration" data-link-desc="SQLite 升級到 PostgreSQL 的 driver、schema diff、data copy、dual run、cutover、rollback 與 cleanup">SQLite to PostgreSQL migration</a> 或 <a href="/blog/backend/01-database/vendors/sqlite/migrate-to-d1-turso/" data-link-title="SQLite to D1 / Turso Migration" data-link-desc="SQLite 轉向 Cloudflare D1、Turso / libSQL 的 edge driver、compatibility audit、data movement 與 rollback">SQLite to D1 / Turso migration</a>。</p>
]]></content:encoded></item><item><title>SQLite WAL Busy Reproduction</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/wal-busy-reproduction/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/wal-busy-reproduction/</guid><description>&lt;p>SQLite WAL busy reproduction 的核心責任是讓讀者親眼看到 single writer boundary。這篇承接 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/wal-concurrency-locking/" data-link-title="SQLite WAL Concurrency and Locking" data-link-desc="SQLite WAL mode 如何降低 reader / writer 衝突、保留 single writer boundary，並用 SQLITE_BUSY、WAL growth、checkpoint 訊號判斷 production 上限">WAL concurrency / locking&lt;/a>，把 &lt;code>SQLITE_BUSY&lt;/code> 從文字警告轉成可重現 timeline。&lt;/p>
&lt;p>本文的驗收標準是：你能用兩個 sqlite3 session 重現 writer contention，觀察 busy timeout 行為，並用 WAL size 與 checkpoint result 連回 production runbook。&lt;/p>
&lt;h2 id="prepare-database">Prepare Database&lt;/h2>
&lt;p>Prepare database 的核心責任是建立可重現的 WAL mode database。若已跑過 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/sqlite/hands-on/local-file-quickstart/" data-link-title="SQLite Local File Quickstart" data-link-desc="SQLite local .db file、schema、seed data、PRAGMA baseline、query sample 與 cleanup 的操作說明">local file quickstart&lt;/a>，可以沿用 &lt;code>/tmp/sqlite-lab/app.db&lt;/code>。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/sqlite-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;PRAGMA journal_mode = WAL;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;PRAGMA busy_timeout = 1000;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>確認 WAL mode：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;PRAGMA journal_mode;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>預期輸出是 &lt;code>wal&lt;/code>。&lt;/p>
&lt;h2 id="session-a-hold-writer-lock">Session A: Hold Writer Lock&lt;/h2>
&lt;p>Session A 的核心責任是刻意持有 write transaction。開第一個 terminal，執行：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">sqlite3 app.db&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>在 sqlite prompt 內輸入：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="n">PRAGMA&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">foreign_keys&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ON&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">BEGIN&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">IMMEDIATE&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">INSERT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">INTO&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ledger_entries&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">account_id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">amount_cents&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">idempotency_key&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">created_at&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">VALUES&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">11&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;busy-session-a&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;2026-05-21T02:00:00Z&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>先保持 transaction 開啟，暫時延後 &lt;code>COMMIT&lt;/code>。&lt;code>BEGIN IMMEDIATE&lt;/code> 會取得 writer lock，讓第二個 writer 需要等待或失敗。&lt;/p>
&lt;h2 id="session-b-observe-busy">Session B: Observe Busy&lt;/h2>
&lt;p>Session B 的核心責任是用第二個 connection 觀察 single writer boundary。開第二個 terminal，執行：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /tmp/sqlite-lab
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">sqlite3 app.db &lt;span class="s2">&amp;#34;PRAGMA busy_timeout = 1000; INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (1, 22, &amp;#39;busy-session-b&amp;#39;, &amp;#39;2026-05-21T02:01:00Z&amp;#39;);&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>預期結果是等待約 1 秒後出現 busy / locked 類錯誤。不同 sqlite3 版本的錯誤文字可能略有差異，核心訊號是第二個 writer 在 Session A commit 前拿不到 write lock。&lt;/p>
&lt;h2 id="release-lock">Release Lock&lt;/h2>
&lt;p>Release lock 的核心責任是確認 contention 來自 writer transaction。回到 Session A，輸入：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">COMMIT&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">quit&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>再次執行 Session B 的 insert，這次應成功。&lt;/p></description><content:encoded><![CDATA[<p>SQLite WAL busy reproduction 的核心責任是讓讀者親眼看到 single writer boundary。這篇承接 <a href="/blog/backend/01-database/vendors/sqlite/wal-concurrency-locking/" data-link-title="SQLite WAL Concurrency and Locking" data-link-desc="SQLite WAL mode 如何降低 reader / writer 衝突、保留 single writer boundary，並用 SQLITE_BUSY、WAL growth、checkpoint 訊號判斷 production 上限">WAL concurrency / locking</a>，把 <code>SQLITE_BUSY</code> 從文字警告轉成可重現 timeline。</p>
<p>本文的驗收標準是：你能用兩個 sqlite3 session 重現 writer contention，觀察 busy timeout 行為，並用 WAL size 與 checkpoint result 連回 production runbook。</p>
<h2 id="prepare-database">Prepare Database</h2>
<p>Prepare database 的核心責任是建立可重現的 WAL mode database。若已跑過 <a href="/blog/backend/01-database/vendors/sqlite/hands-on/local-file-quickstart/" data-link-title="SQLite Local File Quickstart" data-link-desc="SQLite local .db file、schema、seed data、PRAGMA baseline、query sample 與 cleanup 的操作說明">local file quickstart</a>，可以沿用 <code>/tmp/sqlite-lab/app.db</code>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /tmp/sqlite-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA journal_mode = WAL;&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA busy_timeout = 1000;&#34;</span></span></span></code></pre></div><p>確認 WAL mode：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA journal_mode;&#34;</span></span></span></code></pre></div><p>預期輸出是 <code>wal</code>。</p>
<h2 id="session-a-hold-writer-lock">Session A: Hold Writer Lock</h2>
<p>Session A 的核心責任是刻意持有 write transaction。開第一個 terminal，執行：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db</span></span></code></pre></div><p>在 sqlite prompt 內輸入：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">PRAGMA</span><span class="w"> </span><span class="n">foreign_keys</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">ON</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">BEGIN</span><span class="w"> </span><span class="k">IMMEDIATE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">ledger_entries</span><span class="p">(</span><span class="n">account_id</span><span class="p">,</span><span class="w"> </span><span class="n">amount_cents</span><span class="p">,</span><span class="w"> </span><span class="n">idempotency_key</span><span class="p">,</span><span class="w"> </span><span class="n">created_at</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">11</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;busy-session-a&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;2026-05-21T02:00:00Z&#39;</span><span class="p">);</span></span></span></code></pre></div><p>先保持 transaction 開啟，暫時延後 <code>COMMIT</code>。<code>BEGIN IMMEDIATE</code> 會取得 writer lock，讓第二個 writer 需要等待或失敗。</p>
<h2 id="session-b-observe-busy">Session B: Observe Busy</h2>
<p>Session B 的核心責任是用第二個 connection 觀察 single writer boundary。開第二個 terminal，執行：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /tmp/sqlite-lab
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA busy_timeout = 1000; INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (1, 22, &#39;busy-session-b&#39;, &#39;2026-05-21T02:01:00Z&#39;);&#34;</span></span></span></code></pre></div><p>預期結果是等待約 1 秒後出現 busy / locked 類錯誤。不同 sqlite3 版本的錯誤文字可能略有差異，核心訊號是第二個 writer 在 Session A commit 前拿不到 write lock。</p>
<h2 id="release-lock">Release Lock</h2>
<p>Release lock 的核心責任是確認 contention 來自 writer transaction。回到 Session A，輸入：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">COMMIT</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="p">.</span><span class="n">quit</span></span></span></code></pre></div><p>再次執行 Session B 的 insert，這次應成功。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA foreign_keys = ON; INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (1, 22, &#39;busy-session-b&#39;, &#39;2026-05-21T02:01:00Z&#39;);&#34;</span></span></span></code></pre></div><p>若 idempotency key 已在前一次嘗試中寫入，改成新的 key。這個細節也提醒 production write 要有 idempotency 設計。</p>
<h2 id="busy-timeout-comparison">Busy Timeout Comparison</h2>
<p>Busy timeout comparison 的核心責任是區分「等一下」和「解決 writer contention」。Timeout 可以讓短暫鎖等待更平滑，但長交易仍會造成延遲或失敗。</p>
<p>重開 Session A 並持有 transaction：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">BEGIN</span><span class="w"> </span><span class="k">IMMEDIATE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">ledger_entries</span><span class="p">(</span><span class="n">account_id</span><span class="p">,</span><span class="w"> </span><span class="n">amount_cents</span><span class="p">,</span><span class="w"> </span><span class="n">idempotency_key</span><span class="p">,</span><span class="w"> </span><span class="n">created_at</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">33</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;busy-session-a-long&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;2026-05-21T02:10:00Z&#39;</span><span class="p">);</span></span></span></code></pre></div><p>在 Session B 測不同 timeout：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">time</span> sqlite3 app.db <span class="s2">&#34;PRAGMA busy_timeout = 5000; INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key, created_at) VALUES (1, 44, &#39;busy-session-b-long&#39;, &#39;2026-05-21T02:11:00Z&#39;);&#34;</span></span></span></code></pre></div><p>若 Session A 在 5 秒內 commit，Session B 可能成功；若持續持有 transaction，Session B 會在 timeout 後失敗。這就是 production 裡 busy timeout 的邊界：它緩衝短鎖，長 transaction 仍要被設計移除。</p>
<h2 id="wal-and-checkpoint">WAL and Checkpoint</h2>
<p>WAL and checkpoint 的核心責任是把 writer activity 和 file artifact 連起來。多做幾次寫入後觀察 sidecar。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh app.db app.db-wal app.db-shm
</span></span><span class="line"><span class="ln">2</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA wal_checkpoint(PASSIVE);&#34;</span></span></span></code></pre></div><p><code>wal_checkpoint</code> 會回傳 checkpoint 狀態。正式 runbook 要記錄 WAL size、checkpoint duration、reader age 與 checkpoint failure。</p>
<p>可以手動觸發 truncate checkpoint：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">sqlite3 app.db <span class="s2">&#34;PRAGMA wal_checkpoint(TRUNCATE);&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ls -lh app.db app.db-wal app.db-shm</span></span></code></pre></div><p>TRUNCATE 適合 lab 觀察。Production 使用時要評估 reader、latency 與維護窗口。</p>
<h2 id="mitigation-note">Mitigation Note</h2>
<p>Mitigation note 的核心責任是把 lab 結果轉成設計策略。看到 <code>SQLITE_BUSY</code> 後，優先檢查 long transaction、未關閉 cursor、背景 job、write burst、parallel test 共用 DB 與 checkpoint pressure。</p>
<p>常見策略包含：</p>
<ol>
<li>縮短 transaction，將外部 API call 移到 transaction 外。</li>
<li>設定合理 busy timeout 與 retry backoff。</li>
<li>把 write queue 序列化，讓高風險 workflow 先排隊。</li>
<li>將 heavy read 移到 snapshot 或 replica。</li>
<li>當 concurrent writer 成為常態，評估 PostgreSQL / MySQL。</li>
</ol>
<p>完成本篇後，下一步讀 <a href="/blog/backend/01-database/vendors/sqlite/observability-runbook/" data-link-title="SQLite Observability and Runbook" data-link-desc="SQLite production runbook、backup evidence、WAL growth、busy errors、disk usage、restore drill 與 incident route">observability / runbook</a> 把 busy、WAL 與 checkpoint 變成正式監控訊號。</p>
]]></content:encoded></item><item><title>Hands-on Quickstart：clone repo 後跑通所有 demo</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/quickstart/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/quickstart/</guid><description>&lt;p>本篇是 hands-on 系列的&lt;strong>導讀&lt;/strong>——把分散在 &lt;code>ollama-setup&lt;/code> / &lt;code>rag-demo&lt;/code> / &lt;code>mcp-demo&lt;/code> / &lt;code>permission-boundary&lt;/code> 各章節的 setup 步驟整合成一條最短路徑、讓 clone repo 的人能在 15 分鐘內跑通所有 demo（&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP&lt;/a>、權限邊界三個 demo、RAG 是「retrieval 找相關內容 + LLM 回答」、MCP 是「LLM application ↔ tool server 的標準協議」）。&lt;/p>
&lt;p>每篇 hands-on 文章 focus 在「為什麼這樣設計」、本篇 focus 在「按順序跑通」。讀完想懂原理再進對應章節讀。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：macOS 14+、Apple Silicon、Ollama 0.23.2、Python 3.11+
&lt;strong>總時間&lt;/strong>：~15 分鐘（含 model 下載）
&lt;strong>磁碟需求&lt;/strong>：Step 1 ~ 4 約 ~5 GB（Ollama 200 MB + nomic-embed-text 274 MB + gemma3:1b 815 MB + room for index）；Step 5 ComfyUI 可選加 ~10 GB（SDXL base 模型）。
&lt;strong>適用平台&lt;/strong>：本快速路徑只在 Apple Silicon Mac 驗證過；Intel Mac / Linux 上 Ollama 仍可裝、但 GPU 加速跟 model tag 行為可能不同、實際以官方 release notes 為準。&lt;/p>&lt;/blockquote>
&lt;h2 id="適合誰讀">適合誰讀&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>你是&lt;/th>
 &lt;th>本篇對你&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>剛 clone 我的 blog repo、想跑 demo 試試看&lt;/td>
 &lt;td>&lt;strong>從本篇開始&lt;/strong>、按步驟做&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>想懂某個 demo 的設計取捨&lt;/td>
 &lt;td>跑通後再進 &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">RAG demo&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">permission-boundary&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>想懂 Ollama / ComfyUI 安裝細節&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &amp;#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &amp;#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI setup&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>想看 production 怎麼想資源評估&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 Production resource planning&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="為什麼不是pre-builtclone-就能跑">為什麼不是「pre-built、clone 就能跑」&lt;/h2>
&lt;p>衍生產物（&lt;code>index.pkl&lt;/code>、&lt;code>__pycache__/&lt;/code>、Ollama model weights、即「跑出來的 cache / index / weight」、跟 source code 區別）刻意&lt;strong>不進 git&lt;/strong>、原因見 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/artifact-management/" data-link-title="4.10 衍生產物管理原理：什麼進 git、什麼不該" data-link-desc="LLM 應用的 source / derived / external 三類產物對應 git / build cache / registry、與 production 部署的 reproducibility / cost / share 取捨">4.10 衍生產物管理原理&lt;/a>。所以 clone repo 後需要：&lt;/p></description><content:encoded><![CDATA[<p>本篇是 hands-on 系列的<strong>導讀</strong>——把分散在 <code>ollama-setup</code> / <code>rag-demo</code> / <code>mcp-demo</code> / <code>permission-boundary</code> 各章節的 setup 步驟整合成一條最短路徑、讓 clone repo 的人能在 15 分鐘內跑通所有 demo（<a href="/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG</a>、<a href="/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP</a>、權限邊界三個 demo、RAG 是「retrieval 找相關內容 + LLM 回答」、MCP 是「LLM application ↔ tool server 的標準協議」）。</p>
<p>每篇 hands-on 文章 focus 在「為什麼這樣設計」、本篇 focus 在「按順序跑通」。讀完想懂原理再進對應章節讀。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：macOS 14+、Apple Silicon、Ollama 0.23.2、Python 3.11+
<strong>總時間</strong>：~15 分鐘（含 model 下載）
<strong>磁碟需求</strong>：Step 1 ~ 4 約 ~5 GB（Ollama 200 MB + nomic-embed-text 274 MB + gemma3:1b 815 MB + room for index）；Step 5 ComfyUI 可選加 ~10 GB（SDXL base 模型）。
<strong>適用平台</strong>：本快速路徑只在 Apple Silicon Mac 驗證過；Intel Mac / Linux 上 Ollama 仍可裝、但 GPU 加速跟 model tag 行為可能不同、實際以官方 release notes 為準。</p></blockquote>
<h2 id="適合誰讀">適合誰讀</h2>
<table>
  <thead>
      <tr>
          <th>你是</th>
          <th>本篇對你</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>剛 clone 我的 blog repo、想跑 demo 試試看</td>
          <td><strong>從本篇開始</strong>、按步驟做</td>
      </tr>
      <tr>
          <td>想懂某個 demo 的設計取捨</td>
          <td>跑通後再進 <a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a> / <a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo</a> / <a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">permission-boundary</a></td>
      </tr>
      <tr>
          <td>想懂 Ollama / ComfyUI 安裝細節</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup</a> / <a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI setup</a></td>
      </tr>
      <tr>
          <td>想看 production 怎麼想資源評估</td>
          <td><a href="/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 Production resource planning</a></td>
      </tr>
  </tbody>
</table>
<h2 id="為什麼不是pre-builtclone-就能跑">為什麼不是「pre-built、clone 就能跑」</h2>
<p>衍生產物（<code>index.pkl</code>、<code>__pycache__/</code>、Ollama model weights、即「跑出來的 cache / index / weight」、跟 source code 區別）刻意<strong>不進 git</strong>、原因見 <a href="/blog/llm/04-applications/artifact-management/" data-link-title="4.10 衍生產物管理原理：什麼進 git、什麼不該" data-link-desc="LLM 應用的 source / derived / external 三類產物對應 git / build cache / registry、與 production 部署的 reproducibility / cost / share 取捨">4.10 衍生產物管理原理</a>。所以 clone repo 後需要：</p>
<ol>
<li>裝 Ollama daemon + 拉 model（一次性）</li>
<li>跑 <code>ingest.py</code> 建 RAG index（corpus 變動時重跑）</li>
<li>之後 demo 就能用</li>
</ol>
<p>本篇是這個流程的 step-by-step。</p>
<h2 id="step-1裝-ollama-daemonbrew-install-ollama--brew-services-start">Step 1：裝 Ollama daemon（<code>brew install ollama</code> + <code>brew services start</code>）</h2>
<blockquote>
<p>daemon = 常駐 background process、開機自動啟動、見 <a href="/blog/llm/knowledge-cards/launchd-service/" data-link-title="launchd Service" data-link-desc="macOS 原生的服務管理機制、把 process 註冊成自動啟動的 daemon 或 agent">launchd service 卡</a>。</p></blockquote>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew install ollama
</span></span><span class="line"><span class="ln">2</span><span class="cl">brew services start ollama</span></span></code></pre></div><p>驗證：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -s http://localhost:11434/api/version
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># {&#34;version&#34;:&#34;0.x.x&#34;}</span></span></span></code></pre></div><p>詳細安裝跟 troubleshooting 見 <a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup 章節</a>。</p>
<h2 id="step-2拉-modelembed--chat-兩種角色">Step 2：拉 model（embed + chat 兩種角色）</h2>
<blockquote>
<p>為什麼要拉兩個 model：RAG 需要 embedding model 把文字壓成向量做語意比對、chat model 負責根據 retrieval 結果生成回答、兩者訓練目標不同、不能互通（見 <a href="/blog/llm/03-theoretical-foundations/embedding-spaces/" data-link-title="3.1 Embedding 空間" data-link-desc="token 怎麼變成向量、為什麼相似 token 在向量空間中靠近、embedding 是怎麼學出來的">3.1 embedding 空間</a>）。</p></blockquote>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Embedding model（RAG / MCP 都要、274 MB）</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ollama pull nomic-embed-text
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Chat model（推薦從 1B 開始驗證、之後可換大）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">ollama pull gemma3:1b</span></span></code></pre></div><p>驗證：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ollama list
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># NAME                       SIZE      MODIFIED</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># gemma3:1b                  815 MB    ...</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># nomic-embed-text:latest    274 MB    ...</span></span></span></code></pre></div><p>選 chat model 大小的取捨見 <a href="/blog/llm/01-local-llm-services/model-selection-priority/" data-link-title="1.4 寫 code 場景的模型選型優先順序" data-link-desc="Gemma 4 31B MTP → Qwen3-Coder 30B → Qwen3 14B → gpt-oss 20B 的取捨與適用情境">1.4 模型選型優先順序</a>。本 quickstart 用 1B 主要驗證流程跑通；長段 daily use（需要 follow 多段格式指令、複雜推理）建議 4B / 8B 起跳（見 <a href="/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">instruction-following-test</a>）、極短句驗證 / 簡單問答 1B 也可。本系列預設用 <a href="/blog/llm/knowledge-cards/instruction-tuned/" data-link-title="Instruction-Tuned Model" data-link-desc="經過指令微調的模型：會跟著 prompt 走、回答使用者問題">instruction-tuned model</a> 變體（tag 含 <code>:Xb</code> 不含 <code>-base</code>）、適合對話 / 寫 code。</p>
<h2 id="step-3建-rag-index跑-ingestpy">Step 3：建 RAG index（跑 <code>ingest.py</code>）</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /path/to/blog
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/rag-demo/ingest.py</span></span></code></pre></div><p>預期輸出：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Found 71 markdown files under content/llm
</span></span><span class="line"><span class="ln">2</span><span class="cl">  [10/71] 86 chunks in 4.5s
</span></span><span class="line"><span class="ln">3</span><span class="cl">  ...
</span></span><span class="line"><span class="ln">4</span><span class="cl">Wrote 463 records to scripts/rag-demo/index.pkl (22.3s)</span></span></code></pre></div><p>實際數字看你的 blog content 量。Index file 在 <code>scripts/rag-demo/index.pkl</code>、3-50 MB 不等。</p>
<p>詳細的 chunking 策略、embedding 設計、為什麼 pickle、見 <a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo 章節</a>。</p>
<h2 id="step-4跑-rag--mcp--permission-demo">Step 4：跑 RAG / MCP / permission demo</h2>
<p>完成 step 1-3 後、四個 demo 都能跑了：</p>
<h3 id="rag-demo語意搜尋--llm-回答">RAG demo（語意搜尋 + LLM 回答）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/rag-demo/query.py --show-retrieved <span class="s2">&#34;你的問題&#34;</span></span></span></code></pre></div><p>例：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/rag-demo/query.py --show-retrieved <span class="s2">&#34;什麼是 MCP？&#34;</span></span></span></code></pre></div><p>預期看到 retrieved chunks（含相似度跟來源 path）+ LLM 用這些 context 生的答案。</p>
<h3 id="mcp-demostdio-json-rpc-server">MCP demo（stdio JSON-RPC server）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/mcp-demo/test_client.py</span></span></code></pre></div><p>預期看到 5 個階段的 JSON-RPC 對話：initialize / tools/list / tools/call (search_blog) / tools/call (read_chunk) / error。</p>
<h3 id="permission-boundary-demollm-mediated-file-edit">Permission boundary demo（LLM-mediated file edit）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 備份要試的檔案</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">cp content/llm/knowledge-cards/token.md /tmp/token-orig.md
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Dry-run（預設、不寫檔、印 diff）</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;加一句說明&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># 還原（如果剛剛沒用 dry-run）</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">cp /tmp/token-orig.md content/llm/knowledge-cards/token.md</span></span></code></pre></div><p>詳細的 <code>--dry-run</code> / <code>--confirm</code> / <code>--auto</code> 三種 mode 取捨見 <a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">Permission boundary 章節</a>。</p>
<h2 id="step-5可選comfyui-text-to-image-demo">Step 5（可選）：ComfyUI text-to-image demo</h2>
<p>需要額外裝 ComfyUI + 拉 SDXL model（~10 GB 磁碟）、流程獨立：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 跟 step 1 平行的軌道、見 ComfyUI setup 章節</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/Projects
</span></span><span class="line"><span class="ln">3</span><span class="cl">git clone --depth <span class="m">1</span> https://github.com/comfyanonymous/ComfyUI.git
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="nb">cd</span> ComfyUI
</span></span><span class="line"><span class="ln">5</span><span class="cl">python3 -m venv venv
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">7</span><span class="cl">pip install -r requirements.txt
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="c1"># 下載 SDXL base：~/Projects/ComfyUI/models/checkpoints/</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="c1"># 見 ComfyUI setup 章節指令</span></span></span></code></pre></div><p>啟動 + 跑 generation：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI <span class="o">&amp;&amp;</span> <span class="nb">source</span> venv/bin/activate <span class="o">&amp;&amp;</span> nohup python main.py &gt; /tmp/comfyui.log 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 等 server ready</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">until</span> curl -s -o /dev/null -w <span class="s2">&#34;%{http_code}&#34;</span> http://127.0.0.1:8188/ <span class="p">|</span> grep -q 200<span class="p">;</span> <span class="k">do</span> sleep 2<span class="p">;</span> <span class="k">done</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 跑 generation（用 repo 內的 script）</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">cd</span> /path/to/blog
</span></span><span class="line"><span class="ln">7</span><span class="cl">python3 scripts/comfyui-test/generate.py --steps <span class="m">15</span></span></span></code></pre></div><p>詳細裝法 + workflow JSON 解讀見 <a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI setup 章節</a>。</p>
<h2 id="cleanup完事釋放資源">Cleanup（完事釋放資源）</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 停 Ollama daemon</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># kill ComfyUI（如果有跑）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">pkill -9 -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># 清 build artifact（可選、可重建）</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">rm -f scripts/rag-demo/index.pkl
</span></span><span class="line"><span class="ln">9</span><span class="cl">find scripts -name __pycache__ -type d -exec rm -rf <span class="o">{}</span> +</span></span></code></pre></div><p>詳細的 resource lifecycle 跟 cleanup idiom 見 <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management 章節</a>。</p>
<h2 id="跑通後該往哪讀">跑通後該往哪讀</h2>
<table>
  <thead>
      <tr>
          <th>想懂什麼</th>
          <th>讀哪</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>「RAG 為什麼 retrieval 對 / generation 弱」</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a></td>
      </tr>
      <tr>
          <td>「MCP wire protocol 細節」</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo</a></td>
      </tr>
      <tr>
          <td>「為什麼 LLM 寫 <code>rm -rf</code> 不會真的執行」</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">Permission boundary</a></td>
      </tr>
      <tr>
          <td>「不同 model 在 instruction following 上的差距」</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">Instruction following test</a></td>
      </tr>
      <tr>
          <td>「跑 demo 占多少 RAM、怎麼釋放」</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a> + <a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG/MCP 資源 footprint</a></td>
      </tr>
      <tr>
          <td>「production 部署該怎麼想」</td>
          <td><a href="/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 Production resource planning</a></td>
      </tr>
      <tr>
          <td>「什麼該進 git、什麼不該」</td>
          <td><a href="/blog/llm/04-applications/artifact-management/" data-link-title="4.10 衍生產物管理原理：什麼進 git、什麼不該" data-link-desc="LLM 應用的 source / derived / external 三類產物對應 git / build cache / registry、與 production 部署的 reproducibility / cost / share 取捨">4.10 衍生產物管理原理</a></td>
      </tr>
  </tbody>
</table>
<h2 id="跑不過時">跑不過時</h2>
<table>
  <thead>
      <tr>
          <th>症狀</th>
          <th>對應章節</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ollama: command not found</code></td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup § 常見前置設定問題</a></td>
      </tr>
      <tr>
          <td><code>curl http://localhost:11434/api/version</code> 沒回應</td>
          <td>同上</td>
      </tr>
      <tr>
          <td><code>python3 ingest.py</code> 報 HTTP error</td>
          <td>確認 Ollama daemon 跑著、nomic-embed-text 已 pull</td>
      </tr>
      <tr>
          <td>RAG retrieval 結果都不相關</td>
          <td><a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG § Retrieval 失敗的根本原因</a></td>
      </tr>
      <tr>
          <td>MCP test_client 卡住</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo § subprocess 跟 bufsize</a></td>
      </tr>
      <tr>
          <td>一切都不對</td>
          <td><a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 排錯方法論</a></td>
      </tr>
  </tbody>
</table>
<h2 id="何時這篇會過時">何時這篇會過時</h2>
<p><strong>會變的部分</strong>：</p>
<ul>
<li><code>brew install ollama</code> 流程（macOS 跟 brew 演化）</li>
<li><code>ollama pull</code> 的具體 model tag（model 會新陳代謝）</li>
<li>Python 版本相容性（3.11 → 3.14 各有 quirk）</li>
</ul>
<p><strong>不會過時的部分</strong>：</p>
<ul>
<li>4 步驟的順序（裝 daemon → 拉 model → 建 index → 跑 demo）是 RAG / MCP / 任何 LLM 應用的通用 setup pattern</li>
<li>衍生產物（index、cache）不進 git 的設計取捨</li>
<li>Cleanup 步驟跟釋放邏輯</li>
</ul>
<p>跑指令時報錯先看 step 對應章節的 troubleshooting section、再 Google 或開 issue。</p>
]]></content:encoded></item><item><title>Hands-on：安裝 Ollama + 拉第一個 Gemma 模型</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/</guid><description>&lt;p>本篇紀錄在 Apple Silicon Mac 上裝 Ollama 並拉一個小模型驗證的完整流程。指令在 macOS 14 (Sonoma) / Homebrew 提供的環境下驗證。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-11
&lt;strong>Ollama 版本&lt;/strong>：0.23.2
&lt;strong>示範模型&lt;/strong>：&lt;code>gemma3:1b&lt;/code>（約 815 MB、選最小可運行的 Gemma 變體當驗證對象）&lt;/p>&lt;/blockquote>
&lt;h2 id="前置設定">前置設定&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>項目&lt;/th>
 &lt;th>檢查指令&lt;/th>
 &lt;th>預期&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>macOS 版本&lt;/td>
 &lt;td>&lt;code>sw_vers -productVersion&lt;/code>&lt;/td>
 &lt;td>14.x 或更新&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Apple Silicon&lt;/td>
 &lt;td>&lt;code>uname -m&lt;/code>&lt;/td>
 &lt;td>&lt;code>arm64&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Homebrew&lt;/td>
 &lt;td>&lt;code>brew --version&lt;/code>&lt;/td>
 &lt;td>4.x（任何近期版）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>磁碟空間&lt;/td>
 &lt;td>&lt;code>df -h ~&lt;/code>&lt;/td>
 &lt;td>至少 3 GB 剩餘給 runtime + 1B 模型&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>port 11434&lt;/td>
 &lt;td>&lt;code>lsof -i :11434&lt;/code>&lt;/td>
 &lt;td>無輸出（port 沒被佔）&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>表中 &lt;code>brew --version&lt;/code> 這關若還沒過、代表 Homebrew 沒裝。新機從零的安裝順序（Homebrew、PATH、bash）見 &lt;a href="https://tarrragon.github.io/blog/other/macos-%E6%96%B0%E6%A9%9F%E5%9F%BA%E7%A4%8E%E5%BB%BA%E8%A8%AD%E5%A5%97%E4%BB%B6%E7%AE%A1%E7%90%86%E8%88%87%E5%80%8B%E4%BA%BA-bin-%E7%9A%84%E8%A8%AD%E5%AE%9A%E9%A0%86%E5%BA%8F/" data-link-title="macOS 新機基礎建設：套件管理與個人 bin 的設定順序" data-link-desc="重灌或換機後底層基礎建設的依賴順序，免得後面工具裝不起來或路徑互相找不到。">macOS 新機基礎建設&lt;/a>。&lt;/p>
&lt;p>選 1B 模型只是為了驗證流程、能力很弱、實際寫 code 場景請用 14B / 31B 級。模型大小跟記憶體 / 磁碟對應關係見 &lt;a href="https://tarrragon.github.io/blog/llm/00-foundations/hardware-memory-budget/" data-link-title="0.5 Apple Silicon 記憶體預算" data-link-desc="記憶體決定能跑什麼，Q4 量化下的可運作模型對照與系統保留">0.5 Apple Silicon 記憶體預算&lt;/a>。&lt;/p>
&lt;h2 id="安裝-ollama">安裝 Ollama&lt;/h2>
&lt;p>用 Homebrew 安裝、是 macOS 上最直接的路徑：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">brew install ollama&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>執行時間在 broadband 大約 30 秒到 2 分鐘、視 dependency cache 是否已有（Ollama 依賴 mlx-c 等 Apple Silicon 加速函式庫、首次裝較久）。&lt;/p>
&lt;p>裝完看到的 caveat 訊息：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">To start ollama now and restart at login:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> brew services start ollama
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">Or, if you don&amp;#39;t want/need a background service you can just run:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> OLLAMA_FLASH_ATTENTION=&amp;#34;1&amp;#34; OLLAMA_KV_CACHE_TYPE=&amp;#34;q8_0&amp;#34; /opt/homebrew/opt/ollama/bin/ollama serve&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>兩種啟動模式：&lt;/p>
&lt;ul>
&lt;li>&lt;strong>launchd service&lt;/strong>（推薦日常用）：開機自動啟動、跑在背景。&lt;/li>
&lt;li>&lt;strong>前景手動跑&lt;/strong>：terminal 開著、關掉就停。&lt;/li>
&lt;/ul>
&lt;p>驗證 binary 路徑：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">which ollama
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1"># 應該回 /opt/homebrew/bin/ollama&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="啟動-ollama-service">啟動 Ollama Service&lt;/h2>
&lt;p>選 launchd service 模式：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">brew services start ollama&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>預期輸出：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">==&amp;gt; Successfully started `ollama` (label: homebrew.mxcl.ollama)&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這個動作做兩件事：&lt;/p>
&lt;ol>
&lt;li>註冊一個 launchd plist（macOS 開機自啟動 / 背景服務的設定檔、見 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/launchd-service/" data-link-title="launchd Service" data-link-desc="macOS 原生的服務管理機制、把 process 註冊成自動啟動的 daemon 或 agent">launchd-service 卡片&lt;/a>）到 &lt;code>~/Library/LaunchAgents/homebrew.mxcl.ollama.plist&lt;/code>。&lt;/li>
&lt;li>立刻啟動 ollama serve、之後重開機自動啟動。&lt;/li>
&lt;/ol>
&lt;p>驗證 server 真的在跑：&lt;/p></description><content:encoded><![CDATA[<p>本篇紀錄在 Apple Silicon Mac 上裝 Ollama 並拉一個小模型驗證的完整流程。指令在 macOS 14 (Sonoma) / Homebrew 提供的環境下驗證。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-11
<strong>Ollama 版本</strong>：0.23.2
<strong>示範模型</strong>：<code>gemma3:1b</code>（約 815 MB、選最小可運行的 Gemma 變體當驗證對象）</p></blockquote>
<h2 id="前置設定">前置設定</h2>
<table>
  <thead>
      <tr>
          <th>項目</th>
          <th>檢查指令</th>
          <th>預期</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>macOS 版本</td>
          <td><code>sw_vers -productVersion</code></td>
          <td>14.x 或更新</td>
      </tr>
      <tr>
          <td>Apple Silicon</td>
          <td><code>uname -m</code></td>
          <td><code>arm64</code></td>
      </tr>
      <tr>
          <td>Homebrew</td>
          <td><code>brew --version</code></td>
          <td>4.x（任何近期版）</td>
      </tr>
      <tr>
          <td>磁碟空間</td>
          <td><code>df -h ~</code></td>
          <td>至少 3 GB 剩餘給 runtime + 1B 模型</td>
      </tr>
      <tr>
          <td>port 11434</td>
          <td><code>lsof -i :11434</code></td>
          <td>無輸出（port 沒被佔）</td>
      </tr>
  </tbody>
</table>
<p>表中 <code>brew --version</code> 這關若還沒過、代表 Homebrew 沒裝。新機從零的安裝順序（Homebrew、PATH、bash）見 <a href="/blog/other/macos-%E6%96%B0%E6%A9%9F%E5%9F%BA%E7%A4%8E%E5%BB%BA%E8%A8%AD%E5%A5%97%E4%BB%B6%E7%AE%A1%E7%90%86%E8%88%87%E5%80%8B%E4%BA%BA-bin-%E7%9A%84%E8%A8%AD%E5%AE%9A%E9%A0%86%E5%BA%8F/" data-link-title="macOS 新機基礎建設：套件管理與個人 bin 的設定順序" data-link-desc="重灌或換機後底層基礎建設的依賴順序，免得後面工具裝不起來或路徑互相找不到。">macOS 新機基礎建設</a>。</p>
<p>選 1B 模型只是為了驗證流程、能力很弱、實際寫 code 場景請用 14B / 31B 級。模型大小跟記憶體 / 磁碟對應關係見 <a href="/blog/llm/00-foundations/hardware-memory-budget/" data-link-title="0.5 Apple Silicon 記憶體預算" data-link-desc="記憶體決定能跑什麼，Q4 量化下的可運作模型對照與系統保留">0.5 Apple Silicon 記憶體預算</a>。</p>
<h2 id="安裝-ollama">安裝 Ollama</h2>
<p>用 Homebrew 安裝、是 macOS 上最直接的路徑：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew install ollama</span></span></code></pre></div><p>執行時間在 broadband 大約 30 秒到 2 分鐘、視 dependency cache 是否已有（Ollama 依賴 mlx-c 等 Apple Silicon 加速函式庫、首次裝較久）。</p>
<p>裝完看到的 caveat 訊息：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">To start ollama now and restart at login:
</span></span><span class="line"><span class="ln">2</span><span class="cl">  brew services start ollama
</span></span><span class="line"><span class="ln">3</span><span class="cl">Or, if you don&#39;t want/need a background service you can just run:
</span></span><span class="line"><span class="ln">4</span><span class="cl">  OLLAMA_FLASH_ATTENTION=&#34;1&#34; OLLAMA_KV_CACHE_TYPE=&#34;q8_0&#34; /opt/homebrew/opt/ollama/bin/ollama serve</span></span></code></pre></div><p>兩種啟動模式：</p>
<ul>
<li><strong>launchd service</strong>（推薦日常用）：開機自動啟動、跑在背景。</li>
<li><strong>前景手動跑</strong>：terminal 開著、關掉就停。</li>
</ul>
<p>驗證 binary 路徑：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">which ollama
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 應該回 /opt/homebrew/bin/ollama</span></span></span></code></pre></div><h2 id="啟動-ollama-service">啟動 Ollama Service</h2>
<p>選 launchd service 模式：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew services start ollama</span></span></code></pre></div><p>預期輸出：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">==&gt; Successfully started `ollama` (label: homebrew.mxcl.ollama)</span></span></code></pre></div><p>這個動作做兩件事：</p>
<ol>
<li>註冊一個 launchd plist（macOS 開機自啟動 / 背景服務的設定檔、見 <a href="/blog/llm/knowledge-cards/launchd-service/" data-link-title="launchd Service" data-link-desc="macOS 原生的服務管理機制、把 process 註冊成自動啟動的 daemon 或 agent">launchd-service 卡片</a>）到 <code>~/Library/LaunchAgents/homebrew.mxcl.ollama.plist</code>。</li>
<li>立刻啟動 ollama serve、之後重開機自動啟動。</li>
</ol>
<p>驗證 server 真的在跑：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -s http://localhost:11434/api/version</span></span></code></pre></div><p>預期回：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span><span class="nt">&#34;version&#34;</span><span class="p">:</span><span class="s2">&#34;0.23.2&#34;</span><span class="p">}</span></span></span></code></pre></div><p>看到這個 JSON 就證明三件事：Ollama daemon 跑了、port 11434 通了、API 結構正確。</p>
<h2 id="拉第一個模型">拉第一個模型</h2>
<p>Ollama 用 <code>ollama pull</code> 從官方 registry 下載模型：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ollama pull gemma3:1b</span></span></code></pre></div><p>Gemma 3 1B 約 815 MB、broadband 約 1-2 分鐘下載。下載過程顯示多階段：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">pulling 7cd4618c1faf: 100% ▕██████████████████▏ 815 MB
</span></span><span class="line"><span class="ln">2</span><span class="cl">pulling e0a42594d802: 100% ▕██████████████████▏  358 B
</span></span><span class="line"><span class="ln">3</span><span class="cl">pulling dd084c7d92a3: 100% ▕██████████████████▏  8.4 KB
</span></span><span class="line"><span class="ln">4</span><span class="cl">pulling 3116c5225075: 100% ▕██████████████████▏   77 B
</span></span><span class="line"><span class="ln">5</span><span class="cl">pulling 120007c81bf8: 100% ▕██████████████████▏  492 B
</span></span><span class="line"><span class="ln">6</span><span class="cl">verifying sha256 digest
</span></span><span class="line"><span class="ln">7</span><span class="cl">writing manifest
</span></span><span class="line"><span class="ln">8</span><span class="cl">success</span></span></code></pre></div><p>幾個 hash blob 分別是：模型權重（最大那個）、tokenizer、template、license metadata 等。Ollama 把這些統一管理、放在 <code>~/.ollama/models/</code>。</p>
<p>驗證模型已下載：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ollama list</span></span></code></pre></div><p>預期：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">NAME         ID              SIZE      MODIFIED
</span></span><span class="line"><span class="ln">2</span><span class="cl">gemma3:1b    8648f39daa8f    815 MB    35 seconds ago</span></span></code></pre></div><h2 id="驗證-openai-相容-api">驗證 OpenAI 相容 API</h2>
<p>OpenAI 相容 API 是下游所有工具（IDE plugin、RAG pipeline、MCP server、<a href="/blog/llm/01-local-llm-services/vscode-continue-integration/" data-link-title="1.3 VS Code &#43; Continue.dev 整合" data-link-desc="安裝 Continue 擴充套件、config.json 設定、Cmd&#43;L 對話 / Cmd&#43;I 行內編輯快捷鍵">Continue.dev</a> 等）依賴的介面 contract、驗證它能正常回應、整個 stack 才走得通：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -s http://localhost:11434/v1/chat/completions <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  -H <span class="s2">&#34;Content-Type: application/json&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s1">    &#34;model&#34;: &#34;gemma3:1b&#34;,
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s1">    &#34;messages&#34;: [{&#34;role&#34;:&#34;user&#34;,&#34;content&#34;:&#34;Reply in one short sentence: what is 2+2?&#34;}],
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s1">    &#34;stream&#34;: false
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s1">  }&#39;</span></span></span></code></pre></div><p>預期回 JSON、<code>choices[0].message.content</code> 是模型回答（如 <code>&quot;2 + 2 = 4&quot;</code>）。看到合理回答就證明：</p>
<ol>
<li>Ollama 跟模型權重對接好。</li>
<li>OpenAI 相容 API 格式正常（IDE plugin 可以接）。</li>
<li>推論流程整條通。</li>
</ol>
<p>常見的失敗回應跟下一步：</p>
<ul>
<li><strong><code>{&quot;error&quot;:&quot;model 'gemma3:1b' not found, try pulling it first&quot;}</code></strong>：先跑 <code>ollama pull gemma3:1b</code>、確認 <code>ollama list</code> 看到該 tag。</li>
<li><strong><code>curl: (7) Failed to connect to localhost port 11434: Connection refused</code></strong>：server 沒在跑、回 <code>brew services list</code> 看 status、若是 stopped 跑 <code>brew services start ollama</code>。</li>
<li><strong><code>{&quot;error&quot;:&quot;json: cannot unmarshal ...&quot;}</code></strong>：請求格式錯（例如 messages 寫成 string 不是 array）、檢查 JSON body。</li>
<li><strong>連得上但長時間沒回應</strong>：第一次載入大 model 需要 30 ~ 60 秒、看 <code>~/.ollama/logs/server.log</code> 確認是否還在 loading。</li>
</ul>
<p>用內建 CLI 互動模式也行：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ollama run gemma3:1b</span></span></code></pre></div><p>進入 REPL、可以打字對話。<code>/bye</code> 離開。</p>
<p>第一次跑 <code>ollama run</code> 會把模型載入記憶體（1B 模型大約 1-2 秒）、之後對話延遲低。如果幾分鐘沒用、模型會被 unload 釋放記憶體、下次 run 又要等載入。控制行為的環境變數是 <code>OLLAMA_KEEP_ALIVE</code>（預設 5 分鐘）。</p>
<h2 id="常見前置設定問題">常見前置設定問題</h2>
<h3 id="port-11434-被佔用">Port 11434 被佔用</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">lsof -i :11434</span></span></code></pre></div><p>若已有 process 占用、可能是先前手動跑過 <code>ollama serve</code> 沒關。kill 後再 start service：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">pkill -f <span class="s2">&#34;ollama serve&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">brew services restart ollama</span></span></code></pre></div><h3 id="ollama-command-not-found裝完還是找不到"><code>ollama: command not found</code>（裝完還是找不到）</h3>
<p>Homebrew 在 Apple Silicon 預設裝到 <code>/opt/homebrew/bin</code>、shell PATH 應該已含。若沒含：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">echo</span> <span class="nv">$PATH</span> <span class="p">|</span> tr <span class="s1">&#39;:&#39;</span> <span class="s1">&#39;\n&#39;</span> <span class="p">|</span> grep homebrew
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 若沒看到 /opt/homebrew/bin、要加進 ~/.zshrc：</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">echo</span> <span class="s1">&#39;export PATH=&#34;/opt/homebrew/bin:$PATH&#34;&#39;</span> &gt;&gt; ~/.zshrc
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="nb">source</span> ~/.zshrc</span></span></code></pre></div><h3 id="server-啟動但-curl-失敗">Server 啟動但 curl 失敗</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew services list <span class="p">|</span> grep ollama</span></span></code></pre></div><p>若 status 不是 <code>started</code>、看 log：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">tail -50 /opt/homebrew/var/log/ollama.log</span></span></code></pre></div><p>常見原因：port 衝突、權限問題、上次 crash 沒清乾淨。</p>
<p>完整排錯流程見 <a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 排錯方法論</a>。</p>
<h2 id="之後想做的事">之後想做的事</h2>
<ul>
<li><strong>接 VS Code</strong>：見 <a href="/blog/llm/01-local-llm-services/vscode-continue-integration/" data-link-title="1.3 VS Code &#43; Continue.dev 整合" data-link-desc="安裝 Continue 擴充套件、config.json 設定、Cmd&#43;L 對話 / Cmd&#43;I 行內編輯快捷鍵">1.3 VS Code + Continue.dev 整合</a>。設定 <code>apiBase: http://localhost:11434</code> 就能用。</li>
<li><strong>跑更大模型</strong>：32GB+ Mac 推薦 <code>gemma4:31b-coding-mtp-bf16</code>（18 GB）。模型選擇見 <a href="/blog/llm/01-local-llm-services/model-selection-priority/" data-link-title="1.4 寫 code 場景的模型選型優先順序" data-link-desc="Gemma 4 31B MTP → Qwen3-Coder 30B → Qwen3 14B → gpt-oss 20B 的取捨與適用情境">1.4 模型選型優先順序</a>。</li>
<li><strong>加 embedding</strong>：codebase 索引要 embedding 模型：<code>ollama pull nomic-embed-text</code>（274 MB）、見 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a>。</li>
</ul>
<h2 id="升級--移除">升級 / 移除</h2>
<p>升級：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew upgrade ollama
</span></span><span class="line"><span class="ln">2</span><span class="cl">brew services restart ollama</span></span></code></pre></div><p>完整移除：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln">2</span><span class="cl">brew uninstall ollama
</span></span><span class="line"><span class="ln">3</span><span class="cl">rm -rf ~/.ollama  <span class="c1"># 清模型 cache（可選）</span></span></span></code></pre></div><h2 id="何時這篇會過時">何時這篇會過時</h2>
<ul>
<li><code>brew install ollama</code> 安裝方式跟 OpenAI 相容 API 形狀短期內不會變（生態都依賴）。</li>
<li><code>gemma3:1b</code> 這個具體 tag 預期會被新模型取代、但「拉一個小模型驗證流程」的方法不變。</li>
<li>launchd service 機制是 macOS 系統 API、不會 deprecate。</li>
</ul>
<p>讀的時候若 <code>brew install</code> 跑失敗、查 Ollama GitHub release notes；其餘驗證步驟結構通用。</p>
]]></content:encoded></item></channel></rss>