<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Production on Tarragon</title><link>https://tarrragon.github.io/blog/tags/production/</link><description>Recent content in Production on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Thu, 14 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/production/index.xml" rel="self" type="application/rss+xml"/><item><title>Frozen baseline</title><link>https://tarrragon.github.io/blog/llm/knowledge-cards/frozen-baseline/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/knowledge-cards/frozen-baseline/</guid><description>&lt;p>The core idea of a frozen baseline is to &lt;strong>run a specific prompt + model pair in production for a while, freeze it, compare every new version against it, and refresh it periodically with the refresh date recorded&lt;/strong>. It is standard practice in eval systems: it makes behavioral drift visible and avoids the common failure of always comparing against the previous version, where long-term cumulative drift goes unseen.&lt;/p>
&lt;h2 id="概念位置">Where the concept fits&lt;/h2>
&lt;p>How it relates to other eval concepts:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Concept&lt;/th>
 &lt;th>Role&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Eval set&lt;/td>
 &lt;td>The collection of test inputs&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Frozen baseline&lt;/td>
 &lt;td>A fixed "control group" prompt + model version&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Regression set&lt;/td>
 &lt;td>Collects failed cases so that a prompt change cannot break the same case again&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Production trace&lt;/td>
 &lt;td>Real traffic, sampled to top up the eval set / baseline&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Workflow:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Day 1: define the eval set + initial prompt + model
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ↓ run in production for a while (e.g. 2 weeks)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">Day 14: freeze the current prompt + model as baseline-v1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">Compare every new prompt / model version against baseline-v1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> ↓ refresh periodically (e.g. quarterly)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">Day 90: baseline-v2, with the refresh date recorded&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="設計責任">Design responsibilities&lt;/h2>
&lt;p>When articles on evals or production AI mention "frozen baseline", "baseline drift", or "regression set", this is the mechanism they mean. Practical guidance:&lt;/p>
&lt;ol>
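&lt;li>&lt;strong>Why it is needed&lt;/strong>: if every A/B comparison is made against the latest version, long-term cumulative drift is invisible and "has it improved overall?" cannot be answered. The frozen baseline is the anchor that drift is measured against.&lt;/li>
&lt;li>&lt;strong>When to freeze&lt;/strong>: once production is stable and user satisfaction is acceptable. Freeze too early and you lock in a version that is not good enough; freeze too late and there is no anchor for the drift that has already accumulated.&lt;/li>
&lt;li>&lt;strong>When to refresh&lt;/strong>: periodically (quarterly or half-yearly), or when the baseline is clearly obsolete (for example after a model upgrade or a major product redesign). Record the date of each refresh; older baselines can be kept for historical comparison.&lt;/li>
&lt;li>&lt;strong>What goes alongside a frozen baseline&lt;/strong>: a regression set (every failed case goes in and stays, which prevents fixing one case while breaking another) and production trace sampling into the eval set (which keeps the eval set in touch with real usage).&lt;/li>
&lt;li>&lt;strong>Failure mode&lt;/strong>: the baseline's input distribution is too far from production (the baseline uses lab cases while production sees wild input), so the resulting scores have no reference value. Mitigation: build the baseline's eval set from sampled production traces.&lt;/li>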
&lt;/ol>
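&lt;p>For the full eval system design, see &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 Eval design coordinate system&lt;/a>.&lt;/p></description><content:encoded><![CDATA[<p>The core idea of a frozen baseline is to <strong>run a specific prompt + model pair in production for a while, freeze it, compare every new version against it, and refresh it periodically with the refresh date recorded</strong>. It is standard practice in eval systems: it makes behavioral drift visible and avoids the common failure of always comparing against the previous version, where long-term cumulative drift goes unseen.</p>
<h2 id="概念位置">Where the concept fits</h2>
<p>How it relates to other eval concepts:</p>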
<table>
  <thead>
      <tr>
          <th>Concept</th>
          <th>Role</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Eval set</td>
          <td>The collection of test inputs</td>
      </tr>
      <tr>
          <td>Frozen baseline</td>
          <td>A fixed "control group" prompt + model version</td>
      </tr>
      <tr>
          <td>Regression set</td>
          <td>Collects failed cases so that a prompt change cannot break the same case again</td>
      </tr>
      <tr>
          <td>Production trace</td>
          <td>Real traffic, sampled to top up the eval set / baseline</td>
      </tr>
  </tbody>
</table>
<p>Workflow:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Day 1: define the eval set + initial prompt + model
</span></span><span class="line"><span class="ln">2</span><span class="cl">   ↓ run in production for a while (e.g. 2 weeks)
</span></span><span class="line"><span class="ln">3</span><span class="cl">Day 14: freeze the current prompt + model as baseline-v1
</span></span><span class="line"><span class="ln">4</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">5</span><span class="cl">Compare every new prompt / model version against baseline-v1
</span></span><span class="line"><span class="ln">6</span><span class="cl">   ↓ refresh periodically (e.g. quarterly)
</span></span><span class="line"><span class="ln">7</span><span class="cl">Day 90: baseline-v2, with the refresh date recorded</span></span></code></pre></div><h2 id="設計責任">Design responsibilities</h2>
<p>When articles on evals or production AI mention "frozen baseline", "baseline drift", or "regression set", this is the mechanism they mean. Practical guidance:</p>
<ol>
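<li><strong>Why it is needed</strong>: if every A/B comparison is made against the latest version, long-term cumulative drift is invisible and "has it improved overall?" cannot be answered. The frozen baseline is the anchor that drift is measured against.</li>
<li><strong>When to freeze</strong>: once production is stable and user satisfaction is acceptable. Freeze too early and you lock in a version that is not good enough; freeze too late and there is no anchor for the drift that has already accumulated.</li>
<li><strong>When to refresh</strong>: periodically (quarterly or half-yearly), or when the baseline is clearly obsolete (for example after a model upgrade or a major product redesign). Record the date of each refresh; older baselines can be kept for historical comparison.</li>
<li><strong>What goes alongside a frozen baseline</strong>: a regression set (every failed case goes in and stays, which prevents fixing one case while breaking another) and production trace sampling into the eval set (which keeps the eval set in touch with real usage).</li>
<li><strong>Failure mode</strong>: the baseline's input distribution is too far from production (the baseline uses lab cases while production sees wild input), so the resulting scores have no reference value. Mitigation: build the baseline's eval set from sampled production traces.</li>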
</ol>
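<p>The comparison against a frozen baseline can be sketched in Go as follows. This is a minimal illustration, not a reference implementation: the <code>Baseline</code> type, its fields, the <code>Compare</code> function, and the tolerance value are all assumptions made for this example, and scores are assumed to lie in [0, 1].</p>

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Baseline is a frozen reference run: the prompt + model pair it was produced
// with, the date it was frozen, and its score on every eval case.
// All names here are illustrative and not taken from any framework.
type Baseline struct {
	Name     string
	Prompt   string
	Model    string
	FrozenAt time.Time
	Scores   map[string]float64 // eval case ID -> score in [0, 1]
}

// Compare reports the mean score change of a candidate relative to the
// baseline, plus the IDs of cases whose score dropped by more than tolerance.
// Cases missing from the candidate run are skipped.
func Compare(b Baseline, candidate map[string]float64, tolerance float64) (meanDelta float64, regressed []string) {
	var sum float64
	var n int
	for id, base := range b.Scores {
		got, ok := candidate[id]
		if !ok {
			continue
		}
		sum += got - base
		n++
		if base-got > tolerance {
			regressed = append(regressed, id)
		}
	}
	sort.Strings(regressed) // map iteration order is random; keep output stable
	if n == 0 {
		return 0, regressed
	}
	return sum / float64(n), regressed
}

func main() {
	v1 := Baseline{
		Name:     "baseline-v1",
		Prompt:   "prompt@day14", // placeholder identifiers
		Model:    "model@day14",
		FrozenAt: time.Date(2026, 1, 14, 0, 0, 0, 0, time.UTC),
		Scores:   map[string]float64{"case-1": 0.75, "case-2": 0.5, "case-3": 1.0},
	}
	candidate := map[string]float64{"case-1": 1.0, "case-2": 0.5, "case-3": 0.5}

	delta, regressed := Compare(v1, candidate, 0.125)
	fmt.Printf("vs %s (frozen %s): mean delta %+.3f, regressed %v\n",
		v1.Name, v1.FrozenAt.Format("2006-01-02"), delta, regressed)
}
```

<p>Every candidate is scored against the same <code>Baseline</code> value until a refresh creates a new one with its own <code>FrozenAt</code>, which is what keeps cumulative drift visible.</p>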
<p>For the full eval system design, see <a href="/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 Eval design coordinate system</a>.</p>
]]></content:encoded></item><item><title>LLM Tracing</title><link>https://tarrragon.github.io/blog/llm/knowledge-cards/llm-tracing/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/knowledge-cards/llm-tracing/</guid><description>&lt;p>The core idea of LLM tracing is to &lt;strong>record every LLM call / tool call / memory op / handoff in an LLM application as a structured span, link the spans into a trace, and make the trace queryable on an observability platform&lt;/strong>. The corresponding standard is the OpenTelemetry GenAI semantic conventions (still stabilizing as of 2025). Representative platforms: LangSmith, Phoenix, Braintrust, Langfuse, Datadog APM, Logfire. It is the de facto standard for debugging and for cost / latency monitoring of production LLM applications, and it answers the question traditional logging cannot: "why did the agent take this path?"&lt;/p>
&lt;h2 id="概念位置">Where the concept fits&lt;/h2>
&lt;p>Compared with traditional logging:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>Traditional logging&lt;/th>
 &lt;th>LLM tracing&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Structure&lt;/td>
 &lt;td>String lines, searched with grep&lt;/td>
 &lt;td>Structured spans in a parent-child tree&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Correlation&lt;/td>
 &lt;td>Weak (stitched together via request-id)&lt;/td>
 &lt;td>Strong (trace-id + span parent-child links built in)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Attributes&lt;/td>
 &lt;td>Free-form key-value&lt;/td>
 &lt;td>Standardized (OTel GenAI semconv): model / temperature / token usage / cost&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Querying&lt;/td>
 &lt;td>grep / log aggregator&lt;/td>
 &lt;td>Trace explorer + filters + visualization&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>LLM-specific attributes&lt;/td>
 &lt;td>None&lt;/td>
 &lt;td>system prompt / tool calls / tokens / reasoning&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
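&lt;p>Common OTel GenAI span types (the names below are this card's labels; the spec itself distinguishes operations through the &lt;code>gen_ai.operation.name&lt;/code> attribute, so check the current spec for exact names):&lt;/p>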
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Span type&lt;/th>
 &lt;th>What it covers&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>gen_ai.client.operation&lt;/code>&lt;/td>
 &lt;td>One complete LLM API call&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>gen_ai.tool.execution&lt;/code>&lt;/td>
 &lt;td>One tool execution&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>gen_ai.agent&lt;/code>&lt;/td>
 &lt;td>One iteration of the agent loop&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>gen_ai.embeddings&lt;/code>&lt;/td>
 &lt;td>An embedding call&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>gen_ai.memory.read/write&lt;/code>&lt;/td>
 &lt;td>A memory operation&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Standard attributes on each span: &lt;code>gen_ai.system&lt;/code> (vendor), &lt;code>gen_ai.request.model&lt;/code>, &lt;code>gen_ai.usage.input_tokens&lt;/code> / &lt;code>output_tokens&lt;/code>, &lt;code>gen_ai.request.temperature&lt;/code>, and so on.&lt;/p>
&lt;h2 id="設計責任">Design responsibilities&lt;/h2>
&lt;p>When LLM observability docs or the OTel spec mention "span", "trace", or "OTel GenAI semconv", this is the framing they mean. Guidance when writing code:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>When tracing is worth adding&lt;/strong>: once you are past a personal demo, have real users / production traffic, and start hitting "why did the agent take this path?" debugging questions&lt;/li>
&lt;li>&lt;strong>Do not roll your own logging&lt;/strong>: standardize on the OTel GenAI semconv so that the backend can be swapped later (LangSmith → Phoenix → self-hosted)&lt;/li>
&lt;li>&lt;strong>Traces are not only for debugging, they are also an eval source&lt;/strong>: production traces feed back into &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-judge&lt;/a> for quality evaluation&lt;/li>
&lt;li>&lt;strong>Relationship to &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">chapter 4.20 LLM tracing&lt;/a>&lt;/strong>: this card is the definition; the chapter covers engineering practice (attribute design, cost monitoring, failure debugging flow)&lt;/li>
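&lt;/ol></description><content:encoded><![CDATA[<p>The core idea of LLM tracing is to <strong>record every LLM call / tool call / memory op / handoff in an LLM application as a structured span, link the spans into a trace, and make the trace queryable on an observability platform</strong>. The corresponding standard is the OpenTelemetry GenAI semantic conventions (still stabilizing as of 2025). Representative platforms: LangSmith, Phoenix, Braintrust, Langfuse, Datadog APM, Logfire. It is the de facto standard for debugging and for cost / latency monitoring of production LLM applications, and it answers the question traditional logging cannot: "why did the agent take this path?"</p>
<h2 id="概念位置">Where the concept fits</h2>
<p>Compared with traditional logging:</p>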
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Traditional logging</th>
          <th>LLM tracing</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Structure</td>
          <td>String lines, searched with grep</td>
          <td>Structured spans in a parent-child tree</td>
      </tr>
      <tr>
          <td>Correlation</td>
          <td>Weak (stitched together via request-id)</td>
          <td>Strong (trace-id + span parent-child links built in)</td>
      </tr>
      <tr>
          <td>Attributes</td>
          <td>Free-form key-value</td>
          <td>Standardized (OTel GenAI semconv): model / temperature / token usage / cost</td>
      </tr>
      <tr>
          <td>Querying</td>
          <td>grep / log aggregator</td>
          <td>Trace explorer + filters + visualization</td>
      </tr>
      <tr>
          <td>LLM-specific attributes</td>
          <td>None</td>
          <td>system prompt / tool calls / tokens / reasoning</td>
      </tr>
  </tbody>
</table>
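<p>Common OTel GenAI span types (the names below are this card's labels; the spec itself distinguishes operations through the <code>gen_ai.operation.name</code> attribute, so check the current spec for exact names):</p>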
<table>
  <thead>
      <tr>
          <th>Span type</th>
          <th>What it covers</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>gen_ai.client.operation</code></td>
          <td>One complete LLM API call</td>
      </tr>
      <tr>
          <td><code>gen_ai.tool.execution</code></td>
          <td>One tool execution</td>
      </tr>
      <tr>
          <td><code>gen_ai.agent</code></td>
          <td>One iteration of the agent loop</td>
      </tr>
      <tr>
          <td><code>gen_ai.embeddings</code></td>
          <td>An embedding call</td>
      </tr>
      <tr>
          <td><code>gen_ai.memory.read/write</code></td>
          <td>A memory operation</td>
      </tr>
  </tbody>
</table>
<p>Standard attributes on each span: <code>gen_ai.system</code> (vendor), <code>gen_ai.request.model</code>, <code>gen_ai.usage.input_tokens</code> / <code>output_tokens</code>, <code>gen_ai.request.temperature</code>, and so on.</p>
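<p>To make the parent-child structure and the token attributes concrete, here is a minimal Go sketch. It deliberately does not use an OpenTelemetry SDK: the <code>Span</code> type and <code>TotalTokens</code> function are hand-rolled assumptions for illustration, the span names are this card's labels, and <code>example-model</code> is a made-up model name. Only the attribute keys come from the list above.</p>

```go
package main

import "fmt"

// Span is a hand-rolled stand-in for one tracing span. A real application
// would use an OpenTelemetry SDK; this type exists only to show the shape.
type Span struct {
	Name       string
	Attributes map[string]any
	Children   []*Span
}

// TotalTokens walks the span tree and sums the token-usage attributes,
// the kind of roll-up a trace explorer performs for cost monitoring.
func TotalTokens(s *Span) (input, output int) {
	if s == nil {
		return 0, 0
	}
	if v, ok := s.Attributes["gen_ai.usage.input_tokens"].(int); ok {
		input += v
	}
	if v, ok := s.Attributes["gen_ai.usage.output_tokens"].(int); ok {
		output += v
	}
	for _, c := range s.Children {
		in, out := TotalTokens(c)
		input += in
		output += out
	}
	return input, output
}

func main() {
	// One agent iteration: an LLM call, a tool execution, a second LLM call.
	trace := &Span{
		Name: "gen_ai.agent", // label as used in this card
		Children: []*Span{
			{Name: "gen_ai.client.operation", Attributes: map[string]any{
				"gen_ai.request.model":       "example-model", // hypothetical
				"gen_ai.usage.input_tokens":  1200,
				"gen_ai.usage.output_tokens": 80,
			}},
			{Name: "gen_ai.tool.execution"},
			{Name: "gen_ai.client.operation", Attributes: map[string]any{
				"gen_ai.request.model":       "example-model",
				"gen_ai.usage.input_tokens":  1500,
				"gen_ai.usage.output_tokens": 200,
			}},
		},
	}
	in, out := TotalTokens(trace)
	fmt.Printf("input tokens: %d, output tokens: %d\n", in, out)
}
```

<p>Because usage lives on each span rather than in free-form log lines, the same walk can answer which step of the agent loop spent the tokens.</p>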
<h2 id="設計責任">Design responsibilities</h2>
<p>When LLM observability docs or the OTel spec mention "span", "trace", or "OTel GenAI semconv", this is the framing they mean. Guidance when writing code:</p>
<ol>
<li><strong>When tracing is worth adding</strong>: once you are past a personal demo, have real users / production traffic, and start hitting "why did the agent take this path?" debugging questions</li>
<li><strong>Do not roll your own logging</strong>: standardize on the OTel GenAI semconv so that the backend can be swapped later (LangSmith → Phoenix → self-hosted)</li>
<li><strong>Traces are not only for debugging, they are also an eval source</strong>: production traces feed back into <a href="/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-judge</a> for quality evaluation</li>
<li><strong>Relationship to <a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">chapter 4.20 LLM tracing</a></strong>: this card is the definition; the chapter covers engineering practice (attribute design, cost monitoring, failure debugging flow)</li>
</ol>
]]></content:encoded></item><item><title>LLM-as-Judge</title><link>https://tarrragon.github.io/blog/llm/knowledge-cards/llm-as-judge/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/knowledge-cards/llm-as-judge/</guid><description>&lt;p>The core idea of LLM-as-Judge is to &lt;strong>use one LLM (the judge) to assess the quality of the output of another LLM (the test subject)&lt;/strong>. The judge is given a rubric (scoring criteria) and an (input, output) pair, and returns a score or a pairwise preference. It is the mainstream method for production LLM evals (500-5000× cheaper than human eval, with 80%+ agreement with humans), but it has biases that must be handled (position / verbosity / self-preference).&lt;/p>
&lt;h2 id="概念位置">Where the concept fits&lt;/h2>
&lt;p>Compared with other eval approaches:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Eval approach&lt;/th>
 &lt;th>Cost&lt;/th>
 &lt;th>Speed&lt;/th>
 &lt;th>Best for&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Standard &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-benchmarks/" data-link-title="LLM Benchmarks（MMLU / HumanEval / SWE-bench 等）" data-link-desc="LLM 能力評估的標準 benchmark 集合：MMLU / HumanEval / MBPP / SWE-bench / MT-Bench 等的覆蓋範圍與失效情境">benchmark&lt;/a> (MMLU / SWE-bench, etc.)&lt;/td>
 &lt;td>Medium&lt;/td>
 &lt;td>Slow (hours per run)&lt;/td>
 &lt;td>Comparing general capability&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Human eval&lt;/td>
 &lt;td>Very high ($1-10 per item)&lt;/td>
 &lt;td>Slow&lt;/td>
 &lt;td>Gold standard, final QA&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>LLM-as-Judge (this card)&lt;/strong>&lt;/td>
 &lt;td>Low ($0.001-0.01 per item)&lt;/td>
 &lt;td>Fast&lt;/td>
 &lt;td>Production-loop evals, in-house evals for your own application&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Rule-based / regex&lt;/td>
 &lt;td>Very low&lt;/td>
 &lt;td>Instant&lt;/td>
 &lt;td>Clear-cut binary checks (e.g. whether the format is valid)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Main use cases:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>In-house benchmark&lt;/strong>: real cases from your own workflow, a rubric you write yourself, scored by the judge&lt;/li>
&lt;li>&lt;strong>Production trace eval&lt;/strong>: run the judge periodically over production traces collected with &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-tracing/" data-link-title="LLM Tracing" data-link-desc="把 LLM 應用的每次 LLM call / tool call / memory op 編成結構化 span、用 OpenTelemetry GenAI semantic conventions 標準化">LLM tracing&lt;/a> to catch quality regressions&lt;/li>
&lt;li>&lt;strong>A/B test&lt;/strong>: the judge does a pairwise comparison of two prompt / model variants&lt;/li>
&lt;li>&lt;strong>Synthetic data quality&lt;/strong>: generate fine-tuning data with a large model and have the judge filter out low-quality samples&lt;/li>
&lt;/ol>
&lt;h2 id="設計責任">Design responsibilities&lt;/h2>
&lt;p>When an eval framework or a production AI app mentions "LLM as judge", "pairwise eval", or "LLM evaluator", this is the framing they mean. Guidance when writing code:&lt;/p>
&lt;ol>
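&lt;li>&lt;strong>Choosing the judge model&lt;/strong>: use a strong model as the judge (GPT-5 / Claude 4 / a Gemini flagship); reasoning models are more stable; a judge from the same vendor as the model under test may show self-preference bias&lt;/li>
&lt;li>&lt;strong>Mitigating the three main biases&lt;/strong>:
&lt;ul>
&lt;li>&lt;strong>Position bias&lt;/strong>: run each A/B pairwise comparison twice with the positions swapped, and accept the vote only when both runs agree&lt;/li>
&lt;li>&lt;strong>Verbosity bias&lt;/strong>: add an explicit "length earns no extra credit" instruction to the rubric, or normalize for length&lt;/li>
&lt;li>&lt;strong>Self-preference bias&lt;/strong>: use 3 different judge models and take the majority&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Relationship to &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">chapter 4.21 LLM-as-judge&lt;/a>&lt;/strong>: this card is the definition; the chapter covers engineering practice (rubric design, bias mitigation, calibration, trace integration)&lt;/li>
&lt;li>&lt;strong>Not a silver bullet&lt;/strong>: high-stakes tasks (medical, legal, safety) still need human eval; the judge's ceiling is the capability of the judge model itself&lt;/li>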
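&lt;/ol></description><content:encoded><![CDATA[<p>The core idea of LLM-as-Judge is to <strong>use one LLM (the judge) to assess the quality of the output of another LLM (the test subject)</strong>. The judge is given a rubric (scoring criteria) and an (input, output) pair, and returns a score or a pairwise preference. It is the mainstream method for production LLM evals (500-5000× cheaper than human eval, with 80%+ agreement with humans), but it has biases that must be handled (position / verbosity / self-preference).</p>
<h2 id="概念位置">Where the concept fits</h2>
<p>Compared with other eval approaches:</p>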
<table>
  <thead>
      <tr>
          <th>Eval approach</th>
          <th>Cost</th>
          <th>Speed</th>
          <th>Best for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Standard <a href="/blog/llm/knowledge-cards/llm-benchmarks/" data-link-title="LLM Benchmarks（MMLU / HumanEval / SWE-bench 等）" data-link-desc="LLM 能力評估的標準 benchmark 集合：MMLU / HumanEval / MBPP / SWE-bench / MT-Bench 等的覆蓋範圍與失效情境">benchmark</a> (MMLU / SWE-bench, etc.)</td>
          <td>Medium</td>
          <td>Slow (hours per run)</td>
          <td>Comparing general capability</td>
      </tr>
      <tr>
          <td>Human eval</td>
          <td>Very high ($1-10 per item)</td>
          <td>Slow</td>
          <td>Gold standard, final QA</td>
      </tr>
      <tr>
          <td><strong>LLM-as-Judge (this card)</strong></td>
          <td>Low ($0.001-0.01 per item)</td>
          <td>Fast</td>
          <td>Production-loop evals, in-house evals for your own application</td>
      </tr>
      <tr>
          <td>Rule-based / regex</td>
          <td>Very low</td>
          <td>Instant</td>
          <td>Clear-cut binary checks (e.g. whether the format is valid)</td>
      </tr>
  </tbody>
</table>
<p>Main use cases:</p>
<ol>
<li><strong>In-house benchmark</strong>: real cases from your own workflow, a rubric you write yourself, scored by the judge</li>
<li><strong>Production trace eval</strong>: run the judge periodically over production traces collected with <a href="/blog/llm/knowledge-cards/llm-tracing/" data-link-title="LLM Tracing" data-link-desc="把 LLM 應用的每次 LLM call / tool call / memory op 編成結構化 span、用 OpenTelemetry GenAI semantic conventions 標準化">LLM tracing</a> to catch quality regressions</li>
<li><strong>A/B test</strong>: the judge does a pairwise comparison of two prompt / model variants</li>
<li><strong>Synthetic data quality</strong>: generate fine-tuning data with a large model and have the judge filter out low-quality samples</li>
</ol>
<h2 id="設計責任">Design responsibilities</h2>
<p>When an eval framework or a production AI app mentions "LLM as judge", "pairwise eval", or "LLM evaluator", this is the framing they mean. Guidance when writing code:</p>
<ol>
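<li><strong>Choosing the judge model</strong>: use a strong model as the judge (GPT-5 / Claude 4 / a Gemini flagship); reasoning models are more stable; a judge from the same vendor as the model under test may show self-preference bias</li>
<li><strong>Mitigating the three main biases</strong>:
<ul>
<li><strong>Position bias</strong>: run each A/B pairwise comparison twice with the positions swapped, and accept the vote only when both runs agree</li>
<li><strong>Verbosity bias</strong>: add an explicit "length earns no extra credit" instruction to the rubric, or normalize for length</li>
<li><strong>Self-preference bias</strong>: use 3 different judge models and take the majority</li>
</ul>
</li>
<li><strong>Relationship to <a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">chapter 4.21 LLM-as-judge</a></strong>: this card is the definition; the chapter covers engineering practice (rubric design, bias mitigation, calibration, trace integration)</li>
<li><strong>Not a silver bullet</strong>: high-stakes tasks (medical, legal, safety) still need human eval; the judge's ceiling is the capability of the judge model itself</li>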
</ol>
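<p>The position-swap and multi-judge mitigations from the list above can be sketched in Go. This is an illustration under stated assumptions: <code>Judge</code>, <code>Verdict</code>, <code>PairwiseDebiased</code>, and <code>MajorityVote</code> are names invented for this example, and the judges are plain functions standing in for real model API calls.</p>

```go
package main

import "fmt"

// Verdict is the outcome of one pairwise comparison.
type Verdict int

const (
	Tie Verdict = iota
	FirstWins
	SecondWins
)

// Judge compares two outputs for the same input and says which position won.
// In practice this would wrap an LLM API call; here it is a plain function.
type Judge func(input, first, second string) Verdict

// PairwiseDebiased runs the judge twice with positions swapped and returns
// "A", "B", or "tie". A win counts only if both orderings agree.
func PairwiseDebiased(j Judge, input, a, b string) string {
	v1 := j(input, a, b) // A shown first
	v2 := j(input, b, a) // A shown second
	switch {
	case v1 == FirstWins && v2 == SecondWins:
		return "A"
	case v1 == SecondWins && v2 == FirstWins:
		return "B"
	default:
		return "tie" // disagreement suggests position bias, so no winner
	}
}

// MajorityVote asks several judges (ideally different model families) and
// returns the label that holds a strict majority, or "tie".
func MajorityVote(judges []Judge, input, a, b string) string {
	counts := map[string]int{}
	for _, j := range judges {
		counts[PairwiseDebiased(j, input, a, b)]++
	}
	for _, label := range []string{"A", "B"} {
		if counts[label]*2 > len(judges) {
			return label
		}
	}
	return "tie"
}

func main() {
	// Toy judges standing in for real model calls (purely illustrative).
	preferLonger := func(_, first, second string) Verdict {
		switch {
		case len(first) > len(second):
			return FirstWins
		case len(first) < len(second):
			return SecondWins
		}
		return Tie
	}
	alwaysFirst := func(_, _, _ string) Verdict { return FirstWins } // pure position bias

	a, b := "short answer", "a considerably longer answer"
	fmt.Println("position-biased judge:", PairwiseDebiased(alwaysFirst, "q", a, b))
	fmt.Println("three-judge majority:", MajorityVote([]Judge{preferLonger, alwaysFirst, preferLonger}, "q", a, b))
}
```

<p>Note that the toy <code>preferLonger</code> judge is itself verbosity-biased; swapping positions does not remove that, which is why the rubric-level mitigation is listed separately.</p>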
]]></content:encoded></item><item><title>Prefix Cache</title><link>https://tarrragon.github.io/blog/llm/knowledge-cards/prefix-cache/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/knowledge-cards/prefix-cache/</guid><description>&lt;p>The core idea of a prefix cache: when multiple requests share the same prompt prefix (such as the same system prompt or the same few-shot examples), compute the &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/kv-cache/" data-link-title="KV Cache" data-link-desc="已處理 token 的 attention 中間結果暫存：避免重算、加速後續生成">KV cache&lt;/a> for that prefix once and share it across later requests, saving the repeated &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/prefill/" data-link-title="Prefill" data-link-desc="Prompt 首次處理時的計算階段：把整段輸入跑過模型、產生 KV cache">prefill&lt;/a> compute. It is a common optimization in production LLM serving and can cut latency and cost substantially; in multi-tenant settings, however, sharing a prefix cache across tenants is a direct privacy-leak surface.&lt;/p>
&lt;h2 id="概念位置">Where the concept fits&lt;/h2>
&lt;p>The role of the prefix cache in the inference flow:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Traditional inference:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> Request A: system prompt + user A → full prefill → generate
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> Request B: system prompt + user B → full prefill → generate
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ↑ system prompt computed again
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">With prefix cache enabled:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> Request A: system prompt + user A → prefill the whole prompt, cache the shared prefix → generate
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl"> Request B: system prompt + user B → reuse the cached system prefix + prefill only user B → generate
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">9&lt;/span>&lt;span class="cl"> ↑ system prompt prefill compute saved&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Benefit by scenario:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Scenario&lt;/th>
 &lt;th>Benefit&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Same system prompt, different user messages&lt;/td>
 &lt;td>Large prefill savings&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Same few-shot examples, different queries&lt;/td>
 &lt;td>Large prefill savings&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Shared long RAG context, different questions&lt;/td>
 &lt;td>Large prefill savings&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Fully independent requests (no shared prefix)&lt;/td>
 &lt;td>No benefit&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
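&lt;p>Support in mainstream inference engines (varies by version): vLLM, SGLang, llama.cpp, and others all have a prefix cache mechanism, under different names.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Fact-check note&lt;/strong>: the naming, configuration, and default tenant-isolation behavior of prefix caching vary widely across inference engines and versions; consult the official documentation of the engine in question before citing (for example &lt;a href="https://docs.vllm.ai/">vLLM Automatic Prefix Caching&lt;/a>, SGLang RadixAttention).&lt;/p>&lt;/blockquote>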
&lt;h2 id="設計責任">Design responsibilities&lt;/h2>
&lt;p>Understanding prefix caching explains two phenomena: why the latency of a production LLM service drops sharply once prefix caching is enabled (the system prompt is no longer recomputed on every request), and why prefix caching is a privacy risk in multi-tenant settings (tenant A's prefix may become visible to tenant B; see &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/llm-multi-tenant-isolation/" data-link-title="LLM 多租戶推論隔離" data-link-desc="production LLM 服務的多租戶隔離：KV cache 不共享、log / model artifact 隔離、跨用戶 prompt 洩漏面">llm-multi-tenant-isolation&lt;/a>).&lt;/p>
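&lt;p>In a production design, the prefix cache should be bucketed by tenant: sharing is allowed within a tenant, and isolation across tenants is mandatory. The isolation boundary should follow the design in the &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/tenant-boundary/" data-link-title="Tenant Boundary" data-link-desc="說明多租戶系統如何隔離不同客戶或組織的資料與資源">tenant-boundary&lt;/a> card.&lt;/p></description><content:encoded><![CDATA[<p>The core idea of a prefix cache: when multiple requests share the same prompt prefix (such as the same system prompt or the same few-shot examples), compute the <a href="/blog/llm/knowledge-cards/kv-cache/" data-link-title="KV Cache" data-link-desc="已處理 token 的 attention 中間結果暫存：避免重算、加速後續生成">KV cache</a> for that prefix once and share it across later requests, saving the repeated <a href="/blog/llm/knowledge-cards/prefill/" data-link-title="Prefill" data-link-desc="Prompt 首次處理時的計算階段：把整段輸入跑過模型、產生 KV cache">prefill</a> compute. It is a common optimization in production LLM serving and can cut latency and cost substantially; in multi-tenant settings, however, sharing a prefix cache across tenants is a direct privacy-leak surface.</p>
<h2 id="概念位置">Where the concept fits</h2>
<p>The role of the prefix cache in the inference flow:</p>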





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Traditional inference:
</span></span><span class="line"><span class="ln">2</span><span class="cl">  Request A: system prompt + user A → full prefill → generate
</span></span><span class="line"><span class="ln">3</span><span class="cl">  Request B: system prompt + user B → full prefill → generate
</span></span><span class="line"><span class="ln">4</span><span class="cl">                                       ↑ system prompt computed again
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">With prefix cache enabled:
</span></span><span class="line"><span class="ln">7</span><span class="cl">  Request A: system prompt + user A → prefill the whole prompt, cache the shared prefix → generate
</span></span><span class="line"><span class="ln">8</span><span class="cl">  Request B: system prompt + user B → reuse the cached system prefix + prefill only user B → generate
</span></span><span class="line"><span class="ln">9</span><span class="cl">                                       ↑ system prompt prefill compute saved</span></span></code></pre></div><p>Benefit by scenario:</p>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Benefit</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Same system prompt, different user messages</td>
          <td>Large prefill savings</td>
      </tr>
      <tr>
          <td>Same few-shot examples, different queries</td>
          <td>Large prefill savings</td>
      </tr>
      <tr>
          <td>Shared long RAG context, different questions</td>
          <td>Large prefill savings</td>
      </tr>
      <tr>
          <td>Fully independent requests (no shared prefix)</td>
          <td>No benefit</td>
      </tr>
  </tbody>
</table>
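<p>The savings in each scenario come down to how many leading tokens a new request shares with an already cached one. A minimal Go sketch of that arithmetic follows; the function names are assumptions for this example, and whitespace splitting stands in for a real tokenizer. Real engines match on token IDs at their own cache granularity, so treat this as an illustration of the idea only.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// SharedPrefixLen returns how many leading tokens two token sequences share.
func SharedPrefixLen(a, b []string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// PrefillWork returns the number of tokens that still need prefill for
// request, given the token sequence whose KV cache is already held.
func PrefillWork(cached, request []string) int {
	return len(request) - SharedPrefixLen(cached, request)
}

func main() {
	// Whitespace splitting is a stand-in for real tokenization.
	system := strings.Fields("You are a helpful assistant . Answer briefly .")
	reqA := append(append([]string{}, system...), strings.Fields("What is a KV cache ?")...)
	reqB := append(append([]string{}, system...), strings.Fields("What is prefill ?")...)

	fmt.Println("without cache, B prefills:", len(reqB))
	fmt.Println("with A cached, B prefills:", PrefillWork(reqA, reqB))
}
```

<p>For fully independent requests <code>SharedPrefixLen</code> is zero, so <code>PrefillWork</code> equals the full request length, matching the "no benefit" row.</p>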
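<p>Support in mainstream inference engines (varies by version): vLLM, SGLang, llama.cpp, and others all have a prefix cache mechanism, under different names.</p>
<blockquote>
<p><strong>Fact-check note</strong>: the naming, configuration, and default tenant-isolation behavior of prefix caching vary widely across inference engines and versions; consult the official documentation of the engine in question before citing (for example <a href="https://docs.vllm.ai/">vLLM Automatic Prefix Caching</a>, SGLang RadixAttention).</p></blockquote>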
<h2 id="設計責任">Design responsibilities</h2>
<p>Understanding prefix caching explains two phenomena: why the latency of a production LLM service drops sharply once prefix caching is enabled (the system prompt is no longer recomputed on every request), and why prefix caching is a privacy risk in multi-tenant settings (tenant A's prefix may become visible to tenant B; see <a href="/blog/backend/07-security-data-protection/llm-multi-tenant-isolation/" data-link-title="LLM 多租戶推論隔離" data-link-desc="production LLM 服務的多租戶隔離：KV cache 不共享、log / model artifact 隔離、跨用戶 prompt 洩漏面">llm-multi-tenant-isolation</a>).</p>
<p>In a production design, the prefix cache should be bucketed by tenant: sharing is allowed within a tenant, and isolation across tenants is mandatory. The isolation boundary should follow the design in the <a href="/blog/backend/knowledge-cards/tenant-boundary/" data-link-title="Tenant Boundary" data-link-desc="說明多租戶系統如何隔離不同客戶或組織的資料與資源">tenant-boundary</a> card.</p>
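<p>One way to bucket by tenant is to make the tenant ID part of the cache key, so identical prefixes from different tenants can never resolve to the same entry. The Go sketch below assumes this keying approach; <code>CacheKey</code> and <code>PrefixCache</code> are invented for illustration and do not describe how any particular inference engine implements isolation.</p>

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CacheKey derives the lookup key for a cached prefix. The tenant ID is part
// of the hashed material, so identical prefixes from different tenants never
// map to the same entry. Assumes tenant IDs contain no NUL byte.
func CacheKey(tenantID, prefix string) string {
	h := sha256.New()
	h.Write([]byte(tenantID))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") differ
	h.Write([]byte(prefix))
	return hex.EncodeToString(h.Sum(nil))
}

// PrefixCache is a toy in-memory cache keyed by CacheKey.
// It is not safe for concurrent use; a real one would need locking.
type PrefixCache struct {
	entries map[string][]byte // key -> opaque KV cache blob
}

func NewPrefixCache() *PrefixCache {
	return &PrefixCache{entries: map[string][]byte{}}
}

// Put stores the KV blob for a prefix under the given tenant.
func (c *PrefixCache) Put(tenantID, prefix string, kv []byte) {
	c.entries[CacheKey(tenantID, prefix)] = kv
}

// Get returns the KV blob only if this tenant stored this exact prefix.
func (c *PrefixCache) Get(tenantID, prefix string) ([]byte, bool) {
	kv, ok := c.entries[CacheKey(tenantID, prefix)]
	return kv, ok
}

func main() {
	cache := NewPrefixCache()
	system := "You are the support assistant for ACME."
	cache.Put("tenant-a", system, []byte("kv-blob"))

	_, hitA := cache.Get("tenant-a", system)
	_, hitB := cache.Get("tenant-b", system)
	fmt.Println("tenant-a hit:", hitA)
	fmt.Println("tenant-b hit:", hitB)
}
```

<p>Keying by tenant gives up cross-tenant reuse of a shared system prompt; that trade is the point, since a cross-tenant hit is exactly the leak surface described above.</p>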
]]></content:encoded></item><item><title>6.1 Graceful shutdown and signal handling</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/graceful-shutdown/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/graceful-shutdown/</guid><description>&lt;p>The core goal of &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown&lt;/a> is that, once the service receives a stop signal, it stops accepting new work and gives in-flight work a bounded amount of time to finish or clean up. Go services typically chain the stop flow together with a signal, a root context, &lt;code>http.Server.Shutdown&lt;/code>, worker contexts, and a &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout&lt;/a>.&lt;/p>
&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After this chapter you will be able to:&lt;/p>
&lt;ol>
&lt;li>Turn an OS signal into cancellation of the root context&lt;/li>
&lt;li>Stop accepting new requests with &lt;code>http.Server.Shutdown&lt;/code>&lt;/li>
&lt;li>Have workers, the hub, and &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket&lt;/a> pumps observe the same stop signal&lt;/li>
&lt;li>Design the shutdown timeout and the forced-exit boundary&lt;/li>
&lt;li>Test the stop flow of the server and the workers&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察直接結束-process-會留下不確定狀態">[Observe] Killing the process outright leaves state undetermined&lt;/h2>
&lt;p>The core risk of shutdown is an ill-defined stop flow. The service may be mid-request, WebSocket clients may still be connected, workers may be writing data, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue&lt;/a> messages may not yet have an &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/ack-nack/" data-link-title="Ack / Nack" data-link-desc="說明 consumer 如何向 broker 回報訊息處理結果">ack&lt;/a>, and diagnostics may still believe the service can take traffic.&lt;/p>
&lt;p>Common consequences of an incomplete stop:&lt;/p>
&lt;ul>
&lt;li>New requests are still accepted while the service is about to close.&lt;/li>
&lt;li>WebSocket clients never receive a close, and goroutines linger on the server side.&lt;/li>
&lt;li>A background worker is interrupted halfway through a write.&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">Readiness&lt;/a> still returns 200, so the load balancer keeps sending traffic.&lt;/li>
&lt;li>Tests finish but leave goroutines or open ports behind.&lt;/li>
&lt;/ul>
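&lt;p>Graceful shutdown makes the stop strategy predictable.&lt;/p>
&lt;h2 id="判讀shutdown-是多階段流程">[Interpret] Shutdown is a multi-stage flow&lt;/h2>
&lt;p>The core flow of graceful shutdown is: first stop taking new work, then let in-flight work wind down, and finally release resources.&lt;/p>
&lt;p>Recommended order:&lt;/p>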





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">receive SIGINT/SIGTERM
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> ▼
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">cancel root context
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> ├── readiness becomes false
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> ├── HTTP server stops accepting new requests
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> ├── workers stop consuming new jobs
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> ├── WebSocket hub unregisters clients
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> └── diagnostics/log records shutdown reason
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> ▼
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">wait within timeout
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl"> ▼
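&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">process exits&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Details differ between services, but the core does not change: the stop signal is centralized, each component completes its own cleanup, and the overall flow has a timeout.&lt;/p>
&lt;h2 id="執行signal-轉成-root-context">[Execute] Turn the signal into a root context&lt;/h2>
&lt;p>The core responsibility of signal handling is to translate operating-system signals into a cancellation signal the application understands. Since Go 1.16, &lt;code>signal.NotifyContext&lt;/code> is available for this.&lt;/p>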





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">stop&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">signal&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NotifyContext&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Background&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Interrupt&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">syscall&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SIGTERM&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="k">defer&lt;/span> &lt;span class="nf">stop&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="nx">log&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Fatal&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>ctx&lt;/code> 是 root context。HTTP server、worker、hub、diagnostics 都應從它派生出自己的 lifecycle，而不是每個元件各自監聽 signal。&lt;/p>
&lt;p>Signal handler 不應放大量清理邏輯。它只負責發出停止意圖；實際清理由各元件在自己的 ownership 邊界內完成。&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Graceful shutdown</a> 的核心目標是服務收到停止訊號後，不再接受新工作，並給既有工作一段時間完成或清理。Go 服務通常用 signal、root context、<code>http.Server.Shutdown</code>、worker context 與 <a href="/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout</a> 串起停止流程。</p>
<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter, you will be able to:</p>
<ol>
<li>Turn OS signals into root context cancellation</li>
<li>Use <code>http.Server.Shutdown</code> to stop accepting new requests</li>
<li>Have workers, the hub, and <a href="/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket</a> pumps observe the same stop signal</li>
<li>Design the shutdown timeout and the forced-exit boundary</li>
<li>Test the stop flow of servers and workers</li>
</ol>
<hr>
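<h2 id="觀察直接結束-process-會留下不確定狀態">【Observe】Killing the process outright leaves state uncertain</h2>
<p>The core risk of shutdown is an ill-defined stop flow. The service may be mid-request, WebSocket clients may still be connected, a worker may be writing data, <a href="/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue</a> messages may not yet have been <a href="/blog/backend/knowledge-cards/ack-nack/" data-link-title="Ack / Nack" data-link-desc="說明 consumer 如何向 broker 回報訊息處理結果">ack</a>ed, and diagnostics may still report that the service can take traffic.</p>
<p>Common consequences of an incomplete stop:</p>
<ul>
<li>New requests are still accepted while the service is about to shut down.</li>
<li>WebSocket clients never receive a close, and server-side goroutines linger.</li>
<li>A background worker is interrupted mid-write.</li>
<li><a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a> still returns 200, so the load balancer keeps sending traffic.</li>
<li>Goroutines or open ports are left behind after tests finish.</li>
</ul>
<p>Graceful shutdown makes the stop strategy predictable.</p>
<h2 id="判讀shutdown-是多階段流程">【Interpret】Shutdown is a multi-stage flow</h2>
<p>The core flow of graceful shutdown is to stop taking new work first, then let in-flight work wrap up, and finally release resources.</p>
<p>Recommended order:</p>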





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">receive SIGINT/SIGTERM
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">        │
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        ▼
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">cancel root context
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        │
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        ├── readiness becomes false
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        ├── HTTP server stops accepting new requests
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        ├── workers stop consuming new jobs
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        ├── WebSocket hub unregisters clients
</span></span><span class="line"><span class="ln">10</span><span class="cl">        └── diagnostics/log records shutdown reason
</span></span><span class="line"><span class="ln">11</span><span class="cl">        │
</span></span><span class="line"><span class="ln">12</span><span class="cl">        ▼
</span></span><span class="line"><span class="ln">13</span><span class="cl">wait within timeout
</span></span><span class="line"><span class="ln">14</span><span class="cl">        │
</span></span><span class="line"><span class="ln">15</span><span class="cl">        ▼
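</span></span><span class="line"><span class="ln">16</span><span class="cl">process exits</span></span></code></pre></div><p>Details vary from service to service, but the core stays the same: centralize the stop signal, let each component finish its own cleanup, and bound the whole flow with a timeout.</p>
<h2 id="執行signal-轉成-root-context">【Execute】Turn signals into a root context</h2>
<p>The core responsibility of signal handling is to turn operating-system signals into a cancellation signal the application understands. Go 1.16 and later provide <code>signal.NotifyContext</code> for this.</p>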





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">ctx</span><span class="p">,</span> <span class="nx">stop</span> <span class="o">:=</span> <span class="nx">signal</span><span class="p">.</span><span class="nf">NotifyContext</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="nx">os</span><span class="p">.</span><span class="nx">Interrupt</span><span class="p">,</span> <span class="nx">syscall</span><span class="p">.</span><span class="nx">SIGTERM</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="k">defer</span> <span class="nf">stop</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nf">run</span><span class="p">(</span><span class="nx">ctx</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">        <span class="nx">log</span><span class="p">.</span><span class="nf">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="p">}</span>
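</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>ctx</code> is the root context. The HTTP server, workers, hub, and diagnostics should each derive their own lifecycle from it rather than listening for signals individually.</p>
<p>The signal handler should not carry heavy cleanup logic. It only announces the intent to stop; the actual cleanup is done by each component within its own ownership boundary.</p>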
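<h2 id="執行http-server-用-shutdown-停止接新-request">【Execute】Stop accepting new requests with the HTTP server's Shutdown</h2>
<p><code>http.Server.Shutdown</code> closes the listeners so that no new connections are accepted, then waits for in-flight requests to finish until the context passed to it expires. It suits graceful shutdown better than calling <code>Close</code> directly. Note that <code>Shutdown</code> neither closes nor waits for hijacked connections such as WebSockets, so those need their own cleanup path.</p>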





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">RunHTTPServer</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">handler</span> <span class="nx">http</span><span class="p">.</span><span class="nx">Handler</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">server</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">Addr</span><span class="p">:</span>    <span class="s">&#34;:8080&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">Handler</span><span class="p">:</span> <span class="nx">handler</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">errCh</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="kd">chan</span> <span class="kt">error</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">go</span> <span class="kd">func</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">errCh</span> <span class="o">&lt;-</span> <span class="nx">server</span><span class="p">.</span><span class="nf">ListenAndServe</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">}()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ctx</span><span class="p">.</span><span class="nf">Done</span><span class="p">():</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="nx">shutdownCtx</span><span class="p">,</span> <span class="nx">cancel</span> <span class="o">:=</span> <span class="nx">context</span><span class="p">.</span><span class="nf">WithTimeout</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="mi">10</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="k">defer</span> <span class="nf">cancel</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">return</span> <span class="nx">server</span><span class="p">.</span><span class="nf">Shutdown</span><span class="p">(</span><span class="nx">shutdownCtx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="k">case</span> <span class="nx">err</span> <span class="o">:=</span> <span class="o">&lt;-</span><span class="nx">errCh</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="k">if</span> <span class="nx">errors</span><span class="p">.</span><span class="nf">Is</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ErrServerClosed</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">            <span class="k">return</span> <span class="kc">nil</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="k">return</span> <span class="nx">err</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The shutdown timeout is a necessary boundary. A shutdown without a timeout may wait forever on a stuck request; a timeout that is too short may cut off legitimate requests before they can wrap up.</p>
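<h2 id="策略readiness-應先變成-false">【Strategy】Readiness should become false first</h2>
<p>The core purpose of readiness is to control whether the service should receive new traffic. Once shutdown begins, readiness should become false first, before the server is stopped or in-flight work is awaited.</p>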





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">Lifecycle</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">shuttingDown</span> <span class="nx">atomic</span><span class="p">.</span><span class="nx">Bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">l</span> <span class="o">*</span><span class="nx">Lifecycle</span><span class="p">)</span> <span class="nf">BeginShutdown</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">l</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">l</span> <span class="o">*</span><span class="nx">Lifecycle</span><span class="p">)</span> <span class="nf">Ready</span><span class="p">()</span> <span class="kt">bool</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">return</span> <span class="p">!</span><span class="nx">l</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Load</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>After the signal is received:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">lifecycle</span><span class="p">.</span><span class="nf">BeginShutdown</span><span class="p">()</span>
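</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nf">cancel</span><span class="p">()</span></span></span></code></pre></div><p>This tells the load balancer or monitoring that the service should no longer receive new traffic. The process is still alive, but readiness already reflects its operational state.</p>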
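<p>For that state to be visible from outside, something has to expose it over HTTP. The following is a minimal sketch, assuming a hypothetical <code>HandleReady</code> handler and <code>readyStatus</code> helper that are not in the chapter; <code>Lifecycle</code> is repeated here only so the block compiles on its own (<code>atomic.Bool</code> needs Go 1.19 or later).</p>

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// Lifecycle repeats the type shown earlier so this block is self-contained.
type Lifecycle struct {
	shuttingDown atomic.Bool
}

func (l *Lifecycle) BeginShutdown() { l.shuttingDown.Store(true) }
func (l *Lifecycle) Ready() bool    { return !l.shuttingDown.Load() }

// readyStatus (hypothetical) maps the lifecycle state to a status code.
func readyStatus(l *Lifecycle) int {
	if l.Ready() {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// HandleReady (hypothetical) is what a /ready route could be wired to.
func HandleReady(l *Lifecycle) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(readyStatus(l))
	}
}

func main() {
	l := &Lifecycle{}
	fmt.Println(readyStatus(l)) // 200
	l.BeginShutdown()
	fmt.Println(readyStatus(l)) // 503
}
```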
<h2 id="執行背景工作要觀察-context">【Execute】Background work must observe the context</h2>
<p>The core shutdown requirement for a background worker is that every loop can observe the stop signal. Tickers, queue <a href="/blog/backend/knowledge-cards/consumer/" data-link-title="Consumer" data-link-desc="說明 consumer 如何取得等待處理的工作並產生業務結果">consumer</a>s, and the WebSocket hub should each have an exit path.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">RunWorker</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">ticker</span> <span class="o">:=</span> <span class="nx">time</span><span class="p">.</span><span class="nf">NewTicker</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">Minute</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="k">defer</span> <span class="nx">ticker</span><span class="p">.</span><span class="nf">Stop</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">for</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ctx</span><span class="p">.</span><span class="nf">Done</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">            <span class="k">return</span> <span class="nx">ctx</span><span class="p">.</span><span class="nf">Err</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ticker</span><span class="p">.</span><span class="nx">C</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nf">RunOnce</span><span class="p">(</span><span class="nx">ctx</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">                <span class="k">return</span> <span class="nx">err</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="p">}</span>
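</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>If <code>RunOnce</code> can run for a long time, it should accept the context too. Otherwise, even after the context is cancelled, the inner I/O or computation can stay blocked and the outer loop never gets to observe the cancellation.</p>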
<h2 id="策略websocket-cleanup-要回到-hub-owner">【Strategy】WebSocket cleanup belongs to the hub owner</h2>
<p>The core principle of WebSocket shutdown is to let the hub or connection manager clean up clients in one place. Do not have the signal handler iterate over connections and close them ad hoc.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">h</span> <span class="o">*</span><span class="nx">Hub</span><span class="p">)</span> <span class="nf">Run</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">for</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ctx</span><span class="p">.</span><span class="nf">Done</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">            <span class="nx">h</span><span class="p">.</span><span class="nf">closeAllClients</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="k">case</span> <span class="nx">client</span> <span class="o">:=</span> <span class="o">&lt;-</span><span class="nx">h</span><span class="p">.</span><span class="nx">register</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">            <span class="nx">h</span><span class="p">.</span><span class="nf">registerClient</span><span class="p">(</span><span class="nx">client</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">case</span> <span class="nx">client</span> <span class="o">:=</span> <span class="o">&lt;-</span><span class="nx">h</span><span class="p">.</span><span class="nx">unregister</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="nx">h</span><span class="p">.</span><span class="nf">unregisterClient</span><span class="p">(</span><span class="nx">client</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>closeAllClients</code> should go through the hub's existing owner logic to close <code>send</code>, remove subscriptions, and close connections. This continues the ownership principle from the earlier modules.</p>
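<p>A minimal sketch of that idea follows. The <code>Client</code> and <code>Hub</code> fields are assumptions, since the chapter does not show them, and a real hub would also hold subscriptions and the connection itself. The point is that <code>closeAllClients</code> reuses the same unregister path as a normal disconnect, so <code>send</code> is closed in exactly one place.</p>

```go
package main

import "fmt"

// Client and Hub are hypothetical minimal types for illustration.
type Client struct {
	send chan []byte
}

type Hub struct {
	clients map[*Client]struct{}
}

// unregisterClient is the single place where a client's send channel is
// closed. It is meant to be called only from the hub's Run goroutine,
// which is why this sketch needs no lock.
func (h *Hub) unregisterClient(c *Client) {
	if _, ok := h.clients[c]; !ok {
		return // already removed; avoids closing send twice
	}
	delete(h.clients, c)
	close(c.send)
}

// closeAllClients goes through the same owner logic for every client.
func (h *Hub) closeAllClients() {
	for c := range h.clients {
		h.unregisterClient(c)
	}
}

func main() {
	c := &Client{send: make(chan []byte, 1)}
	h := &Hub{clients: map[*Client]struct{}{c: {}}}

	h.closeAllClients()
	h.unregisterClient(c) // a late unregister is a harmless no-op

	_, open := <-c.send
	fmt.Println(len(h.clients), open) // 0 false
}
```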
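<h2 id="測試shutdown-測試要觀察明確條件">【Test】Shutdown tests should observe explicit conditions</h2>
<p>The core of a shutdown test is confirming that the stop signal makes the component exit, rather than sleeping for a fixed duration.</p>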





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestWorkerStopsOnContextCancel</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">ctx</span><span class="p">,</span> <span class="nx">cancel</span> <span class="o">:=</span> <span class="nx">context</span><span class="p">.</span><span class="nf">WithCancel</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">done</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="kd">chan</span> <span class="kd">struct</span><span class="p">{})</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">go</span> <span class="kd">func</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="k">defer</span> <span class="nb">close</span><span class="p">(</span><span class="nx">done</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="nx">_</span> <span class="p">=</span> <span class="nf">RunWorker</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">}()</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="nf">cancel</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">done</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">time</span><span class="p">.</span><span class="nf">After</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;worker did not stop&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>An HTTP server test can start the server, cancel the context, and confirm that <code>RunHTTPServer</code> returns. Tests should use a random port or <code>httptest.Server</code> to avoid conflicts from a fixed port.</p>
<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter covers the shutdown order and cleanup ownership inside the service; platform hooks, timeouts, and load balancer contracts are extended outward in the following chapter:</p>
<ul>
<li><a href="/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Advanced Go: Kubernetes, systemd, and load balancer contracts</a></li>
</ul>
<h2 id="和-go-教材的關係">Relationship to the Go course material</h2>
<p>This chapter builds on goroutine lifecycle, ticker cleanup, and platform handoff; if you want to review the language material first, read:</p>
<ul>
<li><a href="/blog/go/04-concurrency/goroutine/" data-link-title="4.1 goroutine：輕量並發工作" data-link-desc="用 goroutine 啟動並發工作，並設計清楚的退出條件">Go: goroutines: lightweight concurrent work</a></li>
<li><a href="/blog/go/03-stdlib/defer-cleanup/" data-link-title="3.8 defer 與資源清理" data-link-desc="用 defer 管理 close、unlock、cleanup 與 panic 邊界">Go: defer and resource cleanup</a></li>
<li><a href="/blog/go/04-concurrency/select/" data-link-title="4.3 select：同時等待多種事件" data-link-desc="用 select 建立事件迴圈">Go: select: waiting on multiple events at once</a></li>
<li><a href="/blog/go-advanced/03-runtime-profiling/goroutine-leak/" data-link-title="3.3 goroutine leak 偵測" data-link-desc="判斷背景工作與 client pump 是否正確退出">Go: goroutine leak detection</a></li>
<li><a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend: deployment platforms and network ingress</a></li>
</ul>
<h2 id="小結">Summary</h2>
<p>Graceful shutdown is a multi-stage flow: signals become a root context, readiness turns off first, the HTTP server stops accepting new requests, workers and the WebSocket hub observe the context and wrap up, and the whole flow is bounded by a timeout. The more centralized the stop signal and the clearer each component's ownership, the less likely the service is to leave behind stray goroutines or unreleased connections during deployment, testing, and local development.</p>
]]></content:encoded></item><item><title>6.2 Health checks and diagnostic endpoints</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/health-diagnostics/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/health-diagnostics/</guid><description>&lt;p>Health-check and diagnostic endpoints differ mainly in who uses them and what risk they carry. &lt;code>/health&lt;/code> lets monitoring or a load balancer judge whether the process is alive, &lt;code>/ready&lt;/code> decides whether it should receive traffic, and &lt;code>/debug/...&lt;/code> is for engineers troubleshooting problems and must have restricted access.&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>學完本章後，你將能夠：&lt;/p>
&lt;ol>
&lt;li>分辨 health、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness&lt;/a>、diagnostics 的語意&lt;/li>
&lt;li>設計快速穩定的 &lt;code>/health&lt;/code>&lt;/li>
&lt;li>用 &lt;code>/ready&lt;/code> 控制是否接新流量&lt;/li>
&lt;li>條件啟用 pprof、runtime stats 等診斷入口&lt;/li>
&lt;li>測試 status code 與 JSON response 合約&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察所有狀態都塞進-health-會讓監控失真">【觀察】所有狀態都塞進 health 會讓監控失真&lt;/h2>
&lt;p>Health endpoint 的核心風險是語意混亂。若 &lt;code>/health&lt;/code> 同時檢查 process、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/database/" data-link-title="Database" data-link-desc="說明 database 在後端系統中如何承擔正式狀態、查詢與一致性責任">database&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue&lt;/a>、外部 API、cache、背景同步，任何依賴短暫波動都可能讓服務被判定死亡。&lt;/p>
&lt;p>問題範例：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">/health
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ├── process alive?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> ├── database reachable?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ├── queue lag small?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> ├── external API reachable?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> └── background sync fresh?&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這些問題不應全部塞進同一個 endpoint。Process 活著、可接流量、依賴降級、工程診斷，是不同操作訊號。&lt;/p>
&lt;h2 id="判讀healthreadydiagnostics-回答不同問題">【判讀】health、ready、diagnostics 回答不同問題&lt;/h2>
&lt;p>操作 endpoint 的核心設計是每個 endpoint 只回答一個問題。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Endpoint&lt;/th>
 &lt;th>Consumer&lt;/th>
 &lt;th>Question answered&lt;/th>
 &lt;th>Effect of failure&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>/health&lt;/code>&lt;/td>
 &lt;td>process monitor&lt;/td>
 &lt;td>Is the process basically alive?&lt;/td>
 &lt;td>The process may be restarted&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>/ready&lt;/code>&lt;/td>
 &lt;td>load balancer&lt;/td>
 &lt;td>Should it receive new traffic?&lt;/td>
 &lt;td>Traffic routing is paused&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>/debug/...&lt;/code>&lt;/td>
 &lt;td>engineers&lt;/td>
 &lt;td>What is the service's internal state?&lt;/td>
 &lt;td>Must not be public&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>/metrics&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/metrics/" data-link-title="Metrics" data-link-desc="說明指標如何描述服務趨勢、容量與健康狀態">metrics&lt;/a> collector&lt;/td>
 &lt;td>Aggregatable monitoring data&lt;/td>
 &lt;td>Monitoring is missing data&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
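&lt;p>With this split, a failing external dependency does not have to get the process restarted; the service may simply be not ready, or in a degraded state.&lt;/p>
&lt;h2 id="執行health-endpoint-應簡單快速">【Execute】The health endpoint should be simple and fast&lt;/h2>
&lt;p>The core responsibility of the health endpoint is to answer quickly whether the process can handle a basic HTTP request. It should be simple, fast, and stable.&lt;/p>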





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">HandleHealth&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Method&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MethodGet&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;method not allowed&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusMethodNotAllowed&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Header&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;Content-Type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WriteHeader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusOK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">([]&lt;/span>&lt;span class="nb">byte&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">`{&amp;#34;status&amp;#34;:&amp;#34;ok&amp;#34;}`&lt;/span>&lt;span class="p">))&lt;/span>
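&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>/health&lt;/code> should not run expensive queries or depend on a large number of downstream services. If the health check itself is slow, the probe can time out, and monitoring turns a diagnostic tool into a new problem.&lt;/p>
&lt;h2 id="執行readiness-控制是否接流量">[Execution] Readiness controls whether traffic is accepted&lt;/h2>
&lt;p>The core responsibility of readiness is to answer "should this service accept new traffic right now?" It can check startup state, required dependencies, and shutdown state.&lt;/p>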





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Readiness&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">ready&lt;/span> &lt;span class="nx">atomic&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="nx">shuttingDown&lt;/span> &lt;span class="nx">atomic&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">Readiness&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Ready&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ready&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Load&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">shuttingDown&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Load&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">HandleReady&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">readiness&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">Readiness&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">HandlerFunc&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Header&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;Content-Type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">readiness&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Ready&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WriteHeader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusServiceUnavailable&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">([]&lt;/span>&lt;span class="nb">byte&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">`{&amp;#34;status&amp;#34;:&amp;#34;not_ready&amp;#34;}`&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WriteHeader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusOK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">([]&lt;/span>&lt;span class="nb">byte&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">`{&amp;#34;status&amp;#34;:&amp;#34;ready&amp;#34;}`&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>While startup is still in progress, a required background sync is not yet ready, or &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown&lt;/a> has begun, readiness should return &lt;code>503&lt;/code>. The process is still alive, but it should not receive new traffic.&lt;/p>
&lt;h2 id="策略dependency-check-依照監控語意分層">[Strategy] Layer dependency checks by monitoring semantics&lt;/h2>
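&lt;p>The key question for a dependency check is whether the failure means the process should be restarted. A temporarily unavailable database does not necessarily mean the process is broken; a restart may not fix anything and can add load instead.&lt;/p></description><content:encoded><![CDATA[<p>The core difference between health-check and diagnostics endpoints is who uses them and what risk they carry. <code>/health</code> lets a monitor or load balancer decide whether the process is alive, <code>/ready</code> decides whether it should receive traffic, and <code>/debug/...</code> is for engineers investigating problems and must be access-restricted.</p>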
<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter, you will be able to:</p>
<ol>
<li>Distinguish the semantics of health, <a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a>, and diagnostics</li>
<li>Design a fast, stable <code>/health</code></li>
<li>Use <code>/ready</code> to control whether new traffic is accepted</li>
<li>Conditionally enable diagnostic entry points such as pprof and runtime stats</li>
<li>Test the status code and JSON response contract</li>
</ol>
<hr>
<h2 id="觀察所有狀態都塞進-health-會讓監控失真">[Observation] Packing every state into health distorts monitoring</h2>
<p>The core risk of a health endpoint is muddled semantics. If <code>/health</code> checks the process, <a href="/blog/backend/knowledge-cards/database/" data-link-title="Database" data-link-desc="說明 database 在後端系統中如何承擔正式狀態、查詢與一致性責任">database</a>, <a href="/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue</a>, external APIs, cache, and background sync all at once, a brief blip in any one dependency can get the service declared dead.</p>
<p>A problematic example:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">/health
</span></span><span class="line"><span class="ln">2</span><span class="cl">  ├── process alive?
</span></span><span class="line"><span class="ln">3</span><span class="cl">  ├── database reachable?
</span></span><span class="line"><span class="ln">4</span><span class="cl">  ├── queue lag small?
</span></span><span class="line"><span class="ln">5</span><span class="cl">  ├── external API reachable?
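</span></span><span class="line"><span class="ln">6</span><span class="cl">  └── background sync fresh?</span></span></code></pre></div><p>These questions should not all be packed into a single endpoint. Process liveness, readiness for traffic, dependency degradation, and engineering diagnostics are distinct operational signals.</p>
<h2 id="判讀healthreadydiagnostics-回答不同問題">[Interpretation] health, ready, and diagnostics answer different questions</h2>
<p>The core design rule for operational endpoints is that each endpoint answers exactly one question.</p>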
<table>
  <thead>
      <tr>
          <th>Endpoint</th>
          <th>Consumer</th>
          <th>Question answered</th>
          <th>Effect of failure</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>/health</code></td>
          <td>process monitor</td>
          <td>Is the process basically alive?</td>
          <td>The process may be restarted</td>
      </tr>
      <tr>
          <td><code>/ready</code></td>
          <td>load balancer</td>
          <td>Should it receive new traffic?</td>
          <td>Traffic routing is paused</td>
      </tr>
      <tr>
          <td><code>/debug/...</code></td>
          <td>engineers</td>
          <td>What is the service's internal state?</td>
          <td>Must not be exposed publicly</td>
      </tr>
      <tr>
          <td><code>/metrics</code></td>
          <td><a href="/blog/backend/knowledge-cards/metrics/" data-link-title="Metrics" data-link-desc="說明指標如何描述服務趨勢、容量與健康狀態">metrics</a> collector</td>
          <td>Aggregatable monitoring data</td>
          <td>Gaps in monitoring data</td>
      </tr>
  </tbody>
</table>
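<p>With this split, a failing external dependency does not have to get the process restarted; the service may simply be not ready, or running in a degraded state.</p>
<h2 id="執行health-endpoint-應簡單快速">[Execution] The health endpoint should be simple and fast</h2>
<p>The core responsibility of the health endpoint is to answer quickly whether the process can handle a basic HTTP request. It should be simple, fast, and stable.</p>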





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleHealth</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">if</span> <span class="nx">r</span><span class="p">.</span><span class="nx">Method</span> <span class="o">!=</span> <span class="nx">http</span><span class="p">.</span><span class="nx">MethodGet</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">http</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="s">&#34;method not allowed&#34;</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusMethodNotAllowed</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">w</span><span class="p">.</span><span class="nf">Header</span><span class="p">().</span><span class="nf">Set</span><span class="p">(</span><span class="s">&#34;Content-Type&#34;</span><span class="p">,</span> <span class="s">&#34;application/json&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nx">w</span><span class="p">.</span><span class="nf">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="nx">_</span><span class="p">,</span> <span class="nx">_</span> <span class="p">=</span> <span class="nx">w</span><span class="p">.</span><span class="nf">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">`{&#34;status&#34;:&#34;ok&#34;}`</span><span class="p">))</span>
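</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>/health</code> should not run expensive queries or depend on a large number of downstream services. If the health check itself is slow, the probe can time out, and monitoring turns a diagnostic tool into a new problem.</p>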
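<p>One way to keep a health check from hanging is to put a hard deadline around the handler. The following is a minimal sketch, assuming the standard library's <code>http.TimeoutHandler</code> is acceptable for this purpose; <code>boundedHealth</code>, <code>statusWithin</code>, and the millisecond budgets are hypothetical names and values chosen only for illustration.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// boundedHealth (hypothetical name) wraps a health handler so a caller never
// waits longer than budget. On timeout, http.TimeoutHandler replies with
// 503 Service Unavailable and the given body.
func boundedHealth(h http.Handler, budget time.Duration) http.Handler {
	return http.TimeoutHandler(h, budget, `{"status":"timeout"}`)
}

// statusWithin is a helper for this sketch only: it simulates a health handler
// that needs delayMs to answer and returns the status code a probe would see
// when the handler is wrapped with a budgetMs deadline.
func statusWithin(delayMs, budgetMs int) int {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"status":"ok"}`))
	})
	h := boundedHealth(slow, time.Duration(budgetMs)*time.Millisecond)

	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("fast handler:", statusWithin(0, 500))
	fmt.Println("slow handler:", statusWithin(300, 20))
}
```

<p>The deadline is a safety net rather than a substitute for a cheap handler: a <code>/health</code> that regularly hits its budget is a sign that it is doing work that belongs in <code>/ready</code> or in a diagnostics endpoint.</p>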
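<h2 id="執行readiness-控制是否接流量">[Execution] Readiness controls whether traffic is accepted</h2>
<p>The core responsibility of readiness is to answer "should this service accept new traffic right now?" It can check startup state, required dependencies, and shutdown state.</p>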





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">Readiness</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">ready</span>        <span class="nx">atomic</span><span class="p">.</span><span class="nx">Bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">shuttingDown</span> <span class="nx">atomic</span><span class="p">.</span><span class="nx">Bool</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">r</span> <span class="o">*</span><span class="nx">Readiness</span><span class="p">)</span> <span class="nf">Ready</span><span class="p">()</span> <span class="kt">bool</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">return</span> <span class="nx">r</span><span class="p">.</span><span class="nx">ready</span><span class="p">.</span><span class="nf">Load</span><span class="p">()</span> <span class="o">&amp;&amp;</span> <span class="p">!</span><span class="nx">r</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Load</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleReady</span><span class="p">(</span><span class="nx">readiness</span> <span class="o">*</span><span class="nx">Readiness</span><span class="p">)</span> <span class="nx">http</span><span class="p">.</span><span class="nx">HandlerFunc</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">return</span> <span class="kd">func</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="nx">w</span><span class="p">.</span><span class="nf">Header</span><span class="p">().</span><span class="nf">Set</span><span class="p">(</span><span class="s">&#34;Content-Type&#34;</span><span class="p">,</span> <span class="s">&#34;application/json&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="k">if</span> <span class="p">!</span><span class="nx">readiness</span><span class="p">.</span><span class="nf">Ready</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">            <span class="nx">w</span><span class="p">.</span><span class="nf">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusServiceUnavailable</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">            <span class="nx">_</span><span class="p">,</span> <span class="nx">_</span> <span class="p">=</span> <span class="nx">w</span><span class="p">.</span><span class="nf">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">`{&#34;status&#34;:&#34;not_ready&#34;}`</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="nx">w</span><span class="p">.</span><span class="nf">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="nx">_</span><span class="p">,</span> <span class="nx">_</span> <span class="p">=</span> <span class="nx">w</span><span class="p">.</span><span class="nf">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">`{&#34;status&#34;:&#34;ready&#34;}`</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>While startup is still in progress, a required background sync is not yet ready, or <a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown</a> has begun, readiness should return <code>503</code>. The process is still alive, but it should not receive new traffic.</p>
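<p>The three phases described above (starting, serving, draining) can be walked through in memory. This is a sketch under assumptions: <code>MarkReady</code>, <code>BeginShutdown</code>, <code>readyCode</code>, and <code>lifecycleCodes</code> are hypothetical helpers added for illustration and are not part of the original <code>Readiness</code> type.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

type Readiness struct {
	ready        atomic.Bool
	shuttingDown atomic.Bool
}

func (r *Readiness) Ready() bool { return r.ready.Load() && !r.shuttingDown.Load() }

// MarkReady and BeginShutdown are hypothetical helpers for this sketch.
// MarkReady would be called once startup work has finished; BeginShutdown
// would be called as soon as a termination signal arrives.
func (r *Readiness) MarkReady()     { r.ready.Store(true) }
func (r *Readiness) BeginShutdown() { r.shuttingDown.Store(true) }

func HandleReady(readiness *Readiness) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if !readiness.Ready() {
			w.WriteHeader(http.StatusServiceUnavailable)
			_, _ = w.Write([]byte(`{"status":"not_ready"}`))
			return
		}
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte(`{"status":"ready"}`))
	}
}

// readyCode sends one in-memory request to /ready and returns the status code.
func readyCode(readiness *Readiness) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ready", nil)
	HandleReady(readiness).ServeHTTP(rec, req)
	return rec.Code
}

// lifecycleCodes returns the /ready status code at three points:
// during startup, after startup completes, and after shutdown begins.
func lifecycleCodes() [3]int {
	readiness := &Readiness{}
	starting := readyCode(readiness)

	readiness.MarkReady()
	serving := readyCode(readiness)

	readiness.BeginShutdown()
	draining := readyCode(readiness)

	return [3]int{starting, serving, draining}
}

func main() {
	fmt.Println(lifecycleCodes())
}
```

<p>In a real service the shutdown flag would typically be flipped before the HTTP server starts draining, so the load balancer has a chance to stop routing traffic while in-flight requests finish.</p>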
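<h2 id="策略dependency-check-依照監控語意分層">[Strategy] Layer dependency checks by monitoring semantics</h2>
<p>The key question for a dependency check is whether the failure means the process should be restarted. A temporarily unavailable database does not necessarily mean the process is broken; a restart may not fix anything and can add load instead.</p>
<p>Suggested layering:</p>
<ul>
<li><code>/health</code>: only confirms the process is alive.</li>
<li><code>/ready</code>: confirms the required dependencies are sufficient to accept new traffic.</li>
<li><code>/diagnostics/dependencies</code>: gives engineers the details.</li>
</ul>
<p>A diagnostics response can include stable fields:</p>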





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;status&#34;</span><span class="p">:</span> <span class="s2">&#34;degraded&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nt">&#34;dependencies&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nt">&#34;database&#34;</span><span class="p">:</span> <span class="s2">&#34;ok&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="nt">&#34;queue&#34;</span><span class="p">:</span> <span class="s2">&#34;lagging&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="p">}</span>
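</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Monitoring should rely on the status code and stable fields, while engineers use the body details to diagnose problems. Free text can aid reading, but it should not be the basis for monitoring rules.</p>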
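<p>A response with that shape could be produced roughly as follows. This is a sketch, not the chapter's implementation: <code>DependencyReport</code>, <code>overallStatus</code>, <code>reportJSON</code>, <code>HandleDependencies</code>, and the <code>check</code> callback are hypothetical names, and the rule "anything other than <code>ok</code> means <code>degraded</code>" is an assumption made for illustration.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// DependencyReport is the stable, machine-readable shape of the diagnostic body.
type DependencyReport struct {
	Status       string            `json:"status"`
	Dependencies map[string]string `json:"dependencies"`
}

// overallStatus reports "ok" only when every dependency is "ok".
// Treating any other state as "degraded" is an assumption of this sketch.
func overallStatus(deps map[string]string) string {
	for _, state := range deps {
		if state != "ok" {
			return "degraded"
		}
	}
	return "ok"
}

// reportJSON renders the report. encoding/json writes map keys in sorted
// order, so the output is deterministic for a given input.
func reportJSON(deps map[string]string) string {
	report := DependencyReport{Status: overallStatus(deps), Dependencies: deps}
	body, err := json.Marshal(report)
	if err != nil {
		return `{"status":"unknown"}`
	}
	return string(body)
}

// HandleDependencies serves the report for engineers. check is a hypothetical
// callback that returns the current state of each dependency by name.
func HandleDependencies(check func() map[string]string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(reportJSON(check())))
	}
}

func main() {
	fmt.Println(reportJSON(map[string]string{"database": "ok", "queue": "lagging"}))
}
```

<p>Keeping the field names and the small set of status values fixed is what lets a dashboard or alert rule read this body safely; any free-text explanation would go into a separate field that monitoring ignores.</p>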
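<h2 id="執行diagnostics-endpoint-要條件啟用">[Execution] Diagnostics endpoints should be conditionally enabled</h2>
<p>The core purpose of a diagnostics endpoint is to give engineers the data they need to investigate problems. pprof, runtime metrics, internal queue length, and goroutine count all fall into this category.</p>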





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">RegisterDiagnostics</span><span class="p">(</span><span class="nx">mux</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">ServeMux</span><span class="p">,</span> <span class="nx">enabled</span> <span class="kt">bool</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">enabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">mux</span><span class="p">.</span><span class="nf">HandleFunc</span><span class="p">(</span><span class="s">&#34;/debug/runtime&#34;</span><span class="p">,</span> <span class="nx">HandleRuntimeStats</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleRuntimeStats</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="kd">var</span> <span class="nx">stats</span> <span class="nx">runtime</span><span class="p">.</span><span class="nx">MemStats</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nx">runtime</span><span class="p">.</span><span class="nf">ReadMemStats</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">stats</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="nx">response</span> <span class="o">:=</span> <span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="s">&#34;heap_alloc&#34;</span><span class="p">:</span>  <span class="nx">stats</span><span class="p">.</span><span class="nx">HeapAlloc</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="s">&#34;num_gc&#34;</span><span class="p">:</span>      <span class="nx">stats</span><span class="p">.</span><span class="nx">NumGC</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="s">&#34;goroutines&#34;</span><span class="p">:</span>  <span class="nx">runtime</span><span class="p">.</span><span class="nf">NumGoroutine</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="nx">_</span> <span class="p">=</span> <span class="nx">json</span><span class="p">.</span><span class="nf">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">).</span><span class="nf">Encode</span><span class="p">(</span><span class="nx">response</span><span class="p">)</span>
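</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Diagnostics can reveal internal state, memory information, goroutine counts, paths, and deployment details, so they should not be exposed to ordinary users. If they need to stay enabled long-term, at minimum restrict them to an internal network or a management port, or place them behind authentication or a firewall.</p>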
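<p>The same gate can be applied to pprof by registering its handlers on a separate admin mux instead of the public one. The sketch below assumes the admin mux would be served on an internal-only address; <code>RegisterPprof</code>, <code>newMuxes</code>, and <code>pprofStatus</code> are hypothetical names, while the <code>pprof.Index</code>, <code>pprof.Cmdline</code>, <code>pprof.Profile</code>, <code>pprof.Symbol</code>, and <code>pprof.Trace</code> handlers come from the standard <code>net/http/pprof</code> package.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// RegisterPprof (hypothetical name) attaches the pprof handlers to mux only
// when enabled is true.
func RegisterPprof(mux *http.ServeMux, enabled bool) {
	if !enabled {
		return
	}
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
}

// newMuxes builds a public mux for user traffic and a separate admin mux.
// Only the admin mux can ever carry diagnostics.
func newMuxes(debugEnabled bool) (public, admin *http.ServeMux) {
	public = http.NewServeMux()
	public.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	admin = http.NewServeMux()
	RegisterPprof(admin, debugEnabled)
	return public, admin
}

// pprofStatus returns the status code for GET /debug/pprof/ on the chosen mux.
func pprofStatus(debugEnabled, onAdmin bool) int {
	public, admin := newMuxes(debugEnabled)
	mux := public
	if onAdmin {
		mux = admin
	}
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil))
	return rec.Code
}

func main() {
	fmt.Println("public mux:", pprofStatus(true, false))
	fmt.Println("admin mux, gate on:", pprofStatus(true, true))
	fmt.Println("admin mux, gate off:", pprofStatus(false, true))
}
```

<p>Note that importing <code>net/http/pprof</code> also registers its handlers on <code>http.DefaultServeMux</code> as a side effect, so this separation only holds if the public server uses its own mux rather than the default one.</p>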
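<h2 id="判讀status-code-是監控合約">[Interpretation] The status code is the monitoring contract</h2>
<p>The core contract of a health check is the status code. Monitoring systems usually look at the HTTP code and the <a href="/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout</a> first and do not interpret a complex body.</p>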
<table>
  <thead>
      <tr>
          <th>Status</th>
          <th>Meaning</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>200 OK</code></td>
          <td>Meets this endpoint's health condition</td>
      </tr>
      <tr>
          <td><code>503 Service Unavailable</code></td>
          <td>Temporarily unavailable, or should not receive traffic</td>
      </tr>
      <tr>
          <td><code>405 Method Not Allowed</code></td>
          <td>The endpoint was called with the wrong method</td>
      </tr>
      <tr>
          <td>timeout</td>
          <td>The endpoint could not respond within the expected time</td>
      </tr>
  </tbody>
</table>
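<p>The body can carry human-readable information, but monitoring should not depend on free text. For machine consumption, use stable JSON fields such as <code>status</code>, <code>reason</code>, and <code>dependencies</code>.</p>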
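<p>From the monitor's side, the contract in the table can be read without parsing the body at all. The following is a minimal sketch of such a probe; <code>classifyStatus</code>, <code>probe</code>, the verdict strings, and the two-second timeout are all hypothetical choices for illustration, and the local test server stands in for a real service.</p>

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// classifyStatus maps a status code to a probe verdict without reading the
// body. The verdict strings are assumptions of this sketch.
func classifyStatus(code int) string {
	switch code {
	case http.StatusOK:
		return "healthy"
	case http.StatusServiceUnavailable:
		return "unavailable"
	case http.StatusMethodNotAllowed:
		return "probe_misconfigured"
	default:
		return "unexpected"
	}
}

// probe performs one check. A client-side timeout is reported as its own
// verdict, separate from other transport failures.
func probe(client *http.Client, url string) string {
	resp, err := client.Get(url)
	if err != nil {
		var netErr net.Error
		if errors.As(err, &netErr) && netErr.Timeout() {
			return "timeout"
		}
		return "unreachable"
	}
	defer resp.Body.Close()
	return classifyStatus(resp.StatusCode)
}

func main() {
	// A throwaway local server that always reports "not ready".
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	fmt.Println(probe(client, srv.URL+"/ready"))
}
```

<p>Separating <code>timeout</code> from <code>unavailable</code> matters operationally: the first suggests the endpoint itself is too slow or stuck, while the second is the service deliberately asking not to receive traffic.</p>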
<h2 id="測試endpoint-測試要鎖定-status-code">[Test] Endpoint tests should pin the status code</h2>
<p>The core of endpoint testing is to verify the status code and the stable JSON fields, not the complete free text.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestReadyReturnsUnavailableWhenShuttingDown</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">readiness</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">Readiness</span><span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">readiness</span><span class="p">.</span><span class="nx">ready</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nx">readiness</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">req</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRequest</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">MethodGet</span><span class="p">,</span> <span class="s">&#34;/ready&#34;</span><span class="p">,</span> <span class="kc">nil</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">rec</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRecorder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="nf">HandleReady</span><span class="p">(</span><span class="nx">readiness</span><span class="p">).</span><span class="nf">ServeHTTP</span><span class="p">(</span><span class="nx">rec</span><span class="p">,</span> <span class="nx">req</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">if</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span> <span class="o">!=</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusServiceUnavailable</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;status = %d, want %d&#34;</span><span class="p">,</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusServiceUnavailable</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="p">}</span>
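</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The diagnostics endpoint should be tested as well: when the gate is closed, the route is either not registered or returns 404, so the diagnostic entry point is not exposed by accident.</p>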
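<p>A gate test of that kind might look like the sketch below. The constructor <code>newMux</code>, the helper <code>statusFor</code>, and the path <code>/debug/diagnostics</code> are assumptions made for illustration; they are not taken from this chapter's code. The sketch relies only on the fact that <code>http.ServeMux</code> answers 404 for a path with no registered handler.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux is a hypothetical constructor: it registers the diagnostics
// route only when the gate is enabled.
func newMux(diagnosticsEnabled bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	if diagnosticsEnabled {
		// Assumed path; a real service would pick its own.
		mux.HandleFunc("/debug/diagnostics", func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		})
	}
	return mux
}

// statusFor sends a GET request to path and returns the response status code.
func statusFor(mux *http.ServeMux, path string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(newMux(false), "/debug/diagnostics")) // 404: route not registered
	fmt.Println(statusFor(newMux(true), "/debug/diagnostics"))  // 200: gate open
}
```

<p>Leaving the route unregistered, rather than registering it and rejecting inside the handler, means a closed gate cannot leak anything through a handler bug.</p>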
<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter covers only the semantic split between health, readiness, and diagnostics. Prometheus, OpenTelemetry, and platform configuration are picked up in the following chapter:</p>
<ul>
<li><a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Advanced Go: Observability pipeline, metrics, and tracing</a></li>
</ul>
<h2 id="和-go-教材的關係">Relationship to the Go course material</h2>
<p>This chapter builds on pprof, runtime metrics, and deploy readiness. To review the language material first, read:</p>
<ul>
<li><a href="/blog/go-advanced/03-runtime-profiling/pprof/" data-link-title="3.2 pprof 基礎診斷流程" data-link-desc="用 pprof endpoint 診斷 heap、goroutine 與 CPU 問題">Go: Basic pprof diagnostic workflow</a></li>
<li><a href="/blog/go-advanced/03-runtime-profiling/gc-memory-limit/" data-link-title="3.1 GC 與 memory limit" data-link-desc="理解 debug.SetMemoryLimit 在長時間服務中的用途">Go: GC and memory limit</a></li>
<li><a href="/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go: Structured logging</a></li>
<li><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Go: Graceful shutdown and signal handling</a></li>
<li><a href="/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Advanced Go: Kubernetes, systemd, and load balancer contracts</a></li>
<li><a href="/blog/backend/04-observability/" data-link-title="模組四：可觀測性平台" data-link-desc="整理 log、metric、trace、dashboard 與 alert 的後端操作實務">Backend: Observability platform</a></li>
<li><a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend: Deployment platform and network entry points</a></li>
</ul>
<h2 id="小結">Summary</h2>
<p><code>/health</code>, <code>/ready</code>, and the diagnostics endpoint solve different problems. Health checks the basic availability of the process, readiness controls whether the service accepts new traffic, and diagnostics supports engineering investigation and should have restricted access. The status code is the monitoring contract and the JSON body carries supplementary detail; mixing these signals blurs both operational decisions and the security boundary.</p>
]]></content:encoded></item><item><title>6.3 Structured log field design</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/log-fields/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/log-fields/</guid><description>&lt;p>The core goal of structured log fields is to make each &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log&lt;/a> queryable, aggregatable, and traceable. The message is for people to read and the fields are for systems to query; important information belongs in stable fields rather than hidden in free text.&lt;/p></description><content:encoded><![CDATA[<p>The core goal of structured log fields is to make each <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a> queryable, aggregatable, and traceable. The message is for people to read and the fields are for systems to query; important information belongs in stable fields rather than hidden in free text.</p>
<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter, you will be able to:</p>
<ol>
<li>Design a stable <a href="/blog/backend/knowledge-cards/log-schema/" data-link-title="Log Schema" data-link-desc="說明結構化 log 欄位如何支援搜尋、關聯與事故排查">log schema</a></li>
<li>Support queries with <code>layer</code>, <code>request_id</code>, <code>event_type</code>, and <code>reason</code></li>
<li>Separate the responsibilities of the message and the structured fields</li>
<li>Avoid logging the same error more than once</li>
<li>Keep sensitive data out of logs</li>
</ol>
<hr>
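<h2 id="觀察自由文字-log-很難查詢">[Observe] Free-text logs are hard to query</h2>
<p>The core problem in log design is that you need to query quickly during an incident. If all the information lives in the message, queries can only rely on fuzzy string matching.</p>
<p>Unstable log:</p>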





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;event accepted for user 123 request abc&#34;</span><span class="p">)</span></span></span></code></pre></div><p>This line is fine for a person to read, but a system cannot reliably query <code>request_id=abc</code> or <code>user_id=123</code>. Once a different engineer rewords the sentence, the query may stop matching.</p>
<p>Structured log:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;event accepted&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;http&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;request_id&#34;</span><span class="p">,</span> <span class="nx">requestID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;user_id&#34;</span><span class="p">,</span> <span class="nx">userID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">Type</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>The message describes what happened and the fields provide queryable data. This is the basic division of labor in a log schema.</p>
<h2 id="判讀log-schema-是查詢合約">[Interpret] The log schema is a query contract</h2>
<p>The core rule of a log schema is that field names and value sets must stay stable. Mixing <code>request_id</code>, <code>requestID</code>, and <code>rid</code> makes queries and dashboards hard to maintain.</p>
<p>Common fields:</p>
<table>
  <thead>
      <tr>
          <th>Field</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>layer</code></td>
          <td>Which system layer the problem occurred in</td>
      </tr>
      <tr>
          <td><code>request_id</code></td>
          <td>Ties together a single HTTP request</td>
      </tr>
      <tr>
          <td><code>event_id</code></td>
          <td>Ties together an event-processing flow</td>
      </tr>
      <tr>
          <td><code>event_type</code></td>
          <td>Aggregates one kind of domain event</td>
      </tr>
      <tr>
          <td><code>client_id</code></td>
          <td>Looks up <a href="/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket</a> client behavior</td>
      </tr>
      <tr>
          <td><code>topic</code></td>
          <td>Looks up subscription or push scope</td>
      </tr>
      <tr>
          <td><code>reason</code></td>
          <td>Aggregates failure causes</td>
      </tr>
      <tr>
          <td><code>error</code></td>
          <td>Stores the error text</td>
      </tr>
  </tbody>
</table>
<p>You do not need many fields, but they must be consistent. Stable fields turn debugging from "reading a pile of text" into "querying a set of conditions".</p>
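<p>One way to keep field names from drifting is to declare the keys once and build attributes through small helpers. The sketch below is only an illustration: the constant names and the helpers <code>RequestID</code> and <code>Layer</code> are assumptions, while the key strings come from the table above.</p>

```go
package main

import (
	"log/slog"
	"os"
)

// Field keys are declared once so every package logs under the same names.
// The constant names are illustrative; the key strings follow the table above.
const (
	KeyLayer     = "layer"
	KeyRequestID = "request_id"
	KeyEventID   = "event_id"
	KeyEventType = "event_type"
	KeyClientID  = "client_id"
	KeyTopic     = "topic"
	KeyReason    = "reason"
	KeyError     = "error"
)

// RequestID builds the request_id attribute, so call sites cannot misspell the key.
func RequestID(id string) slog.Attr { return slog.String(KeyRequestID, id) }

// Layer builds the layer attribute.
func Layer(name string) slog.Attr { return slog.String(KeyLayer, name) }

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	// "order_created" is a made-up event type for the demo.
	logger.Info("event accepted",
		Layer("http"),
		RequestID("abc"),
		slog.String(KeyEventType, "order_created"),
	)
}
```

<p>With helpers like these, a misspelled key such as <code>requestID</code> becomes a compile error at the call site instead of a silent gap in a dashboard.</p>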
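<h2 id="執行layer-表示發生位置">[Apply] layer marks where it happened</h2>
<p>The core purpose of <code>layer</code> is to mark which system layer a log came from, helping engineers narrow down the problem quickly.</p>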





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span><span class="s">&#34;queue full&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;worker&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;queue&#34;</span><span class="p">,</span> <span class="s">&#34;events&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="s">&#34;queue_full&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>Common layers:</p>
<ul>
<li><code>http</code></li>
<li><code>websocket</code></li>
<li><code>worker</code></li>
<li><code>repository</code></li>
<li><code>runtime</code></li>
<li><code>diagnostics</code></li>
</ul>
<p>You do not need many names, but they should be stable. If <code>worker</code>, <code>background</code>, and <code>job_runner</code> are used interchangeably, queries become cumbersome.</p>
<h2 id="策略correlation-id-串起一次流程">[Strategy] A correlation ID ties one flow together</h2>
<p>The core goal of a <a href="/blog/backend/knowledge-cards/correlation-id/" data-link-title="Correlation ID" data-link-desc="說明跨事件或跨服務的關聯識別碼如何支援排障">Correlation ID</a> is to tie together the logs of one request or one event flow. HTTP requests commonly use <code>request_id</code>; background events can use <code>event_id</code> or <code>trace_id</code>.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">WithRequestLog</span><span class="p">(</span><span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span> <span class="nx">logger</span> <span class="o">*</span><span class="nx">slog</span><span class="p">.</span><span class="nx">Logger</span><span class="p">)</span> <span class="o">*</span><span class="nx">slog</span><span class="p">.</span><span class="nx">Logger</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">requestID</span> <span class="o">:=</span> <span class="nx">r</span><span class="p">.</span><span class="nx">Header</span><span class="p">.</span><span class="nf">Get</span><span class="p">(</span><span class="s">&#34;X-Request-ID&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="k">if</span> <span class="nx">requestID</span> <span class="o">==</span> <span class="s">&#34;&#34;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="nx">requestID</span> <span class="p">=</span> <span class="nx">uuid</span><span class="p">.</span><span class="nf">NewString</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="k">return</span> <span class="nx">logger</span><span class="p">.</span><span class="nf">With</span><span class="p">(</span><span class="s">&#34;request_id&#34;</span><span class="p">,</span> <span class="nx">requestID</span><span class="p">)</span>
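</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The handler, service, and repository that follow all use the logger that carries <code>request_id</code>. When querying a single flow, you no longer have to guess from a time range which logs are related.</p>
<p>A correlation ID should not contain sensitive data. It is an identifier for tracing, not a container for user data.</p>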
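<p>How later layers obtain that logger is a design choice. A minimal sketch, assuming the logger travels in the request context: <code>WithRequestLogger</code>, <code>LoggerFrom</code>, <code>newRequestID</code>, and <code>loggedRequestID</code> are hypothetical names, and the random hex ID stands in for <code>uuid.NewString</code> so that the sketch needs only the standard library.</p>

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
)

// loggerKey is an unexported context key type, which avoids collisions with other packages.
type loggerKey struct{}

// newRequestID returns 16 random bytes as 32 hex characters.
// It is a stand-in for a UUID library, used here only to stay within the standard library.
func newRequestID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "unknown"
	}
	return hex.EncodeToString(b)
}

// WithRequestLogger attaches a logger carrying request_id to the request context.
func WithRequestLogger(base *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		ctx := context.WithValue(r.Context(), loggerKey{}, base.With("request_id", id))
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// LoggerFrom returns the request-scoped logger, or the default logger when none is set.
func LoggerFrom(ctx context.Context) *slog.Logger {
	if l, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
		return l
	}
	return slog.Default()
}

// loggedRequestID sends one request through the middleware and returns the
// request_id found in the emitted JSON log record.
func loggedRequestID(header string) string {
	var buf bytes.Buffer
	base := slog.New(slog.NewJSONHandler(&buf, nil))
	h := WithRequestLogger(base, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		LoggerFrom(r.Context()).Info("event accepted", "layer", "http")
	}))

	req := httptest.NewRequest(http.MethodPost, "/events", nil)
	if header != "" {
		req.Header.Set("X-Request-ID", header)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)

	var record map[string]any
	if err := json.Unmarshal(buf.Bytes(), &record); err != nil {
		return ""
	}
	id, _ := record["request_id"].(string)
	return id
}

func main() {
	fmt.Println(loggedRequestID("abc")) // abc
}
```

<p>Passing the logger as an explicit parameter works as well; the context variant mainly saves threading an extra argument through every call. Because <code>X-Request-ID</code> is client-supplied, a production service would probably also bound its length and character set before logging it.</p>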
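<h2 id="執行reason-欄位讓失敗可統計">[Apply] The reason field makes failures countable</h2>
<p>The core purpose of <code>reason</code> is to turn error causes into categories that can be aggregated. The message is for people to read; <code>reason</code> is for queries and statistics.</p>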





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span><span class="s">&#34;reject event&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;http&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="s">&#34;invalid_payload&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">Type</span><span class="p">,</span>
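</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>A stable reason can answer "what was the most common rejection reason in the last hour". If the cause is written only in the message, queries depend on fuzzy string matching.</p>
<p>Reason values should stay a small set, like an enum, for example:</p>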
<ul>
<li><code>invalid_payload</code></li>
<li><code>queue_full</code></li>
<li><code>permission_denied</code></li>
<li><code>timeout</code></li>
<li><code>client_disconnected</code></li>
<li><code>dependency_unavailable</code></li>
</ul>
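<p>Keep <code>reason</code> as a small set of categories and put the full error in the <code>error</code> field. Monitoring can then aggregate causes reliably, while engineers still get diagnostic detail from the error field.</p>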
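<p>A closed set of reasons can be modeled as a string type plus a mapping function. The sketch below assumes the service defines sentinel errors. The <code>Reason</code> type, the sentinels, <code>saveNotification</code>, and the <code>unknown</code> fallback are illustrative and not part of the original list. <code>errors.Is</code> walks the wrap chain, so a wrapped error still maps to its category.</p>

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Reason is a small, closed set of failure categories used for aggregation.
type Reason string

const (
	ReasonInvalidPayload Reason = "invalid_payload"
	ReasonQueueFull      Reason = "queue_full"
	ReasonTimeout        Reason = "timeout"
	ReasonUnknown        Reason = "unknown" // assumed fallback, not in the list above
)

// Sentinel errors are hypothetical; a real service would define its own.
var (
	ErrInvalidPayload = errors.New("invalid payload")
	ErrQueueFull      = errors.New("queue full")
)

// reasonOf maps an error chain to a Reason.
func reasonOf(err error) Reason {
	switch {
	case errors.Is(err, ErrInvalidPayload):
		return ReasonInvalidPayload
	case errors.Is(err, ErrQueueFull):
		return ReasonQueueFull
	case errors.Is(err, context.DeadlineExceeded):
		return ReasonTimeout
	default:
		return ReasonUnknown
	}
}

// saveNotification simulates a lower layer that wraps a sentinel error with context.
func saveNotification() error {
	return fmt.Errorf("save notification: %w", ErrQueueFull)
}

func main() {
	err := saveNotification()
	fmt.Println(reasonOf(err)) // queue_full
	fmt.Println(err)           // save notification: queue full
}
```

<p>The category goes into <code>reason</code> and the wrapped error goes into <code>error</code>, so aggregation stays stable even when the error text changes.</p>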
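<h2 id="判讀錯誤只在負責處理的邊界記一次">[Interpret] Log an error once, at the boundary that handles it</h2>
<p>The core risk in error logging is that every layer logs the same error. This amplifies noise and makes the real point of failure hard to see during an incident.</p>
<p>Anti-pattern:</p>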





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;repository failed&#34;</span><span class="p">,</span> <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="k">return</span> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;save notification: %w&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span></span></span></code></pre></div><p>The upper layer then logs it again:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;request failed&#34;</span><span class="p">,</span> <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span></span></span></code></pre></div><p>A clearer approach is for the lower layer to wrap the error and for the upper layer to log it once, at the boundary that decides the response or the retry strategy:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">service</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">cmd</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span><span class="s">&#34;create notification failed&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;http&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="nf">reasonOf</span><span class="p">(</span><span class="nx">err</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="nf">writeError</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">    <span class="k">return</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>If a lower layer needs to add context, prefer error wrapping or a structured error over writing an <code>Error</code> log at every layer.</p>
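<p>The boundary example above calls <code>reasonOf(err)</code> without defining it. The following is a minimal sketch of what such a helper might look like, assuming the lower layers wrap errors with <code>%w</code>. The sentinel errors <code>ErrNotFound</code> and <code>ErrConflict</code>, the <code>saveNotification</code> stand-in, and the label strings are hypothetical names chosen for illustration, not part of the original code.</p>

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrNotFound and ErrConflict are hypothetical sentinel errors for this sketch.
var (
	ErrNotFound = errors.New("not found")
	ErrConflict = errors.New("conflict")
)

// saveNotification stands in for a lower layer: it adds context by wrapping
// the cause and returns it without writing any log.
func saveNotification(cause error) error {
	if cause != nil {
		return fmt.Errorf("save notification: %w", cause)
	}
	return nil
}

// reasonOf maps a wrapped error chain to a stable, low-cardinality label
// that can be used as the "reason" log field at the boundary.
func reasonOf(err error) string {
	switch {
	case err == nil:
		return ""
	case errors.Is(err, ErrNotFound):
		return "not_found"
	case errors.Is(err, ErrConflict):
		return "conflict"
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout"
	case errors.Is(err, context.Canceled):
		return "canceled"
	default:
		return "internal"
	}
}

func main() {
	err := saveNotification(ErrConflict)
	// The boundary sees both the classified reason and the full wrapped chain.
	fmt.Println(reasonOf(err), "|", err)
}
```

<p>Because <code>errors.Is</code> walks the wrapped chain, the lower layer can keep adding context while the boundary still classifies the failure with a single lookup.</p>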
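<h2 id="策略敏感資料不進-log">[Strategy] Keep sensitive data out of logs</h2>
<p>The core security boundary in log field design is to record only the data needed for diagnosis. Tokens, passwords, full cookies, complete personal data, and confidential payloads should all be excluded. Structured logs are easy to store centrally and search, so once sensitive data gets into a log, cleaning it up is expensive.</p>
<p>Fine to record:</p>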





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;user login&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;user_id&#34;</span><span class="p">,</span> <span class="nx">user</span><span class="p">.</span><span class="nx">ID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>Should be excluded:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;user login&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;password&#34;</span><span class="p">,</span> <span class="nx">password</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;token&#34;</span><span class="p">,</span> <span class="nx">token</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>If you need to diagnose a payload, record its length, a hash, or whether a field is present, not the full content.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Debug</span><span class="p">(</span><span class="s">&#34;payload received&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;payload_bytes&#34;</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="nx">body</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;payload_sha256&#34;</span><span class="p">,</span> <span class="nf">checksum</span><span class="p">(</span><span class="nx">body</span><span class="p">),</span>
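</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>Every log that is collected or stored should follow the same data protection rules. Debug logs also end up in files, centralized logging, or diagnostic bundles, so they cannot be treated as an exception channel for sensitive data.</p>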
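<p>The payload example relies on a <code>checksum</code> helper that the chapter does not show. One possible sketch, assuming a hex-encoded SHA-256 digest is what the <code>payload_sha256</code> field is meant to carry:</p>

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksum returns the hex-encoded SHA-256 digest of body.
// It lets two log lines be matched to the same payload without storing it.
func checksum(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

func main() {
	body := []byte(`{"token":"secret"}`)
	// Only the size and digest are printed, never the payload itself.
	fmt.Println(len(body), checksum(body))
}
```

<p>A plain hash of a short or guessable payload can still be brute-forced, so for low-entropy values a keyed hash such as HMAC may be the safer choice.</p>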
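<h2 id="測試log-欄位可以用-handler-驗證">[Testing] Log fields can be verified with a handler</h2>
<p>The point of testing a log schema is to confirm that important fields are present, so they do not disappear in a future refactor.</p>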





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestLogAttrsForEvent</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">event</span> <span class="o">:=</span> <span class="nx">DomainEvent</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">ID</span><span class="p">:</span>        <span class="s">&#34;evt_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">Type</span><span class="p">:</span>      <span class="s">&#34;notification.created&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="nx">SubjectID</span><span class="p">:</span> <span class="s">&#34;notification_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nx">attrs</span> <span class="o">:=</span> <span class="nf">LogAttrsForEvent</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nf">hasAttr</span><span class="p">(</span><span class="nx">attrs</span><span class="p">,</span> <span class="s">&#34;event_id&#34;</span><span class="p">,</span> <span class="s">&#34;evt_1&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;event_id attr missing&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nf">hasAttr</span><span class="p">(</span><span class="nx">attrs</span><span class="p">,</span> <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="s">&#34;notification.created&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;event_type attr missing&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">}</span>
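</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>There is no need to test the whole log line as a string. Testing the stable fields is enough, which leaves the message text some room to change.</p>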
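<p>The test above depends on <code>LogAttrsForEvent</code> and <code>hasAttr</code>, neither of which is defined in this chapter. Below is a hedged sketch of how they could be written on top of <code>log/slog</code> (Go 1.21 or later). The <code>DomainEvent</code> shape is inferred from the test, and the <code>subject_id</code> key is an assumption.</p>

```go
package main

import (
	"fmt"
	"log/slog"
)

// DomainEvent mirrors the fields used in the test; the shape is assumed.
type DomainEvent struct {
	ID        string
	Type      string
	SubjectID string
}

// LogAttrsForEvent returns the stable log fields for an event, so every
// call site logs the same keys.
func LogAttrsForEvent(e DomainEvent) []slog.Attr {
	return []slog.Attr{
		slog.String("event_id", e.ID),
		slog.String("event_type", e.Type),
		slog.String("subject_id", e.SubjectID), // assumed key name
	}
}

// hasAttr reports whether the first attr named key has the string form want.
func hasAttr(attrs []slog.Attr, key, want string) bool {
	for _, a := range attrs {
		if a.Key == key {
			return a.Value.String() == want
		}
	}
	return false
}

func main() {
	attrs := LogAttrsForEvent(DomainEvent{
		ID:        "evt_1",
		Type:      "notification.created",
		SubjectID: "notification_1",
	})
	fmt.Println(hasAttr(attrs, "event_id", "evt_1"))
}
```

<p>Keeping the field list in one function means the test guards the schema itself rather than any particular log call.</p>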
<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter covers the structured log schema inside a Go service. Centralized platforms, field standards, and privacy governance are taken further in the chapter below:</p>
<ul>
<li><a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Go Advanced: Observability pipeline, metrics, and tracing</a></li>
</ul>
<h2 id="和-go-教材的關係">Relationship to the Go materials</h2>
<p>This chapter builds on structured recording, the <a href="/blog/backend/knowledge-cards/event-log/" data-link-title="Event Log" data-link-desc="說明事件歷史如何保存、重播與支援跨服務資料重建">event log</a>, and the observability pipeline. If you want to review the language materials first, read:</p>
<ul>
<li><a href="/blog/go/06-practical/structured-recording/" data-link-title="6.5 如何新增結構化記錄欄位" data-link-desc="區分 operational log、domain event log 與狀態資料">Go: How to add a structured recording field</a></li>
<li><a href="/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go: Structured logging with log/slog</a></li>
<li><a href="/blog/go/06-practical/new-event-type/" data-link-title="6.2 如何新增一種 domain event" data-link-desc="擴展事件常數、輸入驗證與處理流程">Go: How to add a new domain event</a></li>
<li><a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Go: Observability pipeline, metrics, and tracing</a></li>
<li><a href="/blog/backend/04-observability/" data-link-title="模組四：可觀測性平台" data-link-desc="整理 log、metric、trace、dashboard 與 alert 的後端操作實務">Backend: Observability platform</a></li>
</ul>
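<h2 id="小結">Summary</h2>
<p>The value of structured logging lies in stable fields: <code>layer</code> locates the layer, <code>request_id</code> ties a request together, <code>event_id</code> ties an event together, <code>event_type</code> supports aggregation, and <code>reason</code> supports failure classification. The message is for people to read; the fields are for systems to query. A good log schema turns debugging from guesswork into queries, while avoiding sensitive data leaks and duplicate error logging.</p>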
]]></content:encoded></item><item><title>6.4 Version detection and feature gates</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/feature-gate/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/feature-gate/</guid><description>&lt;p>The core goal of a feature gate is to keep service behavior predictable when external capabilities, deployment environments, or versions differ. It explicitly manages when a feature is enabled, how it degrades when turned off, and how errors are reported.&lt;/p>
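&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After this chapter, you will be able to:&lt;/p>
&lt;ol>
&lt;li>Load feature gates centrally through a config struct&lt;/li>
&lt;li>Turn external version detection into capabilities&lt;/li>
&lt;li>Define a degrade, reject, or defer strategy for when a gate is off&lt;/li>
&lt;li>Avoid reading environment variables directly throughout the code&lt;/li>
&lt;li>Test both the feature-on and feature-off paths&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察新功能上線需要可控行為">[Observation] Shipping a new feature requires controllable behavior&lt;/h2>
&lt;p>The core need for feature gates comes from differences between production environments. A new feature may be available only in some deployments, external dependencies may be on different versions, some diagnostic entry points should be enabled only on the internal network, and some realtime capabilities need a gradual rollout first.&lt;/p>
&lt;p>Common problems without gates:&lt;/p>
&lt;ul>
&lt;li>A new feature can only be switched fully on or fully off in one step.&lt;/li>
&lt;li>The service fails outright when the deployment environment does not support the feature.&lt;/li>
&lt;li>Tests can only cover the default path.&lt;/li>
&lt;li>There is no quick way to degrade when a problem occurs.&lt;/li>
&lt;li>Environment variables are checked all over the code, making behavior hard to reason about.&lt;/li>
&lt;/ul>
&lt;p>The purpose of a feature gate is to make behavior decisions centralized, testable, and reversible.&lt;/p>
&lt;h2 id="判讀feature-gate-是行為合約">[Interpretation] A feature gate is a behavior contract&lt;/h2>
&lt;p>The core meaning of a feature gate is to control whether a piece of behavior is enabled, and what the system does when it is not. It is more than an &lt;code>if&lt;/code>; it is an operational contract.&lt;/p>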





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Features&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nx">RealtimePush&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="nx">Diagnostics&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nx">Pprof&lt;/span> &lt;span class="kt">bool&lt;/span>
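&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Gate names should describe the feature, not a temporary task. &lt;code>RealtimePush&lt;/code> is easier to maintain over time than &lt;code>NewCode&lt;/code>, and &lt;code>Diagnostics&lt;/code> is clearer than &lt;code>DebugStuff&lt;/code>.&lt;/p>
&lt;p>Gates should be loaded in one place at application startup and then passed to the components that need them. Avoid reading environment variables directly throughout the code, or both testing and reasoning become harder.&lt;/p>
&lt;h2 id="執行集中載入-feature-config">[Implementation] Load feature config centrally&lt;/h2>
&lt;p>The core responsibility of feature config is to turn environment variables, config files, or startup flags into explicit data.&lt;/p>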





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">LoadFeaturesFromEnv&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="nx">Features&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">Features&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="nx">RealtimePush&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;FEATURE_REALTIME_PUSH&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nx">Diagnostics&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;APP_DIAGNOSTICS&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="nx">Pprof&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;APP_PPROF&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Pass them to components during wiring:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">features&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">LoadFeaturesFromEnv&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="nx">mux&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewServeMux&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="nf">RegisterDiagnostics&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">mux&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Diagnostics&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="nx">publisher&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">NewPublisher&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">PublisherConfig&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nx">RealtimeEnabled&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">RealtimePush&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">publisher&lt;/span>
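&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This way, feature tests can construct &lt;code>Features&lt;/code> directly without depending on global environment variables. Environment variable parsing only needs to be covered in the tests for &lt;code>LoadFeaturesFromEnv&lt;/code>.&lt;/p>
&lt;h2 id="判讀版本偵測要轉成能力">[Interpretation] Turn version detection into capabilities&lt;/h2>
&lt;p>The core principle of version detection is to avoid comparing version strings all over the program. Convert the external version into capabilities, and have internal code check only capabilities.&lt;/p>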





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Capabilities&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsStreaming&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsMetadata&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">DetectCapabilities&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">version&lt;/span> &lt;span class="nx">semver&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Version&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">Capabilities&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">Capabilities&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsStreaming&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">version&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GTE&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">semver&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MustParse&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;2.0.0&amp;#34;&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsMetadata&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">version&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GTE&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">semver&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MustParse&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;2.1.0&amp;#34;&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Internal code should then read:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="nx">caps&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SupportsStreaming&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nf">useStreaming&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="k">return&lt;/span> &lt;span class="nf">usePolling&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This is clearer than scattering &lt;code>if version &amp;gt;= ...&lt;/code> checks, and easier to test. The version string is an external fact; a capability is an internal behavior decision.&lt;/p>
&lt;h2 id="策略gate-關閉時要有降級策略">[Strategy] A closed gate needs a degradation strategy&lt;/h2>
&lt;p>The core question for a feature gate is what to do when it is off. Common strategies are falling back, rejecting, hiding the entry point, and scheduling the work for later.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Strategy&lt;/th>
 &lt;th>Behavior&lt;/th>
 &lt;th>When to use&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/fallback/" data-link-title="Fallback" data-link-desc="說明主要路徑失敗時使用替代結果或替代流程的設計責任">fallback&lt;/a>&lt;/td>
 &lt;td>Use the old flow&lt;/td>
 &lt;td>The new capability is only an efficiency improvement&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>reject&lt;/td>
 &lt;td>Return an explicit error&lt;/td>
 &lt;td>The feature has no safe alternative&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>hide&lt;/td>
 &lt;td>Do not register the endpoint or show the entry point&lt;/td>
 &lt;td>Users should not see the feature&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>store for later&lt;/td>
 &lt;td>Store first, process later&lt;/td>
 &lt;td>The realtime capability is temporarily unavailable but the data must not be lost&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>For example, when realtime push is off, the publisher can store the data for later processing instead:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="nx">Publisher&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Publish&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span> &lt;span class="nx">DomainEvent&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">realtimeEnabled&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">realtime&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Publish&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">repository&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">SaveForLater&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">)&lt;/span>
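&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The degradation strategy has to match the data semantics. Being unable to deliver immediately does not mean an important event can simply be dropped.&lt;/p>
&lt;h2 id="執行http-endpoint-可用-gate-控制註冊或行為">[Implementation] An HTTP endpoint can use a gate to control registration or behavior&lt;/h2>
&lt;p>The core choice for an HTTP feature gate is between not registering the endpoint and registering it but returning an explicit error. The two have different semantics.&lt;/p>
&lt;p>Not registering the endpoint:&lt;/p>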





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Diagnostics&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nf">RegisterDiagnostics&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">mux&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">true&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This suits diagnostic entry points, internal tools, or features users should not see.&lt;/p>
&lt;p>Registering but returning an error:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">HandleRealtimeExport&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">features&lt;/span> &lt;span class="nx">Features&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">HandlerFunc&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">RealtimePush&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;realtime export is disabled&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusNotImplemented&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nf">startRealtimeExport&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
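&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This suits public APIs, where callers should know the feature exists but is currently unavailable.&lt;/p>
&lt;h2 id="策略gate-不應散落成巢狀-if">[Strategy] Gates should not scatter into nested ifs&lt;/h2>
&lt;p>The core maintenance risk of feature gates is that checks end up scattered across several call layers until nobody knows when a feature is actually enabled.&lt;/p></description><content:encoded><![CDATA[<p>The core goal of a feature gate is to keep service behavior predictable when external capabilities, deployment environments, or versions differ. It explicitly manages when a feature is enabled, how it degrades when turned off, and how errors are reported.</p>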
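<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter, you will be able to:</p>
<ol>
<li>Load feature gates centrally through a config struct</li>
<li>Turn external version detection into capabilities</li>
<li>Define a degrade, reject, or defer strategy for when a gate is off</li>
<li>Avoid reading environment variables directly throughout the code</li>
<li>Test both the feature-on and feature-off paths</li>
</ol>
<hr>
<h2 id="觀察新功能上線需要可控行為">[Observation] Shipping a new feature requires controllable behavior</h2>
<p>The core need for feature gates comes from differences between production environments. A new feature may be available only in some deployments, external dependencies may be on different versions, some diagnostic entry points should be enabled only on the internal network, and some realtime capabilities need a gradual rollout first.</p>
<p>Common problems without gates:</p>
<ul>
<li>A new feature can only be switched fully on or fully off in one step.</li>
<li>The service fails outright when the deployment environment does not support the feature.</li>
<li>Tests can only cover the default path.</li>
<li>There is no quick way to degrade when a problem occurs.</li>
<li>Environment variables are checked all over the code, making behavior hard to reason about.</li>
</ul>
<p>The purpose of a feature gate is to make behavior decisions centralized, testable, and reversible.</p>
<h2 id="判讀feature-gate-是行為合約">[Interpretation] A feature gate is a behavior contract</h2>
<p>The core meaning of a feature gate is to control whether a piece of behavior is enabled, and what the system does when it is not. It is more than an <code>if</code>; it is an operational contract.</p>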





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">type</span> <span class="nx">Features</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">RealtimePush</span> <span class="kt">bool</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nx">Diagnostics</span>  <span class="kt">bool</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nx">Pprof</span>        <span class="kt">bool</span>
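</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Gate names should describe the feature, not a temporary task. <code>RealtimePush</code> is easier to maintain over time than <code>NewCode</code>, and <code>Diagnostics</code> is clearer than <code>DebugStuff</code>.</p>
<p>Gates should be loaded in one place at application startup and then passed to the components that need them. Avoid reading environment variables directly throughout the code, or both testing and reasoning become harder.</p>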
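<p>To make the "load once, pass as data" idea concrete, here is a minimal sketch under a few assumptions: <code>loadFeatures</code> takes an injected lookup function (a hypothetical variation, so tests never touch the process environment), and <code>deliveryMode</code> is a made-up component that only sees the resulting <code>Features</code> value. The environment variable names follow the ones used elsewhere in this chapter.</p>

```go
package main

import (
	"fmt"
	"os"
)

// Features is the gate struct from this chapter.
type Features struct {
	RealtimePush bool
	Diagnostics  bool
	Pprof        bool
}

// loadFeatures builds Features from a lookup function. Production code
// passes os.Getenv; tests can pass a map-backed lookup instead.
func loadFeatures(getenv func(string) string) Features {
	on := func(key string) bool { return getenv(key) == "1" }
	return Features{
		RealtimePush: on("FEATURE_REALTIME_PUSH"),
		Diagnostics:  on("APP_DIAGNOSTICS"),
		Pprof:        on("APP_PPROF"),
	}
}

// deliveryMode is a hypothetical component: it receives the gate as plain
// data and never reads the environment itself.
func deliveryMode(f Features) string {
	if f.RealtimePush {
		return "realtime"
	}
	return "store_for_later"
}

func main() {
	// The environment is read exactly once, at startup.
	features := loadFeatures(os.Getenv)
	fmt.Println(deliveryMode(features))
}
```

<p>With this shape, a test can exercise both paths by calling <code>deliveryMode(Features{RealtimePush: true})</code> and <code>deliveryMode(Features{})</code> without setting any environment variable.</p>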
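<h2 id="執行集中載入-feature-config">[Implementation] Load feature config centrally</h2>
<p>The core responsibility of feature config is to turn environment variables, config files, or startup flags into explicit data.</p>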





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">LoadFeaturesFromEnv</span><span class="p">()</span> <span class="nx">Features</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">return</span> <span class="nx">Features</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="nx">RealtimePush</span><span class="p">:</span> <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;FEATURE_REALTIME_PUSH&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="nx">Diagnostics</span><span class="p">:</span>  <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;APP_DIAGNOSTICS&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="nx">Pprof</span><span class="p">:</span>        <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;APP_PPROF&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Pass the values to components during assembly:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">features</span> <span class="o">:=</span> <span class="nf">LoadFeaturesFromEnv</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nx">mux</span> <span class="o">:=</span> <span class="nx">http</span><span class="p">.</span><span class="nf">NewServeMux</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nf">RegisterDiagnostics</span><span class="p">(</span><span class="nx">mux</span><span class="p">,</span> <span class="nx">features</span><span class="p">.</span><span class="nx">Diagnostics</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">publisher</span> <span class="o">:=</span> <span class="nf">NewPublisher</span><span class="p">(</span><span class="nx">PublisherConfig</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">RealtimeEnabled</span><span class="p">:</span> <span class="nx">features</span><span class="p">.</span><span class="nx">RealtimePush</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">})</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nx">_</span> <span class="p">=</span> <span class="nx">publisher</span>
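</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>With this setup, feature tests can construct <code>Features</code> directly without depending on global environment variables. Environment variable parsing only needs coverage in the tests for <code>LoadFeaturesFromEnv</code>.</p>
<h2 id="判讀版本偵測要轉成能力">[Interpretation] Turn version detection into capabilities</h2>
<p>The core principle of version detection is to keep version-string comparisons from spreading across the program. Convert the external version into capabilities once, and let internal code check only capabilities.</p>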





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">Capabilities</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">SupportsStreaming</span> <span class="kt">bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">SupportsMetadata</span>  <span class="kt">bool</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="nf">DetectCapabilities</span><span class="p">(</span><span class="nx">version</span> <span class="nx">semver</span><span class="p">.</span><span class="nx">Version</span><span class="p">)</span> <span class="nx">Capabilities</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">return</span> <span class="nx">Capabilities</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">SupportsStreaming</span><span class="p">:</span> <span class="nx">version</span><span class="p">.</span><span class="nf">GTE</span><span class="p">(</span><span class="nx">semver</span><span class="p">.</span><span class="nf">MustParse</span><span class="p">(</span><span class="s">&#34;2.0.0&#34;</span><span class="p">)),</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">SupportsMetadata</span><span class="p">:</span>  <span class="nx">version</span><span class="p">.</span><span class="nf">GTE</span><span class="p">(</span><span class="nx">semver</span><span class="p">.</span><span class="nf">MustParse</span><span class="p">(</span><span class="s">&#34;2.1.0&#34;</span><span class="p">)),</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Internal code should then read:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">caps</span><span class="p">.</span><span class="nx">SupportsStreaming</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">return</span> <span class="nf">useStreaming</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">return</span> <span class="nf">usePolling</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span></span></span></code></pre></div><p>This is clearer than scattering <code>if version &gt;= ...</code> checks everywhere, and easier to test. The version string is an external fact; a capability is an internal behavioral decision.</p>
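<p>One way to keep the raw version string at the boundary is sketched below, using only the standard library. The names <code>parseVersion</code>, <code>atLeast</code>, and <code>CapabilitiesFromVersion</code> are hypothetical and not part of the original code, and the parser assumes a plain <code>MAJOR.MINOR.PATCH</code> string with no pre-release or build suffix. The comparison is numeric, because comparing version strings directly is lexicographic and would rank <code>"10.0.0"</code> below <code>"2.0.0"</code>.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Capabilities mirrors the struct used in this chapter.
type Capabilities struct {
	SupportsStreaming bool
	SupportsMetadata  bool
}

// parseVersion is a hypothetical helper that turns "MAJOR.MINOR.PATCH" into numbers.
// It does not handle pre-release or build metadata.
func parseVersion(raw string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(raw, ".")
	if len(parts) != 3 {
		return [3]int{}, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", raw)
	}
	for i, part := range parts {
		n, err := strconv.Atoi(part)
		if err != nil {
			return [3]int{}, fmt.Errorf("version %q: %w", raw, err)
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether v is greater than or equal to floor, field by field.
func atLeast(v, floor [3]int) bool {
	for i := range v {
		if v[i] != floor[i] {
			return v[i] > floor[i]
		}
	}
	return true
}

// CapabilitiesFromVersion converts the external version string once.
// Callers keep the Capabilities value and never look at the string again.
func CapabilitiesFromVersion(raw string) (Capabilities, error) {
	v, err := parseVersion(raw)
	if err != nil {
		return Capabilities{}, err
	}
	return Capabilities{
		SupportsStreaming: atLeast(v, [3]int{2, 0, 0}),
		SupportsMetadata:  atLeast(v, [3]int{2, 1, 0}),
	}, nil
}
```

<p>A parse failure returns the zero <code>Capabilities</code>, so an unreadable version falls back to the most conservative behavior. Whether that default is right depends on the service; it is an assumption of this sketch.</p>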
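<h2 id="策略gate-關閉時要有降級策略">[Strategy] A disabled gate needs a degradation strategy</h2>
<p>The core question for a feature gate is what to do when it is off. Common strategies include falling back, returning an error, hiding the entry point, and deferring the work.</p>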
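<table>
  <thead>
      <tr>
          <th>Strategy</th>
          <th>Behavior</th>
          <th>When it applies</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/backend/knowledge-cards/fallback/" data-link-title="Fallback" data-link-desc="說明主要路徑失敗時使用替代結果或替代流程的設計責任">fallback</a></td>
          <td>Use the old flow</td>
          <td>The new capability is only an efficiency improvement</td>
      </tr>
      <tr>
          <td>reject</td>
          <td>Return an explicit error</td>
          <td>The feature has no safe alternative</td>
      </tr>
      <tr>
          <td>hide</td>
          <td>Do not register the endpoint or show the entry point</td>
          <td>Users should not see the feature</td>
      </tr>
      <tr>
          <td>store for later</td>
          <td>Save first, process later</td>
          <td>The realtime capability is temporarily unavailable but the data must not be lost</td>
      </tr>
  </tbody>
</table>
<p>For example, when realtime push is off, the publisher can save the data for later processing instead:</p>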





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">p</span> <span class="nx">Publisher</span><span class="p">)</span> <span class="nf">Publish</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">if</span> <span class="nx">p</span><span class="p">.</span><span class="nx">realtimeEnabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="k">return</span> <span class="nx">p</span><span class="p">.</span><span class="nx">realtime</span><span class="p">.</span><span class="nf">Publish</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="k">return</span> <span class="nx">p</span><span class="p">.</span><span class="nx">repository</span><span class="p">.</span><span class="nf">SaveForLater</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">event</span><span class="p">)</span>
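</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The degradation strategy has to match the data semantics. Being unable to send an event in real time does not mean an important event can simply be dropped.</p>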
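<p>The "store for later" strategy can be checked without real infrastructure. The sketch below fills in the pieces the <code>Publish</code> method relies on; the interfaces <code>RealtimeSender</code> and <code>PendingStore</code>, the <code>recorder</code> test double, and the <code>publishOnce</code> helper are all hypothetical names introduced for illustration, and <code>DomainEvent</code> is reduced to a single field.</p>

```go
package main

import "context"

// DomainEvent is simplified to one field for this sketch.
type DomainEvent struct{ ID string }

// RealtimeSender and PendingStore are assumed interfaces for the two dependencies.
type RealtimeSender interface {
	Publish(ctx context.Context, event DomainEvent) error
}

type PendingStore interface {
	SaveForLater(ctx context.Context, event DomainEvent) error
}

type Publisher struct {
	realtimeEnabled bool
	realtime        RealtimeSender
	repository      PendingStore
}

// Publish sends in real time when the gate is on and stores the event otherwise.
func (p Publisher) Publish(ctx context.Context, event DomainEvent) error {
	if p.realtimeEnabled {
		return p.realtime.Publish(ctx, event)
	}
	return p.repository.SaveForLater(ctx, event)
}

// recorder is an in-memory stand-in that satisfies both interfaces.
type recorder struct{ events []DomainEvent }

func (r *recorder) Publish(_ context.Context, event DomainEvent) error {
	r.events = append(r.events, event)
	return nil
}

func (r *recorder) SaveForLater(_ context.Context, event DomainEvent) error {
	r.events = append(r.events, event)
	return nil
}

// publishOnce pushes one event through a Publisher and reports
// how many events the realtime side and the store each received.
func publishOnce(realtimeEnabled bool) (sent, saved int, err error) {
	realtime, store := new(recorder), new(recorder)
	p := Publisher{realtimeEnabled: realtimeEnabled, realtime: realtime, repository: store}
	err = p.Publish(context.Background(), DomainEvent{ID: "evt-1"})
	return len(realtime.events), len(store.events), err
}
```

<p>With the gate off, the event lands in the store and nothing is sent; with the gate on, the reverse holds. Either way the event is accounted for exactly once, which is the property the degradation strategy is meant to protect.</p>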
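<h2 id="執行http-endpoint-可用-gate-控制註冊或行為">[Implementation] A gate can control HTTP endpoint registration or behavior</h2>
<p>The core choice for an HTTP feature gate is between not registering the endpoint and registering it but returning an explicit error. The two carry different semantics.</p>
<p>Not registering the endpoint:</p>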





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">features</span><span class="p">.</span><span class="nx">Diagnostics</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nf">RegisterDiagnostics</span><span class="p">(</span><span class="nx">mux</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>This suits diagnostic entry points, internal tools, and features users should not see.</p>
<p>Registering but returning an error:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleRealtimeExport</span><span class="p">(</span><span class="nx">features</span> <span class="nx">Features</span><span class="p">)</span> <span class="nx">http</span><span class="p">.</span><span class="nx">HandlerFunc</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">return</span> <span class="kd">func</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="k">if</span> <span class="p">!</span><span class="nx">features</span><span class="p">.</span><span class="nx">RealtimePush</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">            <span class="nx">http</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="s">&#34;realtime export is disabled&#34;</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusNotImplemented</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nf">startRealtimeExport</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="nx">r</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>This suits public APIs, where callers should learn that the feature exists but is currently unavailable.</p>
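<p>The difference between the two choices shows up in the status code a caller receives. The following sketch wires both into one <code>http.ServeMux</code>; the <code>NewMux</code> and <code>StatusFor</code> helpers and the <code>/debug/diagnostics</code> and <code>/export</code> paths are assumptions made for the example, not names from the original service.</p>

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

type Features struct {
	RealtimePush bool
	Diagnostics  bool
}

// NewMux wires routes from Features. Both paths are hypothetical.
func NewMux(features Features) *http.ServeMux {
	mux := http.NewServeMux()

	// Hide: the route only exists when the gate is on,
	// so a disabled gate looks like any unknown path.
	if features.Diagnostics {
		mux.HandleFunc("/debug/diagnostics", func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		})
	}

	// Reject: the route always exists and says the feature is off.
	mux.HandleFunc("/export", func(w http.ResponseWriter, r *http.Request) {
		if !features.RealtimePush {
			http.Error(w, "realtime export is disabled", http.StatusNotImplemented)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})
	return mux
}

// StatusFor sends a GET request through the mux and returns the status code.
func StatusFor(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

<p>With every gate off, the hidden route answers 404 like any other unknown path, while the rejected route answers 501 and tells the caller the feature exists. Which of the two a given endpoint should use is the semantic choice this section describes.</p>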
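<h2 id="策略gate-不應散落成巢狀-if">[Strategy] Gates should not scatter into nested ifs</h2>
<p>The core maintenance risk of feature gates is that checks end up scattered across several call layers, until nobody knows exactly when the feature is enabled.</p>
<p>Anti-pattern:</p>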





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;FEATURE_REALTIME_PUSH&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">if</span> <span class="nx">version</span> <span class="o">&gt;=</span> <span class="s">&#34;2.0.0&#34;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="k">if</span> <span class="nx">user</span><span class="p">.</span><span class="nx">Enabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">            <span class="c1">// ...</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>A clearer approach is to build a decision first:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">RealtimeDecision</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">Enabled</span> <span class="kt">bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">Reason</span>  <span class="kt">string</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="nf">DecideRealtime</span><span class="p">(</span><span class="nx">features</span> <span class="nx">Features</span><span class="p">,</span> <span class="nx">caps</span> <span class="nx">Capabilities</span><span class="p">)</span> <span class="nx">RealtimeDecision</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">features</span><span class="p">.</span><span class="nx">RealtimePush</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="k">return</span> <span class="nx">RealtimeDecision</span><span class="p">{</span><span class="nx">Enabled</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span> <span class="nx">Reason</span><span class="p">:</span> <span class="s">&#34;feature_disabled&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">caps</span><span class="p">.</span><span class="nx">SupportsStreaming</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="k">return</span> <span class="nx">RealtimeDecision</span><span class="p">{</span><span class="nx">Enabled</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span> <span class="nx">Reason</span><span class="p">:</span> <span class="s">&#34;streaming_not_supported&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">return</span> <span class="nx">RealtimeDecision</span><span class="p">{</span><span class="nx">Enabled</span><span class="p">:</span> <span class="kc">true</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>A decision object lets <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">logs</a>, tests, and error responses all share the same reason.</p>
<h2 id="執行log-要記錄-gate-decision">[Implementation] Logs should record the gate decision</h2>
<p>The core operational need for a feature gate is knowing why a feature is on or off. When a gate affects behavior, log a stable reason.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">decision</span> <span class="o">:=</span> <span class="nf">DecideRealtime</span><span class="p">(</span><span class="nx">features</span><span class="p">,</span> <span class="nx">caps</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;realtime decision&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;feature&#34;</span><span class="p">,</span> <span class="s">&#34;realtime_push&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;enabled&#34;</span><span class="p">,</span> <span class="nx">decision</span><span class="p">.</span><span class="nx">Enabled</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="nx">decision</span><span class="p">.</span><span class="nx">Reason</span><span class="p">,</span>
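</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>This answers questions such as "why did this feature not use realtime push?" Reasons should come from a small fixed set; do not stuff full error strings into them.</p>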
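<p>One possible way to keep reasons in a small set is to give them a named type with constants, so a stray error string cannot slip in unnoticed. This is a variation on the <code>RealtimeDecision</code> shown earlier, not the original definition: the <code>Reason</code> type and the constant names are assumptions, while the string values match the ones used in this chapter.</p>

```go
package main

type Features struct{ RealtimePush bool }

type Capabilities struct{ SupportsStreaming bool }

// Reason is a stable, low-cardinality identifier that is safe to log and aggregate.
type Reason string

const (
	ReasonNone                  Reason = "" // the feature is enabled, nothing to explain
	ReasonFeatureDisabled       Reason = "feature_disabled"
	ReasonStreamingNotSupported Reason = "streaming_not_supported"
)

type RealtimeDecision struct {
	Enabled bool
	Reason  Reason
}

// DecideRealtime applies the gates in a fixed order and names the first one that fails.
func DecideRealtime(features Features, caps Capabilities) RealtimeDecision {
	if !features.RealtimePush {
		return RealtimeDecision{Enabled: false, Reason: ReasonFeatureDisabled}
	}
	if !caps.SupportsStreaming {
		return RealtimeDecision{Enabled: false, Reason: ReasonStreamingNotSupported}
	}
	return RealtimeDecision{Enabled: true, Reason: ReasonNone}
}
```

<p>Because the constants are the only values the function returns, dashboards and alert rules can match on them exactly, and adding a new reason becomes a visible code change.</p>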
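<h2 id="測試開與關兩條路徑都要測">[Testing] Test both the on and off paths</h2>
<p>The core rule for feature gate tests is to cover both the enabled and disabled paths. Testing only the default value makes it easy for the other path to break unnoticed.</p>
<p>Disabled path:</p>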





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestHandleRealtimeExportFeatureDisabled</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">req</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRequest</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">MethodPost</span><span class="p">,</span> <span class="s">&#34;/export&#34;</span><span class="p">,</span> <span class="kc">nil</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">rec</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRecorder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nx">handler</span> <span class="o">:=</span> <span class="nf">HandleRealtimeExport</span><span class="p">(</span><span class="nx">Features</span><span class="p">{</span><span class="nx">RealtimePush</span><span class="p">:</span> <span class="kc">false</span><span class="p">})</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">handler</span><span class="p">.</span><span class="nf">ServeHTTP</span><span class="p">(</span><span class="nx">rec</span><span class="p">,</span> <span class="nx">req</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">if</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span> <span class="o">!=</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusNotImplemented</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;status = %d, want %d&#34;</span><span class="p">,</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusNotImplemented</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Enabled path:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestDecideRealtimeEnabled</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">decision</span> <span class="o">:=</span> <span class="nf">DecideRealtime</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">Features</span><span class="p">{</span><span class="nx">RealtimePush</span><span class="p">:</span> <span class="kc">true</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">Capabilities</span><span class="p">{</span><span class="nx">SupportsStreaming</span><span class="p">:</span> <span class="kc">true</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">decision</span><span class="p">.</span><span class="nx">Enabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;realtime should be enabled, reason %q&#34;</span><span class="p">,</span> <span class="nx">decision</span><span class="p">.</span><span class="nx">Reason</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Test environment variable parsing separately through <code>LoadFeaturesFromEnv</code>. Feature tests should pass <code>Features</code> in directly rather than depend on global environment state.</p>
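<p>A minimal sketch of testing the parsing step in isolation follows. It assumes a small refactor that is not in the original code: the lookup function is passed in, so a test can supply a map instead of touching the process environment. The helper names <code>loadFeatures</code> and <code>fakeEnv</code> are hypothetical; the variable names and the exact <code>"1"</code> comparison are taken from <code>LoadFeaturesFromEnv</code> above.</p>

```go
package main

import "os"

type Features struct {
	RealtimePush bool
	Diagnostics  bool
	Pprof        bool
}

// loadFeatures reads gate values through getenv,
// so tests can provide a fake lookup instead of real environment variables.
func loadFeatures(getenv func(string) string) Features {
	on := func(key string) bool { return getenv(key) == "1" }
	return Features{
		RealtimePush: on("FEATURE_REALTIME_PUSH"),
		Diagnostics:  on("APP_DIAGNOSTICS"),
		Pprof:        on("APP_PPROF"),
	}
}

// LoadFeaturesFromEnv keeps the signature used in this chapter.
func LoadFeaturesFromEnv() Features {
	return loadFeatures(os.Getenv)
}

// fakeEnv returns a lookup backed by a map; missing keys read as "".
func fakeEnv(values map[string]string) func(string) string {
	return func(key string) string { return values[key] }
}
```

<p>Calling <code>loadFeatures(fakeEnv(map[string]string{"FEATURE_REALTIME_PUSH": "1", "APP_PPROF": "true"}))</code> enables only <code>RealtimePush</code>: an unset variable reads as off, and so does <code>"true"</code>, because the loader accepts exactly <code>"1"</code>. If passing the lookup in is not desirable, <code>t.Setenv</code> from the <code>testing</code> package is another option for the same test.</p>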
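<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter covers gate behavior boundaries inside a service. Remote <a href="/blog/backend/knowledge-cards/feature-flag/" data-link-title="Feature Flag" data-link-desc="說明如何用可動態開關控制功能曝光與風險">feature flag</a> platforms and gradual rollout flows are covered further in the following chapter:</p>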
<ul>
<li><a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend: deployment platform and network entry</a></li>
</ul>
<h2 id="和-go-教材的關係">Relationship to the Go course material</h2>
<p>This chapter builds on composition root, handler boundary, and runtime gate. To review the language material first, read:</p>
<ul>
<li><a href="/blog/go/07-refactoring/composition-root/" data-link-title="7.7 composition root 與依賴組裝" data-link-desc="把具體 adapter、config 與 usecase wiring 留在應用入口層">Go: composition root and dependency wiring</a></li>
<li><a href="/blog/go/07-refactoring/handler-boundary/" data-link-title="7.1 把 handler 邏輯拆成可測單元" data-link-desc="分離 HTTP 協定處理與核心邏輯">Go: splitting handler logic into testable units</a></li>
<li><a href="/blog/go/07-refactoring/interface-boundary/" data-link-title="7.2 用 interface 隔離外部依賴" data-link-desc="建立小而穩定的測試替身">Go: isolating external dependencies with interfaces</a></li>
<li><a href="/blog/go/05-error-testing/testing-basics/" data-link-title="5.2 testing 基礎" data-link-desc="用 testing package 驗證函式行為">Go: testing basics</a></li>
<li><a href="/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Go advanced: Kubernetes, systemd, and load balancer contracts</a></li>
</ul>
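<h2 id="小結">Summary</h2>
<p>A feature gate is both a production operations tool and a program design boundary. A good gate is loaded in one place, converted into capabilities, given a defined degradation strategy, emits a stable reason, and has both its on and off paths tested. It controls a behavioral contract, not just new code hidden behind an <code>if</code>.</p>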
]]></content:encoded></item><item><title>7.4 Observability pipeline, metrics, and tracing</title><link>https://tarrragon.github.io/blog/go-advanced/07-distributed-operations/observability-pipeline/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/07-distributed-operations/observability-pipeline/</guid><description>&lt;p>The core responsibility of an observability pipeline is to organize service signals into diagnostic data that can be queried, aggregated, and correlated. &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log-schema/" data-link-title="Log Schema" data-link-desc="說明結構化 log 欄位如何支援搜尋、關聯與事故排查">Log schema&lt;/a> describes a single event, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/metrics/" data-link-title="Metrics" data-link-desc="說明指標如何描述服務趨勢、容量與健康狀態">metrics&lt;/a> describe trends, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/trace-context/" data-link-title="Trace Context" data-link-desc="說明跨服務 request 如何用 trace context 串起路徑與耗時">trace context&lt;/a> describes the path across components, and profiles describe runtime cost. Their responsibilities differ, but they should be tied together by consistent identifier fields.&lt;/p>
&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After this chapter, you will be able to:&lt;/p>
&lt;ol>
&lt;li>Distinguish which questions &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log&lt;/a>, metric, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/trace/" data-link-title="Trace" data-link-desc="說明 trace 如何重建跨服務請求的路徑、耗時與依賴關係">trace&lt;/a>, and profile each answer&lt;/li>
&lt;li>Design stable correlation fields&lt;/li>
&lt;li>Have a Go service emit diagnostic signals suited to aggregation&lt;/li>
&lt;li>Control at the source which sensitive data enters the observability pipeline&lt;/li>
&lt;li>Understand why &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/dashboard/" data-link-title="Dashboard" data-link-desc="說明 dashboard 如何把關鍵訊號組成可判讀的服務狀態畫面">dashboards&lt;/a> and &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/alert/" data-link-title="Alert" data-link-desc="說明 alert 如何把需要處理的服務症狀轉成可行動通知">alerts&lt;/a> need to rely on stable fields&lt;/li>
&lt;/ol>
&lt;h2 id="前置章節">Prerequisite chapters&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go basics: log/slog: structured logging&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go-advanced/03-runtime-profiling/pprof/" data-link-title="3.2 pprof 基礎診斷流程" data-link-desc="用 pprof endpoint 診斷 heap、goroutine 與 CPU 問題">Go advanced: basic pprof diagnostic workflow&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">Go advanced: structured log field design&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">Go advanced: health checks and diagnostic endpoints&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/sli-slo/" data-link-title="SLI / SLO" data-link-desc="說明服務品質指標與服務品質目標如何連接產品承諾">Backend: SLI / SLO&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/metric-cardinality/" data-link-title="Metric Cardinality" data-link-desc="說明 metric label 組合數量如何影響觀測成本與查詢穩定性">Backend: Metric Cardinality&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/alert-runbook/" data-link-title="Alert Runbook" data-link-desc="說明告警如何連到可執行的排障與恢復流程">Backend: Alert Runbook&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="後續撰寫方向">Planned writing directions&lt;/h2>
&lt;ol>
&lt;li>Which questions log, metric, trace, and profile each answer.&lt;/li>
&lt;li>How &lt;code>request_id&lt;/code>, &lt;code>event_id&lt;/code>, &lt;code>trace_id&lt;/code>, &lt;code>span_id&lt;/code>, and &lt;code>correlation_id&lt;/code> divide responsibilities.&lt;/li>
&lt;li>Which clear boundaries Go code should keep when adopting OpenTelemetry.&lt;/li>
&lt;li>How a sensitive data policy applies to logs, trace attributes, and error events.&lt;/li>
&lt;li>Dashboards and alerts should rely on stable fields so that queries and alert rules can be run repeatedly.&lt;/li>
&lt;/ol>
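&lt;h2 id="觀察診斷資料要先可關聯再談漂亮">[Observation] Diagnostic data must be correlatable before it is pretty&lt;/h2>
&lt;p>The first requirement of an observability pipeline is the ability to correlate. Logs, metrics, and traces can each have a polished format of their own, but their fields have to line up before the same request, the same event, and the same goroutine path can be tied together.&lt;/p>
&lt;p>A few stable fields are usually established first:&lt;/p>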
&lt;ul>
&lt;li>request_id&lt;/li>
&lt;li>event_id&lt;/li>
&lt;li>trace_id&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/span/" data-link-title="Span" data-link-desc="說明 trace 中一段工作如何記錄耗時、狀態與關聯">span&lt;/a>_id&lt;/li>
&lt;li>user_id or tenant_id&lt;/li>
&lt;/ul>
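&lt;h2 id="判讀不同訊號回答不同問題">[Interpretation] Different signals answer different questions&lt;/h2>
&lt;ul>
&lt;li>log: what happened this time.&lt;/li>
&lt;li>metric: how often this kind of event happens, and how fast or slow it is.&lt;/li>
&lt;li>trace: how it travels across multiple components.&lt;/li>
&lt;li>profile: where CPU, memory, goroutine, and wait costs land.&lt;/li>
&lt;/ul>
&lt;p>If answering a question requires guessing from free-text logs, the field design is usually not yet stable enough.&lt;/p>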
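&lt;h2 id="策略敏感資料要在產生端就攔住">[Strategy] Stop sensitive data at the source&lt;/h2>
&lt;p>Sensitive data policy should be enforced at the source. A Go service should decide which information may leave the process before it emits a log or trace attribute.&lt;/p>
&lt;p>Data that commonly needs attention:&lt;/p>
&lt;ul>
&lt;li>token&lt;/li>
&lt;li>email&lt;/li>
&lt;li>national ID number&lt;/li>
&lt;li>raw payload&lt;/li>
&lt;li>internal paths and configuration&lt;/li>
&lt;/ul>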
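&lt;h2 id="執行結構化-log-是-pipeline-的起點">[Implementation] Structured logs are the start of the pipeline&lt;/h2>
&lt;p>When a Go service uses structured logs, what matters most is stable fields and clear semantics. Downstream, these logs may be:&lt;/p>
&lt;ul>
&lt;li>searched by a centralized log system&lt;/li>
&lt;li>turned into trend metrics by metric extraction&lt;/li>
&lt;li>used by alert rules to detect anomalies&lt;/li>
&lt;/ul>
&lt;p>So log field names should stay stable, and classification information belongs in structured fields.&lt;/p></description><content:encoded><![CDATA[<p>The core responsibility of an observability pipeline is to organize service signals into diagnostic data that can be queried, aggregated, and correlated. <a href="/blog/backend/knowledge-cards/log-schema/" data-link-title="Log Schema" data-link-desc="說明結構化 log 欄位如何支援搜尋、關聯與事故排查">Log schema</a> describes a single event, <a href="/blog/backend/knowledge-cards/metrics/" data-link-title="Metrics" data-link-desc="說明指標如何描述服務趨勢、容量與健康狀態">metrics</a> describe trends, <a href="/blog/backend/knowledge-cards/trace-context/" data-link-title="Trace Context" data-link-desc="說明跨服務 request 如何用 trace context 串起路徑與耗時">trace context</a> describes the path across components, and profiles describe runtime cost. Their responsibilities differ, but they should be tied together by consistent identifier fields.</p>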
<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter, you will be able to:</p>
<ol>
<li>Tell apart the questions that <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a>, metric, <a href="/blog/backend/knowledge-cards/trace/" data-link-title="Trace" data-link-desc="說明 trace 如何重建跨服務請求的路徑、耗時與依賴關係">trace</a>, and profile each answer</li>
<li>Design stable correlation fields</li>
<li>Have a Go service emit diagnostic signals suited to aggregation</li>
<li>Control at the producer which sensitive data enters the observability pipeline</li>
<li>Understand why every <a href="/blog/backend/knowledge-cards/dashboard/" data-link-title="Dashboard" data-link-desc="說明 dashboard 如何把關鍵訊號組成可判讀的服務狀態畫面">dashboard</a> and <a href="/blog/backend/knowledge-cards/alert/" data-link-title="Alert" data-link-desc="說明 alert 如何把需要處理的服務症狀轉成可行動通知">alert</a> needs to depend on stable fields</li>
</ol>
<h2 id="前置章節">Prerequisite chapters</h2>
<ul>
<li><a href="/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go Basics: log/slog structured logging</a></li>
<li><a href="/blog/go-advanced/03-runtime-profiling/pprof/" data-link-title="3.2 pprof 基礎診斷流程" data-link-desc="用 pprof endpoint 診斷 heap、goroutine 與 CPU 問題">Advanced Go: pprof basic diagnostic workflow</a></li>
<li><a href="/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">Advanced Go: structured log field design</a></li>
<li><a href="/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">Advanced Go: health checks and diagnostic endpoints</a></li>
<li><a href="/blog/backend/knowledge-cards/sli-slo/" data-link-title="SLI / SLO" data-link-desc="說明服務品質指標與服務品質目標如何連接產品承諾">Backend: SLI / SLO</a></li>
<li><a href="/blog/backend/knowledge-cards/metric-cardinality/" data-link-title="Metric Cardinality" data-link-desc="說明 metric label 組合數量如何影響觀測成本與查詢穩定性">Backend: Metric Cardinality</a></li>
<li><a href="/blog/backend/knowledge-cards/alert-runbook/" data-link-title="Alert Runbook" data-link-desc="說明告警如何連到可執行的排障與恢復流程">Backend: Alert Runbook</a></li>
</ul>
<h2 id="後續撰寫方向">Planned writing directions</h2>
<ol>
<li>Which questions log, metric, trace, and profile each answer.</li>
<li>How <code>request_id</code>, <code>event_id</code>, <code>trace_id</code>, <code>span_id</code>, and <code>correlation_id</code> divide responsibilities.</li>
<li>Which clear boundaries Go code should preserve when adopting OpenTelemetry.</li>
<li>How the sensitive data policy applies to logs, trace attributes, and error events.</li>
<li>Dashboards and alerts should depend on stable fields so that queries and alert rules can be run repeatedly.</li>
</ol>
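<h2 id="觀察診斷資料要先可關聯再談漂亮">[Observe] Diagnostic data must be correlatable before it is polished</h2>
<p>The first requirement of an observability pipeline is the ability to correlate. Logs, metrics, and traces can each be refined in their own format, but their fields must line up so that the same request, the same event, and the same goroutine path can be stitched together.</p>
<p>A service usually starts by establishing a few stable fields:</p>
<ul>
<li>request_id</li>
<li>event_id</li>
<li>trace_id</li>
<li><a href="/blog/backend/knowledge-cards/span/" data-link-title="Span" data-link-desc="說明 trace 中一段工作如何記錄耗時、狀態與關聯">span</a>_id</li>
<li>user_id or tenant_id</li>
</ul>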
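<h2 id="判讀不同訊號回答不同問題">[Interpret] Different signals answer different questions</h2>
<ul>
<li>log: what happened this time.</li>
<li>metric: how often this kind of event happens, and how fast or slow it is.</li>
<li>trace: how the request travels across multiple components.</li>
<li>profile: where CPU, memory, goroutine, and wait costs land.</li>
</ul>
<p>If answering a question means guessing from free-text logs, the field design is usually not stable enough yet.</p>
<h2 id="策略敏感資料要在產生端就攔住">[Strategy] Stop sensitive data at the producer</h2>
<p>Sensitive-data policy should be enforced at the producer. A Go service should decide which information may leave the process before it emits a log or trace attribute.</p>
<p>Data that commonly needs attention:</p>
<ul>
<li>token</li>
<li>email</li>
<li>national ID number</li>
<li>raw payload</li>
<li>internal paths and configuration</li>
</ul>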
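<h2 id="執行結構化-log-是-pipeline-的起點">[Execute] Structured logs are the starting point of the pipeline</h2>
<p>When a Go service uses structured logs, what matters most is that fields are stable and their semantics are clear. These logs may later be:</p>
<ul>
<li>searched in a centralized log system</li>
<li>turned into trend metrics by metric extraction</li>
<li>used by alert rules to detect anomalies</li>
</ul>
<p>So log field names must stay stable, and classification information belongs in structured fields.</p>
<h2 id="延伸診斷和容量規劃要串在一起">[Extend] Connect diagnostics with capacity planning</h2>
<p>Observability data is not only for after-the-fact troubleshooting; it also feeds back into capacity planning and release decisions. When goroutine count, <a href="/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue</a> lag, DB latency, or retry rate keeps climbing, the system is starting to strain at its boundaries.</p>
<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter does not tie itself to any specific observability SaaS. The material focuses on how a Go service emits stable signals that different collection platforms can all consume.</p>
<h2 id="和-go-教材的關係">Relationship to the Go material</h2>
<p>This chapter builds on Go structured logging and runtime diagnostics; to review the language material first, read:</p>
<ul>
<li><a href="/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go: structured logging</a></li>
<li><a href="/blog/go-advanced/03-runtime-profiling/pprof/" data-link-title="3.2 pprof 基礎診斷流程" data-link-desc="用 pprof endpoint 診斷 heap、goroutine 與 CPU 問題">Advanced Go: pprof basic diagnostic workflow</a></li>
<li><a href="/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">Advanced Go: structured log field design</a></li>
<li><a href="/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">Advanced Go: health checks and diagnostic endpoints</a></li>
</ul>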
]]></content:encoded></item><item><title>7.5 Kubernetes, systemd, and load balancer contracts</title><link>https://tarrragon.github.io/blog/go-advanced/07-distributed-operations/deployment-contracts/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/07-distributed-operations/deployment-contracts/</guid><description>&lt;p>The core responsibility of a deployment platform contract is to align the Go service's lifecycle with the external orchestration system. Inside the program you need a clear context, shutdown &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/health-check-liveness/" data-link-title="Liveness" data-link-desc="說明平台如何判斷 process 是否仍然存活，以及何時應重啟">health / liveness&lt;/a>, and memory limit; Kubernetes, systemd, the load balancer, or the cloud platform decides when those signals fire and how they are interpreted.&lt;/p>
&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After this chapter, you will be able to:&lt;/p>
&lt;ol>
&lt;li>Understand the order of shutdown, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness&lt;/a>, and connection draining&lt;/li>
&lt;li>See how platform timeouts affect a Go server&lt;/li>
&lt;li>Distinguish the different responsibilities of health and readiness&lt;/li>
&lt;li>Connect the memory limit with the Go runtime's resource management&lt;/li>
&lt;li>Have the deployment platform and the program honor the same contract&lt;/li>
&lt;/ol>
&lt;h2 id="前置章節">Prerequisite chapters&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go-advanced/03-runtime-profiling/gc-memory-limit/" data-link-title="3.1 GC 與 memory limit" data-link-desc="理解 debug.SetMemoryLimit 在長時間服務中的用途">Advanced Go: GC and memory limit&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Advanced Go: graceful shutdown and signal handling&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">Advanced Go: health checks and diagnostic endpoints&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Backend: Graceful Shutdown&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/failover/" data-link-title="Failover" data-link-desc="說明主要服務或節點失效時如何切換到備援能力">Backend: Failover&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="後續撰寫方向">Planned writing directions&lt;/h2>
&lt;ol>
&lt;li>The order of SIGTERM, shutdown timeout, readiness false, and connection draining.&lt;/li>
&lt;li>How Kubernetes &lt;code>terminationGracePeriodSeconds&lt;/code> and Go &lt;code>http.Server.Shutdown&lt;/code> work together.&lt;/li>
&lt;li>How the load balancer idle timeout affects &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket&lt;/a> heartbeat parameters.&lt;/li>
&lt;li>The relationship among the container memory limit, the Go memory limit, and the OOM killer.&lt;/li>
&lt;li>The division of responsibility between the systemd restart policy and the health endpoint.&lt;/li>
&lt;/ol>
&lt;h2 id="觀察平台會主動改變服務生命週期">[Observe] The platform actively changes the service lifecycle&lt;/h2>
&lt;p>A Go program does not run in a vacuum. Kubernetes, systemd, the load balancer, and the container runtime all influence when the service accepts new requests, when it starts winding down, and when it is forcibly terminated. The program therefore has to do more than run; it has to coordinate with the platform.&lt;/p>
&lt;p>Common lifecycle signals:&lt;/p>
&lt;ul>
&lt;li>SIGTERM&lt;/li>
&lt;li>readiness false&lt;/li>
&lt;li>HTTP shutdown&lt;/li>
&lt;li>connection draining&lt;/li>
&lt;li>memory pressure&lt;/li>
&lt;/ul>
&lt;h2 id="判讀health-與-readiness-有不同合約">[Interpret] Health and readiness carry different contracts&lt;/h2>
&lt;p>Health usually means the service itself is still alive; readiness means whether it is fit to receive new traffic.&lt;/p>
&lt;ul>
&lt;li>health tells the platform that the process is still alive.&lt;/li>
&lt;li>readiness can be used to make the load balancer stop sending new requests.&lt;/li>
&lt;/ul>
&lt;p>If the two are mixed together, deployments tend to hit problems such as "new traffic is pushed in before the service has finished winding down" or "the service could still take traffic but is wrongly marked offline".&lt;/p>
&lt;h2 id="策略shutdown-應該是可預期流程">[Strategy] Shutdown should be a predictable process&lt;/h2>
&lt;p>A typical shutdown sequence is:&lt;/p>
&lt;ol>
&lt;li>Receive the stop signal.&lt;/li>
&lt;li>Turn readiness off first.&lt;/li>
&lt;li>Stop accepting new traffic.&lt;/li>
&lt;li>Let existing requests, workers, and WebSocket connections finish.&lt;/li>
&lt;li>Force termination after the timeout.&lt;/li>
&lt;/ol>
&lt;p>This order gives the platform time to move traffic away and gives the application time to release resources.&lt;/p>
&lt;h2 id="執行資源限制要和-runtime-觀念一起看">[Execute] Read resource limits together with runtime concepts&lt;/h2>
&lt;p>The container memory limit is not just the deployment platform's concern; it also affects how the Go runtime behaves. When available memory shrinks, the application has a greater need to bound:&lt;/p>
&lt;ul>
&lt;li>goroutine count&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/buffer/" data-link-title="Buffer" data-link-desc="說明系統如何用暫存空間吸收短暫速度差與尖峰流量">buffer&lt;/a> size&lt;/li>
&lt;li>cache size&lt;/li>
&lt;li>in-memory &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue&lt;/a> length&lt;/li>
&lt;/ul>
&lt;p>If these are unbounded, the platform's OOM killer may arrive before your &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown&lt;/a> does.&lt;/p></description><content:encoded><![CDATA[<p>The core responsibility of a deployment platform contract is to align the Go service's lifecycle with the external orchestration system. Inside the program you need a clear context, shutdown <a href="/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout</a>, <a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a>, <a href="/blog/backend/knowledge-cards/health-check-liveness/" data-link-title="Liveness" data-link-desc="說明平台如何判斷 process 是否仍然存活，以及何時應重啟">health / liveness</a>, and memory limit; Kubernetes, systemd, the load balancer, or the cloud platform decides when those signals fire and how they are interpreted.</p>
<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter, you will be able to:</p>
<ol>
<li>Understand the order of shutdown, <a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a>, and connection draining</li>
<li>See how platform timeouts affect a Go server</li>
<li>Distinguish the different responsibilities of health and readiness</li>
<li>Connect the memory limit with the Go runtime's resource management</li>
<li>Have the deployment platform and the program honor the same contract</li>
</ol>
<h2 id="前置章節">Prerequisite chapters</h2>
<ul>
<li><a href="/blog/go-advanced/03-runtime-profiling/gc-memory-limit/" data-link-title="3.1 GC 與 memory limit" data-link-desc="理解 debug.SetMemoryLimit 在長時間服務中的用途">Advanced Go: GC and memory limit</a></li>
<li><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Advanced Go: graceful shutdown and signal handling</a></li>
<li><a href="/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">Advanced Go: health checks and diagnostic endpoints</a></li>
<li><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Backend: Graceful Shutdown</a></li>
<li><a href="/blog/backend/knowledge-cards/failover/" data-link-title="Failover" data-link-desc="說明主要服務或節點失效時如何切換到備援能力">Backend: Failover</a></li>
</ul>
<h2 id="後續撰寫方向">Planned writing directions</h2>
<ol>
<li>The order of SIGTERM, shutdown timeout, readiness false, and connection draining.</li>
<li>How Kubernetes <code>terminationGracePeriodSeconds</code> and Go <code>http.Server.Shutdown</code> work together.</li>
<li>How the load balancer idle timeout affects <a href="/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket</a> heartbeat parameters.</li>
<li>The relationship among the container memory limit, the Go memory limit, and the OOM killer.</li>
<li>The division of responsibility between the systemd restart policy and the health endpoint.</li>
</ol>
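<h2 id="觀察平台會主動改變服務生命週期">[Observe] The platform actively changes the service lifecycle</h2>
<p>A Go program does not run in a vacuum. Kubernetes, systemd, the load balancer, and the container runtime all influence when the service accepts new requests, when it starts winding down, and when it is forcibly terminated. The program therefore has to do more than run; it has to coordinate with the platform.</p>
<p>Common lifecycle signals:</p>
<ul>
<li>SIGTERM</li>
<li>readiness false</li>
<li>HTTP shutdown</li>
<li>connection draining</li>
<li>memory pressure</li>
</ul>
<h2 id="判讀health-與-readiness-有不同合約">[Interpret] Health and readiness carry different contracts</h2>
<p>Health usually means the service itself is still alive; readiness means whether it is fit to receive new traffic.</p>
<ul>
<li>health tells the platform that the process is still alive.</li>
<li>readiness can be used to make the load balancer stop sending new requests.</li>
</ul>
<p>If the two are mixed together, deployments tend to hit problems such as "new traffic is pushed in before the service has finished winding down" or "the service could still take traffic but is wrongly marked offline".</p>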
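<p>The split between the two contracts can be sketched with two separate handlers. This is a minimal illustration, not the chapter's reference code: the handler names, the <code>ready</code> flag, and the <code>probe</code> helper are assumptions for the example.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready is flipped to false when the service should stop receiving new traffic.
var ready atomic.Bool

// healthz answers only "is the process alive"; it does not look at readiness.
func healthz(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readyz answers "may new traffic be routed here".
func readyz(w http.ResponseWriter, r *http.Request) {
	if !ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe calls a handler in-process and returns the status code it wrote.
func probe(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	ready.Store(true)
	fmt.Println("serving: ", probe(healthz), probe(readyz))
	ready.Store(false) // draining: still alive, but no longer ready
	fmt.Println("draining:", probe(healthz), probe(readyz))
}
```

<p>While draining, the health handler keeps returning 200 and the readiness handler returns 503, so the platform can pull the instance out of rotation without restarting it.</p>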
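<h2 id="策略shutdown-應該是可預期流程">[Strategy] Shutdown should be a predictable process</h2>
<p>A typical shutdown sequence is:</p>
<ol>
<li>Receive the stop signal.</li>
<li>Turn readiness off first.</li>
<li>Stop accepting new traffic.</li>
<li>Let existing requests, workers, and WebSocket connections finish.</li>
<li>Force termination after the timeout.</li>
</ol>
<p>This order gives the platform time to move traffic away and gives the application time to release resources.</p>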
<h2 id="執行資源限制要和-runtime-觀念一起看">[Execute] Read resource limits together with runtime concepts</h2>
<p>The container memory limit is not just the deployment platform's concern; it also affects how the Go runtime behaves. When available memory shrinks, the application has a greater need to bound:</p>
<ul>
<li>goroutine count</li>
<li><a href="/blog/backend/knowledge-cards/buffer/" data-link-title="Buffer" data-link-desc="說明系統如何用暫存空間吸收短暫速度差與尖峰流量">buffer</a> size</li>
<li>cache size</li>
<li>in-memory <a href="/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue</a> length</li>
</ul>
<p>If these are unbounded, the platform's OOM killer may arrive before your <a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown</a> does.</p>
<h2 id="延伸平台合約要被測試">[Extend] Platform contracts must be tested</h2>
<p>Deployment platform contracts need to be verified in a test or staging environment. At a minimum, confirm:</p>
<ul>
<li>whether requests stop being admitted during shutdown</li>
<li>whether workers get a chance to finish</li>
<li>whether WebSocket connections have a close path</li>
<li>whether health and readiness are clearly separated</li>
</ul>
<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter does not teach Kubernetes or systemd operations in full. The focus is on designing Go programs that clearly expose the lifecycle signals the platform needs.</p>
<h2 id="和-go-教材的關係">Relationship to the Go material</h2>
<p>This chapter builds on Go shutdown handling and runtime limits; to review the language material first, read:</p>
<ul>
<li><a href="/blog/go-advanced/03-runtime-profiling/gc-memory-limit/" data-link-title="3.1 GC 與 memory limit" data-link-desc="理解 debug.SetMemoryLimit 在長時間服務中的用途">Advanced Go: GC and memory limit</a></li>
<li><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Advanced Go: graceful shutdown and signal handling</a></li>
<li><a href="/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">Advanced Go: health checks and diagnostic endpoints</a></li>
</ul>
]]></content:encoded></item><item><title>6.5 Routing hub for crossing into production</title><link>https://tarrragon.github.io/blog/llm/06-security/routing-to-production-security/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/06-security/routing-to-production-security/</guid><description>&lt;p>The first five chapters of Module 6 built an LLM security reading from the individual developer's point of view (&lt;a href="https://tarrragon.github.io/blog/llm/06-security/model-supply-chain-trust/" data-link-title="6.0 模型供應鏈與信任邊界" data-link-desc="個人 dev 用本地 LLM 時的模型權重來源信任：GGUF 完整性、Hugging Face / Ollama registry 信任、量化版本污染、檔案完整性檢查">6.0 supply chain&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/llm/06-security/inference-server-binding/" data-link-title="6.1 推論伺服器的綁定與暴露範圍" data-link-desc="個人 dev 場景下 llama-server / Ollama / LM Studio 的 bind address 判讀：127.0.0.1 vs LAN vs 反代、預設安全、誤開放給內網的後果">6.1 server binding&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/llm/06-security/tool-use-permission-model/" data-link-title="6.2 tool use 與 MCP server 的權限模型" data-link-desc="個人 dev 場景下 tool use / MCP server 的副作用權限：檔案系統 / shell / 網路存取邊界、第三方 MCP 信任、副作用的可逆性">6.2 tool use permissions&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/llm/06-security/prompt-injection-in-ide/" data-link-title="6.3 IDE 場景的 prompt injection" data-link-desc="個人 dev 場景下 IDE 寫 code 工作流的 prompt injection：codebase 內容、外部文件、剪貼簿作為攻擊面、跟雲端 LLM 場景的差異">6.3 prompt injection&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/llm/06-security/cross-cloud-local-data-boundary/" data-link-title="6.4 跨雲端 / 本地的資料邊界" data-link-desc="個人 dev 場景下混用雲端 LLM 跟本地 LLM 時的 prompt 洩漏點：Continue.dev 多 provider 設定、隱私資料流、按敏感度分流的判讀">6.4 cross-cloud data boundary&lt;/a>), with the framing rooted in &lt;a href="https://tarrragon.github.io/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 privacy data flow principles&lt;/a>. When a workflow moves from personal dev to team sharing and then to a production service, both the framing of the security issues and the control mechanisms step up. The axes of that step-up map to existing backend cards: &lt;a 
href="https://tarrragon.github.io/blog/backend/knowledge-cards/attack-surface/" data-link-title="Attack Surface" data-link-desc="說明系統哪些對外暴露面會被先行探測與枚舉">attack-surface&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/blast-radius/" data-link-title="Blast Radius" data-link-desc="說明事故影響面如何估算與隔離">blast-radius&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/trust-boundary/" data-link-title="Trust Boundary" data-link-desc="說明系統哪些位置開始不能沿用原本的信任假設">trust-boundary&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/tenant-boundary/" data-link-title="Tenant Boundary" data-link-desc="說明多租戶系統如何隔離不同客戶或組織的資料與資源">tenant-boundary&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/iam/" data-link-title="IAM" data-link-desc="說明 identity and access management 如何集中管理身分、角色與權限">iam&lt;/a>, and others. This chapter is the routing hub for those two crossings: it lists where each issue lands in a production setting (the matching backend/07 cards) so that readers in the middle of an upgrade are not left wondering what to read next.&lt;/p>
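&lt;p>After reading this chapter, you should be able to tell which of the three tiers you are currently in, which issues you need to cover before crossing to the next tier, and which backend/07 cards they map to.&lt;/p>
&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;ol>
&lt;li>Distinguish the security issues of the three LLM deployment tiers: personal dev, team-shared, and production.&lt;/li>
&lt;li>Know which controls to add when moving from personal dev to team-shared.&lt;/li>
&lt;li>Know which controls to add when moving from team-shared to production.&lt;/li>
&lt;li>Recognize the list of backend/07 cards that corresponds to each step of the evolution.&lt;/li>
&lt;li>Know when to stay at the current tier and when to upgrade proactively.&lt;/li>
&lt;/ol>
&lt;h2 id="三層演化的判讀軸">Axes for reading the three-tier evolution&lt;/h2>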





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Personal dev (first five chapters of this module)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">Team-shared (home / small team / internal deployment)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">Production service (public service / SaaS / B2B)&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Core differences among the three tiers:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>Personal dev&lt;/th>
 &lt;th>Team-shared&lt;/th>
 &lt;th>Production service&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Number of users&lt;/td>
 &lt;td>1&lt;/td>
 &lt;td>5 ~ 50&lt;/td>
 &lt;td>50+ / unlimited external users&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Trust assumption&lt;/td>
 &lt;td>You trust yourself&lt;/td>
 &lt;td>Colleagues trust each other; visitors are not trusted&lt;/td>
 &lt;td>Trust no one; control with IAM&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Data boundary&lt;/td>
 &lt;td>Local user account&lt;/td>
 &lt;td>Internal network&lt;/td>
 &lt;td>Multi-tenant, explicit isolation&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Consequence of mistakes&lt;/td>
 &lt;td>Borne by yourself&lt;/td>
 &lt;td>Affects a few colleagues&lt;/td>
 &lt;td>Affects many users / legal liability&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Required controls&lt;/td>
 &lt;td>Basic configuration + git tracking&lt;/td>
 &lt;td>+ auth + log + policy&lt;/td>
 &lt;td>+ IAM + audit + IR + compliance&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Time / budget involved&lt;/td>
 &lt;td>Hours&lt;/td>
 &lt;td>Days&lt;/td>
 &lt;td>Weeks to months; needs a dedicated person or team&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Key principle: &lt;strong>controls should match actual needs, neither over-engineered nor insufficient&lt;/strong>. Personal dev does not need a SOC 2 audit; production cannot rely on git tracking alone.&lt;/p></description><content:encoded><![CDATA[<p>The first five chapters of Module 6 built an LLM security reading from the individual developer's point of view (<a href="/blog/llm/06-security/model-supply-chain-trust/" data-link-title="6.0 模型供應鏈與信任邊界" data-link-desc="個人 dev 用本地 LLM 時的模型權重來源信任：GGUF 完整性、Hugging Face / Ollama registry 信任、量化版本污染、檔案完整性檢查">6.0 supply chain</a>, <a href="/blog/llm/06-security/inference-server-binding/" data-link-title="6.1 推論伺服器的綁定與暴露範圍" data-link-desc="個人 dev 場景下 llama-server / Ollama / LM Studio 的 bind address 判讀：127.0.0.1 vs LAN vs 反代、預設安全、誤開放給內網的後果">6.1 server binding</a>, <a href="/blog/llm/06-security/tool-use-permission-model/" data-link-title="6.2 tool use 與 MCP server 的權限模型" data-link-desc="個人 dev 場景下 tool use / MCP server 的副作用權限：檔案系統 / shell / 網路存取邊界、第三方 MCP 信任、副作用的可逆性">6.2 tool use permissions</a>, <a href="/blog/llm/06-security/prompt-injection-in-ide/" data-link-title="6.3 IDE 場景的 prompt injection" data-link-desc="個人 dev 場景下 IDE 寫 code 工作流的 prompt injection：codebase 內容、外部文件、剪貼簿作為攻擊面、跟雲端 LLM 場景的差異">6.3 prompt injection</a>, <a href="/blog/llm/06-security/cross-cloud-local-data-boundary/" data-link-title="6.4 跨雲端 / 本地的資料邊界" data-link-desc="個人 dev 場景下混用雲端 LLM 跟本地 LLM 時的 prompt 洩漏點：Continue.dev 多 provider 設定、隱私資料流、按敏感度分流的判讀">6.4 cross-cloud data boundary</a>), with the framing rooted in <a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 privacy data flow principles</a>. When a workflow moves from personal dev to team sharing and then to a production service, both the framing of the security issues and the control mechanisms step up. The axes of that step-up map to existing backend cards: <a href="/blog/backend/knowledge-cards/attack-surface/" data-link-title="Attack Surface" data-link-desc="說明系統哪些對外暴露面會被先行探測與枚舉">attack-surface</a>, <a href="/blog/backend/knowledge-cards/blast-radius/" data-link-title="Blast Radius" data-link-desc="說明事故影響面如何估算與隔離">blast-radius</a>, <a href="/blog/backend/knowledge-cards/trust-boundary/" data-link-title="Trust Boundary" data-link-desc="說明系統哪些位置開始不能沿用原本的信任假設">trust-boundary</a>, <a 
href="/blog/backend/knowledge-cards/tenant-boundary/" data-link-title="Tenant Boundary" data-link-desc="說明多租戶系統如何隔離不同客戶或組織的資料與資源">tenant-boundary</a>, <a href="/blog/backend/knowledge-cards/iam/" data-link-title="IAM" data-link-desc="說明 identity and access management 如何集中管理身分、角色與權限">iam</a>, and others. This chapter is the routing hub for those two crossings: it lists where each issue lands in a production setting (the matching backend/07 cards) so that readers in the middle of an upgrade are not left wondering what to read next.</p>
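<p>After reading this chapter, you should be able to tell which of the three tiers you are currently in, which issues you need to cover before crossing to the next tier, and which backend/07 cards they map to.</p>
<h2 id="本章目標">Chapter goals</h2>
<ol>
<li>Distinguish the security issues of the three LLM deployment tiers: personal dev, team-shared, and production.</li>
<li>Know which controls to add when moving from personal dev to team-shared.</li>
<li>Know which controls to add when moving from team-shared to production.</li>
<li>Recognize the list of backend/07 cards that corresponds to each step of the evolution.</li>
<li>Know when to stay at the current tier and when to upgrade proactively.</li>
</ol>
<h2 id="三層演化的判讀軸">Axes for reading the three-tier evolution</h2>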





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Personal dev (first five chapters of this module)
</span></span><span class="line"><span class="ln">2</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">3</span><span class="cl">Team-shared (home / small team / internal deployment)
</span></span><span class="line"><span class="ln">4</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">5</span><span class="cl">Production service (public service / SaaS / B2B)</span></span></code></pre></div><p>Core differences among the three tiers:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Personal dev</th>
          <th>Team-shared</th>
          <th>Production service</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Number of users</td>
          <td>1</td>
          <td>5 ~ 50</td>
          <td>50+ / unlimited external users</td>
      </tr>
      <tr>
          <td>Trust assumption</td>
          <td>You trust yourself</td>
          <td>Colleagues trust each other; visitors are not trusted</td>
          <td>Trust no one; control with IAM</td>
      </tr>
      <tr>
          <td>Data boundary</td>
          <td>Local user account</td>
          <td>Internal network</td>
          <td>Multi-tenant, explicit isolation</td>
      </tr>
      <tr>
          <td>Consequence of mistakes</td>
          <td>Borne by yourself</td>
          <td>Affects a few colleagues</td>
          <td>Affects many users / legal liability</td>
      </tr>
      <tr>
          <td>Required controls</td>
          <td>Basic configuration + git tracking</td>
          <td>+ auth + log + policy</td>
          <td>+ IAM + audit + IR + compliance</td>
      </tr>
      <tr>
          <td>Time / budget involved</td>
          <td>Hours</td>
          <td>Days</td>
          <td>Weeks to months; needs a dedicated person or team</td>
      </tr>
  </tbody>
</table>
<p>Key principle: <strong>controls should match actual needs, neither over-engineered nor insufficient</strong>. Personal dev does not need a SOC 2 audit; production cannot rely on git tracking alone.</p>
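<h2 id="個人-dev--團隊共用要補什麼">Personal dev → team-shared: what to add</h2>
<p>Typical triggers for moving from personal dev to team-shared:</p>
<ol>
<li>Running a model at home for family or roommates</li>
<li>A small team sharing one LLM server</li>
<li>An internal company deployment used by 5 ~ 50 engineers</li>
</ol>
<p>Controls to add (on top of the first five chapters):</p>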
<table>
  <thead>
      <tr>
          <th>Issue</th>
          <th>Evolves from (personal dev)</th>
          <th>Reinforcement</th>
          <th>Matching backend/07 card</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Identification</td>
          <td>Single user → multiple users</td>
          <td>Add auth; know who sent which prompt</td>
          <td><a href="/blog/backend/07-security-data-protection/identity-access-boundary/" data-link-title="7.2 身分與授權邊界" data-link-desc="以問題驅動方式整理身分、授權、會話與供應商身分鏈">identity-access-boundary</a></td>
      </tr>
      <tr>
          <td>Entrypoint governance</td>
          <td>Bind to LAN plus an API key</td>
          <td>Reverse proxy + TLS + rate limit</td>
          <td><a href="/blog/backend/07-security-data-protection/entrypoint-and-server-protection/" data-link-title="7.3 入口治理與伺服器防護" data-link-desc="以問題驅動方式整理對外入口、管理平面與伺服器邊界">entrypoint-and-server-protection</a></td>
      </tr>
      <tr>
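          <td>Transport trust</td>
          <td>Plain HTTP on the internal network is occasionally acceptable</td>
          <td>HTTPS everywhere on the internal network; TLS certificate management</td>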
          <td><a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">transport-trust-and-certificate-lifecycle</a></td>
      </tr>
      <tr>
          <td>Secret management</td>
          <td>Environment variables in dotfiles</td>
          <td>Centralized secret store (Vault / SSM / Doppler)</td>
          <td><a href="/blog/backend/07-security-data-protection/secrets-and-machine-credential-governance/" data-link-title="7.6 秘密管理與機器憑證治理" data-link-desc="以問題驅動方式整理 secret、token、key 與機器身份治理">secrets-and-machine-credential-governance</a></td>
      </tr>
      <tr>
          <td>Supply chain</td>
          <td>Downloading GGUF files / npm packages yourself (see <a href="/blog/llm/06-security/model-supply-chain-trust/" data-link-title="6.0 模型供應鏈與信任邊界" data-link-desc="個人 dev 用本地 LLM 時的模型權重來源信任：GGUF 完整性、Hugging Face / Ollama registry 信任、量化版本污染、檔案完整性檢查">6.0</a>)</td>
          <td>Internal mirror, pinned versions, periodic audits</td>
          <td><a href="/blog/backend/07-security-data-protection/supply-chain-integrity-and-artifact-trust/" data-link-title="7.12 供應鏈完整性與 Artifact 信任" data-link-desc="定義 build provenance、artifact 信任與交付鏈風險問題">supply-chain-integrity-and-artifact-trust</a></td>
      </tr>
      <tr>
          <td>Policy</td>
          <td>Judgment kept in your own head</td>
          <td>Written acceptable-use policy and sensitive-content guidelines</td>
          <td>(combine the policy sections of each chapter)</td>
      </tr>
  </tbody>
</table>
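<p>Common anti-patterns at the team-shared stage:</p>
<ol>
<li><strong>Copying personal-dev dotfile config straight onto the team server</strong>: the API key, log path, and reset mechanism are all wrong for the new setting.</li>
<li><strong>Relying on a single admin to pass on policy verbally</strong>: nothing is written down, new members do not know it, and it is lost when that person leaves.</li>
<li><strong>Skipping auth on the grounds that "the company intranet is already safe"</strong>: intranet devices include visitors, interns, BYOD, and partner vendors; a minimal version of zero trust is still required.</li>
</ol>
<h2 id="團隊共用--production要補什麼">Team-shared → production: what to add</h2>
<p>Typical triggers for moving from team-shared to a production service:</p>
<ol>
<li>Opening an internal LLM service to external customers (B2B)</li>
<li>Selling a SaaS-like LLM API externally</li>
<li>Embedding an LLM in a product for end users</li>
</ol>
<p>Controls to add (on top of the previous two tiers):</p>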
<table>
  <thead>
      <tr>
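          <th>Issue</th>
          <th>Evolves from (team-shared)</th>
          <th>Reinforcement</th>
          <th>Matching backend/07 card</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Multi-tenant isolation</td>
          <td>Shared server across colleagues → across users</td>
          <td>Multi-tenant isolation of KV cache, logs, and model access</td>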
          <td><a href="/blog/backend/07-security-data-protection/llm-multi-tenant-isolation/" data-link-title="LLM 多租戶推論隔離" data-link-desc="production LLM 服務的多租戶隔離：KV cache 不共享、log / model artifact 隔離、跨用戶 prompt 洩漏面">llm-multi-tenant-isolation</a></td>
      </tr>
      <tr>
          <td>Deployment supply chain</td>
          <td>Internal mirror → external accountability</td>
          <td>Model release process, signing, rollback mechanism</td>
          <td><a href="/blog/backend/07-security-data-protection/llm-deployment-supply-chain/" data-link-title="LLM Deployment 供應鏈完整性" data-link-desc="把 LLM 模型權重、推論伺服器、第三方 plugin 三條 production 供應鏈納入既有 artifact trust 框架的判讀">llm-deployment-supply-chain</a></td>
      </tr>
      <tr>
          <td>Consequences of agent prompt injection</td>
          <td>IDE injection (<a href="/blog/llm/06-security/prompt-injection-in-ide/" data-link-title="6.3 IDE 場景的 prompt injection" data-link-desc="個人 dev 場景下 IDE 寫 code 工作流的 prompt injection：codebase 內容、外部文件、剪貼簿作為攻擊面、跟雲端 LLM 場景的差異">6.3</a>) → agent setting (<a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4</a>)</td>
          <td>Tool spec design, limits on the agent loop, human review checkpoints</td>
          <td><a href="/blog/backend/07-security-data-protection/llm-prompt-injection-in-agent/" data-link-title="LLM Agent Prompt Injection 後果治理" data-link-desc="production LLM agent 場景的 prompt injection 後果：tool spec 設計、agent loop 限制、review checkpoint、跟 incident workflow 的接合">llm-prompt-injection-in-agent</a></td>
      </tr>
      <tr>
          <td>Log / PII governance</td>
          <td>Simple access log → full prompt log</td>
          <td>Prompt content accumulating in logs, PII detection and filtering, retention period</td>
          <td><a href="/blog/backend/07-security-data-protection/llm-log-and-pii-governance/" data-link-title="LLM Log 與 PII 治理" data-link-desc="production LLM 服務的 prompt log 累積、PII 偵測與過濾、保留期限與合規對齊">llm-log-and-pii-governance</a></td>
      </tr>
      <tr>
          <td>Detection signals</td>
          <td>Reading logs → active detection</td>
          <td>Signal design for abnormal LLM agent behavior; abnormal tool-use patterns</td>
          <td><a href="/blog/backend/07-security-data-protection/llm-as-service-detection-coverage/" data-link-title="LLM Service 偵測訊號覆蓋" data-link-desc="production LLM 服務的 detection 訊號設計：tool call 異常模式、prompt injection 觸發徵兆、abuse 跟濫用模式、跟既有 detection-coverage 框架的接合">llm-as-service-detection-coverage</a></td>
      </tr>
      <tr>
          <td>Workload Identity</td>
          <td>Server holds its own API key → workload IAM</td>
          <td>One identity per workload, auditable</td>
          <td><a href="/blog/backend/07-security-data-protection/workload-identity-and-federated-trust/" data-link-title="7.10 Workload Identity 與聯邦信任邊界" data-link-desc="定義非人類身份、跨平台信任與短時憑證治理問題">workload-identity-and-federated-trust</a></td>
      </tr>
      <tr>
          <td>Detection platform</td>
          <td>Manual observation → SIEM</td>
          <td>Centralized detection and alerting</td>
          <td><a href="/blog/backend/07-security-data-protection/detection-coverage-and-signal-governance/" data-link-title="7.13 偵測覆蓋率與訊號治理" data-link-desc="定義偵測覆蓋、訊號品質與誤報成本的治理問題">detection-coverage-and-signal-governance</a></td>
      </tr>
      <tr>
          <td>Incident response</td>
          <td>Fix by restarting → IR process</td>
          <td>IR drills, escalation, post-mortems</td>
          <td><a href="/blog/backend/07-security-data-protection/incident-case-to-control-workflow/" data-link-title="7.16 從公開事故到工程 Workflow：案例如何回寫控制面" data-link-desc="建立公開事故如何轉成控制面失效樣式與 workflow 回寫的大綱">incident-case-to-control-workflow</a></td>
      </tr>
      <tr>
          <td>Compliance</td>
          <td>Not needed → required for external services</td>
          <td>GDPR / HIPAA / SOC 2, etc.</td>
          <td><a href="/blog/backend/07-security-data-protection/data-protection-and-masking-governance/" data-link-title="7.4 資料保護與遮罩治理" data-link-desc="以問題驅動方式整理資料分級、遮罩、匯出與備份治理">data-protection-and-masking-governance</a></td>
      </tr>
  </tbody>
</table>
<p>production 階段不是「把團隊共用放大」、是「另一個複雜度等級」。多數議題從 backend/07 既有卡片開始讀、LLM-specific 議題在 backend/07 的 LLM 相關章節（<code>llm-*.md</code>）補充。</p>
<h2 id="何時該停留在當前層">何時該停留在當前層</h2>
<p>不是所有工作流都需要升級。停留在當前層的合理判讀：</p>
<table>
  <thead>
      <tr>
          <th>當前層</th>
          <th>該停留的徵兆</th>
          <th>升級的徵兆</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>個人 dev</td>
          <td>只有自己用、不分享、沒對外暴露需求</td>
          <td>開始有人想連你的 server / 想做 demo 給朋友 / 想分享給家人</td>
      </tr>
      <tr>
          <td>團隊共用</td>
          <td>5 ~ 50 人的內部使用、不對外賣、不涉及客戶 PII</td>
          <td>客戶要連 / 對外 SLA / 要收費 / 開始涉及客戶 PII</td>
      </tr>
      <tr>
          <td>production</td>
          <td>已對外服務、有 SLA、有客戶</td>
          <td>（目標狀態）</td>
      </tr>
  </tbody>
</table>
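<p>Two common upgrade mistakes:</p>
<ol>
<li><strong>Upgrading too early</strong>: adopting an enterprise stack (IAM, Vault, SIEM) at the personal dev tier. The complexity is too high, you do not need it yourself, and the maintenance cost ends up hurting the workflow.</li>
<li><strong>Upgrading too late</strong>: controls that should have been added at the team sharing tier are added only after an incident, by which point there may already be a data leak or legal liability.</li>
</ol>
<p>The deciding criterion: <strong>controls should match the actual threat model and user scale</strong>, not "the more the better".</p>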
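<p>The stay / upgrade signals in the table above can be sketched as a small classifier. This is only an illustration: the <code>Workflow</code> fields and the "more than one user means shared" threshold are assumptions made for the example, not rules from this chapter.</p>

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical description of a deployment; field names are illustrative."""
    user_count: int
    externally_exposed: bool = False    # customers connect, or a public API / SaaS
    has_sla: bool = False               # SLA or contractual commitment
    handles_customer_pii: bool = False
    charges_money: bool = False


def classify_tier(w: Workflow) -> str:
    """Map the upgrade signals from the table to a tier name."""
    # Any production signal wins, regardless of head count.
    if w.externally_exposed or w.has_sla or w.handles_customer_pii or w.charges_money:
        return "production"
    # Assumption: as soon as a second person connects, the server is shared.
    if w.user_count > 1:
        return "team sharing"
    return "personal dev"
```

<p>A function like this is mainly useful as a review prompt: if the answer changes, the controls for the next tier become due.</p>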
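<h2 id="跨層升級的常見-anti-pattern">Common anti-patterns when moving up a tier</h2>
<p>Common surprises when crossing from one tier to the next:</p>
<ol>
<li><strong>Shipping a personal dev LLM client config straight to production</strong>: the autocomplete model, default model, and API key are all wrong for the setting; production requires redesigning model routing.</li>
<li><strong>Treating personal prompt injection habits as production protection</strong>: "I track my workflow in git" is enough for personal dev, but in a production agent scenario git is not in the loop, so switch to tool specs plus review checkpoints.</li>
<li><strong>Production still relying on users to "read the prompt content"</strong>: with a large user base, nobody can manually inspect every prompt; production needs automated detection signals.</li>
<li><strong>No tenant isolation in production</strong>: every user's KV cache, logs, and context are mixed together, so user A can observe user B's cache hits.</li>
<li><strong>No written vendor policy commitments</strong>: at the team tier a verbal "we do not train on customer data" may pass; in production it has to be written into the terms / SLA.</li>
</ol>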
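<p>For the tenant isolation item, one common application-level mitigation is to namespace every cache key by tenant. The sketch below is a minimal in-memory illustration; <code>cache_key</code> and <code>TenantScopedCache</code> are hypothetical names, and the KV cache inside an inference server needs its own engine-specific isolation that this does not cover.</p>

```python
import hashlib
from typing import Optional


def cache_key(tenant_id: str, prompt: str) -> str:
    """Build a cache key that is namespaced by tenant (hypothetical helper)."""
    # Length-prefix the tenant id so ("ab", "c") and ("a", "bc") cannot collide.
    material = f"{len(tenant_id)}:{tenant_id}:{prompt}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class TenantScopedCache:
    """In-memory stand-in for an application-level response cache."""

    def __init__(self) -> None:
        self._store = {}

    def get(self, tenant_id: str, prompt: str) -> Optional[str]:
        return self._store.get(cache_key(tenant_id, prompt))

    def put(self, tenant_id: str, prompt: str, value: str) -> None:
        self._store[cache_key(tenant_id, prompt)] = value
```

<p>With this shape, the same prompt from two tenants never resolves to the same entry, so a cache hit cannot reveal another tenant's activity.</p>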
<h2 id="給讀者的層級判讀清單">Tier checklist for readers</h2>
<p>To determine which tier you are currently in:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[ ] Used only by yourself                        → personal dev
</span></span><span class="line"><span class="ln">2</span><span class="cl">[ ] 1 to 5 people sharing one server             → personal dev or early team sharing
</span></span><span class="line"><span class="ln">3</span><span class="cl">[ ] 5 to 50 people sharing, deployed internally  → team sharing
</span></span><span class="line"><span class="ln">4</span><span class="cl">[ ] Public-facing API service / SaaS             → production
</span></span><span class="line"><span class="ln">5</span><span class="cl">[ ] Multiple customers / customer PII involved   → production
</span></span><span class="line"><span class="ln">6</span><span class="cl">[ ] SLA / contractual commitments                → production</span></span></code></pre></div><p>The corresponding topics to address:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Personal dev → team sharing:
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  [ ] auth                          ← backend/07 identity-access-boundary
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  [ ] entrypoint governance         ← backend/07 entrypoint-and-server-protection
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  [ ] TLS                           ← backend/07 transport-trust-and-certificate-lifecycle
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  [ ] centralized secret management ← backend/07 secrets-and-machine-credential-governance
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  [ ] internal supply chain         ← backend/07 supply-chain-integrity-and-artifact-trust
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  [ ] write down an acceptable use policy
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">Team sharing → production:
</span></span><span class="line"><span class="ln">10</span><span class="cl">  [ ] multi-tenant isolation        ← backend/07 llm-multi-tenant-isolation
</span></span><span class="line"><span class="ln">11</span><span class="cl">  [ ] deployment supply chain       ← backend/07 llm-deployment-supply-chain
</span></span><span class="line"><span class="ln">12</span><span class="cl">  [ ] agent prompt injection        ← backend/07 llm-prompt-injection-in-agent
</span></span><span class="line"><span class="ln">13</span><span class="cl">  [ ] log / PII governance          ← backend/07 llm-log-and-pii-governance
</span></span><span class="line"><span class="ln">14</span><span class="cl">  [ ] detection signals             ← backend/07 llm-as-service-detection-coverage
</span></span><span class="line"><span class="ln">15</span><span class="cl">  [ ] workload identity             ← backend/07 workload-identity-and-federated-trust
</span></span><span class="line"><span class="ln">16</span><span class="cl">  [ ] detection platform            ← backend/07 detection-coverage-and-signal-governance
</span></span><span class="line"><span class="ln">17</span><span class="cl">  [ ] IR process                    ← backend/07 incident-case-to-control-workflow
</span></span><span class="line"><span class="ln">18</span><span class="cl">  [ ] compliance                    ← backend/07 data-protection-and-masking-governance</span></span></code></pre></div><h2 id="下一步">Next steps</h2>
<p>This is the last chapter of Module 6. From here you can return to the <a href="/blog/llm/06-security/" data-link-title="模組六：本地 LLM 的安全與權限" data-link-desc="個人 dev 在自己機器上跑本地 LLM 的安全議題：模型供應鏈、推論伺服器綁定、tool use 副作用、prompt injection 在 IDE、跨雲端 / 本地資料邊界">Module 6 _index</a> for the other chapters, or continue to <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">Backend Module 7: Security and Data Protection</a> for production scenarios.</p>
]]></content:encoded></item><item><title>1.6 rate limiting 與 backpressure</title><link>https://tarrragon.github.io/blog/go-advanced/01-concurrency-patterns/rate-limit/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/01-concurrency-patterns/rate-limit/</guid><description>&lt;p>rate limiting 的核心責任是把過量輸入轉成可預期的服務行為。服務可以等待、排隊、拒絕、降級或取樣，但這些策略應由程式明確決定，而不是讓 goroutine、channel 或 memory 自行失控。&lt;/p>
&lt;h2 id="預計補充內容">預計補充內容&lt;/h2>
&lt;p>這些 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/backpressure/" data-link-title="Backpressure" data-link-desc="說明下游處理速度不足時系統如何讓上游依下游能力送出工作">backpressure&lt;/a> 邊界會在下列章節展開：&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go/04-concurrency/channel/" data-link-title="4.2 channel：資料傳遞與 backpressure " data-link-desc="理解 channel 如何在 goroutine 之間傳遞資料並形成 backpressure ">Go 入門：channel：事件流與 backpressure &lt;/a>：先理解 channel &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/buffer/" data-link-title="Buffer" data-link-desc="說明系統如何用暫存空間吸收短暫速度差與尖峰流量">buffer&lt;/a> 和等待機制，才知道限流不是只有一種做法。&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go-advanced/01-concurrency-patterns/non-blocking-send/" data-link-title="1.3 非阻塞送出與事件丟棄策略" data-link-desc="設計 channel 滿載時的服務行為">Go 進階：非阻塞送出與事件丟棄策略&lt;/a>：當系統必須在滿載時做出明確選擇，這裡會處理 drop、覆蓋與回錯的語意。&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend：部署平台與網路入口&lt;/a>：跨節點流量治理、gateway 與 quota，屬於平台層責任。&lt;/li>
&lt;/ul>
&lt;h2 id="本章不處理">本章不處理&lt;/h2>
&lt;p>本章先處理單一 process 內的輸入控制與 backpressure ；跨節點流量治理、gateway 與 quota 的平台責任，會放在 &lt;a href="https://tarrragon.github.io/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend：部署平台與網路入口&lt;/a>。&lt;/p>
&lt;h2 id="與-backend-教材的分工">與 Backend 教材的分工&lt;/h2>
&lt;p>本章只處理 Go process 內的速率控制。API gateway、load balancer、service mesh、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/broker/" data-link-title="Broker" data-link-desc="說明 broker 在訊息傳遞系統中負責保存、路由與交付訊息">broker&lt;/a> quota 與跨節點流量治理會放在 &lt;a href="https://tarrragon.github.io/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend：部署平台與網路入口&lt;/a>。&lt;/p>
&lt;h2 id="和-go-教材的關係">和 Go 教材的關係&lt;/h2>
&lt;p>這一章承接的是 channel backpressure 、non-blocking send 與 worker capacity；如果你要先回看語言教材，可以讀：&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go/04-concurrency/channel/" data-link-title="4.2 channel：資料傳遞與 backpressure " data-link-desc="理解 channel 如何在 goroutine 之間傳遞資料並形成 backpressure ">Go：channel：資料傳遞與 backpressure &lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go/04-concurrency/select/" data-link-title="4.3 select：同時等待多種事件" data-link-desc="用 select 建立事件迴圈">Go：select：同時等待多種事件&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/go-advanced/01-concurrency-patterns/non-blocking-send/" data-link-title="1.3 非阻塞送出與事件丟棄策略" data-link-desc="設計 channel 滿載時的服務行為">Go：非阻塞送出與事件丟棄策略&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/worker-pool/" data-link-title="Worker Pool" data-link-desc="說明一組 worker 如何限制同時處理量並保護下游資源">Go：bounded worker pool&lt;/a>&lt;/li>
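&lt;/ul></description><content:encoded><![CDATA[<p>The core responsibility of rate limiting is to turn excess input into predictable service behavior. A service can wait, queue, reject, degrade, or sample, but the program should choose among these strategies explicitly rather than letting goroutines, channels, or memory run out of control.</p>
<h2 id="預計補充內容">Planned content</h2>
<p>These <a href="/blog/backend/knowledge-cards/backpressure/" data-link-title="Backpressure" data-link-desc="說明下游處理速度不足時系統如何讓上游依下游能力送出工作">backpressure</a> boundaries are developed in the following chapters:</p>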
<ul>
<li><a href="/blog/go/04-concurrency/channel/" data-link-title="4.2 channel：資料傳遞與 backpressure " data-link-desc="理解 channel 如何在 goroutine 之間傳遞資料並形成 backpressure ">Go basics: channel: data passing and backpressure</a>: understand the channel <a href="/blog/backend/knowledge-cards/buffer/" data-link-title="Buffer" data-link-desc="說明系統如何用暫存空間吸收短暫速度差與尖峰流量">buffer</a> and its waiting behavior first, and it becomes clear that rate limiting is not a single technique.</li>
<li><a href="/blog/go-advanced/01-concurrency-patterns/non-blocking-send/" data-link-title="1.3 非阻塞送出與事件丟棄策略" data-link-desc="設計 channel 滿載時的服務行為">Go advanced: non-blocking send and event-drop strategies</a>: when the system must make an explicit choice under full load, this chapter covers the semantics of dropping, overwriting, and returning an error.</li>
<li><a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend: deployment platform and network entry</a>: cross-node traffic governance, gateways, and quotas are platform-level responsibilities.</li>
</ul>
<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter first covers input control and backpressure within a single process; the platform responsibilities of cross-node traffic governance, gateways, and quotas are covered in <a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend: deployment platform and network entry</a>.</p>
<h2 id="與-backend-教材的分工">Division of labor with the Backend material</h2>
<p>This chapter only covers rate control inside a Go process. API gateways, load balancers, service meshes, <a href="/blog/backend/knowledge-cards/broker/" data-link-title="Broker" data-link-desc="說明 broker 在訊息傳遞系統中負責保存、路由與交付訊息">broker</a> quotas, and cross-node traffic governance are covered in <a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend: deployment platform and network entry</a>.</p>
<h2 id="和-go-教材的關係">Relationship to the Go material</h2>
<p>This chapter builds on channel backpressure, non-blocking send, and worker capacity; to review the language material first, read:</p>
<ul>
<li><a href="/blog/go/04-concurrency/channel/" data-link-title="4.2 channel：資料傳遞與 backpressure " data-link-desc="理解 channel 如何在 goroutine 之間傳遞資料並形成 backpressure ">Go: channel: data passing and backpressure</a></li>
<li><a href="/blog/go/04-concurrency/select/" data-link-title="4.3 select：同時等待多種事件" data-link-desc="用 select 建立事件迴圈">Go: select: waiting on multiple kinds of events at once</a></li>
<li><a href="/blog/go-advanced/01-concurrency-patterns/non-blocking-send/" data-link-title="1.3 非阻塞送出與事件丟棄策略" data-link-desc="設計 channel 滿載時的服務行為">Go: non-blocking send and event-drop strategies</a></li>
<li><a href="/blog/backend/knowledge-cards/worker-pool/" data-link-title="Worker Pool" data-link-desc="說明一組 worker 如何限制同時處理量並保護下游資源">Go: bounded worker pool</a></li>
</ul>
]]></content:encoded></item><item><title>模組六：生產操作</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/</guid><description>&lt;p>生產操作的核心目標是讓 Go 服務可停止、可觀測、可診斷、可漸進啟用功能。服務能在本機跑起來只是第一步；長時間運行後，真正重要的是 shutdown 是否可預期、監控訊號是否清楚、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log&lt;/a> 是否可查詢、功能開關是否有降級策略。&lt;/p>
&lt;p>本模組承接前面的並發、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket&lt;/a>、runtime 與測試：&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown&lt;/a> 需要 context 和 goroutine lifecycle，health endpoint 需要區分可用性與診斷，structured &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log&lt;/a> 需要能追 event flow，feature gate 需要能安全控制新能力。&lt;/p>
&lt;h2 id="章節列表">章節列表&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>章節&lt;/th>
 &lt;th>主題&lt;/th>
 &lt;th>關鍵收穫&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/graceful-shutdown/" data-link-title="6.1 graceful shutdown 與 signal handling" data-link-desc="用 signal 與 context 傳遞停止訊號">6.1&lt;/a>&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown&lt;/a> 與 signal handling&lt;/td>
 &lt;td>用 signal、context、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout&lt;/a> 與 owner cleanup 停止服務&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">6.2&lt;/a>&lt;/td>
 &lt;td>健康檢查與診斷 endpoint&lt;/td>
 &lt;td>區分 health、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness&lt;/a>、diagnostics 與 status code 合約&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">6.3&lt;/a>&lt;/td>
 &lt;td>結構化日誌欄位設計&lt;/td>
 &lt;td>用穩定欄位讓 log 可 grep、可聚合、可追蹤&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/feature-gate/" data-link-title="6.4 版本偵測與 feature gate" data-link-desc="依版本與環境能力啟用功能">6.4&lt;/a>&lt;/td>
 &lt;td>版本偵測與 feature gate&lt;/td>
 &lt;td>用功能開關、能力偵測與降級策略控制行為&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="本模組使用的範例主題">本模組使用的範例主題&lt;/h2>
&lt;p>本模組使用虛構的即時通知服務作為範例。範例包含 HTTP server、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket&lt;/a> hub、background worker、runtime diagnostics、structured log 與 feature gate。&lt;/p>
&lt;p>範例只用來展示 Go 生產操作設計，不假設讀者正在維護任何特定專案。&lt;/p>
&lt;h2 id="本模組的-go-核心概念">本模組的 Go 核心概念&lt;/h2>
&lt;ul>
&lt;li>用 &lt;code>signal.NotifyContext&lt;/code> 或 signal channel 建立 root context。&lt;/li>
&lt;li>用 &lt;code>http.Server.Shutdown&lt;/code> 停止接受新 request。&lt;/li>
&lt;li>用 context 傳遞停止訊號給 worker、hub、WebSocket pump。&lt;/li>
&lt;li>用 &lt;code>/health&lt;/code>、&lt;code>/ready&lt;/code>、&lt;code>/debug/...&lt;/code> 分開不同操作訊號。&lt;/li>
&lt;li>用 &lt;code>log/slog&lt;/code> 建立穩定 structured fields。&lt;/li>
&lt;li>用 config struct 載入 feature gate，而不是到處讀環境變數。&lt;/li>
&lt;/ul>
&lt;h2 id="學習重點">學習重點&lt;/h2>
&lt;p>學完本模組後，你應該能判斷：&lt;/p>
&lt;ol>
&lt;li>服務收到停止訊號後，哪些元件要先停止接流量&lt;/li>
&lt;li>health、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness&lt;/a>、diagnostics 各自回答什麼問題&lt;/li>
&lt;li>structured log 欄位如何支援查詢與聚合&lt;/li>
&lt;li>哪些資料不應進入 log&lt;/li>
&lt;li>feature gate 關閉時應降級、回錯、隱藏還是排程稍後處理&lt;/li>
&lt;/ol>
&lt;h2 id="本模組不處理">本模組不處理&lt;/h2>
&lt;p>This module does not cover every detail of Kubernetes, systemd, cloud platforms, or a full SRE process. Those environments affect operational strategy, but this module first establishes the operational boundaries a Go service itself should have; from here you can continue to &lt;a href="https://tarrragon.github.io/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Kubernetes, systemd, and load balancer contracts&lt;/a> and &lt;a href="https://tarrragon.github.io/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Observability pipeline, metrics, and tracing&lt;/a>.&lt;/p></description><content:encoded><![CDATA[<p>The core goal of production operations is to make a Go service stoppable, observable, diagnosable, and able to enable features gradually. Running on a local machine is only the first step; over long-running operation, what really matters is whether shutdown is predictable, whether monitoring signals are clear, whether the <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a> is queryable, and whether feature flags have a degradation strategy.</p>
<p>This module builds on the earlier material on concurrency, <a href="/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket</a>, the runtime, and testing: <a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown</a> needs context and goroutine lifecycle, health endpoints need to separate availability from diagnostics, the structured <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a> needs to trace the event flow, and feature gates need to control new capabilities safely.</p>
<h2 id="章節列表">Chapter list</h2>
<table>
  <thead>
      <tr>
          <th>Chapter</th>
          <th>Topic</th>
          <th>Key takeaway</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/go-advanced/06-production-operations/graceful-shutdown/" data-link-title="6.1 graceful shutdown 與 signal handling" data-link-desc="用 signal 與 context 傳遞停止訊號">6.1</a></td>
          <td><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown</a> and signal handling</td>
          <td>Stop the service using signal, context, <a href="/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout</a>, and owner cleanup</td>
      </tr>
      <tr>
          <td><a href="/blog/go-advanced/06-production-operations/health-diagnostics/" data-link-title="6.2 健康檢查與診斷 endpoint" data-link-desc="區分服務可用性與工程診斷入口">6.2</a></td>
          <td>Health checks and diagnostic endpoints</td>
          <td>Distinguish health, <a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a>, diagnostics, and the status code contract</td>
      </tr>
      <tr>
          <td><a href="/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">6.3</a></td>
          <td>Structured log field design</td>
          <td>Use stable fields so logs can be grepped, aggregated, and traced</td>
      </tr>
      <tr>
          <td><a href="/blog/go-advanced/06-production-operations/feature-gate/" data-link-title="6.4 版本偵測與 feature gate" data-link-desc="依版本與環境能力啟用功能">6.4</a></td>
          <td>Version detection and feature gates</td>
          <td>Control behavior with feature flags, capability detection, and degradation strategies</td>
      </tr>
  </tbody>
</table>
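<h2 id="本模組使用的範例主題">Example theme used in this module</h2>
<p>This module uses a fictional real-time notification service as its example. The example includes an HTTP server, a <a href="/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket</a> hub, a background worker, runtime diagnostics, structured logs, and feature gates.</p>
<p>The example only serves to illustrate production operations design in Go; it does not assume the reader maintains any particular project.</p>
<h2 id="本模組的-go-核心概念">Go core concepts in this module</h2>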
<ul>
<li>Use <code>signal.NotifyContext</code> or a signal channel to create the root context.</li>
<li>Use <code>http.Server.Shutdown</code> to stop accepting new requests.</li>
<li>Use context to propagate the stop signal to workers, the hub, and WebSocket pumps.</li>
<li>Use <code>/health</code>, <code>/ready</code>, and <code>/debug/...</code> to separate different operational signals.</li>
<li>Use <code>log/slog</code> to build stable structured fields.</li>
<li>Load feature gates through a config struct instead of reading environment variables everywhere.</li>
</ul>
<h2 id="學習重點">Learning focus</h2>
<p>After this module you should be able to judge:</p>
<ol>
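<li>Which components must stop accepting traffic first after the service receives a stop signal</li>
<li>What question each of health, <a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a>, and diagnostics answers</li>
<li>How structured log fields support querying and aggregation</li>
<li>Which data should not go into logs</li>
<li>Whether a disabled feature gate should degrade, return an error, hide the feature, or schedule the work for later</li>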
</ol>
<h2 id="本模組不處理">Out of scope for this module</h2>
<p>This module does not cover every detail of Kubernetes, systemd, cloud platforms, or a full SRE process. Those environments affect operational strategy, but this module first establishes the operational boundaries a Go service itself should have; from here you can continue to <a href="/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Kubernetes, systemd, and load balancer contracts</a> and <a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Observability pipeline, metrics, and tracing</a>.</p>
<h2 id="學習時間">Study time</h2>
<p>Estimated 3 to 4 hours</p>
]]></content:encoded></item><item><title>4.9 Production 部署的資源評估原理</title><link>https://tarrragon.github.io/blog/llm/04-applications/production-resource-planning/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/04-applications/production-resource-planning/</guid><description>&lt;p>LLM 應用從本地實驗跨到 production 是個 phase transition、不是線性放大。本地 single-user 場景的「跑得起來」變 production 場景就要回答全新一組問題：100 個 user 同時打進來怎麼辦、每個 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/token/" data-link-title="Token" data-link-desc="LLM 處理文字時的最小單位：介於字元與單字之間">token&lt;/a> 要多少錢、p99 latency 怎麼控、model service down 了怎麼處理。&lt;/p>
&lt;p>本章寫的是「&lt;strong>從本地實驗 → production 該想清楚的維度&lt;/strong>」、focus 在跨工具世代不變的原理。具體 framework（vLLM、TGI、Triton、SGLang）跟雲端服務（OpenAI / Anthropic / Bedrock）的選型不展開——這些半年一個世代、寫了會過時。本章建立的是「無論用哪套工具、都該回答」的設計取捨清單。&lt;/p>
&lt;p>跟 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &amp;#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 Tool use&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent&lt;/a> 對應「應用怎麼設計」、本章對應「應用怎麼跑」。&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>讀完本章後你能：&lt;/p>
&lt;ol>
&lt;li>列出 production LLM 部署該評估的 6 個 dimension。&lt;/li>
&lt;li>解釋 single-user benchmark 為什麼不能直接 extrapolate 到 multi-user 場景。&lt;/li>
&lt;li>區分 latency-sensitive 跟 throughput-sensitive 應用的設計差別。&lt;/li>
&lt;li>對成本模型（$/request、$/token、$/month）做合理估算。&lt;/li>
&lt;/ol>
&lt;h2 id="從本地到-production-的-phase-transition">從本地到 production 的 phase transition&lt;/h2>
&lt;p>本地 LLM 跑 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP&lt;/a> 的 baseline（&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">hands-on 章節&lt;/a>）：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>維度&lt;/th>
 &lt;th>本地（single-user）&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>並發 user&lt;/td>
 &lt;td>1&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Latency 要求&lt;/td>
 &lt;td>秒級 OK&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Index 大小&lt;/td>
 &lt;td>&amp;lt; 100 MB&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Cost&lt;/td>
 &lt;td>一次性硬體&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Uptime&lt;/td>
 &lt;td>自己重啟&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>觀測&lt;/td>
 &lt;td>&lt;code>tail log&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Production 場景每個維度都跳一個量級：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>維度&lt;/th>
 &lt;th>Production（multi-tenant）&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>並發 user&lt;/td>
 &lt;td>10 - 10000&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Latency 要求&lt;/td>
 &lt;td>p50 &amp;lt; 500 ms、p99 &amp;lt; 2 s&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Index 大小&lt;/td>
 &lt;td>GB - TB&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Cost&lt;/td>
 &lt;td>$ / request、$ / token、$ / month&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Uptime&lt;/td>
 &lt;td>99.9% SLA&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>觀測&lt;/td>
 &lt;td>metrics、traces、dashboards&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
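&lt;p>The implication of every dimension jumping an order of magnitude is not "resources × 10" but "entirely new failure modes plus new design trade-offs".&lt;/p></description><content:encoded><![CDATA[<p>Moving an LLM application from local experiments to production is a phase transition, not a linear scale-up. "It runs" in a local single-user setting becomes, in production, a whole new set of questions: what happens when 100 users hit the service at once, how much each <a href="/blog/llm/knowledge-cards/token/" data-link-title="Token" data-link-desc="LLM 處理文字時的最小單位：介於字元與單字之間">token</a> costs, how to control p99 latency, and what to do when the model service goes down.</p>
<p>This chapter covers "<strong>the dimensions to think through when going from local experiments to production</strong>", focusing on principles that hold across tool generations. It does not go into choosing specific frameworks (vLLM, TGI, Triton, SGLang) or cloud services (OpenAI / Anthropic / Bedrock): these turn over roughly every six months, and anything written about them would go stale. What this chapter builds is a checklist of design trade-offs that must be answered whichever tools you use.</p>
<p><a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG</a> / <a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 Tool use</a> / <a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent</a> address "how to design the application"; this chapter addresses "how to run it".</p>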
<h2 id="本章目標">Chapter goals</h2>
<p>After reading this chapter you will be able to:</p>
<ol>
<li>List the 6 dimensions to evaluate for a production LLM deployment.</li>
<li>Explain why a single-user benchmark cannot be extrapolated directly to a multi-user scenario.</li>
<li>Distinguish the design differences between latency-sensitive and throughput-sensitive applications.</li>
<li>Make a reasonable estimate with a cost model ($/request, $/token, $/month).</li>
</ol>
<h2 id="從本地到-production-的-phase-transition">The phase transition from local to production</h2>
<p>The baseline for running <a href="/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG</a> / <a href="/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP</a> on a local LLM (<a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">hands-on chapter</a>):</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Local (single-user)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Concurrent users</td>
          <td>1</td>
      </tr>
      <tr>
          <td>Latency requirement</td>
          <td>Seconds are acceptable</td>
      </tr>
      <tr>
          <td>Index size</td>
          <td>&lt; 100 MB</td>
      </tr>
      <tr>
          <td>Cost</td>
          <td>One-time hardware purchase</td>
      </tr>
      <tr>
          <td>Uptime</td>
          <td>Restart it yourself</td>
      </tr>
      <tr>
          <td>Observability</td>
          <td><code>tail log</code></td>
      </tr>
  </tbody>
</table>
<p>In production, every dimension jumps an order of magnitude:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Production (multi-tenant)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Concurrent users</td>
          <td>10 to 10,000</td>
      </tr>
      <tr>
          <td>Latency requirement</td>
          <td>p50 &lt; 500 ms, p99 &lt; 2 s</td>
      </tr>
      <tr>
          <td>Index size</td>
          <td>GB to TB</td>
      </tr>
      <tr>
          <td>Cost</td>
          <td>$ / request, $ / token, $ / month</td>
      </tr>
      <tr>
          <td>Uptime</td>
          <td>99.9% SLA</td>
      </tr>
      <tr>
          <td>Observability</td>
          <td>metrics, traces, dashboards</td>
      </tr>
  </tbody>
</table>
<p>The implication of every dimension jumping an order of magnitude is not "resources × 10" but "entirely new failure modes plus new design trade-offs".</p>
<h2 id="維度-1concurrent-users--throughput">Dimension 1: Concurrent users / Throughput</h2>
<h3 id="為什麼這個維度最關鍵">Why this dimension matters most</h3>
<p>The local single-user baseline numbers (the RAM / latency recorded in the <a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">hands-on</a>) <strong>can hardly be extrapolated to multi-user scenarios</strong>; the root cause is that resource contention amplifies costs that were previously invisible:</p>
<ul>
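<li>100 users sending requests at once → the result is not "the single-user latency, unchanged, for 100 users" but "queueing + memory contention + waiting for the GPU"; a single user's latency may rise 10×</li>
<li>The same model serving N users → KV cache usage is multiplied by N, and a single GPU may not be able to hold it within its capacity limit</li>
<li>A single-user "200 ms latency" can become "p99 of 5 seconds" in production</li>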
</ul>
<h3 id="key-conceptbatching">Key concept: batching</h3>
<p><a href="/blog/llm/knowledge-cards/batching/" data-link-title="Batching" data-link-desc="多 request 一起跑、攤平 model load 成本：production LLM inference 的核心優化、決定 throughput vs latency 取捨">Batching</a> and <a href="/blog/llm/knowledge-cards/kv-cache/" data-link-title="KV Cache" data-link-desc="已處理 token 的 attention 中間結果暫存：避免重算、加速後續生成">KV cache</a> design let the GPU process requests from multiple users in a single forward pass; this is the core optimization of a production <a href="/blog/llm/knowledge-cards/inference-server/" data-link-title="Inference Server" data-link-desc="載入模型權重、處理 prompt、產生 token 的常駐 process">inference server</a>. But batching also comes with trade-offs:</p>
<ul>
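<li><strong>Static batching</strong>: wait until N requests have accumulated before running; raises throughput at the cost of time to first token</li>
<li><strong>Continuous batching</strong>: used by vLLM, TGI, and others; new requests join the running batch dynamically, balancing throughput and latency</li>
<li><strong>No batching</strong>: each request runs on its own; low latency, poor GPU utilization</li>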
</ul>
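<p>Why static batching costs first-token latency can be seen in a toy model. The sketch below assumes a batch starts the moment its last member arrives and ignores compute time entirely; <code>static_batch_waits</code> is a hypothetical helper, not part of any inference server.</p>

```python
def static_batch_waits(arrivals_ms: list[float], batch_size: int) -> list[float]:
    """Return how long each request waits before its batch starts.

    Toy model: a batch starts the moment its last member arrives, and the
    server is assumed to be free at that moment. A trailing partial batch
    also starts when its last member arrives.
    """
    waits = []
    for start in range(0, len(arrivals_ms), batch_size):
        batch = arrivals_ms[start:start + batch_size]
        batch_start = max(batch)  # the batch cannot start before it is full
        waits.extend(batch_start - t for t in batch)
    return waits


arrivals = [0.0, 10.0, 20.0, 30.0]  # one request every 10 ms
no_batching = static_batch_waits(arrivals, batch_size=1)  # nobody waits
static_4 = static_batch_waits(arrivals, batch_size=4)     # earliest request waits longest
```

<p>In this model the first request in a batch of four waits for the three that follow it. Continuous batching avoids that wait by letting new requests join a batch that is already running.</p>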
<p>The choice of batching strategy mainly depends on whether latency or throughput matters more:</p>
<table>
  <thead>
      <tr>
          <th>Application scenario</th>
          <th>Suitable batching strategy</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Interactive conversation (IDE plugin, chatbot UI)</td>
          <td>continuous batching, low latency first</td>
      </tr>
      <tr>
          <td>Batch processing (document summarization, code review)</td>
          <td>static batching, throughput first</td>
      </tr>
      <tr>
          <td>Embedding service</td>
          <td>Larger batches are generally better; embedding is a pure forward pass, and batch sizes of 16 to 128 all work</td>
      </tr>
  </tbody>
</table>
<h3 id="評估-concurrent-throughput">Evaluating concurrent throughput</h3>
<p>Tests to run (this chapter gives the framework, not a hands-on):</p>
<ol>
<li><strong>Single-user baseline</strong>: measure the latency of a single request against an idle server</li>
<li><strong>N-user load test</strong>: use <a href="https://k6.io">k6</a> / <a href="https://github.com/tsenart/vegeta">vegeta</a> / a hand-written async client to run 1, 10, and 100 concurrent requests</li>
<li><strong>Observe how p50 / p95 / p99 latency changes with concurrency</strong>: latency is typically flat while N is below batch_size, then grows roughly linearly once N exceeds batch_size</li>
<li><strong>GPU memory saturation point</strong>: once tokens-in-flight exceeds a certain amount, new requests start to queue</li>
</ol>
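<p>The "hand-written async client" option from the list above can be sketched as follows. This is a minimal sketch under stated assumptions: <code>fake_request</code> simulates the server with <code>asyncio.sleep</code> and would be replaced by a real HTTP call, and the nearest-rank <code>percentile</code> helper is one of several reasonable percentile definitions.</p>

```python
import asyncio
import math
import time


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile; p is in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


async def fake_request(delay_s: float) -> float:
    """Stand-in for one call to the inference server; returns latency in seconds."""
    started = time.perf_counter()
    await asyncio.sleep(delay_s)  # replace with a real HTTP call in practice
    return time.perf_counter() - started


async def run_load(concurrency: int, delay_s: float = 0.01) -> dict[str, float]:
    """Fire `concurrency` requests at once and summarize their latencies."""
    latencies = await asyncio.gather(*(fake_request(delay_s) for _ in range(concurrency)))
    return {f"p{p}": percentile(list(latencies), p) for p in (50, 95, 99)}


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, asyncio.run(run_load(n)))
```

<p>Against a real server, the interesting output is how the three percentiles diverge as <code>concurrency</code> grows; the simulated version only demonstrates the measurement loop.</p>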
<p>A practical estimation formula:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Max concurrent users (steady state)
</span></span><span class="line"><span class="ln">2</span><span class="cl">    = (GPU memory available - model weights) / (per-user KV cache size)</span></span></code></pre></div><p>Example: an H100 with 80 GB, minus 60 GB for a 31B model, leaves 20 GB; at an average of 200 MB of KV cache per user, that gives 100 concurrent users.</p>
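<p>The same arithmetic as a function, for plugging in other hardware. A minimal sketch: <code>max_concurrent_users</code> is a hypothetical helper, and it uses decimal units (1 GB = 1000 MB) to match the worked example.</p>

```python
def max_concurrent_users(gpu_memory_gb: float, model_weights_gb: float,
                         kv_cache_per_user_mb: float) -> int:
    """Upper bound on concurrent users from the memory formula in the text."""
    if not kv_cache_per_user_mb > 0:
        raise ValueError("per-user KV cache size must be positive")
    # Memory left after loading the weights; clamp at zero if the model does not fit.
    free_mb = max(gpu_memory_gb - model_weights_gb, 0) * 1000
    return int(free_mb // kv_cache_per_user_mb)


# The example from the text: 80 GB GPU, 60 GB of weights, 200 MB per user.
print(max_concurrent_users(80, 60, 200))
```

<p>The result is a memory ceiling only; it says nothing about whether latency is acceptable at that concurrency.</p>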
<p>Conditions under which the formula breaks down (use these signals to judge when it cannot be trusted):</p>
<ul>
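<li><strong>Variable-length context</strong>: per-user KV cache grows linearly with context length; a long-context user (10K+ tokens) uses 5 to 10 times the KV cache of a short-context user, so using the average seriously underestimates usage. Fix: estimate from the P95 context length, not the average.</li>
<li><strong>Prefix cache enabled</strong>: vLLM, TGI, and similar servers use prefix sharing to save a large amount of KV cache, so real capacity can be 2 to 3 times higher than the formula suggests. Fix: run a real test and measure the prefix cache hit rate.</li>
<li><strong>Speculative decoding</strong>: the KV cache of both the drafter and the target must be counted, so per-user usage is 10 to 20% higher than the dense baseline. Fix: compute with drafter plus target combined.</li>
<li><strong>Different batching strategies</strong>: the ceiling for static batching is "batch_size × wait time", while for continuous batching it is "average in-flight tokens"; the definition of "per-user" in the formula differs by strategy.</li>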
</ul>
<p>但這是上限、實際還要考慮 latency target。</p>
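<p>To make the "use P95, not the average" point concrete, the sketch below derives per-user KV cache from context length. The per-token formula (K and V, per layer, per KV head) is the standard one for transformer decoders; the model shape and the traffic mix are hypothetical values chosen only for illustration.</p>

```python
import math


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """K and V each store n_kv_heads * head_dim values per layer (fp16 = 2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def p95(values: list) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(95 * len(ordered) / 100) - 1]


# Hypothetical traffic: 90% short chats, 10% long-context users.
context_lengths = [500] * 90 + [12_000] * 10
# Hypothetical model shape, not taken from any specific model card.
per_token = kv_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128)

mean_mb = sum(context_lengths) / len(context_lengths) * per_token / 1e6
p95_mb = p95(context_lengths) * per_token / 1e6
print(f"mean-based: {mean_mb:.0f} MB/user, P95-based: {p95_mb:.0f} MB/user")
```

<p>With this mix the P95-based figure is several times the mean-based one, which is the gap the capacity formula hides when it is fed an average.</p>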
<h2 id="維度-2latency-budget">Dimension 2: Latency budget</h2>
<h3 id="latency-sensitive-vs-throughput-sensitive">Latency-sensitive vs throughput-sensitive</h3>
<p>The two kinds of application call for completely different design trade-offs:</p>
<table>
  <thead>
      <tr>
          <th>Property</th>
          <th>Latency-sensitive</th>
          <th>Throughput-sensitive</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Examples</td>
          <td>IDE completion, chat UI, search assistant</td>
          <td>Batch labeling, document summarization, offline RAG ingest</td>
      </tr>
      <tr>
          <td>Target metric</td>
          <td>p99 latency</td>
          <td>tokens / second / GPU</td>
      </tr>
      <tr>
          <td>Impact on user experience</td>
          <td>Direct (the user is stuck waiting)</td>
          <td>Indirect (total job time)</td>
      </tr>
      <tr>
          <td>Batching</td>
          <td>Small batch / continuous</td>
          <td>Large batch</td>
      </tr>
      <tr>
          <td>Resource planning</td>
          <td>Keep headroom for spikes</td>
          <td>Run the GPU at full utilization</td>
      </tr>
  </tbody>
</table>
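<p>A mixed application (such as chat with RAG) has two stages: retrieval (throughput-friendly, batchable) and generation (latency-sensitive, must stream). Optimize the two stages independently.</p>
<h3 id="latency-預算分配">Allocating the latency budget</h3>
<p>The p99 latency of a RAG application is, to a first approximation, the sum of its stages:</p>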





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Total p99 = client → API gateway → retrieval → LLM prefill → LLM decode → response stream
</span></span><span class="line"><span class="ln">2</span><span class="cl">         ≈ 50 ms      20 ms        50 ms        500 ms       1500 ms      100 ms
</span></span><span class="line"><span class="ln">3</span><span class="cl">         ≈ 2.2 seconds</span></span></code></pre></div><p>If the p99 budget is 2 seconds, this breakdown is already over budget, so first identify <strong>which stage consumes the most</strong>:</p>
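<ul>
<li>LLM generation is usually the largest share and the main target for optimization</li>
<li>Retrieval can exceed 100 ms on a large corpus and then needs index optimization (HNSW, approximate nearest neighbor)</li>
<li>The API gateway is usually negligible; if it exceeds 50 ms, that is an SRE issue</li>
</ul>
<p>Monitor each stage separately: only per-stage monitoring lets you find the root cause, while total latency alone hides where the time goes.</p>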
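<p>The budget check can be written down directly from the breakdown above. A minimal sketch: the stage values are the ones in the example, while the dictionary keys and the <code>budget_report</code> helper are illustrative names.</p>

```python
# Per-stage p99 estimates in milliseconds, taken from the breakdown above.
STAGE_P99_MS = {
    "client": 50,
    "api_gateway": 20,
    "retrieval": 50,
    "llm_prefill": 500,
    "llm_decode": 1500,
    "response_stream": 100,
}


def budget_report(stages: dict, budget_ms: int) -> dict:
    """Sum the stages, compare with the budget, and name the largest consumer."""
    total = sum(stages.values())
    worst = max(stages, key=stages.get)
    return {
        "total_ms": total,
        "over_budget_ms": max(0, total - budget_ms),
        "largest_stage": worst,
        "largest_share": stages[worst] / total,
    }


report = budget_report(STAGE_P99_MS, budget_ms=2000)
print(report)
```

<p>Summing per-stage p99 values is a conservative approximation, since the slowest case of every stage rarely lands on the same request.</p>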
<h2 id="維度-3cost-model">Dimension 3: Cost model</h2>
<h3 id="三種計費單位">Three billing units</h3>
<table>
  <thead>
      <tr>
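          <th>Unit</th>
          <th>How it is calculated</th>
          <th>Best for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>$/request</td>
          <td>Fixed price per API call</td>
          <td>Simple applications with predictable traffic</td>
      </tr>
      <tr>
          <td>$/token</td>
          <td>Based on the number of input + output tokens</td>
          <td>The mainstream model at OpenAI / Anthropic; applications with mixed input lengths</td>
      </tr>
      <tr>
          <td>$/server-hour</td>
          <td>Self-run GPU instance, rented monthly</td>
          <td>High throughput with predictable utilization</td>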
      </tr>
  </tbody>
</table>
<p>Cloud APIs (OpenAI / Anthropic) are almost all priced in $/token, with a different price tier for each model. Self-hosting (vLLM on Lambda Labs / RunPod) is priced in $/server-hour.</p>
<h3 id="成本估算-worked-example">Cost estimation: a worked example</h3>
<p>Assume an application with:</p>
<ul>
<li>1000 active users / day</li>
<li>An average of 10 requests / day per user</li>
<li>An average of 1000 input tokens + 500 output tokens per request</li>
<li>Claude Sonnet 4.6 (assuming $3 input / $15 output per million tokens)</li>
</ul>
<p>Daily cost:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">total_requests = 1000 × 10 = 10000 / day
</span></span><span class="line"><span class="ln">2</span><span class="cl">input_tokens = 10000 × 1000 = 10M
</span></span><span class="line"><span class="ln">3</span><span class="cl">output_tokens = 10000 × 500 = 5M
</span></span><span class="line"><span class="ln">4</span><span class="cl">daily_cost = 10M × $3/M + 5M × $15/M = $30 + $75 = $105 / day
</span></span><span class="line"><span class="ln">5</span><span class="cl">monthly_cost ≈ $3150</span></span></code></pre></div><p>Compared with running your own GPU:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">H100 instance: ~$2/hour (2026 spot price as an example; actual prices vary by cloud vendor and current quotes)
</span></span><span class="line"><span class="ln">2</span><span class="cl">H100 monthly = $2 × 24 × 30 = $1440
</span></span><span class="line"><span class="ln">3</span><span class="cl">If utilization &gt; 50% and the team has the SRE capacity to operate it, self-hosting is cheaper
</span></span><span class="line"><span class="ln">4</span><span class="cl">If utilization &lt; 30%, or the team has no GPU operations experience, the API is cheaper</span></span></code></pre></div><p><strong>The breakeven point usually sits at "sustained high utilization + a team that can operate the hardware"</strong>: for applications with short traffic peaks, or teams without GPU operations experience, the API is the better deal (no idle capacity or SRE headcount to pay for). A real decision also has to weigh non-price factors such as compliance, data sovereignty, and vendor lock-in.</p>
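<p>The worked example reduces to two small functions. A minimal sketch using the prices assumed above ($3 / $15 per million tokens, ~$2 per H100-hour); the function names are illustrative, and the comparison deliberately ignores SRE headcount and whether one GPU can actually serve the load.</p>

```python
def api_daily_cost(users: int, requests_per_user: int,
                   input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Daily $/token bill for a uniform request profile."""
    requests = users * requests_per_user
    input_cost = requests * input_tokens / 1_000_000 * usd_per_m_input
    output_cost = requests * output_tokens / 1_000_000 * usd_per_m_output
    return input_cost + output_cost


def gpu_monthly_cost(usd_per_hour: float, hours_per_day: int = 24,
                     days: int = 30) -> float:
    """Monthly $/server-hour bill for an always-on instance."""
    return usd_per_hour * hours_per_day * days


daily = api_daily_cost(1000, 10, 1000, 500, 3.0, 15.0)
print(f"API: ${daily:.0f}/day, ${daily * 30:.0f}/month")
print(f"GPU: ${gpu_monthly_cost(2.0):.0f}/month")
```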
<h3 id="hidden-cost">Hidden cost</h3>
<p>Easy to overlook:</p>
<ul>
<li><strong>Egress bandwidth</strong>: outbound traffic from cloud GPU instances; AWS / GCP both charge $/GB</li>
<li><strong>Storage</strong>: vector DB / log retention / metric retention</li>
<li><strong>Failed-request retries</strong>: 5xx errors trigger automatic retries, and the tokens are processed and paid for again</li>
<li><strong>Cold start</strong>: with scale-to-zero, each cold start wastes 5-30 seconds of GPU time</li>
</ul>
<h2 id="維度-4storage--vector-db">Dimension 4: Storage / Vector DB</h2>
<p>A local <a href="/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG</a> demo can get by with pickle, but production cannot: pickle offers no support for concurrent reads, in-place updates, or partitioning, so it has to be replaced by a <a href="/blog/llm/knowledge-cards/vector-database/" data-link-title="Vector Database" data-link-desc="為高維向量 (embedding) 設計的儲存 &#43; 近似最近鄰 (ANN) 檢索系統：RAG 從 prototype 跨到 production 的關鍵元件">vector database</a>.</p>
<h3 id="vector-db-的設計取捨">Vector DB design trade-offs</h3>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Trade-off</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Hosted vs self-host</strong></td>
          <td>Hosted (Pinecone, Weaviate Cloud) saves maintenance; self-hosting gives control over cost</td>
      </tr>
      <tr>
          <td><strong>In-memory vs disk-based</strong></td>
          <td>In-memory is fast but limited by RAM; disk-based scales larger but has higher latency</td>
      </tr>
      <tr>
          <td><strong>HNSW vs flat</strong></td>
          <td>HNSW is approximate but sublinear; flat is exact but linear</td>
      </tr>
      <tr>
          <td><strong>Update strategy</strong></td>
          <td>Periodic batch index rebuild vs incremental update</td>
      </tr>
  </tbody>
</table>
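<p>Specific product choices change every six months, so this chapter does not go into them. <strong>Questions to answer at design time</strong>:</p>
<ol>
<li>How large is the corpus? Under 1M, in-memory is enough; above 1M, go disk-based</li>
<li>How often does it update? Once a day vs real time changes the architecture</li>
<li>What is the latency target? &lt; 50 ms needs in-memory / HNSW; &lt; 200 ms can use disk-based</li>
<li>How many concurrent queries? 100 queries per second and 10000 queries per second are completely different designs</li>
</ol>
<h3 id="index-大小成長">Index size growth</h3>
<p>Extrapolating from the hands-on chapters:</p>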
<table>
  <thead>
      <tr>
          <th>Corpus size</th>
          <th>Index size (chunks + embeddings)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1K docs</td>
          <td>~50 MB</td>
      </tr>
      <tr>
          <td>100K docs</td>
          <td>~5 GB</td>
      </tr>
      <tr>
          <td>1M docs</td>
          <td>~50 GB</td>
      </tr>
      <tr>
          <td>10M docs</td>
          <td>~500 GB</td>
      </tr>
      <tr>
          <td>100M docs</td>
          <td>~5 TB</td>
      </tr>
  </tbody>
</table>
<p>Above 10M docs, a single machine (256GB RAM, commodity SSD) cannot hold the index in memory; sharding and a distributed index are required.</p>
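<p>The table is a linear extrapolation, which is easy to reproduce. A minimal sketch: the ~50 KB per document is read off the table (50 MB per 1K docs), and <code>usable_fraction</code> is a hypothetical allowance for the OS and the serving process, not a measured value.</p>

```python
BYTES_PER_DOC = 50_000  # assumption read off the table: ~50 MB per 1K docs


def index_size_gb(n_docs: int, bytes_per_doc: int = BYTES_PER_DOC) -> float:
    """Linear estimate of index size (chunks + embeddings) in decimal GB."""
    return n_docs * bytes_per_doc / 1e9


def fits_in_memory(n_docs: int, ram_gb: float = 256,
                   usable_fraction: float = 0.8) -> bool:
    """True when the estimated index fits in the usable share of RAM."""
    return index_size_gb(n_docs) <= ram_gb * usable_fraction


for n in (1_000, 100_000, 1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} docs -> {index_size_gb(n):8.2f} GB, in-memory: {fits_in_memory(n)}")
```

<p>Real indexes rarely grow perfectly linearly (HNSW graph overhead, metadata, replication), so the estimate is best treated as a first sizing pass before a measured test.</p>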
<h2 id="維度-5observability">Dimension 5: Observability</h2>
<p>A single-user <code>tail log</code> is not enough for production. The metrics to watch:</p>
<h3 id="latency-metrics">Latency metrics</h3>
<ul>
<li><strong>TTFT (Time to First Token)</strong>: the user-perceived "response time"; critical in streaming scenarios</li>
<li><strong>TPS (Tokens per second)</strong>: generation speed</li>
<li><strong>End-to-end latency</strong>: includes retrieval + LLM + post-processing</li>
<li><strong>Per-percentile breakdown</strong>: p50 / p90 / p95 / p99; p99 reflects the worst user experience</li>
</ul>
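<p>TTFT and TPS both fall out of the arrival timestamps of streamed tokens. A minimal sketch, assuming the client records one timestamp per output token; <code>stream_metrics</code> and its field names are illustrative, and TPS here is measured over the decode phase only.</p>

```python
def stream_metrics(request_sent_at: float, token_times: list) -> dict:
    """token_times: timestamps (seconds) at which each output token arrived."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_sent_at
    decode_time = token_times[-1] - token_times[0]
    # TPS over the decode phase only; undefined when a single token arrived.
    tps = (len(token_times) - 1) / decode_time if decode_time > 0 else None
    return {
        "ttft_s": ttft,
        "tps": tps,
        "e2e_s": token_times[-1] - request_sent_at,
    }


metrics = stream_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(metrics)
```

<p>Collecting these per request and then taking p50 / p95 / p99 across requests gives the per-percentile breakdown listed above.</p>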
<h3 id="throughput-metrics">Throughput metrics</h3>
<ul>
<li><strong>Requests per second</strong>: RPS at the API layer</li>
<li><strong>Tokens per second</strong> (aggregate): overall GPU throughput</li>
<li><strong>Queue depth</strong>: number of requests waiting for a batch; a surge signals overload</li>
</ul>
<h3 id="cost-metrics">Cost metrics</h3>
<ul>
<li><strong>$ per active user per day</strong>: the baseline of product economics</li>
<li><strong>Cost per session</strong>: unit cost for interactive applications</li>
<li><strong>Cache hit rate</strong>: prompt cache / embedding cache hit rate, which directly affects cost</li>
</ul>
<h3 id="quality-metrics">Quality metrics</h3>
<ul>
<li><strong>Refusal rate</strong>: share of responses where the model refuses</li>
<li><strong>Hallucination rate</strong>: (needs reviewer labeling)</li>
<li><strong>User feedback score</strong>: thumb up / down</li>
</ul>
<h3 id="工具metrics--traces--logs-三層">Tooling: the three layers of metrics / traces / logs</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Metrics (Prometheus / Datadog / CloudWatch)
</span></span><span class="line"><span class="ln">2</span><span class="cl">    → time-series, aggregate, suited to alerting
</span></span><span class="line"><span class="ln">3</span><span class="cl">Traces (OpenTelemetry / Datadog APM)
</span></span><span class="line"><span class="ln">4</span><span class="cl">    → per-request, can trace latency across services
</span></span><span class="line"><span class="ln">5</span><span class="cl">Logs (structured JSON, shipped to ELK / Loki)
</span></span><span class="line"><span class="ln">6</span><span class="cl">    → detailed context, for debugging</span></span></code></pre></div><p>Each layer has its own job: metrics show that p99 has risen, traces show which request and which stage was slow, and logs show the exact prompt / response of that request.</p>
<h2 id="維度-6reliability--sla">Dimension 6: Reliability / SLA</h2>
<h3 id="可預期的失敗模式">Expected failure modes</h3>
<table>
  <thead>
      <tr>
          <th>Failure type</th>
          <th>Handling</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Transient GPU OOM</strong></td>
          <td>retry with smaller batch, circuit breaker</td>
      </tr>
      <tr>
          <td><strong>Inference timeout</strong></td>
          <td>lower max_tokens, reject overly long prompts</td>
      </tr>
      <tr>
          <td><strong>Model server crash</strong></td>
          <td>health check + auto-restart (systemd / k8s)</td>
      </tr>
      <tr>
          <td><strong>Vector DB unavailable</strong></td>
          <td>fallback: skip RAG and answer as plain chat</td>
      </tr>
      <tr>
          <td><strong>Upstream API rate limit</strong></td>
          <td>exponential backoff + jitter</td>
      </tr>
  </tbody>
</table>
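<p>The "exponential backoff + jitter" entry in the table can be sketched as follows. This is a minimal sketch: <code>RateLimited</code> is a hypothetical exception standing in for an upstream 429, and the <code>sleep</code> / <code>rng</code> parameters exist only so the behavior can be tested without real waiting.</p>

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error type standing in for an upstream rate-limit response."""


def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 0.5,
                      max_delay: float = 8.0, sleep=time.sleep, rng=random.random):
    """Retry fn() on RateLimited, doubling the delay cap each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)  # "full jitter": uniform in [0, cap)
```

<p>The jitter spreads retries from many clients over time, so a recovering upstream is not hit by a synchronized wave.</p>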
<h3 id="graceful-degradation">Graceful degradation</h3>
<p>Designing a production LLM application means answering "what do we degrade to on failure":</p>
<table>
  <thead>
      <tr>
          <th>Component down</th>
          <th>Acceptable degradation</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Vector DB</td>
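          <td>Answer from the LLM's built-in knowledge + state that "the latest documents were not consulted"</td>
      </tr>
      <tr>
          <td>RAG retrieval (LLM still running)</td>
          <td>Serve stale cached results + retry</td>
      </tr>
      <tr>
          <td>Primary LLM API</td>
          <td>fallback to a secondary (OpenAI ↔ Anthropic ↔ local)</td>
      </tr>
      <tr>
          <td>Everything down</td>
          <td>Show a maintenance page and return 503 + Retry-After rather than a bare, unexplained 5xx</td>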
      </tr>
  </tbody>
</table>
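<p>Under an SLA commitment, every fallback path has to be designed in advance so that nobody is making ad hoc decisions during an incident (reactive handling is acceptable for early prototypes and internal tools, but not in production).</p>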
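<p>A pre-designed fallback path can be as small as an ordered chain. The sketch below is a hedged illustration of the degradation table: <code>retrieve</code> and the provider callables are hypothetical interfaces supplied by the caller, and the 30-second <code>retry_after_s</code> is an arbitrary placeholder.</p>

```python
def answer_with_fallbacks(question: str, retrieve, providers: list) -> dict:
    """retrieve: callable returning context; providers: ordered (name, callable) pairs."""
    notes = []
    try:
        context = retrieve(question)
    except Exception:  # real code should catch the vector DB's specific errors
        context = None
        notes.append("latest documents were not consulted")
    for name, generate in providers:
        try:
            return {"status": 200, "provider": name,
                    "answer": generate(question, context), "notes": notes}
        except Exception:
            continue  # try the next provider in the chain
    # Everything is down: signal maintenance instead of failing opaquely.
    return {"status": 503, "retry_after_s": 30, "answer": None, "notes": notes}
```

<p>Keeping the chain as data (an ordered list) makes the degradation order reviewable ahead of an incident rather than improvised during one.</p>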
<h3 id="capacity-planning">Capacity planning</h3>
<p>A simple formula:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Required capacity = peak_concurrent_users × per_user_RAM
</span></span><span class="line"><span class="ln">2</span><span class="cl">                  × overhead_factor (1.3-1.5)
</span></span><span class="line"><span class="ln">3</span><span class="cl">                  × redundancy_factor (2x for HA)</span></span></code></pre></div><p>Example: a peak of 100 concurrent users, ~500 MB of KV cache per user, overhead 1.3, HA 2x → 130 GB of GPU memory. One H100 is not enough; it takes two A100 80GB cards, or H100s with sharding.</p>
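<p>The capacity formula as code, using the numbers from the example. A minimal sketch: the function name is illustrative, units are decimal (1 GB = 1000 MB), and like the formula it counts only per-user memory, so model weights would come on top.</p>

```python
import math


def required_gpu_memory_gb(peak_users: int, per_user_mb: float,
                           overhead: float = 1.3, redundancy: float = 2.0) -> float:
    """peak users x per-user memory x overhead x redundancy, in decimal GB."""
    return peak_users * per_user_mb * overhead * redundancy / 1000


need_gb = required_gpu_memory_gb(100, 500)
cards = math.ceil(need_gb / 80)  # number of 80 GB cards needed for that memory
print(f"{need_gb:.0f} GB -> {cards} x 80 GB cards")
```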
<h2 id="跟本地-hands-on-的對照">Comparison with the local hands-on</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Local hands-on record</th>
          <th>What production should measure</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Single-user latency</td>
          <td>30-60s for SDXL, 5-20s for chat</td>
          <td>p50 / p95 / p99 latency</td>
      </tr>
      <tr>
          <td>Index size</td>
          <td>~3.7 MB / 463 chunks</td>
          <td>sharded index, GB-TB scale</td>
      </tr>
      <tr>
          <td>Process management</td>
          <td><code>pkill -9</code></td>
          <td>systemd / k8s liveness probe</td>
      </tr>
      <tr>
          <td>Disk cleanup</td>
          <td>manual <code>ollama rm</code></td>
          <td>automatic retention policy</td>
      </tr>
      <tr>
          <td>Cost</td>
          <td>one-time hardware purchase</td>
          <td>$/token / day budget alerts</td>
      </tr>
      <tr>
          <td>Observability</td>
          <td><code>tail log</code></td>
          <td>Prometheus + Grafana / Datadog</td>
      </tr>
      <tr>
          <td>Failure response</td>
          <td>restart it yourself</td>
          <td>auto-recover + alert + runbook</td>
      </tr>
  </tbody>
</table>
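<p>Local numbers prove that it "runs"; production numbers verify that it is "usable". Once the architecture has been validated locally, the production deployment should redo load testing rather than assume linear scaling.</p>
<h2 id="跨-framework-不變的設計問題">Design questions that hold across frameworks</h2>
<p>Whether you use vLLM / TGI / Triton / SGLang / OpenAI API, a production design has to answer:</p>
<ol>
<li><strong>Latency vs throughput</strong>: which is the primary metric?</li>
<li><strong>Batch strategy</strong>: static / continuous / per-request?</li>
<li><strong>Cost ceiling</strong>: what is the $/day budget, and what happens when it is exceeded?</li>
<li><strong>Storage</strong>: how large is the vector DB, and how often does it update?</li>
<li><strong>Observability</strong>: which metrics are alert-worthy?</li>
<li><strong>Reliability</strong>: failure modes + graceful degradation design</li>
<li><strong>Capacity</strong>: how much GPU memory do peak + redundancy require?</li>
</ol>
<p>When these 7 questions have consistent answers, the choice of framework is usually not the root cause of a production failure: resource estimates and design trade-offs are already aligned, and the framework is mostly a supporting choice.</p>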
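<h2 id="何時這篇會過時">When this article will go out of date</h2>
<p><strong>What will not go out of date</strong>:</p>
<ul>
<li>The 6 dimensions (concurrency / latency / cost / storage / observability / reliability)</li>
<li>The design differences between latency-sensitive and throughput-sensitive applications</li>
<li>The trade-offs among the three billing units</li>
<li>The three observability layers: metrics / traces / logs</li>
<li>Graceful degradation design</li>
</ul>
<p><strong>What will change</strong>:</p>
<ul>
<li>The ranking of specific inference frameworks (vLLM / TGI / SGLang, etc.)</li>
<li>Cloud API price tiers</li>
<li>Which vector DBs are mainstream</li>
</ul>
<p>When a new framework appears, go back to the 6-dimension framework and ask: on which dimension does it break new ground? Does it change the answer to any existing design question? Usually the core questions turn out to be unchanged and only the tools have matured.</p>
<h2 id="跟其他章節的關係">Relation to other chapters</h2>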
<ul>
<li><a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">hands-on RAG/MCP resources</a>: local baseline numbers, the starting point for this chapter's production extrapolation</li>
<li><a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG</a> / <a href="/blog/llm/04-applications/tool-use-principles/" data-link-title="4.3 Tool use 原理：LLM 跟外部世界互動" data-link-desc="Structured output 是 LLM 跨入工程系統的橋、function calling 取捨、為什麼本地小模型 tool use 表現崩潰">4.3 Tool use</a> / <a href="/blog/llm/04-applications/agent-architecture/" data-link-title="4.4 Agent 架構原理" data-link-desc="Agent loop 結構、失敗模式、什麼任務適合 vs 不適合、跟人類審查的協作模型">4.4 Agent</a>: application-layer design; this chapter completes the picture with "how the application runs"</li>
<li><a href="/blog/llm/00-foundations/hardware-memory-budget/" data-link-title="0.5 Apple Silicon 記憶體預算" data-link-desc="記憶體決定能跑什麼，Q4 量化下的可運作模型對照與系統保留">0.5 Hardware memory budget</a>: the local single-machine perspective; this chapter covers the multi-machine production counterpart</li>
<li><a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 Troubleshooting methodology</a>: local troubleshooting; this chapter is the production observability counterpart</li>
</ul>
]]></content:encoded></item><item><title>4.20 LLM tracing and observability</title><link>https://tarrragon.github.io/blog/llm/04-applications/llm-tracing-and-observability/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/04-applications/llm-tracing-and-observability/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-tracing/" data-link-title="LLM Tracing" data-link-desc="把 LLM 應用的每次 LLM call / tool call / memory op 編成結構化 span、用 OpenTelemetry GenAI semantic conventions 標準化">LLM tracing&lt;/a> encodes every LLM call / tool call / memory op / handoff as a structured span, standardized with the OpenTelemetry GenAI semantic conventions, and is the de facto standard for debug / cost / quality monitoring of production LLM applications. The string logging of a traditional web app cannot capture the key questions of an LLM application: why the agent took that path, how the reasoning trace was derived, why a tool call retried three times, why token consumption is 3x higher than expected. This chapter breaks the mechanics of LLM tracing, the OTel GenAI semconv, the three main use cases (cost / latency / failure), and the production eval loop into actionable engineering practice.&lt;/p>
&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After reading this chapter, you should be able to:&lt;/p>
&lt;ol>
&lt;li>Explain the difference between LLM tracing and traditional logging.&lt;/li>
&lt;li>Design a span structure using the OpenTelemetry GenAI semantic conventions.&lt;/li>
&lt;li>Use traces for cost / latency monitoring and failure debugging.&lt;/li>
&lt;li>Feed production traces back into &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">LLM-as-judge&lt;/a> to close the quality loop.&lt;/li>
&lt;li>Decide whether your own application should use a self-hosted or a SaaS observability platform.&lt;/li>
&lt;/ol>
&lt;h2 id="traditional-logging-為什麼不夠">Why traditional logging is not enough&lt;/h2>
&lt;p>The debugging questions of an LLM application are too abstract for traditional logging:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Scenario&lt;/th>
 &lt;th>What logging shows&lt;/th>
 &lt;th>Information actually needed&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Why the agent chose tool A over tool B&lt;/td>
 &lt;td>a single &lt;code>tool=A&lt;/code> line&lt;/td>
 &lt;td>Full reasoning trace + context at that moment + tool list&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Why token cost is high&lt;/td>
 &lt;td>&lt;code>tokens=15234&lt;/code>&lt;/td>
 &lt;td>Input / output / cached token breakdown + per-turn accumulation&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Why TTFT is 5 seconds&lt;/td>
 &lt;td>&lt;code>ttft=5012ms&lt;/code>&lt;/td>
 &lt;td>Prefill and cache misses, prompt length, queue time&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Why a tool retried three times&lt;/td>
 &lt;td>&lt;code>tool error retry&lt;/code>&lt;/td>
 &lt;td>Each error message + the LLM's interpretation + retry strategy&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Why the agent is in an infinite loop&lt;/td>
 &lt;td>Large volumes of repeated logs&lt;/td>
 &lt;td>Context at each iteration + why it never decided to terminate&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
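&lt;p>LLM tracing encodes this information directly, using "structured spans + parent-child relationships + standardized attributes".&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/llm/knowledge-cards/llm-tracing/" data-link-title="LLM Tracing" data-link-desc="把 LLM 應用的每次 LLM call / tool call / memory op 編成結構化 span、用 OpenTelemetry GenAI semantic conventions 標準化">LLM tracing</a> encodes every LLM call / tool call / memory op / handoff as a structured span, standardized with the OpenTelemetry GenAI semantic conventions, and is the de facto standard for debug / cost / quality monitoring of production LLM applications. The string logging of a traditional web app cannot capture the key questions of an LLM application: why the agent took that path, how the reasoning trace was derived, why a tool call retried three times, why token consumption is 3x higher than expected. This chapter breaks the mechanics of LLM tracing, the OTel GenAI semconv, the three main use cases (cost / latency / failure), and the production eval loop into actionable engineering practice.</p>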
<h2 id="本章目標">Chapter goals</h2>
<p>After reading this chapter, you should be able to:</p>
<ol>
<li>Explain the difference between LLM tracing and traditional logging.</li>
<li>Design a span structure using the OpenTelemetry GenAI semantic conventions.</li>
<li>Use traces for cost / latency monitoring and failure debugging.</li>
<li>Feed production traces back into <a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">LLM-as-judge</a> to close the quality loop.</li>
<li>Decide whether your own application should use a self-hosted or a SaaS observability platform.</li>
</ol>
<h2 id="traditional-logging-為什麼不夠">Why traditional logging is not enough</h2>
<p>The debugging questions of an LLM application are too abstract for traditional logging:</p>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>What logging shows</th>
          <th>Information actually needed</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Why the agent chose tool A over tool B</td>
          <td>a single <code>tool=A</code> line</td>
          <td>Full reasoning trace + context at that moment + tool list</td>
      </tr>
      <tr>
          <td>Why token cost is high</td>
          <td><code>tokens=15234</code></td>
          <td>Input / output / cached token breakdown + per-turn accumulation</td>
      </tr>
      <tr>
          <td>Why TTFT is 5 seconds</td>
          <td><code>ttft=5012ms</code></td>
          <td>Prefill and cache misses, prompt length, queue time</td>
      </tr>
      <tr>
          <td>Why a tool retried three times</td>
          <td><code>tool error retry</code></td>
          <td>Each error message + the LLM's interpretation + retry strategy</td>
      </tr>
      <tr>
          <td>Why the agent is in an infinite loop</td>
          <td>Large volumes of repeated logs</td>
          <td>Context at each iteration + why it never decided to terminate</td>
      </tr>
  </tbody>
</table>
<p>LLM tracing encodes this information directly, using "structured spans + parent-child relationships + standardized attributes".</p>
<h2 id="opentelemetry-genai-semantic-conventions">OpenTelemetry GenAI semantic conventions</h2>
<p>OTel GenAI semconv is a trace schema that was still being standardized in 2024-2025. The core idea, shown schematically:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Trace (one user query, from arrival to response)
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  ├── Span: gen_ai.agent.invocation (agent loop iteration 1)
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  │     ├── Span: gen_ai.client.operation (LLM call 1)
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  │     │     attrs: model, temperature, input_tokens, output_tokens, cache_read
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  │     ├── Span: gen_ai.tool.execution (tool: read_file)
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  │     │     attrs: tool_name, input, output, duration
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  │     └── Span: gen_ai.memory.read (retrieval)
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  │           attrs: query, top_k, similarity_scores
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  ├── Span: gen_ai.agent.invocation (iteration 2)
</span></span><span class="line"><span class="ln">10</span><span class="cl">  │     └── ...
</span></span><span class="line"><span class="ln">11</span><span class="cl">  └── Span: gen_ai.agent.terminate
</span></span><span class="line"><span class="ln">12</span><span class="cl">        attrs: reason, total_tokens, total_cost</span></span></code></pre></div><p>Main attribute categories:</p>
<table>
  <thead>
      <tr>
          <th>Category</th>
          <th>Attribute prefix</th>
          <th>Typical content</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Model</td>
          <td><code>gen_ai.request.*</code></td>
          <td>model, temperature, top_p, max_tokens, stream</td>
      </tr>
      <tr>
          <td>Usage</td>
          <td><code>gen_ai.usage.*</code></td>
          <td>input_tokens, output_tokens, cached_tokens</td>
      </tr>
      <tr>
          <td>Response</td>
          <td><code>gen_ai.response.*</code></td>
          <td>finish_reason, id</td>
      </tr>
      <tr>
          <td>Tool</td>
          <td><code>gen_ai.tool.*</code></td>
          <td>name, parameters, result</td>
      </tr>
      <tr>
          <td>Memory</td>
          <td><code>gen_ai.memory.*</code></td>
          <td>operation, store, query, hits</td>
      </tr>
      <tr>
          <td>Cost</td>
          <td><code>gen_ai.cost.*</code></td>
          <td>usd, currency (vendor-specific)</td>
      </tr>
  </tbody>
</table>
<p>Implementation sketch (Python example):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kn">from</span> <span class="nn">opentelemetry</span> <span class="kn">import</span> <span class="n">trace</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">from</span> <span class="nn">openinference.semconv.trace</span> <span class="kn">import</span> <span class="n">SpanAttributes</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">tracer</span> <span class="o">=</span> <span class="n">trace</span><span class="o">.</span><span class="n">get_tracer</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">with</span> <span class="n">tracer</span><span class="o">.</span><span class="n">start_as_current_span</span><span class="p">(</span><span class="s2">&#34;gen_ai.client.operation&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">span</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="n">span</span><span class="o">.</span><span class="n">set_attribute</span><span class="p">(</span><span class="n">SpanAttributes</span><span class="o">.</span><span class="n">LLM_MODEL_NAME</span><span class="p">,</span> <span class="s2">&#34;claude-sonnet-4-6&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="n">span</span><span class="o">.</span><span class="n">set_attribute</span><span class="p">(</span><span class="s2">&#34;gen_ai.request.temperature&#34;</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">response</span> <span class="o">=</span> <span class="n">llm_client</span><span class="o">.</span><span class="n">chat</span><span class="p">(</span><span class="n">messages</span><span class="o">=...</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">span</span><span class="o">.</span><span class="n">set_attribute</span><span class="p">(</span><span class="n">SpanAttributes</span><span class="o">.</span><span class="n">LLM_TOKEN_COUNT_PROMPT</span><span class="p">,</span> <span class="n">response</span><span class="o">.</span><span class="n">usage</span><span class="o">.</span><span class="n">input_tokens</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">span</span><span class="o">.</span><span class="n">set_attribute</span><span class="p">(</span><span class="n">SpanAttributes</span><span class="o">.</span><span class="n">LLM_TOKEN_COUNT_COMPLETION</span><span class="p">,</span> <span class="n">response</span><span class="o">.</span><span class="n">usage</span><span class="o">.</span><span class="n">output_tokens</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="n">span</span><span class="o">.</span><span class="n">set_attribute</span><span class="p">(</span><span class="s2">&#34;gen_ai.usage.cached_tokens&#34;</span><span class="p">,</span> <span class="n">response</span><span class="o">.</span><span class="n">usage</span><span class="o">.</span><span class="n">cache_read_tokens</span> <span class="ow">or</span> <span class="mi">0</span><span class="p">)</span></span></span></code></pre></div><p>In practice most teams rely on framework auto-instrumentation (LangChain / LlamaIndex / Anthropic SDK all have OTel integrations) and do not hand-write spans.</p>
<h2 id="use-case-1cost-monitoring">Use case 1: Cost monitoring</h2>
<p>Traces are the core of cost monitoring for LLM applications: token usage attributes are built in, so nothing has to be computed separately.</p>
<p>Implementation pattern:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. The trace side records input_tokens / output_tokens / cached_tokens
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. The observability platform computes USD from a per-model pricing table
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. Aggregate by:
</span></span><span class="line"><span class="ln">4</span><span class="cl">   - User (which user burns the most)
</span></span><span class="line"><span class="ln">5</span><span class="cl">   - Endpoint (which API path is most expensive)
</span></span><span class="line"><span class="ln">6</span><span class="cl">   - Feature (which feature uses the most tokens)
</span></span><span class="line"><span class="ln">7</span><span class="cl">   - Time (which day spiked)</span></span></code></pre></div><p>Typical dashboard metrics:</p>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>What it tells you</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Total cost / day</td>
          <td>Overall spending trend</td>
      </tr>
      <tr>
          <td>Cost per user</td>
          <td>Find power users or abuse</td>
      </tr>
      <tr>
          <td>Cost per request</td>
          <td>Average cost of a single request; set alerts on it</td>
      </tr>
      <tr>
          <td>Cached / total token ratio</td>
          <td><a href="/blog/llm/knowledge-cards/prompt-cache/" data-link-title="Prompt Cache" data-link-desc="重複出現的 prompt prefix 在推論伺服器或 LLM 服務端被 cache、後續 query 跳過 prefill、大幅降 cost 跟 TTFT">Prompt cache</a> hit rate</td>
      </tr>
      <tr>
          <td>Output / input token ratio</td>
          <td>Output inflation ratio; checks whether generation length is reasonable</td>
      </tr>
  </tbody>
</table>
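<p>The "pricing table + aggregate by" pattern can be sketched over plain span records. This is a minimal sketch under stated assumptions: the span dictionaries are a simplified stand-in for exported trace data, the cached-input price is a made-up illustrative number, and <code>input_tokens</code> is assumed to include the cached tokens.</p>

```python
from collections import defaultdict

# Hypothetical pricing table in USD per million tokens; the "cached" price
# is invented for illustration and is not a quoted vendor price.
PRICING_USD_PER_M = {
    "claude-sonnet-4-6": {"input": 3.0, "cached": 0.3, "output": 15.0},
}


def span_cost_usd(span: dict) -> float:
    """Cost of one LLM span; assumes input_tokens includes cached_tokens."""
    price = PRICING_USD_PER_M[span["model"]]
    uncached = span["input_tokens"] - span["cached_tokens"]
    return (uncached * price["input"]
            + span["cached_tokens"] * price["cached"]
            + span["output_tokens"] * price["output"]) / 1_000_000


def cost_by(spans: list, key: str) -> dict:
    """Aggregate span cost by any attribute: user, endpoint, feature, day."""
    totals = defaultdict(float)
    for span in spans:
        totals[span[key]] += span_cost_usd(span)
    return dict(totals)
```

<p>Calling <code>cost_by(spans, "user")</code> or <code>cost_by(spans, "endpoint")</code> over the same records yields the per-user and per-endpoint views from the list above.</p>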
<h2 id="use-case-2latency--failure-debug">Use case 2: Latency / failure debug</h2>
<p>A trace naturally encodes a latency tree, so you can pinpoint "which span is stuck":</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">User query → response total: 5.2s
</span></span><span class="line"><span class="ln">2</span><span class="cl">├── Agent iteration 1: 4.8s
</span></span><span class="line"><span class="ln">3</span><span class="cl">│   ├── LLM call (claude): 4.2s     ← most of the time is here
</span></span><span class="line"><span class="ln">4</span><span class="cl">│   │   - prefill: 3.8s             ← prefill takes too long; check whether the prompt needs caching
</span></span><span class="line"><span class="ln">5</span><span class="cl">│   │   - generation: 0.4s
</span></span><span class="line"><span class="ln">6</span><span class="cl">│   ├── tool: read_file: 0.5s
</span></span><span class="line"><span class="ln">7</span><span class="cl">│   └── memory: retrieval: 0.1s
</span></span><span class="line"><span class="ln">8</span><span class="cl">└── Agent iteration 2: 0.4s</span></span></code></pre></div><p>The trace shows that about 90% of the LLM call (3.8 s of 4.2 s) is spent in prefill, so enabling prompt cache should help; no guessing needed.</p>
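<p>Finding the slow span can be automated by walking the tree. A minimal sketch: the nested dictionaries mirror the example trace above and are a simplified stand-in for real span data; <code>slowest_leaf</code> is an illustrative helper name.</p>

```python
def slowest_leaf(span: dict, path: tuple = ()) -> tuple:
    """Return (path, duration_s) of the slowest leaf span in a nested trace."""
    here = path + (span["name"],)
    children = span.get("children", [])
    if not children:
        return here, span["duration_s"]
    return max((slowest_leaf(child, here) for child in children),
               key=lambda result: result[1])


# The latency tree from the example above, as nested dictionaries.
trace = {"name": "query", "duration_s": 5.2, "children": [
    {"name": "iteration 1", "duration_s": 4.8, "children": [
        {"name": "llm_call", "duration_s": 4.2, "children": [
            {"name": "prefill", "duration_s": 3.8},
            {"name": "generation", "duration_s": 0.4},
        ]},
        {"name": "tool:read_file", "duration_s": 0.5},
        {"name": "memory:retrieval", "duration_s": 0.1},
    ]},
    {"name": "iteration 2", "duration_s": 0.4},
]}

path, seconds = slowest_leaf(trace)
print(" > ".join(path), seconds)
```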
<p>Failure debug:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">User query → response: ERROR
</span></span><span class="line"><span class="ln">2</span><span class="cl">├── Agent iteration 1: success
</span></span><span class="line"><span class="ln">3</span><span class="cl">│   └── LLM call: tool_call(run_bash, cmd=&#34;rm -rf /&#34;)
</span></span><span class="line"><span class="ln">4</span><span class="cl">├── Agent iteration 2: failure
</span></span><span class="line"><span class="ln">5</span><span class="cl">│   └── tool: run_bash: REJECTED by permission system
</span></span><span class="line"><span class="ln">6</span><span class="cl">└── Agent fallback: error response
</span></span><span class="line"><span class="ln">7</span><span class="cl">
</span></span><span class="line"><span class="ln">8</span><span class="cl">Reading the trace: the tool call was blocked by the permission system. The failure is not the LLM misbehaving unprompted; the user query triggered a dangerous tool call and the permission system correctly stopped it.</span></span></code></pre></div><p>This matches the analysis in <a href="/blog/llm/06-security/tool-use-permission-model/" data-link-title="6.2 tool use 與 MCP server 的權限模型" data-link-desc="個人 dev 場景下 tool use / MCP server 的副作用權限：檔案系統 / shell / 網路存取邊界、第三方 MCP 信任、副作用的可逆性">6.2 tool use permission model</a> and <a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">hands-on permission-boundary</a>.</p>
<h2 id="use-case-3production-trace--eval-loop">Use case 3: Production trace → eval loop</h2>
<p>Production traces are the best data source for <a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">LLM-as-judge</a>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Production users
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">   ↓ generate traces
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">Trace storage (LangSmith / Phoenix / Langfuse)
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">   ↓ filter (e.g. traces with a user thumbs-down)
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">   ↓ sample N traces
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">LLM-as-judge eval
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">   ↓ rubric scoring
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">Find systematic problems (which query types have poor quality)
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">10</span><span class="cl">Change the system prompt / tool / agent loop
</span></span><span class="line"><span class="ln">11</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">12</span><span class="cl">A/B test on production traces</span></span></code></pre></div><p>This is a concrete implementation of the "in-house benchmark" described in <a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14 benchmarking</a>: production traces are the most realistic benchmark dataset.</p>
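<p>The filter and sample steps of this loop fit in a few lines. This is a sketch under assumptions: traces are plain dictionaries, and the <code>feedback</code> field and its <code>"thumbs_down"</code> value are hypothetical names, since each tracing platform exposes user feedback through its own schema.</p>

```python
import random


def select_for_judging(traces, n, seed=0):
    """Keep traces with negative feedback, then draw at most n of them at random."""
    flagged = [t for t in traces if t.get("feedback") == "thumbs_down"]
    rng = random.Random(seed)  # fixed seed so a batch can be reproduced later
    return rng.sample(flagged, min(n, len(flagged)))


# Synthetic traces for illustration: every fourth one has a thumbs-down.
traces = [
    {"id": i, "feedback": "thumbs_down" if i % 4 == 0 else "thumbs_up"}
    for i in range(40)
]
batch = select_for_judging(traces, n=5)
print([t["id"] for t in batch])
```

<p>Judging only thumbs-down traces finds problems quickly but overstates how bad the system is overall; a quality trend needs an unfiltered random sample alongside it.</p>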
<h2 id="主流平台選型">Choosing a mainstream platform</h2>
<table>
  <thead>
      <tr>
          <th>Platform</th>
          <th>Type</th>
          <th>Strengths</th>
          <th>Best fit</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>LangSmith</td>
          <td>SaaS (LangChain ecosystem)</td>
          <td>Strong auto-instrumentation, complete UI</td>
          <td>LangChain / LangGraph users</td>
      </tr>
      <tr>
          <td>Phoenix</td>
          <td>OSS + SaaS (Arize)</td>
          <td>OpenInference standard, self-hostable</td>
          <td>Self-hosting with native OTel</td>
      </tr>
      <tr>
          <td>Langfuse</td>
          <td>OSS + SaaS</td>
          <td>Strong open-source offering, good cost monitoring</td>
          <td>Cost / eval focus, self-hostable</td>
      </tr>
      <tr>
          <td>Braintrust</td>
          <td>SaaS</td>
          <td>Eval and tracing in one product</td>
          <td>Teams with eval-heavy workflows</td>
      </tr>
      <tr>
          <td>Datadog APM</td>
          <td>SaaS</td>
          <td>Integrates with traditional APM</td>
          <td>Already on Datadog, want unified monitoring</td>
      </tr>
      <tr>
          <td>Logfire</td>
          <td>SaaS (Pydantic)</td>
          <td>Minimal, Python-first</td>
          <td>Python-first, lightweight</td>
      </tr>
      <tr>
          <td>Self-host OTel + Jaeger</td>
          <td>OSS</td>
          <td>Fully self-hosted, cheapest</td>
          <td>Privacy-sensitive, cost-sensitive, strong engineering team</td>
      </tr>
  </tbody>
</table>
<p>How to choose:</p>
<ol>
<li><strong>Personal / low traffic</strong>: SaaS free tiers (LangSmith / Langfuse / Phoenix) are enough</li>
<li><strong>Privacy-sensitive (user data must not leave the machine)</strong>: self-host (Langfuse / Phoenix self-hosted, or OTel + Jaeger)</li>
<li><strong>Existing observability stack</strong>: use OTel with your existing Datadog / Grafana; do not add another layer</li>
<li><strong>Eval-heavy</strong>: Braintrust and Langfuse have strong eval features</li>
</ol>
<h2 id="跟-49-production-resource-的關係">Relationship to <a href="/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 production resource</a></h2>
<p>4.9 covers the six dimensions of production resources (concurrency / latency / cost / storage / observability / reliability). Observability is only touched on there; this chapter expands it. After 4.9 the reader knows that observability is needed, and this chapter supplies how to do it.</p>
<h2 id="設計失敗模式">Design failure modes</h2>
<ol>
<li><strong>Over-instrumenting</strong>: every internal function gets a span, so tracing overhead grows and production traces fill with noise</li>
</ol>
<p><strong>Mitigation</strong>: focus on LLM-related operations and cross-service boundaries; internal logic does not need tracing</p>
<ol start="2">
<li><strong>PII / sensitive data written into span attributes</strong>: user prompts and API keys become visible to the SaaS platform</li>
</ol>
<p><strong>Mitigation</strong>: pass span attributes through a PII filter, hash or mask sensitive data, and combine this with <a href="/blog/llm/06-security/cross-cloud-local-data-boundary/" data-link-title="6.4 跨雲端 / 本地的資料邊界" data-link-desc="個人 dev 場景下混用雲端 LLM 跟本地 LLM 時的 prompt 洩漏點：Continue.dev 多 provider 設定、隱私資料流、按敏感度分流的判讀">6.4 cross-cloud boundary</a></p>
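<p>A minimal sketch of such a filter, run on attributes before they are attached to a span. Everything here is an assumption for illustration: the regular expressions catch only obvious email addresses and one hypothetical key shape (<code>sk-</code> followed by 16 or more characters), and <code>user.id</code> is a made-up attribute name. A real deployment would likely need a dedicated PII detection library.</p>

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Hypothetical key shape: "sk-" followed by 16 or more key characters.
API_KEY = re.compile(r"sk-[A-Za-z0-9_-]{16,}")
HASHED_KEYS = {"user.id"}  # attributes replaced by a short hash


def scrub(attributes: dict) -> dict:
    """Return a copy of span attributes with obvious secrets masked or hashed."""
    clean = {}
    for key, value in attributes.items():
        if not isinstance(value, str):
            clean[key] = value  # numbers and booleans pass through unchanged
        elif key in HASHED_KEYS:
            clean[key] = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
        else:
            value = EMAIL.sub("[EMAIL]", value)
            clean[key] = API_KEY.sub("[API_KEY]", value)
    return clean


out = scrub({
    "prompt": "mail bob@example.com key sk-abcdefghijklmnop1234",
    "user.id": "u-42",
    "tokens": 17,
})
print(out["prompt"])
```

<p>Hashing keeps traces from the same user groupable without storing the raw identifier. An unsalted hash of a short identifier can still be reversed by guessing, so a secret salt is worth adding when the identifier space is small.</p>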
<ol start="3">
<li><strong>No sampling</strong>: tracing 100% of production traffic makes storage and cost explode</li>
</ol>
<p><strong>Mitigation</strong>: keep the production sample rate below 10%, and capture 100% of errors and outliers</p>
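<p>One way to sketch this policy is a keep-or-drop decision made once a trace has finished, since whether it errored or was slow is only known at the end. The function name, the 5% default, and the 10-second outlier threshold are all illustrative assumptions, not values from any tracing SDK.</p>

```python
import hashlib


def should_keep(trace_id: str, is_error: bool, latency_s: float,
                sample_rate: float = 0.05, slow_threshold_s: float = 10.0) -> bool:
    """Always keep errors and slow outliers; keep a fixed fraction of the rest."""
    if is_error or latency_s >= slow_threshold_s:
        return True
    # Hash the trace id so the decision is stable across services and retries.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate


kept = sum(should_keep(f"trace-{i}", False, 0.3) for i in range(10_000))
print(f"kept {kept} of 10000 ordinary traces")
```

<p>Deriving the decision from the trace id, instead of calling a random number generator, means every service that sees the same trace reaches the same verdict, so sampled traces stay complete.</p>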
<ol start="4">
<li><strong>No trace retention period</strong>: traces pile up, and old traces nobody reads still cost storage</li>
</ol>
<p><strong>Mitigation</strong>: set an explicit retention policy (for example 7-30 days hot, then archive or delete)</p>
<ol start="5">
<li><strong>Traces not tied to metrics</strong>: traces are samples and metrics are aggregates; debugging needs both</li>
</ol>
<p><strong>Mitigation</strong>: also emit cost / latency as metrics (Prometheus and similar), and use traces to debug specific instances</p>
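<h2 id="何時不需要-tracing">When tracing is unnecessary</h2>
<ol>
<li><strong>Pure demo / personal tinkering</strong>: string logs are enough</li>
<li><strong>Single LLM call, no agent loop</strong>: simple enough to debug by grepping logs</li>
<li><strong>Extremely privacy-sensitive without self-hosting</strong>: trace content flowing to a SaaS crosses a data boundary; assess the risk first</li>
<li><strong>Per-request tracing overhead exceeds the benefit</strong>: in ultra-low-latency settings, check whether it is worth it</li>
</ol>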
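<h2 id="何時過時--何時不過時">What will and will not go out of date</h2>
<p><strong>What will not go out of date</strong>:</p>
<ul>
<li>The fundamental difference between LLM tracing and traditional logging</li>
<li>The framing of structured spans with parent-child relationships</li>
<li>The three main use cases: cost monitoring, latency debug, failure debug</li>
<li>The trace → eval closed loop</li>
<li>The five design failure modes</li>
</ul>
<p><strong>What will change</strong>:</p>
<ul>
<li>Specific attribute names in the OTel GenAI semconv (still stabilizing)</li>
<li>The mainstream SaaS platforms (one or two new entrants a year)</li>
<li>Auto-instrumentation coverage (still expanding)</li>
<li>How integration with specific frameworks works</li>
</ul>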
<h2 id="下一章">Next chapter</h2>
<p>Next chapter: <a href="/blog/llm/04-applications/llm-as-judge/" data-link-title="4.21 LLM-as-Judge 評估方法" data-link-desc="LLM 評估 LLM 的 production eval 方法：rubric design、pairwise / direct scoring、三大 bias 緩解、跟 trace 串接的閉環、calibration">4.21 LLM-as-judge evaluation methods</a>, which turns production traces into a systematic eval loop.</p>
]]></content:encoded></item><item><title>4.21 LLM-as-Judge Evaluation Methods</title><link>https://tarrragon.github.io/blog/llm/04-applications/llm-as-judge/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/04-applications/llm-as-judge/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14 benchmarking-and-evaluation&lt;/a> covered capability benchmarks (MMLU, SWE-bench, and others) and the concept of an in-house benchmark. But the operational layer, how to systematically evaluate real cases from your own workflow, was only touched on there. This chapter fills that gap with &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-Judge&lt;/a>: the de facto standard eval method for production AI apps, 500-5000× cheaper than human eval, with 80%+ agreement with humans, but with biases that must be handled.&lt;/p>
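&lt;p>Where the judge sits in the eval system: &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 Eval design coordinate system&lt;/a> splits eval into three axes and eight quadrants and decides which tool fits which quadrant. The judge belongs on the subjective side (behavior with no ground truth), not the objective side (where ground truth exists and a deterministic check is cheaper and more accurate). Before reading this chapter, read the section on axis mis-selection in 4.13 to avoid the common anti-pattern of turning every eval into a judge.&lt;/p>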
&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After reading this chapter, you should be able to:&lt;/p>
&lt;ol>
&lt;li>Distinguish the three eval paths: LLM-as-Judge, standard benchmarks, and human eval.&lt;/li>
&lt;li>Design a reproducible judge prompt (four sections: task / input-output / rubric / output format).&lt;/li>
&lt;li>Choose between pairwise and direct scoring and know when each applies.&lt;/li>
&lt;li>Mitigate the three major biases (position / verbosity / self-preference).&lt;/li>
&lt;li>Feed production &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">traces&lt;/a> back into the judge to form an automated eval loop.&lt;/li>
&lt;/ol>
&lt;h2 id="為什麼需要-llm-as-judge">Why LLM-as-Judge is needed&lt;/h2>
&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14&lt;/a> argues that the in-house benchmark is the final test, but leaves a gap at the operational layer:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Eval pain point&lt;/th>
 &lt;th>LLM-as-Judge answer&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Standard benchmarks do not match your use case&lt;/td>
 &lt;td>The judge runs on your own cases with a custom rubric&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Human eval is too expensive / too slow&lt;/td>
 &lt;td>The judge runs automatically at $0.001-0.01 per item&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Production traces are too numerous to review by hand&lt;/td>
 &lt;td>Running the judge on 100% of production traces is technically feasible, though cost usually calls for sampling&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Rule-based eval misses semantic problems&lt;/td>
 &lt;td>The judge can tell whether an answer matches the intent even when the wording differs&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Iteration needs fast feedback&lt;/td>
 &lt;td>The judge scores 100 items in minutes, so a prompt change can be retested right away&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Main use cases (repeated from the &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-Judge card&lt;/a>): in-house benchmarks, production trace eval, A/B tests, and synthetic data quality.&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14 benchmarking-and-evaluation</a> covered capability benchmarks (MMLU, SWE-bench, and others) and the concept of an in-house benchmark. But the operational layer, how to systematically evaluate real cases from your own workflow, was only touched on there. This chapter fills that gap with <a href="/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-Judge</a>: the de facto standard eval method for production AI apps, 500-5000× cheaper than human eval, with 80%+ agreement with humans, but with biases that must be handled.</p>
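<p>Where the judge sits in the eval system: <a href="/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 Eval design coordinate system</a> splits eval into three axes and eight quadrants and decides which tool fits which quadrant. The judge belongs on the subjective side (behavior with no ground truth), not the objective side (where ground truth exists and a deterministic check is cheaper and more accurate). Before reading this chapter, read the section on axis mis-selection in 4.13 to avoid the common anti-pattern of turning every eval into a judge.</p>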
<h2 id="本章目標">Chapter goals</h2>
<p>After reading this chapter, you should be able to:</p>
<ol>
<li>Distinguish the three eval paths: LLM-as-Judge, standard benchmarks, and human eval.</li>
<li>Design a reproducible judge prompt (four sections: task / input-output / rubric / output format).</li>
<li>Choose between pairwise and direct scoring and know when each applies.</li>
<li>Mitigate the three major biases (position / verbosity / self-preference).</li>
<li>Feed production <a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">traces</a> back into the judge to form an automated eval loop.</li>
</ol>
<h2 id="為什麼需要-llm-as-judge">Why LLM-as-Judge is needed</h2>
<p><a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14</a> argues that the in-house benchmark is the final test, but leaves a gap at the operational layer:</p>
<table>
  <thead>
      <tr>
          <th>Eval pain point</th>
          <th>LLM-as-Judge answer</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Standard benchmarks do not match your use case</td>
          <td>The judge runs on your own cases with a custom rubric</td>
      </tr>
      <tr>
          <td>Human eval is too expensive / too slow</td>
          <td>The judge runs automatically at $0.001-0.01 per item</td>
      </tr>
      <tr>
          <td>Production traces are too numerous to review by hand</td>
          <td>Running the judge on 100% of production traces is technically feasible, though cost usually calls for sampling</td>
      </tr>
      <tr>
          <td>Rule-based eval misses semantic problems</td>
          <td>The judge can tell whether an answer matches the intent even when the wording differs</td>
      </tr>
      <tr>
          <td>Iteration needs fast feedback</td>
          <td>The judge scores 100 items in minutes, so a prompt change can be retested right away</td>
      </tr>
  </tbody>
</table>
<p>Main use cases (repeated from the <a href="/blog/llm/knowledge-cards/llm-as-judge/" data-link-title="LLM-as-Judge" data-link-desc="用 LLM 評估另一個 LLM 的輸出品質、production eval 的主流方法、500-5000× 成本降但有 bias 要處理">LLM-as-Judge card</a>): in-house benchmarks, production trace eval, A/B tests, and synthetic data quality.</p>
<h2 id="judge-prompt-結構">Judge prompt structure</h2>
<p>A reproducible judge prompt needs four sections:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">[Section 1: Task description]
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">You are an evaluator of LLM output quality. Assess how well a coding assistant answered a user request.
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">[Section 2: Input + Output to evaluate]
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">User request: {input}
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">Assistant response: {output}
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">[Section 3: Rubric]
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">Scoring dimensions:
</span></span><span class="line"><span class="ln">10</span><span class="cl">1. Correctness (does the code run, is the logic right): 1-5
</span></span><span class="line"><span class="ln">11</span><span class="cl">2. Style (does it follow the codebase conventions): 1-5
</span></span><span class="line"><span class="ln">12</span><span class="cl">3. Completeness (does it fully resolve the user request): 1-5
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">Scoring rules:
</span></span><span class="line"><span class="ln">15</span><span class="cl">- 5: flawless, can be merged as-is
</span></span><span class="line"><span class="ln">16</span><span class="cl">- 4: usable with minor fixes, correct overall
</span></span><span class="line"><span class="ln">17</span><span class="cl">- 3: right direction, needs substantial changes
</span></span><span class="line"><span class="ln">18</span><span class="cl">- 2: partly right, main logic is wrong
</span></span><span class="line"><span class="ln">19</span><span class="cl">- 1: completely wrong, misleads the user
</span></span><span class="line"><span class="ln">20</span><span class="cl">
</span></span><span class="line"><span class="ln">21</span><span class="cl">Explicitly no extra credit for:
</span></span><span class="line"><span class="ln">22</span><span class="cl">- Length / verbosity (an equally correct short answer = a long answer)
</span></span><span class="line"><span class="ln">23</span><span class="cl">- Apologies / preamble
</span></span><span class="line"><span class="ln">24</span><span class="cl">- Polite filler such as &#34;I hope this helps&#34;
</span></span><span class="line"><span class="ln">25</span><span class="cl">
</span></span><span class="line"><span class="ln">26</span><span class="cl">[Section 4: Output format]
</span></span><span class="line"><span class="ln">27</span><span class="cl">Respond with the following JSON:
</span></span><span class="line"><span class="ln">28</span><span class="cl">{
</span></span><span class="line"><span class="ln">29</span><span class="cl">  &#34;correctness&#34;: &lt;1-5&gt;,
</span></span><span class="line"><span class="ln">30</span><span class="cl">  &#34;style&#34;: &lt;1-5&gt;,
</span></span><span class="line"><span class="ln">31</span><span class="cl">  &#34;completeness&#34;: &lt;1-5&gt;,
</span></span><span class="line"><span class="ln">32</span><span class="cl">  &#34;reasoning&#34;: &#34;&lt;short explanation&gt;&#34;,
</span></span><span class="line"><span class="ln">33</span><span class="cl">  &#34;overall&#34;: &lt;1-5&gt;
</span></span><span class="line"><span class="ln">34</span><span class="cl">}</span></span></code></pre></div><p>Key design principles:</p>
<ol>
<li><strong>Clear, reproducible rubric</strong>: use a 1-5 scale with an explicit definition for each score, so the judge does not improvise</li>
<li><strong>List explicit "no extra credit" items</strong>: a vague rubric lets the judge reward long answers, apologies, and pleasantries (verbosity bias)</li>
<li><strong>Require reasoning</strong>: forcing the judge to justify its scores improves calibration and makes later debugging possible</li>
<li><strong>Structured output</strong>: enforce the format with JSON / <a href="/blog/llm/04-applications/application-protocols/" data-link-title="4.6 應用層協議：function calling / structured output / MCP" data-link-desc="三個常被混為一談的概念：模型能力、sampling 約束、server 協議，三者的層級差異與組合方式">structured output</a> so results can be processed programmatically</li>
</ol>
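<p>Structured output pays off only if the reply is checked before it is stored. The sketch below assumes the judge returns the JSON shape from Section 4 of the prompt as a raw string; <code>parse_verdict</code> is a hypothetical helper name, and the model call itself is left out.</p>

```python
import json

DIMENSIONS = ("correctness", "style", "completeness", "overall")


def parse_verdict(raw: str) -> dict:
    """Parse a judge reply and reject anything that breaks the rubric's format."""
    verdict = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(verdict, dict):
        raise ValueError("judge reply must be a JSON object")
    for name in DIMENSIONS:
        score = verdict.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {score!r}")
    if not str(verdict.get("reasoning", "")).strip():
        raise ValueError("reasoning must not be empty")
    return verdict


reply = ('{"correctness": 4, "style": 3, "completeness": 5, '
         '"reasoning": "runs, minor naming issues", "overall": 4}')
print(parse_verdict(reply)["overall"])
```

<p>Rejected replies are worth counting instead of silently dropping: a rising rejection rate is an early sign that the judge prompt or the judge model has changed behavior.</p>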
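<h2 id="pairwise-vs-direct-scoring">Pairwise vs Direct scoring</h2>
<p>Two mainstream scoring methods:</p>
<h3 id="direct-scoring直接打分">Direct scoring</h3>
<p>Given one (input, output) pair, the judge assigns an absolute score (1-5 or 1-10).</p>
<p>Pros: simple; shows how absolute quality changes over time
Cons: score calibration is unstable (across batches, the judge's baseline can drift)</p>
<h3 id="pairwise-comparison兩兩比較">Pairwise comparison</h3>
<p>Given one input and two outputs (A and B), the judge picks the better one.</p>
<p>Pros: relative comparison is more stable than absolute scoring; suits A/B testing
Cons: needs two candidates; the result is "A &gt; B", not "how good A is"</p>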
<p>Practical combinations:</p>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Recommended method</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Production quality monitoring</td>
          <td>Direct scoring (one score per trace)</td>
      </tr>
      <tr>
          <td>Prompt / model A/B test</td>
          <td>Pairwise (A vs. B)</td>
      </tr>
      <tr>
          <td>Before / after fine-tuning comparison</td>
          <td>Pairwise</td>
      </tr>
      <tr>
          <td>Regression detection</td>
          <td>Direct (compared against the baseline)</td>
      </tr>
      <tr>
          <td>Synthetic data filtering</td>
          <td>Direct (keep scores ≥ 4)</td>
      </tr>
  </tbody>
</table>
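<h2 id="三大-bias-跟緩解">The three major biases and their mitigations</h2>
<h3 id="1-position-bias位置偏見">1. Position bias</h3>
<p>In pairwise comparison, the judge favors a candidate because of where it appears, usually the one shown first (A).</p>
<p><strong>Mitigation</strong>:</p>
<ul>
<li>Run twice with the positions swapped (A-B and B-A)</li>
<li>Count "prefer A" only when both runs favor A; treat inconsistent results as a tie</li>
<li>Standard LLM-as-Judge setups (such as MT-Bench) build this in</li>
</ul>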
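<p>The swap-and-compare rule can be sketched as follows. The judge is passed in as a function so the example runs without a model call; the two stub judges are illustrative stand-ins, one fully position-biased and one with a genuine, order-independent preference.</p>

```python
from typing import Callable

# A judge takes (user_input, first_answer, second_answer) and returns
# "first" or "second". In practice this would wrap a model call.
Judge = Callable[[str, str, str], str]


def compare(judge: Judge, user_input: str, a: str, b: str) -> str:
    """Run the comparison in both orders; only a consistent winner counts."""
    forward = judge(user_input, a, b)   # A shown first
    backward = judge(user_input, b, a)  # B shown first
    if forward == "first" and backward == "second":
        return "A"
    if forward == "second" and backward == "first":
        return "B"
    return "tie"  # the verdict flipped with the order, so treat it as noise


def always_first(user_input: str, first: str, second: str) -> str:
    """A maximally position-biased stub judge."""
    return "first"


def prefers_shorter(user_input: str, first: str, second: str) -> str:
    """A stub judge whose preference does not depend on order."""
    return "first" if len(first) <= len(second) else "second"


print(compare(always_first, "q", "x", "yy"))
print(compare(prefers_shorter, "q", "x", "yy"))
```

<p>The price is two judge calls per comparison. The share of comparisons that end in a tie is itself a useful number: a high tie rate suggests the judge is reacting to position more than to content.</p>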
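<h3 id="2-verbosity-bias冗長偏見">2. Verbosity bias</h3>
<p>The judge tends to score long answers higher even when their content is no better than that of a short answer.</p>
<p><strong>Mitigation</strong>:</p>
<ul>
<li>State in the rubric that verbosity earns nothing and that an equally correct short answer equals a long one</li>
<li>Length normalization: score = raw_score / log(length)</li>
<li>Use a length-controlled benchmark (such as length-controlled AlpacaEval)</li>
</ul>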
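<p>The length-normalization formula is small enough to write out. This sketch assumes the natural logarithm and a length measured in tokens, neither of which the formula specifies, and it clamps very short answers because <code>log(1)</code> is zero.</p>

```python
import math


def length_normalized(raw_score: float, length_tokens: int) -> float:
    """Divide the raw score by log(length); lengths below 2 are clamped
    because log(1) is zero and would cause a division by zero."""
    return raw_score / math.log(max(length_tokens, 2))


short = length_normalized(4, 100)
long_ = length_normalized(4, 1000)
print(f"{short:.3f} vs {long_:.3f}")
```

<p>Note what this does to two answers with the same raw score: the shorter one now ranks higher. That is a penalty on length, which is stronger than the rubric rule that a short answer merely equals a long one, and the result is no longer on the 1-5 scale. It is therefore better suited to ranking candidates than to reporting absolute quality.</p>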
<h3 id="3-self-preference-bias自家偏好">3. Self-preference bias</h3>
<p>A judge prefers answers in its own style (GPT as judge prefers GPT-style output; Claude as judge prefers Claude-style output).</p>
<p><strong>Mitigation</strong>:</p>
<ul>
<li>Use judge models from three different families (for example Claude + GPT + Gemini) and take the majority</li>
<li>Avoid using the same model as both judge and test subject</li>
<li>Use reasoning models as judges (consensus across several vendors' reasoning models is more stable)</li>
</ul>
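<p>Taking the majority across judges is a short aggregation step. The sketch below assumes each judge has already returned a verdict string such as <code>"A"</code>, <code>"B"</code>, or <code>"tie"</code>; <code>majority_verdict</code> is an illustrative name, and the list of votes must not be empty.</p>

```python
from collections import Counter


def majority_verdict(votes: list[str]) -> str:
    """Return the verdict backed by a strict majority of judges, else "tie"."""
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count > len(votes) / 2 else "tie"


# One vote per judge family, e.g. Claude, GPT, Gemini.
print(majority_verdict(["A", "A", "B"]))
print(majority_verdict(["A", "B", "tie"]))
```

<p>Requiring a strict majority means a three-way split counts as a tie instead of going to whichever verdict happened to be listed first.</p>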
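<h3 id="補充-biasformat-bias">Additional bias: Format bias</h3>
<p>The judge prefers answers with markdown, code blocks, or visible structure, even when their content is no better than plain text.</p>
<p><strong>Mitigation</strong>: state in the rubric that formatting earns no credit and only content counts.</p>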
<h2 id="calibration校準">Calibration</h2>
<p>A judge should not be trusted blindly; calibrate it first:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">1. Collect 100 (input, output) pairs
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">2. Human eval (you or a trusted human) assigns ground-truth scores
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">3. Run the judge on the same 100
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">4. Compute the agreement rate:
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">   - Pairwise: share of items where judge and human agree (target &gt; 75%)
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">   - Direct scoring: Spearman correlation (target &gt; 0.7)
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">5. If agreement is low:
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">   - Revise the rubric (make it more explicit)
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">   - Switch to a stronger judge model
</span></span><span class="line"><span class="ln">10</span><span class="cl">   - Revise the prompt (add few-shot examples)
</span></span><span class="line"><span class="ln">11</span><span class="cl">6. Only a calibrated judge should run on production</span></span></code></pre></div><p>Calibration is the step that aligns what the judge scores with what humans score; skipping it makes production eval inaccurate.</p>
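<p>Both agreement metrics in step 4 can be computed without extra dependencies. The following is a sketch using only the standard library; the function names are illustrative, and Spearman correlation is implemented as the Pearson correlation of tie-averaged ranks. It assumes both score lists have the same length and are not constant, since a constant list has zero rank variance.</p>

```python
from statistics import mean


def agreement_rate(judge: list[str], human: list[str]) -> float:
    """Fraction of items where judge and human picked the same label."""
    return sum(j == h for j, h in zip(judge, human)) / len(human)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the tied positions
        for i in order[start:end + 1]:
            result[i] = shared
        start = end + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


print(agreement_rate(["A", "B", "A", "tie"], ["A", "B", "B", "tie"]))
```

<p>Tie handling matters here because 1-5 scores repeat constantly; a rank implementation that ignores ties would report a correlation that depends on input order.</p>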
<h2 id="跟-420-llm-tracing-的閉環">Closing the loop with <a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">4.20 LLM tracing</a></h2>
<p>Production traces plus LLM-as-Judge form an automated eval pipeline:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Production users
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">   ↓ generate traces
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">[LLM tracing platform] (LangSmith / Phoenix / Langfuse / Braintrust)
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">   ↓ filter: traces with user thumbs-down, errors, long latency, etc.
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">   ↓ sample 100 / day
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">[LLM-as-Judge batch run]
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">   ↓ rubric scoring
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">[Dashboard]
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">   - which query types are degrading
</span></span><span class="line"><span class="ln">10</span><span class="cl">   - which deployment version has worse quality
</span></span><span class="line"><span class="ln">11</span><span class="cl">   - which user segment has a poor experience
</span></span><span class="line"><span class="ln">12</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">13</span><span class="cl">Trigger an alert / change the prompt / change the model / roll back
</span></span><span class="line"><span class="ln">14</span><span class="cl">   ↓ A/B test
</span></span><span class="line"><span class="ln">15</span><span class="cl">   ↓ Pairwise judge eval new vs old
</span></span><span class="line"><span class="ln">16</span><span class="cl">   ↓ Deploy the winner</span></span></code></pre></div><p>This is the standard closed loop for quality engineering in production LLM applications.</p>
<h2 id="judge-model-選型">Choosing a judge model</h2>
<table>
  <thead>
      <tr>
          <th>Judge model candidate</th>
          <th>Strengths</th>
          <th>Weaknesses</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Claude Sonnet / Opus</td>
          <td>Strong reasoning, follows the rubric closely</td>
          <td>Mid-range cost</td>
      </tr>
      <tr>
          <td>GPT-5 / GPT-4o</td>
          <td>Widely used, strong tool-calling</td>
          <td>Self-preference toward GPT output</td>
      </tr>
      <tr>
          <td>Gemini 2.5 Pro</td>
          <td>Strong long context, multimodal</td>
          <td>Follows the rubric more loosely</td>
      </tr>
      <tr>
          <td>o1 / o3 / R1 (reasoning models)</td>
          <td>Strong reasoning, steady on nuanced cases</td>
          <td>High cost, long latency</td>
      </tr>
      <tr>
          <td>Local 30B+ models (QwQ, DeepSeek-R1 distill)</td>
          <td>Strong privacy, no per-call API cost</td>
          <td>Capability ceiling below cloud flagships</td>
      </tr>
  </tbody>
</table>
<p>How to choose:</p>
<ol>
<li><strong>High stakes / final QA</strong>: a cloud flagship reasoning model</li>
<li><strong>High-volume production trace eval</strong>: a mid-tier model (GPT-4o / Sonnet), balancing cost and speed</li>
<li><strong>Privacy-sensitive (user traces cannot be sent to the cloud)</strong>: a local reasoning model (QwQ-32B / R1 distill)</li>
<li><strong>A/B testing prompt improvements</strong>: use the same judge for the before and after runs, so the baseline stays fixed</li>
</ol>
<h2 id="失敗模式">Failure modes</h2>
<ol>
<li><strong>Rubric too vague</strong>: the judge improvises and scores are not repeatable</li>
</ol>
<p><strong>Mitigation</strong>: write the rubric like a unit test, with concrete criteria for each score</p>
<ol start="2">
<li><strong>No calibration</strong>: judge-human agreement was never verified, so the judge may be systematically off</li>
</ol>
<p><strong>Mitigation</strong>: recalibrate after every major rubric change or judge model swap</p>
<ol start="3">
<li><strong>Sample not representative of production</strong>: only easy cases are evaluated, and the genuinely hard production cases are not covered</li>
</ol>
<p><strong>Mitigation</strong>: use stratified sampling (by difficulty / user segment / feature)</p>
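<p>Stratified sampling can be sketched in a few lines. This example assumes each trace already carries a label to stratify on; the <code>difficulty</code> field is a hypothetical name, and in practice the label might come from a classifier or from trace metadata. Stratum keys must be sortable, which holds for plain strings.</p>

```python
import random
from collections import defaultdict


def stratified_sample(items, key, per_stratum, seed=0):
    """Draw up to per_stratum items from every stratum defined by key(item)."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):  # sorted so the draw order is reproducible
        group = strata[name]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Synthetic data: 10% of traces are hard, 90% are easy.
items = [{"id": i, "difficulty": "hard" if i % 10 == 0 else "easy"}
         for i in range(100)]
picked = stratified_sample(items, key=lambda t: t["difficulty"], per_stratum=5)
print(sum(t["difficulty"] == "hard" for t in picked), "hard of", len(picked))
```

<p>A uniform sample of 10 from this data would contain about one hard case on average; the stratified draw guarantees five. Scores averaged over such a sample no longer reflect the production mix, so per-stratum results are the safer thing to report.</p>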
<ol start="4">
<li><strong>Unmitigated bias</strong>: position / verbosity / self-preference bias is baked straight into the scores</li>
</ol>
<p><strong>Mitigation</strong>: standard frameworks (DeepEval / Inspect / Braintrust) include bias mitigations; an existing framework is more dependable than DIY</p>
<ol start="5">
<li><strong>Judge cost higher than expected</strong>: running the judge on every production trace makes cost explode</li>
</ol>
<p><strong>Mitigation</strong>: keep the sample rate below 10%, coordinated with <a href="/blog/llm/04-applications/llm-tracing-and-observability/" data-link-title="4.20 LLM tracing 與 observability" data-link-desc="OpenTelemetry GenAI semantic conventions、結構化 span 設計、cost / latency 監控、failure debug 流程、跟 LLM-as-judge eval 的串接">LLM tracing</a> sampling</p>
<ol start="6">
<li><strong>Over-reliance on judge</strong>: forgetting that the judge also makes mistakes and treating it as absolute truth</li>
</ol>
<p><strong>Mitigation</strong>: high-stakes tasks still need spot human review; the judge is an 80% solution, not a 100% one</p>
<h2 id="主流-framework">Mainstream frameworks</h2>
<table>
  <thead>
      <tr>
          <th>Framework</th>
          <th>Features</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>DeepEval</td>
          <td>OSS, Python, integrates with pytest</td>
      </tr>
      <tr>
          <td>Inspect (UK AI Security Institute)</td>
          <td>Strong eval framework, friendly to reasoning models</td>
      </tr>
      <tr>
          <td>Braintrust</td>
          <td>SaaS, eval and tracing in one product</td>
      </tr>
      <tr>
          <td>Langfuse evals</td>
          <td>OSS, integrated with tracing</td>
      </tr>
      <tr>
          <td>OpenAI evals</td>
          <td>OSS, also works with Anthropic models</td>
      </tr>
      <tr>
          <td>Patronus</td>
          <td>Production eval SaaS</td>
      </tr>
  </tbody>
</table>
<h2 id="何時不該用-llm-as-judge">When not to use LLM-as-Judge</h2>
<ol>
<li><strong>Mechanically verifiable</strong>: unit tests, exact match, output schema validation; deterministic rules are more reliable than a judge</li>
<li><strong>Very small dataset (&lt; 20 items)</strong>: use human eval directly; no judge is needed</li>
<li><strong>Judgement requires domain expertise</strong>: for high-stakes calls in medicine / law / safety, a judge should not replace an expert</li>
<li><strong>Judge weaker than the test subject</strong>: when GPT-4o judges o3 output, the judge cannot follow the reasoning trace</li>
</ol>
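<h2 id="何時過時--何時不過時">What will and will not go out of date</h2>
<p><strong>What will not go out of date</strong>:</p>
<ul>
<li>The position of LLM-as-Judge as the mainstream production eval method</li>
<li>The four-section judge prompt structure (task / input-output / rubric / format)</li>
<li>The trade-off between pairwise and direct scoring</li>
<li>The three-bias taxonomy and its mitigations</li>
<li>The production trace → judge → action loop</li>
</ul>
<p><strong>What will change</strong>:</p>
<ul>
<li>The mainstream frameworks (DeepEval / Inspect / Braintrust, etc.)</li>
<li>The specific capabilities of each judge model (stronger models arrive every generation)</li>
<li>Specific bias measurements (human agreement figures vary with time and task)</li>
<li>Emerging biases and their mitigations</li>
</ul>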
<h2 id="下一步">Next steps</h2>
<p>Next steps: Module 4 now covers the full application-layer map, from foundations (4.0 prompt technique spectrum / 4.1-4.2 RAG / 4.3 tool / 4.4 agent / 4.5 HITL), through protocols and orchestration (4.6 protocols / 4.7 workflow / 4.8 multi-agent) and production details (4.9-4.12 resource / artifact / long-context / embedding), to the eval and production observability loop (4.13 eval framework / 4.14 benchmarking / 4.17-4.21 harness / caching / memory / tracing / judge). For end-to-end hands-on cases, see the <a href="/blog/llm/04-applications/hands-on/" data-link-title="4.x Hands-on：端到端案例" data-link-desc="把模組四的所有原理串成具體 case study：從 task decomposition、workflow 設計、eval 設計到 iteration loop">hands-on subcategory</a>. From here, go to <a href="/blog/llm/05-discrete-gpu/" data-link-title="模組五：Windows / Linux &#43; 獨立 GPU" data-link-desc="消費級 PC（Windows / Linux &#43; NVIDIA / AMD 獨立 GPU）跑本地 LLM 的硬體判讀、MoE CPU 卸載、KV cache 量化與 llama.cpp 調參">Module 5</a> for local inference hardware, go to <a href="/blog/llm/06-security/" data-link-title="模組六：本地 LLM 的安全與權限" data-link-desc="個人 dev 在自己機器上跑本地 LLM 的安全議題：模型供應鏈、推論伺服器綁定、tool use 副作用、prompt injection 在 IDE、跨雲端 / 本地資料邊界">Module 6</a> for security topics (especially <a href="/blog/llm/06-security/owasp-llm-top10-mapping/" data-link-title="6.6 OWASP LLM Top 10 對照圖" data-link-desc="把模組六的本地 dev 視角安全章節對照到 OWASP LLM Top 10 2025、補出個人 dev 場景跟企業合規溝通的共同詞彙">6.6 OWASP LLM Top 10 mapping</a>, which maps the security concerns of production eval to enterprise compliance vocabulary), or return to <a href="/blog/llm/04-applications/eval-design-framework/" data-link-title="4.13 Eval 設計座標系：三軸、八象限、何時測什麼" data-link-desc="Eval 設計三軸（objective↔subjective / component↔end-to-end / quantitative↔qualitative）、八象限的對應 eval 工具、軸選錯的訊號、跟 benchmarking / LLM-as-judge / tracing 的關係">4.13 Eval design coordinate system</a> to see where the judge sits in the meta eval framework.</p>
]]></content:encoded></item></channel></rss>