<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>模組六：生產操作 on Tarragon</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/</link><description>Recent content in 模組六：生產操作 on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Wed, 22 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/index.xml" rel="self" type="application/rss+xml"/><item><title>6.1 graceful shutdown 與 signal handling</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/graceful-shutdown/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/graceful-shutdown/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Graceful shutdown&lt;/a> 的核心目標是服務收到停止訊號後，不再接受新工作，並給既有工作一段時間完成或清理。Go 服務通常用 signal、root context、&lt;code>http.Server.Shutdown&lt;/code>、worker context 與 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout&lt;/a> 串起停止流程。&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>學完本章後，你將能夠：&lt;/p>
&lt;ol>
&lt;li>把 OS signal 轉成 root context 取消&lt;/li>
&lt;li>用 &lt;code>http.Server.Shutdown&lt;/code> 停止接受新 request&lt;/li>
&lt;li>讓 worker、hub、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket&lt;/a> pump 觀察同一個停止訊號&lt;/li>
&lt;li>設計 shutdown timeout 與強制退出邊界&lt;/li>
&lt;li>測試 server 與 worker 的停止流程&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察直接結束-process-會留下不確定狀態">【觀察】直接結束 process 會留下不確定狀態&lt;/h2>
&lt;p>Shutdown 的核心風險是停止流程不明確。服務可能正在處理 request、WebSocket client 仍在線、worker 正在寫資料、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue&lt;/a> message 尚未 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/ack-nack/" data-link-title="Ack / Nack" data-link-desc="說明 consumer 如何向 broker 回報訊息處理結果">ack&lt;/a>、diagnostics 還以為服務可接流量。&lt;/p>
&lt;p>不完整停止常見後果：&lt;/p>
&lt;ul>
&lt;li>新 request 在服務即將關閉時仍被接受。&lt;/li>
&lt;li>WebSocket client 沒收到 close，server 端 goroutine 殘留。&lt;/li>
&lt;li>背景 worker 寫到一半被中斷。&lt;/li>
&lt;li>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness&lt;/a> 還是 200，負載平衡器繼續送流量。&lt;/li>
&lt;li>測試結束後留下 goroutine 或開放 port。&lt;/li>
&lt;/ul>
&lt;p>Graceful shutdown 是讓停止策略可預期。&lt;/p>
&lt;h2 id="判讀shutdown-是多階段流程">【判讀】shutdown 是多階段流程&lt;/h2>
&lt;p>Graceful shutdown 的核心流程是先停止接新工作，再讓既有工作收尾，最後釋放資源。&lt;/p>
&lt;p>建議順序：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">receive SIGINT/SIGTERM
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> ▼
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">cancel root context
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> ├── readiness becomes false
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> ├── HTTP server stops accepting new requests
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> ├── workers stop consuming new jobs
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> ├── WebSocket hub unregisters clients
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> └── diagnostics/log records shutdown reason
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> ▼
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">wait within timeout
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> │
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl"> ▼
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">process exits&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>不同服務會有不同細節，但核心不變：停止訊號要集中，元件各自完成自己的 cleanup，整體流程要有 timeout。&lt;/p>
&lt;h2 id="執行signal-轉成-root-context">【執行】signal 轉成 root context&lt;/h2>
&lt;p>Signal handling 的核心責任是把作業系統訊號轉成應用程式可理解的取消訊號。Go 1.16 之後可以使用 &lt;code>signal.NotifyContext&lt;/code>。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">stop&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">signal&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NotifyContext&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Background&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Interrupt&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">syscall&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SIGTERM&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="k">defer&lt;/span> &lt;span class="nf">stop&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="nx">log&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Fatal&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>ctx&lt;/code> 是 root context。HTTP server、worker、hub、diagnostics 都應從它派生出自己的 lifecycle，而不是每個元件各自監聽 signal。&lt;/p>
&lt;p>Signal handler 不應放大量清理邏輯。它只負責發出停止意圖；實際清理由各元件在自己的 ownership 邊界內完成。&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Graceful shutdown</a> 的核心目標是服務收到停止訊號後，不再接受新工作，並給既有工作一段時間完成或清理。Go 服務通常用 signal、root context、<code>http.Server.Shutdown</code>、worker context 與 <a href="/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout</a> 串起停止流程。</p>
<h2 id="本章目標">本章目標</h2>
<p>學完本章後，你將能夠：</p>
<ol>
<li>把 OS signal 轉成 root context 取消</li>
<li>用 <code>http.Server.Shutdown</code> 停止接受新 request</li>
<li>讓 worker、hub、<a href="/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket</a> pump 觀察同一個停止訊號</li>
<li>設計 shutdown timeout 與強制退出邊界</li>
<li>測試 server 與 worker 的停止流程</li>
</ol>
<hr>
<h2 id="觀察直接結束-process-會留下不確定狀態">【觀察】直接結束 process 會留下不確定狀態</h2>
<p>Shutdown 的核心風險是停止流程不明確。服務可能正在處理 request、WebSocket client 仍在線、worker 正在寫資料、<a href="/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue</a> message 尚未 <a href="/blog/backend/knowledge-cards/ack-nack/" data-link-title="Ack / Nack" data-link-desc="說明 consumer 如何向 broker 回報訊息處理結果">ack</a>、diagnostics 還以為服務可接流量。</p>
<p>不完整停止常見後果：</p>
<ul>
<li>新 request 在服務即將關閉時仍被接受。</li>
<li>WebSocket client 沒收到 close，server 端 goroutine 殘留。</li>
<li>背景 worker 寫到一半被中斷。</li>
<li><a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a> 還是 200，負載平衡器繼續送流量。</li>
<li>測試結束後留下 goroutine 或開放 port。</li>
</ul>
<p>Graceful shutdown 是讓停止策略可預期。</p>
<h2 id="判讀shutdown-是多階段流程">【判讀】shutdown 是多階段流程</h2>
<p>Graceful shutdown 的核心流程是先停止接新工作，再讓既有工作收尾，最後釋放資源。</p>
<p>建議順序：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">receive SIGINT/SIGTERM
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">        │
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        ▼
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">cancel root context
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        │
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        ├── readiness becomes false
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        ├── HTTP server stops accepting new requests
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        ├── workers stop consuming new jobs
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        ├── WebSocket hub unregisters clients
</span></span><span class="line"><span class="ln">10</span><span class="cl">        └── diagnostics/log records shutdown reason
</span></span><span class="line"><span class="ln">11</span><span class="cl">        │
</span></span><span class="line"><span class="ln">12</span><span class="cl">        ▼
</span></span><span class="line"><span class="ln">13</span><span class="cl">wait within timeout
</span></span><span class="line"><span class="ln">14</span><span class="cl">        │
</span></span><span class="line"><span class="ln">15</span><span class="cl">        ▼
</span></span><span class="line"><span class="ln">16</span><span class="cl">process exits</span></span></code></pre></div><p>不同服務會有不同細節，但核心不變：停止訊號要集中，元件各自完成自己的 cleanup，整體流程要有 timeout。</p>
<h2 id="執行signal-轉成-root-context">【執行】signal 轉成 root context</h2>
<p>Signal handling 的核心責任是把作業系統訊號轉成應用程式可理解的取消訊號。Go 1.16 之後可以使用 <code>signal.NotifyContext</code>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">ctx</span><span class="p">,</span> <span class="nx">stop</span> <span class="o">:=</span> <span class="nx">signal</span><span class="p">.</span><span class="nf">NotifyContext</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="nx">os</span><span class="p">.</span><span class="nx">Interrupt</span><span class="p">,</span> <span class="nx">syscall</span><span class="p">.</span><span class="nx">SIGTERM</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="k">defer</span> <span class="nf">stop</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nf">run</span><span class="p">(</span><span class="nx">ctx</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">        <span class="nx">log</span><span class="p">.</span><span class="nf">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>ctx</code> 是 root context。HTTP server、worker、hub、diagnostics 都應從它派生出自己的 lifecycle，而不是每個元件各自監聽 signal。</p>
<p>Signal handler 不應放大量清理邏輯。它只負責發出停止意圖；實際清理由各元件在自己的 ownership 邊界內完成。</p>
<h2 id="執行http-server-用-shutdown-停止接新-request">【執行】HTTP server 用 Shutdown 停止接新 request</h2>
<p><code>http.Server.Shutdown</code> 的核心行為是停止接受新連線，並等待既有 request 在 timeout 內完成。它比直接 <code>Close</code> 更適合 graceful shutdown。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">RunHTTPServer</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">handler</span> <span class="nx">http</span><span class="p">.</span><span class="nx">Handler</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">server</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">Addr</span><span class="p">:</span>    <span class="s">&#34;:8080&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">Handler</span><span class="p">:</span> <span class="nx">handler</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">errCh</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="kd">chan</span> <span class="kt">error</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">go</span> <span class="kd">func</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">errCh</span> <span class="o">&lt;-</span> <span class="nx">server</span><span class="p">.</span><span class="nf">ListenAndServe</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">}()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ctx</span><span class="p">.</span><span class="nf">Done</span><span class="p">():</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="nx">shutdownCtx</span><span class="p">,</span> <span class="nx">cancel</span> <span class="o">:=</span> <span class="nx">context</span><span class="p">.</span><span class="nf">WithTimeout</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="mi">10</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="k">defer</span> <span class="nf">cancel</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">return</span> <span class="nx">server</span><span class="p">.</span><span class="nf">Shutdown</span><span class="p">(</span><span class="nx">shutdownCtx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="k">case</span> <span class="nx">err</span> <span class="o">:=</span> <span class="o">&lt;-</span><span class="nx">errCh</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="k">if</span> <span class="nx">errors</span><span class="p">.</span><span class="nf">Is</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ErrServerClosed</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">            <span class="k">return</span> <span class="kc">nil</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="k">return</span> <span class="nx">err</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Shutdown timeout 是必要邊界。沒有 timeout 的 shutdown 可能永遠等待某個卡住 request；timeout 太短則可能讓合理 request 來不及收尾。</p>
<h2 id="策略readiness-應先變成-false">【策略】readiness 應先變成 false</h2>
<p>Readiness 的核心用途是控制服務是否應接新流量。Shutdown 開始後，readiness 應先變成 false，再停止 server 或等待既有工作。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">Lifecycle</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">shuttingDown</span> <span class="nx">atomic</span><span class="p">.</span><span class="nx">Bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">l</span> <span class="o">*</span><span class="nx">Lifecycle</span><span class="p">)</span> <span class="nf">BeginShutdown</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">l</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">l</span> <span class="o">*</span><span class="nx">Lifecycle</span><span class="p">)</span> <span class="nf">Ready</span><span class="p">()</span> <span class="kt">bool</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">return</span> <span class="p">!</span><span class="nx">l</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Load</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Signal 收到後：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">lifecycle</span><span class="p">.</span><span class="nf">BeginShutdown</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nf">cancel</span><span class="p">()</span></span></span></code></pre></div><p>這讓負載平衡器或監控能知道服務不應再接新流量。Process 還活著，但 readiness 已經反映操作狀態。</p>
<h2 id="執行背景工作要觀察-context">【執行】背景工作要觀察 context</h2>
<p>背景 worker 的核心 shutdown 條件是每個 loop 都能觀察停止訊號。Ticker、queue <a href="/blog/backend/knowledge-cards/consumer/" data-link-title="Consumer" data-link-desc="說明 consumer 如何取得等待處理的工作並產生業務結果">consumer</a>、WebSocket hub 都應該有退出路徑。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">RunWorker</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">ticker</span> <span class="o">:=</span> <span class="nx">time</span><span class="p">.</span><span class="nf">NewTicker</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">Minute</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="k">defer</span> <span class="nx">ticker</span><span class="p">.</span><span class="nf">Stop</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">for</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ctx</span><span class="p">.</span><span class="nf">Done</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">            <span class="k">return</span> <span class="nx">ctx</span><span class="p">.</span><span class="nf">Err</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ticker</span><span class="p">.</span><span class="nx">C</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nf">RunOnce</span><span class="p">(</span><span class="nx">ctx</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">                <span class="k">return</span> <span class="nx">err</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>若 <code>RunOnce</code> 可能執行很久，也應接收 context。否則外層 loop 看到 cancel，內層 I/O 或計算仍可能卡住。</p>
<h2 id="策略websocket-cleanup-要回到-hub-owner">【策略】WebSocket cleanup 要回到 hub owner</h2>
<p>WebSocket shutdown 的核心原則是讓 hub 或 connection manager 統一清理 client。不要讓 signal handler 直接遍歷各種 connection 並隨意 close。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">h</span> <span class="o">*</span><span class="nx">Hub</span><span class="p">)</span> <span class="nf">Run</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">for</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">ctx</span><span class="p">.</span><span class="nf">Done</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">            <span class="nx">h</span><span class="p">.</span><span class="nf">closeAllClients</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="k">case</span> <span class="nx">client</span> <span class="o">:=</span> <span class="o">&lt;-</span><span class="nx">h</span><span class="p">.</span><span class="nx">register</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">            <span class="nx">h</span><span class="p">.</span><span class="nf">registerClient</span><span class="p">(</span><span class="nx">client</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">case</span> <span class="nx">client</span> <span class="o">:=</span> <span class="o">&lt;-</span><span class="nx">h</span><span class="p">.</span><span class="nx">unregister</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="nx">h</span><span class="p">.</span><span class="nf">unregisterClient</span><span class="p">(</span><span class="nx">client</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>closeAllClients</code> 應透過 hub 的既有 owner 邏輯關閉 <code>send</code>、移除訂閱、關閉 connection。這延續前面模組的 ownership 原則。</p>
<h2 id="測試shutdown-測試要觀察明確條件">【測試】shutdown 測試要觀察明確條件</h2>
<p>Shutdown 測試的核心是確認停止訊號能讓元件退出，而不是等待固定時間。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestWorkerStopsOnContextCancel</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">ctx</span><span class="p">,</span> <span class="nx">cancel</span> <span class="o">:=</span> <span class="nx">context</span><span class="p">.</span><span class="nf">WithCancel</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">done</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="kd">chan</span> <span class="kd">struct</span><span class="p">{})</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">go</span> <span class="kd">func</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="k">defer</span> <span class="nb">close</span><span class="p">(</span><span class="nx">done</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="nx">_</span> <span class="p">=</span> <span class="nf">RunWorker</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">}()</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="nf">cancel</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">select</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">done</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">case</span> <span class="o">&lt;-</span><span class="nx">time</span><span class="p">.</span><span class="nf">After</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;worker did not stop&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>HTTP server 測試可以啟動 server 後 cancel context，確認 <code>RunHTTPServer</code> 回傳。測試應使用隨機 port 或 <code>httptest.Server</code>，避免固定 port 造成衝突。</p>
<h2 id="本章不處理">本章不處理</h2>
<p>本章先處理服務內部的 shutdown 順序與 cleanup owner；平台 hook、timeout 與 load balancer 合約，會在下列章節再往外延伸：</p>
<ul>
<li><a href="/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Go 進階：Kubernetes、systemd 與 load balancer 合約</a></li>
</ul>
<h2 id="和-go-教材的關係">和 Go 教材的關係</h2>
<p>這一章承接的是 goroutine lifecycle、ticker cleanup 與 platform handoff；如果你要先回看語言教材，可以讀：</p>
<ul>
<li><a href="/blog/go/04-concurrency/goroutine/" data-link-title="4.1 goroutine：輕量並發工作" data-link-desc="用 goroutine 啟動並發工作，並設計清楚的退出條件">Go：goroutine：輕量並發工作</a></li>
<li><a href="/blog/go/03-stdlib/defer-cleanup/" data-link-title="3.8 defer 與資源清理" data-link-desc="用 defer 管理 close、unlock、cleanup 與 panic 邊界">Go：defer 與資源清理</a></li>
<li><a href="/blog/go/04-concurrency/select/" data-link-title="4.3 select：同時等待多種事件" data-link-desc="用 select 建立事件迴圈">Go：select：同時等待多種事件</a></li>
<li><a href="/blog/go-advanced/03-runtime-profiling/goroutine-leak/" data-link-title="3.3 goroutine leak 偵測" data-link-desc="判斷背景工作與 client pump 是否正確退出">Go：goroutine leak 偵測</a></li>
<li><a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend：部署平台與網路入口</a></li>
</ul>
<h2 id="小結">小結</h2>
<p>Graceful shutdown 是多階段流程：signal 轉成 root context，readiness 先關閉，HTTP server 停止接新 request，worker 和 WebSocket hub 觀察 context 收尾，整體流程受 timeout 保護。停止訊號越集中，元件 ownership 越清楚，服務在部署、測試與本機開發時越不容易留下殘存 goroutine 或未釋放連線。</p>
]]></content:encoded></item><item><title>6.2 健康檢查與診斷 endpoint</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/health-diagnostics/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/health-diagnostics/</guid><description>&lt;p>健康檢查與診斷 endpoint 的核心差異是使用者與風險不同。&lt;code>/health&lt;/code> 給監控或負載平衡器判斷 process 是否活著，&lt;code>/ready&lt;/code> 判斷是否應接流量，&lt;code>/debug/...&lt;/code> 則給工程師排查問題且必須限制存取。&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>學完本章後，你將能夠：&lt;/p>
&lt;ol>
&lt;li>分辨 health、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness&lt;/a>、diagnostics 的語意&lt;/li>
&lt;li>設計快速穩定的 &lt;code>/health&lt;/code>&lt;/li>
&lt;li>用 &lt;code>/ready&lt;/code> 控制是否接新流量&lt;/li>
&lt;li>條件啟用 pprof、runtime stats 等診斷入口&lt;/li>
&lt;li>測試 status code 與 JSON response 合約&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察所有狀態都塞進-health-會讓監控失真">【觀察】所有狀態都塞進 health 會讓監控失真&lt;/h2>
&lt;p>Health endpoint 的核心風險是語意混亂。若 &lt;code>/health&lt;/code> 同時檢查 process、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/database/" data-link-title="Database" data-link-desc="說明 database 在後端系統中如何承擔正式狀態、查詢與一致性責任">database&lt;/a>、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue&lt;/a>、外部 API、cache、背景同步，任何依賴短暫波動都可能讓服務被判定死亡。&lt;/p>
&lt;p>問題範例：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">/health
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ├── process alive?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> ├── database reachable?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ├── queue lag small?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> ├── external API reachable?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> └── background sync fresh?&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這些問題不應全部塞進同一個 endpoint。Process 活著、可接流量、依賴降級、工程診斷，是不同操作訊號。&lt;/p>
&lt;h2 id="判讀healthreadydiagnostics-回答不同問題">【判讀】health、ready、diagnostics 回答不同問題&lt;/h2>
&lt;p>操作 endpoint 的核心設計是每個 endpoint 只回答一個問題。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Endpoint&lt;/th>
 &lt;th>使用者&lt;/th>
 &lt;th>回答的問題&lt;/th>
 &lt;th>失敗影響&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>/health&lt;/code>&lt;/td>
 &lt;td>process monitor&lt;/td>
 &lt;td>process 是否基本活著&lt;/td>
 &lt;td>可能重啟 process&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>/ready&lt;/code>&lt;/td>
 &lt;td>load balancer&lt;/td>
 &lt;td>是否應接新流量&lt;/td>
 &lt;td>暫停導流&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>/debug/...&lt;/code>&lt;/td>
 &lt;td>工程師&lt;/td>
 &lt;td>服務內部狀態如何&lt;/td>
 &lt;td>不應公開&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>/metrics&lt;/code>&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/metrics/" data-link-title="Metrics" data-link-desc="說明指標如何描述服務趨勢、容量與健康狀態">metrics&lt;/a> collector&lt;/td>
 &lt;td>可聚合監控資料&lt;/td>
 &lt;td>監控缺資料&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>這樣切分後，某個外部依賴故障不一定要讓 process 被重啟；服務可能只是不 ready，或處於 degraded 狀態。&lt;/p>
&lt;h2 id="執行health-endpoint-應簡單快速">【執行】health endpoint 應簡單快速&lt;/h2>
&lt;p>Health endpoint 的核心責任是快速回答 process 是否能處理基本 HTTP request。它應該簡單、快速、穩定。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">HandleHealth&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Method&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MethodGet&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;method not allowed&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusMethodNotAllowed&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Header&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;Content-Type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WriteHeader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusOK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">([]&lt;/span>&lt;span class="nb">byte&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">`{&amp;#34;status&amp;#34;:&amp;#34;ok&amp;#34;}`&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>/health&lt;/code> 不應執行昂貴查詢，也不應依賴大量下游服務。若健康檢查本身很慢，監控會把診斷工具變成新問題。&lt;/p>
&lt;h2 id="執行readiness-控制是否接流量">【執行】readiness 控制是否接流量&lt;/h2>
&lt;p>Readiness 的核心責任是回答「服務現在是否應該接新流量」。它可以檢查啟動狀態、必要依賴、shutdown 狀態。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Readiness&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">ready&lt;/span> &lt;span class="nx">atomic&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="nx">shuttingDown&lt;/span> &lt;span class="nx">atomic&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">Readiness&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Ready&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ready&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Load&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">shuttingDown&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Load&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">HandleReady&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">readiness&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">Readiness&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">HandlerFunc&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Header&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;Content-Type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;application/json&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">readiness&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Ready&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WriteHeader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusServiceUnavailable&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">([]&lt;/span>&lt;span class="nb">byte&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">`{&amp;#34;status&amp;#34;:&amp;#34;not_ready&amp;#34;}`&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WriteHeader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusOK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">([]&lt;/span>&lt;span class="nb">byte&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">`{&amp;#34;status&amp;#34;:&amp;#34;ready&amp;#34;}`&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>服務啟動尚未完成、必要背景同步尚未就緒、或 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown&lt;/a> 已開始時，readiness 應回 &lt;code>503&lt;/code>。Process 仍然活著，但不應接新流量。&lt;/p>
&lt;h2 id="策略dependency-check-依照監控語意分層">【策略】dependency check 依照監控語意分層&lt;/h2>
&lt;p>依賴檢查的核心判斷是故障是否代表 process 應重啟。Database 暫時不可用不一定代表 process 壞掉；重啟可能無法修復，反而造成更多負載。&lt;/p></description><content:encoded><![CDATA[<p>健康檢查與診斷 endpoint 的核心差異是使用者與風險不同。<code>/health</code> 給監控或負載平衡器判斷 process 是否活著，<code>/ready</code> 判斷是否應接流量，<code>/debug/...</code> 則給工程師排查問題且必須限制存取。</p>
<h2 id="本章目標">本章目標</h2>
<p>學完本章後，你將能夠：</p>
<ol>
<li>分辨 health、<a href="/blog/backend/knowledge-cards/readiness/" data-link-title="Readiness" data-link-desc="說明 instance 何時可以安全接收流量，以及 readiness 如何和部署平台協作">readiness</a>、diagnostics 的語意</li>
<li>設計快速穩定的 <code>/health</code></li>
<li>用 <code>/ready</code> 控制是否接新流量</li>
<li>條件啟用 pprof、runtime stats 等診斷入口</li>
<li>測試 status code 與 JSON response 合約</li>
</ol>
<hr>
<h2 id="觀察所有狀態都塞進-health-會讓監控失真">【觀察】所有狀態都塞進 health 會讓監控失真</h2>
<p>Health endpoint 的核心風險是語意混亂。若 <code>/health</code> 同時檢查 process、<a href="/blog/backend/knowledge-cards/database/" data-link-title="Database" data-link-desc="說明 database 在後端系統中如何承擔正式狀態、查詢與一致性責任">database</a>、<a href="/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue</a>、外部 API、cache、背景同步，任何依賴短暫波動都可能讓服務被判定死亡。</p>
<p>問題範例：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">/health
</span></span><span class="line"><span class="ln">2</span><span class="cl">  ├── process alive?
</span></span><span class="line"><span class="ln">3</span><span class="cl">  ├── database reachable?
</span></span><span class="line"><span class="ln">4</span><span class="cl">  ├── queue lag small?
</span></span><span class="line"><span class="ln">5</span><span class="cl">  ├── external API reachable?
</span></span><span class="line"><span class="ln">6</span><span class="cl">  └── background sync fresh?</span></span></code></pre></div><p>這些問題不應全部塞進同一個 endpoint。Process 活著、可接流量、依賴降級、工程診斷，是不同操作訊號。</p>
<h2 id="判讀healthreadydiagnostics-回答不同問題">【判讀】health、ready、diagnostics 回答不同問題</h2>
<p>操作 endpoint 的核心設計是每個 endpoint 只回答一個問題。</p>
<table>
  <thead>
      <tr>
          <th>Endpoint</th>
          <th>使用者</th>
          <th>回答的問題</th>
          <th>失敗影響</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>/health</code></td>
          <td>process monitor</td>
          <td>process 是否基本活著</td>
          <td>可能重啟 process</td>
      </tr>
      <tr>
          <td><code>/ready</code></td>
          <td>load balancer</td>
          <td>是否應接新流量</td>
          <td>暫停導流</td>
      </tr>
      <tr>
          <td><code>/debug/...</code></td>
          <td>工程師</td>
          <td>服務內部狀態如何</td>
          <td>不應公開</td>
      </tr>
      <tr>
          <td><code>/metrics</code></td>
          <td><a href="/blog/backend/knowledge-cards/metrics/" data-link-title="Metrics" data-link-desc="說明指標如何描述服務趨勢、容量與健康狀態">metrics</a> collector</td>
          <td>可聚合監控資料</td>
          <td>監控缺資料</td>
      </tr>
  </tbody>
</table>
<p>這樣切分後，某個外部依賴故障不一定要讓 process 被重啟；服務可能只是不 ready，或處於 degraded 狀態。</p>
<h2 id="執行health-endpoint-應簡單快速">【執行】health endpoint 應簡單快速</h2>
<p>Health endpoint 的核心責任是快速回答 process 是否能處理基本 HTTP request。它應該簡單、快速、穩定。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleHealth</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">if</span> <span class="nx">r</span><span class="p">.</span><span class="nx">Method</span> <span class="o">!=</span> <span class="nx">http</span><span class="p">.</span><span class="nx">MethodGet</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">http</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="s">&#34;method not allowed&#34;</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusMethodNotAllowed</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">w</span><span class="p">.</span><span class="nf">Header</span><span class="p">().</span><span class="nf">Set</span><span class="p">(</span><span class="s">&#34;Content-Type&#34;</span><span class="p">,</span> <span class="s">&#34;application/json&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nx">w</span><span class="p">.</span><span class="nf">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="nx">_</span><span class="p">,</span> <span class="nx">_</span> <span class="p">=</span> <span class="nx">w</span><span class="p">.</span><span class="nf">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">`{&#34;status&#34;:&#34;ok&#34;}`</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>/health</code> 不應執行昂貴查詢，也不應依賴大量下游服務。若健康檢查本身很慢，監控會把診斷工具變成新問題。</p>
<h2 id="執行readiness-控制是否接流量">【執行】readiness 控制是否接流量</h2>
<p>Readiness 的核心責任是回答「服務現在是否應該接新流量」。它可以檢查啟動狀態、必要依賴、shutdown 狀態。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">Readiness</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">ready</span>        <span class="nx">atomic</span><span class="p">.</span><span class="nx">Bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">shuttingDown</span> <span class="nx">atomic</span><span class="p">.</span><span class="nx">Bool</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">r</span> <span class="o">*</span><span class="nx">Readiness</span><span class="p">)</span> <span class="nf">Ready</span><span class="p">()</span> <span class="kt">bool</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">return</span> <span class="nx">r</span><span class="p">.</span><span class="nx">ready</span><span class="p">.</span><span class="nf">Load</span><span class="p">()</span> <span class="o">&amp;&amp;</span> <span class="p">!</span><span class="nx">r</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Load</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleReady</span><span class="p">(</span><span class="nx">readiness</span> <span class="o">*</span><span class="nx">Readiness</span><span class="p">)</span> <span class="nx">http</span><span class="p">.</span><span class="nx">HandlerFunc</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">return</span> <span class="kd">func</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="nx">w</span><span class="p">.</span><span class="nf">Header</span><span class="p">().</span><span class="nf">Set</span><span class="p">(</span><span class="s">&#34;Content-Type&#34;</span><span class="p">,</span> <span class="s">&#34;application/json&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="k">if</span> <span class="p">!</span><span class="nx">readiness</span><span class="p">.</span><span class="nf">Ready</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">            <span class="nx">w</span><span class="p">.</span><span class="nf">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusServiceUnavailable</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">            <span class="nx">_</span><span class="p">,</span> <span class="nx">_</span> <span class="p">=</span> <span class="nx">w</span><span class="p">.</span><span class="nf">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">`{&#34;status&#34;:&#34;not_ready&#34;}`</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="nx">w</span><span class="p">.</span><span class="nf">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="nx">_</span><span class="p">,</span> <span class="nx">_</span> <span class="p">=</span> <span class="nx">w</span><span class="p">.</span><span class="nf">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">`{&#34;status&#34;:&#34;ready&#34;}`</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>服務啟動尚未完成、必要背景同步尚未就緒、或 <a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">graceful shutdown</a> 已開始時，readiness 應回 <code>503</code>。Process 仍然活著，但不應接新流量。</p>
<h2 id="策略dependency-check-依照監控語意分層">【策略】dependency check 依照監控語意分層</h2>
<p>依賴檢查的核心判斷是故障是否代表 process 應重啟。Database 暫時不可用不一定代表 process 壞掉；重啟可能無法修復，反而造成更多負載。</p>
<p>建議分層：</p>
<ul>
<li><code>/health</code>：只確認 process alive。</li>
<li><code>/ready</code>：確認必要依賴是否足以接新流量。</li>
<li><code>/diagnostics/dependencies</code>：提供工程師查看細節。</li>
</ul>
<p>診斷 response 可以包含穩定欄位：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;status&#34;</span><span class="p">:</span> <span class="s2">&#34;degraded&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nt">&#34;dependencies&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nt">&#34;database&#34;</span><span class="p">:</span> <span class="s2">&#34;ok&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="nt">&#34;queue&#34;</span><span class="p">:</span> <span class="s2">&#34;lagging&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>監控應依賴 status code 與穩定欄位，工程師再用 body 細節診斷問題。自由文字可以輔助閱讀，但不應成為監控規則的依據。</p>
<h2 id="執行diagnostics-endpoint-要條件啟用">【執行】diagnostics endpoint 要條件啟用</h2>
<p>Diagnostics endpoint 的核心用途是提供工程師排查問題的資料。pprof、runtime metrics、internal queue length、goroutine count 都屬於這類。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">RegisterDiagnostics</span><span class="p">(</span><span class="nx">mux</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">ServeMux</span><span class="p">,</span> <span class="nx">enabled</span> <span class="kt">bool</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">enabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">mux</span><span class="p">.</span><span class="nf">HandleFunc</span><span class="p">(</span><span class="s">&#34;/debug/runtime&#34;</span><span class="p">,</span> <span class="nx">HandleRuntimeStats</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleRuntimeStats</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="kd">var</span> <span class="nx">stats</span> <span class="nx">runtime</span><span class="p">.</span><span class="nx">MemStats</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nx">runtime</span><span class="p">.</span><span class="nf">ReadMemStats</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">stats</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="nx">response</span> <span class="o">:=</span> <span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="s">&#34;heap_alloc&#34;</span><span class="p">:</span>  <span class="nx">stats</span><span class="p">.</span><span class="nx">HeapAlloc</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="s">&#34;num_gc&#34;</span><span class="p">:</span>      <span class="nx">stats</span><span class="p">.</span><span class="nx">NumGC</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="s">&#34;goroutines&#34;</span><span class="p">:</span>  <span class="nx">runtime</span><span class="p">.</span><span class="nf">NumGoroutine</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="nx">_</span> <span class="p">=</span> <span class="nx">json</span><span class="p">.</span><span class="nf">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">).</span><span class="nf">Encode</span><span class="p">(</span><span class="nx">response</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Diagnostics 可能揭露內部狀態、記憶體資訊、goroutine 數量、路徑與部署細節，不應公開給一般使用者。若需要長期保留，至少應限制在內網、管理 port、認證或防火牆後。</p>
<h2 id="判讀status-code-是監控合約">【判讀】status code 是監控合約</h2>
<p>健康檢查的核心合約是 status code。監控系統通常先看 HTTP code 與 <a href="/blog/backend/knowledge-cards/timeout/" data-link-title="Timeout" data-link-desc="說明等待外部操作的時間上限如何保護資源與使用者體驗">timeout</a>，不會理解複雜 body。</p>
<table>
  <thead>
      <tr>
          <th>狀態</th>
          <th>意義</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>200 OK</code></td>
          <td>符合該 endpoint 的健康條件</td>
      </tr>
      <tr>
          <td><code>503 Service Unavailable</code></td>
          <td>暫時不可用或不應接流量</td>
      </tr>
      <tr>
          <td><code>405 Method Not Allowed</code></td>
          <td>呼叫方式錯誤</td>
      </tr>
      <tr>
          <td>timeout</td>
          <td>endpoint 無法在預期時間內回應</td>
      </tr>
  </tbody>
</table>
<p>Body 可以提供人類可讀資訊，但不應讓監控依賴自由文字。若要機器讀取，使用穩定 JSON 欄位，例如 <code>status</code>、<code>reason</code>、<code>dependencies</code>。</p>
<h2 id="測試endpoint-測試要鎖定-status-code">【測試】endpoint 測試要鎖定 status code</h2>
<p>Endpoint 測試的核心是驗證 status code 與穩定 JSON 欄位，而不是完整自由文字。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestReadyReturnsUnavailableWhenShuttingDown</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">readiness</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">Readiness</span><span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">readiness</span><span class="p">.</span><span class="nx">ready</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nx">readiness</span><span class="p">.</span><span class="nx">shuttingDown</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">req</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRequest</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">MethodGet</span><span class="p">,</span> <span class="s">&#34;/ready&#34;</span><span class="p">,</span> <span class="kc">nil</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">rec</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRecorder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="nf">HandleReady</span><span class="p">(</span><span class="nx">readiness</span><span class="p">).</span><span class="nf">ServeHTTP</span><span class="p">(</span><span class="nx">rec</span><span class="p">,</span> <span class="nx">req</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">if</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span> <span class="o">!=</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusServiceUnavailable</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;status = %d, want %d&#34;</span><span class="p">,</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusServiceUnavailable</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Diagnostics endpoint 也應測 gate 關閉時不註冊或回 404，避免診斷入口不小心暴露。</p>
<h2 id="本章不處理">本章不處理</h2>
<p>本章先處理 health、readiness 與 diagnostics 的語意切分；Prometheus、OpenTelemetry 與平台設定，會在下列章節再往外延伸：</p>
<ul>
<li><a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Go 進階：Observability pipeline、metrics 與 tracing</a></li>
</ul>
<h2 id="和-go-教材的關係">和 Go 教材的關係</h2>
<p>這一章承接的是 pprof、runtime metrics 與 deploy readiness；如果你要先回看語言教材，可以讀：</p>
<ul>
<li><a href="/blog/go-advanced/03-runtime-profiling/pprof/" data-link-title="3.2 pprof 基礎診斷流程" data-link-desc="用 pprof endpoint 診斷 heap、goroutine 與 CPU 問題">Go：pprof 基礎診斷流程</a></li>
<li><a href="/blog/go-advanced/03-runtime-profiling/gc-memory-limit/" data-link-title="3.1 GC 與 memory limit" data-link-desc="理解 debug.SetMemoryLimit 在長時間服務中的用途">Go：GC 與 memory limit</a></li>
<li><a href="/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go：結構化日誌</a></li>
<li><a href="/blog/backend/knowledge-cards/graceful-shutdown/" data-link-title="Graceful Shutdown" data-link-desc="說明服務停止前如何排空流量、完成工作與保存狀態">Go：graceful shutdown 與 signal handling</a></li>
<li><a href="/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Go 進階：Kubernetes、systemd 與 load balancer 合約</a></li>
<li><a href="/blog/backend/04-observability/" data-link-title="模組四：可觀測性平台" data-link-desc="整理 log、metric、trace、dashboard 與 alert 的後端操作實務">Backend：可觀測性平台</a></li>
<li><a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend：部署平台與網路入口</a></li>
</ul>
<h2 id="小結">小結</h2>
<p><code>/health</code>、<code>/ready</code>、diagnostics endpoint 解決不同問題。Health 檢查 process 基本可用性，readiness 控制是否接新流量，diagnostics 支援工程排查且應限制存取。Status code 是監控合約，JSON body 是補充細節；把這些訊號混在一起會讓操作判斷與安全邊界都變模糊。</p>
]]></content:encoded></item><item><title>6.3 結構化日誌欄位設計</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/log-fields/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/log-fields/</guid><description>&lt;p>結構化日誌欄位的核心目標是讓 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log&lt;/a> 可查詢、可聚合、可追蹤。Message 給人讀，欄位給系統查；重要資訊應放在穩定欄位，不應只藏在自由文字裡。&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>學完本章後，你將能夠：&lt;/p>
&lt;ol>
&lt;li>設計穩定 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log-schema/" data-link-title="Log Schema" data-link-desc="說明結構化 log 欄位如何支援搜尋、關聯與事故排查">log schema&lt;/a>&lt;/li>
&lt;li>用 &lt;code>layer&lt;/code>、&lt;code>request_id&lt;/code>、&lt;code>event_type&lt;/code>、&lt;code>reason&lt;/code> 支援查詢&lt;/li>
&lt;li>區分 message 與 structured fields 的責任&lt;/li>
&lt;li>避免重複記錄同一個錯誤&lt;/li>
&lt;li>避免把敏感資料寫進 log&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察自由文字-log-很難查詢">【觀察】自由文字 log 很難查詢&lt;/h2>
&lt;p>Log 設計的核心問題是事故發生時需要快速查詢。若所有資訊都在 message 裡，查詢只能依賴模糊字串。&lt;/p>
&lt;p>不穩定 log：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;event accepted for user 123 request abc&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這行給人看可以，但系統很難穩定查 &lt;code>request_id=abc&lt;/code> 或 &lt;code>user_id=123&lt;/code>。不同工程師改字句後，查詢就可能失效。&lt;/p>
&lt;p>結構化 log：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;event accepted&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;layer&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;http&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;request_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">requestID&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;user_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">userID&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event_type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Message 描述發生什麼事，欄位提供可查詢資料。這是 log schema 的基本分工。&lt;/p>
&lt;h2 id="判讀log-schema-是查詢合約">【判讀】log schema 是查詢合約&lt;/h2>
&lt;p>Log schema 的核心規則是欄位名稱與值集合要穩定。&lt;code>request_id&lt;/code>、&lt;code>requestID&lt;/code>、&lt;code>rid&lt;/code> 混用會讓查詢與儀表板變得困難。&lt;/p>
&lt;p>常用欄位：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>欄位&lt;/th>
 &lt;th>用途&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>layer&lt;/code>&lt;/td>
 &lt;td>問題發生在哪個系統層&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>request_id&lt;/code>&lt;/td>
 &lt;td>串起單次 HTTP request&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>event_id&lt;/code>&lt;/td>
 &lt;td>串起事件處理流程&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>event_type&lt;/code>&lt;/td>
 &lt;td>聚合某類 domain event&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>client_id&lt;/code>&lt;/td>
 &lt;td>查 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket&lt;/a> client 行為&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>topic&lt;/code>&lt;/td>
 &lt;td>查訂閱或推送範圍&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>reason&lt;/code>&lt;/td>
 &lt;td>聚合失敗原因&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>error&lt;/code>&lt;/td>
 &lt;td>保存錯誤文字&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>欄位不需要很多，但要一致。穩定欄位能讓除錯從「讀一堆文字」變成「查一組條件」。&lt;/p>
&lt;h2 id="執行layer-表示發生位置">【執行】layer 表示發生位置&lt;/h2>
&lt;p>&lt;code>layer&lt;/code> 的核心用途是標示 log 來自哪個系統層，協助工程師快速縮小問題範圍。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Warn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;queue full&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;layer&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;worker&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;queue&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;events&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;reason&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;buffer_full&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>常見 layer：&lt;/p>
&lt;ul>
&lt;li>&lt;code>http&lt;/code>&lt;/li>
&lt;li>&lt;code>websocket&lt;/code>&lt;/li>
&lt;li>&lt;code>worker&lt;/code>&lt;/li>
&lt;li>&lt;code>repository&lt;/code>&lt;/li>
&lt;li>&lt;code>runtime&lt;/code>&lt;/li>
&lt;li>&lt;code>diagnostics&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>名稱不需要多，但應穩定。若 &lt;code>worker&lt;/code>、&lt;code>background&lt;/code>、&lt;code>job_runner&lt;/code> 混用，查詢就會變麻煩。&lt;/p>
&lt;h2 id="策略correlation-id-串起一次流程">【策略】correlation ID 串起一次流程&lt;/h2>
&lt;p>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/correlation-id/" data-link-title="Correlation ID" data-link-desc="說明跨事件或跨服務的關聯識別碼如何支援排障">Correlation ID&lt;/a> 的核心目標是把同一次請求或同一個事件流串起來。HTTP request 常用 &lt;code>request_id&lt;/code>，背景事件可以用 &lt;code>event_id&lt;/code> 或 &lt;code>trace_id&lt;/code>。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">WithRequestLog&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">logger&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Logger&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Logger&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nx">requestID&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Header&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;X-Request-ID&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">requestID&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;&amp;#34;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nx">requestID&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">uuid&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewString&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">With&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;request_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">requestID&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>後續 handler、service、repository 都使用帶有 &lt;code>request_id&lt;/code> 的 logger。查詢單次流程時，不需要靠時間範圍猜哪些 log 相關。&lt;/p>
&lt;p>Correlation ID 不應包含敏感資料。它是追蹤用識別碼，不是使用者資料容器。&lt;/p>
&lt;h2 id="執行reason-欄位讓失敗可統計">【執行】reason 欄位讓失敗可統計&lt;/h2>
&lt;p>&lt;code>reason&lt;/code> 的核心用途是把錯誤原因變成可聚合分類。Message 可以給人讀，reason 給查詢與統計使用。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Warn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;reject event&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;layer&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;http&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;reason&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;invalid_payload&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event_type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>穩定 reason 可以回答「最近一小時最多的拒絕原因是什麼」。如果原因只寫在 message 中，查詢會依賴模糊字串比對。&lt;/p>
&lt;p>Reason 值應像 enum 一樣維持小集合，例如：&lt;/p>
&lt;ul>
&lt;li>&lt;code>invalid_payload&lt;/code>&lt;/li>
&lt;li>&lt;code>queue_full&lt;/code>&lt;/li>
&lt;li>&lt;code>permission_denied&lt;/code>&lt;/li>
&lt;li>&lt;code>timeout&lt;/code>&lt;/li>
&lt;li>&lt;code>client_disconnected&lt;/code>&lt;/li>
&lt;li>&lt;code>dependency_unavailable&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>&lt;code>reason&lt;/code> 應維持小集合分類，完整錯誤應放在 &lt;code>error&lt;/code> 欄位。這樣監控可以穩定聚合原因，工程師仍能從錯誤欄位取得診斷細節。&lt;/p>
&lt;h2 id="判讀錯誤只在負責處理的邊界記一次">【判讀】錯誤只在負責處理的邊界記一次&lt;/h2>
&lt;p>錯誤日誌的核心風險是同一個錯誤被每一層都記一次。這會放大噪音，讓事故時很難看出真正的失敗點。&lt;/p>
&lt;p>反模式：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;repository failed&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="k">return&lt;/span> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Errorf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;save notification: %w&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>上層又記一次：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;request failed&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>較清楚的做法是底層 wrap error，上層在決定 response 或重試策略的邊界記錄一次：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">service&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">cmd&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Warn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;create notification failed&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;layer&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;http&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;reason&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nf">reasonOf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">err&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> &lt;span class="nf">writeError&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">9&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>底層若有必要補充脈絡，優先透過 error wrapping 或 structured error，而不是每層都 &lt;code>Error&lt;/code> log。&lt;/p></description><content:encoded><![CDATA[<p>結構化日誌欄位的核心目標是讓 <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a> 可查詢、可聚合、可追蹤。Message 給人讀，欄位給系統查；重要資訊應放在穩定欄位，不應只藏在自由文字裡。</p>
<h2 id="本章目標">本章目標</h2>
<p>學完本章後，你將能夠：</p>
<ol>
<li>設計穩定 <a href="/blog/backend/knowledge-cards/log-schema/" data-link-title="Log Schema" data-link-desc="說明結構化 log 欄位如何支援搜尋、關聯與事故排查">log schema</a></li>
<li>用 <code>layer</code>、<code>request_id</code>、<code>event_type</code>、<code>reason</code> 支援查詢</li>
<li>區分 message 與 structured fields 的責任</li>
<li>避免重複記錄同一個錯誤</li>
<li>避免把敏感資料寫進 log</li>
</ol>
<hr>
<h2 id="觀察自由文字-log-很難查詢">【觀察】自由文字 log 很難查詢</h2>
<p>Log 設計的核心問題是事故發生時需要快速查詢。若所有資訊都在 message 裡，查詢只能依賴模糊字串。</p>
<p>不穩定 log：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;event accepted for user 123 request abc&#34;</span><span class="p">)</span></span></span></code></pre></div><p>這行給人看可以，但系統很難穩定查 <code>request_id=abc</code> 或 <code>user_id=123</code>。不同工程師改字句後，查詢就可能失效。</p>
<p>結構化 log：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;event accepted&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;http&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;request_id&#34;</span><span class="p">,</span> <span class="nx">requestID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;user_id&#34;</span><span class="p">,</span> <span class="nx">userID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">Type</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>Message 描述發生什麼事，欄位提供可查詢資料。這是 log schema 的基本分工。</p>
<h2 id="判讀log-schema-是查詢合約">【判讀】log schema 是查詢合約</h2>
<p>Log schema 的核心規則是欄位名稱與值集合要穩定。<code>request_id</code>、<code>requestID</code>、<code>rid</code> 混用會讓查詢與儀表板變得困難。</p>
<p>常用欄位：</p>
<table>
  <thead>
      <tr>
          <th>欄位</th>
          <th>用途</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>layer</code></td>
          <td>問題發生在哪個系統層</td>
      </tr>
      <tr>
          <td><code>request_id</code></td>
          <td>串起單次 HTTP request</td>
      </tr>
      <tr>
          <td><code>event_id</code></td>
          <td>串起事件處理流程</td>
      </tr>
      <tr>
          <td><code>event_type</code></td>
          <td>聚合某類 domain event</td>
      </tr>
      <tr>
          <td><code>client_id</code></td>
          <td>查 <a href="/blog/backend/knowledge-cards/websocket/" data-link-title="WebSocket" data-link-desc="說明 WebSocket 如何提供長連線雙向即時通訊">WebSocket</a> client 行為</td>
      </tr>
      <tr>
          <td><code>topic</code></td>
          <td>查訂閱或推送範圍</td>
      </tr>
      <tr>
          <td><code>reason</code></td>
          <td>聚合失敗原因</td>
      </tr>
      <tr>
          <td><code>error</code></td>
          <td>保存錯誤文字</td>
      </tr>
  </tbody>
</table>
<p>欄位不需要很多，但要一致。穩定欄位能讓除錯從「讀一堆文字」變成「查一組條件」。</p>
<h2 id="執行layer-表示發生位置">【執行】layer 表示發生位置</h2>
<p><code>layer</code> 的核心用途是標示 log 來自哪個系統層，協助工程師快速縮小問題範圍。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span><span class="s">&#34;queue full&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;worker&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;queue&#34;</span><span class="p">,</span> <span class="s">&#34;events&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="s">&#34;buffer_full&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>常見 layer：</p>
<ul>
<li><code>http</code></li>
<li><code>websocket</code></li>
<li><code>worker</code></li>
<li><code>repository</code></li>
<li><code>runtime</code></li>
<li><code>diagnostics</code></li>
</ul>
<p>名稱不需要多，但應穩定。若 <code>worker</code>、<code>background</code>、<code>job_runner</code> 混用，查詢就會變麻煩。</p>
<h2 id="策略correlation-id-串起一次流程">【策略】correlation ID 串起一次流程</h2>
<p><a href="/blog/backend/knowledge-cards/correlation-id/" data-link-title="Correlation ID" data-link-desc="說明跨事件或跨服務的關聯識別碼如何支援排障">Correlation ID</a> 的核心目標是把同一次請求或同一個事件流串起來。HTTP request 常用 <code>request_id</code>，背景事件可以用 <code>event_id</code> 或 <code>trace_id</code>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">WithRequestLog</span><span class="p">(</span><span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span> <span class="nx">logger</span> <span class="o">*</span><span class="nx">slog</span><span class="p">.</span><span class="nx">Logger</span><span class="p">)</span> <span class="o">*</span><span class="nx">slog</span><span class="p">.</span><span class="nx">Logger</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">requestID</span> <span class="o">:=</span> <span class="nx">r</span><span class="p">.</span><span class="nx">Header</span><span class="p">.</span><span class="nf">Get</span><span class="p">(</span><span class="s">&#34;X-Request-ID&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="k">if</span> <span class="nx">requestID</span> <span class="o">==</span> <span class="s">&#34;&#34;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="nx">requestID</span> <span class="p">=</span> <span class="nx">uuid</span><span class="p">.</span><span class="nf">NewString</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="k">return</span> <span class="nx">logger</span><span class="p">.</span><span class="nf">With</span><span class="p">(</span><span class="s">&#34;request_id&#34;</span><span class="p">,</span> <span class="nx">requestID</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>後續 handler、service、repository 都使用帶有 <code>request_id</code> 的 logger。查詢單次流程時，不需要靠時間範圍猜哪些 log 相關。</p>
<p>Correlation ID 不應包含敏感資料。它是追蹤用識別碼，不是使用者資料容器。</p>
<h2 id="執行reason-欄位讓失敗可統計">【執行】reason 欄位讓失敗可統計</h2>
<p><code>reason</code> 的核心用途是把錯誤原因變成可聚合分類。Message 可以給人讀，reason 給查詢與統計使用。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span><span class="s">&#34;reject event&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;http&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="s">&#34;invalid_payload&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">Type</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>穩定 reason 可以回答「最近一小時最多的拒絕原因是什麼」。如果原因只寫在 message 中，查詢會依賴模糊字串比對。</p>
<p>Reason 值應像 enum 一樣維持小集合，例如：</p>
<ul>
<li><code>invalid_payload</code></li>
<li><code>queue_full</code></li>
<li><code>permission_denied</code></li>
<li><code>timeout</code></li>
<li><code>client_disconnected</code></li>
<li><code>dependency_unavailable</code></li>
</ul>
<p><code>reason</code> 應維持小集合分類，完整錯誤應放在 <code>error</code> 欄位。這樣監控可以穩定聚合原因，工程師仍能從錯誤欄位取得診斷細節。</p>
<h2 id="判讀錯誤只在負責處理的邊界記一次">【判讀】錯誤只在負責處理的邊界記一次</h2>
<p>錯誤日誌的核心風險是同一個錯誤被每一層都記一次。這會放大噪音，讓事故時很難看出真正的失敗點。</p>
<p>反模式：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;repository failed&#34;</span><span class="p">,</span> <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="k">return</span> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;save notification: %w&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span></span></span></code></pre></div><p>上層又記一次：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;request failed&#34;</span><span class="p">,</span> <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span></span></span></code></pre></div><p>較清楚的做法是底層 wrap error，上層在決定 response 或重試策略的邊界記錄一次：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">service</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">cmd</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span><span class="s">&#34;create notification failed&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;http&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="nf">reasonOf</span><span class="p">(</span><span class="nx">err</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="nf">writeError</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">    <span class="k">return</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>底層若有必要補充脈絡，優先透過 error wrapping 或 structured error，而不是每層都 <code>Error</code> log。</p>
<h2 id="策略敏感資料不進-log">【策略】敏感資料不進 log</h2>
<p>Log 欄位設計的核心安全邊界是只記錄診斷必要資料。token、密碼、完整 cookie、完整個資與機密 payload 都屬於應排除資料；結構化 log 很容易被集中保存與搜尋，敏感資料一旦進入 log，清理成本很高。</p>
<p>可以記錄：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;user login&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;user_id&#34;</span><span class="p">,</span> <span class="nx">user</span><span class="p">.</span><span class="nx">ID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>應排除：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;user login&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;password&#34;</span><span class="p">,</span> <span class="nx">password</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;token&#34;</span><span class="p">,</span> <span class="nx">token</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>若需要診斷 payload，可記錄長度、hash、欄位是否存在，而不是完整內容。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Debug</span><span class="p">(</span><span class="s">&#34;payload received&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;payload_bytes&#34;</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="nx">body</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;payload_sha256&#34;</span><span class="p">,</span> <span class="nf">checksum</span><span class="p">(</span><span class="nx">body</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>所有會被收集或保存的 log 都應遵守同一套資料保護規則。Debug log 也會進入檔案、集中式 log 或診斷封包，因此不能把它當成敏感資料的例外通道。</p>
<h2 id="測試log-欄位可以用-handler-驗證">【測試】log 欄位可以用 handler 驗證</h2>
<p>Log schema 的測試核心是確認重要欄位存在，避免未來重構時消失。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestLogAttrsForEvent</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">event</span> <span class="o">:=</span> <span class="nx">DomainEvent</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">ID</span><span class="p">:</span>        <span class="s">&#34;evt_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">Type</span><span class="p">:</span>      <span class="s">&#34;notification.created&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="nx">SubjectID</span><span class="p">:</span> <span class="s">&#34;notification_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nx">attrs</span> <span class="o">:=</span> <span class="nf">LogAttrsForEvent</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nf">hasAttr</span><span class="p">(</span><span class="nx">attrs</span><span class="p">,</span> <span class="s">&#34;event_id&#34;</span><span class="p">,</span> <span class="s">&#34;evt_1&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;event_id attr missing&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nf">hasAttr</span><span class="p">(</span><span class="nx">attrs</span><span class="p">,</span> <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="s">&#34;notification.created&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;event_type attr missing&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>不需要測整行 log 字串。測穩定欄位即可，message 文字可以保留一定調整空間。</p>
<h2 id="本章不處理">本章不處理</h2>
<p>本章先處理 Go 服務內部的 structured log schema；集中式平台、欄位標準與隱私治理，會在下列章節再往外延伸：</p>
<ul>
<li><a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Go 進階：Observability pipeline、metrics 與 tracing</a></li>
</ul>
<h2 id="和-go-教材的關係">和 Go 教材的關係</h2>
<p>這一章承接的是 structured recording、<a href="/blog/backend/knowledge-cards/event-log/" data-link-title="Event Log" data-link-desc="說明事件歷史如何保存、重播與支援跨服務資料重建">event log</a> 與 observability pipeline；如果你要先回看語言教材，可以讀：</p>
<ul>
<li><a href="/blog/go/06-practical/structured-recording/" data-link-title="6.5 如何新增結構化記錄欄位" data-link-desc="區分 operational log、domain event log 與狀態資料">Go：如何新增結構化記錄欄位</a></li>
<li><a href="/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go：結構化日誌</a></li>
<li><a href="/blog/go/06-practical/new-event-type/" data-link-title="6.2 如何新增一種 domain event" data-link-desc="擴展事件常數、輸入驗證與處理流程">Go：如何新增一種 domain event</a></li>
<li><a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Go：Observability pipeline、metrics 與 tracing</a></li>
<li><a href="/blog/backend/04-observability/" data-link-title="模組四：可觀測性平台" data-link-desc="整理 log、metric、trace、dashboard 與 alert 的後端操作實務">Backend：可觀測性平台</a></li>
<li><a href="/blog/go/03-stdlib/slog/" data-link-title="3.6 log/slog：結構化日誌" data-link-desc="用 key-value log 設計可查詢、可過濾的程式訊號">Go 入門：log/slog</a></li>
<li><a href="/blog/go/06-practical/structured-recording/" data-link-title="6.5 如何新增結構化記錄欄位" data-link-desc="區分 operational log、domain event log 與狀態資料">Go 入門：如何新增結構化記錄欄位</a></li>
</ul>
<h2 id="小結">小結</h2>
<p>結構化日誌的價值在於穩定欄位：<code>layer</code> 定位層級，<code>request_id</code> 串起請求，<code>event_id</code> 串起事件，<code>event_type</code> 支援聚合，<code>reason</code> 支援失敗分類。Message 給人讀，欄位給系統查。好的 log schema 能讓除錯從猜測變成查詢，同時避免敏感資料外洩與錯誤重複記錄。</p>
]]></content:encoded></item><item><title>6.4 版本偵測與 feature gate</title><link>https://tarrragon.github.io/blog/go-advanced/06-production-operations/feature-gate/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go-advanced/06-production-operations/feature-gate/</guid><description>&lt;p>Feature gate 的核心目標是在外部能力、部署環境或版本不同時，讓服務保留可預期行為。它明確管理功能何時啟用、關閉時如何降級、錯誤時如何回報。&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>學完本章後，你將能夠：&lt;/p>
&lt;ol>
&lt;li>用 config struct 集中載入 feature gate&lt;/li>
&lt;li>把外部版本偵測轉成 capability&lt;/li>
&lt;li>為 gate 關閉時定義降級、回錯或延後處理策略&lt;/li>
&lt;li>避免在程式各處直接讀環境變數&lt;/li>
&lt;li>同時測試 feature 開與關兩條路徑&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察新功能上線需要可控行為">【觀察】新功能上線需要可控行為&lt;/h2>
&lt;p>Feature gate 的核心需求來自生產環境差異。新功能可能只在部分部署環境可用，外部依賴可能版本不同，某些診斷入口只應在內網啟用，某些即時能力需要先灰度。&lt;/p>
&lt;p>沒有 gate 時常見問題：&lt;/p>
&lt;ul>
&lt;li>新功能只能一次性全開或全關。&lt;/li>
&lt;li>部署環境不支援時服務直接失敗。&lt;/li>
&lt;li>測試只能覆蓋預設路徑。&lt;/li>
&lt;li>問題發生時無法快速降級。&lt;/li>
&lt;li>程式各處用環境變數判斷，行為難以推理。&lt;/li>
&lt;/ul>
&lt;p>Feature gate 的目的是讓行為決策集中、可測、可回滾。&lt;/p>
&lt;h2 id="判讀feature-gate-是行為合約">【判讀】feature gate 是行為合約&lt;/h2>
&lt;p>Feature gate 的核心語意是控制某段行為是否啟用，以及未啟用時系統要做什麼。它不只是 &lt;code>if&lt;/code>，而是一個操作合約。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Features&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nx">RealtimePush&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="nx">Diagnostics&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nx">Pprof&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>開關名稱應描述功能，而不是描述臨時任務。&lt;code>RealtimePush&lt;/code> 比 &lt;code>NewCode&lt;/code> 更能長期維護；&lt;code>Diagnostics&lt;/code> 比 &lt;code>DebugStuff&lt;/code> 更清楚。&lt;/p>
&lt;p>Gate 應在應用啟動時集中載入，再傳給需要的元件。不要在程式各處反覆直接讀環境變數，否則測試與推理都會變困難。&lt;/p>
&lt;h2 id="執行集中載入-feature-config">【執行】集中載入 feature config&lt;/h2>
&lt;p>Feature config 的核心責任是把環境變數、設定檔或啟動參數轉成明確資料。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">LoadFeaturesFromEnv&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="nx">Features&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">Features&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="nx">RealtimePush&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;FEATURE_REALTIME_PUSH&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nx">Diagnostics&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;APP_DIAGNOSTICS&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="nx">Pprof&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;APP_PPROF&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>組裝時傳入元件：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">features&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">LoadFeaturesFromEnv&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="nx">mux&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewServeMux&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="nf">RegisterDiagnostics&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">mux&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Diagnostics&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="nx">publisher&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">NewPublisher&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">PublisherConfig&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nx">RealtimeEnabled&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">RealtimePush&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">publisher&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這樣功能測試可以直接建構 &lt;code>Features&lt;/code>，不必依賴全域環境變數。環境變數解析只需要在 &lt;code>LoadFeaturesFromEnv&lt;/code> 的測試中覆蓋。&lt;/p>
&lt;h2 id="判讀版本偵測要轉成能力">【判讀】版本偵測要轉成能力&lt;/h2>
&lt;p>版本偵測的核心原則是不要讓整個程式到處比較版本字串。應把外部版本轉成 capability，內部只判斷能力。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Capabilities&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsStreaming&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsMetadata&lt;/span> &lt;span class="kt">bool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">DetectCapabilities&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">version&lt;/span> &lt;span class="nx">semver&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Version&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">Capabilities&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">Capabilities&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsStreaming&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">version&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GTE&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">semver&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MustParse&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;2.0.0&amp;#34;&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="nx">SupportsMetadata&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">version&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GTE&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">semver&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MustParse&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;2.1.0&amp;#34;&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>內部程式應寫成：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="nx">caps&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SupportsStreaming&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nf">useStreaming&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="k">return&lt;/span> &lt;span class="nf">usePolling&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這比到處寫 &lt;code>if version &amp;gt;= ...&lt;/code> 更清楚，也更容易測試。版本字串是外部事實，capability 是內部行為判斷。&lt;/p>
&lt;h2 id="策略gate-關閉時要有降級策略">【策略】gate 關閉時要有降級策略&lt;/h2>
&lt;p>Feature gate 的核心問題是關閉時要做什麼。常見策略包括降級、回錯、隱藏入口、排程稍後處理。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>策略&lt;/th>
 &lt;th>行為&lt;/th>
 &lt;th>適用情境&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/fallback/" data-link-title="Fallback" data-link-desc="說明主要路徑失敗時使用替代結果或替代流程的設計責任">fallback&lt;/a>&lt;/td>
 &lt;td>使用舊流程&lt;/td>
 &lt;td>新能力只是效率改善&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>reject&lt;/td>
 &lt;td>回明確錯誤&lt;/td>
 &lt;td>功能沒有安全替代方案&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>hide&lt;/td>
 &lt;td>不註冊 endpoint 或不顯示入口&lt;/td>
 &lt;td>使用者不應看到該功能&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>store for later&lt;/td>
 &lt;td>先保存，稍後處理&lt;/td>
 &lt;td>即時能力暫不可用但資料不能丟&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>例如即時推送關閉時，可以改成保存待處理資料：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="nx">Publisher&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Publish&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span> &lt;span class="nx">DomainEvent&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">realtimeEnabled&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">realtime&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Publish&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">repository&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">SaveForLater&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>降級策略要符合資料語意。不能即時送出不代表可以直接丟掉重要事件。&lt;/p>
&lt;h2 id="執行http-endpoint-可用-gate-控制註冊或行為">【執行】HTTP endpoint 可用 gate 控制註冊或行為&lt;/h2>
&lt;p>HTTP feature gate 的核心選擇是「不註冊 endpoint」或「註冊但回明確錯誤」。兩者語意不同。&lt;/p>
&lt;p>不註冊 endpoint：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Diagnostics&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nf">RegisterDiagnostics&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">mux&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">true&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>適合診斷入口、內部工具或不希望使用者看見的功能。&lt;/p>
&lt;p>註冊但回錯：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">HandleRealtimeExport&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">features&lt;/span> &lt;span class="nx">Features&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">HandlerFunc&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">features&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">RealtimePush&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;realtime export is disabled&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusNotImplemented&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nf">startRealtimeExport&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>適合公開 API，讓呼叫端知道功能存在但目前不可用。&lt;/p>
&lt;h2 id="策略gate-不應散落成巢狀-if">【策略】gate 不應散落成巢狀 if&lt;/h2>
&lt;p>Feature gate 的核心維護風險是判斷散落在多層呼叫中，最後沒人知道功能到底何時啟用。&lt;/p></description><content:encoded><![CDATA[<p>Feature gate 的核心目標是在外部能力、部署環境或版本不同時，讓服務保留可預期行為。它明確管理功能何時啟用、關閉時如何降級、錯誤時如何回報。</p>
<h2 id="本章目標">本章目標</h2>
<p>學完本章後，你將能夠：</p>
<ol>
<li>用 config struct 集中載入 feature gate</li>
<li>把外部版本偵測轉成 capability</li>
<li>為 gate 關閉時定義降級、回錯或延後處理策略</li>
<li>避免在程式各處直接讀環境變數</li>
<li>同時測試 feature 開與關兩條路徑</li>
</ol>
<hr>
<h2 id="觀察新功能上線需要可控行為">【觀察】新功能上線需要可控行為</h2>
<p>Feature gate 的核心需求來自生產環境差異。新功能可能只在部分部署環境可用，外部依賴可能版本不同，某些診斷入口只應在內網啟用，某些即時能力需要先灰度。</p>
<p>沒有 gate 時常見問題：</p>
<ul>
<li>新功能只能一次性全開或全關。</li>
<li>部署環境不支援時服務直接失敗。</li>
<li>測試只能覆蓋預設路徑。</li>
<li>問題發生時無法快速降級。</li>
<li>程式各處用環境變數判斷，行為難以推理。</li>
</ul>
<p>Feature gate 的目的是讓行為決策集中、可測、可回滾。</p>
<h2 id="判讀feature-gate-是行為合約">【判讀】feature gate 是行為合約</h2>
<p>Feature gate 的核心語意是控制某段行為是否啟用，以及未啟用時系統要做什麼。它不只是 <code>if</code>，而是一個操作合約。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">type</span> <span class="nx">Features</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">RealtimePush</span> <span class="kt">bool</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nx">Diagnostics</span>  <span class="kt">bool</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nx">Pprof</span>        <span class="kt">bool</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>開關名稱應描述功能，而不是描述臨時任務。<code>RealtimePush</code> 比 <code>NewCode</code> 更能長期維護；<code>Diagnostics</code> 比 <code>DebugStuff</code> 更清楚。</p>
<p>Gate 應在應用啟動時集中載入，再傳給需要的元件。不要在程式各處反覆直接讀環境變數，否則測試與推理都會變困難。</p>
<h2 id="執行集中載入-feature-config">【執行】集中載入 feature config</h2>
<p>Feature config 的核心責任是把環境變數、設定檔或啟動參數轉成明確資料。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">LoadFeaturesFromEnv</span><span class="p">()</span> <span class="nx">Features</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">return</span> <span class="nx">Features</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="nx">RealtimePush</span><span class="p">:</span> <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;FEATURE_REALTIME_PUSH&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="nx">Diagnostics</span><span class="p">:</span>  <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;APP_DIAGNOSTICS&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="nx">Pprof</span><span class="p">:</span>        <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;APP_PPROF&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>組裝時傳入元件：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">features</span> <span class="o">:=</span> <span class="nf">LoadFeaturesFromEnv</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nx">mux</span> <span class="o">:=</span> <span class="nx">http</span><span class="p">.</span><span class="nf">NewServeMux</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nf">RegisterDiagnostics</span><span class="p">(</span><span class="nx">mux</span><span class="p">,</span> <span class="nx">features</span><span class="p">.</span><span class="nx">Diagnostics</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">publisher</span> <span class="o">:=</span> <span class="nf">NewPublisher</span><span class="p">(</span><span class="nx">PublisherConfig</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">RealtimeEnabled</span><span class="p">:</span> <span class="nx">features</span><span class="p">.</span><span class="nx">RealtimePush</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">})</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nx">_</span> <span class="p">=</span> <span class="nx">publisher</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>這樣功能測試可以直接建構 <code>Features</code>，不必依賴全域環境變數。環境變數解析只需要在 <code>LoadFeaturesFromEnv</code> 的測試中覆蓋。</p>
<h2 id="判讀版本偵測要轉成能力">【判讀】版本偵測要轉成能力</h2>
<p>版本偵測的核心原則是不要讓整個程式到處比較版本字串。應把外部版本轉成 capability，內部只判斷能力。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">Capabilities</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">SupportsStreaming</span> <span class="kt">bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">SupportsMetadata</span>  <span class="kt">bool</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="nf">DetectCapabilities</span><span class="p">(</span><span class="nx">version</span> <span class="nx">semver</span><span class="p">.</span><span class="nx">Version</span><span class="p">)</span> <span class="nx">Capabilities</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">return</span> <span class="nx">Capabilities</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">SupportsStreaming</span><span class="p">:</span> <span class="nx">version</span><span class="p">.</span><span class="nf">GTE</span><span class="p">(</span><span class="nx">semver</span><span class="p">.</span><span class="nf">MustParse</span><span class="p">(</span><span class="s">&#34;2.0.0&#34;</span><span class="p">)),</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">SupportsMetadata</span><span class="p">:</span>  <span class="nx">version</span><span class="p">.</span><span class="nf">GTE</span><span class="p">(</span><span class="nx">semver</span><span class="p">.</span><span class="nf">MustParse</span><span class="p">(</span><span class="s">&#34;2.1.0&#34;</span><span class="p">)),</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>內部程式應寫成：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">caps</span><span class="p">.</span><span class="nx">SupportsStreaming</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">return</span> <span class="nf">useStreaming</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">return</span> <span class="nf">usePolling</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span></span></span></code></pre></div><p>這比到處寫 <code>if version &gt;= ...</code> 更清楚，也更容易測試。版本字串是外部事實，capability 是內部行為判斷。</p>
<h2 id="策略gate-關閉時要有降級策略">【策略】gate 關閉時要有降級策略</h2>
<p>Feature gate 的核心問題是關閉時要做什麼。常見策略包括降級、回錯、隱藏入口、排程稍後處理。</p>
<table>
  <thead>
      <tr>
          <th>策略</th>
          <th>行為</th>
          <th>適用情境</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/backend/knowledge-cards/fallback/" data-link-title="Fallback" data-link-desc="說明主要路徑失敗時使用替代結果或替代流程的設計責任">fallback</a></td>
          <td>使用舊流程</td>
          <td>新能力只是效率改善</td>
      </tr>
      <tr>
          <td>reject</td>
          <td>回明確錯誤</td>
          <td>功能沒有安全替代方案</td>
      </tr>
      <tr>
          <td>hide</td>
          <td>不註冊 endpoint 或不顯示入口</td>
          <td>使用者不應看到該功能</td>
      </tr>
      <tr>
          <td>store for later</td>
          <td>先保存，稍後處理</td>
          <td>即時能力暫不可用但資料不能丟</td>
      </tr>
  </tbody>
</table>
<p>例如即時推送關閉時，可以改成保存待處理資料：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">p</span> <span class="nx">Publisher</span><span class="p">)</span> <span class="nf">Publish</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">if</span> <span class="nx">p</span><span class="p">.</span><span class="nx">realtimeEnabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="k">return</span> <span class="nx">p</span><span class="p">.</span><span class="nx">realtime</span><span class="p">.</span><span class="nf">Publish</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="k">return</span> <span class="nx">p</span><span class="p">.</span><span class="nx">repository</span><span class="p">.</span><span class="nf">SaveForLater</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>降級策略要符合資料語意。不能即時送出不代表可以直接丟掉重要事件。</p>
<h2 id="執行http-endpoint-可用-gate-控制註冊或行為">【執行】HTTP endpoint 可用 gate 控制註冊或行為</h2>
<p>HTTP feature gate 的核心選擇是「不註冊 endpoint」或「註冊但回明確錯誤」。兩者語意不同。</p>
<p>不註冊 endpoint：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">features</span><span class="p">.</span><span class="nx">Diagnostics</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nf">RegisterDiagnostics</span><span class="p">(</span><span class="nx">mux</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>適合診斷入口、內部工具或不希望使用者看見的功能。</p>
<p>註冊但回錯：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">HandleRealtimeExport</span><span class="p">(</span><span class="nx">features</span> <span class="nx">Features</span><span class="p">)</span> <span class="nx">http</span><span class="p">.</span><span class="nx">HandlerFunc</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">return</span> <span class="kd">func</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="k">if</span> <span class="p">!</span><span class="nx">features</span><span class="p">.</span><span class="nx">RealtimePush</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">            <span class="nx">http</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="s">&#34;realtime export is disabled&#34;</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusNotImplemented</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nf">startRealtimeExport</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="nx">r</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>適合公開 API，讓呼叫端知道功能存在但目前不可用。</p>
<h2 id="策略gate-不應散落成巢狀-if">【策略】gate 不應散落成巢狀 if</h2>
<p>Feature gate 的核心維護風險是判斷散落在多層呼叫中，最後沒人知道功能到底何時啟用。</p>
<p>反模式：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="s">&#34;FEATURE_REALTIME_PUSH&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#34;1&#34;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">if</span> <span class="nx">version</span> <span class="o">&gt;=</span> <span class="s">&#34;2.0.0&#34;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="k">if</span> <span class="nx">user</span><span class="p">.</span><span class="nx">Enabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">            <span class="c1">// ...</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>較清楚的做法是先組出 decision：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">RealtimeDecision</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">Enabled</span> <span class="kt">bool</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">Reason</span>  <span class="kt">string</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="nf">DecideRealtime</span><span class="p">(</span><span class="nx">features</span> <span class="nx">Features</span><span class="p">,</span> <span class="nx">caps</span> <span class="nx">Capabilities</span><span class="p">)</span> <span class="nx">RealtimeDecision</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">features</span><span class="p">.</span><span class="nx">RealtimePush</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="k">return</span> <span class="nx">RealtimeDecision</span><span class="p">{</span><span class="nx">Enabled</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span> <span class="nx">Reason</span><span class="p">:</span> <span class="s">&#34;feature_disabled&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">caps</span><span class="p">.</span><span class="nx">SupportsStreaming</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="k">return</span> <span class="nx">RealtimeDecision</span><span class="p">{</span><span class="nx">Enabled</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span> <span class="nx">Reason</span><span class="p">:</span> <span class="s">&#34;streaming_not_supported&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">return</span> <span class="nx">RealtimeDecision</span><span class="p">{</span><span class="nx">Enabled</span><span class="p">:</span> <span class="kc">true</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Decision 物件讓 <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a>、測試與錯誤回應都能使用相同 reason。</p>
<h2 id="執行log-要記錄-gate-decision">【執行】log 要記錄 gate decision</h2>
<p>Feature gate 的核心操作需求是知道功能為何啟用或關閉。當 gate 影響行為時，應記錄穩定 reason。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">decision</span> <span class="o">:=</span> <span class="nf">DecideRealtime</span><span class="p">(</span><span class="nx">features</span><span class="p">,</span> <span class="nx">caps</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;realtime decision&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;feature&#34;</span><span class="p">,</span> <span class="s">&#34;realtime_push&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;enabled&#34;</span><span class="p">,</span> <span class="nx">decision</span><span class="p">.</span><span class="nx">Enabled</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="nx">decision</span><span class="p">.</span><span class="nx">Reason</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>這能回答「功能為什麼沒有走即時推送」這類問題。Reason 應是小集合，不要塞完整錯誤字串。</p>
<h2 id="測試開與關兩條路徑都要測">【測試】開與關兩條路徑都要測</h2>
<p>Feature gate 測試的核心規則是同時測啟用與停用路徑。只測預設值很容易讓另一條路徑壞掉。</p>
<p>停用路徑：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestHandleRealtimeExportFeatureDisabled</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">req</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRequest</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">MethodPost</span><span class="p">,</span> <span class="s">&#34;/export&#34;</span><span class="p">,</span> <span class="kc">nil</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">rec</span> <span class="o">:=</span> <span class="nx">httptest</span><span class="p">.</span><span class="nf">NewRecorder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nx">handler</span> <span class="o">:=</span> <span class="nf">HandleRealtimeExport</span><span class="p">(</span><span class="nx">Features</span><span class="p">{</span><span class="nx">RealtimePush</span><span class="p">:</span> <span class="kc">false</span><span class="p">})</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="nx">handler</span><span class="p">.</span><span class="nf">ServeHTTP</span><span class="p">(</span><span class="nx">rec</span><span class="p">,</span> <span class="nx">req</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">if</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span> <span class="o">!=</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusNotImplemented</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;status = %d, want %d&#34;</span><span class="p">,</span> <span class="nx">rec</span><span class="p">.</span><span class="nx">Code</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusNotImplemented</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>啟用路徑：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestDecideRealtimeEnabled</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">decision</span> <span class="o">:=</span> <span class="nf">DecideRealtime</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">Features</span><span class="p">{</span><span class="nx">RealtimePush</span><span class="p">:</span> <span class="kc">true</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">Capabilities</span><span class="p">{</span><span class="nx">SupportsStreaming</span><span class="p">:</span> <span class="kc">true</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">if</span> <span class="p">!</span><span class="nx">decision</span><span class="p">.</span><span class="nx">Enabled</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;realtime should be enabled, reason %q&#34;</span><span class="p">,</span> <span class="nx">decision</span><span class="p">.</span><span class="nx">Reason</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>環境變數解析應單獨測 <code>LoadFeaturesFromEnv</code>。功能測試應直接傳入 <code>Features</code>，不要依賴全域環境狀態。</p>
<h2 id="本章不處理">本章不處理</h2>
<p>本章先處理服務內部的 gate 行為邊界；遠端 <a href="/blog/backend/knowledge-cards/feature-flag/" data-link-title="Feature Flag" data-link-desc="說明如何用可動態開關控制功能曝光與風險">feature flag</a> 平台與灰度流程，會在下列章節再往外延伸：</p>
<ul>
<li><a href="/blog/backend/05-deployment-platform/" data-link-title="模組五：部署平台與網路入口" data-link-desc="整理 Kubernetes、systemd、load balancer、container 與服務生命週期合約">Backend：部署平台與網路入口</a></li>
</ul>
<h2 id="和-go-教材的關係">和 Go 教材的關係</h2>
<p>這一章承接的是 composition root、handler boundary 與 runtime gate；如果你要先回看語言教材，可以讀：</p>
<ul>
<li><a href="/blog/go/07-refactoring/composition-root/" data-link-title="7.7 composition root 與依賴組裝" data-link-desc="把具體 adapter、config 與 usecase wiring 留在應用入口層">Go：composition root 與依賴組裝</a></li>
<li><a href="/blog/go/07-refactoring/handler-boundary/" data-link-title="7.1 把 handler 邏輯拆成可測單元" data-link-desc="分離 HTTP 協定處理與核心邏輯">Go：把 handler 邏輯拆成可測單元</a></li>
<li><a href="/blog/go/07-refactoring/interface-boundary/" data-link-title="7.2 用 interface 隔離外部依賴" data-link-desc="建立小而穩定的測試替身">Go：用 interface 隔離外部依賴</a></li>
<li><a href="/blog/go/05-error-testing/testing-basics/" data-link-title="5.2 testing 基礎" data-link-desc="用 testing package 驗證函式行為">Go：testing 基礎</a></li>
<li><a href="/blog/go-advanced/07-distributed-operations/deployment-contracts/" data-link-title="7.5 Kubernetes、systemd 與 load balancer 合約" data-link-desc="理解部署平台如何影響 Go 服務的 shutdown、health 與資源限制">Go 進階：Kubernetes、systemd 與 load balancer 合約</a></li>
</ul>
<h2 id="小結">小結</h2>
<p>Feature gate 是生產操作工具，也是程式設計邊界。好的 gate 會集中載入、轉成 capability、定義降級策略、輸出穩定 reason，並同時測試開與關兩條路徑。它控制的是行為合約，不只是把新程式碼藏在 <code>if</code> 後面。</p>
]]></content:encoded></item></channel></rss>