<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Graceful-Shutdown on Tarragon</title><link>https://tarrragon.github.io/blog/tags/graceful-shutdown/</link><description>Recent content in Graceful-Shutdown on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 19 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/graceful-shutdown/index.xml" rel="self" type="application/rss+xml"/><item><title>Go Platform Adaptation</title><link>https://tarrragon.github.io/blog/monitoring/05-platform-adaptation/go-platform/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/monitoring/05-platform-adaptation/go-platform/</guid><description>&lt;p>The Go monitoring SDK plays a different role from the SDKs for other platforms. The JS / Flutter / Python SDKs are client-side event-reporting tools, whereas the Go SDK is more often used server-side, including for the collector's own self-monitoring. Go's goroutine concurrency model, its signal handling mechanism, and graceful shutdown of HTTP servers are the three core adaptation concerns in a Go environment.&lt;/p>
&lt;h2 id="graceful-shutdown">Graceful shutdown&lt;/h2>
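&lt;p>When a Go program receives SIGTERM or SIGINT, it needs to finish cleanup before exiting: flush the remaining buffer, close network connections, and write the final lifecycle events.&lt;/p>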





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">stop&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">signal&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NotifyContext&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Background&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">syscall&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SIGTERM&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">syscall&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SIGINT&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="k">defer&lt;/span> &lt;span class="nf">stop&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="o">&amp;lt;-&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Done&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="c1">// signal received, start graceful shutdown&lt;/span>
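&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="nx">shutdownCtx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">cancel&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WithTimeout&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Background&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="k">defer&lt;/span> &lt;span class="nf">cancel&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="nx">monitor&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Close&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">shutdownCtx&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The time window for graceful shutdown is set by the deployment environment. Kubernetes defaults terminationGracePeriodSeconds to 30 seconds, and Docker's default stop timeout is 10 seconds. The SDK's Close method accepts a context so the caller controls the timeout.&lt;/p>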
&lt;h3 id="http-server-的-shutdown-順序">HTTP server shutdown order&lt;/h3>
&lt;p>If a Go program is both an HTTP server and a user of the monitoring SDK, the shutdown order matters:&lt;/p>
&lt;ol>
&lt;li>Stop accepting new connections (&lt;code>server.Shutdown(ctx)&lt;/code>)&lt;/li>
&lt;li>Wait for in-flight requests to finish&lt;/li>
&lt;li>Flush the monitoring buffer (&lt;code>monitor.Close(ctx)&lt;/code>)&lt;/li>
&lt;li>Close logs and other resources&lt;/li>
&lt;/ol>
&lt;p>If the monitor is closed before the server shuts down, events produced by in-flight requests are emitted after the monitor has closed and are silently dropped.&lt;/p>
&lt;h2 id="signal-handling">Signal handling&lt;/h2>
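&lt;p>Go's &lt;code>signal.Notify&lt;/code> and &lt;code>signal.NotifyContext&lt;/code> are the standard way to receive OS signals. The SDK should not register its own signal handler at init, because that conflicts with the application's signal handling. Go delivers a signal to every registered channel rather than letting a later registration override an earlier one, so an SDK that flushes and closes on its own would race with the application's shutdown sequence; and once any handler is registered for a signal, Go's default exit-on-signal behavior for it is disabled process-wide.&lt;/p>
&lt;p>On the SDK side, the adaptation is to provide a &lt;code>Close&lt;/code> method for the application to call from its own signal handler, rather than intercepting signals inside the SDK. The application controls the shutdown flow; the SDK only flushes and cleans up when told to close.&lt;/p>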
&lt;h3 id="panic-recovery">panic recovery&lt;/h3>
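&lt;p>A Go panic unwinds the current goroutine. If it is not recovered, in the main goroutine or any other, the whole program exits and the events in the SDK's buffer are lost.&lt;/p>
&lt;p>The SDK can provide &lt;code>monitor.RecoverAndReport()&lt;/code> so that developers can intercept panics with &lt;code>defer monitor.RecoverAndReport()&lt;/code> at the entry point of a goroutine, record an error event, and then re-panic (preserving the original crash behavior).&lt;/p>
&lt;p>Panics in HTTP handlers can be intercepted with middleware:&lt;/p>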





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">monitorMiddleware&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">next&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Handler&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Handler&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">HandlerFunc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="k">defer&lt;/span> &lt;span class="nx">monitor&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RecoverAndReport&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nx">next&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ServeHTTP&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="http-server-自身監控">HTTP server self-monitoring&lt;/h2>
&lt;p>Go is often used to write the collector itself. The collector needs to monitor its own health: request throughput, error rate, goroutine count, and memory usage.&lt;/p></description><content:encoded><![CDATA[<p>The Go monitoring SDK plays a different role from the SDKs for other platforms. The JS / Flutter / Python SDKs are client-side event-reporting tools, whereas the Go SDK is more often used server-side, including for the collector's own self-monitoring. Go's goroutine concurrency model, its signal handling mechanism, and graceful shutdown of HTTP servers are the three core adaptation concerns in a Go environment.</p>
<h2 id="graceful-shutdown">Graceful shutdown</h2>
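<p>When a Go program receives SIGTERM or SIGINT, it needs to finish cleanup before exiting: flush the remaining buffer, close network connections, and write the final lifecycle events.</p>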





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">ctx</span><span class="p">,</span> <span class="nx">stop</span> <span class="o">:=</span> <span class="nx">signal</span><span class="p">.</span><span class="nf">NotifyContext</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="nx">syscall</span><span class="p">.</span><span class="nx">SIGTERM</span><span class="p">,</span> <span class="nx">syscall</span><span class="p">.</span><span class="nx">SIGINT</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="k">defer</span> <span class="nf">stop</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="o">&lt;-</span><span class="nx">ctx</span><span class="p">.</span><span class="nf">Done</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1">// signal received, start graceful shutdown</span>
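</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nx">shutdownCtx</span><span class="p">,</span> <span class="nx">cancel</span> <span class="o">:=</span> <span class="nx">context</span><span class="p">.</span><span class="nf">WithTimeout</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="mi">5</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="k">defer</span> <span class="nf">cancel</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="nx">monitor</span><span class="p">.</span><span class="nf">Close</span><span class="p">(</span><span class="nx">shutdownCtx</span><span class="p">)</span></span></span></code></pre></div><p>The time window for graceful shutdown is set by the deployment environment. Kubernetes defaults terminationGracePeriodSeconds to 30 seconds, and Docker's default stop timeout is 10 seconds. The SDK's Close method accepts a context so the caller controls the timeout.</p>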
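<h3 id="http-server-的-shutdown-順序">HTTP server shutdown order</h3>
<p>If a Go program is both an HTTP server and a user of the monitoring SDK, the shutdown order matters:</p>
<ol>
<li>Stop accepting new connections (<code>server.Shutdown(ctx)</code>)</li>
<li>Wait for in-flight requests to finish</li>
<li>Flush the monitoring buffer (<code>monitor.Close(ctx)</code>)</li>
<li>Close logs and other resources</li>
</ol>
<p>If the monitor is closed before the server shuts down, events produced by in-flight requests are emitted after the monitor has closed and are silently dropped.</p>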
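<p>The ordering can be sketched as a small helper that runs shutdown steps one after another under a single shared deadline. This is only an illustration under stated assumptions, not the SDK's API: <code>step</code>, <code>shutdownInOrder</code>, <code>demoOrder</code>, and the no-op closers are hypothetical stand-ins for <code>server.Shutdown</code>, <code>monitor.Close</code>, and log cleanup.</p>

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// step is one named shutdown action. The type is hypothetical and exists
// only to make the ordering explicit.
type step struct {
	name string
	run  func(ctx context.Context) error
}

// shutdownInOrder runs the steps sequentially under one shared deadline and
// returns the names of the steps it attempted, in order. A failing step is
// logged and the remaining steps still run.
func shutdownInOrder(timeout time.Duration, steps []step) []string {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var attempted []string
	for _, s := range steps {
		if err := s.run(ctx); err != nil {
			fmt.Printf("shutdown step %q failed: %v\n", s.name, err)
		}
		attempted = append(attempted, s.name)
	}
	return attempted
}

// demoOrder wires up no-op stand-ins in the order the list above describes.
func demoOrder() []string {
	noop := func(context.Context) error { return nil }
	return shutdownInOrder(5*time.Second, []step{
		{"server.Shutdown", noop}, // stop accepting, wait for in-flight requests
		{"monitor.Close", noop},   // flush buffered monitoring events
		{"logger.Close", noop},    // release logs and other resources last
	})
}

func main() {
	fmt.Println(demoOrder())
}
```

<p>In a real program each <code>run</code> function would wrap the actual call; sharing one context means a slow server drain eats into the time left for flushing, which is usually the trade-off you want to see explicitly.</p>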
<h2 id="signal-handling">Signal handling</h2>
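<p>Go's <code>signal.Notify</code> and <code>signal.NotifyContext</code> are the standard way to receive OS signals. The SDK should not register its own signal handler at init, because that conflicts with the application's signal handling. Go delivers a signal to every registered channel rather than letting a later registration override an earlier one, so an SDK that flushes and closes on its own would race with the application's shutdown sequence; and once any handler is registered for a signal, Go's default exit-on-signal behavior for it is disabled process-wide.</p>
<p>On the SDK side, the adaptation is to provide a <code>Close</code> method for the application to call from its own signal handler, rather than intercepting signals inside the SDK. The application controls the shutdown flow; the SDK only flushes and cleans up when told to close.</p>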
<h3 id="panic-recovery">panic recovery</h3>
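<p>A Go panic unwinds the current goroutine. If it is not recovered, in the main goroutine or any other, the whole program exits and the events in the SDK's buffer are lost.</p>
<p>The SDK can provide <code>monitor.RecoverAndReport()</code> so that developers can intercept panics with <code>defer monitor.RecoverAndReport()</code> at the entry point of a goroutine, record an error event, and then re-panic (preserving the original crash behavior).</p>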
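<p>A minimal sketch of what such a helper could look like, assuming the SDK records the panic value into some buffer. The <code>reported</code> slice and <code>runGuarded</code> are hypothetical names used only for this illustration. Note that <code>recover</code> only works when called directly by the deferred function, which is why the helper itself is the function being deferred.</p>

```go
package main

import "fmt"

// reported stands in for the SDK's event buffer (hypothetical).
var reported []any

// RecoverAndReport must be used as `defer RecoverAndReport()`. It records
// the panic value and then re-panics so the original crash behavior is kept.
func RecoverAndReport() {
	if r := recover(); r != nil {
		reported = append(reported, r) // record the error event
		panic(r)                        // re-panic: do not swallow the crash
	}
}

// runGuarded runs fn with RecoverAndReport deferred and tells the caller
// whether the panic propagated past the helper. The outer recover exists
// only so this demo can observe the re-panic without crashing.
func runGuarded(fn func()) (repanicked bool) {
	defer func() {
		if recover() != nil {
			repanicked = true
		}
	}()
	func() {
		defer RecoverAndReport()
		fn()
	}()
	return false
}

func main() {
	fmt.Println(runGuarded(func() { panic("boom") }), reported)
}
```

<p>One design consequence: because the helper re-panics, the stack trace printed on crash originates from the re-panic site, so a real implementation would likely capture the stack (for example with <code>runtime/debug.Stack</code>) before reporting.</p>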
<p>Panics in HTTP handlers can be intercepted with middleware:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">monitorMiddleware</span><span class="p">(</span><span class="nx">next</span> <span class="nx">http</span><span class="p">.</span><span class="nx">Handler</span><span class="p">)</span> <span class="nx">http</span><span class="p">.</span><span class="nx">Handler</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="k">return</span> <span class="nx">http</span><span class="p">.</span><span class="nf">HandlerFunc</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">        <span class="k">defer</span> <span class="nx">monitor</span><span class="p">.</span><span class="nf">RecoverAndReport</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="nx">next</span><span class="p">.</span><span class="nf">ServeHTTP</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="nx">r</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="p">})</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><h2 id="http-server-自身監控">HTTP server self-monitoring</h2>
<p>Go is often used to write the collector itself. The collector needs to monitor its own health: request throughput, error rate, goroutine count, and memory usage.</p>
<p>The collector's self-monitoring and its intake of external events are two independent pipelines. Self-monitoring metrics can be written to a separate JSONL file (kept apart from external events) or exposed as an HTTP endpoint through Go's <code>expvar</code> / <code>runtime.ReadMemStats</code>.</p>
<p>Key self-monitoring metrics:</p>
<ul>
<li><code>collector.events.received</code>: events received per second</li>
<li><code>collector.events.invalid</code>: events that failed schema validation</li>
<li><code>collector.storage.write_duration_ms</code>: time taken to write to JSONL</li>
<li><code>collector.goroutines</code>: goroutine count (leak detection)</li>
<li><code>collector.memory.alloc_mb</code>: memory usage</li>
</ul>
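<p>As a rough sketch of the <code>expvar</code> route, the snippet below publishes three of the metrics listed above using only the standard library. The metric names come from the list; <code>recordEvent</code> is a hypothetical hook, and the counter here is cumulative, on the assumption that a per-second rate is derived by whatever scrapes the endpoint.</p>

```go
package main

import (
	"expvar"
	"fmt"
	"runtime"
)

// eventsReceived is a cumulative counter; a scraper can derive the rate.
var eventsReceived = expvar.NewInt("collector.events.received")

func init() {
	// Computed on every read of the variable.
	expvar.Publish("collector.goroutines", expvar.Func(func() any {
		return runtime.NumGoroutine()
	}))
	expvar.Publish("collector.memory.alloc_mb", expvar.Func(func() any {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		return float64(m.Alloc) / (1 << 20) // bytes to MiB
	}))
}

// recordEvent is a hypothetical hook the intake path would call once per
// accepted event. It returns the new cumulative count.
func recordEvent() int64 {
	eventsReceived.Add(1)
	return eventsReceived.Value()
}

func main() {
	recordEvent()
	fmt.Println("goroutines:", expvar.Get("collector.goroutines").String())
	fmt.Println("received:", eventsReceived.Value())
}
```

<p>Importing <code>expvar</code> registers a <code>/debug/vars</code> handler on <code>http.DefaultServeMux</code>, so the values become reachable over HTTP once the program serves that mux. Serving it on a separate internal port keeps the self-monitoring pipeline apart from event intake.</p>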
<h2 id="下一步路由">Where to go next</h2>
<ul>
<li>Cross-platform timestamp consistency → <a href="/blog/monitoring/05-platform-adaptation/cross-platform-timestamp/" data-link-title="跨平台 timestamp 一致性" data-link-desc="時區、精度、clock drift — 不同平台產生的 timestamp 在 collector 端需要能正確比對和排序">Cross-platform timestamp consistency</a></li>
<li>Collector architecture → <a href="/blog/monitoring/04-collector/" data-link-title="模組四：Collector 設計" data-link-desc="收 → 驗 → 存 → 查 → 觸發的完整鏈路 — Go 單一 binary、可插拔 Storage Backend、rule engine">Module 4: Collector design</a></li>
<li>The Close method in the SDK's public API → <a href="/blog/monitoring/03-sdk-design/public-api/" data-link-title="SDK 公開 API 設計" data-link-desc="init / event / error / metric / flush / close 六個方法構成 SDK 的完整生命週期 — 跨平台共用相同 API 介面">Module 3: SDK public API</a></li>
</ul>
]]></content:encoded></item><item><title>Kubernetes Graceful Shutdown: The Termination Sequence Is Not What You Think</title><link>https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/kubernetes/graceful-shutdown/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/kubernetes/graceful-shutdown/</guid><description>&lt;blockquote>
&lt;p>This article is the implementation-layer deep dive for the &lt;a href="https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/kubernetes/" data-link-title="Kubernetes" data-link-desc="Container orchestration 主流、GKE / EKS / AKS / 自管">Kubernetes&lt;/a> overview. The overview covers where K8s sits in the deployment platform landscape; this article focuses on &lt;em>pod termination&lt;/em>, the issue most often hit and most deeply misunderstood in production: the sequence, the configuration, five cases, and integration with a service mesh.&lt;/p>&lt;/blockquote>
&lt;h2 id="graceful-shutdown-沒做對500-期間每次-deploy-都吃-502">Graceful shutdown done wrong: every deploy eats 502s&lt;/h2>
&lt;p>The most common trigger: you deploy a new image, a Prometheus alert fires on a burst of 502 / 503 responses within 5 minutes, and the SRE digging through the application log sees "processing request" and "connection closed" entries alternating. The application itself has no bug. Kubernetes and the traffic sources are simply &lt;em>out of step&lt;/em> during pod termination: old pods get SIGKILLed while still handling requests, and new requests keep landing on pods that are about to shut down.&lt;/p>
&lt;p>Many teams respond by &lt;em>raising terminationGracePeriodSeconds from 30 to 120&lt;/em>, which masks the problem for a while, but the symptom comes back in a different form at the next rolling update, HPA scale-down, or node drain. The root cause lies in the &lt;em>termination sequence&lt;/em>: a pod is not graceful just because it received SIGTERM, and each step of the sequence has its own failure mode.&lt;/p>
&lt;h2 id="termination-序列五步每步都能爆">The termination sequence: five steps, each of which can fail&lt;/h2>
&lt;p>After Kubernetes receives a delete-pod request, events unfold &lt;em>in time order&lt;/em> as follows:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time&lt;/th>
 &lt;th>Event&lt;/th>
 &lt;th>Actor&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>t=0&lt;/td>
 &lt;td>API server marks the pod as Terminating&lt;/td>
 &lt;td>API server; kubelet observes the delete&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>t=0&lt;/td>
 &lt;td>Pod removed from Service Endpoints (&lt;strong>async&lt;/strong>)&lt;/td>
 &lt;td>endpoint controller&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>t=0&lt;/td>
 &lt;td>kubelet runs the preStop hook (if defined)&lt;/td>
 &lt;td>kubelet via container runtime&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>t=preStop ends&lt;/td>
 &lt;td>container receives SIGTERM&lt;/td>
 &lt;td>container runtime&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>t=terminationGracePeriodSeconds (counted from t=0)&lt;/td>
 &lt;td>container receives SIGKILL&lt;/td>
 &lt;td>container runtime&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
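&lt;p>Key misconceptions:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>"Removing the pod from the Service" and "the container receiving SIGTERM" happen &lt;em>in parallel&lt;/em>, not in sequence.&lt;/strong> The endpoint controller updates the Endpoints object, kube-proxy then rewrites iptables, and only after that does traffic on each node actually stop. This chain usually takes &lt;em>1-5 seconds&lt;/em>, and without a preStop delay SIGTERM has already been delivered to the application in the meantime.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The preStop hook runs while the container is still running and SIGTERM has not yet been sent.&lt;/strong> Setting preStop to &lt;code>sleep 10&lt;/code> is a common production practice: the sleep gives the endpoint controller time to remove the pod from the Service, so that no new requests arrive during SIGTERM handling.&lt;/p></description><content:encoded><![CDATA[<blockquote>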
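<p>This article is the implementation-layer deep dive for the <a href="/blog/backend/05-deployment-platform/vendors/kubernetes/" data-link-title="Kubernetes" data-link-desc="Container orchestration 主流、GKE / EKS / AKS / 自管">Kubernetes</a> overview. The overview covers where K8s sits in the deployment platform landscape; this article focuses on <em>pod termination</em>, the issue most often hit and most deeply misunderstood in production: the sequence, the configuration, five cases, and integration with a service mesh.</p></blockquote>
<h2 id="graceful-shutdown-沒做對500-期間每次-deploy-都吃-502">Graceful shutdown done wrong: every deploy eats 502s</h2>
<p>The most common trigger: you deploy a new image, a Prometheus alert fires on a burst of 502 / 503 responses within 5 minutes, and the SRE digging through the application log sees "processing request" and "connection closed" entries alternating. The application itself has no bug. Kubernetes and the traffic sources are simply <em>out of step</em> during pod termination: old pods get SIGKILLed while still handling requests, and new requests keep landing on pods that are about to shut down.</p>
<p>Many teams respond by <em>raising terminationGracePeriodSeconds from 30 to 120</em>, which masks the problem for a while, but the symptom comes back in a different form at the next rolling update, HPA scale-down, or node drain. The root cause lies in the <em>termination sequence</em>: a pod is not graceful just because it received SIGTERM, and each step of the sequence has its own failure mode.</p>
<h2 id="termination-序列五步每步都能爆">The termination sequence: five steps, each of which can fail</h2>
<p>After Kubernetes receives a delete-pod request, events unfold <em>in time order</em> as follows:</p>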
<table>
  <thead>
      <tr>
          <th>Time</th>
          <th>Event</th>
          <th>Actor</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>t=0</td>
          <td>API server marks the pod as Terminating</td>
          <td>API server; kubelet observes the delete</td>
      </tr>
      <tr>
          <td>t=0</td>
          <td>Pod removed from Service Endpoints (<strong>async</strong>)</td>
          <td>endpoint controller</td>
      </tr>
      <tr>
          <td>t=0</td>
          <td>kubelet runs the preStop hook (if defined)</td>
          <td>kubelet via container runtime</td>
      </tr>
      <tr>
          <td>t=preStop ends</td>
          <td>container receives SIGTERM</td>
          <td>container runtime</td>
      </tr>
      <tr>
          <td>t=terminationGracePeriodSeconds (counted from t=0)</td>
          <td>container receives SIGKILL</td>
          <td>container runtime</td>
      </tr>
  </tbody>
</table>
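<p>Key misconceptions:</p>
<ol>
<li>
<p><strong>"Removing the pod from the Service" and "the container receiving SIGTERM" happen <em>in parallel</em>, not in sequence.</strong> The endpoint controller updates the Endpoints object, kube-proxy then rewrites iptables, and only after that does traffic on each node actually stop. This chain usually takes <em>1-5 seconds</em>, and without a preStop delay SIGTERM has already been delivered to the application in the meantime.</p>
</li>
<li>
<p><strong>The preStop hook runs while the container is still running and SIGTERM has not yet been sent.</strong> Setting preStop to <code>sleep 10</code> is a common production practice: the sleep gives the endpoint controller time to remove the pod from the Service, so that no new requests arrive during SIGTERM handling.</p>
</li>
<li>
<p><strong>terminationGracePeriodSeconds is counted <em>from the start of preStop</em>, not from SIGTERM.</strong> A 10s preStop sleep plus 30s of application graceful shutdown means it must be set to at least 40s.</p>
</li>
<li>
<p><strong>Graceful shutdown is not something the framework does automatically.</strong> The application must <em>handle SIGTERM itself</em>: reject new requests, wait for in-flight ones to finish, close DB connections, and flush logs. If SIGTERM is not handled, the container is force-killed when the grace period ends.</p>
</li>
<li>
<p><strong>The readiness probe <em>still runs</em> during Terminating, but its result no longer affects traffic</strong> (the pod has already been removed from Endpoints). However, if the application does not actively fail readiness, a service mesh or external LB may keep sending requests, depending on the mesh's behavior.</p>
</li>
</ol>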
<h2 id="配置全圖">Configuration at a glance</h2>
<h3 id="deployment-spec">Deployment spec</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">terminationGracePeriodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">60</span><span class="w">          </span><span class="c"># SIGKILL 60s after termination starts (preStop included)</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">app</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">          </span><span class="nt">lifecycle</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">            </span><span class="nt">preStop</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">              </span><span class="nt">exec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">                </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;/bin/sh&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;-c&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;sleep 10&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">          </span><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">            </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">              </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz/ready</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">              </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">            </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">            </span><span class="nt">failureThreshold</span><span class="p">:</span><span class="w"> </span><span class="m">2</span></span></span></code></pre></div><p>Timeline: at t=0 preStop starts its 10s sleep → at t=10s the container receives SIGTERM → at t=60s SIGKILL (the 60s grace period is counted from t=0 and includes the preStop sleep, so the application has 50s after SIGTERM).</p>
<h3 id="application-處理-sigtermgo-範例">Handling SIGTERM in the application (Go example)</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nx">sigs</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="kd">chan</span> <span class="nx">os</span><span class="p">.</span><span class="nx">Signal</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="nx">signal</span><span class="p">.</span><span class="nf">Notify</span><span class="p">(</span><span class="nx">sigs</span><span class="p">,</span> <span class="nx">syscall</span><span class="p">.</span><span class="nx">SIGTERM</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="nx">server</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span><span class="nx">Addr</span><span class="p">:</span> <span class="s">&#34;:8080&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">go</span> <span class="nx">server</span><span class="p">.</span><span class="nf">ListenAndServe</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="o">&lt;-</span><span class="nx">sigs</span>                                              <span class="c1">// wait for SIGTERM</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="nx">log</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;SIGTERM received, draining...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1">// 1. fail readiness (so mesh-aware traffic stops)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="nx">ready</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="kc">false</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1">// 2. wait 5s for the readiness probe to start failing (periodSeconds 5 x failureThreshold 2 can need up to 10s)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="nx">time</span><span class="p">.</span><span class="nf">Sleep</span><span class="p">(</span><span class="mi">5</span> <span class="o">*</span> <span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1">// 3. gracefully shut down the server (reject new requests, wait for in-flight)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="nx">ctx</span><span class="p">,</span> <span class="nx">cancel</span> <span class="o">:=</span> <span class="nx">context</span><span class="p">.</span><span class="nf">WithTimeout</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="mi">45</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="k">defer</span> <span class="nf">cancel</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="nx">server</span><span class="p">.</span><span class="nf">Shutdown</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1">// 4. close DB / cache / message consumer</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="nx">db</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="nx">consumer</span><span class="p">.</span><span class="nf">Stop</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1">// 5. flush log + exit</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Sync</span><span class="p">()</span></span></span></code></pre></div><p>Key point: <code>server.Shutdown(ctx)</code> <em>rejects new requests and waits for in-flight ones</em>. Set the ctx timeout to <em>the grace period minus the preStop sleep and the readiness-fail wait</em> (60s - 10s - 5s = 45s).</p>
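<p>The subtraction can be kept in one place so the timeout does not drift from the Deployment spec. The helper below is a hypothetical sketch (<code>shutdownBudget</code> is not part of any library); it simply applies the formula above and clamps at zero.</p>

```go
package main

import (
	"fmt"
	"time"
)

// shutdownBudget returns how long server.Shutdown may take:
// grace period minus the preStop sleep minus the readiness-fail wait.
// A negative result is clamped to zero to signal a misconfiguration.
func shutdownBudget(grace, preStop, readinessWait time.Duration) time.Duration {
	budget := grace - preStop - readinessWait
	if budget < 0 {
		return 0
	}
	return budget
}

func main() {
	// Values from the Deployment spec and Go example above.
	fmt.Println(shutdownBudget(60*time.Second, 10*time.Second, 5*time.Second)) // prints 45s
}
```

<p>In practice you may want to subtract a further small margin, since closing the DB, stopping consumers, and flushing logs also have to fit before SIGKILL.</p>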
<h2 id="production-故障演練">Production failure drills</h2>
<h3 id="case-1rolling-update-期間-502--503">Case 1: 502 / 503 during rolling update</h3>
<p><strong>Symptoms</strong>: within 5 minutes of every deploy, the LB / ingress log shows a burst of 502 / 503; the application log shows "context canceled" and "connection closed by peer"; new pods are already ready, but old pods still receive requests during their grace period.</p>
<p><strong>Root cause</strong>: no preStop sleep is configured. The container calls <code>server.Shutdown()</code> immediately on SIGTERM, but kube-proxy has not yet removed the old pod from iptables, so new requests keep being routed to the old pod, which already refuses them.</p>
<p><strong>Fix</strong>: add a preStop <code>sleep 10</code> so endpoint propagation completes before the SIGTERM flow begins.</p>
<h3 id="case-2connection-drain-racelong-running-request-被中斷">Case 2: Connection drain race, long-running requests interrupted</h3>
<p><strong>Symptoms</strong>: after a deploy, the application log contains many <code>context canceled</code> entries tied to long-running endpoints (for example report generation or file upload); users see failed transactions, while short requests are unaffected.</p>
<p><strong>Root cause</strong>: the long-running endpoint takes longer than terminationGracePeriodSeconds, or the ctx timeout passed to <code>server.Shutdown(ctx)</code> is too short, so in-flight requests are forcibly cut off.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Make long-running endpoints async (background job plus a status endpoint) so the HTTP request returns a job ID immediately</li>
<li>Short term: raise terminationGracePeriodSeconds to the 99th percentile of long-running request duration plus a buffer</li>
<li>On the application side, ctx timeout = grace period - preStop - readiness-fail wait</li>
</ol>
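<h3 id="case-3init-container-在-grace-period-期間重啟sigterm-沒到-main">Case 3: Init container restarts during the grace period, SIGTERM never reaches main</h3>
<p><strong>Symptoms</strong>: the pod shows Terminating but its phase stays Running, the main container's restart count goes up by 1, and the application log never shows "SIGTERM received".</p>
<p><strong>Root cause</strong>: an init container uses <code>restartPolicy: Always</code> (the sidecar pattern in K8s 1.28+), or the main container crashes before SIGTERM and triggers a restart. The kubelet <em>does not resend SIGTERM</em> after the restart, so the main container runs until the grace period ends and is then SIGKILLed.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Give sidecar containers (restartPolicy: Always) a preStop <code>sleep</code> as well, so they share the main container's lifecycle</li>
<li>Do not let the main container <em>restart automatically</em> while it is draining. A failing readinessProbe alone does not restart a container, but a failing livenessProbe does; where the workload type allows it, use restartPolicy: OnFailure and watch for CrashLoopBackOff</li>
<li>Check the events in <code>kubectl describe pod</code>; if SIGTERM was never sent, the <code>Killing container</code> event will be missing</li>
</ol>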
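<h3 id="case-4statefulset-串行終止總時間--pod-數--grace-period">Case 4: StatefulSet terminates serially, total time = pod count × grace period</h3>
<p><strong>Symptoms</strong>: a StatefulSet rolling update / scale-down is N times slower than a Deployment (N = replica count); deploying a 5-replica StatefulSet takes 5 minutes or more.</p>
<p><strong>Root cause</strong>: StatefulSet defaults to <code>podManagementPolicy: OrderedReady</code>: pods are terminated and created one at a time, and each pod may take up to its full grace period before the next one is touched. A Deployment uses <code>RollingUpdate</code> with a default maxUnavailable=25% and terminates pods in parallel.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Switch the StatefulSet to <code>podManagementPolicy: Parallel</code> (if the application does not require strict ordering). This field only affects scaling operations, not rolling updates, and cannot be changed on an existing StatefulSet</li>
<li>For strictly ordered workloads (Cassandra / Kafka / etcd) keep OrderedReady, but set the grace period to <em>what a single pod needs</em>, not to <em>what the whole rollout can tolerate</em></li>
<li>Accept the serialization cost and schedule deploys in low-traffic windows</li>
</ol>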
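<p>The "pod count × grace period" relation from the heading can be turned into a rough upper-bound estimate. This is a sketch under the assumption that every pod uses its full termination window; <code>rolloutEstimate</code> is a hypothetical helper, and <code>batch</code> stands for how many pods are terminated at once (1 for a serial StatefulSet).</p>

```go
package main

import (
	"fmt"
	"time"
)

// rolloutEstimate returns a rough upper bound for replacing `replicas` pods
// when at most `batch` pods are terminated at a time and each pod may use
// its full termination window.
func rolloutEstimate(replicas, batch int, perPod time.Duration) time.Duration {
	if replicas <= 0 || batch <= 0 {
		return 0
	}
	rounds := (replicas + batch - 1) / batch // ceiling division
	return time.Duration(rounds) * perPod
}

func main() {
	perPod := 60 * time.Second
	fmt.Println("serial, 5 replicas:", rolloutEstimate(5, 1, perPod))           // 5m0s
	fmt.Println("25 at a time, 100 replicas:", rolloutEstimate(100, 25, perPod)) // 4m0s
}
```

<p>Pod startup and readiness time are not included, so real rollouts take longer than this bound on termination alone.</p>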
<h3 id="case-5job--cronjob-不-gracefulsigterm-直接-sigkill">Case 5: Job / CronJob does not shut down gracefully, SIGTERM goes straight to SIGKILL</h3>
<p><strong>Symptoms</strong>: a CronJob does not shut down gracefully on Job timeout / pod eviction; half-written files stay on the PVC and corrupt the next run; the application log has no "SIGTERM received" and simply stops.</p>
<p><strong>Root cause</strong>: when a Job's <code>activeDeadlineSeconds</code> expires or a node eviction is triggered, K8s <em>still sends SIGTERM</em> to the Job pod, but <em>many batch frameworks (Spring Batch / Argo Workflow workers) do not handle SIGTERM</em>, and the application never checkpoints on its own.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Have the batch application handle SIGTERM, write a progress checkpoint to storage, and resume from it on the next run</li>
<li>For batches that cannot checkpoint, guarantee an <em>idempotent re-run</em> so that a rerun after SIGKILL does not corrupt data</li>
<li>Add <code>terminationGracePeriodSeconds</code> to the Job's pod template (default 30; batch workloads usually need 60-300)</li>
</ol>
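<p>The checkpoint-and-resume idea from the first item might be sketched as follows. The checkpoint format (a single integer in a file), the function names, and the use of a cancelled context in place of a real SIGTERM are all assumptions made for illustration.</p>

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// loadCheckpoint returns the index of the next item to process, or 0 when
// no usable checkpoint exists yet.
func loadCheckpoint(path string) int {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0
	}
	n, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 0
	}
	return n
}

// saveCheckpoint writes to a temp file and renames it, so a SIGKILL in the
// middle of the write cannot leave a half-written checkpoint behind.
func saveCheckpoint(path string, next int) error {
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, []byte(strconv.Itoa(next)), 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// runBatch processes items starting from the checkpoint and stops between
// items once ctx is cancelled (for example by SIGTERM).
func runBatch(ctx context.Context, path string, items []string, process func(string)) (int, error) {
	next := loadCheckpoint(path)
	for next < len(items) {
		if err := ctx.Err(); err != nil {
			return next, err
		}
		process(items[next])
		next++
		if err := saveCheckpoint(path, next); err != nil {
			return next, err
		}
	}
	return next, nil
}

// interruptedThenResumed cancels the first run after two items, runs the
// batch again, and returns the progress reported by each run.
func interruptedThenResumed() (first, second int) {
	dir, err := os.MkdirTemp("", "batch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "checkpoint")
	items := []string{"a", "b", "c", "d", "e"}

	ctx, cancel := context.WithCancel(context.Background())
	processed := 0
	first, _ = runBatch(ctx, path, items, func(string) {
		processed++
		if processed == 2 {
			cancel() // stands in for SIGTERM arriving mid-run
		}
	})
	second, _ = runBatch(context.Background(), path, items, func(string) {})
	return first, second
}

func main() {
	fmt.Println(interruptedThenResumed()) // 2 5
}
```

<p>In a real job the context would typically come from <code>signal.NotifyContext</code>, and <code>process</code> itself would need to be idempotent in case the pod is killed between processing an item and saving the checkpoint.</p>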
<h2 id="規模影響">Impact at scale</h2>
<p>The cost of graceful shutdown shows up mainly in <em>deploy time</em> and <em>capacity buffer</em>:</p>
<table>
  <thead>
      <tr>
          <th>Scale factor</th>
          <th>Impact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>terminationGracePeriodSeconds 60s</td>
          <td>~70-80s per pod deploy (preStop + grace + new pod startup)</td>
      </tr>
      <tr>
          <td>Deployment with 100 replicas + maxSurge 25%</td>
          <td>Full deploy takes ~5-10 minutes and needs <em>25% extra capacity</em> (a 25-replica buffer)</td>
      </tr>
      <tr>
          <td>Serial StatefulSet + 60s grace</td>
          <td>10 replicas take about 10-12 minutes; the deploy window should fall in low-traffic hours</td>
      </tr>
      <tr>
          <td>HPA scale-down combined with graceful shutdown</td>
          <td>scale-down triggered → preStop + grace + new metrics → next scaling decision; average reaction cycle ≈ 3-5 minutes</td>
      </tr>
  </tbody>
</table>
<p>Practical defaults:</p>
<ul>
<li>Web service: <code>terminationGracePeriodSeconds: 60</code>, preStop sleep 10, application graceful shutdown 45s</li>
<li>Backend worker (queue consumer): <code>terminationGracePeriodSeconds: 120</code>, no preStop sleep (readiness controls traffic), the application finishes the current message and commits the offset</li>
<li>Batch job: <code>terminationGracePeriodSeconds: 300</code>, checkpoint pattern</li>
<li>StatefulSet (DB / queue): align the grace period with vendor recommendations (Kafka 90s, PostgreSQL 60s)</li>
</ul>
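<p>For the backend worker default, "finish the current message and commit the offset" can be sketched with a channel standing in for the queue. The <code>message</code> type and the <code>handle</code> / <code>commit</code> callbacks are hypothetical; a real consumer would use its broker client's own API.</p>

```go
package main

import (
	"context"
	"fmt"
)

// message is a hypothetical queue record; offset identifies its position.
type message struct {
	offset int
	body   string
}

// consume handles messages one at a time. Cancellation is only checked
// between messages, so the message in progress is always handled and
// committed before the worker exits.
func consume(ctx context.Context, in <-chan message, handle func(message), commit func(offset int)) {
	for {
		if ctx.Err() != nil {
			return
		}
		select {
		case <-ctx.Done():
			return
		case m, ok := <-in:
			if !ok {
				return
			}
			handle(m)
			commit(m.offset)
		}
	}
}

// drainDemo feeds five messages, cancels while the third one is being
// handled, and returns the last committed offset.
func drainDemo() int {
	in := make(chan message, 5)
	for i := 0; i < 5; i++ {
		in <- message{offset: i, body: fmt.Sprintf("msg-%d", i)}
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	last := -1
	consume(ctx, in,
		func(m message) {
			if m.offset == 2 {
				cancel() // SIGTERM arrives while this message is in progress
			}
		},
		func(offset int) { last = offset },
	)
	return last
}

func main() {
	fmt.Println(drainDemo()) // 2
}
```

<p>The remaining messages stay in the queue for the next consumer, which is why the grace period only has to cover one message's processing time plus the commit.</p>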
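<h2 id="跟其他元件整合">Integration with other components</h2>
<h3 id="service-meshistio--linkerd">Service mesh (Istio / Linkerd)</h3>
<p>The service mesh sidecar (envoy / linkerd-proxy) has its own termination sequence and usually shuts down slightly later than the main container. Configuration principles:</p>
<ol>
<li>Give the mesh sidecar a drain window 5-10s longer than the main container's, so the sidecar goes away only after main has finished. <code>terminationGracePeriodSeconds</code> is a pod-level setting shared by all containers, so the sidecar's window comes from the mesh's own configuration (for example Istio's <code>terminationDrainDuration</code>)</li>
<li>In Istio 1.12+, <code>holdApplicationUntilProxyStarts</code> (set through the <code>proxy.istio.io/config</code> annotation) controls startup order; shutdown order needs a matching arrangement</li>
<li>With mTLS, graceful shutdown has one more step: after SIGTERM, wait for the mesh to close connections and finish any cert rotation on its own instead of cutting them off</li>
</ol>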
<h3 id="readiness-probe-跟-mesh-aware-traffic">Readiness probe and mesh-aware traffic</h3>
<p>Plain K8s Service (kube-proxy iptables): after the endpoint is removed, <em>established connections still run to completion</em> and no new connections arrive. Mesh-aware traffic (service mesh / external LB with health checks): traffic only stops once readiness fails.</p>
<p>Fix: the first step of application graceful shutdown is <code>ready.Store(false)</code>, followed by waiting until the readiness probe has failed at least once (5-10s); only then call server.Shutdown.</p>
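<p>A minimal sketch of the readiness flip, assuming the probe hits a <code>/readyz</code> handler backed by an <code>atomic.Bool</code> (Go 1.19+). The path and the <code>probe</code> helper are illustrative assumptions; only <code>ready.Store(false)</code> comes from the text above.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready mirrors the flag mentioned above; true means "send me traffic".
var ready atomic.Bool

// readyz is the handler the readiness probe is assumed to hit.
func readyz(w http.ResponseWriter, r *http.Request) {
	if !ready.Load() {
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprint(w, "ok")
}

// probe returns the status code a readiness probe would currently see.
func probe() int {
	rec := httptest.NewRecorder()
	readyz(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	ready.Store(true)
	fmt.Println(probe()) // 200

	ready.Store(false) // first step of graceful shutdown
	fmt.Println(probe()) // 503
}
```

<p>After the flip, the application would wait at least one probe period before calling <code>server.Shutdown</code>, so that the probe has a chance to observe the 503.</p>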
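<h3 id="跟-pod-disruption-budgetpdb的衝突">Conflicts with Pod Disruption Budget (PDB)</h3>
<p>During a node drain the PDB limits how many pods may be unavailable at once, so a long graceful shutdown can stall the drain. Countermeasures:</p>
<ol>
<li>Emergency drain (node hardware failure): <code>kubectl drain --grace-period=30 --force</code>, accepting a short burst of 502s</li>
<li>Normal drain (upgrade / maintenance): set the PDB to <code>minAvailable: &lt;replicas-1&gt;</code> so a single pod can take its time shutting down gracefully</li>
<li>Do not set <code>maxUnavailable: 0</code>; it makes the drain hang indefinitely</li>
</ol>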
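<p>The PDB arithmetic behind items 2 and 3 can be illustrated with a small helper. This is a simplified model that assumes a <code>minAvailable</code>-style budget and ignores everything else the disruption controller tracks; <code>allowedDisruptions</code> is a hypothetical name.</p>

```go
package main

import "fmt"

// allowedDisruptions returns how many pods an eviction may take down right
// now under a minAvailable-style budget (simplified model).
func allowedDisruptions(healthy, minAvailable int) int {
	if d := healthy - minAvailable; d > 0 {
		return d
	}
	return 0
}

func main() {
	replicas := 4

	// minAvailable = replicas-1: the drain proceeds one pod at a time.
	fmt.Println(allowedDisruptions(replicas, replicas-1)) // 1

	// minAvailable = replicas behaves like maxUnavailable: 0: the drain blocks.
	fmt.Println(allowedDisruptions(replicas, replicas)) // 0

	// While one pod is still shutting down, the next eviction has to wait.
	fmt.Println(allowedDisruptions(replicas-1, replicas-1)) // 0
}
```

<p>The last case is why a long graceful shutdown slows a drain down: the budget is only released once a replacement pod is healthy again.</p>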
<h2 id="下一步">Next steps</h2>
<ul>
<li><strong>Writing application-level graceful shutdown</strong>: the disposability chapter of the <a href="https://12factor.net/disposability">12-factor app</a> gives a framework-agnostic template; for language-specific SDK usage see the corresponding framework</li>
<li><strong>Graceful shutdown for queue consumers</strong>: message ack / offset commit must complete within the SIGTERM window, otherwise messages are delivered twice. See the consumer-design section of the <a href="/blog/backend/03-message-queue/" data-link-title="模組三：訊息佇列與事件傳遞" data-link-desc="整理 durable queue、broker、retry、outbox 與 idempotency 的後端實務">03 message queue</a> module</li>
<li><strong>Graceful shutdown across regions / clusters</strong>: during a traffic shift in a multi-cluster service mesh (Istio multicluster / Linkerd multicluster), graceful behavior differs from the single-cluster case and has to be aligned with the mesh configuration</li>
</ul>
<h2 id="相關連結">Related links</h2>
<ul>
<li>Upstream vendor page: <a href="/blog/backend/05-deployment-platform/vendors/kubernetes/" data-link-title="Kubernetes" data-link-desc="Container orchestration 主流、GKE / EKS / AKS / 自管">Kubernetes</a></li>
<li>Upstream chapter: <a href="/blog/backend/05-deployment-platform/deployment-rollout-drain-rollback/" data-link-title="5.8 Deployment Rollout with Drain and Rollback（實作示範）" data-link-desc="以 checkout service 示範部署切換如何交付 canary evidence、drain signal、release gate 與 incident decision log。">5.X deployment-rollout-drain-rollback</a></li>
<li>Reference cases: 502s during rolling updates appear frequently in the stage-3 mesh adoption case library</li>
<li>Parallel deep-dive articles: <a href="/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">pgBouncer configuration</a> / <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/dynamic-credential/" data-link-title="HashiCorp Vault Dynamic Credential：lease 治理跟 application 整合的實作層" data-link-desc="Vault database secrets engine 怎麼配、application 怎麼 renew lease、production 五大踩雷（lease 過期 race、DB max_connections 撞牆、Vault sealed、token expire、scope 過寬）、容量規劃跟 vault-agent injector 整合">Vault Dynamic Credential</a></li>
<li>Methodology: <a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Writing methodology for vendor deep-dive technical articles</a></li>
</ul>
]]></content:encoded></item></channel></rss>