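<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Logging on Tarragon</title><link>https://tarrragon.github.io/blog/tags/logging/</link><description>Recent content in Logging on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 26 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/logging/index.xml" rel="self" type="application/rss+xml"/><item><title>Managing observability and logs on the same lifecycle as their resources</title><link>https://tarrragon.github.io/blog/infra/06-observability-logging/log-metric-alarm-lifecycle/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/06-observability-logging/log-metric-alarm-lifecycle/</guid><description>&lt;p>Observability should share a lifecycle with the resources it monitors: log groups, metrics, and alarms are declared in the same IaC that creates the resource, so monitoring is live the moment the resource exists rather than bolted on after an incident. The rule makes infrastructure traceable during incidents and measurable day to day, and tying the creation and destruction of monitoring to the monitored resource keeps coverage from decaying over time.&lt;/p>
&lt;p>Without shared-lifecycle management, monitoring coverage for a new service depends on whether someone remembers to create its log group and alarms by hand, and that memory fades as the number of services grows. The gaps go unnoticed in normal operation and surface only during incident investigation: when you need to trace back when degradation began, you may find there is no metric data at all for that period.&lt;/p>
&lt;h2 id="同生命週期的落地方式">Putting the shared lifecycle into practice&lt;/h2>
&lt;p>Observability is part of the infrastructure, so its creation, changes, and destruction belong in the same apply unit as the monitored resource. When IaC creates an RDS instance, its log group and key metric alarms should appear in the same &lt;code>terraform plan&lt;/code>; when the resource is destroyed, its alarms are removed with it.&lt;/p>
&lt;p>In practice, the monitoring declarations move into the service's module. The modularization covered in &lt;a href="https://tarrragon.github.io/blog/infra/04-environment-separation/" data-link-title="模組四：環境分離與模組化" data-link-desc="dev / staging / prod 切分、目錄結構 vs workspace、用可重用 module 避免環境漂移">Module 4 (Environment Separation and Modularization)&lt;/a> extends here to "every service module carries its own observability declarations". Besides &lt;code>aws_db_instance&lt;/code>, a database module also contains its log group, CPU alarm, and connection-count alarm. The excerpt shows the log group and the CPU alarm:&lt;/p>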





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="c1"># modules/database/monitoring.tf: lives in the same module as the database resource
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_cloudwatch_log_group&amp;#34; &amp;#34;db_slow_query&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="n"> name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;/aws/rds/instance/${var.db_identifier}/slowquery&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="n"> retention_in_days&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">var&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">log_retention_days&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="n"> kms_key_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">var&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">log_kms_key_arn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">}
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_cloudwatch_metric_alarm&amp;#34; &amp;#34;db_cpu&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="n"> alarm_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;${var.env}-${var.db_identifier}-cpu-high&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="n"> comparison_operator&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;GreaterThanThreshold&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="n"> evaluation_periods&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">3&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="n"> metric_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;CPUUtilization&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="n"> namespace&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;AWS/RDS&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="n"> period&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">300&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="n"> statistic&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Average&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="n"> threshold&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">80&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="n"> alarm_actions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="k">var&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">oncall_sns_arn&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="n"> dimensions&lt;/span> &lt;span class="o">=&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="n"> DBInstanceIdentifier&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_db_instance&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">primary&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">identifier&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl"> }
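&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>With this in place, monitoring exists the moment &lt;code>terraform apply&lt;/code> creates the database, and &lt;code>terraform destroy&lt;/code> removes the alarms together with the database so none are left orphaned. When a new environment applies the same module, monitoring coverage follows the resources automatically, with nothing for anyone to remember.&lt;/p>
&lt;h2 id="監控脫鉤造成的兩類漂移">Two kinds of drift caused by decoupled monitoring&lt;/h2>
&lt;p>Keeping monitoring outside the resource (in a separate IaC codebase, another repo, or set up by hand in the console) produces two kinds of drift that run in opposite directions. Both share one root cause: monitoring and the resource are not in the same apply unit.&lt;/p>
&lt;h3 id="漂移一新資源沒有監控">Drift 1: new resources without monitoring&lt;/h3>
&lt;p>A service is added through a PR, but its alarms depend on someone configuring them in the console afterward, or on a PR in another repo catching up. Some services end up with alarms and some without, and coverage depends on who remembered. When a service without alarms fails, the detection path degrades from "alert → investigate" to "customer complaint → investigate", and response time slips from minutes to hours.&lt;/p>
&lt;p>A single query shows how severe this drift is: list every RDS instance and check whether each one has a matching CloudWatch alarm. Any instance with no alarm is live evidence of drift.&lt;/p>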





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1"># For each RDS instance, count the CloudWatch alarms that watch its CPUUtilization metric&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">aws rds describe-db-instances &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> --query &lt;span class="s1">&amp;#39;DBInstances[].DBInstanceIdentifier&amp;#39;&lt;/span> --output text &lt;span class="p">|&lt;/span> tr &lt;span class="s1">&amp;#39;\t&amp;#39;&lt;/span> &lt;span class="s1">&amp;#39;\n&amp;#39;&lt;/span> &lt;span class="p">|&lt;/span> &lt;span class="k">while&lt;/span> &lt;span class="nb">read&lt;/span> -r db&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nv">count&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">$(&lt;/span>aws cloudwatch describe-alarms-for-metric --namespace AWS/RDS --metric-name CPUUtilization &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> --dimensions Name=DBInstanceIdentifier,Value=&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">db&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> --query &lt;span class="s1">&amp;#39;length(MetricAlarms)&amp;#39;&lt;/span>&lt;span class="k">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">db&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">: &lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">count&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> alarms&amp;#34;&lt;/span>
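&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="漂移二死資源留下殘響">Drift 2: dead resources leave echoes behind&lt;/h3>
&lt;p>The resource is gone but its alarm remains. The orphan alarm keeps reporting &lt;code>INSUFFICIENT_DATA&lt;/code> for a target that no longer exists, mixes with valid alarms in the same notification channel, and lowers the signal-to-noise ratio. Once that ratio is low enough, a meaningful &lt;code>INSUFFICIENT_DATA&lt;/code> (a service that has stopped sending metrics) gets ignored along with the rest: alert fatigue turns alarms from a safeguard into background noise.&lt;/p></description><content:encoded><![CDATA[<p>Observability should share a lifecycle with the resources it monitors: log groups, metrics, and alarms are declared in the same IaC that creates the resource, so monitoring is live the moment the resource exists rather than bolted on after an incident. The rule makes infrastructure traceable during incidents and measurable day to day, and tying the creation and destruction of monitoring to the monitored resource keeps coverage from decaying over time.</p>
<p>Without shared-lifecycle management, monitoring coverage for a new service depends on whether someone remembers to create its log group and alarms by hand, and that memory fades as the number of services grows. The gaps go unnoticed in normal operation and surface only during incident investigation: when you need to trace back when degradation began, you may find there is no metric data at all for that period.</p>
<h2 id="同生命週期的落地方式">Putting the shared lifecycle into practice</h2>
<p>Observability is part of the infrastructure, so its creation, changes, and destruction belong in the same apply unit as the monitored resource. When IaC creates an RDS instance, its log group and key metric alarms should appear in the same <code>terraform plan</code>; when the resource is destroyed, its alarms are removed with it.</p>
<p>In practice, the monitoring declarations move into the service's module. The modularization covered in <a href="/blog/infra/04-environment-separation/" data-link-title="模組四：環境分離與模組化" data-link-desc="dev / staging / prod 切分、目錄結構 vs workspace、用可重用 module 避免環境漂移">Module 4 (Environment Separation and Modularization)</a> extends here to "every service module carries its own observability declarations". Besides <code>aws_db_instance</code>, a database module also contains its log group, CPU alarm, and connection-count alarm. The excerpt shows the log group and the CPU alarm:</p>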





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># modules/database/monitoring.tf: lives in the same module as the database resource
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_log_group&#34; &#34;db_slow_query&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  name</span>              <span class="o">=</span> <span class="s2">&#34;/aws/rds/instance/${var.db_identifier}/slowquery&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  retention_in_days</span> <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">log_retention_days</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  kms_key_id</span>        <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">log_kms_key_arn</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_metric_alarm&#34; &#34;db_cpu&#34;</span> {
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  alarm_name</span>          <span class="o">=</span> <span class="s2">&#34;${var.env}-${var.db_identifier}-cpu-high&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  comparison_operator</span> <span class="o">=</span> <span class="s2">&#34;GreaterThanThreshold&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  evaluation_periods</span>  <span class="o">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  metric_name</span>         <span class="o">=</span> <span class="s2">&#34;CPUUtilization&#34;</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  namespace</span>           <span class="o">=</span> <span class="s2">&#34;AWS/RDS&#34;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="n">  period</span>              <span class="o">=</span> <span class="m">300</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="n">  statistic</span>           <span class="o">=</span> <span class="s2">&#34;Average&#34;</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">  threshold</span>           <span class="o">=</span> <span class="m">80</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="n">  alarm_actions</span>       <span class="o">=</span> <span class="p">[</span><span class="k">var</span><span class="p">.</span><span class="k">oncall_sns_arn</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="n">  dimensions</span> <span class="o">=</span> {
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="n">    DBInstanceIdentifier</span> <span class="o">=</span> <span class="k">aws_db_instance</span><span class="p">.</span><span class="k">primary</span><span class="p">.</span><span class="k">identifier</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">  }
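</span></span><span class="line"><span class="ln">22</span><span class="cl">}</span></span></code></pre></div><p>With this in place, monitoring exists the moment <code>terraform apply</code> creates the database, and <code>terraform destroy</code> removes the alarms together with the database so none are left orphaned. When a new environment applies the same module, monitoring coverage follows the resources automatically, with nothing for anyone to remember.</p>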
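<p>The module description also lists a connection-count alarm, which the excerpt does not show. A minimal sketch of what it might look like, reusing the module's existing inputs; <code>var.db_max_connections_alarm</code> is a hypothetical variable introduced here for illustration, and the right threshold depends on the instance class and connection pooling setup:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch only: a connection-count alarm for the same database module.
# var.db_max_connections_alarm is a hypothetical input, not part of the original module.
variable "db_max_connections_alarm" {
  type        = number
  description = "Average connection count that should notify on-call"
}

resource "aws_cloudwatch_metric_alarm" "db_connections" {
  alarm_name          = "${var.env}-${var.db_identifier}-connections-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "DatabaseConnections"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = var.db_max_connections_alarm
  alarm_actions       = [var.oncall_sns_arn]

  dimensions = {
    DBInstanceIdentifier = aws_db_instance.primary.identifier
  }
}</code></pre></div>
<p>Because the variable has no default, every caller of the module has to state a threshold explicitly, which keeps the number visible in review.</p>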
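<h2 id="監控脫鉤造成的兩類漂移">Two kinds of drift caused by decoupled monitoring</h2>
<p>Keeping monitoring outside the resource (in a separate IaC codebase, another repo, or set up by hand in the console) produces two kinds of drift that run in opposite directions. Both share one root cause: monitoring and the resource are not in the same apply unit.</p>
<h3 id="漂移一新資源沒有監控">Drift 1: new resources without monitoring</h3>
<p>A service is added through a PR, but its alarms depend on someone configuring them in the console afterward, or on a PR in another repo catching up. Some services end up with alarms and some without, and coverage depends on who remembered. When a service without alarms fails, the detection path degrades from "alert → investigate" to "customer complaint → investigate", and response time slips from minutes to hours.</p>
<p>A single query shows how severe this drift is: list every RDS instance and check whether each one has a matching CloudWatch alarm. Any instance with no alarm is live evidence of drift.</p>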





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># For each RDS instance, count the CloudWatch alarms that watch its CPUUtilization metric</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">aws rds describe-db-instances <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --query <span class="s1">&#39;DBInstances[].DBInstanceIdentifier&#39;</span> --output text <span class="p">|</span> tr <span class="s1">&#39;\t&#39;</span> <span class="s1">&#39;\n&#39;</span> <span class="p">|</span> <span class="k">while</span> <span class="nb">read</span> -r db<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  <span class="nv">count</span><span class="o">=</span><span class="k">$(</span>aws cloudwatch describe-alarms-for-metric --namespace AWS/RDS --metric-name CPUUtilization <span class="se">\
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="se"></span>    --dimensions Name=DBInstanceIdentifier,Value=<span class="s2">&#34;</span><span class="si">${</span><span class="nv">db</span><span class="si">}</span><span class="s2">&#34;</span> --query <span class="s1">&#39;length(MetricAlarms)&#39;</span><span class="k">)</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">db</span><span class="si">}</span><span class="s2">: </span><span class="si">${</span><span class="nv">count</span><span class="si">}</span><span class="s2"> alarms&#34;</span>
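</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="k">done</span></span></span></code></pre></div><h3 id="漂移二死資源留下殘響">Drift 2: dead resources leave echoes behind</h3>
<p>The resource is gone but its alarm remains. The orphan alarm keeps reporting <code>INSUFFICIENT_DATA</code> for a target that no longer exists, mixes with valid alarms in the same notification channel, and lowers the signal-to-noise ratio. Once that ratio is low enough, a meaningful <code>INSUFFICIENT_DATA</code> (a service that has stopped sending metrics) gets ignored along with the rest: alert fatigue turns alarms from a safeguard into background noise.</p>
<p>The cost of Drift 2 is more than attention. Leftover alarms count against the CloudWatch alarm quota (each account has a limit), and once enough orphans accumulate, adding alarms for a new service may first require cleaning up old ones, which is the last thing to spend time on in the middle of an incident.</p>
<p>The fix is to bind the alarm's lifecycle to the module: when the resource is destroyed the alarm is destroyed with it, and no separate process has to remember to clean up. If a large number of orphan alarms already exist for historical reasons, alarms whose <code>StateValue</code> has been <code>INSUFFICIENT_DATA</code> for more than 7 days make a reasonable filter for cleanup candidates.</p>
<h2 id="log-group-設計">Log group design</h2>
<p>A log group is the unit of ownership and retention for logs, and it has to answer two governance questions: how long logs are kept (retention) and who can read them (access control). Both answers are auditable only when they are written into IaC instead of left to the vendor's implicit defaults.</p>
<h3 id="retention三方取捨">Retention: a three-way trade-off</h3>
<p>Many cloud services create a log group automatically when none is declared explicitly, and apply a default of "never expire". The problem with permanent retention is not technical, since CloudWatch Logs can store data indefinitely, but one of governance: logs pile up without bound and the bill grows slowly, yet nobody has made an explicit decision about how long each log should be kept.</p>
<p>Retention is a three-way trade-off between cost, compliance, and debugging needs:</p>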
<table>
  <thead>
      <tr>
          <th>Log type</th>
          <th>Debugging need</th>
          <th>Compliance need</th>
          <th>Suggested retention</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Application logs (request, error)</td>
          <td>Last 2-4 weeks</td>
          <td>Usually none</td>
          <td>14-30 days</td>
      </tr>
      <tr>
          <td>Database slow query log</td>
          <td>Last 1-2 weeks</td>
          <td>Usually none</td>
          <td>14 days</td>
      </tr>
      <tr>
          <td>Access audit logs (CloudTrail)</td>
          <td>Occasional lookback</td>
          <td>1-7 years</td>
          <td>90-365 days, then archive to S3</td>
      </tr>
      <tr>
          <td>Payment / transaction logs</td>
          <td>Reconciliation, occasional</td>
          <td>3-7 years, per regulation</td>
          <td>Short-term retention plus long-term archive</td>
      </tr>
  </tbody>
</table>
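<p>A more reasonable approach is to tier retention by log type: high-volume application logs used for debugging get short retention, audit-related access logs get long retention as compliance requires, and where needed, cold data is archived through a subscription filter to cheaper object storage (S3 plus Glacier). Writing these values into IaC turns "why is this log kept for 90 days" into a decision that can be discussed in a PR, not a number someone clicked in the console six months ago.</p>
<p>As a cost reference, CloudWatch Logs storage runs about $0.03 per GB per month. A service that produces 10 GB of logs a day holds about 300 GB at steady state with 30-day retention, or roughly $9 a month in storage; with 7-day retention it holds about 70 GB, or roughly $2. These figures cover storage only, since ingestion is billed separately. The retention period is therefore a direct trade-off between compliance requirements and storage cost.</p>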
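<p>The archive path mentioned above could be sketched roughly as follows. This assumes a Kinesis Data Firehose delivery stream that writes to an S3 bucket already exists, together with an IAM role that CloudWatch Logs can assume to write to it; <code>var.archive_firehose_arn</code>, <code>var.logs_to_firehose_role_arn</code>, and <code>var.archive_bucket_name</code> are hypothetical names, and the 90-day and 7-year figures are placeholders for whatever the compliance requirement actually is:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch only. The Firehose stream, IAM role, and bucket are assumed to exist;
# all three variables are hypothetical names.
variable "archive_firehose_arn" { type = string }
variable "logs_to_firehose_role_arn" { type = string }
variable "archive_bucket_name" { type = string }

# Forward every event from the audit log group to the archive stream
resource "aws_cloudwatch_log_subscription_filter" "audit_archive" {
  name            = "audit-to-s3-archive"
  log_group_name  = "/app/${var.env}/audit"
  filter_pattern  = "" # an empty pattern matches all events
  destination_arn = var.archive_firehose_arn
  role_arn        = var.logs_to_firehose_role_arn
}

# Move archived objects to Glacier, then expire them
resource "aws_s3_bucket_lifecycle_configuration" "audit_archive" {
  bucket = var.archive_bucket_name

  rule {
    id     = "audit-logs-to-glacier"
    status = "Enabled"

    filter {} # apply to every object in the bucket

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 2555 # about 7 years; an assumed compliance window
    }
  }
}</code></pre></div>
<p>With the archive in place, the log group itself can keep a short retention for debugging while the long compliance window is served from the cheaper tier.</p>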
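<p>Observability bills tend to grow superlinearly at scale, and an environment without per-team cost attribution can only control cost by cutting retention or lowering sampling globally, both of which hurt observability quality. Moving decisions about log retention and cardinality budgets from the global level down to the team level (attributed through tags) is what makes it possible to save where saving is warranted and keep what needs keeping. <a href="/blog/backend/04-observability/cases/observability-cost-governance-at-scale/" data-link-title="4.C14 觀測平台成本治理：從帳單驚嚇到可預測成本" data-link-desc="觀測帳單持續超線性成長時，用 cost attribution、cardinality budget、log tiering 跟 adaptive sampling 建立可預測成本模型。">4.C14 Observability Platform Cost Governance</a> covers concrete experience with this trade-off from several companies.</p>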





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_log_group&#34; &#34;api&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  name</span>              <span class="o">=</span> <span class="s2">&#34;/app/${var.env}/api&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  retention_in_days</span> <span class="o">=</span><span class="n"> var.env</span> <span class="o">==</span> <span class="s2">&#34;prod&#34;</span> <span class="err">?</span> <span class="m">30</span> <span class="err">:</span> <span class="m">7</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  kms_key_id</span>        <span class="o">=</span> <span class="k">aws_kms_key</span><span class="p">.</span><span class="k">logs</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_log_group&#34; &#34;audit&#34;</span> {
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  name</span>              <span class="o">=</span> <span class="s2">&#34;/app/${var.env}/audit&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  retention_in_days</span> <span class="o">=</span> <span class="m">365</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  kms_key_id</span>        <span class="o">=</span> <span class="k">aws_kms_key</span><span class="p">.</span><span class="k">logs</span><span class="p">.</span><span class="k">arn</span>
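</span></span><span class="line"><span class="ln">11</span><span class="cl">}</span></span></code></pre></div><p>Retention in dev can be much shorter (7 days, or even 3), because dev carries no compliance obligations and its logs are rarely read; the savings on the bill correspond directly to that difference.</p>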
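<p>One way to combine per-environment retention with the tag-based attribution discussed above is a lookup map plus tags on the log group. This is a sketch under assumptions: <code>var.team</code>, the <code>worker</code> log group, and the values in the map are all illustrative, not taken from the original module:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch only. var.team, the "worker" log group, and the map values are illustrative assumptions.
variable "team" {
  type        = string
  description = "Owning team, used for cost attribution"
}

locals {
  # CloudWatch Logs accepts only specific retention values; 3, 7, 14 and 30 are among them.
  retention_days_by_env = {
    prod    = 30
    staging = 14
    dev     = 7
  }
}

resource "aws_cloudwatch_log_group" "worker" {
  name              = "/app/${var.env}/worker"
  retention_in_days = lookup(local.retention_days_by_env, var.env, 7) # unknown envs fall back to 7 days
  kms_key_id        = aws_kms_key.logs.arn

  tags = {
    team = var.team
    env  = var.env
  }
}</code></pre></div>
<p>Tags only show up in billing reports after they have been activated as cost allocation tags for the account, so the tag keys are worth agreeing on before they spread across modules.</p>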
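<h3 id="存取控制與加密">Access control and encryption</h3>
<p>"Who can read" is the other half, alongside retention. Logs often carry PII (user email addresses, IPs), tokens, or internal structure, so read permissions should be managed together with the IAM roles established in <a href="/blog/infra/02-identity-credentials/" data-link-title="模組二：身分與憑證地基 — IAM 與 OIDC" data-link-desc="IAM role / policy 設計、最小權限，以及用 OIDC 短期憑證取代長期 access key">Module 2 (Identity and Credential Foundations)</a>.</p>
<p>A common trap is that logs are encrypted in transit and at rest (<code>kms_key_id</code> is set) but readable by the entire team. Encryption protects data at rest from unauthorized access, but if every developer holds <code>logs:GetLogEvents</code>, encryption protects very little: read permission should be narrowed to the minimum set that on-call and audit need.</p>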





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Grant the oncall role read access to the prod API log group and its streams
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">data</span> <span class="s2">&#34;aws_iam_policy_document&#34; &#34;log_read&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="k">statement</span> {
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">    actions</span>   <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;logs:GetLogEvents&#34;, &#34;logs:FilterLogEvents&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">    resources</span> <span class="o">=</span> <span class="p">[</span><span class="k">aws_cloudwatch_log_group</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">arn</span><span class="p">,</span> <span class="s2">&#34;${aws_cloudwatch_log_group.api.arn}:*&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  }
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_iam_role_policy&#34; &#34;oncall_log_read&#34;</span> {
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  role</span>   <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">oncall_role_name</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  policy</span> <span class="o">=</span> <span class="k">data</span><span class="p">.</span><span class="k">aws_iam_policy_document</span><span class="p">.</span><span class="k">log_read</span><span class="p">.</span><span class="k">json</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">}</span></span></code></pre></div><p>How the application layer decides which fields should never reach the logs at all (for example, PII masking in the logger) falls under data protection; see <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">Backend Module 7: Security and Data Protection</a>.</p>
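<p>The log group examples reference <code>aws_kms_key.logs</code> without showing it. A possible shape for that key is sketched below, on the assumption that CloudWatch Logs needs a key policy statement for its regional service principal before it can use a customer-managed key. <code>var.aws_region</code> is a hypothetical input, and scoping the key to log groups under <code>/app/</code> is an assumption based on the naming used in the examples:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch only. var.aws_region is a hypothetical input; the "/app/*" scope is an assumption.
variable "aws_region" {
  type = string
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "logs_key" {
  # Keep the account able to administer the key
  statement {
    sid       = "AccountAdministration"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Let CloudWatch Logs use the key, but only for log groups under /app/
  statement {
    sid       = "CloudWatchLogsUse"
    actions   = ["kms:Encrypt*", "kms:Decrypt*", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:Describe*"]
    resources = ["*"]

    principals {
      type        = "Service"
      identifiers = ["logs.${var.aws_region}.amazonaws.com"]
    }

    condition {
      test     = "ArnLike"
      variable = "kms:EncryptionContext:aws:logs:arn"
      values   = ["arn:aws:logs:${var.aws_region}:${data.aws_caller_identity.current.account_id}:log-group:/app/*"]
    }
  }
}

resource "aws_kms_key" "logs" {
  description         = "Encrypts application log groups"
  enable_key_rotation = true
  policy              = data.aws_iam_policy_document.logs_key.json
}</code></pre></div>
<p>Keeping the key in the same codebase as the log groups means the encryption boundary and the read policy can be reviewed side by side.</p>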
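<h2 id="metric-與-alarm-設計">Metric and alarm design</h2>
<p>Metrics and alarms are written into IaC so that a resource carries its health criteria from the moment it is created. An alarm is a written agreement: which metric, over how long an evaluation window, and above what value to notify whom. Once that agreement lives in code, it can be reviewed, version-controlled, and reused across environments.</p>
<h3 id="症狀型-vs-成因型告警">Symptom-based vs cause-based alerts</h3>
<p>Threshold design is a trade-off between signal and noise. Alerts fall into two classes. Symptom-based alerts track indicators that mean users are already affected: 5xx error rate, p99 latency, queue backlog. Cause-based alerts track indicators that mean a component is degrading while users may not have noticed yet: CPU utilization, memory utilization, disk IOPS.</p>
<p>The highest-payoff starting point is to put alarms with notifications on symptoms and leave causes on dashboards as diagnostic clues. The reason is that cause and symptom are not always directly related: CPU at 80% does not mean users are affected (auto scaling may already be adding nodes), and CPU at 30% does not mean things are safe (a stuck goroutine can leave the CPU idle). If every cause metric gets its own alarm, the number of alerts grows in proportion to the number of resources, and as the signal-to-noise ratio falls, symptom-based alerts are easily drowned out by cause-based ones.</p>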





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Symptom-based alarm: 5xx above the threshold means users are already affected
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_metric_alarm&#34; &#34;api_5xx&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  alarm_name</span>          <span class="o">=</span> <span class="s2">&#34;${var.env}-api-5xx-rate&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  comparison_operator</span> <span class="o">=</span> <span class="s2">&#34;GreaterThanThreshold&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  evaluation_periods</span>  <span class="o">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  metric_name</span>         <span class="o">=</span> <span class="s2">&#34;5XXError&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  namespace</span>           <span class="o">=</span> <span class="s2">&#34;AWS/ApiGateway&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  period</span>              <span class="o">=</span> <span class="m">60</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  statistic</span>           <span class="o">=</span> <span class="s2">&#34;Sum&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  threshold</span>           <span class="o">=</span> <span class="m">10</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  treat_missing_data</span>  <span class="o">=</span> <span class="s2">&#34;notBreaching&#34;</span>
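</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  alarm_actions</span>       <span class="o">=</span> <span class="p">[</span><span class="k">var</span><span class="p">.</span><span class="k">oncall_sns_arn</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">  <span class="c1"># API Gateway publishes 5XXError per ApiName; var.api_name is an assumed module input</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="n">  dimensions</span> <span class="o">=</span> {
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">    ApiName</span> <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">api_name</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">  }
</span></span><span class="line"><span class="ln">18</span><span class="cl">}
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"># Cause-based metric: keep CPU on a dashboard, with no alarm,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1"># unless the causal link "CPU at X% always means the service is about to become unavailable" has been verified</span></span></span></code></pre></div><p>When a cause has a clear causal threshold for the symptom (for example, an RDS instance stops accepting writes once its storage fills up, so disk usage passing 90% is a direct precursor), that cause deserves an alarm too. The key is that the causal relationship has been verified, not assumed.</p>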
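<p>For the disk example, a cause-based alarm might look like the sketch below. It assumes the <code>aws_db_instance.primary</code> resource and the inputs of the database module shown earlier, and that storage autoscaling is off so <code>allocated_storage</code> stays fixed. The 10% free-space figure mirrors the 90% usage threshold in the text and is not a recommendation:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch only: alarm when free storage drops below 10% of the allocated size.
# Assumes aws_db_instance.primary from the database module and no storage autoscaling.
resource "aws_cloudwatch_metric_alarm" "db_storage_low" {
  alarm_name          = "${var.env}-${var.db_identifier}-free-storage-low"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 3
  metric_name         = "FreeStorageSpace" # reported in bytes
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  # allocated_storage is in GiB; convert to bytes and take 10%
  threshold     = aws_db_instance.primary.allocated_storage * 1024 * 1024 * 1024 * 0.1
  alarm_actions = [var.oncall_sns_arn]

  dimensions = {
    DBInstanceIdentifier = aws_db_instance.primary.identifier
  }
}</code></pre></div>
<p>Deriving the threshold from <code>allocated_storage</code> keeps the alarm proportional when the instance is resized through the same module.</p>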
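<h3 id="insufficient_data-的處理">Handling INSUFFICIENT_DATA</h3>
<p><code>treat_missing_data</code> determines how an alarm evaluates periods in which no metric data points arrive. The setting is often overlooked, but it makes a significant difference in two situations:</p>
<p><strong>Metrics that report continuously</strong> (such as API request count): data that suddenly disappears usually means the service is down or the metric pipeline is broken, so set <code>treat_missing_data = &quot;breaching&quot;</code>. Missing data is itself the anomaly signal.</p>
<p><strong>Intermittent metrics</strong> (such as error counts, or invocations of a low-traffic Lambda): there are normally no data points, and no data means normal operation, so set <code>treat_missing_data = &quot;notBreaching&quot;</code> to avoid a false alarm in every quiet period.</p>
<p>To decide, ask: "If this metric has no data at all for 10 minutes, is that good news or bad news?" Use <code>notBreaching</code> for good news, <code>breaching</code> for bad news, and <code>ignore</code> when unsure (the alarm keeps its current state and is evaluated again in the next period that has data).</p>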
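<p>As an illustration of the "bad news" case, the sketch below alarms when an API that normally has steady traffic reports no requests. It is only appropriate for services with continuous traffic; <code>var.api_name</code> is a hypothetical input, and the two 5-minute periods echo the 10-minute question above:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch only. var.api_name is a hypothetical input naming the API Gateway API.
variable "api_name" {
  type = string
}

resource "aws_cloudwatch_metric_alarm" "api_no_traffic" {
  alarm_name          = "${var.env}-api-no-traffic"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "Count" # number of API requests in the period
  namespace           = "AWS/ApiGateway"
  period              = 300
  statistic           = "Sum"
  threshold           = 1
  treat_missing_data  = "breaching" # no data points at all is treated as a failure
  alarm_actions       = [var.oncall_sns_arn]

  dimensions = {
    ApiName = var.api_name
  }
}</code></pre></div>
<p>The same resource with <code>notBreaching</code> would stay silent through an outage that stops metrics entirely, which is why the choice deserves a deliberate decision per metric.</p>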
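<h3 id="告警必須連到動作">Alerts must lead to action</h3>
<p>A useful alarm must at least be bound to a notification destination. An alarm with empty <code>alarm_actions</code> only changes color in the CloudWatch console, and nobody is watching the console when an incident hits. An alarm's value lies in pushing itself to whoever is on call.</p>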





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_sns_topic&#34; &#34;oncall&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  name</span> <span class="o">=</span> <span class="s2">&#34;${var.env}-oncall-alerts&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">}
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_sns_topic_subscription&#34; &#34;pagerduty&#34;</span> {
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="n">  topic_arn</span> <span class="o">=</span> <span class="k">aws_sns_topic</span><span class="p">.</span><span class="k">oncall</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="n">  protocol</span>  <span class="o">=</span> <span class="s2">&#34;https&#34;</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="n">  endpoint</span>  <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">pagerduty_integration_url</span>
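</span></span><span class="line"><span class="ln">9</span><span class="cl">}</span></span></code></pre></div><p>The notification destination belongs in IaC too: the SNS topic, its subscriptions, and the integration endpoint are all part of the infrastructure. A hand-made SNS subscription has the same problem as a hand-made alarm: nobody remembers it, nobody maintains it, and it is found broken only when something goes wrong.</p>
<h3 id="把基礎告警做成-module-預設">Make baseline alerts a module default</h3>
<p>If someone has to remember to add alarms every time a new service launches, the alarms are not yet part of the module template. Make the baseline alerts (error rate, latency, health check failures) a default output of the service module, so they are created together with the service on apply:</p>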





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># modules/service/variables.tf
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">variable</span> <span class="s2">&#34;alarm_5xx_threshold&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">number</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  default</span> <span class="o">=</span> <span class="m">10</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;alarm_latency_p99_ms&#34;</span> {
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">number</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  default</span> <span class="o">=</span> <span class="m">3000</span>
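</span></span><span class="line"><span class="ln">10</span><span class="cl">}</span></span></code></pre></div><p>When a new service is created, its alarms are created with it; tuning the thresholds is the only part left to the service owner. The defaults follow a "conservative but not noisy" rule: start with loose thresholds, then tighten them against the observed baseline once the service has stabilized.</p>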
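<p>As a rough sketch of how those two variables might be consumed inside the same module, the alarms below read them directly. It assumes the service sits behind an Application Load Balancer and that the 5xx threshold is a count per minute; <code>var.service_name</code>, <code>var.alb_arn_suffix</code>, <code>var.oncall_topic_arn</code>, and the file name are hypothetical and would need to be declared elsewhere in the module.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># modules/service/alarms.tf (hypothetical file name)
# Assumed inputs, not from the original article:
#   var.service_name, var.alb_arn_suffix, var.oncall_topic_arn
resource "aws_cloudwatch_metric_alarm" "http_5xx" {
  alarm_name          = "${var.service_name}-5xx"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = var.alarm_5xx_threshold
  treat_missing_data  = "notBreaching" # no data points means no 5xx responses
  alarm_actions       = [var.oncall_topic_arn]

  dimensions = {
    LoadBalancer = var.alb_arn_suffix
  }
}

resource "aws_cloudwatch_metric_alarm" "latency_p99" {
  alarm_name          = "${var.service_name}-latency-p99"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "TargetResponseTime"
  extended_statistic  = "p99"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  # TargetResponseTime is reported in seconds; the variable is in milliseconds.
  threshold     = var.alarm_latency_p99_ms / 1000
  alarm_actions = [var.oncall_topic_arn]

  dimensions = {
    LoadBalancer = var.alb_arn_suffix
  }
}
</code></pre></div>
<p>Because the thresholds come from variables with defaults, a service that sets nothing still gets both alarms, and an owner who needs different values overrides only the variable.</p>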
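<p>Signal design has an easily overlooked blind spot: aggregated metrics mask local degradation. Across three generations of storage architecture, Discord kept running into the same problem: overall p95 latency looked normal while latency for a few hot partitions or large groups had already spiked, and nobody noticed until users reported it. The lesson is that alarm dimensions should align with the fan-out structure of the business rather than only the global aggregate. See <a href="/blog/backend/04-observability/cases/discord-storage-growth-observability-gap/" data-link-title="4.C13 Discord：從儲存問題回推觀測缺口" data-link-desc="每次儲存遷移都暴露觀測盲區，把儲存成長問題重新框架為訊號設計問題。">4.C13 Discord: tracing observability gaps back from storage problems</a>. At scale, dynamic cluster scaling also changes the observability model: scaling events themselves have to become observed objects. See <a href="/blog/backend/04-observability/cases/airbnb-observability-k8s-scale-signals/" data-link-title="4.C8 Airbnb：Kubernetes 規模化下的觀測訊號治理" data-link-desc="叢集擴縮與工作負載變動如何回寫觀測模型。">4.C8 Airbnb: governing observability signals at K8s scale</a>.</p>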
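<p>One way to express that alignment in IaC is to create one alarm per fan-out unit instead of a single global alarm. The sketch below assumes the application publishes a custom latency metric with a <code>Partition</code> dimension; the namespace, metric name, variables, and threshold value are all hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Hypothetical input: the partitions that each get their own alarm.
variable "watched_partitions" {
  type        = set(string)
  description = "Partition IDs that get a dedicated latency alarm."
}

resource "aws_cloudwatch_metric_alarm" "partition_latency_p95" {
  for_each = var.watched_partitions

  alarm_name          = "storage-latency-p95-${each.key}"
  namespace           = "Custom/Storage" # assumed custom namespace
  metric_name         = "ReadLatencyMs"  # assumed custom metric
  extended_statistic  = "p95"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = 200 # illustrative value, in milliseconds
  alarm_actions       = [var.oncall_topic_arn] # assumed to be declared elsewhere

  # One alarm per partition, so a single hot partition cannot hide
  # behind a healthy global aggregate.
  dimensions = {
    Partition = each.key
  }
}
</code></pre></div>
<p>A static set only covers hot spots that are already known, so it complements a global alarm rather than replacing it.</p>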
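<h2 id="基礎設施訊號-vs-客戶端行為訊號">Infrastructure signals vs client behavior signals</h2>
<p>The observability in this module covers infrastructure signals; <a href="/blog/monitoring/" data-link-title="監控實務指南" data-link-desc="整理非伺服器端運行時的監控體系 — 行為蒐集、錯誤回報、效能指標、生命週期追蹤，從自架方案到商業方案的完整知識路線">Monitoring</a> covers client and business behavior signals. The two observe different things and have different lifecycles, so they live in different code and ship through different deployment pipelines.</p>
<p>Infrastructure signals describe resource-level health: log group retention, CPU, queue depth, 5xx ratio, instance liveness. They are created and destroyed by IaC along with the resource, and the question they answer is "is this system still alive, and what is broken?"</p>
<p>Client behavior signals live at the SDK, Collector, and business-instrumentation layer: what the user clicked, where the conversion funnel leaks, the frontend JS error rate, custom business events. They evolve with product features and do not share a lifecycle with infrastructure resources.</p>
<p>The question that draws the line is whether a signal should exist the moment the resource is created or is instrumented only when a feature is built. The former goes into this module's IaC; the latter goes into application code at the monitoring layer.</p>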
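<p>A signal on the "exists when the resource is created" side might look like the following sketch, where a queue and its depth alarm are declared side by side. The queue name, threshold, <code>var.service_name</code>, and <code>var.oncall_topic_arn</code> are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Hypothetical example: the alarm is declared next to the resource it watches.
resource "aws_sqs_queue" "jobs" {
  name = "${var.service_name}-jobs"
}

resource "aws_cloudwatch_metric_alarm" "jobs_queue_depth" {
  alarm_name          = "${var.service_name}-jobs-depth"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  statistic           = "Maximum"
  period              = 300
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 1000 # illustrative backlog size
  alarm_actions       = [var.oncall_topic_arn] # assumed to be declared elsewhere

  # Referencing the queue ties the alarm to the queue's lifecycle:
  # both are created and destroyed in the same apply.
  dimensions = {
    QueueName = aws_sqs_queue.jobs.name
  }
}
</code></pre></div>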
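<p>The two converge during incident investigation: an infrastructure alarm tells on-call "RDS CPU hit 95%", while a client signal tells the product team "the checkout page failure rate jumped from 0.1% to 12%". Only by cross-referencing both can you judge the scope of impact. But their owners, change cadence, and deployment pipelines differ: infrastructure alarms follow infra PRs, and frontend instrumentation follows product sprints. Mixing them in one codebase blurs who owns a signal's threshold and widens infra PR review into unrelated business logic.</p>
<h2 id="跨分類引用">Cross-category references</h2>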
<ul>
<li>→ <a href="/blog/monitoring/" data-link-title="監控實務指南" data-link-desc="整理非伺服器端運行時的監控體系 — 行為蒐集、錯誤回報、效能指標、生命週期追蹤，從自架方案到商業方案的完整知識路線">monitoring</a>: monitoring at the client SDK / Collector layer</li>
<li>→ <a href="/blog/infra/04-environment-separation/" data-link-title="模組四：環境分離與模組化" data-link-desc="dev / staging / prod 切分、目錄結構 vs workspace、用可重用 module 避免環境漂移">Module 4: Environment separation and modularization</a>: modularization extends here into "every module carries its own observability declarations"</li>
<li>→ <a href="/blog/infra/05-core-services/" data-link-title="模組五：核心服務上 IaC" data-link-desc="資料庫、運算、儲存、load balancer 怎麼寫進基礎設施程式碼，以及上線順序">Module 5: Core services in IaC</a>: each core service carries its own logs and alarms</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/" data-link-title="模組七：infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">Module 7: Infra through the PR flow</a>: observability changes also go through PRs and automated guardrails</li>
<li>→ <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">backend Module 7: Security and data protection</a>: which fields must stay out of logs, and PII handling</li>
</ul>
]]></content:encoded></item><item><title>三層 log 設計</title><link>https://tarrragon.github.io/blog/testing/02-client-observability/three-layer-log-design/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/testing/02-client-observability/three-layer-log-design/</guid><description>&lt;p>客戶端 log 分成三層，每層記錄不同粒度的資訊，服務不同的 debug 場景。三層的區別在於回答的問題不同：連線生命週期回答「整體流程走到哪一步」，protocol 訊息回答「通訊細節是什麼」，使用者行為回答「使用者做了什麼操作」。&lt;/p>
&lt;h2 id="連線生命週期-log">連線生命週期 log&lt;/h2>
&lt;p>連線生命週期 log 記錄的是「流程走到第幾步、每步成功或失敗」。這一層的 log 粒度是步驟級 — 不記錄每一個封包或每一次函式呼叫，只記錄流程中的關鍵節點。&lt;/p>
&lt;p>以 app_tunnel 的連線流程為例，連線生命週期包含五步：biometric 認證 → credential 讀取 → WebSocket 連線 → auth token 發送 → stream 訂閱。每步完成時記一條 log，失敗時記一條包含原因的 log。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">[conn] Step 1/5: biometric auth completed (duration: 320ms)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">[conn] Step 2/5: credential loaded (user: admin)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">[conn] Step 3/5: WebSocket connected (url: wss://...)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">[conn] Step 4/5: auth token sent
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">[conn] Step 5/5: stream subscribed, ready&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>app_tunnel 在實機測試前六個核心元件中只有兩個有 log，且全是 W2 修復時事後補上的（&lt;a href="https://tarrragon.github.io/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4&lt;/a>）。W2-002 auth token 問題的 debug 過程中，開發者無法從任何 log 判斷失敗發生在五步中的哪一步。如果有連線生命週期 log，第一次連線就能看到「Step 3 完成，Step 4 未執行」— 直接定位到 auth token 缺失。&lt;/p>
&lt;p>連線生命週期 log 在所有模式（debug 和 release）都應該啟用。這層 log 量小（每次連線 5-10 條），不影響效能，但在 production 問題回報時是第一手資訊來源。&lt;/p>
&lt;h2 id="protocol-訊息-log">Protocol 訊息 log&lt;/h2>
&lt;p>Protocol 訊息 log 記錄的是通訊協議層面的細節：發送和接收的 frame type、payload 前綴、handshake 參數、逾時值。這一層的粒度比連線生命週期更細 — 每一次 send/receive 都記錄。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">[proto] TX: text frame, payload: {&amp;#34;AuthToken&amp;#34;:&amp;#34;base64...&amp;#34;} (42 bytes)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">[proto] RX: text frame, payload prefix: &amp;#34;0&amp;#34; (output data, 128 bytes)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">[proto] TX: binary frame, payload: [72, 101, 108, 108, 111] (5 bytes)&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Protocol log 在 debug 時幫助確認「程式碼發送了什麼、收到了什麼」。app_tunnel 的 text/binary frame 問題（&lt;a href="https://tarrragon.github.io/blog/testing/cases/ws-text-binary-frame-mock-blindspot/" data-link-title="T.C1 WebSocket text/binary frame 被 FakeWebSocketChannel 遮蔽" data-link-desc="Flutter app 用 Uint8List 發送 WS 資料走 binary frame，ttyd 期望 text frame 靜默忽略 — FakeWebSocketChannel 的 sink.add 接受 dynamic 不區分 frame type，192 個 test 全過但實機無回應">T.C1&lt;/a>）如果有 protocol log，開發者會在 log 中看到 &lt;code>TX: binary frame&lt;/code> 而非預期的 &lt;code>TX: text frame&lt;/code> — 直接指向 frame type 問題。&lt;/p></description><content:encoded><![CDATA[<p>Client-side logs are split into three layers. Each layer records information at a different granularity and serves a different debugging scenario. The layers differ in the question they answer: the connection lifecycle answers "how far did the overall flow get", protocol messages answer "what are the communication details", and user behavior answers "what did the user do".</p>
<h2 id="連線生命週期-log">Connection lifecycle log</h2>
<p>The connection lifecycle log records which step the flow has reached and whether each step succeeded or failed. Its granularity is step-level: it does not record every packet or every function call, only the key checkpoints in the flow.</p>
<p>Take the app_tunnel connection flow as an example. The lifecycle has five steps: biometric authentication → credential loading → WebSocket connection → auth token send → stream subscription. Each step writes one log entry on completion, and one entry that includes the reason on failure.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[conn] Step 1/5: biometric auth completed (duration: 320ms)
</span></span><span class="line"><span class="ln">2</span><span class="cl">[conn] Step 2/5: credential loaded (user: admin)
</span></span><span class="line"><span class="ln">3</span><span class="cl">[conn] Step 3/5: WebSocket connected (url: wss://...)
</span></span><span class="line"><span class="ln">4</span><span class="cl">[conn] Step 4/5: auth token sent
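</span></span><span class="line"><span class="ln">5</span><span class="cl">[conn] Step 5/5: stream subscribed, ready</span></span></code></pre></div><p>Before device testing, only two of app_tunnel's six core components had logs, and all of those were added after the fact during the W2 fixes (<a href="/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4</a>). While debugging the W2-002 auth token issue, the developer could not tell from any log which of the five steps had failed. With a connection lifecycle log, the very first connection attempt would have shown "Step 3 completed, Step 4 not executed", pointing straight at the missing auth token.</p>
<p>The connection lifecycle log should be enabled in every mode (debug and release). This layer is low-volume (5-10 entries per connection) and has negligible performance cost, yet it is the first-hand source of information when a production problem is reported.</p>
<h2 id="protocol-訊息-log">Protocol message log</h2>
<p>The protocol message log records protocol-level details: the frame type sent and received, the payload prefix, handshake parameters, and timeout values. Its granularity is finer than the connection lifecycle log: every send and receive is recorded.</p>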





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[proto] TX: text frame, payload: {&#34;AuthToken&#34;:&#34;base64...&#34;} (42 bytes)
</span></span><span class="line"><span class="ln">2</span><span class="cl">[proto] RX: text frame, payload prefix: &#34;0&#34; (output data, 128 bytes)
</span></span><span class="line"><span class="ln">3</span><span class="cl">[proto] TX: binary frame, payload: [72, 101, 108, 108, 111] (5 bytes)</span></span></code></pre></div><p>During debugging, the protocol log helps confirm what the code actually sent and what it received. For app_tunnel's text/binary frame issue (<a href="/blog/testing/cases/ws-text-binary-frame-mock-blindspot/" data-link-title="T.C1 WebSocket text/binary frame 被 FakeWebSocketChannel 遮蔽" data-link-desc="Flutter app 用 Uint8List 發送 WS 資料走 binary frame，ttyd 期望 text frame 靜默忽略 — FakeWebSocketChannel 的 sink.add 接受 dynamic 不區分 frame type，192 個 test 全過但實機無回應">T.C1</a>), a protocol log would have shown <code>TX: binary frame</code> instead of the expected <code>TX: text frame</code>, pointing directly at the frame type.</p>
<p>The protocol log should be possible to turn off in release mode. This layer is high-volume (one entry per keystroke), and payloads may contain sensitive information. Enable it by default in debug mode; in release mode, provide a switch (for example, a toggle on a hidden settings page) so that advanced users can turn it on when reporting a problem.</p>
<h2 id="使用者行為-log">User behavior log</h2>
<p>The user behavior log records what the user does in the UI: button taps, screen transitions, settings changes. Its granularity is action-level: one entry per meaningful user action.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[ui] screen: HomeScreen, action: tap Connect Terminal
</span></span><span class="line"><span class="ln">2</span><span class="cl">[ui] screen: TerminalScreen, state: connecting → connected
</span></span><span class="line"><span class="ln">3</span><span class="cl">[ui] screen: TerminalScreen, action: tap back button
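</span></span><span class="line"><span class="ln">4</span><span class="cl">[ui] screen: HomeScreen, state: returned from terminal</span></span></code></pre></div><p>The user behavior log is valuable in two scenarios. First, reconstructing the user's path during debugging: "what did the user do that led to the problem". Second, combined with the state matrix (<a href="/blog/ux-design/01-screen-state-machine/" data-link-title="模組一：畫面狀態機設計" data-link-desc="畫面狀態矩陣（顯示 / 操作 / 進入 / 退出）— 退出路徑為空 = UX 死胡同">ux-design Module 1</a>), analyzing real-world coverage of state transitions: which transitions happen often in actual use, and which never happen.</p>
<p>Enabling the user behavior log in release mode requires attention to privacy. Recording "the user switched screens" is reasonable; recording "the user typed the password abc123" requires a redaction mechanism (<a href="/blog/monitoring/07-security-privacy/" data-link-title="模組七：資安與隱私" data-link-desc="SDK redaction / transport 加密 / collector access control / 去識別化 — 蒐集的資料本身就是風險資產">monitoring Module 7: Security</a>).</p>
<h2 id="三層的關係">How the three layers relate</h2>
<p>The three layers operate independently; during debugging they are usually consulted from coarse to fine.</p>
<p><strong>Coarse filter</strong>: start with the connection lifecycle log to see which step the flow reached. If Step 3 failed, the problem is at the WebSocket connection layer.</p>
<p><strong>Drill down</strong>: switch to the protocol message log and look at what was sent and received during the Step 3 connection attempt. If a binary frame was sent and no response came back, the frame type is a likely cause.</p>
<p><strong>Reconstruct</strong>: if the problem is tied to user actions (for example, it only triggers under a specific sequence of operations), read the user behavior log to reconstruct the path.</p>
<p>All three layers use the same timestamp format and share a correlation ID (for example, the connection session ID), which makes cross-layer comparison possible.</p>
<h2 id="下一步路由">Next steps</h2>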
<ul>
<li>Defining log points in the feature spec → <a href="/blog/testing/02-client-observability/log-point-in-spec/" data-link-title="功能規格中的 log 點定義方法" data-link-desc="把 log 點設計從 debug 階段前移到功能規格階段 — 每個功能的規格文件新增可觀測性欄位，列出啟動 / 步驟 / 錯誤 / 完成四類 log 點">Defining log points in the feature spec</a></li>
<li>Quality differences between after-the-fact logs and designed logs → <a href="/blog/testing/02-client-observability/hotfix-log-vs-designed-log/" data-link-title="「事後補 log」vs「設計產物 log」的品質差異" data-link-desc="事後補的 log 是救火工具、設計產物的 log 是可觀測性基礎設施 — 從 app_tunnel 的 W2 hotfix log 拆解兩者在格式、覆蓋率、維護成本上的差異">Quality differences: after-the-fact logs vs designed logs</a></li>
<li>Choosing a log collection approach → <a href="/blog/testing/02-client-observability/log-endpoint-tradeoff/" data-link-title="自架 log endpoint vs 商業方案的取捨判斷" data-link-desc="自用工具用自架 log receiver（20 行 Go &#43; grep）、商業 app 用 Sentry/Crashlytics — 判斷依據是使用者規模和 debug 需求">Self-hosted log endpoint vs commercial solutions</a></li>
<li>Event classification and collection strategy → <a href="/blog/monitoring/01-mental-model/" data-link-title="模組一：監控心智模型" data-link-desc="四類事件（event / error / metric / lifecycle）的分類與收集策略">monitoring Module 1: Monitoring mental model</a></li>
</ul>
]]></content:encoded></item><item><title>功能規格中的 log 點定義方法</title><link>https://tarrragon.github.io/blog/testing/02-client-observability/log-point-in-spec/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/testing/02-client-observability/log-point-in-spec/</guid><description>&lt;p>Log 點定義是功能規格的一部分，和 API schema 同級。功能規格描述「這個功能做什麼」，log 點規格描述「這個功能執行時留下什麼可觀察的紀錄」。把 log 點設計前移到規格階段，讓 log 成為功能的設計產物，而非事後的 debug 工具（本章合成，TF-9 Derive）。&lt;/p>
&lt;h2 id="四類-log-點">四類 log 點&lt;/h2>
&lt;p>每個功能的 log 點按執行時機分成四類。&lt;/p>
&lt;h3 id="啟動-log">啟動 log&lt;/h3>
&lt;p>功能開始執行時記錄。回答「這個功能是否被觸發了」。&lt;/p>
&lt;p>啟動 log 包含觸發來源（使用者操作、系統排程、外部事件）和初始參數（連線目標、操作類型）。如果一個功能從未被觸發，啟動 log 的缺席就是線索。&lt;/p>
&lt;h3 id="步驟-log">步驟 log&lt;/h3>
&lt;p>功能執行過程中的每個關鍵步驟完成時記錄。回答「流程走到哪裡了」。&lt;/p>
&lt;p>步驟 log 的粒度依功能複雜度而定。三步驟的功能每步記一條；十步驟的功能可以只記關鍵的三到五步。判斷標準是：如果這一步失敗，開發者是否需要知道失敗點在哪。&lt;/p>
&lt;h3 id="錯誤-log">錯誤 log&lt;/h3>
&lt;p>步驟失敗、例外捕獲、非預期狀態出現時記錄。回答「出了什麼問題」。&lt;/p>
&lt;p>錯誤 log 必須包含足夠的 context 讓開發者不需要重現問題就能判斷原因。至少包含：哪一步失敗、失敗原因（error message）、當時的關鍵狀態值。&lt;/p>
&lt;h3 id="完成-log">完成 log&lt;/h3>
&lt;p>功能正常結束時記錄。回答「功能是否成功完成、花了多久」。&lt;/p>
&lt;p>完成 log 包含執行結果和耗時。和啟動 log 配對使用 — 有啟動但沒有完成代表功能中途異常退出。&lt;/p>
&lt;h2 id="在功能規格中加可觀測性欄位">在功能規格中加可觀測性欄位&lt;/h2>
&lt;p>以 app_tunnel 的「連線到 ttyd 終端機」功能為例，傳統規格只寫：&lt;/p>
&lt;ul>
&lt;li>輸入：使用者選擇的伺服器&lt;/li>
&lt;li>處理：建立 WebSocket 連線、發送 auth token、開始接收 terminal output&lt;/li>
&lt;li>輸出：終端機畫面顯示 terminal output&lt;/li>
&lt;/ul>
&lt;p>加上可觀測性欄位後：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>類型&lt;/th>
 &lt;th>log 點&lt;/th>
 &lt;th>內容&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>啟動&lt;/td>
 &lt;td>connect.start&lt;/td>
 &lt;td>目標 URL、觸發來源（使用者操作 / 自動重連）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>步驟&lt;/td>
 &lt;td>connect.biometric.done&lt;/td>
 &lt;td>認證結果、耗時&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>步驟&lt;/td>
 &lt;td>connect.credential.loaded&lt;/td>
 &lt;td>使用者名稱（密碼 redact）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>步驟&lt;/td>
 &lt;td>connect.ws.connected&lt;/td>
 &lt;td>連線 URL、耗時&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>步驟&lt;/td>
 &lt;td>connect.auth.sent&lt;/td>
 &lt;td>token 長度（內容 redact）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>步驟&lt;/td>
 &lt;td>connect.stream.subscribed&lt;/td>
 &lt;td>stream 狀態&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>錯誤&lt;/td>
 &lt;td>connect.{step}.failed&lt;/td>
 &lt;td>失敗步驟、error message、retry count&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>完成&lt;/td>
 &lt;td>connect.done&lt;/td>
 &lt;td>總耗時、最終狀態&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>這張表在功能規格階段就能寫出來，因為它只依賴功能的流程設計，不依賴實作細節。功能流程確定後，每一步在哪裡需要 log 點就確定了。&lt;/p>
&lt;h2 id="log-點命名規則">log 點命名規則&lt;/h2>
&lt;p>統一的命名規則讓 log 可以被 grep、過濾和統計。&lt;/p>
&lt;p>&lt;strong>階層式命名&lt;/strong>：&lt;code>{功能}.{步驟}.{事件}&lt;/code>。例如 &lt;code>connect.ws.connected&lt;/code>、&lt;code>connect.auth.failed&lt;/code>。&lt;/p>
&lt;p>&lt;strong>事件後綴統一&lt;/strong>：&lt;code>start&lt;/code>（啟動）、&lt;code>done&lt;/code>（步驟完成）、&lt;code>failed&lt;/code>（失敗）、&lt;code>complete&lt;/code>（功能完成）。&lt;/p>
&lt;p>&lt;strong>和程式碼結構對應&lt;/strong>：log 點名稱對應到程式碼中的函式或模組。&lt;code>connect.biometric.done&lt;/code> 對應 &lt;code>BiometricService.authenticate()&lt;/code> 的成功路徑。這讓開發者看到 log 名稱就知道去哪裡找程式碼。&lt;/p>
&lt;h2 id="log-點規格的-review-檢查">log 點規格的 review 檢查&lt;/h2>
&lt;p>功能規格 review 時，可觀測性欄位的檢查要點：&lt;/p>
&lt;p>&lt;strong>每步都有 log&lt;/strong>：流程中的每個步驟在成功和失敗時都有對應的 log 點。遺漏的步驟意味著該步驟出問題時無法從 log 判斷。&lt;/p>
&lt;p>&lt;strong>錯誤 log 有足夠 context&lt;/strong>：error log 只寫「連線失敗」不夠；需要寫「連線失敗」加上 error code、目標 URL、已完成的步驟。&lt;/p>
&lt;p>&lt;strong>敏感欄位有 redaction 標記&lt;/strong>：密碼、token、個人資料在 log 規格中標記為 redact，實作時用 redaction 機制處理。&lt;/p>
&lt;p>&lt;strong>啟動和完成配對&lt;/strong>：每個功能有啟動 log 就應該有完成 log，形成完整的生命週期。&lt;/p>
&lt;h2 id="下一步路由">下一步路由&lt;/h2>
&lt;ul>
&lt;li>三層 log 的詳細設計 → &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/three-layer-log-design/" data-link-title="三層 log 設計" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — 三層各自的職責、詳細程度和啟停控制">三層 log 設計&lt;/a>&lt;/li>
&lt;li>事後補 log 和設計產物 log 的差異 → &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/hotfix-log-vs-designed-log/" data-link-title="「事後補 log」vs「設計產物 log」的品質差異" data-link-desc="事後補的 log 是救火工具、設計產物的 log 是可觀測性基礎設施 — 從 app_tunnel 的 W2 hotfix log 拆解兩者在格式、覆蓋率、維護成本上的差異">「事後補 log」vs「設計產物 log」的品質差異&lt;/a>&lt;/li>
&lt;li>Log 中的敏感資訊處理 → &lt;a href="https://tarrragon.github.io/blog/monitoring/07-security-privacy/" data-link-title="模組七：資安與隱私" data-link-desc="SDK redaction / transport 加密 / collector access control / 去識別化 — 蒐集的資料本身就是風險資產">monitoring 模組七 資安&lt;/a>&lt;/li>
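&lt;/ul></description><content:encoded><![CDATA[<p>Log point definitions are part of the feature specification, on the same level as the API schema. The feature spec describes "what this feature does"; the log point spec describes "what observable record this feature leaves when it runs". Moving log point design forward to the spec stage makes logs a design artifact of the feature rather than an after-the-fact debugging tool (synthesized in this chapter, TF-9 Derive).</p>
<h2 id="四類-log-點">Four types of log points</h2>
<p>Each feature's log points fall into four types, according to when they fire.</p>
<h3 id="啟動-log">Start log</h3>
<p>Recorded when the feature starts executing. It answers "was this feature triggered at all?"</p>
<p>A start log includes the trigger source (user action, system schedule, external event) and the initial parameters (connection target, operation type). If a feature was never triggered, the absence of a start log is itself the clue.</p>
<h3 id="步驟-log">Step log</h3>
<p>Recorded when each key step in the feature's execution completes. It answers "how far did the flow get?"</p>
<p>Step log granularity depends on the feature's complexity. A three-step feature logs every step; a ten-step feature can log only the three to five key steps. The test is: if this step fails, does the developer need to know that this is where it failed?</p>
<h3 id="錯誤-log">Error log</h3>
<p>Recorded when a step fails, an exception is caught, or an unexpected state appears. It answers "what went wrong?"</p>
<p>An error log must carry enough context for the developer to determine the cause without reproducing the problem. At minimum it includes which step failed, the failure reason (error message), and the key state values at the time.</p>
<h3 id="完成-log">Completion log</h3>
<p>Recorded when the feature finishes normally. It answers "did the feature complete successfully, and how long did it take?"</p>
<p>A completion log includes the result and the elapsed time. It pairs with the start log: a start without a completion means the feature exited abnormally partway through.</p>
<h2 id="在功能規格中加可觀測性欄位">Adding an observability field to the feature spec</h2>
<p>Take app_tunnel's "connect to the ttyd terminal" feature as an example. A traditional spec only says:</p>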
<ul>
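<li>Input: the server the user selected</li>
<li>Processing: establish the WebSocket connection, send the auth token, start receiving terminal output</li>
<li>Output: the terminal screen displays the terminal output</li>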
</ul>
<p>With the observability field added:</p>
<table>
  <thead>
      <tr>
          <th>Type</th>
          <th>Log point</th>
          <th>Content</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Start</td>
          <td>connect.start</td>
          <td>Target URL, trigger source (user action / automatic reconnect)</td>
      </tr>
      <tr>
          <td>Step</td>
          <td>connect.biometric.done</td>
          <td>Authentication result, elapsed time</td>
      </tr>
      <tr>
          <td>Step</td>
          <td>connect.credential.loaded</td>
          <td>Username (password redacted)</td>
      </tr>
      <tr>
          <td>Step</td>
          <td>connect.ws.connected</td>
          <td>Connection URL, elapsed time</td>
      </tr>
      <tr>
          <td>Step</td>
          <td>connect.auth.sent</td>
          <td>Token length (content redacted)</td>
      </tr>
      <tr>
          <td>Step</td>
          <td>connect.stream.subscribed</td>
          <td>Stream state</td>
      </tr>
      <tr>
          <td>Error</td>
          <td>connect.{step}.failed</td>
          <td>Failed step, error message, retry count</td>
      </tr>
      <tr>
          <td>Completion</td>
          <td>connect.done</td>
          <td>Total elapsed time, final state</td>
      </tr>
  </tbody>
</table>
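<p>This table can be written at the spec stage because it depends only on the feature's flow design, not on implementation details. Once the flow is settled, so is where each step needs a log point.</p>
<h2 id="log-點命名規則">Log point naming rules</h2>
<p>Consistent naming lets logs be grepped, filtered, and aggregated.</p>
<p><strong>Hierarchical naming</strong>: <code>{feature}.{step}.{event}</code>. For example, <code>connect.ws.connected</code> and <code>connect.auth.failed</code>.</p>
<p><strong>Consistent event suffixes</strong>: <code>start</code> (feature started), <code>done</code> (step completed), <code>failed</code> (failure), <code>complete</code> (feature completed).</p>
<p><strong>Mirror the code structure</strong>: a log point name maps to a function or module in the code. <code>connect.biometric.done</code> maps to the success path of <code>BiometricService.authenticate()</code>. A developer who sees the log name knows where to look in the code.</p>
<h2 id="log-點規格的-review-檢查">Review checklist for the log point spec</h2>
<p>When reviewing a feature spec, check the observability field for the following:</p>
<p><strong>Every step has a log</strong>: each step in the flow has a log point for both success and failure. A missing step means that a problem in that step cannot be diagnosed from the logs.</p>
<p><strong>Error logs carry enough context</strong>: an error log that only says "connection failed" is not enough; it needs "connection failed" plus the error code, the target URL, and the steps already completed.</p>
<p><strong>Sensitive fields are marked for redaction</strong>: passwords, tokens, and personal data are marked as redact in the log spec and handled by a redaction mechanism in the implementation.</p>
<p><strong>Start and completion are paired</strong>: every feature that has a start log should also have a completion log, forming a complete lifecycle.</p>
<h2 id="下一步路由">Next steps</h2>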
<ul>
<li>Detailed design of the three log layers → <a href="/blog/testing/02-client-observability/three-layer-log-design/" data-link-title="三層 log 設計" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — 三層各自的職責、詳細程度和啟停控制">Three-layer log design</a></li>
<li>Differences between after-the-fact logs and designed logs → <a href="/blog/testing/02-client-observability/hotfix-log-vs-designed-log/" data-link-title="「事後補 log」vs「設計產物 log」的品質差異" data-link-desc="事後補的 log 是救火工具、設計產物的 log 是可觀測性基礎設施 — 從 app_tunnel 的 W2 hotfix log 拆解兩者在格式、覆蓋率、維護成本上的差異">Quality differences: after-the-fact logs vs designed logs</a></li>
<li>Handling sensitive information in logs → <a href="/blog/monitoring/07-security-privacy/" data-link-title="模組七：資安與隱私" data-link-desc="SDK redaction / transport 加密 / collector access control / 去識別化 — 蒐集的資料本身就是風險資產">monitoring Module 7: Security</a></li>
</ul>
]]></content:encoded></item><item><title>模組二：客戶端可觀測性</title><link>https://tarrragon.github.io/blog/testing/02-client-observability/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/testing/02-client-observability/</guid><description>&lt;p>回答「使用者的裝置上發生了什麼事」。log 設計應在功能規格階段完成，跟 API schema 同級。&lt;/p>
&lt;h2 id="對應-findings">對應 findings&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Finding&lt;/th>
 &lt;th>來源&lt;/th>
 &lt;th>內容&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>TF-6&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4&lt;/a>&lt;/td>
 &lt;td>6 元件中 4 個零 log，2 個全是 W2 hotfix&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>TF-7&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4&lt;/a>&lt;/td>
 &lt;td>事後補的 developer.log 格式不統一&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>TF-9&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4&lt;/a>&lt;/td>
 &lt;td>log 設計應在功能規格階段完成 — &lt;strong>本模組主寫&lt;/strong>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="待寫章節">待寫章節&lt;/h2>
&lt;ul>
&lt;li>&lt;input checked="" disabled="" type="checkbox"> 三層 log 設計（連線生命週期 / protocol 訊息 / 使用者行為）&lt;/li>
&lt;li>&lt;input checked="" disabled="" type="checkbox"> 功能規格中的 log 點定義方法&lt;/li>
&lt;li>&lt;input checked="" disabled="" type="checkbox"> 自架 log endpoint vs 商業方案的取捨判斷&lt;/li>
&lt;li>&lt;input checked="" disabled="" type="checkbox"> 「事後補 log」vs「設計產物 log」的品質差異&lt;/li>
&lt;/ul>
&lt;h2 id="跨分類引用">跨分類引用&lt;/h2>
&lt;ul>
&lt;li>→ &lt;a href="https://tarrragon.github.io/blog/monitoring/02-log-schema/" data-link-title="模組二：Log Schema 設計" data-link-desc="跨平台統一事件格式、欄位設計、版本演進策略">monitoring 模組二 Log Schema&lt;/a>：本模組教「設計 log 點」，monitoring 教「log 收集到之後怎麼處理」&lt;/li>
&lt;li>→ &lt;a href="https://tarrragon.github.io/blog/monitoring/07-security-privacy/" data-link-title="模組七：資安與隱私" data-link-desc="SDK redaction / transport 加密 / collector access control / 去識別化 — 蒐集的資料本身就是風險資產">monitoring 模組七 資安&lt;/a>：log 內容可能含 secret，SDK redaction 在這裡介入&lt;/li>
&lt;li>← &lt;a href="https://tarrragon.github.io/blog/ux-design/01-screen-state-machine/" data-link-title="模組一：畫面狀態機設計" data-link-desc="畫面狀態矩陣（顯示 / 操作 / 進入 / 退出）— 退出路徑為空 = UX 死胡同">ux-design 模組一&lt;/a>：狀態矩陣可加「可觀測性」欄位&lt;/li>
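&lt;/ul></description><content:encoded><![CDATA[<p>This module answers "what happened on the user's device?" Log design should be completed at the feature spec stage, on the same level as the API schema.</p>
<h2 id="對應-findings">Corresponding findings</h2>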
<table>
  <thead>
      <tr>
          <th>Finding</th>
          <th>Source</th>
          <th>Content</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>TF-6</td>
          <td><a href="/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4</a></td>
          <td>4 of 6 components have zero logs; the logs in the other 2 all come from W2 hotfixes</td>
      </tr>
      <tr>
          <td>TF-7</td>
          <td><a href="/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4</a></td>
          <td>The developer.log calls added after the fact have inconsistent formats</td>
      </tr>
      <tr>
          <td>TF-9</td>
          <td><a href="/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4</a></td>
          <td>Log design should be completed at the feature spec stage; <strong>covered primarily in this module</strong></td>
      </tr>
  </tbody>
</table>
<h2 id="待寫章節">Planned chapters</h2>
<ul>
<li><input checked="" disabled="" type="checkbox"> Three-layer log design (connection lifecycle / protocol messages / user behavior)</li>
<li><input checked="" disabled="" type="checkbox"> Defining log points in the feature spec</li>
<li><input checked="" disabled="" type="checkbox"> Trade-offs between a self-hosted log endpoint and commercial solutions</li>
<li><input checked="" disabled="" type="checkbox"> Quality differences: after-the-fact logs vs designed logs</li>
</ul>
<h2 id="跨分類引用">Cross-category references</h2>
<ul>
<li>→ <a href="/blog/monitoring/02-log-schema/" data-link-title="模組二：Log Schema 設計" data-link-desc="跨平台統一事件格式、欄位設計、版本演進策略">monitoring Module 2: Log Schema</a>: this module teaches how to design log points; monitoring teaches what to do with logs once they are collected</li>
<li>→ <a href="/blog/monitoring/07-security-privacy/" data-link-title="模組七：資安與隱私" data-link-desc="SDK redaction / transport 加密 / collector access control / 去識別化 — 蒐集的資料本身就是風險資產">monitoring Module 7: Security</a>: log content may contain secrets, and SDK redaction steps in here</li>
<li>← <a href="/blog/ux-design/01-screen-state-machine/" data-link-title="模組一：畫面狀態機設計" data-link-desc="畫面狀態矩陣（顯示 / 操作 / 進入 / 退出）— 退出路徑為空 = UX 死胡同">ux-design Module 1</a>: the state matrix can gain an "observability" column</li>
</ul>
]]></content:encoded></item><item><title>自架 log endpoint vs 商業方案的取捨判斷</title><link>https://tarrragon.github.io/blog/testing/02-client-observability/log-endpoint-tradeoff/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/testing/02-client-observability/log-endpoint-tradeoff/</guid><description>&lt;p>Log 收集方案的選擇取決於兩個因素：使用者在哪裡（同機 / 同網段 / 外部網路），以及 log 的消費者是誰（開發者自己 / 維運團隊 / 客服團隊）。自用工具和商業產品對這兩個因素的答案不同，適合不同的方案。&lt;/p>
&lt;h2 id="自架-log-endpoint-的適用場景">自架 log endpoint 的適用場景&lt;/h2>
&lt;p>自架 log endpoint 適合的前提是：client 和 server 在同一個網路內（同機、同 LAN、同 VPN/tailnet），log 的唯一消費者是開發者本人。&lt;/p>
&lt;p>app_tunnel 就是這個場景。Server（ttyd）和 client（Flutter app）在同一台機器或同一個 Tailscale tailnet 內。開發者同時是使用者和維運者。Log 的消費方式是 grep — 不需要 dashboard、不需要告警、不需要多人共享。&lt;/p>
&lt;p>在這個場景下，自架 log endpoint 的成本遠低於商業方案。一個 Go 程式開 HTTP endpoint 接收 JSON log 寫入檔案，20 行程式碼就能完成。Client 端的 &lt;code>AppLogger&lt;/code> 在 debug mode 同時寫 console 和 POST 到 endpoint。Debug 時用 &lt;code>grep&lt;/code> + &lt;code>jq&lt;/code> 查詢，不需要額外工具。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Client (Flutter) → HTTP POST /log → Go receiver → JSON file → grep/jq&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這個方案沒有外部依賴、沒有帳號管理、沒有費用、沒有資料隱私顧慮（log 不離開本機網路）。&lt;/p>
&lt;h2 id="商業方案的適用場景">商業方案的適用場景&lt;/h2>
&lt;p>商業方案（Sentry、Crashlytics、Datadog）適合的前提是：使用者分佈在外部網路，log 的消費者包含非開發者（維運、客服、產品），且需要告警和趨勢分析。&lt;/p>
&lt;p>商業方案提供的能力包括：跨網路收集（SDK 自動處理網路不穩定和批次傳輸）、多人查看 dashboard、告警規則設定、crash 報告自動分群、用戶 session 重播。這些能力在自用工具場景下不需要，在商業產品場景下是基礎需求。&lt;/p>
&lt;p>商業方案的成本包括：SDK 整合和設定、帳號和權限管理、月費（依事件量計費）、資料隱私合規（log 傳到第三方伺服器）。&lt;/p>
&lt;h2 id="判斷流程">判斷流程&lt;/h2>
&lt;h3 id="使用者在哪裡">使用者在哪裡&lt;/h3>
&lt;p>使用者和 server 在同一個網路內（自用工具、內部工具、開發期測試）→ 自架 log endpoint 是成本最低的選擇。&lt;/p>
&lt;p>使用者在外部網路（上架 app store、SaaS 產品、B2B 部署）→ 商業方案的跨網路收集能力是必要的，自架需要處理的 edge case（離線緩衝、重試、批次傳輸）太多。&lt;/p>
&lt;h3 id="log-消費者是誰">Log 消費者是誰&lt;/h3>
&lt;p>只有開發者自己 → grep/jq 足夠，不需要 dashboard。&lt;/p>
&lt;p>包含非技術人員（客服、產品經理）→ 需要視覺化 dashboard 和搜尋介面，商業方案的 UI 是這個需求的標準答案。&lt;/p>
&lt;h3 id="是否需要告警">是否需要告警&lt;/h3>
&lt;p>開發者自己用、即時看 log → 不需要告警。&lt;/p>
&lt;p>有維運值班、需要被動發現問題 → 需要告警規則，商業方案內建。&lt;/p>
&lt;h2 id="混合方案">混合方案&lt;/h2>
&lt;p>開發期用自架 log endpoint（零成本、即時可用），production 切換到商業方案 — 這個策略可行的前提是 log 層的 API 設計足夠抽象。&lt;/p>
&lt;p>&lt;code>AppLogger&lt;/code> 提供統一的 log 介面（&lt;code>log(level, name, data)&lt;/code>），底層實作在 debug mode 寫 console + POST 到本機 endpoint，在 release mode 寫 console + 呼叫 Sentry/Crashlytics SDK。切換只改 &lt;code>AppLogger&lt;/code> 的底層實作，不改呼叫端。&lt;/p>
&lt;p>這個抽象的投資在自用工具階段就值得做 — 即使目前不需要商業方案，統一的 log 介面也讓 log 點的管理更一致。&lt;/p>
&lt;h2 id="下一步路由">下一步路由&lt;/h2>
&lt;ul>
&lt;li>三層 log 的詳細設計 → &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/three-layer-log-design/" data-link-title="三層 log 設計" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — 三層各自的職責、詳細程度和啟停控制">三層 log 設計&lt;/a>&lt;/li>
&lt;li>在功能規格中定義 log 點 → &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/log-point-in-spec/" data-link-title="功能規格中的 log 點定義方法" data-link-desc="把 log 點設計從 debug 階段前移到功能規格階段 — 每個功能的規格文件新增可觀測性欄位，列出啟動 / 步驟 / 錯誤 / 完成四類 log 點">功能規格中的 log 點定義方法&lt;/a>&lt;/li>
&lt;li>Log 收集後的 schema 設計 → &lt;a href="https://tarrragon.github.io/blog/monitoring/02-log-schema/" data-link-title="模組二：Log Schema 設計" data-link-desc="跨平台統一事件格式、欄位設計、版本演進策略">monitoring 模組二 Log Schema&lt;/a>&lt;/li>
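&lt;/ul></description><content:encoded><![CDATA[<p>The choice of log collection approach depends on two factors: where the users are (same machine / same network segment / external network) and who consumes the logs (the developer alone / an operations team / a support team). Personal tools and commercial products answer these two questions differently, so they suit different approaches.</p>
<h2 id="自架-log-endpoint-的適用場景">When a self-hosted log endpoint fits</h2>
<p>A self-hosted log endpoint fits when the client and server are on the same network (same machine, same LAN, same VPN/tailnet) and the only consumer of the logs is the developer.</p>
<p>app_tunnel is exactly this case. The server (ttyd) and the client (Flutter app) run on the same machine or within the same Tailscale tailnet. The developer is also the user and the operator. Logs are consumed with grep: no dashboard, no alerting, and no multi-user sharing are needed.</p>
<p>In this scenario a self-hosted log endpoint costs far less than a commercial solution. A Go program that exposes an HTTP endpoint, accepts JSON logs, and writes them to a file takes about 20 lines of code. On the client, <code>AppLogger</code> in debug mode writes to the console and POSTs to the endpoint at the same time. Debugging queries use <code>grep</code> + <code>jq</code>, with no extra tooling.</p>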





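<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Client (Flutter) → HTTP POST /log → Go receiver → JSON file → grep/jq</span></span></code></pre></div><p>This approach has no external dependencies, no account management, no fees, and no data privacy concerns (logs never leave the local network).</p>
<h2 id="商業方案的適用場景">When a commercial solution fits</h2>
<p>A commercial solution (Sentry, Crashlytics, Datadog) fits when users are spread across external networks, log consumers include non-developers (operations, support, product), and alerting and trend analysis are required.</p>
<p>Commercial solutions provide cross-network collection (the SDK handles unstable networks and batched transmission automatically), multi-user dashboards, alert rules, automatic grouping of crash reports, and user session replay. A personal tool needs none of these; for a commercial product they are baseline requirements.</p>
<p>Their costs include SDK integration and configuration, account and permission management, monthly fees (billed by event volume), and data privacy compliance (logs are sent to third-party servers).</p>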
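<h2 id="判斷流程">Decision flow</h2>
<h3 id="使用者在哪裡">Where are the users</h3>
<p>Users and server on the same network (personal tools, internal tools, development-time testing) → a self-hosted log endpoint is the lowest-cost choice.</p>
<p>Users on external networks (app store releases, SaaS products, B2B deployments) → the cross-network collection a commercial solution provides is necessary; self-hosting leaves too many edge cases to handle (offline buffering, retries, batched transmission).</p>
<h3 id="log-消費者是誰">Who consumes the logs</h3>
<p>Only the developer → grep/jq is enough; no dashboard is needed.</p>
<p>Non-technical staff included (support, product managers) → a visual dashboard and a search interface are needed, and a commercial solution's UI is the standard answer.</p>
<h3 id="是否需要告警">Is alerting needed</h3>
<p>The developer uses the tool personally and reads logs in real time → no alerting is needed.</p>
<p>There is an operations on-call rotation and problems must surface without anyone watching the logs → alert rules are needed, and commercial solutions have them built in.</p>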
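<h2 id="混合方案">Hybrid approach</h2>
<p>Use a self-hosted log endpoint during development (zero cost, available immediately) and switch to a commercial solution in production. This strategy works only if the API of the logging layer is abstract enough.</p>
<p><code>AppLogger</code> exposes a single logging interface (<code>log(level, name, data)</code>). In debug mode the underlying implementation writes to the console and POSTs to the local endpoint; in release mode it writes to the console and calls the Sentry/Crashlytics SDK. Switching changes only the implementation behind <code>AppLogger</code>, not the call sites.</p>
<p>This abstraction is worth the investment even at the personal-tool stage: even if no commercial solution is needed yet, a single logging interface keeps the management of log points consistent.</p>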
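<p>The real <code>AppLogger</code> lives in the Flutter client and is written in Dart; the following Go sketch only illustrates the shape of the abstraction. The <code>Sink</code> interface, <code>ConsoleSink</code>, and <code>MemorySink</code> are hypothetical names, and <code>MemorySink</code> merely stands in for the debug-mode HTTP sink or a release-mode vendor sink, neither of which is implemented here.</p>

```go
package main

import "fmt"

// Sink is a hypothetical backend: a console, a local HTTP endpoint, or a vendor SDK.
type Sink interface {
	Write(level, name string, data map[string]any)
}

// AppLogger fans every record out to the sinks chosen at construction time.
type AppLogger struct{ sinks []Sink }

// NewAppLogger is the only place that knows which sinks are active.
func NewAppLogger(sinks ...Sink) *AppLogger { return &AppLogger{sinks: sinks} }

// Log is the only method call sites use, so swapping sinks never touches them.
func (l *AppLogger) Log(level, name string, data map[string]any) {
	for _, s := range l.sinks {
		s.Write(level, name, data)
	}
}

// ConsoleSink prints records to standard output.
type ConsoleSink struct{}

func (ConsoleSink) Write(level, name string, data map[string]any) {
	fmt.Printf("[%s] %s %v\n", level, name, data)
}

// MemorySink collects "level name" strings; it is a placeholder for a
// network-backed sink and ignores the data payload.
type MemorySink struct{ Records []string }

func (m *MemorySink) Write(level, name string, data map[string]any) {
	m.Records = append(m.Records, level+" "+name)
}

func main() {
	remote := &MemorySink{}
	logger := NewAppLogger(ConsoleSink{}, remote)
	logger.Log("info", "ConnectionManager", map[string]any{"step": "ws_connect"})
	fmt.Println("records forwarded:", len(remote.Records))
}
```

<p>Moving from debug to release then means passing a different set of sinks to the constructor, while every <code>Log</code> call stays as it is.</p>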
<h2 id="下一步路由">Where to go next</h2>
<ul>
<li>Detailed design of the three log layers → <a href="/blog/testing/02-client-observability/three-layer-log-design/" data-link-title="三層 log 設計" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — 三層各自的職責、詳細程度和啟停控制">Three-layer log design</a></li>
<li>Defining log points in the feature spec → <a href="/blog/testing/02-client-observability/log-point-in-spec/" data-link-title="功能規格中的 log 點定義方法" data-link-desc="把 log 點設計從 debug 階段前移到功能規格階段 — 每個功能的規格文件新增可觀測性欄位，列出啟動 / 步驟 / 錯誤 / 完成四類 log 點">How to define log points in a feature spec</a></li>
<li>Schema design after log collection → <a href="/blog/monitoring/02-log-schema/" data-link-title="模組二：Log Schema 設計" data-link-desc="跨平台統一事件格式、欄位設計、版本演進策略">monitoring Module 2: Log Schema</a></li>
</ul>
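]]></content:encoded></item><item><title>"After-the-fact logs" vs "designed logs": the quality gap</title><link>https://tarrragon.github.io/blog/testing/02-client-observability/hotfix-log-vs-designed-log/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/testing/02-client-observability/hotfix-log-vs-designed-log/</guid><description>&lt;p>After-the-fact logs and designed logs differ in when they are produced and in the quality bar they meet. After-the-fact logs are written under debugging pressure, with the goal of locating this one problem; designed logs are written at the feature-spec stage, with the goal of making any future problem locatable. The quality gap shows clearly in three areas: format consistency, coverage completeness, and long-term maintenance cost.&lt;/p>
&lt;h2 id="格式統一性">Format consistency&lt;/h2>
&lt;p>The &lt;code>developer.log&lt;/code> calls that app_tunnel added during the W2 fixes have inconsistent formats (&lt;a href="https://tarrragon.github.io/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4&lt;/a>). Logs added to different components at different times for different debugging needs each follow their own style:&lt;/p>
&lt;p>Some pass the &lt;code>name:&lt;/code> parameter, so the log can be filtered by component:&lt;/p>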





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-dart" data-lang="dart">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="n">developer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">log&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;WS connected&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nl">name:&lt;/span> &lt;span class="s1">&amp;#39;ConnectionManager&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Some omit it, so the entries blend into the global log and cannot be filtered:&lt;/p>





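&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-dart" data-lang="dart">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="n">developer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">log&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;auth token sent&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Some carry an &lt;code>// i18n-exempt&lt;/code> marker (because the linter flags hardcoded strings), and some forgot it. Some put the error message in the &lt;code>error:&lt;/code> parameter, and some use string concatenation.&lt;/p>
&lt;p>These inconsistencies have a structural cause in how after-the-fact logs are written: each log line was added while solving the problem at hand, with no shared convention and no review. Once it located the problem it was committed, and the next problem brought new log lines, so the format ends up arbitrary.&lt;/p>
&lt;p>Designed logs have naming rules and a format convention before they are written (see &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/log-point-in-spec/" data-link-title="功能規格中的 log 點定義方法" data-link-desc="把 log 點設計從 debug 階段前移到功能規格階段 — 每個功能的規格文件新增可觀測性欄位，列出啟動 / 步驟 / 錯誤 / 完成四類 log 點">How to define log points in a feature spec&lt;/a>). Every log point goes through the same &lt;code>AppLogger&lt;/code> interface; name, level, and structured fields are defined at the spec stage and implemented according to the spec.&lt;/p>
&lt;h2 id="覆蓋完整性">Coverage completeness&lt;/h2>
&lt;p>The coverage of after-the-fact logs is determined by which problems have already occurred. The W2-002 auth token problem led to logs being added to &lt;code>ConnectionManager&lt;/code> and &lt;code>TerminalScreen&lt;/code>, but &lt;code>TtydProtocol&lt;/code>, &lt;code>BiometricService&lt;/code>, &lt;code>CredentialRepository&lt;/code>, and &lt;code>EnrollmentScreen&lt;/code> still have zero logs, because those four components were not bottlenecks during W2 debugging.&lt;/p>
&lt;p>Four of six core components having zero logs means that if the next problem is in &lt;code>BiometricService&lt;/code> (for example, a change in biometric API behavior on a specific iOS version), debugging falls back into the loop of manually adding logs → recompiling → plugging and unplugging the device. After-the-fact logs cover only the paths of known problems and offer no protection against unknown ones.&lt;/p>
&lt;p>The coverage of designed logs is determined by the number of steps in the feature flow. Each feature spec lists log points for every step, whether or not that step has ever failed. &lt;code>BiometricService.authenticate()&lt;/code> has three log points in the spec (start/done/failed) regardless of whether a biometric problem has ever been encountered.&lt;/p>
&lt;h2 id="維護成本">Maintenance cost&lt;/h2>
&lt;p>After-the-fact logs accumulate as debugging goes on, with no central management. Over time:&lt;/p>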
&lt;ul>
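&lt;li>Some logs have trigger conditions that no longer exist (logs tied to bugs that have been fixed), but nobody cleans them up&lt;/li>
&lt;li>Some logs have a format inconsistent with newly added logs, but nobody unifies them&lt;/li>
&lt;li>Some logs lack context (they located the problem at the time because the developer remembered the context; six months later, for someone new, they are not enough)&lt;/li>
&lt;li>Some logs should not appear in release builds, but the condition was forgotten&lt;/li>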
&lt;/ul>
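&lt;p>Designed logs have the spec document as the source of truth. When a feature changes, the log point list in the spec is updated: log points for removed steps are removed, and log points for new steps are added. The lifecycle of a log is bound to the lifecycle of its feature.&lt;/p>
&lt;h2 id="從事後補過渡到設計產物">Transitioning from after-the-fact to designed logs&lt;/h2>
&lt;p>Existing after-the-fact logs do not all need to be rewritten. The transition strategy is:&lt;/p>
&lt;p>&lt;strong>Single entry point&lt;/strong>: build an &lt;code>AppLogger&lt;/code> wrapper and route the existing &lt;code>developer.log&lt;/code> calls through it. This step does not change log content, only how logging is called, so that later format unification and backend switching have one place to happen.&lt;/p></description><content:encoded><![CDATA[<p>After-the-fact logs and designed logs differ in when they are produced and in the quality bar they meet. After-the-fact logs are written under debugging pressure, with the goal of locating this one problem; designed logs are written at the feature-spec stage, with the goal of making any future problem locatable. The quality gap shows clearly in three areas: format consistency, coverage completeness, and long-term maintenance cost.</p>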
<h2 id="格式統一性">Format consistency</h2>
<p>The <code>developer.log</code> calls that app_tunnel added during the W2 fixes have inconsistent formats (<a href="/blog/testing/cases/client-log-absent-debug-cost/" data-link-title="T.C4 Client-side log 缺失導致 debug 只能靠實機盲測" data-link-desc="Flutter app 六個核心元件中只有兩個有 log（且全是 W2 hotfix 補的），連線失敗時開發者無法從任何 log 判斷失敗發生在哪一步 — 被迫用最昂貴的 debug 方式：插拔裝置反覆測試">T.C4</a>). Logs added to different components at different times for different debugging needs each follow their own style:</p>
<p>Some pass the <code>name:</code> parameter, so the log can be filtered by component:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-dart" data-lang="dart"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">developer</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="s1">&#39;WS connected&#39;</span><span class="p">,</span> <span class="nl">name:</span> <span class="s1">&#39;ConnectionManager&#39;</span><span class="p">);</span></span></span></code></pre></div><p>Some omit it, so the entries blend into the global log and cannot be filtered:</p>





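<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-dart" data-lang="dart"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">developer</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="s1">&#39;auth token sent&#39;</span><span class="p">);</span></span></span></code></pre></div><p>Some carry an <code>// i18n-exempt</code> marker (because the linter flags hardcoded strings), and some forgot it. Some put the error message in the <code>error:</code> parameter, and some use string concatenation.</p>
<p>These inconsistencies have a structural cause in how after-the-fact logs are written: each log line was added while solving the problem at hand, with no shared convention and no review. Once it located the problem it was committed, and the next problem brought new log lines, so the format ends up arbitrary.</p>
<p>Designed logs have naming rules and a format convention before they are written (see <a href="/blog/testing/02-client-observability/log-point-in-spec/" data-link-title="功能規格中的 log 點定義方法" data-link-desc="把 log 點設計從 debug 階段前移到功能規格階段 — 每個功能的規格文件新增可觀測性欄位，列出啟動 / 步驟 / 錯誤 / 完成四類 log 點">How to define log points in a feature spec</a>). Every log point goes through the same <code>AppLogger</code> interface; name, level, and structured fields are defined at the spec stage and implemented according to the spec.</p>
<h2 id="覆蓋完整性">Coverage completeness</h2>
<p>The coverage of after-the-fact logs is determined by which problems have already occurred. The W2-002 auth token problem led to logs being added to <code>ConnectionManager</code> and <code>TerminalScreen</code>, but <code>TtydProtocol</code>, <code>BiometricService</code>, <code>CredentialRepository</code>, and <code>EnrollmentScreen</code> still have zero logs, because those four components were not bottlenecks during W2 debugging.</p>
<p>Four of six core components having zero logs means that if the next problem is in <code>BiometricService</code> (for example, a change in biometric API behavior on a specific iOS version), debugging falls back into the loop of manually adding logs → recompiling → plugging and unplugging the device. After-the-fact logs cover only the paths of known problems and offer no protection against unknown ones.</p>
<p>The coverage of designed logs is determined by the number of steps in the feature flow. Each feature spec lists log points for every step, whether or not that step has ever failed. <code>BiometricService.authenticate()</code> has three log points in the spec (start/done/failed) regardless of whether a biometric problem has ever been encountered.</p>
<h2 id="維護成本">Maintenance cost</h2>
<p>After-the-fact logs accumulate as debugging goes on, with no central management. Over time:</p>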
<ul>
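<li>Some logs have trigger conditions that no longer exist (logs tied to bugs that have been fixed), but nobody cleans them up</li>
<li>Some logs have a format inconsistent with newly added logs, but nobody unifies them</li>
<li>Some logs lack context (they located the problem at the time because the developer remembered the context; six months later, for someone new, they are not enough)</li>
<li>Some logs should not appear in release builds, but the condition was forgotten</li>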
</ul>
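<p>Designed logs have the spec document as the source of truth. When a feature changes, the log point list in the spec is updated: log points for removed steps are removed, and log points for new steps are added. The lifecycle of a log is bound to the lifecycle of its feature.</p>
<h2 id="從事後補過渡到設計產物">Transitioning from after-the-fact to designed logs</h2>
<p>Existing after-the-fact logs do not all need to be rewritten. The transition strategy is:</p>
<p><strong>Single entry point</strong>: build an <code>AppLogger</code> wrapper and route the existing <code>developer.log</code> calls through it. This step does not change log content, only how logging is called, so that later format unification and backend switching have one place to happen.</p>
<p><strong>Fill in the spec</strong>: write a log point spec table (the four categories of log points) for each feature and compare existing logs against it. A log point in the spec but not in the code is a coverage gap: add it. A log point in the code but not in the spec may be a stale debug log: evaluate whether to delete it.</p>
<p><strong>New features follow the designed-log process</strong>: starting with the next new feature, the feature spec includes an observability field. Logs for new features are of designed quality from the start.</p>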
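<p>The comparison between the spec and the code in the second step is a two-way set difference. A minimal sketch, assuming log points are identified by plain string names; the function <code>diffLogPoints</code> and the example names such as <code>biometric.auth.start</code> are hypothetical and do not come from the app_tunnel codebase:</p>

```go
package main

import (
	"fmt"
	"sort"
)

// diffLogPoints compares the log point names listed in a spec with those
// found in code. gaps are in the spec but missing from code (coverage gaps);
// stale are in code but absent from the spec (candidates for removal).
func diffLogPoints(spec, code []string) (gaps, stale []string) {
	inSpec := make(map[string]bool, len(spec))
	for _, p := range spec {
		inSpec[p] = true
	}
	inCode := make(map[string]bool, len(code))
	for _, p := range code {
		inCode[p] = true
	}
	for p := range inSpec {
		if !inCode[p] {
			gaps = append(gaps, p)
		}
	}
	for p := range inCode {
		if !inSpec[p] {
			stale = append(stale, p)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(gaps)
	sort.Strings(stale)
	return gaps, stale
}

func main() {
	spec := []string{"biometric.auth.start", "biometric.auth.done", "biometric.auth.failed"}
	code := []string{"biometric.auth.start", "ws.tmp.debug"}
	gaps, stale := diffLogPoints(spec, code)
	fmt.Println("coverage gaps:", gaps)
	fmt.Println("stale candidates:", stale)
}
```

<p>Entries reported as stale are only candidates: as the step above says, each one still needs a human decision on whether it should be deleted or added to the spec.</p>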
<p>The first step of the transition is the single entry point; the concrete log point spec format is in <a href="/blog/testing/02-client-observability/log-point-in-spec/" data-link-title="功能規格中的 log 點定義方法" data-link-desc="把 log 點設計從 debug 階段前移到功能規格階段 — 每個功能的規格文件新增可觀測性欄位，列出啟動 / 步驟 / 錯誤 / 完成四類 log 點">How to define log points in a feature spec</a>. Which layer each log point in the spec belongs to (connection lifecycle / protocol / user behavior) is defined in <a href="/blog/testing/02-client-observability/three-layer-log-design/" data-link-title="三層 log 設計" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — 三層各自的職責、詳細程度和啟停控制">Three-layer log design</a>. Whether collected logs are then handled by a self-hosted or a commercial solution is covered by the decision flow in <a href="/blog/testing/02-client-observability/log-endpoint-tradeoff/" data-link-title="自架 log endpoint vs 商業方案的取捨判斷" data-link-desc="自用工具用自架 log receiver（20 行 Go &#43; grep）、商業 app 用 Sentry/Crashlytics — 判斷依據是使用者規模和 debug 需求">Self-hosted log endpoint vs commercial solutions</a>.</p>
]]></content:encoded></item><item><title>T.C4 Missing client-side logs leave blind on-device testing as the only way to debug</title><link>https://tarrragon.github.io/blog/testing/cases/client-log-absent-debug-cost/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/testing/cases/client-log-absent-debug-cost/</guid><description>&lt;p>The core purpose of this case is to explain why client-side log design should be completed at the feature planning stage rather than added during debugging. Logs are not a debugging tool; they are observability infrastructure.&lt;/p>
&lt;h2 id="觀察">Observations&lt;/h2>
&lt;p>Log coverage of app_tunnel's six core components before on-device testing:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Component&lt;/th>
 &lt;th>Log points&lt;/th>
 &lt;th>Notes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>ConnectionManager&lt;/td>
 &lt;td>0 → 10&lt;/td>
 &lt;td>&lt;code>developer.log&lt;/code> calls added after the W2 fixes&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>TerminalScreen&lt;/td>
 &lt;td>0 → 5&lt;/td>
 &lt;td>Added after the W2 fixes&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>TtydProtocol&lt;/td>
 &lt;td>0&lt;/td>
 &lt;td>No logs for encode/decode/buildAuth&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>BiometricService&lt;/td>
 &lt;td>0&lt;/td>
 &lt;td>No logs for isAvailable/authenticate results&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>CredentialRepository&lt;/td>
 &lt;td>0&lt;/td>
 &lt;td>No logs for load/save/delete operations&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>EnrollmentScreen&lt;/td>
 &lt;td>0&lt;/td>
 &lt;td>No logs for QR scanning/parsing/saving&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
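&lt;p>Debugging W2-004 (P0: the WS stream does not fire on a physical iOS device): no log could show at which step the problem occurred in biometric → credential → WS connect → auth token → stream listen. The developer had to add &lt;code>developer.log&lt;/code> calls to every function by hand, recompile, plug and unplug the device to test, and repeat this several times before pinning it down to a stream subscription timing problem.&lt;/p>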
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Metric&lt;/th>
 &lt;th>Value&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Debug cost&lt;/td>
 &lt;td>About 3-5 minutes per modify → compile → deploy → test cycle&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Time to locate W2-002 (auth token)&lt;/td>
 &lt;td>About 30 minutes of repeated testing&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>With connection lifecycle logs&lt;/td>
 &lt;td>The first connection would already show "no auth token sent after Step 3"&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
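&lt;h2 id="判讀">Interpretation&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Missing logs raise debug cost from seconds to minutes&lt;/strong>. If ConnectionManager had been designed at the planning stage with five step logs (Step 1: biometric → Step 2: credential → Step 3: WS connect → Step 4: auth token → Step 5: listen stream), the W2-002 auth token problem would have been visible in the log on the first connection as "Step 3 completed, Step 4 not executed".&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>After-the-fact logs are lower in quality&lt;/strong>. The &lt;code>developer.log&lt;/code> calls added during the W2 fixes have inconsistent formats (some pass &lt;code>name:&lt;/code> and some do not; some carry the &lt;code>// i18n-exempt&lt;/code> marker and some forgot it), no unified log levels, and no structured fields. After-the-fact logs are a firefighting tool, not an observability design.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>A personal tool is best served by self-hosted log collection&lt;/strong>. app_tunnel's server and client are on the same machine (or the same Tailscale tailnet), so the client can POST over HTTP directly to a local log endpoint, with no need for Sentry or Crashlytics. A JSON log receiver written in Go (20 lines) plus grep is a complete debugging toolchain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Log design is part of the feature spec&lt;/strong>. The spec for the feature "connect to a ttyd terminal" is not just "establish a WS connection"; it also includes "log every step, log failures, log success". Just as an API spec has to define request/response, a connection feature has to define its log points.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="策略">Strategy&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>List log points at the feature-spec stage&lt;/strong>: add an "observability" field to each feature's spec document, listing four categories of log points: start, step, error, and completion.&lt;/li>
&lt;li>&lt;strong>Build a unified logging layer&lt;/strong>: wrap &lt;code>developer.log&lt;/code> in &lt;code>AppLogger&lt;/code> to unify name, level, and format. Use &lt;code>developer.log&lt;/code> during development, with the option to switch to an HTTP log endpoint later.&lt;/li>
&lt;li>&lt;strong>Self-hosted log endpoint&lt;/strong>: a local Go server exposes a &lt;code>/log&lt;/code> POST endpoint, receives JSON logs, and writes them to a file. On the client, &lt;code>AppLogger&lt;/code> in debug mode writes to the console and also POSTs to the endpoint. During development, query with grep; no dashboard is needed.&lt;/li>
&lt;li>&lt;strong>Protocol logs as a separate layer&lt;/strong>: record WebSocket frame types, payload prefixes, and auth handshake results separately from business logs. It should be possible to switch this layer off in release mode.&lt;/li>
&lt;/ol>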
&lt;h2 id="下一步路由">Where to go next&lt;/h2>
&lt;ul>
&lt;li>To design a client-side logging approach → &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/" data-link-title="模組二：客戶端可觀測性" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — log 設計是功能規格的一部分">Module 2: Client-side observability&lt;/a>&lt;/li>
&lt;li>To understand the three-layer log design → &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/three-layer-log-design/" data-link-title="三層 log 設計" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — 三層各自的職責、詳細程度和啟停控制">Three-layer log design&lt;/a>&lt;/li>
&lt;li>To build a self-hosted log endpoint → &lt;a href="https://tarrragon.github.io/blog/testing/02-client-observability/log-endpoint-tradeoff/" data-link-title="自架 log endpoint vs 商業方案的取捨判斷" data-link-desc="自用工具用自架 log receiver（20 行 Go &amp;#43; grep）、商業 app 用 Sentry/Crashlytics — 判斷依據是使用者規模和 debug 需求">Self-hosted log endpoint vs commercial solutions&lt;/a>&lt;/li>
&lt;/ul></description><content:encoded><![CDATA[<p>The core purpose of this case is to explain why client-side log design should be completed at the feature planning stage rather than added during debugging. Logs are not a debugging tool; they are observability infrastructure.</p>
<h2 id="觀察">Observations</h2>
<p>Log coverage of app_tunnel's six core components before on-device testing:</p>
<table>
  <thead>
      <tr>
          <th>Component</th>
          <th>Log points</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>ConnectionManager</td>
          <td>0 → 10</td>
          <td><code>developer.log</code> calls added after the W2 fixes</td>
      </tr>
      <tr>
          <td>TerminalScreen</td>
          <td>0 → 5</td>
          <td>Added after the W2 fixes</td>
      </tr>
      <tr>
          <td>TtydProtocol</td>
          <td>0</td>
          <td>No logs for encode/decode/buildAuth</td>
      </tr>
      <tr>
          <td>BiometricService</td>
          <td>0</td>
          <td>No logs for isAvailable/authenticate results</td>
      </tr>
      <tr>
          <td>CredentialRepository</td>
          <td>0</td>
          <td>No logs for load/save/delete operations</td>
      </tr>
      <tr>
          <td>EnrollmentScreen</td>
          <td>0</td>
          <td>No logs for QR scanning/parsing/saving</td>
      </tr>
  </tbody>
</table>
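<p>Debugging W2-004 (P0: the WS stream does not fire on a physical iOS device): no log could show at which step the problem occurred in biometric → credential → WS connect → auth token → stream listen. The developer had to add <code>developer.log</code> calls to every function by hand, recompile, plug and unplug the device to test, and repeat this several times before pinning it down to a stream subscription timing problem.</p>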
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Value</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Debug cost</td>
          <td>About 3-5 minutes per modify → compile → deploy → test cycle</td>
      </tr>
      <tr>
          <td>Time to locate W2-002 (auth token)</td>
          <td>About 30 minutes of repeated testing</td>
      </tr>
      <tr>
          <td>With connection lifecycle logs</td>
          <td>The first connection would already show "no auth token sent after Step 3"</td>
      </tr>
  </tbody>
</table>
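<h2 id="判讀">Interpretation</h2>
<ol>
<li>
<p><strong>Missing logs raise debug cost from seconds to minutes</strong>. If ConnectionManager had been designed at the planning stage with five step logs (Step 1: biometric → Step 2: credential → Step 3: WS connect → Step 4: auth token → Step 5: listen stream), the W2-002 auth token problem would have been visible in the log on the first connection as "Step 3 completed, Step 4 not executed".</p>
</li>
<li>
<p><strong>After-the-fact logs are lower in quality</strong>. The <code>developer.log</code> calls added during the W2 fixes have inconsistent formats (some pass <code>name:</code> and some do not; some carry the <code>// i18n-exempt</code> marker and some forgot it), no unified log levels, and no structured fields. After-the-fact logs are a firefighting tool, not an observability design.</p>
</li>
<li>
<p><strong>A personal tool is best served by self-hosted log collection</strong>. app_tunnel's server and client are on the same machine (or the same Tailscale tailnet), so the client can POST over HTTP directly to a local log endpoint, with no need for Sentry or Crashlytics. A JSON log receiver written in Go (20 lines) plus grep is a complete debugging toolchain.</p>
</li>
<li>
<p><strong>Log design is part of the feature spec</strong>. The spec for the feature "connect to a ttyd terminal" is not just "establish a WS connection"; it also includes "log every step, log failures, log success". Just as an API spec has to define request/response, a connection feature has to define its log points.</p>
</li>
</ol>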
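<p>The first point can be made concrete: once each of the five connection steps emits a log, finding where a connection stopped is a lookup rather than a rebuild-and-retest loop. A minimal sketch follows; the step identifiers such as <code>ws_connect</code> and the function <code>firstMissingStep</code> are hypothetical names chosen here, not names from app_tunnel.</p>

```go
package main

import "fmt"

// connectSteps lists the five steps of the connection flow in order.
// The identifiers are assumptions made for this sketch.
var connectSteps = []string{"biometric", "credential", "ws_connect", "auth_token", "listen_stream"}

// firstMissingStep returns the 1-based position and name of the first
// expected step that never appeared in the log, or (0, "") if all appeared.
func firstMissingStep(expected, logged []string) (int, string) {
	seen := make(map[string]bool, len(logged))
	for _, s := range logged {
		seen[s] = true
	}
	for i, s := range expected {
		if !seen[s] {
			return i + 1, s
		}
	}
	return 0, ""
}

func main() {
	// Steps observed in the log for one failed connection attempt.
	logged := []string{"biometric", "credential", "ws_connect"}
	n, name := firstMissingStep(connectSteps, logged)
	fmt.Printf("first missing: step %d (%s)\n", n, name)
	// prints "first missing: step 4 (auth_token)"
}
```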
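<h2 id="策略">Strategy</h2>
<ol>
<li><strong>List log points at the feature-spec stage</strong>: add an "observability" field to each feature's spec document, listing four categories of log points: start, step, error, and completion.</li>
<li><strong>Build a unified logging layer</strong>: wrap <code>developer.log</code> in <code>AppLogger</code> to unify name, level, and format. Use <code>developer.log</code> during development, with the option to switch to an HTTP log endpoint later.</li>
<li><strong>Self-hosted log endpoint</strong>: a local Go server exposes a <code>/log</code> POST endpoint, receives JSON logs, and writes them to a file. On the client, <code>AppLogger</code> in debug mode writes to the console and also POSTs to the endpoint. During development, query with grep; no dashboard is needed.</li>
<li><strong>Protocol logs as a separate layer</strong>: record WebSocket frame types, payload prefixes, and auth handshake results separately from business logs. It should be possible to switch this layer off in release mode.</li>
</ol>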
<h2 id="下一步路由">Where to go next</h2>
<ul>
<li>To design a client-side logging approach → <a href="/blog/testing/02-client-observability/" data-link-title="模組二：客戶端可觀測性" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — log 設計是功能規格的一部分">Module 2: Client-side observability</a></li>
<li>To understand the three-layer log design → <a href="/blog/testing/02-client-observability/three-layer-log-design/" data-link-title="三層 log 設計" data-link-desc="連線生命週期 log、protocol 訊息 log、使用者行為 log — 三層各自的職責、詳細程度和啟停控制">Three-layer log design</a></li>
<li>To build a self-hosted log endpoint → <a href="/blog/testing/02-client-observability/log-endpoint-tradeoff/" data-link-title="自架 log endpoint vs 商業方案的取捨判斷" data-link-desc="自用工具用自架 log receiver（20 行 Go &#43; grep）、商業 app 用 Sentry/Crashlytics — 判斷依據是使用者規模和 debug 需求">Self-hosted log endpoint vs commercial solutions</a></li>
</ul>
]]></content:encoded></item><item><title>6.5 How to add structured recording fields</title><link>https://tarrragon.github.io/blog/go/06-practical/structured-recording/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go/06-practical/structured-recording/</guid><description>&lt;p>The core rule for adding a structured recording field is to first decide whether the information is for engineers to debug, for the system to replay, or for users to query. Different purposes map to different recording boundaries, and data should go into the &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log&lt;/a>, the &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/event-log/" data-link-title="Event Log" data-link-desc="說明事件歷史如何保存、重播與支援跨服務資料重建">event log&lt;/a>, or the repository according to its purpose.&lt;/p>
&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After this chapter you will be able to:&lt;/p>
&lt;ol>
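&lt;li>Distinguish between structured logs, domain event logs, and state repositories&lt;/li>
&lt;li>Design stable log field names&lt;/li>
&lt;li>Decide which data should not be written to logs&lt;/li>
&lt;li>Use &lt;code>EventLog.Append&lt;/code> to express the event recording boundary&lt;/li>
&lt;li>Test stable fields rather than free text&lt;/li>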
&lt;/ol>
&lt;hr>
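&lt;h2 id="觀察先判斷記錄用途">[Observation] Decide the recording purpose first&lt;/h2>
&lt;p>The core question of a recording boundary is whom the data serves. Debugging by engineers, replay by the system, and queries by users are three different purposes, and they map to three different storage and format responsibilities.&lt;/p>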
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Record type&lt;/th>
 &lt;th>Purpose&lt;/th>
 &lt;th>Examples&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>structured log&lt;/td>
 &lt;td>Operational diagnosis, debugging, aggregate queries&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue&lt;/a> full, event rejected, worker failed&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>domain event log&lt;/td>
 &lt;td>Recording facts that have occurred, audit, replay&lt;/td>
 &lt;td>&lt;code>notification.created&lt;/code>, &lt;code>job.failed&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>state repository&lt;/td>
 &lt;td>Querying current state or projections&lt;/td>
 &lt;td>job current status, notification summary&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
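&lt;p>Structured logs serve operational diagnosis, the event log stores normalized facts, and the state repository answers questions about current state. Only once the purpose is clear do you know where a field belongs. This judgment matters more than which logging package you choose: the tool decides how to write, the purpose decides what to write and where.&lt;/p>
&lt;h2 id="判讀structured-log-是操作訊號">[Interpretation] Structured logs are operational signals&lt;/h2>
&lt;p>The core purpose of a structured log is to let engineers know what the system is doing and to query it by field. It should record operational signals, not complete business data.&lt;/p>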





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event accepted&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;layer&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;adapter&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event_type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">string&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;subject_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SubjectID&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;correlation_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">CorrelationID&lt;/span>&lt;span class="p">,&lt;/span>
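&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>message&lt;/code> is for people to read; the fields are for query tools. If you later need to check whether a certain kind of event is flooding into the system, the &lt;code>event_type&lt;/code> field is more reliable than text search.&lt;/p>
&lt;p>Common log fields can be defined in a helper up front, so that different places do not spell different names:&lt;/p>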





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">LogAttrsForEvent&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">event&lt;/span> &lt;span class="nx">DomainEvent&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">any&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">any&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event_type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">string&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;subject_kind&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">string&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SubjectKind&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;subject_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SubjectID&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;correlation_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">CorrelationID&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;schema_version&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SchemaVersion&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>At the call site, expand the fields:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;event accepted&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nf">LogAttrsForEvent&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">event&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">...&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>What this helper protects is the &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log-schema/" data-link-title="Log Schema" data-link-desc="說明結構化 log 欄位如何支援搜尋、關聯與事故排查">log schema&lt;/a>. Queries and the &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/dashboard/" data-link-title="Dashboard" data-link-desc="說明 dashboard 如何把關鍵訊號組成可判讀的服務狀態畫面">dashboard&lt;/a> stay stable only when field names are stable.&lt;/p>
&lt;h2 id="策略reason-欄位要像-enum">[Strategy] The reason field should behave like an enum&lt;/h2>
&lt;p>The core meaning of &lt;code>reason&lt;/code> is an aggregatable cause category. It should use a small set of stable values; the full error message goes in the &lt;code>error&lt;/code> field to aid diagnosis.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">const&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nx">ReasonInvalidPayload&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="s">&amp;#34;invalid_payload&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="nx">ReasonQueueFull&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="s">&amp;#34;queue_full&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nx">ReasonDuplicateEvent&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="s">&amp;#34;duplicate_event&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="nx">ReasonTimeout&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="s">&amp;#34;timeout&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When recording a rejected event:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Warn&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event rejected&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;layer&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;adapter&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;reason&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ReasonInvalidPayload&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;event_type&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">string&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>&lt;span class="p">,&lt;/span>
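&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>reason&lt;/code> is for statistics, &lt;code>error&lt;/code> is for diagnosis, and the message is for quick human understanding. Do not merge the three into one big string.&lt;/p>
&lt;h2 id="判讀event-log-記錄-normalized-fact">[Interpretation] The event log records normalized facts&lt;/h2>
&lt;p>The core responsibility of the domain event log is to store normalized domain events. It records facts the system acknowledges; raw requests, debug logs, and current state each belong to a different recording boundary.&lt;/p>
&lt;p>Define the port first:&lt;/p>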





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">EventLog&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nf">Append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span> &lt;span class="nx">DomainEvent&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">error&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>An in-memory implementation can start like this:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">InMemoryEventLog&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">mu&lt;/span> &lt;span class="nx">sync&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Mutex&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="nx">events&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="nx">DomainEvent&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">NewInMemoryEventLog&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">InMemoryEventLog&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">InMemoryEventLog&lt;/span>&lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">l&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">InMemoryEventLog&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span> &lt;span class="nx">DomainEvent&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="nx">l&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">mu&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Lock&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> &lt;span class="k">defer&lt;/span> &lt;span class="nx">l&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">mu&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Unlock&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> &lt;span class="nx">l&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">events&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nb">append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">l&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">events&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nf">cloneDomainEvent&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">event&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The event log should store the stable fields of the &lt;code>DomainEvent&lt;/code> envelope, such as event ID, type, subject, schema version, and occurred/received time. It does not need to store the adapter's raw input unless you have explicitly designed a raw &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/audit-log/" data-link-title="Audit Log" data-link-desc="說明高風險操作如何留下可追溯、可稽核的紀錄">audit log&lt;/a>.&lt;/p></description><content:encoded><![CDATA[<p>The core rule for adding a structured record field is to decide first whether the information is for engineers to debug, for the system to replay, or for users to query. Each purpose maps to a different recording boundary, and data should go into the <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a>, the <a href="/blog/backend/knowledge-cards/event-log/" data-link-title="Event Log" data-link-desc="說明事件歷史如何保存、重播與支援跨服務資料重建">event log</a>, or the repository according to its purpose.</p>
<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter, you will be able to:</p>
<ol>
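<li>Distinguish structured logs, domain event logs, and state repositories</li>
<li>Design stable log field names</li>
<li>Decide which data should stay out of logs</li>
<li>Express the event recording boundary with <code>EventLog.Append</code></li>
<li>Test stable fields rather than free-form text</li>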
</ol>
<hr>
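<h2 id="觀察先判斷記錄用途">[Observe] Decide the recording purpose first</h2>
<p>The central question for a recording boundary is whom the data serves. Engineer debugging, system replay, and user queries are three distinct purposes, each with its own storage and format responsibilities.</p>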
<table>
  <thead>
      <tr>
          <th>Record type</th>
          <th>Purpose</th>
          <th>Examples</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>structured log</td>
          <td>Operational diagnosis, debugging, aggregate queries</td>
          <td><a href="/blog/backend/knowledge-cards/queue/" data-link-title="Queue" data-link-desc="說明 queue 如何保存等待處理的工作並形成容量邊界">queue</a> full、event rejected、worker failed</td>
      </tr>
      <tr>
          <td>domain event log</td>
          <td>Recording facts that have occurred, audit, replay</td>
          <td><code>notification.created</code>、<code>job.failed</code></td>
      </tr>
      <tr>
          <td>state repository</td>
          <td>Querying current state or projections</td>
          <td>job current status、notification summary</td>
      </tr>
  </tbody>
</table>
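<p>The structured log serves operational diagnosis, the event log stores normalized facts, and the state repository answers questions about current state. Only once the purpose is clear do you know where a field belongs. This decision matters more than the choice of logging package: the tool determines how to write, while the purpose determines what to write and where it goes.</p>
<h2 id="判讀structured-log-是操作訊號">[Interpret] Structured logs are operational signals</h2>
<p>The core purpose of a structured log is to tell engineers what the system is doing and to make that queryable by field. It should record operational signals, not complete business data.</p>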





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;event accepted&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;adapter&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="nb">string</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">Type</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;event_id&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">ID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="s">&#34;subject_id&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">SubjectID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="s">&#34;correlation_id&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">CorrelationID</span><span class="p">,</span>
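</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p><code>message</code> is for people to read; fields are for query tools. If you later need to check whether a certain kind of event is flooding the system, the <code>event_type</code> field is more reliable than text search.</p>
<p>Common log fields can be defined in a helper up front so that different call sites do not spell the names differently:</p>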





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">LogAttrsForEvent</span><span class="p">(</span><span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="p">[]</span><span class="kt">any</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">return</span> <span class="p">[]</span><span class="kt">any</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="s">&#34;event_id&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">ID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="nb">string</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">Type</span><span class="p">),</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="s">&#34;subject_kind&#34;</span><span class="p">,</span> <span class="nb">string</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">SubjectKind</span><span class="p">),</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="s">&#34;subject_id&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">SubjectID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="s">&#34;correlation_id&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">CorrelationID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="s">&#34;schema_version&#34;</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">SchemaVersion</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>At the call site, expand the fields:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;event accepted&#34;</span><span class="p">,</span> <span class="nf">LogAttrsForEvent</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span><span class="o">...</span><span class="p">)</span></span></span></code></pre></div><p>What this helper protects is the <a href="/blog/backend/knowledge-cards/log-schema/" data-link-title="Log Schema" data-link-desc="說明結構化 log 欄位如何支援搜尋、關聯與事故排查">log schema</a>. Queries and <a href="/blog/backend/knowledge-cards/dashboard/" data-link-title="Dashboard" data-link-desc="說明 dashboard 如何把關鍵訊號組成可判讀的服務狀態畫面">dashboard</a> panels stay stable only when field names stay stable.</p>
<h2 id="策略reason-欄位要像-enum">[Strategy] The reason field should behave like an enum</h2>
<p>The core meaning of <code>reason</code> is an aggregatable cause category. It should draw from a small set of stable values; the full error message goes in the <code>error</code> field to aid diagnosis.</p>
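<p>To see which field names actually reach the log backend, one option is to route the helper through a JSON handler and decode the emitted line. The sketch below assumes the standard <code>log/slog</code> package; the simplified <code>DomainEvent</code> field types and the <code>recordFor</code> function are hypothetical and exist only for illustration.</p>

```go
package main

import (
	"bytes"
	"encoding/json"
	"log/slog"
)

// DomainEvent is a simplified stand-in for the envelope in the text.
// The plain string and int field types are assumptions.
type DomainEvent struct {
	ID            string
	Type          string
	SubjectKind   string
	SubjectID     string
	CorrelationID string
	SchemaVersion int
}

// LogAttrsForEvent returns alternating key/value pairs with stable key names.
func LogAttrsForEvent(event DomainEvent) []any {
	return []any{
		"event_id", event.ID,
		"event_type", event.Type,
		"subject_kind", event.SubjectKind,
		"subject_id", event.SubjectID,
		"correlation_id", event.CorrelationID,
		"schema_version", event.SchemaVersion,
	}
}

// recordFor (hypothetical) logs one event through a JSON handler and decodes
// the emitted line back into a map so the field names can be inspected.
func recordFor(event DomainEvent) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("event accepted", LogAttrsForEvent(event)...)

	var record map[string]any
	if err := json.Unmarshal(buf.Bytes(), &record); err != nil {
		return nil, err
	}
	return record, nil
}
```

<p>The decoded record carries the handler's own keys such as <code>msg</code> and <code>level</code> next to the helper's keys, which is the shape a query tool would filter on.</p>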





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">const</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">ReasonInvalidPayload</span> <span class="p">=</span> <span class="s">&#34;invalid_payload&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nx">ReasonQueueFull</span>      <span class="p">=</span> <span class="s">&#34;queue_full&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nx">ReasonDuplicateEvent</span> <span class="p">=</span> <span class="s">&#34;duplicate_event&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="nx">ReasonTimeout</span>        <span class="p">=</span> <span class="s">&#34;timeout&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>When logging a rejected event:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;event rejected&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;adapter&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="nx">ReasonInvalidPayload</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;event_type&#34;</span><span class="p">,</span> <span class="nb">string</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">Type</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">,</span>
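</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p><code>reason</code> is for aggregation, <code>error</code> is for diagnosis, and the message is for quick human reading. Keep the three separate rather than merging them into one long string.</p>
<h2 id="判讀event-log-記錄-normalized-fact">[Interpret] The event log records normalized facts</h2>
<p>The core responsibility of the domain event log is to store normalized domain events. It records facts the system has accepted; raw requests, debug logs, and current state each belong to a different recording boundary.</p>
<p>Start by defining the port:</p>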
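<p>One way to keep <code>reason</code> within a small set is to derive it from the error chain in a single place. The sketch below is a minimal illustration under assumptions: the sentinel errors, <code>ReasonUnknown</code>, and <code>ReasonFor</code> are hypothetical names that do not appear in the original text.</p>

```go
package main

import (
	"context"
	"errors"
)

// Stable reason values, mirroring the constants shown in the text.
const (
	ReasonInvalidPayload = "invalid_payload"
	ReasonQueueFull      = "queue_full"
	ReasonDuplicateEvent = "duplicate_event"
	ReasonTimeout        = "timeout"
	// ReasonUnknown is a hypothetical fallback, not part of the original set.
	ReasonUnknown = "unknown"
)

// Hypothetical sentinel errors; a real codebase would define its own.
var (
	ErrInvalidPayload = errors.New("invalid payload")
	ErrQueueFull      = errors.New("queue full")
	ErrDuplicateEvent = errors.New("duplicate event")
)

// ReasonFor maps an error chain to exactly one stable reason value.
// Anything it does not recognize falls back to ReasonUnknown, so the
// set of values seen by aggregation stays closed.
func ReasonFor(err error) string {
	switch {
	case errors.Is(err, ErrInvalidPayload):
		return ReasonInvalidPayload
	case errors.Is(err, ErrQueueFull):
		return ReasonQueueFull
	case errors.Is(err, ErrDuplicateEvent):
		return ReasonDuplicateEvent
	case errors.Is(err, context.DeadlineExceeded):
		return ReasonTimeout
	default:
		return ReasonUnknown
	}
}
```

<p>A call site could then pass <code>"reason", ReasonFor(err), "error", err</code>, which keeps the aggregatable value and the diagnostic detail in separate fields.</p>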





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">type</span> <span class="nx">EventLog</span> <span class="kd">interface</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nf">Append</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="kt">error</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>An in-memory implementation can start like this:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">InMemoryEventLog</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">mu</span>     <span class="nx">sync</span><span class="p">.</span><span class="nx">Mutex</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">events</span> <span class="p">[]</span><span class="nx">DomainEvent</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="nf">NewInMemoryEventLog</span><span class="p">()</span> <span class="o">*</span><span class="nx">InMemoryEventLog</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">return</span> <span class="o">&amp;</span><span class="nx">InMemoryEventLog</span><span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">l</span> <span class="o">*</span><span class="nx">InMemoryEventLog</span><span class="p">)</span> <span class="nf">Append</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nx">l</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nf">Lock</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">defer</span> <span class="nx">l</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nf">Unlock</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="nx">l</span><span class="p">.</span><span class="nx">events</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">events</span><span class="p">,</span> <span class="nf">cloneDomainEvent</span><span class="p">(</span><span class="nx">event</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">return</span> <span class="kc">nil</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The event log should store the stable fields of the <code>DomainEvent</code> envelope, such as event ID, type, subject, schema version, and occurred/received time. It does not need to store the adapter's raw input unless you have explicitly designed a raw <a href="/blog/backend/knowledge-cards/audit-log/" data-link-title="Audit Log" data-link-desc="說明高風險操作如何留下可追溯、可稽核的紀錄">audit log</a>.</p>
<h2 id="執行event-log-要保護-copy-boundary">[Execute] The event log must protect its copy boundary</h2>
<p>The data held by the event log is also internal state. If an event contains a slice, map, or <code>json.RawMessage</code>, both append and read must prevent outside code from modifying the internal data.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">func</span> <span class="nf">cloneDomainEvent</span><span class="p">(</span><span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="nx">DomainEvent</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nx">cloned</span> <span class="o">:=</span> <span class="nx">event</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="k">if</span> <span class="nx">event</span><span class="p">.</span><span class="nx">Payload</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="nx">cloned</span><span class="p">.</span><span class="nx">Payload</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">json</span><span class="p">.</span><span class="nf">RawMessage</span><span class="p">(</span><span class="kc">nil</span><span class="p">),</span> <span class="nx">event</span><span class="p">.</span><span class="nx">Payload</span><span class="o">...</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="k">return</span> <span class="nx">cloned</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>If you expose a query method, it should return copies as well:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">l</span> <span class="o">*</span><span class="nx">InMemoryEventLog</span><span class="p">)</span> <span class="nf">List</span><span class="p">()</span> <span class="p">[]</span><span class="nx">DomainEvent</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">l</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nf">Lock</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="k">defer</span> <span class="nx">l</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nf">Unlock</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nx">result</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="nx">DomainEvent</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">events</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="k">for</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">event</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">l</span><span class="p">.</span><span class="nx">events</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="nx">result</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">=</span> <span class="nf">cloneDomainEvent</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">return</span> <span class="nx">result</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>What is shown here is a teaching-grade recording boundary. A real event store also needs persistence, ordering, schema <a href="/blog/backend/knowledge-cards/migration/" data-link-title="Migration" data-link-desc="說明系統如何把資料、流量或結構從舊狀態移到新狀態">migration</a>, a replay strategy, and transactional semantics.</p>
<h2 id="策略state-repository-保存目前狀態">[Strategy] The state repository stores current state</h2>
<p>The core responsibility of the state repository is to answer questions about current state. It can be updated from events, but its purpose differs from that of the event log, which stores every historical fact.</p>
<p>For example:</p>
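<p>The copy boundary can be checked by mutating the caller's payload after a copy is taken. The sketch below uses a reduced <code>DomainEvent</code> with only two fields; <code>shallowCopy</code> and <code>mutationLeaks</code> are hypothetical helpers added for the comparison, not part of the original code.</p>

```go
package main

import "encoding/json"

// DomainEvent is a reduced stand-in for the envelope in the text;
// only the fields needed for the demonstration are included.
type DomainEvent struct {
	ID      string
	Payload json.RawMessage
}

// cloneDomainEvent copies the payload bytes so the clone owns its data.
func cloneDomainEvent(event DomainEvent) DomainEvent {
	cloned := event
	if event.Payload != nil {
		cloned.Payload = append(json.RawMessage(nil), event.Payload...)
	}
	return cloned
}

// shallowCopy (hypothetical) copies the struct only, so the payload
// slice still points at the caller's backing array.
func shallowCopy(event DomainEvent) DomainEvent {
	return event
}

// mutationLeaks (hypothetical) reports whether overwriting the caller's
// payload bytes changes the copy produced by copyFn.
func mutationLeaks(copyFn func(DomainEvent) DomainEvent) bool {
	source := DomainEvent{ID: "evt_1", Payload: json.RawMessage(`{"a":1}`)}
	stored := copyFn(source)

	source.Payload[5] = '9' // the caller reuses its buffer after handing it over

	return string(stored.Payload) != `{"a":1}`
}
```

<p>With <code>shallowCopy</code> the stored payload changes along with the caller's buffer; with <code>cloneDomainEvent</code> it does not, which is the property <code>Append</code> and <code>List</code> rely on.</p>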





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">type</span> <span class="nx">JobRepository</span> <span class="kd">interface</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="nf">Apply</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="kt">error</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nf">Get</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">id</span> <span class="kt">string</span><span class="p">)</span> <span class="p">(</span><span class="nx">JobProjection</span><span class="p">,</span> <span class="kt">bool</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The event log and the state repository can each be called from within the processor:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">type</span> <span class="nx">RecordingEventProcessor</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">eventLog</span>   <span class="nx">EventLog</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">repository</span> <span class="nx">JobRepository</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nx">logger</span>     <span class="o">*</span><span class="nx">slog</span><span class="p">.</span><span class="nx">Logger</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">p</span> <span class="o">*</span><span class="nx">RecordingEventProcessor</span><span class="p">)</span> <span class="nf">Process</span><span class="p">(</span><span class="nx">ctx</span> <span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span> <span class="nx">event</span> <span class="nx">DomainEvent</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">p</span><span class="p">.</span><span class="nx">eventLog</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">event</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">return</span> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;append event log: %w&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">p</span><span class="p">.</span><span class="nx">repository</span><span class="p">.</span><span class="nf">Apply</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span> <span class="nx">event</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="k">return</span> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;apply state projection: %w&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="nx">p</span><span class="p">.</span><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;event processed&#34;</span><span class="p">,</span> <span class="nf">LogAttrsForEvent</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span><span class="o">...</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="k">return</span> <span class="kc">nil</span>
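</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>This code shows the three recording boundaries: the event log stores facts, the repository updates current state, and the structured log records operational signals.</p>
<h2 id="判讀記錄位置要跟錯誤發生層一致">[Interpret] Log at the layer where the error occurs</h2>
<p>The core rule for log placement is to record at whichever layer can supply the most context. A given error is usually recorded at one primary layer so that logs are not flooded with duplicate signals.</p>
<p>Common locations:</p>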
<table>
  <thead>
      <tr>
          <th>Where it occurs</th>
          <th>What to record</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>adapter</td>
          <td>raw input decode/normalize failures</td>
      </tr>
      <tr>
          <td>router/usecase</td>
          <td>command rejected, insufficient permission, state does not allow the operation</td>
      </tr>
      <tr>
          <td>processor</td>
          <td>event validation, dedup, and <a href="/blog/backend/knowledge-cards/projection/" data-link-title="Projection" data-link-desc="說明從事件流或資料變更推算出查詢用讀取視圖的轉換機制">projection</a> apply results</td>
      </tr>
      <tr>
          <td>worker</td>
          <td>queue full, external source failures, retry results</td>
      </tr>
  </tbody>
</table>
<p>For example, when the adapter fails to decode:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;callback rejected&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;layer&#34;</span><span class="p">,</span> <span class="s">&#34;adapter&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;reason&#34;</span><span class="p">,</span> <span class="nx">ReasonInvalidPayload</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;payload_bytes&#34;</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="nx">body</span><span class="p">),</span>
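</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>Recording the payload size here is enough to diagnose whether the data is abnormal; the full payload may contain sensitive data or be too large.</p>
<h2 id="策略敏感資料預設不進-log">[Strategy] Sensitive data stays out of logs by default</h2>
<p>The core rule for the sensitive-data boundary is that logs get stored, forwarded, and searched, so tokens, passwords, full payloads, and complete personal data should be kept out of them.</p>
<p>Safe to record:</p>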
<ul>
<li>ID or opaque identifier</li>
<li>payload byte length</li>
<li>schema version</li>
<li>whether a field is present</li>
<li>hash or checksum</li>
</ul>
<p>Do not record:</p>
<ul>
<li>password</li>
<li>access token</li>
<li>cookie</li>
<li>full request body</li>
<li>complete personal data</li>
</ul>
<p>If you need to trace the same piece of data, record a safe identifier:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Debug</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s">&#34;payload received&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;payload_bytes&#34;</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="nx">body</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;payload_sha256&#34;</span><span class="p">,</span> <span class="nf">sha256Hex</span><span class="p">(</span><span class="nx">body</span><span class="p">),</span>
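</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>Debug logs follow the same rule: if a log line could end up in centralized collection, sensitive data has to be controlled before it is written.</p>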
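<p>The <code>sha256Hex</code> helper called in the example is not defined in this section. A minimal sketch, assuming it simply returns the hex-encoded SHA-256 digest of the raw body (the signature and behavior here are an assumption, not the chapter's confirmed implementation):</p>

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// sha256Hex is a hypothetical helper: it returns the lowercase
// hex-encoded SHA-256 digest of body, so a log line can carry a
// stable fingerprint of the payload instead of the payload itself.
func sha256Hex(body []byte) string {
	sum := sha256.Sum256(body) // fixed-size [32]byte digest
	return hex.EncodeToString(sum[:])
}
```

<p>The digest is always 64 hex characters, so two log lines with the same <code>payload_sha256</code> refer to the same bytes. For very small or predictable payloads a plain hash can be guessed by hashing candidate values, so an opaque ID may be the safer choice in that case.</p>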
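<h2 id="執行log-helper-測試只測穩定欄位">[Execution] Log helper tests check only stable fields</h2>
<p>The core goal of a log helper test is to protect field names and values. The log message text is written for humans and is usually left free to change.</p>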





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestLogAttrsForEvent</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">event</span> <span class="o">:=</span> <span class="nx">DomainEvent</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="nx">ID</span><span class="p">:</span>            <span class="s">&#34;evt_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">Type</span><span class="p">:</span>          <span class="nx">EventNotificationCreated</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="nx">SubjectKind</span><span class="p">:</span>   <span class="nx">SubjectNotification</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="nx">SubjectID</span><span class="p">:</span>     <span class="s">&#34;ntf_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="nx">CorrelationID</span><span class="p">:</span> <span class="s">&#34;corr_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">SchemaVersion</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nx">attrs</span> <span class="o">:=</span> <span class="nf">LogAttrsForEvent</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="nx">got</span> <span class="o">:=</span> <span class="nf">attrsToMap</span><span class="p">(</span><span class="nx">attrs</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">if</span> <span class="nx">got</span><span class="p">[</span><span class="s">&#34;event_id&#34;</span><span class="p">]</span> <span class="o">!=</span> <span class="s">&#34;evt_1&#34;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;event_id = %v, want evt_1&#34;</span><span class="p">,</span> <span class="nx">got</span><span class="p">[</span><span class="s">&#34;event_id&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="k">if</span> <span class="nx">got</span><span class="p">[</span><span class="s">&#34;event_type&#34;</span><span class="p">]</span> <span class="o">!=</span> <span class="nb">string</span><span class="p">(</span><span class="nx">EventNotificationCreated</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;event_type = %v, want %s&#34;</span><span class="p">,</span> <span class="nx">got</span><span class="p">[</span><span class="s">&#34;event_type&#34;</span><span class="p">],</span> <span class="nx">EventNotificationCreated</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>A test helper can turn the key-value slice into a map:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">attrsToMap</span><span class="p">(</span><span class="nx">attrs</span> <span class="p">[]</span><span class="kt">any</span><span class="p">)</span> <span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">result</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span><span class="o">+</span><span class="mi">1</span> <span class="p">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="nx">attrs</span><span class="p">);</span> <span class="nx">i</span> <span class="o">+=</span> <span class="mi">2</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="nx">key</span><span class="p">,</span> <span class="nx">ok</span> <span class="o">:=</span> <span class="nx">attrs</span><span class="p">[</span><span class="nx">i</span><span class="p">].(</span><span class="kt">string</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="k">if</span> <span class="p">!</span><span class="nx">ok</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">            <span class="k">continue</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">result</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span> <span class="p">=</span> <span class="nx">attrs</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">return</span> <span class="nx">result</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>This test checks the helper's output directly; it does not need to write a log or parse logger output.</p>
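<h2 id="執行event-log-測試要保護-append-與-copy">[Execution] Event log tests protect append and copy</h2>
<p>The core goal of an event log test is to confirm that the event is appended and that callers cannot modify the internal record through the original payload or through a returned value.</p>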





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">func</span> <span class="nf">TestInMemoryEventLogAppendCopiesPayload</span><span class="p">(</span><span class="nx">t</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">log</span> <span class="o">:=</span> <span class="nf">NewInMemoryEventLog</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">payload</span> <span class="o">:=</span> <span class="nx">json</span><span class="p">.</span><span class="nf">RawMessage</span><span class="p">(</span><span class="s">`{&#34;topic&#34;:&#34;deployments&#34;}`</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nx">event</span> <span class="o">:=</span> <span class="nx">DomainEvent</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="nx">ID</span><span class="p">:</span>            <span class="s">&#34;evt_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="nx">Type</span><span class="p">:</span>          <span class="nx">EventNotificationCreated</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">SubjectKind</span><span class="p">:</span>   <span class="nx">SubjectNotification</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">SubjectID</span><span class="p">:</span>     <span class="s">&#34;ntf_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="nx">OccurredAt</span><span class="p">:</span>    <span class="nx">time</span><span class="p">.</span><span class="nf">Date</span><span class="p">(</span><span class="mi">2026</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">22</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">time</span><span class="p">.</span><span class="nx">UTC</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="nx">ReceivedAt</span><span class="p">:</span>    <span class="nx">time</span><span class="p">.</span><span class="nf">Date</span><span class="p">(</span><span class="mi">2026</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">22</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">time</span><span class="p">.</span><span class="nx">UTC</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="nx">SchemaVersion</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="nx">Payload</span><span class="p">:</span>       <span class="nx">payload</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="k">if</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">log</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">(),</span> <span class="nx">event</span><span class="p">);</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;append event: %v&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="nx">payload</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="p">=</span> <span class="sc">&#39;[&#39;</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="nx">events</span> <span class="o">:=</span> <span class="nx">log</span><span class="p">.</span><span class="nf">List</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="k">if</span> <span class="nb">string</span><span class="p">(</span><span class="nx">events</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">Payload</span><span class="p">)</span> <span class="o">!=</span> <span class="s">`{&#34;topic&#34;:&#34;deployments&#34;}`</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">        <span class="nx">t</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="s">&#34;payload was modified through original slice&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="p">}</span>
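</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>json.RawMessage</code> is a <code>[]byte</code> underneath, so a stored event would otherwise share its backing array with the caller; the event log has to copy it. Details like this are easy to overlook, and a test pins the boundary down.</p>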
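<h2 id="實作檢查清單">Implementation checklist</h2>
<p>When adding a structured record field, check in order:</p>
<ol>
<li>Is this data for debugging, replay, or querying?</li>
<li>Does the structured log keep only operational signals and safe fields?</li>
<li>Does the event log store normalized domain events?</li>
<li>Does the state repository store only the current projection?</li>
<li>Are log field names stable?</li>
<li>Is <code>reason</code> a small, closed set of categories?</li>
<li>Are full payloads and sensitive data kept out?</li>
<li>Does the event log protect its copy boundary?</li>
<li>Do tests check stable fields rather than free text?</li>
</ol>
<h2 id="設計檢查">Design checks</h2>
<h3 id="檢查一log-服務操作診斷">Check 1: logs serve operational diagnosis</h3>
<p>A log is an operational diagnostic signal, not a stable query API. Current state that users need to query belongs in a repository or a <a href="/blog/backend/knowledge-cards/read-model/" data-link-title="Read Model" data-link-desc="說明為查詢場景建立的讀取模型，與正式狀態的責任分離">read model</a>.</p>
<h3 id="檢查二event-log-保存-normalized-fact">Check 2: the event log stores normalized facts</h3>
<p>The event log records normalized facts. If transient errors, debug messages, and raw payloads are all stuffed into it, replay and audit can no longer be trusted.</p>
<h3 id="檢查三欄位名稱維持一致">Check 3: keep field names consistent</h3>
<p>Mixing <code>event_id</code>, <code>eventID</code>, and <code>id</code> breaks queries. The field schema should stay as stable as an API.</p>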
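<p>One way to keep field names from drifting is to declare them once and build log arguments only through those constants. The sketch below is an assumption for illustration: the constant names, the <code>eventFields</code> helper, and the <code>correlation_id</code> key are hypothetical and not taken from the chapter's code.</p>

```go
package main

// Hypothetical field-name constants. Declaring each key once means
// event_id cannot silently become eventID or id at a call site.
const (
	FieldEventID       = "event_id"
	FieldEventType     = "event_type"
	FieldCorrelationID = "correlation_id"
)

// eventFields returns alternating key-value pairs in the form that
// key-value loggers accept as variadic arguments.
func eventFields(eventID, eventType, correlationID string) []any {
	return []any{
		FieldEventID, eventID,
		FieldEventType, eventType,
		FieldCorrelationID, correlationID,
	}
}
```

<p>A call site would then read <code>logger.Info("event accepted", eventFields(id, typ, corr)...)</code>, and a helper test only has to assert on the constants.</p>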
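<h3 id="檢查四完整-payload-需要明確策略">Check 4: full payloads need an explicit policy</h3>
<p>A full payload may contain sensitive data and may also be very large. Unless there is an explicit security and retention policy, log only the size, hash, ID, and the fields that are actually needed.</p>
<h2 id="本章不處理">Out of scope for this chapter</h2>
<p>This chapter covers the division of labor between logs, the event log, and the repository. Centralized log platforms and replayable event systems are developed further in the following chapters:</p>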
<ul>
<li><a href="/blog/go-advanced/07-distributed-operations/outbox-idempotency/" data-link-title="7.2 Durable queue、outbox 與 idempotency" data-link-desc="設計跨 process 事件傳遞的可靠性與去重邊界">Advanced Go: Durable queue, outbox, and idempotency</a></li>
<li><a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Advanced Go: Observability pipeline, metrics, and tracing</a></li>
<li><a href="/blog/backend/04-observability/" data-link-title="模組四：可觀測性平台" data-link-desc="整理 log、metric、trace、dashboard 與 alert 的後端操作實務">Backend: Observability platform</a></li>
</ul>
<h2 id="和-go-教材的關係">Relationship to the Go course material</h2>
<p>This chapter builds on the event log, state repository, and log schema. To review the language material first, read:</p>
<ul>
<li><a href="/blog/go/06-practical/new-event-type/" data-link-title="6.2 如何新增一種 domain event" data-link-desc="擴展事件常數、輸入驗證與處理流程">Go: How to add a new domain event</a></li>
<li><a href="/blog/go/06-practical/repository-port/" data-link-title="6.6 如何新增 repository port" data-link-desc="先建立儲存邊界，再決定 memory、SQLite 或外部資料庫實作">Go: How to add a repository port</a></li>
<li><a href="/blog/go/07-refactoring/interface-boundary/" data-link-title="7.2 用 interface 隔離外部依賴" data-link-desc="建立小而穩定的測試替身">Go: Isolating external dependencies with interfaces</a></li>
<li><a href="/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">Go: Structured log field design</a></li>
</ul>
]]></content:encoded></item><item><title>3.5 logging - The logging system</title><link>https://tarrragon.github.io/blog/python/03-stdlib/logging/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/python/03-stdlib/logging/</guid><description>&lt;p>The &lt;code>logging&lt;/code> module provides flexible log recording. Compared with &lt;code>print()&lt;/code>, it adds level control, formatting, and management of output destinations.&lt;/p>
&lt;h2 id="為什麼用-logging-而非-print">Why use logging instead of print?&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="c1"># Problems with print&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Processing started&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># no control over the output level&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Error: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">error&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># cannot tell ordinary messages from errors&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Debug: x =&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># also printed in production&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="c1"># Benefits of logging&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">logging&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="n">logger&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getLogger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="vm">__name__&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Processing started&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># the level can be controlled&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Error: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">error&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># explicitly marked as an error&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">debug&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;x = &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># emitted only at DEBUG level&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="日誌等級">Log levels&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Level&lt;/th>
 &lt;th>Value&lt;/th>
 &lt;th>When to use&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>DEBUG&lt;/td>
 &lt;td>10&lt;/td>
 &lt;td>Detailed debugging information&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>INFO&lt;/td>
 &lt;td>20&lt;/td>
 &lt;td>General operational information&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>WARNING&lt;/td>
 &lt;td>30&lt;/td>
 &lt;td>A warning; the program keeps running&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>ERROR&lt;/td>
 &lt;td>40&lt;/td>
 &lt;td>An error; the program can still run&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>CRITICAL&lt;/td>
 &lt;td>50&lt;/td>
 &lt;td>A serious error; the program may be unable to continue&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="實際範例hook-日誌系統">Practical example: Hook logging system&lt;/h2>
&lt;p>From &lt;code>.claude/lib/hook_logging.py&lt;/code>:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">logging&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">os&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">datetime&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">datetime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">pathlib&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Path&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">typing&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Optional&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">setup_hook_logging&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">    &lt;span class="n">hook_name&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">    &lt;span class="n">log_subdir&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">Optional&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">    &lt;span class="n">log_level&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">Optional&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">int&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">    &lt;span class="n">include_stderr&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">bool&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="kc">False&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Logger&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">    &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s2">    Set up the hook logging system
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s2">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s2">    Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s2">        hook_name: Hook name, used to identify the log source
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s2">        log_subdir: Log subdirectory; defaults to hook_name
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s2">        log_level: Log level; by default decided by an environment variable
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s2">        include_stderr: Whether to also write to stderr
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="s2">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="s2">    Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">24&lt;/span>&lt;span class="cl">&lt;span class="s2">        logging.Logger: The configured Logger instance
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">25&lt;/span>&lt;span class="cl">&lt;span class="s2">    &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">26&lt;/span>&lt;span class="cl">    &lt;span class="c1"># Decide the log level&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">27&lt;/span>&lt;span class="cl">    &lt;span class="k">if&lt;/span> &lt;span class="n">log_level&lt;/span> &lt;span class="ow">is&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">28&lt;/span>&lt;span class="cl">        &lt;span class="n">debug_mode&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;HOOK_DEBUG&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">lower&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s2">&amp;#34;true&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">29&lt;/span>&lt;span class="cl">        &lt;span class="n">log_level&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DEBUG&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">debug_mode&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">INFO&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">30&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">31&lt;/span>&lt;span class="cl">    &lt;span class="c1"># Create the logger&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">32&lt;/span>&lt;span class="cl">    &lt;span class="n">logger&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getLogger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">hook_name&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">33&lt;/span>&lt;span class="cl">    &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">setLevel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">log_level&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">34&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">35&lt;/span>&lt;span class="cl">    &lt;span class="c1"># Avoid adding handlers twice&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">36&lt;/span>&lt;span class="cl">    &lt;span class="k">if&lt;/span> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">handlers&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">37&lt;/span>&lt;span class="cl">        &lt;span class="k">return&lt;/span> &lt;span class="n">logger&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">38&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">39&lt;/span>&lt;span class="cl"> &lt;span class="c1"># 建立日誌目錄&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">40&lt;/span>&lt;span class="cl"> &lt;span class="n">project_root&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">environ&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;CLAUDE_PROJECT_DIR&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getcwd&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">41&lt;/span>&lt;span class="cl"> &lt;span class="n">subdir&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">log_subdir&lt;/span> &lt;span class="ow">or&lt;/span> &lt;span class="n">hook_name&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">42&lt;/span>&lt;span class="cl"> &lt;span class="n">log_dir&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Path&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">project_root&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="s2">&amp;#34;.claude&amp;#34;&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="s2">&amp;#34;hook-logs&amp;#34;&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">subdir&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">43&lt;/span>&lt;span class="cl"> &lt;span class="n">log_dir&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">mkdir&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">parents&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">exist_ok&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">44&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">45&lt;/span>&lt;span class="cl"> &lt;span class="c1"># Log file path&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">46&lt;/span>&lt;span class="cl"> &lt;span class="n">timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">datetime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">strftime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;%Y%m&lt;/span>&lt;span class="si">%d&lt;/span>&lt;span class="s2">-%H%M%S&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">47&lt;/span>&lt;span class="cl"> &lt;span class="n">log_file&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">log_dir&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">hook_name&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">-&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">timestamp&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">.log&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">48&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">49&lt;/span>&lt;span class="cl"> &lt;span class="c1"># Configure the formatter&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">50&lt;/span>&lt;span class="cl"> &lt;span class="n">formatter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Formatter&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">51&lt;/span>&lt;span class="cl"> &lt;span class="s2">&amp;#34;[&lt;/span>&lt;span class="si">%(asctime)s&lt;/span>&lt;span class="s2">] &lt;/span>&lt;span class="si">%(levelname)s&lt;/span>&lt;span class="s2"> - &lt;/span>&lt;span class="si">%(message)s&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">52&lt;/span>&lt;span class="cl"> &lt;span class="n">datefmt&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;%Y-%m-&lt;/span>&lt;span class="si">%d&lt;/span>&lt;span class="s2"> %H:%M:%S&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">53&lt;/span>&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">54&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">55&lt;/span>&lt;span class="cl"> &lt;span class="c1"># File handler&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">56&lt;/span>&lt;span class="cl"> &lt;span class="n">file_handler&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FileHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">log_file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">encoding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;utf-8&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">57&lt;/span>&lt;span class="cl"> &lt;span class="n">file_handler&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">setFormatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">formatter&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">58&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">addHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_handler&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">59&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">60&lt;/span>&lt;span class="cl"> &lt;span class="c1"># Optional stderr handler&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">61&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">include_stderr&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">62&lt;/span>&lt;span class="cl"> &lt;span class="kn">import&lt;/span> &lt;span class="nn">sys&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">63&lt;/span>&lt;span class="cl"> &lt;span class="n">stderr_handler&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StreamHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sys&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stderr&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">64&lt;/span>&lt;span class="cl"> &lt;span class="n">stderr_handler&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">setFormatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">formatter&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">65&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">addHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">stderr_handler&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">66&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">67&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">logger&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="使用-logger">Using the Logger&lt;/h2>
&lt;h3 id="在-hook-腳本中使用">Using it in a Hook script&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="ch">#!/usr/bin/env python3&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">hook_logging&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">setup_hook_logging&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="c1"># Initialize the logger&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="n">logger&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">setup_hook_logging&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;branch-verify&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Hook started&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> &lt;span class="n">branch&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">get_current_branch&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">debug&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Current branch: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">branch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">is_protected_branch&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">branch&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">warning&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Operating on protected branch: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">branch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl"> &lt;span class="k">try&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl"> &lt;span class="c1"># Perform the operation&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl"> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">do_something&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Operation completed: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl"> &lt;span class="k">except&lt;/span> &lt;span class="ne">Exception&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Operation failed: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">e&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl"> &lt;span class="k">raise&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">24&lt;/span>&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="vm">__name__&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s2">&amp;#34;__main__&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">25&lt;/span>&lt;span class="cl"> &lt;span class="n">main&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="核心概念">Core concepts&lt;/h2>
&lt;h3 id="logger">Logger&lt;/h3>
&lt;p>The logger is the object used to emit log messages:&lt;/p></description><content:encoded><![CDATA[<p>The <code>logging</code> module provides flexible logging. Compared with <code>print()</code>, it adds level control, formatting, and management of output destinations.</p>
<h2 id="為什麼用-logging-而非-print">Why use logging instead of print?</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Problems with print</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Processing started&#34;</span><span class="p">)</span>        <span class="c1"># No control over the output level</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Error: </span><span class="si">{</span><span class="n">error</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>          <span class="c1"># Cannot tell normal messages from errors</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Debug: x =&#34;</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>            <span class="c1"># Still printed in production</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># Benefits of logging</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&#34;Processing started&#34;</span><span class="p">)</span>   <span class="c1"># The level can be controlled</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Error: </span><span class="si">{</span><span class="n">error</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>    <span class="c1"># Explicitly marked as an error</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;x = </span><span class="si">{</span><span class="n">x</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>           <span class="c1"># Emitted only in DEBUG mode</span></span></span></code></pre></div><h2 id="日誌等級">Log levels</h2>
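<table>
  <thead>
      <tr>
          <th>Level</th>
          <th>Numeric value</th>
          <th>When to use</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>DEBUG</td>
          <td>10</td>
          <td>Detailed diagnostic information</td>
      </tr>
      <tr>
          <td>INFO</td>
          <td>20</td>
          <td>General information about normal operation</td>
      </tr>
      <tr>
          <td>WARNING</td>
          <td>30</td>
          <td>Something unexpected happened, but the program still runs</td>
      </tr>
      <tr>
          <td>ERROR</td>
          <td>40</td>
          <td>An operation failed, but the program still runs</td>
      </tr>
      <tr>
          <td>CRITICAL</td>
          <td>50</td>
          <td>A serious error; the program may be unable to continue</td>
      </tr>
  </tbody>
</table>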
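<p>The numeric values in the table are what level filtering compares: a logger discards any record whose level is below its own threshold. The following is a minimal sketch of that behaviour and is not part of <code>hook_logging.py</code>; the helper name <code>emit_all_levels</code> and the <code>level_demo_</code> logger names are hypothetical, chosen only for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import io
import logging


def emit_all_levels(threshold):
    """Log one record per standard level; return the level names that got through."""
    stream = io.StringIO()
    # Hypothetical logger name, unique per threshold so calls do not interfere
    logger = logging.getLogger(f"level_demo_{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False  # keep the demo output away from the root logger

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.addHandler(handler)

    for level in (logging.DEBUG, logging.INFO, logging.WARNING,
                  logging.ERROR, logging.CRITICAL):
        logger.log(level, "message at level %d", level)

    logger.removeHandler(handler)  # leave the logger clean for the next call
    return stream.getvalue().split()


print(emit_all_levels(logging.WARNING))  # ['WARNING', 'ERROR', 'CRITICAL']
</code></pre></div>
<p>With the threshold at <code>logging.WARNING</code> (30), the DEBUG (10) and INFO (20) records are discarded and only the three higher levels reach the handler.</p>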
<h2 id="實際範例hook-日誌系統">A practical example: the Hook logging system</h2>
<p>From <code>.claude/lib/hook_logging.py</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">import</span> <span class="nn">os</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">def</span> <span class="nf">setup_hook_logging</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">hook_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">log_subdir</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">log_level</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">include_stderr</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">logging</span><span class="o">.</span><span class="n">Logger</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s2">    Set up logging for a Hook.
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s2">    Args:
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s2">        hook_name: Hook name, used to identify the log source
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s2">        log_subdir: Log subdirectory; defaults to hook_name
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s2">        log_level: Log level; by default decided by the HOOK_DEBUG environment variable
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s2">        include_stderr: Whether to also write to stderr
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="s2">    Returns:
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="s2">        logging.Logger: The configured Logger instance
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="s2">    &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">    <span class="c1"># Decide the log level</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="k">if</span> <span class="n">log_level</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">        <span class="n">debug_mode</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;HOOK_DEBUG&#34;</span><span class="p">,</span> <span class="s2">&#34;&#34;</span><span class="p">)</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s2">&#34;true&#34;</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">        <span class="n">log_level</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">DEBUG</span> <span class="k">if</span> <span class="n">debug_mode</span> <span class="k">else</span> <span class="n">logging</span><span class="o">.</span><span class="n">INFO</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">
</span></span><span class="line"><span class="ln">31</span><span class="cl">    <span class="c1"># Create the Logger</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">    <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="n">hook_name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">log_level</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">
</span></span><span class="line"><span class="ln">35</span><span class="cl">    <span class="c1"># Avoid adding handlers more than once</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">    <span class="k">if</span> <span class="n">logger</span><span class="o">.</span><span class="n">handlers</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">        <span class="k">return</span> <span class="n">logger</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">
</span></span><span class="line"><span class="ln">39</span><span class="cl">    <span class="c1"># Create the log directory</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">    <span class="n">project_root</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;CLAUDE_PROJECT_DIR&#34;</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">getcwd</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">    <span class="n">subdir</span> <span class="o">=</span> <span class="n">log_subdir</span> <span class="ow">or</span> <span class="n">hook_name</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">    <span class="n">log_dir</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="n">project_root</span><span class="p">)</span> <span class="o">/</span> <span class="s2">&#34;.claude&#34;</span> <span class="o">/</span> <span class="s2">&#34;hook-logs&#34;</span> <span class="o">/</span> <span class="n">subdir</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">    <span class="n">log_dir</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">parents</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="c1"># Log file path</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">    <span class="n">timestamp</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s2">&#34;%Y%m</span><span class="si">%d</span><span class="s2">-%H%M%S&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">    <span class="n">log_file</span> <span class="o">=</span> <span class="n">log_dir</span> <span class="o">/</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">hook_name</span><span class="si">}</span><span class="s2">-</span><span class="si">{</span><span class="n">timestamp</span><span class="si">}</span><span class="s2">.log&#34;</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">
</span></span><span class="line"><span class="ln">49</span><span class="cl">    <span class="c1"># Configure the formatter</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">    <span class="n">formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">Formatter</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">        <span class="s2">&#34;[</span><span class="si">%(asctime)s</span><span class="s2">] </span><span class="si">%(levelname)s</span><span class="s2"> - </span><span class="si">%(message)s</span><span class="s2">&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">        <span class="n">datefmt</span><span class="o">=</span><span class="s2">&#34;%Y-%m-</span><span class="si">%d</span><span class="s2"> %H:%M:%S&#34;</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">
</span></span><span class="line"><span class="ln">55</span><span class="cl">    <span class="c1"># File handler</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">    <span class="n">file_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">FileHandler</span><span class="p">(</span><span class="n">log_file</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s2">&#34;utf-8&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">    <span class="n">file_handler</span><span class="o">.</span><span class="n">setFormatter</span><span class="p">(</span><span class="n">formatter</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">file_handler</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">
</span></span><span class="line"><span class="ln">60</span><span class="cl">    <span class="c1"># Optional stderr handler</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">    <span class="k">if</span> <span class="n">include_stderr</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">        <span class="kn">import</span> <span class="nn">sys</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">        <span class="n">stderr_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">StreamHandler</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">        <span class="n">stderr_handler</span><span class="o">.</span><span class="n">setFormatter</span><span class="p">(</span><span class="n">formatter</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">stderr_handler</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">66</span><span class="cl">
</span></span><span class="line"><span class="ln">67</span><span class="cl">    <span class="k">return</span> <span class="n">logger</span></span></span></code></pre></div><h2 id="使用-logger">Using the Logger</h2>
<h3 id="在-hook-腳本中使用">Using it in a Hook script</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="ch">#!/usr/bin/env python3</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">from</span> <span class="nn">hook_logging</span> <span class="kn">import</span> <span class="n">setup_hook_logging</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Initialize the logger</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">setup_hook_logging</span><span class="p">(</span><span class="s2">&#34;branch-verify&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&#34;Hook started&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">branch</span> <span class="o">=</span> <span class="n">get_current_branch</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Current branch: </span><span class="si">{</span><span class="n">branch</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">if</span> <span class="n">is_protected_branch</span><span class="p">(</span><span class="n">branch</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Operating on protected branch: </span><span class="si">{</span><span class="n">branch</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="c1"># Perform the operation</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="n">result</span> <span class="o">=</span> <span class="n">do_something</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Operation completed: </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Operation failed: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="k">raise</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">
</span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&#34;__main__&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="n">main</span><span class="p">()</span></span></span></code></pre></div><h2 id="核心概念">Core concepts</h2>
<h3 id="logger">Logger</h3>
<p>A logger is the object your code calls to emit log messages:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># Get a logger, using the module name as its identifier</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># Or use a custom name</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s2">&#34;my_app&#34;</span><span class="p">)</span></span></span></code></pre></div><h3 id="handler">Handler</h3>
<p>A handler decides where log records are written:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s2">&#34;my_app&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># Write to a file</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">file_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">FileHandler</span><span class="p">(</span><span class="s2">&#34;app.log&#34;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s2">&#34;utf-8&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">file_handler</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># Write to the console (stderr by default)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">console_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">StreamHandler</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">console_handler</span><span class="p">)</span></span></span></code></pre></div><h3 id="formatter">Formatter</h3>
<p>A formatter decides how each record is rendered:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">Formatter</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s2">&#34;[</span><span class="si">%(asctime)s</span><span class="s2">] </span><span class="si">%(levelname)s</span><span class="s2"> - </span><span class="si">%(name)s</span><span class="s2"> - </span><span class="si">%(message)s</span><span class="s2">&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="n">datefmt</span><span class="o">=</span><span class="s2">&#34;%Y-%m-</span><span class="si">%d</span><span class="s2"> %H:%M:%S&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="n">handler</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">FileHandler</span><span class="p">(</span><span class="s2">&#34;app.log&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="n">handler</span><span class="o">.</span><span class="n">setFormatter</span><span class="p">(</span><span class="n">formatter</span><span class="p">)</span></span></span></code></pre></div><h3 id="格式化字串變數">Format string variables</h3>
<table>
  <thead>
      <tr>
          <th>Variable</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>%(asctime)s</code></td>
          <td>Timestamp</td>
      </tr>
      <tr>
          <td><code>%(levelname)s</code></td>
          <td>Log level name</td>
      </tr>
      <tr>
          <td><code>%(name)s</code></td>
          <td>Logger name</td>
      </tr>
      <tr>
          <td><code>%(message)s</code></td>
          <td>Log message</td>
      </tr>
      <tr>
          <td><code>%(filename)s</code></td>
          <td>Source file name</td>
      </tr>
      <tr>
          <td><code>%(lineno)d</code></td>
          <td>Line number</td>
      </tr>
      <tr>
          <td><code>%(funcName)s</code></td>
          <td>Function name</td>
      </tr>
  </tbody>
</table>
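<p>The three pieces above can be combined as in the following sketch. The names <code>build_logger</code>, <code>demo</code>, and <code>format_demo</code> are assumptions made for this example, and the format string simply strings together the variables from the table:</p>

```python
import logging
import os
import tempfile


def build_logger(name: str, log_path: str) -> logging.Logger:
    """Attach a file handler and a console handler that share one formatter."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep demo records away from the root logger

    formatter = logging.Formatter(
        "[%(asctime)s] %(levelname)s - %(name)s - "
        "%(filename)s:%(lineno)d %(funcName)s() - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def demo() -> str:
    """Emit one record and return the text that landed in the log file."""
    log_dir = tempfile.mkdtemp()
    log_path = os.path.join(log_dir, "demo.log")
    logger = build_logger("format_demo", log_path)
    logger.info("formatter wired up")
    for handler in logger.handlers:
        handler.flush()
    with open(log_path, encoding="utf-8") as fh:
        return fh.read().strip()


if __name__ == "__main__":
    print(demo())
```

<p>Each record is written twice, once per handler: one copy goes to the file and one to the console. Setting <code>propagate</code> to <code>False</code> is a choice made for this sketch so that the record is not also passed up to the root logger.</p>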
<h2 id="實用技巧">Practical tips</h2>
<h3 id="避免重複-handler">Avoiding duplicate handlers</h3>
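<p>To see why the check matters, here is a minimal sketch of the failure mode. The function <code>naive_setup</code> and the logger name <code>dup_demo</code> are hypothetical, written only for this illustration. Because <code>logging.getLogger</code> returns the same object for the same name, a setup function without the check stacks one more handler on every call, and each record is then written once per handler.</p>

```python
import logging


def naive_setup(name: str) -> logging.Logger:
    """Hypothetical setup function WITHOUT the handler check."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.StreamHandler())  # runs on every call
    return logger


first = naive_setup("dup_demo")
second = naive_setup("dup_demo")

# getLogger returns the same object for the same name,
# so the second call stacked another handler onto it.
print(first is second)      # True
print(len(first.handlers))  # 2
```

<p>The version that follows avoids this by returning early when <code>logger.handlers</code> is non-empty.</p>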





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">def</span> <span class="nf">setup_logger</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">logging</span><span class="o">.</span><span class="n">Logger</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="c1"># Important: return early if this logger already has a handler</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">if</span> <span class="n">logger</span><span class="o">.</span><span class="n">handlers</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">return</span> <span class="n">logger</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">handler</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">FileHandler</span><span class="p">(</span><span class="s2">&#34;app.log&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">return</span> <span class="n">logger</span></span></span></code></pre></div><h3 id="環境變數控制日誌等級">Controlling the log level with an environment variable</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kn">import</span> <span class="nn">os</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">def</span> <span class="nf">get_log_level</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Read the log level from the LOG_LEVEL environment variable.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="n">level_name</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;LOG_LEVEL&#34;</span><span class="p">,</span> <span class="s2">&#34;INFO&#34;</span><span class="p">)</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">return</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">logging</span><span class="p">,</span> <span class="n">level_name</span><span class="p">,</span> <span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">get_log_level</span><span class="p">())</span></span></span></code></pre></div><h3 id="日誌輪替">Log rotation</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="kn">from</span> <span class="nn">logging.handlers</span> <span class="kn">import</span> <span class="n">RotatingFileHandler</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">handler</span> <span class="o">=</span> <span class="n">RotatingFileHandler</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s2">&#34;app.log&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="n">maxBytes</span><span class="o">=</span><span class="mi">10</span><span class="o">*</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="p">,</span>  <span class="c1"># 10MB</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="n">backupCount</span><span class="o">=</span><span class="mi">5</span>           <span class="c1"># keep 5 backups</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><h2 id="日誌檔案結構">Log file layout</h2>
<p>The hook system lays out its logs as follows:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">.claude/hook-logs/
</span></span><span class="line"><span class="ln">2</span><span class="cl">├── branch-verify/
</span></span><span class="line"><span class="ln">3</span><span class="cl">│   ├── branch-verify-20240120-153000.log
</span></span><span class="line"><span class="ln">4</span><span class="cl">│   └── branch-verify-20240120-160000.log
</span></span><span class="line"><span class="ln">5</span><span class="cl">├── ticket-quality-gate/
</span></span><span class="line"><span class="ln">6</span><span class="cl">│   └── ticket-quality-gate-20240120-155000.log
</span></span><span class="line"><span class="ln">7</span><span class="cl">└── ...</span></span></code></pre></div><h2 id="最佳實踐">Best practices</h2>
<h3 id="1-使用-__name__-作為-logger-名稱">1. Use <code>__name__</code> as the logger name</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># Good: the module name makes the source easy to trace</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># Bad: a fixed string makes it hard to tell sources apart</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s2">&#34;my_logger&#34;</span><span class="p">)</span></span></span></code></pre></div><h3 id="2-在適當的等級記錄訊息">2. Log at the appropriate level</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&#34;Variable x = </span><span class="si">%s</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>           <span class="c1"># detailed debugging</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&#34;Processing file </span><span class="si">%s</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">filename</span><span class="p">)</span>  <span class="c1"># routine operation</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s2">&#34;Config not found, using default&#34;</span><span class="p">)</span>  <span class="c1"># warning</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s2">&#34;Failed to connect: </span><span class="si">%s</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">error</span><span class="p">)</span>  <span class="c1"># error</span></span></span></code></pre></div><h3 id="3-使用延遲格式化">3. Use deferred formatting</h3>
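<p>Deferred formatting means the logging call receives the template and its arguments separately, and interpolation happens only if the record is actually going to be emitted. A minimal sketch of that behaviour follows. The <code>Expensive</code> class and the <code>lazy_demo</code> logger name are hypothetical, introduced only to count how often the value is rendered to text:</p>

```python
import logging


class Expensive:
    """Hypothetical stand-in for a value that is costly to render as text."""

    def __init__(self) -> None:
        self.render_count = 0

    def __str__(self) -> str:
        self.render_count += 1
        return "rendered"


logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.INFO)  # DEBUG records are dropped

lazy = Expensive()
logger.debug("Data: %s", lazy)   # dropped before any interpolation happens

eager = Expensive()
logger.debug(f"Data: {eager}")   # the f-string renders before debug() is called

print(lazy.render_count, eager.render_count)  # 0 1

# When producing the argument is itself costly, guard the whole call.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Data: %s", Expensive())
```

<p>Deferral does not stop Python from evaluating the argument expressions themselves, which is why the last lines guard the call with <code>isEnabledFor</code>.</p>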





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Good: %-style args defer interpolation until emitted (the call itself still runs)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&#34;Data: </span><span class="si">%s</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">expensive_function</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Bad: an f-string is always formatted, even when DEBUG is disabled</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Data: </span><span class="si">{</span><span class="n">expensive_function</span><span class="p">()</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span></span></span></code></pre></div><h2 id="思考題">Questions to consider</h2>
<ol>
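<li>Why does <code>setup_hook_logging</code> check <code>logger.handlers</code>?</li>
<li>What is the difference between <code>logging.DEBUG</code> and <code>logging.INFO</code>, and when should each be used?</li>
<li>How do you send logs to both a file and the console at once?</li>
</ol>
<h2 id="實作練習">Hands-on exercises</h2>
<ol>
<li>Modify <code>setup_hook_logging</code> to add log rotation.</li>
<li>Implement a decorator that automatically logs function entry and exit.</li>
<li>Write a log analysis script that counts the records at each level.</li>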
</ol>
<hr>
<p><em>Previous chapter: <a href="/blog/python/03-stdlib/regex/" data-link-title="3.4 re - 正規表達式" data-link-desc="文字模式匹配與擷取">re - Regular expressions</a></em>
<em>Next chapter: <a href="/blog/python/03-stdlib/argparse/" data-link-title="3.6 argparse - CLI 介面" data-link-desc="命令列參數解析">argparse - CLI interface</a></em></p>
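]]></content:encoded></item><item><title>Module 6: Writing observability and logs into code together</title><link>https://tarrragon.github.io/blog/infra/06-observability-logging/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/06-observability-logging/</guid><description>&lt;p>Observability should share a lifecycle with the resources it monitors: log groups, metrics, and alarms are written into the same IaC that creates the resource, so monitoring is live the moment the resource comes up rather than bolted on after an incident. The cost of skipping this rule is concrete. In the small hours the database CPU spikes to 100% and the API starts timing out. The on-call engineer opens the console to read the logs, only to find that the service was never attached to a log group and that its metrics are just the vendor's few coarse default lines. With no call chain to follow and no error message to look up, the only option left is to restart and hope it recovers.&lt;/p>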
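&lt;h2 id="observability-跟-infra-同一套-code同生命週期">Observability shares one codebase and one lifecycle with infra&lt;/h2>
&lt;p>Observability is part of the infrastructure. It carries the responsibility of making a resource traceable when something goes wrong, so its creation, changes, and destruction should be bound to the same lifecycle as the resource it monitors. When IaC creates an RDS instance, a Lambda, or an ECS service, its log group and its key metric alarms should be applied in the same plan. When the resource is destroyed, the matching alarms go with it, leaving no orphan alarms screaming at a resource that no longer exists.&lt;/p>
&lt;p>Bolting monitoring on from outside the resource creates two kinds of drift. The first is new resources without monitoring: the service is added through a PR, but someone has to click the alarm together in the console afterwards, so some services have alarms and some do not, and coverage depends on who remembers. The second is dead resources that leave echoes: the resource is gone but the alarm remains, firing &lt;code>INSUFFICIENT_DATA&lt;/code> at a nonexistent target in the middle of the night. On-call engineers learn to ignore it, and alarm fatigue means real incidents get ignored too. Both kinds of drift share one root cause: monitoring and the resource are not in the same apply unit.&lt;/p>
&lt;p>The warning sign is easy to read: if answering "does this service have an alarm?" means digging through the console instead of reading code, monitoring has already come unhooked from the resource. The fix is to move the monitoring declarations into the resource's module. The modularization discussed in Module 4 (environment separation and modularization) extends here into "every service module carries its own observability declarations", and each core service discussed in Module 5 (core services on IaC) should likewise carry its own logs and alarms in the same module.&lt;/p>
&lt;h2 id="log-group-與-retention-設計">Log group and retention design&lt;/h2>
&lt;p>A log group is the unit of ownership and retention for logs, and it has to answer two governance questions: how long to keep them, and who can read them. Both answers can only be audited when they are written into IaC rather than left to a vendor's implicit defaults. Many cloud services create a log group automatically when you have not declared one and apply a default of "never expire", so logs pile up without bound and the bill creeps upward, while access to the genuinely sensitive content goes unmanaged.&lt;/p>
&lt;p>Retention is a three-way trade-off between cost, compliance, and debugging needs. Debugging usually needs only hot data from the last few days to a few weeks; compliance (audit trails, payment records) may require keeping data for years; and every extra day kept is another day of storage cost. A cost-effective approach is to tier by log type: give high-volume application logs used for debugging a short retention (for example 14 to 30 days), keep audit-related access logs for as long as compliance requires, and archive cold data to cheaper object storage when necessary. Writing these values into IaC makes "why is this log kept for 90 days?" a decision that can be discussed in a PR.&lt;/p>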





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_cloudwatch_log_group&amp;#34; &amp;#34;api&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="n"> name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;/app/${var.env}/api&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="n"> retention_in_days&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="n"> var.env&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s2">&amp;#34;prod&amp;#34;&lt;/span> &lt;span class="err">?&lt;/span> &lt;span class="m">30&lt;/span> &lt;span class="err">:&lt;/span> &lt;span class="m">7&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="n"> kms_key_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_kms_key&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">logs&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">arn&lt;/span>
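&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>"Who can read" is the other half alongside retention. Logs often carry PII, tokens, or internal structure, so read permissions have to be managed together with the identity foundation. Access control hangs off the IAM roles established in Module 2 (identity and credential foundation), and the encryption key ties into the key governance that runs through Module 3 and Module 7. A common trap is encrypting logs both in transit and at rest and then opening read access to the whole team, which amounts to laying sensitive data out in front of everyone; read permission should be narrowed to the minimum set that on-call and audit need. Deciding at the application layer which fields should never enter a log at all belongs to data protection; see &lt;code>/backend/07-security-data-protection/&lt;/code>.&lt;/p>
&lt;h2 id="metric-與-alarm-寫進-iac">Writing metrics and alarms into IaC&lt;/h2>
&lt;p>Metrics and alarms go into IaC so that a resource carries its health criteria from the moment it is created. An alarm is more than a threshold: it is a written agreement about which states of this resource count as abnormal, covering which metric, over how long an evaluation window, and above what value to notify whom. Once that agreement lives in code it can be reviewed, version-controlled, and reused across environments, instead of being scattered in someone's head or some corner of the console.&lt;/p>
&lt;p>An alarm is valuable because it leads to action, not because it lights a lamp. A useful alarm is at least bound to a notification target (the on-call SNS topic, PagerDuty, Slack) and states how &lt;code>INSUFFICIENT_DATA&lt;/code> is handled. Whether missing data counts as normal or abnormal depends on whether the metric normally reports continuously. Threshold design trades signal against noise: too sensitive and it misfires often and breeds alarm fatigue, too dull and it misses real degradation. A cost-effective starting point is to alarm on symptom metrics that mean users are already affected (error rate, p99 latency, queue backlog) and keep cause metrics (CPU, memory) as diagnostic clues on a dashboard, so that not every cause raises its own alarm.&lt;/p>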





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_cloudwatch_metric_alarm&amp;#34; &amp;#34;api_5xx&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="n"> alarm_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;${var.env}-api-5xx-rate&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="n"> comparison_operator&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;GreaterThanThreshold&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="n"> evaluation_periods&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">3&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="n"> metric_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;5XXError&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="n"> namespace&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;AWS/ApiGateway&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="n"> period&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">60&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="n"> statistic&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Sum&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="n"> threshold&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">10&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="n"> treat_missing_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;notBreaching&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="n"> alarm_actions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="k">aws_sns_topic&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">oncall&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">arn&lt;/span>&lt;span class="p">]&lt;/span>
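&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The warning sign is this: if someone has to "remember" to add alarms every time a new service launches, alarms have not yet made it into the module template. The fix is to make the baseline alarms (error rate, latency, failed health checks) a default output of the service module, so that alarms are created alongside the resource when a new service is opened, and tuning thresholds is the optional part left to the service owner.&lt;/p></description><content:encoded><![CDATA[<p>Observability should share a lifecycle with the resources it monitors: log groups, metrics, and alarms are written into the same IaC that creates the resource, so monitoring is live the moment the resource comes up rather than bolted on after an incident. The cost of skipping this rule is concrete. In the small hours the database CPU spikes to 100% and the API starts timing out. The on-call engineer opens the console to read the logs, only to find that the service was never attached to a log group and that its metrics are just the vendor's few coarse default lines. With no call chain to follow and no error message to look up, the only option left is to restart and hope it recovers.</p>
<h2 id="observability-跟-infra-同一套-code同生命週期">Observability shares one codebase and one lifecycle with infra</h2>
<p>Observability is part of the infrastructure. It carries the responsibility of making a resource traceable when something goes wrong, so its creation, changes, and destruction should be bound to the same lifecycle as the resource it monitors. When IaC creates an RDS instance, a Lambda, or an ECS service, its log group and its key metric alarms should be applied in the same plan. When the resource is destroyed, the matching alarms go with it, leaving no orphan alarms screaming at a resource that no longer exists.</p>
<p>Bolting monitoring on from outside the resource creates two kinds of drift. The first is new resources without monitoring: the service is added through a PR, but someone has to click the alarm together in the console afterwards, so some services have alarms and some do not, and coverage depends on who remembers. The second is dead resources that leave echoes: the resource is gone but the alarm remains, firing <code>INSUFFICIENT_DATA</code> at a nonexistent target in the middle of the night. On-call engineers learn to ignore it, and alarm fatigue means real incidents get ignored too. Both kinds of drift share one root cause: monitoring and the resource are not in the same apply unit.</p>
<p>The warning sign is easy to read: if answering "does this service have an alarm?" means digging through the console instead of reading code, monitoring has already come unhooked from the resource. The fix is to move the monitoring declarations into the resource's module. The modularization discussed in Module 4 (environment separation and modularization) extends here into "every service module carries its own observability declarations", and each core service discussed in Module 5 (core services on IaC) should likewise carry its own logs and alarms in the same module.</p>
<h2 id="log-group-與-retention-設計">Log group and retention design</h2>
<p>A log group is the unit of ownership and retention for logs, and it has to answer two governance questions: how long to keep them, and who can read them. Both answers can only be audited when they are written into IaC rather than left to a vendor's implicit defaults. Many cloud services create a log group automatically when you have not declared one and apply a default of "never expire", so logs pile up without bound and the bill creeps upward, while access to the genuinely sensitive content goes unmanaged.</p>
<p>Retention is a three-way trade-off between cost, compliance, and debugging needs. Debugging usually needs only hot data from the last few days to a few weeks; compliance (audit trails, payment records) may require keeping data for years; and every extra day kept is another day of storage cost. A cost-effective approach is to tier by log type: give high-volume application logs used for debugging a short retention (for example 14 to 30 days), keep audit-related access logs for as long as compliance requires, and archive cold data to cheaper object storage when necessary. Writing these values into IaC makes "why is this log kept for 90 days?" a decision that can be discussed in a PR.</p>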





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_log_group&#34; &#34;api&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  name</span>              <span class="o">=</span> <span class="s2">&#34;/app/${var.env}/api&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  retention_in_days</span> <span class="o">=</span><span class="n"> var.env</span> <span class="o">==</span> <span class="s2">&#34;prod&#34;</span> <span class="err">?</span> <span class="m">30</span> <span class="err">:</span> <span class="m">7</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  kms_key_id</span>        <span class="o">=</span> <span class="k">aws_kms_key</span><span class="p">.</span><span class="k">logs</span><span class="p">.</span><span class="k">arn</span>
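</span></span><span class="line"><span class="ln">5</span><span class="cl">}</span></span></code></pre></div><p>"Who can read" is the other half alongside retention. Logs often carry PII, tokens, or internal structure, so read permissions have to be managed together with the identity foundation. Access control hangs off the IAM roles established in Module 2 (identity and credential foundation), and the encryption key ties into the key governance that runs through Module 3 and Module 7. A common trap is encrypting logs both in transit and at rest and then opening read access to the whole team, which amounts to laying sensitive data out in front of everyone; read permission should be narrowed to the minimum set that on-call and audit need. Deciding at the application layer which fields should never enter a log at all belongs to data protection; see <code>/backend/07-security-data-protection/</code>.</p>
<h2 id="metric-與-alarm-寫進-iac">Writing metrics and alarms into IaC</h2>
<p>Metrics and alarms go into IaC so that a resource carries its health criteria from the moment it is created. An alarm is more than a threshold: it is a written agreement about which states of this resource count as abnormal, covering which metric, over how long an evaluation window, and above what value to notify whom. Once that agreement lives in code it can be reviewed, version-controlled, and reused across environments, instead of being scattered in someone's head or some corner of the console.</p>
<p>An alarm is valuable because it leads to action, not because it lights a lamp. A useful alarm is at least bound to a notification target (the on-call SNS topic, PagerDuty, Slack) and states how <code>INSUFFICIENT_DATA</code> is handled. Whether missing data counts as normal or abnormal depends on whether the metric normally reports continuously. Threshold design trades signal against noise: too sensitive and it misfires often and breeds alarm fatigue, too dull and it misses real degradation. A cost-effective starting point is to alarm on symptom metrics that mean users are already affected (error rate, p99 latency, queue backlog) and keep cause metrics (CPU, memory) as diagnostic clues on a dashboard, so that not every cause raises its own alarm.</p>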





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_metric_alarm&#34; &#34;api_5xx&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  alarm_name</span>          <span class="o">=</span> <span class="s2">&#34;${var.env}-api-5xx-rate&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  comparison_operator</span> <span class="o">=</span> <span class="s2">&#34;GreaterThanThreshold&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  evaluation_periods</span>  <span class="o">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  metric_name</span>         <span class="o">=</span> <span class="s2">&#34;5XXError&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  namespace</span>           <span class="o">=</span> <span class="s2">&#34;AWS/ApiGateway&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  period</span>              <span class="o">=</span> <span class="m">60</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  statistic</span>           <span class="o">=</span> <span class="s2">&#34;Sum&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  threshold</span>           <span class="o">=</span> <span class="m">10</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  treat_missing_data</span>  <span class="o">=</span> <span class="s2">&#34;notBreaching&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  alarm_actions</span>       <span class="o">=</span> <span class="p">[</span><span class="k">aws_sns_topic</span><span class="p">.</span><span class="k">oncall</span><span class="p">.</span><span class="k">arn</span><span class="p">]</span>
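</span></span><span class="line"><span class="ln">12</span><span class="cl">}</span></span></code></pre></div><p>The telltale sign: if someone has to "remember" to add alarms every time a new service launches, the alarms are not yet in the module template. The fix is to make the baseline alarms (error rate, latency, health-check failure) default outputs of the service module, so that alarms are created together with the resources whenever a new service is opened, and tuning thresholds becomes the service owner's optional step.</p>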
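<h2 id="跟-monitoring-系列的分工基礎設施訊號-vs-客戶端行為訊號">Division of labor with the monitoring series: infrastructure signals vs client behavior signals</h2>
<p>Observability in this module handles infrastructure signals, while the monitoring series handles client and business behavior signals. The two observe different things and have different lifecycles, so they live in different code and in different chapters. Infrastructure signals are resource-level health: log groups, CPU, queue depth, 5xx ratio, instance liveness. They are created and destroyed by IaC along with the resource, and they answer "is this system still alive, and where is it broken?"</p>
<p>Client behavior signals belong to the SDK, Collector, and business-instrumentation layer: what users clicked, conversion funnels, frontend errors, custom events. They evolve with product features rather than living and dying with infrastructure resources, so they go under <code>/monitoring/</code>. The question that draws the boundary is whether a signal "should exist when the resource is created" or "is instrumented when a feature is developed". The former goes into this module's IaC; the latter goes into application code at the monitoring layer. The two converge during incident investigation (infrastructure alarms tell you which resource is abnormal, client signals tell you what users actually experienced), but their owners, change cadence, and deployment pipelines differ, and mixing them blurs who owns each signal.</p>
<p>Condensed into one rule: signals that should exist when a resource is created belong to this module's IaC, and client behavior signals instrumented during feature development belong to the other layer. The cross-category references below list the related chapters.</p>
<h2 id="章節文章">Articles in this chapter</h2>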
<table>
  <thead>
      <tr>
          <th>Article</th>
          <th>Topic</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/infra/06-observability-logging/log-metric-alarm-lifecycle/" data-link-title="可觀測性與 log 同生命週期管理" data-link-desc="log group、metric、alarm 寫進建立資源的同一套 IaC，讓監控跟資源同生共滅，出事時追得到查得到">Observability and logs managed in the same lifecycle</a></td>
          <td>Log groups, metrics, and alarms go into the same IaC, so monitoring lives and dies with the resource and incidents can be traced and investigated</td>
      </tr>
  </tbody>
</table>
<h2 id="跨分類引用">Cross-category references</h2>
<ul>
<li>→ <a href="/blog/monitoring/" data-link-title="監控實務指南" data-link-desc="整理非伺服器端運行時的監控體系 — 行為蒐集、錯誤回報、效能指標、生命週期追蹤，從自架方案到商業方案的完整知識路線">Monitoring system</a>: monitoring at the client SDK / Collector layer</li>
<li>→ <a href="/blog/infra/05-core-services/" data-link-title="模組五：核心服務上 IaC" data-link-desc="資料庫、運算、儲存、load balancer 怎麼寫進基礎設施程式碼，以及上線順序">Module 5: core services in IaC</a>: every core service carries its own logs and alarms</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/" data-link-title="模組七：infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">Module 7: infra through the PR workflow</a>: observability changes also go through PRs and automated guardrails</li>
<li>→ <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">backend Module 7: security and data protection</a>: which fields must not reach logs, and PII handling</li>
</ul>
]]></content:encoded></item><item><title>3.6 log/slog: structured logging</title><link>https://tarrragon.github.io/blog/go/03-stdlib/slog/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/go/03-stdlib/slog/</guid><description>&lt;p>&lt;code>log/slog&lt;/code> is the structured logging package in the Go standard library. Its core purpose is to write a &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log&lt;/a> as "message + key-value fields", so that people can read it and tools can search, filter, and aggregate it.&lt;/p>
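&lt;h2 id="本章目標">Chapter goals&lt;/h2>
&lt;p>After this chapter you will be able to:&lt;/p>
&lt;ol>
&lt;li>Create a text or JSON logger&lt;/li>
&lt;li>Use log levels to distinguish how important a signal is&lt;/li>
&lt;li>Store queryable information in key-value fields&lt;/li>
&lt;li>Design stable log field names&lt;/li>
&lt;li>Avoid stuffing all information into free text&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="觀察結構化日誌把資訊放進欄位">[Observe] Structured logging puts information into fields&lt;/h2>
&lt;p>The core rule of structured logging: stable information goes into fields, and the message text only describes the event. The example below records a user-created event:&lt;/p>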





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">New&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewJSONHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Stdout&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">nil&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;user created&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;userID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;u_1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;email&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;alice@example.com&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The JSON handler prints something like:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;time&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;2026-04-22T10:00:00Z&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;level&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;INFO&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;msg&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;user created&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;userID&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;u_1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;email&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;alice@example.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This is easier to query than &lt;code>fmt.Printf(&amp;quot;user u_1 alice@example.com created&amp;quot;)&lt;/code>, because each value sits under a named key instead of inside free text.&lt;/p>
&lt;h2 id="判讀level-表示事件嚴重度">[Interpret] Level expresses event severity&lt;/h2>
&lt;p>The core rule for log levels: a level says how much attention an event needs, not where in the code it happened.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>level&lt;/th>
 &lt;th>When to use&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Debug&lt;/td>
 &lt;td>Development or diagnostic detail&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Info&lt;/td>
 &lt;td>Normal but important state changes&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Warn&lt;/td>
 &lt;td>Recoverable anomalies that need attention&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Error&lt;/td>
 &lt;td>Failed operations or errors that must be handled&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Example:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Debug&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;cache miss&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;key&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;server started&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;addr&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">addr&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Warn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;queue full&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;dropped&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;write file failed&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;path&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>An &lt;code>Error&lt;/code> log should carry an error field so the reader can see why the operation failed.&lt;/p>
&lt;h2 id="策略欄位名稱要穩定">[Strategy] Keep field names stable&lt;/h2>
&lt;p>The core rule of log field design: one concept uses one field name, and aliases are not mixed across call sites.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Concept&lt;/th>
 &lt;th>Recommended field&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>User ID&lt;/td>
 &lt;td>&lt;code>userID&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/request-id/" data-link-title="Request ID" data-link-desc="說明單次 request 的識別碼如何支援 log 搜尋與問題定位">request ID&lt;/a>&lt;/td>
 &lt;td>&lt;code>requestID&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Job ID&lt;/td>
 &lt;td>&lt;code>jobID&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Component name&lt;/td>
 &lt;td>&lt;code>component&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Error&lt;/td>
 &lt;td>&lt;code>error&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Do not mix names like this:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job queued&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">job&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job started&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;job_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">job&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job done&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;jobID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">job&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Use one name consistently:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job queued&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;jobID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">job&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job started&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;jobID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">job&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job done&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;jobID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">job&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ID&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Only once field names are stable can grep, log queries, and &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/dashboard/" data-link-title="Dashboard" data-link-desc="說明 dashboard 如何把關鍵訊號組成可判讀的服務狀態畫面">dashboards&lt;/a> be relied on.&lt;/p>
&lt;h2 id="執行建立帶預設欄位的-logger">[Execute] Build a logger with default fields&lt;/h2>
&lt;p>The core rule for default fields: context that every log record needs should be attached to the logger instead of being repeated by hand on every call.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="nx">base&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">New&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewJSONHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Stdout&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">HandlerOptions&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nx">Level&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">slog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">LevelInfo&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="p">}))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">base&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">With&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;component&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;worker&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="s">&amp;#34;version&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;1.0.0&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job started&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;jobID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;j_1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job finished&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;jobID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;j_1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>With&lt;/code> returns a new logger that carries the fixed fields. It suits context such as component, version, and requestID.&lt;/p>
&lt;h2 id="設計檢查">Design checks&lt;/h2>
&lt;h3 id="把所有資訊塞進-msg">Stuffing all information into msg&lt;/h3>
&lt;p>Poor:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job j_1 for user u_1 started&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Better:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nx">logger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;job started&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;jobID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;j_1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;userID&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;u_1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The second form can be queried directly by &lt;code>jobID=j_1&lt;/code> or &lt;code>userID=u_1&lt;/code>.&lt;/p>
&lt;h3 id="欄位名稱不穩定">Unstable field names&lt;/h3>
&lt;p>Unstable field names break queries. Once you pick &lt;code>userID&lt;/code>, use &lt;code>userID&lt;/code> everywhere; do not mix in &lt;code>uid&lt;/code>, &lt;code>user_id&lt;/code>, or &lt;code>user&lt;/code>.&lt;/p>
&lt;h3 id="忽略敏感資訊">Overlooking sensitive information&lt;/h3>
&lt;p>Logs are stored and forwarded. Passwords, tokens, full credit card numbers, and other sensitive data must not be written to logs.&lt;/p>
&lt;h2 id="延伸閱讀">Further reading&lt;/h2>
&lt;p>This chapter covers only the basics of the standard library's &lt;code>log/slog&lt;/code>. Once a service has domain events, a state repository, or query needs, continue with &lt;a href="https://tarrragon.github.io/blog/go/06-practical/structured-recording/" data-link-title="6.5 如何新增結構化記錄欄位" data-link-desc="區分 operational log、domain event log 與狀態資料">How to add structured record fields&lt;/a>; once it is in production operation, read &lt;a href="https://tarrragon.github.io/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">Go Advanced: structured log field design&lt;/a> and &lt;a href="https://tarrragon.github.io/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Observability pipeline, metrics, and tracing&lt;/a>.&lt;/p></description><content:encoded><![CDATA[<p><code>log/slog</code> is the structured logging package in the Go standard library. Its core purpose is to write a <a href="/blog/backend/knowledge-cards/log/" data-link-title="Log" data-link-desc="說明 log 如何記錄單一事件的上下文並支援事故排查">log</a> as "message + key-value fields", so that people can read it and tools can search, filter, and aggregate it.</p>
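<h2 id="本章目標">Chapter goals</h2>
<p>After this chapter you will be able to:</p>
<ol>
<li>Create a text or JSON logger</li>
<li>Use log levels to distinguish how important a signal is</li>
<li>Store queryable information in key-value fields</li>
<li>Design stable log field names</li>
<li>Avoid stuffing all information into free text</li>
</ol>
<hr>
<h2 id="觀察結構化日誌把資訊放進欄位">[Observe] Structured logging puts information into fields</h2>
<p>The core rule of structured logging: stable information goes into fields, and the message text only describes the event. The example below records a user-created event:</p>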





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span> <span class="o">:=</span> <span class="nx">slog</span><span class="p">.</span><span class="nf">New</span><span class="p">(</span><span class="nx">slog</span><span class="p">.</span><span class="nf">NewJSONHandler</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdout</span><span class="p">,</span> <span class="kc">nil</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;user created&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;userID&#34;</span><span class="p">,</span> <span class="s">&#34;u_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="s">&#34;email&#34;</span><span class="p">,</span> <span class="s">&#34;alice@example.com&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p>The JSON handler prints something like:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;time&#34;</span><span class="p">:</span> <span class="s2">&#34;2026-04-22T10:00:00Z&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nt">&#34;level&#34;</span><span class="p">:</span> <span class="s2">&#34;INFO&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  <span class="nt">&#34;msg&#34;</span><span class="p">:</span> <span class="s2">&#34;user created&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">  <span class="nt">&#34;userID&#34;</span><span class="p">:</span> <span class="s2">&#34;u_1&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="nt">&#34;email&#34;</span><span class="p">:</span> <span class="s2">&#34;alice@example.com&#34;</span>
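</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>This is easier to query than <code>fmt.Printf(&quot;user u_1 alice@example.com created&quot;)</code>, because each value sits under a named key instead of inside free text.</p>
<h2 id="判讀level-表示事件嚴重度">[Interpret] Level expresses event severity</h2>
<p>The core rule for log levels: a level says how much attention an event needs, not where in the code it happened.</p>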
<table>
  <thead>
      <tr>
          <th>level</th>
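          <th>When to use</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Debug</td>
          <td>Development or diagnostic detail</td>
      </tr>
      <tr>
          <td>Info</td>
          <td>Normal but important state changes</td>
      </tr>
      <tr>
          <td>Warn</td>
          <td>Recoverable anomalies that need attention</td>
      </tr>
      <tr>
          <td>Error</td>
          <td>Failed operations or errors that must be handled</td>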
      </tr>
  </tbody>
</table>
<p>Example:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Debug</span><span class="p">(</span><span class="s">&#34;cache miss&#34;</span><span class="p">,</span> <span class="s">&#34;key&#34;</span><span class="p">,</span> <span class="nx">key</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;server started&#34;</span><span class="p">,</span> <span class="s">&#34;addr&#34;</span><span class="p">,</span> <span class="nx">addr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Warn</span><span class="p">(</span><span class="s">&#34;queue full&#34;</span><span class="p">,</span> <span class="s">&#34;dropped&#34;</span><span class="p">,</span> <span class="nx">count</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;write file failed&#34;</span><span class="p">,</span> <span class="s">&#34;path&#34;</span><span class="p">,</span> <span class="nx">path</span><span class="p">,</span> <span class="s">&#34;error&#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span></span></span></code></pre></div><p>An <code>Error</code> log should carry an error field so the reader can see why the operation failed.</p>
<h2 id="策略欄位名稱要穩定">[Strategy] Keep field names stable</h2>
<p>The core rule of log field design: one concept uses one field name, and aliases are not mixed across call sites.</p>
<table>
  <thead>
      <tr>
          <th>Concept</th>
          <th>Recommended field</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>User ID</td>
          <td><code>userID</code></td>
      </tr>
      <tr>
          <td><a href="/blog/backend/knowledge-cards/request-id/" data-link-title="Request ID" data-link-desc="說明單次 request 的識別碼如何支援 log 搜尋與問題定位">request ID</a></td>
          <td><code>requestID</code></td>
      </tr>
      <tr>
          <td>Job ID</td>
          <td><code>jobID</code></td>
      </tr>
      <tr>
          <td>Component name</td>
          <td><code>component</code></td>
      </tr>
      <tr>
          <td>Error</td>
          <td><code>error</code></td>
      </tr>
  </tbody>
</table>
<p>Do not mix names like this:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job queued&#34;</span><span class="p">,</span> <span class="s">&#34;id&#34;</span><span class="p">,</span> <span class="nx">job</span><span class="p">.</span><span class="nx">ID</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job started&#34;</span><span class="p">,</span> <span class="s">&#34;job_id&#34;</span><span class="p">,</span> <span class="nx">job</span><span class="p">.</span><span class="nx">ID</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job done&#34;</span><span class="p">,</span> <span class="s">&#34;jobID&#34;</span><span class="p">,</span> <span class="nx">job</span><span class="p">.</span><span class="nx">ID</span><span class="p">)</span></span></span></code></pre></div><p>應該統一：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job queued&#34;</span><span class="p">,</span> <span class="s">&#34;jobID&#34;</span><span class="p">,</span> <span class="nx">job</span><span class="p">.</span><span class="nx">ID</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job started&#34;</span><span class="p">,</span> <span class="s">&#34;jobID&#34;</span><span class="p">,</span> <span class="nx">job</span><span class="p">.</span><span class="nx">ID</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job done&#34;</span><span class="p">,</span> <span class="s">&#34;jobID&#34;</span><span class="p">,</span> <span class="nx">job</span><span class="p">.</span><span class="nx">ID</span><span class="p">)</span></span></span></code></pre></div><p>Only once field names are stable can grep, log queries, and <a href="/blog/backend/knowledge-cards/dashboard/" data-link-title="Dashboard" data-link-desc="說明 dashboard 如何把關鍵訊號組成可判讀的服務狀態畫面">dashboards</a> be relied on.</p>
<h2 id="執行建立帶預設欄位的-logger">[Execute] Build a logger with default fields</h2>
<p>The core rule for default fields: context that every log record needs should be attached to the logger instead of being repeated by hand on every call.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nx">base</span> <span class="o">:=</span> <span class="nx">slog</span><span class="p">.</span><span class="nf">New</span><span class="p">(</span><span class="nx">slog</span><span class="p">.</span><span class="nf">NewJSONHandler</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdout</span><span class="p">,</span> <span class="o">&amp;</span><span class="nx">slog</span><span class="p">.</span><span class="nx">HandlerOptions</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="nx">Level</span><span class="p">:</span> <span class="nx">slog</span><span class="p">.</span><span class="nx">LevelInfo</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="p">}))</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="nx">logger</span> <span class="o">:=</span> <span class="nx">base</span><span class="p">.</span><span class="nf">With</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="s">&#34;component&#34;</span><span class="p">,</span> <span class="s">&#34;worker&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="s">&#34;version&#34;</span><span class="p">,</span> <span class="s">&#34;1.0.0&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job started&#34;</span><span class="p">,</span> <span class="s">&#34;jobID&#34;</span><span class="p">,</span> <span class="s">&#34;j_1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job finished&#34;</span><span class="p">,</span> <span class="s">&#34;jobID&#34;</span><span class="p">,</span> <span class="s">&#34;j_1&#34;</span><span class="p">)</span></span></span></code></pre></div><p><code>With</code> returns a new logger that carries the fixed fields. It suits context such as component, version, and requestID.</p>
<h2 id="設計檢查">Design checks</h2>
<h3 id="把所有資訊塞進-msg">Cramming everything into msg</h3>
<p>Poor:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job j_1 for user u_1 started&#34;</span><span class="p">)</span></span></span></code></pre></div><p>Better:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">logger</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;job started&#34;</span><span class="p">,</span> <span class="s">&#34;jobID&#34;</span><span class="p">,</span> <span class="s">&#34;j_1&#34;</span><span class="p">,</span> <span class="s">&#34;userID&#34;</span><span class="p">,</span> <span class="s">&#34;u_1&#34;</span><span class="p">)</span></span></span></code></pre></div><p>The second form can be queried directly by <code>jobID=j_1</code> or <code>userID=u_1</code>.</p>
<h3 id="欄位名稱不穩定">Unstable field names</h3>
<p>Unstable field names break queries. Once you pick <code>userID</code>, use <code>userID</code> everywhere; do not mix in <code>uid</code>, <code>user_id</code>, or <code>user</code>.</p>
<h3 id="忽略敏感資訊">Overlooking sensitive data</h3>
<p>Logs are stored and forwarded. Sensitive data such as passwords, tokens, and full credit card numbers must not be written to logs.</p>
<h2 id="延伸閱讀">Further reading</h2>
<p>This chapter covers only basic usage of the standard library's <code>log/slog</code>. Once a service needs domain events, a state repository, or queries, continue with <a href="/blog/go/06-practical/structured-recording/" data-link-title="6.5 如何新增結構化記錄欄位" data-link-desc="區分 operational log、domain event log 與狀態資料">How to add structured record fields</a>; when you move into production operations, read <a href="/blog/go-advanced/06-production-operations/log-fields/" data-link-title="6.3 結構化日誌欄位設計" data-link-desc="讓 log 可 grep、可聚合、可追蹤">Advanced Go: structured log field design</a> and <a href="/blog/go-advanced/07-distributed-operations/observability-pipeline/" data-link-title="7.4 Observability pipeline、metrics 與 tracing" data-link-desc="把 structured log、metric、trace 與 profile 組成可操作的診斷系統">Observability pipeline, metrics, and tracing</a>.</p>
]]></content:encoded></item><item><title>5.6 Observability Design for the Hook System</title><link>https://tarrragon.github.io/blog/python/05-error-testing/observability-design/</link><pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/python/05-error-testing/observability-design/</guid><description>&lt;p>The &lt;a href="https://tarrragon.github.io/blog/python/05-error-testing/error-infrastructure/" data-link-title="5.5 頂層例外處理機制" data-link-desc="run_hook_safely 與統一錯誤基礎設施">previous chapter&lt;/a> introduced &lt;code>run_hook_safely&lt;/code>, a top-level exception handler that solved the problem of 44 Hooks each handling errors on their own. Catching errors, however, is only the first step toward observability. The real question is:&lt;/p>
&lt;blockquote>
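&lt;p>When 44 Hooks run hundreds of times a day, how do you know they are running correctly? And when something goes wrong, how do you find the cause?&lt;/p>&lt;/blockquote>
&lt;p>This chapter builds observability for the Hook system along three dimensions:&lt;/p>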
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>Question it answers&lt;/th>
 &lt;th>Core mechanism&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Log architecture&lt;/td>
 &lt;td>Where is the trace of each run?&lt;/td>
 &lt;td>Structured Logging + Log Rotation&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Error visibility&lt;/td>
 &lt;td>When something fails, who tells the user?&lt;/td>
 &lt;td>stderr output + fallback strategy&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Health monitoring&lt;/td>
 &lt;td>Is the system healthy overall?&lt;/td>
 &lt;td>Execution-time tracking + log cleanup&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>
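&lt;h2 id="一日誌架構設計">1. Log Architecture Design&lt;/h2>
&lt;h3 id="11-需求分析">1.1 Requirements analysis&lt;/h3>
&lt;p>Logging for the Hook system differs from logging for a typical application in four fundamental ways:&lt;/p>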
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Difference&lt;/th>
 &lt;th>Typical application&lt;/th>
 &lt;th>Hook system&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Lifecycle&lt;/td>
 &lt;td>Long-running&lt;/td>
 &lt;td>Runs once per trigger (seconds)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Instance count&lt;/td>
 &lt;td>1-3 services&lt;/td>
 &lt;td>44 independent scripts&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Log volume&lt;/td>
 &lt;td>Large, continuous&lt;/td>
 &lt;td>Small, sporadic&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Reader&lt;/td>
 &lt;td>Operations team&lt;/td>
 &lt;td>The developers themselves&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
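&lt;p>These differences drive the log architecture: no centralized logging service is needed, but logs must be &lt;strong>isolated by Hook name&lt;/strong> and &lt;strong>cleaned up automatically by age&lt;/strong>.&lt;/p>
&lt;h3 id="12-目錄結構設計">1.2 Directory layout&lt;/h3>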





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">.claude/hook-logs/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">├── acceptance-gate-hook/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">│ ├── acceptance-gate-hook-20260304-091523.log
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">│ ├── acceptance-gate-hook-20260304-091845.log
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">│ └── .cleanup_trigger # cleanup trigger counter
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">├── command-entrance-gate-hook/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">│ ├── command-entrance-gate-hook-20260304-091523.log
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">│ └── ...
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">└── phase-completion-gate-hook/
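&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> └── ...&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Each Hook has its own log directory. Every run produces a separate log file whose name includes a timestamp. The benefits of this design:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Isolation&lt;/strong>: when troubleshooting, you only need to look at one Hook's directory&lt;/li>
&lt;li>&lt;strong>Timeline&lt;/strong>: sorting by file name gives the execution history&lt;/li>
&lt;li>&lt;strong>Cleanup&lt;/strong>: cleaning up by directory or by age is straightforward&lt;/li>
&lt;/ul>
&lt;h3 id="13-日誌系統初始化">1.3 Initializing the logging system&lt;/h3>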





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">setup_hook_logging&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">hook_name&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Logger&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;Create and configure the logging system for a Hook.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="n">hook_name&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="n">hook_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DEFAULT_HOOK_NAME&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="n">sanitized_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">_sanitize_hook_name&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">hook_name&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="n">root_dir&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">_find_project_root&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="n">log_base_dir&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">root_dir&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="s2">&amp;#34;.claude&amp;#34;&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="s2">&amp;#34;hook-logs&amp;#34;&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">sanitized_name&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> &lt;span class="c1"># Create the log directory (degrade on failure instead of raising)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="k">try&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> &lt;span class="n">log_base_dir&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">mkdir&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">parents&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">exist_ok&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl"> &lt;span class="k">except&lt;/span> &lt;span class="ne">OSError&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">_create_fallback_logger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">hook_name&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getLogger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">hook_name&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl"> &lt;span class="n">_clear_logger_handlers&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">logger&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">setLevel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DEBUG&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl"> &lt;span class="n">is_debug&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getenv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;HOOK_DEBUG&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">lower&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s2">&amp;#34;true&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl"> &lt;span class="n">_setup_logger_handlers&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">logger&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">log_base_dir&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">sanitized_name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">is_debug&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">logger&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Several design decisions in this code are worth noting.&lt;/p>
&lt;p>&lt;strong>Named Logger&lt;/strong>: &lt;code>logging.getLogger(hook_name)&lt;/code> returns a named logger rather than the root logger, so each Hook's logging configuration stays independent of the others:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1"># Each Hook has its own logger instance&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="n">logger_a&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getLogger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;acceptance-gate-hook&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="n">logger_b&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getLogger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;command-entrance-gate-hook&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="c1"># The two have independent handlers, levels, and formats&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Handler cleanup&lt;/strong>: old handlers are cleared before each initialization. This keeps the same logger from being configured more than once (for example, when &lt;code>setup_hook_logging&lt;/code> is called repeatedly in tests):&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">_clear_logger_handlers&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">logger&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Logger&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;Remove every handler from the logger.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">handler&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">handlers&lt;/span>&lt;span class="p">[:]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">removeHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">handler&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> &lt;span class="n">handler&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">close&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Note the slice copy &lt;code>logger.handlers[:]&lt;/code>. Iterating over &lt;code>logger.handlers&lt;/code> directly while calling &lt;code>removeHandler&lt;/code> inside the loop changes the list's length, so elements get skipped. This is the classic Python pitfall of mutating a collection while iterating over it.&lt;/p>
&lt;p>&lt;strong>Environment-variable control&lt;/strong>: the &lt;code>HOOK_DEBUG&lt;/code> environment variable switches log verbosity without any code change:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1"># Normal mode: stdout shows WARNING and above only&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">python3 my-hook.py
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="c1"># Debug mode: stdout shows every level&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="nv">HOOK_DEBUG&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">true&lt;/span> python3 my-hook.py&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="14-雙通道輸出">1.4 Dual-channel output&lt;/h3>
&lt;p>Each logger is configured with two handlers, each serving a different purpose:&lt;/p></description><content:encoded><![CDATA[<p>The <a href="/blog/python/05-error-testing/error-infrastructure/" data-link-title="5.5 頂層例外處理機制" data-link-desc="run_hook_safely 與統一錯誤基礎設施">previous chapter</a> introduced <code>run_hook_safely</code>, a top-level exception handler that solved the problem of 44 Hooks each handling errors on their own. Catching errors, however, is only the first step toward observability. The real question is:</p>
<blockquote>
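<p>When 44 Hooks run hundreds of times a day, how do you know they are running correctly? And when something goes wrong, how do you find the cause?</p></blockquote>
<p>This chapter builds observability for the Hook system along three dimensions:</p>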
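<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Question it answers</th>
          <th>Core mechanism</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Log architecture</td>
          <td>Where is the trace of each run?</td>
          <td>Structured Logging + Log Rotation</td>
      </tr>
      <tr>
          <td>Error visibility</td>
          <td>When something fails, who tells the user?</td>
          <td>stderr output + fallback strategy</td>
      </tr>
      <tr>
          <td>Health monitoring</td>
          <td>Is the system healthy overall?</td>
          <td>Execution-time tracking + log cleanup</td>
      </tr>
  </tbody>
</table>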
<hr>
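<h2 id="一日誌架構設計">1. Log Architecture Design</h2>
<h3 id="11-需求分析">1.1 Requirements analysis</h3>
<p>Logging for the Hook system differs from logging for a typical application in four fundamental ways:</p>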
<table>
  <thead>
      <tr>
          <th>Difference</th>
          <th>Typical application</th>
          <th>Hook system</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Lifecycle</td>
          <td>Long-running</td>
          <td>Runs once per trigger (seconds)</td>
      </tr>
      <tr>
          <td>Instance count</td>
          <td>1-3 services</td>
          <td>44 independent scripts</td>
      </tr>
      <tr>
          <td>Log volume</td>
          <td>Large, continuous</td>
          <td>Small, sporadic</td>
      </tr>
      <tr>
          <td>Reader</td>
          <td>Operations team</td>
          <td>The developers themselves</td>
      </tr>
  </tbody>
</table>
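<p>These differences drive the log architecture: no centralized logging service is needed, but logs must be <strong>isolated by Hook name</strong> and <strong>cleaned up automatically by age</strong>.</p>
<h3 id="12-目錄結構設計">1.2 Directory layout</h3>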





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">.claude/hook-logs/
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">├── acceptance-gate-hook/
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">│   ├── acceptance-gate-hook-20260304-091523.log
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">│   ├── acceptance-gate-hook-20260304-091845.log
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">│   └── .cleanup_trigger           # cleanup trigger counter
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">├── command-entrance-gate-hook/
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">│   ├── command-entrance-gate-hook-20260304-091523.log
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">│   └── ...
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">└── phase-completion-gate-hook/
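</span></span><span class="line"><span class="ln">10</span><span class="cl">    └── ...</span></span></code></pre></div><p>Each Hook has its own log directory. Every run produces a separate log file whose name includes a timestamp. The benefits of this design:</p>
<ul>
<li><strong>Isolation</strong>: when troubleshooting, you only need to look at one Hook's directory</li>
<li><strong>Timeline</strong>: sorting by file name gives the execution history</li>
<li><strong>Cleanup</strong>: cleaning up by directory or by age is straightforward</li>
</ul>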
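<p>As a rough illustration of the timeline property, the sketch below lists one Hook's runs oldest-first using nothing but a file-name sort. The helper name <code>list_hook_runs</code> and the temporary directory are assumptions made for this example, not part of the Hook library; only the directory layout and the file naming follow the tree above.</p>

```python
import tempfile
from pathlib import Path


def list_hook_runs(log_root: Path, hook_name: str) -> list[str]:
    """Return one hook's log file names, oldest first (hypothetical helper)."""
    hook_dir = log_root / hook_name
    # The YYYYMMDD-HHMMSS stamp is zero-padded, so lexical order equals time order.
    return sorted(path.name for path in hook_dir.glob(f"{hook_name}-*.log"))


# Build a throwaway copy of the layout so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "hook-logs"
    hook_dir = root / "acceptance-gate-hook"
    hook_dir.mkdir(parents=True)
    for stamp in ("20260304-091845", "20260304-091523"):
        (hook_dir / f"acceptance-gate-hook-{stamp}.log").write_text("")
    (hook_dir / ".cleanup_trigger").write_text("0")  # not matched by the glob
    runs = list_hook_runs(root, "acceptance-gate-hook")

print(runs)
```

<p>Because the timestamp is zero-padded and ordered from year down to second, no date parsing is needed, and the hidden <code>.cleanup_trigger</code> file is skipped by the glob pattern.</p>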
<h3 id="13-日誌系統初始化">1.3 Initializing the logging system</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">setup_hook_logging</span><span class="p">(</span><span class="n">hook_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">logging</span><span class="o">.</span><span class="n">Logger</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Create and configure the logging system for a Hook.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">hook_name</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="n">hook_name</span> <span class="o">=</span> <span class="n">DEFAULT_HOOK_NAME</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="n">sanitized_name</span> <span class="o">=</span> <span class="n">_sanitize_hook_name</span><span class="p">(</span><span class="n">hook_name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="n">root_dir</span> <span class="o">=</span> <span class="n">_find_project_root</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="n">log_base_dir</span> <span class="o">=</span> <span class="n">root_dir</span> <span class="o">/</span> <span class="s2">&#34;.claude&#34;</span> <span class="o">/</span> <span class="s2">&#34;hook-logs&#34;</span> <span class="o">/</span> <span class="n">sanitized_name</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="c1"># Create the log directory (degrade on failure instead of raising)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="n">log_base_dir</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">parents</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">except</span> <span class="ne">OSError</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="k">return</span> <span class="n">_create_fallback_logger</span><span class="p">(</span><span class="n">hook_name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="n">hook_name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">_clear_logger_handlers</span><span class="p">(</span><span class="n">logger</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="o">.</span><span class="n">DEBUG</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">is_debug</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;HOOK_DEBUG&#34;</span><span class="p">,</span> <span class="s2">&#34;&#34;</span><span class="p">)</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s2">&#34;true&#34;</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">_setup_logger_handlers</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">log_base_dir</span><span class="p">,</span> <span class="n">sanitized_name</span><span class="p">,</span> <span class="n">is_debug</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">return</span> <span class="n">logger</span></span></span></code></pre></div><p>Several design decisions in this code are worth noting.</p>
<p><strong>Named Logger</strong>: <code>logging.getLogger(hook_name)</code> returns a named logger rather than the root logger, so each Hook's logging configuration stays independent of the others:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Each Hook has its own logger instance</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">logger_a</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s2">&#34;acceptance-gate-hook&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">logger_b</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s2">&#34;command-entrance-gate-hook&#34;</span><span class="p">)</span>
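</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># The two have independent handlers, levels, and formats</span></span></span></code></pre></div><p><strong>Handler cleanup</strong>: old handlers are cleared before each initialization. This keeps the same logger from being configured more than once (for example, when <code>setup_hook_logging</code> is called repeatedly in tests):</p>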





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">def</span> <span class="nf">_clear_logger_handlers</span><span class="p">(</span><span class="n">logger</span><span class="p">:</span> <span class="n">logging</span><span class="o">.</span><span class="n">Logger</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Remove every handler from the logger.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="k">for</span> <span class="n">handler</span> <span class="ow">in</span> <span class="n">logger</span><span class="o">.</span><span class="n">handlers</span><span class="p">[:]:</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">removeHandler</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">        <span class="n">handler</span><span class="o">.</span><span class="n">close</span><span class="p">()</span></span></span></code></pre></div><p>Note the slice copy <code>logger.handlers[:]</code>. Iterating over <code>logger.handlers</code> directly while calling <code>removeHandler</code> inside the loop changes the list's length, so elements get skipped. This is the classic Python pitfall of mutating a collection while iterating over it.</p>
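<p>The skipped-element behaviour is easy to reproduce with a plain list standing in for <code>logger.handlers</code>. This is a minimal sketch; both function names are hypothetical and exist only for the demonstration.</p>

```python
def remove_all_in_place(items: list) -> list:
    """Buggy pattern: mutate the very list that is being iterated."""
    for item in items:
        items.remove(item)  # shifts later elements left, so the next one is skipped
    return items


def remove_all_from_copy(items: list) -> list:
    """Safe pattern: iterate over a slice copy, mutate the original."""
    for item in items[:]:
        items.remove(item)
    return items


leftover_buggy = remove_all_in_place(["h1", "h2", "h3", "h4"])
leftover_safe = remove_all_from_copy(["h1", "h2", "h3", "h4"])

print(leftover_buggy)  # every second element survives
print(leftover_safe)   # nothing survives
```

<p>With four entries the in-place loop leaves two behind, which for real handlers would mean open file handles and duplicated log lines after a re-initialization.</p>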
<p><strong>Environment-variable control</strong>: the <code>HOOK_DEBUG</code> environment variable switches log verbosity without any code change:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Normal mode: stdout shows WARNING and above only</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 my-hook.py
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Debug mode: stdout shows every level</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="nv">HOOK_DEBUG</span><span class="o">=</span><span class="nb">true</span> python3 my-hook.py</span></span></code></pre></div><h3 id="14-雙通道輸出">1.4 Dual-channel output</h3>
<p>Each logger is configured with two handlers, each serving a different purpose:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">_setup_logger_handlers</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">log_base_dir</span><span class="p">,</span> <span class="n">sanitized_name</span><span class="p">,</span> <span class="n">is_debug</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Attach handlers to the logger.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="c1"># File handler: records every level for later analysis</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">timestamp</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s2">&#34;%Y%m</span><span class="si">%d</span><span class="s2">-%H%M%S&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">log_file_path</span> <span class="o">=</span> <span class="n">log_base_dir</span> <span class="o">/</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">sanitized_name</span><span class="si">}</span><span class="s2">-</span><span class="si">{</span><span class="n">timestamp</span><span class="si">}</span><span class="s2">.log&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="n">file_handler</span> <span class="o">=</span> <span class="n">_create_file_handler</span><span class="p">(</span><span class="n">log_file_path</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">if</span> <span class="n">file_handler</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">file_handler</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="c1"># Console handler: WARNING and above in normal mode, every level in debug mode</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">_create_stream_handler</span><span class="p">(</span><span class="n">is_debug</span><span class="p">))</span></span></span></code></pre></div><table>
  <thead>
      <tr>
          <th>Handler</th>
          <th>Output target</th>
          <th>Level</th>
          <th>Format</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>FileHandler</td>
          <td>Log file</td>
          <td>DEBUG</td>
          <td><code>[2026-03-04 09:15:23] DEBUG - message</code></td>
          <td>Later analysis</td>
      </tr>
      <tr>
          <td>StreamHandler</td>
          <td>stdout</td>
          <td>WARNING (normal) / DEBUG (debug mode)</td>
          <td><code>[WARNING] message</code></td>
          <td>Immediate feedback</td>
      </tr>
  </tbody>
</table>
<p>Why does the StreamHandler write to <strong>stdout</strong> rather than stderr? The reason is how Claude Code's Hook system treats the two streams:</p>
<table>
  <thead>
      <tr>
          <th>Output stream</th>
          <th>How Claude Code interprets it</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>stdout</td>
          <td>Normal message, shown as <code>hook success</code></td>
      </tr>
      <tr>
          <td>stderr</td>
          <td>Error message, shown as <code>hook error</code></td>
      </tr>
  </tbody>
</table>
<p>A WARNING in the log is a reminder for the developer, not a sign that the Hook failed. If WARNING messages went to stderr, Claude Code would treat them as errors, so the StreamHandler must write to stdout.</p>
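<p>The two helpers called in the code above, <code>_create_file_handler</code> and <code>_create_stream_handler</code>, are not shown in this article. The following is a minimal sketch of how they might look, assuming the formats listed in the handler table. The function names and <code>build_demo_logger</code> are hypothetical. One detail worth noting: <code>logging.StreamHandler</code> writes to <code>sys.stderr</code> when no stream is given, so stdout has to be passed explicitly.</p>

```python
import logging
import sys
from pathlib import Path


def create_stream_handler(is_debug: bool = False) -> logging.StreamHandler:
    """Console handler on stdout: WARNING+ normally, every level in debug mode."""
    # Pass sys.stdout explicitly; the default stream would be sys.stderr.
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.DEBUG if is_debug else logging.WARNING)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    return handler


def create_file_handler(log_file_path: Path) -> logging.FileHandler:
    """File handler: records every level for later analysis."""
    handler = logging.FileHandler(log_file_path, encoding="utf-8")
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("[%(asctime)s] %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S")
    )
    return handler


def build_demo_logger(log_dir: Path, is_debug: bool = False) -> logging.Logger:
    """Hypothetical wiring: one logger, two handlers with different levels."""
    logger = logging.getLogger("dual-channel-demo")
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)  # the logger passes everything; handlers filter
    logger.propagate = False
    logger.addHandler(create_file_handler(log_dir / "demo.log"))
    logger.addHandler(create_stream_handler(is_debug))
    return logger
```

<p>The logger itself stays at DEBUG so that filtering happens per handler: the file receives every record, while the console only receives what its own level allows.</p>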
<h3 id="15-hook-名稱淨化">1.5 Hook name sanitization</h3>
<p>The Hook name becomes part of filesystem paths (a directory name and a file name), so it has to be sanitized:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">_sanitize_hook_name</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Sanitize the hook name: replace characters that are unsafe in filesystem paths.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">name</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        <span class="k">return</span> <span class="n">DEFAULT_HOOK_NAME</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="k">for</span> <span class="n">char</span> <span class="ow">in</span> <span class="p">[</span><span class="s2">&#34;&lt;&#34;</span><span class="p">,</span> <span class="s2">&#34;&gt;&#34;</span><span class="p">,</span> <span class="s2">&#34;|&#34;</span><span class="p">]:</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="n">name</span> <span class="o">=</span> <span class="n">name</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">char</span><span class="p">,</span> <span class="s2">&#34;-&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="n">name</span> <span class="o">=</span> <span class="n">name</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">&#34;/&#34;</span><span class="p">,</span> <span class="s2">&#34;-&#34;</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">&#34;</span><span class="se">\\</span><span class="s2">&#34;</span><span class="p">,</span> <span class="s2">&#34;-&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="c1"># Collapse consecutive &#34;-&#34; and strip leading/trailing ones</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">while</span> <span class="s2">&#34;--&#34;</span> <span class="ow">in</span> <span class="n">name</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="n">name</span> <span class="o">=</span> <span class="n">name</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">&#34;--&#34;</span><span class="p">,</span> <span class="s2">&#34;-&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">name</span> <span class="o">=</span> <span class="n">name</span><span class="o">.</span><span class="n">strip</span><span class="p">(</span><span class="s2">&#34;-&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">return</span> <span class="n">name</span> <span class="k">if</span> <span class="n">name</span> <span class="k">else</span> <span class="n">DEFAULT_HOOK_NAME</span></span></span></code></pre></div><p>This is a typical case of defensive programming. Every current Hook name is already a valid file name (for example <code>acceptance-gate-hook</code>), but <strong>the function cannot assume that callers always pass a valid name</strong>. With sanitization in place, even <code>&lt;invalid|name&gt;</code> yields the valid directory name <code>invalid-name</code>.</p>
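<p>For comparison, the same replace, collapse and strip steps can be written with regular expressions. This is only an alternative sketch, not the project's implementation; the value of <code>DEFAULT_HOOK_NAME</code> is an assumption, since the article does not show the real constant.</p>

```python
import re

DEFAULT_HOOK_NAME = "unknown-hook"  # assumed value, for illustration only

# The same five characters the loop-based version replaces: < > | / \
_UNSAFE_CHARS = re.compile(r"[<>|/\\]")
_REPEATED_DASHES = re.compile(r"-{2,}")


def sanitize_hook_name(name: str) -> str:
    """Replace unsafe characters with '-', collapse repeated '-', trim the ends."""
    if not name:
        return DEFAULT_HOOK_NAME
    cleaned = _UNSAFE_CHARS.sub("-", name)
    cleaned = _REPEATED_DASHES.sub("-", cleaned).strip("-")
    # A name made only of unsafe characters collapses to "", so fall back.
    return cleaned or DEFAULT_HOOK_NAME


print(sanitize_hook_name("<invalid|name>"))
print(sanitize_hook_name("acceptance-gate-hook"))
```

<p>Both versions fall back to the default name when nothing usable remains, which keeps the log directory name non-empty in every case.</p>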
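<h3 id="16-專案根目錄定位">1.6 Locating the project root</h3>
<p>The log directory lives at <code>.claude/hook-logs/</code> under the project root. Hooks can be run from different working directories, though, so the root has to be located dynamically:</p>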





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">_find_project_root</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">Path</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Locate the project root.
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s2">    Order of precedence:
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s2">    1. Environment variable CLAUDE_PROJECT_DIR
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s2">    2. Search upward from cwd for CLAUDE.md (at most 5 directories)
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s2">    3. Fall back to Path.cwd() (last resort)
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s2">    &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">env_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;CLAUDE_PROJECT_DIR&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">if</span> <span class="n">env_dir</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="k">return</span> <span class="n">Path</span><span class="p">(</span><span class="n">env_dir</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">current_dir</span> <span class="o">=</span> <span class="n">Path</span><span class="o">.</span><span class="n">cwd</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">CLAUDE_MD_SEARCH_DEPTH</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">current_dir</span> <span class="o">/</span> <span class="s2">&#34;CLAUDE.md&#34;</span><span class="p">)</span><span class="o">.</span><span class="n">exists</span><span class="p">():</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">            <span class="k">return</span> <span class="n">current_dir</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="n">parent</span> <span class="o">=</span> <span class="n">current_dir</span><span class="o">.</span><span class="n">parent</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="k">if</span> <span class="n">parent</span> <span class="o">==</span> <span class="n">current_dir</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="n">current_dir</span> <span class="o">=</span> <span class="n">parent</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">return</span> <span class="n">Path</span><span class="o">.</span><span class="n">cwd</span><span class="p">()</span></span></span></code></pre></div><p>The reasoning behind the three-level fallback:</p>
<table>
  <thead>
      <tr>
          <th>Priority</th>
          <th>Method</th>
          <th>When it applies</th>
          <th>When it fails</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1</td>
          <td>Environment variable</td>
          <td>Set automatically when Claude Code starts</td>
          <td>Not set when the Hook is run by hand</td>
      </tr>
      <tr>
          <td>2</td>
          <td>Upward search for CLAUDE.md</td>
          <td>Manual runs, tests</td>
          <td>Run from outside a project directory</td>
      </tr>
      <tr>
          <td>3</td>
          <td>cwd</td>
          <td>Last resort</td>
          <td>Never fails</td>
      </tr>
  </tbody>
</table>
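<p>Note the search depth limit, <code>CLAUDE_MD_SEARCH_DEPTH = 5</code>. The search only moves toward parent directories, so it always ends at the filesystem root (the <code>parent == current_dir</code> check). What the limit adds is a cap on how far up the search may go, so a Hook started deep inside an unrelated tree does not pick up a <code>CLAUDE.md</code> many levels above it. Five checks cover most project layouts: starting from <code>/Users/user/projects/my-app/.claude/hooks/</code>, the project root <code>my-app</code> is reached on the third check.</p>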
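<p>The depth limit is easiest to see with a version of the search that accepts its starting directory and environment as parameters, so it can be exercised against a temporary tree. This is a sketch for experimentation only; <code>find_project_root</code>, its <code>start</code> and <code>env</code> parameters, and <code>SEARCH_DEPTH</code> are hypothetical names, not the project's API.</p>

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

SEARCH_DEPTH = 5  # mirrors CLAUDE_MD_SEARCH_DEPTH from the text


def find_project_root(
    start: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Env variable first, then a bounded upward search, then the start dir."""
    env = os.environ if env is None else env
    env_dir = env.get("CLAUDE_PROJECT_DIR")
    if env_dir:
        return Path(env_dir)

    origin = Path.cwd() if start is None else Path(start)
    current = origin
    for _ in range(SEARCH_DEPTH):  # checks the start dir plus up to 4 parents
        if (current / "CLAUDE.md").exists():
            return current
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return origin


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    (root / "CLAUDE.md").write_text("demo", encoding="utf-8")
    hooks = root / ".claude" / "hooks"
    hooks.mkdir(parents=True)
    # hooks -> .claude -> root: found on the third check
    print(find_project_root(hooks, env={}) == root)
```

<p>Starting five directories below the marker file, the same function gives up and returns the starting directory, which is the behaviour the depth limit is meant to guarantee.</p>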
<hr>
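<h2 id="二錯誤可見性設計">2. Error visibility design</h2>
<h3 id="21-核心問題靜默失敗">2.1 The core problem: silent failure</h3>
<p>The IMP-003 incident was the direct motivation for the error visibility design. Seven Hooks failed silently for at least two sessions because of a variable scoping problem (<code>NameError</code>). The failure flowed like this:</p>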





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Hook runs → NameError → caught by run_hook_safely → written to log file → returns EXIT_ERROR
</span></span><span class="line"><span class="ln">2</span><span class="cl">                                                             ↑
</span></span><span class="line"><span class="ln">3</span><span class="cl">                                               the user cannot see this step</span></span></code></pre></div><p>The problem was that the first version of <code>_log_exception</code> wrote only to the log file:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Version before W25-005 (flawed)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="k">def</span> <span class="nf">_log_exception</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">hook_name</span><span class="p">,</span> <span class="n">tb_str</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">critical</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Unhandled exception in </span><span class="si">{</span><span class="n">hook_name</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">critical</span><span class="p">(</span><span class="n">tb_str</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="c1"># Ends here -- the user has no idea anything went wrong</span></span></span></code></pre></div><h3 id="22-修正stderr-強制可見">2.2 The fix: forced visibility through stderr</h3>
<p>W25-005 added a stderr message after the log write:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">_log_exception</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">hook_name</span><span class="p">,</span> <span class="n">tb_str</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Record the exception traceback in the log.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="c1"># 1. Write to the log file (full traceback, for later analysis)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">critical</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Unhandled exception in </span><span class="si">{</span><span class="n">hook_name</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">critical</span><span class="p">(</span><span class="n">tb_str</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">logging_error</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="c1"># The logging system itself can fail too (disk full, permission problems)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Failed to log exception: </span><span class="si">{</span><span class="n">logging_error</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="n">tb_str</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="c1"># 2. Write to stderr so Claude Code shows &#34;hook error&#34; (added in W25-005)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="nb">print</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="sa">f</span><span class="s2">&#34;[Hook Error] </span><span class="si">{</span><span class="n">hook_name</span><span class="si">}</span><span class="s2"> failed unexpectedly. &#34;</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="sa">f</span><span class="s2">&#34;Check hook logs for details.&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="p">)</span></span></span></code></pre></div><p>The key to this design is that <strong>each of the two outputs has its own job</strong>:</p>
<table>
  <thead>
      <tr>
          <th>Output</th>
          <th>Target</th>
          <th>Content</th>
          <th>Reader</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Log file</td>
          <td><code>.claude/hook-logs/{name}/</code></td>
          <td>Full traceback</td>
          <td>Developer (later analysis)</td>
      </tr>
      <tr>
          <td>stderr</td>
          <td>Claude Code UI</td>
          <td>Short error notice</td>
          <td>User (immediate awareness)</td>
      </tr>
  </tbody>
</table>
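<p><strong>Why not send the full traceback to stderr?</strong> Because whatever goes to stderr is shown directly in the Claude Code conversation. A 20-line Python traceback is noise to the user. It is enough to say which Hook failed and where to find the details.</p>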
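<h3 id="23-日誌系統自身的-fallback">2.3 A fallback for the logging system itself</h3>
<p>What if the logging system itself breaks, for example because the disk is full and the log file cannot be written?</p>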





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Fallback when creating the log directory fails</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">log_base_dir</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">parents</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">except</span> <span class="ne">OSError</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">return</span> <span class="n">_create_fallback_logger</span><span class="p">(</span><span class="n">hook_name</span><span class="p">)</span>  <span class="c1"># degrade to stdout-only output</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">def</span> <span class="nf">_create_fallback_logger</span><span class="p">(</span><span class="n">hook_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">logging</span><span class="o">.</span><span class="n">Logger</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Create the fallback logger (StreamHandler only).&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="n">hook_name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">_clear_logger_handlers</span><span class="p">(</span><span class="n">logger</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="o">.</span><span class="n">DEBUG</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">_create_stream_handler</span><span class="p">())</span>
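</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">return</span> <span class="n">logger</span></span></span></code></pre></div><p>The fallback logger has only a StreamHandler (stdout) and no FileHandler. Nothing is saved to a file, but <strong>the Hook can at least keep running</strong>, and important messages still reach the console.</p>
<p>This reflects an important design principle: <strong>a failure in the observability infrastructure must not interrupt the business function</strong>. When the logging system breaks, the Hook still has to work.</p>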
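<p>The degradation can be reproduced without filling a disk: if the intended log directory sits underneath a regular file, <code>mkdir</code> raises an <code>OSError</code> subclass. The sketch below uses that trick. It is a simplified variant that degrades in place rather than building a separate fallback logger, and <code>setup_logger</code> is a hypothetical name.</p>

```python
import logging
import sys
import tempfile
from pathlib import Path


def setup_logger(name: str, log_dir: Path) -> logging.Logger:
    """Console handler always; file handler only if the directory is usable."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    stream = logging.StreamHandler(sys.stdout)
    stream.setLevel(logging.WARNING)
    logger.addHandler(stream)

    try:
        log_dir.mkdir(parents=True, exist_ok=True)
        logger.addHandler(logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8"))
    except OSError:
        # Degraded mode: the hook keeps running with console output only.
        logger.warning("file logging unavailable, continuing with console output only")
    return logger


if __name__ == "__main__":
    blocker = Path(tempfile.mkdtemp()) / "blocker"
    blocker.write_text("a regular file, not a directory", encoding="utf-8")
    demo = setup_logger("fallback-demo", blocker / "logs")  # mkdir fails here
    demo.warning("hook logic still runs")
```

<p>The caller receives a working logger in both cases, so the Hook's own code never needs to know whether file logging succeeded.</p>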
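<h3 id="24-imp-005-的教訓import-階段的可見性">2.4 The lesson of IMP-005: visibility during import</h3>
<p>IMP-005 exposed another blind spot: <strong>errors raised during import</strong>. When a module was moved and an import path was not updated, the <code>ModuleNotFoundError</code> was raised before <code>run_hook_safely</code> ever ran:</p>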





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="ch">#!/usr/bin/env python3</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">import</span> <span class="nn">sys</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># This line runs before run_hook_safely</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># If it fails, run_hook_safely is never called</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="kn">from</span> <span class="nn">lib.common_functions</span> <span class="kn">import</span> <span class="n">hook_output</span>  <span class="c1"># ModuleNotFoundError!</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kn">from</span> <span class="nn">hook_utils</span> <span class="kn">import</span> <span class="n">run_hook_safely</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="c1"># ...</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">return</span> <span class="mi">0</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&#34;__main__&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="n">run_hook_safely</span><span class="p">(</span><span class="n">main</span><span class="p">,</span> <span class="s2">&#34;my-hook&#34;</span><span class="p">))</span></span></span></code></pre></div><p><code>run_hook_safely</code> only protects the call to <code>main()</code>, whereas imports run while the module is being loaded. The fix is to guard the imports with try-except:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="ch">#!/usr/bin/env python3</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">import</span> <span class="nn">sys</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># Import guard: make sure a failure produces an explicit stderr message</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">Path</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)</span><span class="o">.</span><span class="n">parent</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="kn">from</span> <span class="nn">hook_utils</span> <span class="kn">import</span> <span class="n">run_hook_safely</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="kn">from</span> <span class="nn">lib.common_functions</span> <span class="kn">import</span> <span class="n">hook_output</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">except</span> <span class="ne">ImportError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[Hook Import Error] </span><span class="si">{</span><span class="n">Path</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span></span></span></code></pre></div><table>
  <thead>
      <tr>
          <th>Without the import guard</th>
          <th>With the import guard</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Claude Code shows <code>hook error</code></td>
          <td>Claude Code shows <code>hook error</code></td>
      </tr>
      <tr>
          <td>No way to tell which Hook failed</td>
          <td><code>[Hook Import Error] my-hook.py: No module named 'lib.common_functions'</code></td>
      </tr>
      <tr>
          <td>No way to tell why</td>
          <td>Names both the missing module and the Hook file</td>
      </tr>
  </tbody>
</table>
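<p>The guard catches <code>ImportError</code> even though the failure described is a <code>ModuleNotFoundError</code>. That works because <code>ModuleNotFoundError</code> is a subclass of <code>ImportError</code>. The sketch below shows the message format in isolation. <code>guarded_import</code> is a hypothetical helper, and it returns <code>None</code> instead of calling <code>sys.exit(1)</code> so that it can be run interactively.</p>

```python
import importlib
import io
import sys
from contextlib import redirect_stderr


def guarded_import(module_name: str, hook_file: str):
    """Import a module; on failure, name the hook file and the cause on stderr."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:  # also catches ModuleNotFoundError
        print(f"[Hook Import Error] {hook_file}: {e}", file=sys.stderr)
        return None  # a real hook would call sys.exit(1) here


if __name__ == "__main__":
    captured = io.StringIO()
    with redirect_stderr(captured):
        module = guarded_import("no_such_module_for_demo", "my-hook.py")
    print(module is None)
    print(captured.getvalue().strip())
```

<p>The captured line has the same shape as the message in the table: a fixed prefix, the Hook file name, and the text of the original exception.</p>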
<h3 id="25-imp-006-的教訓兩條錯誤路徑">2.5 The lesson of IMP-006: two error paths</h3>
<p>Case D of IMP-006 revealed a subtler problem: a Hook has two distinct "failure paths", but only one of them wrote to stderr.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="c1"># ...validation logic...</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="k">if</span> <span class="n">should_block</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="c1"># Path 1: rejected by business logic (intentional block)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="n">result</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&#34;error&#34;</span><span class="p">:</span> <span class="n">error_message</span><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">result</span><span class="p">),</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="k">return</span> <span class="mi">2</span>  <span class="c1"># stdout only, nothing on stderr!</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">return</span> <span class="mi">0</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># run_hook_safely wrapper</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># Path 2: unexpected exception -- _log_exception already writes to stderr</span></span></span></code></pre></div><p>The developer had only considered the "unexpected exception" path (handled by <code>_log_exception</code>) and forgot that an intentional block also needs stderr output. The fix:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">if</span> <span class="n">should_block</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="n">result</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&#34;error&#34;</span><span class="p">:</span> <span class="n">error_message</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">result</span><span class="p">),</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="c1"># Added: make sure the user sees the reason for the block in the Claude Code UI</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[Agent Ticket Validation] blocked: </span><span class="si">{</span><span class="n">error_message</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="k">return</span> <span class="mi">2</span></span></span></code></pre></div><p>The lesson boils down to one rule: <strong>every non-success path of a Hook must write to stderr</strong>. That covers not only exceptions but also rejections made by business logic.</p>
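<p>One way to keep this rule from regressing is to check it mechanically. The following is a minimal sketch, not part of the original Hook system: <code>check_stderr_on_failure</code>, <code>silent_block</code> and <code>explained_block</code> are hypothetical names, and the hook is assumed to be a zero-argument callable that returns its exit code.</p>

```python
import contextlib
import io
import sys


def check_stderr_on_failure(hook_main):
    """Run hook_main and verify that a non-zero exit code comes with stderr output.

    hook_main is a hypothetical zero-argument callable returning an exit code.
    Returns (exit_code, stderr_text, ok).
    """
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        exit_code = hook_main()
    stderr_text = buffer.getvalue()
    # Success may stay quiet; any other exit code must explain itself on stderr.
    ok = exit_code == 0 or bool(stderr_text.strip())
    return exit_code, stderr_text, ok


def silent_block():
    # Intentional block that forgets stderr (the bug described above).
    return 2


def explained_block():
    # Intentional block that states its reason on stderr.
    print("[Agent Ticket Validation] blocked: missing ticket id", file=sys.stderr)
    return 2
```

<p>Here <code>check_stderr_on_failure(silent_block)</code> reports <code>ok</code> as <code>False</code>, while <code>explained_block</code> passes. A check like this could run in a test suite against every Hook's rejection path.</p>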





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Hook execution result
</span></span><span class="line"><span class="ln">2</span><span class="cl">├── Success (return 0) → normal message on stdout
</span></span><span class="line"><span class="ln">3</span><span class="cl">├── Unexpected exception (Exception) → stderr handled by _log_exception
</span></span><span class="line"><span class="ln">4</span><span class="cl">└── Intentional block (non-zero return) → stderr must state the reason  ← easy to miss</span></span></code></pre></div><hr>
<h2 id="三健康監控設計">3. Health Monitoring Design</h2>
<h3 id="31-執行時間追蹤">3.1 Execution Time Tracking</h3>
<p><code>run_hook_safely</code> records how long each execution takes:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">run_hook_safely</span><span class="p">(</span><span class="n">main_func</span><span class="p">,</span> <span class="n">hook_name</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">logger</span> <span class="o">=</span> <span class="n">setup_hook_logging</span><span class="p">(</span><span class="n">hook_name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="n">exit_code</span> <span class="o">=</span> <span class="n">main_func</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="n">elapsed_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start_time</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Hook execution time: </span><span class="si">{</span><span class="n">elapsed_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2">s&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="k">return</span> <span class="n">exit_code</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">except</span> <span class="p">(</span><span class="ne">KeyboardInterrupt</span><span class="p">,</span> <span class="ne">SystemExit</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="k">raise</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="n">elapsed_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start_time</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Hook execution time before failure: </span><span class="si">{</span><span class="n">elapsed_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2">s&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="n">tb_str</span> <span class="o">=</span> <span class="n">traceback</span><span class="o">.</span><span class="n">format_exc</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="n">_log_exception</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">hook_name</span><span class="p">,</span> <span class="n">tb_str</span><span class="p">)</span>
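</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="k">return</span> <span class="n">EXIT_ERROR</span></span></span></code></pre></div><p>Note that <code>elapsed_time</code> is recorded in two places, once on the success path and once on the failure path. Recording the time elapsed before a failure tells you whether the Hook failed immediately (an import error, &lt; 0.01s) or partway through execution (a logic error, possibly several seconds in).</p>
<p>The resulting entries in the log file:</p>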





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[2026-03-04 09:15:23] DEBUG - Hook execution time: 0.05s       # normal
</span></span><span class="line"><span class="ln">2</span><span class="cl">[2026-03-04 09:15:24] DEBUG - Hook execution time: 2.34s       # on the slow side, worth a look
</span></span><span class="line"><span class="ln">3</span><span class="cl">[2026-03-04 09:15:25] DEBUG - Hook execution time before failure: 0.00s  # failed during the import stage</span></span></code></pre></div><p>These figures proved useful when investigating IMP-006 case C. The hookify plugin's timeout was set to 10ms, while Python startup takes about 24ms. Comparing Hook execution times against the timeout setting pinpoints the timeout problem.</p>
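<p>The comparison can be automated with a few lines. This is a sketch under stated assumptions: the log line shape is copied from the sample above, and <code>find_slow_entries</code> and <code>budget_seconds</code> are hypothetical names rather than part of the Hook system.</p>

```python
import re

# Assumed line shape, copied from the sample log entries above.
TIME_PATTERN = re.compile(
    r"Hook execution time(?P<failed> before failure)?: (?P<seconds>\d+\.\d+)s"
)


def find_slow_entries(log_lines, budget_seconds):
    """Return (seconds, failed) for entries whose duration exceeds the budget."""
    slow = []
    for line in log_lines:
        match = TIME_PATTERN.search(line)
        if match is None:
            continue  # not a timing entry
        seconds = float(match.group("seconds"))
        if seconds > budget_seconds:
            slow.append((seconds, match.group("failed") is not None))
    return slow


sample = [
    "[2026-03-04 09:15:23] DEBUG - Hook execution time: 0.05s",
    "[2026-03-04 09:15:24] DEBUG - Hook execution time: 2.34s",
    "[2026-03-04 09:15:25] DEBUG - Hook execution time before failure: 0.00s",
]
print(find_slow_entries(sample, budget_seconds=1.0))  # prints [(2.34, False)]
```

<p>Keep in mind that the logged time is measured inside <code>run_hook_safely</code>, so it does not include interpreter startup. A budget derived from a plugin timeout has to leave room for that separately.</p>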
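<h3 id="32-日誌自動清理log-rotation">3.2 Automatic Log Cleanup (Log Rotation)</h3>
<p>With 44 Hooks running hundreds of times a day, log files pile up quickly. An automatic cleanup mechanism keeps them from exhausting disk space:</p>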





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">LOG_RETENTION_DAYS</span> <span class="o">=</span> <span class="mi">7</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">LOG_CLEANUP_TRIGGER_FREQUENCY</span> <span class="o">=</span> <span class="mi">10</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">def</span> <span class="nf">_cleanup_old_logs</span><span class="p">(</span><span class="n">log_base_dir</span><span class="p">:</span> <span class="n">Path</span><span class="p">,</span> <span class="n">retention_days</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="n">LOG_RETENTION_DAYS</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Delete log files past the retention period&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="n">cutoff_time</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="n">retention_days</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="k">for</span> <span class="n">log_file</span> <span class="ow">in</span> <span class="n">log_base_dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s2">&#34;*.log&#34;</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">            <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">                <span class="n">mtime</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">fromtimestamp</span><span class="p">(</span><span class="n">log_file</span><span class="o">.</span><span class="n">stat</span><span class="p">()</span><span class="o">.</span><span class="n">st_mtime</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">                <span class="k">if</span> <span class="n">mtime</span> <span class="o">&lt;</span> <span class="n">cutoff_time</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">                    <span class="n">log_file</span><span class="o">.</span><span class="n">unlink</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="k">except</span> <span class="p">(</span><span class="ne">OSError</span><span class="p">,</span> <span class="ne">ValueError</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">                <span class="k">pass</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">except</span> <span class="ne">OSError</span><span class="p">:</span>
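</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">pass</span></span></span></code></pre></div><h4 id="為什麼不用-python-標準庫的-rotatingfilehandler">Why not use the Python standard library's RotatingFileHandler</h4>
<p><code>RotatingFileHandler</code> rotates based on the <strong>size of a single file</strong>, which suits long-running services. The Hook system, however, writes a new file on every execution, so what it needs is to remove old files by <strong>age</strong>. The two address different scenarios:</p>
<table>
  <thead>
      <tr>
          <th>Mechanism</th>
          <th>Intended scenario</th>
          <th>Fit for the Hook system</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>RotatingFileHandler</td>
          <td>One long-running process writing to the same log file</td>
          <td>Not applicable</td>
      </tr>
      <tr>
          <td>TimedRotatingFileHandler</td>
          <td>One process splitting its log by time</td>
          <td>Partially applicable</td>
      </tr>
      <tr>
          <td>Custom cleanup</td>
          <td>Many processes, a new file per run, retention by age</td>
          <td>Applicable</td>
      </tr>
  </tbody>
</table>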
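<p>The age-based approach is easy to exercise in isolation. The sketch below is a simplified stand-in for <code>_cleanup_old_logs</code>, not the original function: <code>remove_expired_logs</code> and <code>demo</code> are hypothetical names, and the demo backdates a file with <code>os.utime</code> to simulate a log from eight days ago.</p>

```python
import os
import tempfile
import time
from pathlib import Path


def remove_expired_logs(log_dir: Path, retention_days: int = 7) -> list:
    """Delete *.log files whose mtime is older than the retention window."""
    cutoff = time.time() - retention_days * 86400
    removed = []
    for log_file in log_dir.glob("*.log"):
        try:
            if log_file.stat().st_mtime < cutoff:
                log_file.unlink()
                removed.append(log_file.name)
        except OSError:
            pass  # one unreadable file should not stop the sweep
    return sorted(removed)


def demo() -> tuple:
    """Create one fresh and one backdated log, then run the cleanup."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        fresh = log_dir / "fresh.log"
        stale = log_dir / "stale.log"
        fresh.write_text("new run\n")
        stale.write_text("old run\n")
        # Backdate the stale file by 8 days so it falls outside the window.
        eight_days_ago = time.time() - 8 * 86400
        os.utime(stale, (eight_days_ago, eight_days_ago))
        removed = remove_expired_logs(log_dir)
        remaining = sorted(p.name for p in log_dir.glob("*.log"))
        return removed, remaining
```

<p>Calling <code>demo()</code> removes only <code>stale.log</code> and leaves <code>fresh.log</code> in place. Because the decision depends solely on each file's timestamp, it works no matter which process wrote the file.</p>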
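<h3 id="33-清理頻率控制">3.3 Throttling Cleanup Frequency</h3>
<p>Checking whether cleanup is needed on every Hook execution has a cost of its own. So a <code>.cleanup_trigger</code> file serves as a counter, and the cleanup actually runs only once every N invocations:</p>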





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">_setup_logger_handlers</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">log_base_dir</span><span class="p">,</span> <span class="n">sanitized_name</span><span class="p">,</span> <span class="n">is_debug</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Configure handlers for the logger&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="c1"># Trigger log cleanup (throttled)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">cleanup_marker</span> <span class="o">=</span> <span class="n">log_base_dir</span> <span class="o">/</span> <span class="s2">&#34;.cleanup_trigger&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        <span class="k">if</span> <span class="n">cleanup_marker</span><span class="o">.</span><span class="n">exists</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">            <span class="n">count</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">cleanup_marker</span><span class="o">.</span><span class="n">read_text</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="ow">or</span> <span class="s2">&#34;0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">            <span class="k">if</span> <span class="n">count</span> <span class="o">&gt;=</span> <span class="n">LOG_CLEANUP_TRIGGER_FREQUENCY</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">                <span class="n">_cleanup_old_logs</span><span class="p">(</span><span class="n">log_base_dir</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">                <span class="n">cleanup_marker</span><span class="o">.</span><span class="n">write_text</span><span class="p">(</span><span class="s2">&#34;0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">                <span class="n">cleanup_marker</span><span class="o">.</span><span class="n">write_text</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">            <span class="n">cleanup_marker</span><span class="o">.</span><span class="n">write_text</span><span class="p">(</span><span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">except</span> <span class="p">(</span><span class="ne">OSError</span><span class="p">,</span> <span class="ne">ValueError</span><span class="p">):</span>
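</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">pass</span>  <span class="c1"># a cleanup failure must not affect logging</span></span></span></code></pre></div><p><code>LOG_CLEANUP_TRIGGER_FREQUENCY = 10</code> means cleanup runs roughly once every 10 executions (strictly, with the counter logic above, on every 11th call, because the counter has to read 10 before cleanup fires). It is a trade-off:</p>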
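<table>
  <thead>
      <tr>
          <th>Frequency</th>
          <th>Benefit</th>
          <th>Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Every run (1)</td>
          <td>The log directory is always clean</td>
          <td>Every Hook run pays for one extra directory scan</td>
      </tr>
      <tr>
          <td>Every 10 runs</td>
          <td>Overhead is barely noticeable</td>
          <td>At most about 10 surplus files accumulate</td>
      </tr>
      <tr>
          <td>Every 100 runs</td>
          <td>Lowest overhead</td>
          <td>Hundreds of surplus files may accumulate</td>
      </tr>
  </tbody>
</table>
<p><strong>Why a file rather than an in-memory counter?</strong> Because each Hook is a standalone program and every execution is a new process. An in-memory counter vanishes when the process exits. A file is the simplest way to persist state across processes.</p>
<p>Note the outermost <code>except (OSError, ValueError): pass</code>. A fault in the cleanup mechanism itself (for example a locked file or a corrupted counter file) should not affect logging. This matches the design principle behind the Fallback Logger: <strong>a failure in an auxiliary feature must not block the core feature</strong>.</p>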
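<p>To see how the file-backed counter behaves over many invocations, here is a small simulation. It is a sketch that mirrors the counting logic shown earlier, with hypothetical names (<code>should_cleanup</code>, <code>simulate</code>) and with the actual cleanup replaced by a boolean result. Each call stands in for a separate Hook process, since all state lives in the marker file.</p>

```python
import tempfile
from pathlib import Path

TRIGGER_FREQUENCY = 10  # mirrors LOG_CLEANUP_TRIGGER_FREQUENCY


def should_cleanup(marker: Path, frequency: int = TRIGGER_FREQUENCY) -> bool:
    """Advance the on-disk counter; return True when cleanup is due."""
    try:
        count = int(marker.read_text().strip() or "0") if marker.exists() else 0
        if count >= frequency:
            marker.write_text("0")  # reset after a cleanup round
            return True
        marker.write_text(str(count + 1))
    except (OSError, ValueError):
        pass  # a broken counter must never break logging
    return False


def simulate(calls: int) -> list:
    """Return the 1-based call numbers on which cleanup fired."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / ".cleanup_trigger"
        return [n for n in range(1, calls + 1) if should_cleanup(marker)]
```

<p>With this logic <code>simulate(25)</code> returns <code>[11, 22]</code>: the counter must reach 10 before a later call triggers cleanup, so the effective interval is 11 calls. Resetting the marker to <code>"1"</code> instead of <code>"0"</code> would make the interval exactly 10, if that precision matters.</p>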
<hr>
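<h2 id="四三個錯誤模式的可觀測性教訓">4. Observability Lessons from Three Error Patterns</h2>
<p>The design along the three dimensions above stems largely from lessons learned in three real error patterns (IMP-003, IMP-005, IMP-006). Looking at them side by side yields general principles for observability design.</p>
<h3 id="41-imp-003作用域迴歸--靜默失敗的代價">4.1 IMP-003: Scope Regression &ndash; The Cost of Silent Failure</h3>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Incident</strong></td>
          <td>7 Hooks failed silently with <code>NameError</code> for 2+ sessions</td>
      </tr>
      <tr>
          <td><strong>Root cause</strong></td>
          <td>logger was moved from module scope into main(), and the code referencing it was not updated</td>
      </tr>
      <tr>
          <td><strong>Observability gap</strong></td>
          <td><code>_log_exception</code> wrote only to the log file, not to stderr</td>
      </tr>
      <tr>
          <td><strong>Fix</strong></td>
          <td>Added stderr output (W25-005)</td>
      </tr>
      <tr>
          <td><strong>General principle</strong></td>
          <td><strong>Errors must have a notification channel the user can perceive</strong></td>
      </tr>
  </tbody>
</table>
<p>For the detailed scope analysis, see the <a href="/blog/python/07-refactoring/scope-regression/" data-link-title="作用域迴歸案例研究" data-link-desc="從 IMP-003 事件學習 Python 變數作用域的陷阱">Scope Regression Case Study</a>.</p>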
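<p>The root cause is easy to reproduce in miniature. The following sketch is illustrative only: <code>report</code> and <code>demo_hook</code> are hypothetical names, and the <code>NameError</code> is caught here just so the example can show the message instead of crashing.</p>

```python
import logging


def report(message):
    # Still refers to a module-level name `logger`, which no longer exists.
    logger.info(message)  # noqa: F821


def main():
    # After the refactor, logger is local to main() and invisible to report().
    logger = logging.getLogger("demo_hook")
    logger.debug("starting")
    try:
        report("validating")
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"
```

<p>Calling <code>main()</code> yields a <code>NameError</code> complaining that <code>logger</code> is not defined. The error only appears when <code>report</code> actually runs, which is why this kind of regression can go unnoticed until a Hook is triggered.</p>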
<h3 id="42-imp-005import-未同步--保護範圍的盲區">4.2 IMP-005: Imports Out of Sync &ndash; A Blind Spot in Protection Coverage</h3>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Incident</strong></td>
          <td>5 Hooks failed at startup with <code>ModuleNotFoundError</code></td>
      </tr>
      <tr>
          <td><strong>Root cause</strong></td>
          <td>Import paths were not updated after a module migration</td>
      </tr>
      <tr>
          <td><strong>Observability gap</strong></td>
          <td><code>run_hook_safely</code> cannot protect the import stage</td>
      </tr>
      <tr>
          <td><strong>Fix</strong></td>
          <td>Wrapped the imports in try-except with stderr output</td>
      </tr>
      <tr>
          <td><strong>General principle</strong></td>
          <td><strong>Top-level protection must cover every execution stage</strong></td>
      </tr>
  </tbody>
</table>
<h3 id="43-imp-006隱性故障--錯誤路徑的完整性">4.3 IMP-006: Hidden Failures &ndash; Completeness of Error Paths</h3>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Incident</strong></td>
          <td>Hook errors with several different root causes could not be told apart</td>
      </tr>
      <tr>
          <td><strong>Case A</strong></td>
          <td>Missing function argument (some call sites did not pass logger)</td>
      </tr>
      <tr>
          <td><strong>Case C</strong></td>
          <td>Plugin timeout of 10ms, while Python startup takes 24ms</td>
      </tr>
      <tr>
          <td><strong>Case D</strong></td>
          <td>The intentional-block path lacked stderr output</td>
      </tr>
      <tr>
          <td><strong>General principle</strong></td>
          <td><strong>Every non-success path needs distinguishable error output</strong></td>
      </tr>
  </tbody>
</table>
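<p>Case C comes down to simple arithmetic: if interpreter startup alone exceeds the timeout, no Hook can finish in time. The sketch below shows one way to measure startup on a given machine. <code>measure_startup_ms</code> and <code>fits_budget</code> are hypothetical helpers, and the number obtained will vary by machine, so the 24ms quoted above should be read as the author's observation rather than a constant.</p>

```python
import subprocess
import sys
import time


def measure_startup_ms(runs: int = 3) -> float:
    """Return the fastest observed wall-clock time, in ms, to start and exit Python."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append((time.perf_counter() - start) * 1000)
    return min(timings)


def fits_budget(startup_ms: float, timeout_ms: float) -> bool:
    """A hook can only finish in time if startup alone fits inside the timeout."""
    return startup_ms < timeout_ms
```

<p>With the figures from the table, <code>fits_budget(24, 10)</code> is <code>False</code>, which is the whole diagnosis: the timeout has to be raised, because the Hook body has no time left to run.</p>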
<h3 id="44-共通教訓">4.4 Common Lessons</h3>
<p>What the three error patterns have in common distills into three observability design rules:</p>
<h4 id="規則-1錯誤不可靜默">Rule 1: Errors must not be silent</h4>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Wrong: log only, so the user never finds out</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">critical</span><span class="p">(</span><span class="n">tb_str</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Right: log + user notification</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="n">logger</span><span class="o">.</span><span class="n">critical</span><span class="p">(</span><span class="n">tb_str</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[Hook Error] </span><span class="si">{</span><span class="n">hook_name</span><span class="si">}</span><span class="s2"> failed&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span></span></span></code></pre></div><h4 id="規則-2保護必須完整">Rule 2: Protection must be complete</h4>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Wrong: only main() is protected</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="n">run_hook_safely</span><span class="p">(</span><span class="n">main</span><span class="p">,</span> <span class="s2">&#34;hook&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Right: protect the imports too</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="kn">from</span> <span class="nn">lib.module</span> <span class="kn">import</span> <span class="n">function</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">except</span> <span class="ne">ImportError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[Hook Import Error] </span><span class="si">{</span><span class="vm">__file__</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="n">run_hook_safely</span><span class="p">(</span><span class="n">main</span><span class="p">,</span> <span class="s2">&#34;hook&#34;</span><span class="p">))</span></span></span></code></pre></div><h4 id="規則-3錯誤要可區分">Rule 3: Errors must be distinguishable</h4>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Wrong: every error uses the same message</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="s2">&#34;hook error&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Right: include the Hook name and the error type</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[Hook Error] </span><span class="si">{</span><span class="n">hook_name</span><span class="si">}</span><span class="s2"> failed unexpectedly&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[Hook Import Error] </span><span class="si">{</span><span class="n">filename</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">error</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[Agent Validation] blocked: </span><span class="si">{</span><span class="n">reason</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span></span></span></code></pre></div><hr>
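<h2 id="五完整的可觀測性架構">5. The Complete Observability Architecture</h2>
<p>Putting the pieces above together, a Hook's full execution path and its observability coverage look like this:</p>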





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Hook triggered
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">│
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">├─ [Stage 1] Import loading
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">│  ├─ Success → continue
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">│  └─ Failure → caught by try-except
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">│               ├─ stderr: [Hook Import Error] hook.py: error
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">│               └─ sys.exit(1)
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">│
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">├─ [Stage 2] setup_hook_logging
</span></span><span class="line"><span class="ln">10</span><span class="cl">│  ├─ Success → Logger ready (FileHandler + StreamHandler)
</span></span><span class="line"><span class="ln">11</span><span class="cl">│  └─ Failure → Fallback Logger (StreamHandler only)
</span></span><span class="line"><span class="ln">12</span><span class="cl">│
</span></span><span class="line"><span class="ln">13</span><span class="cl">├─ [Stage 3] main() execution
</span></span><span class="line"><span class="ln">14</span><span class="cl">│  ├─ Success → logger.debug(&#34;execution time: Xs&#34;)
</span></span><span class="line"><span class="ln">15</span><span class="cl">│  │            return exit_code
</span></span><span class="line"><span class="ln">16</span><span class="cl">│  ├─ Business rejection → stderr: [Hook Name] blocked: reason
</span></span><span class="line"><span class="ln">17</span><span class="cl">│  │                       return 2
</span></span><span class="line"><span class="ln">18</span><span class="cl">│  └─ Unexpected exception → logger.critical(traceback)
</span></span><span class="line"><span class="ln">19</span><span class="cl">│                            stderr: [Hook Error] hook failed
</span></span><span class="line"><span class="ln">20</span><span class="cl">│                            return 1
</span></span><span class="line"><span class="ln">21</span><span class="cl">│
</span></span><span class="line"><span class="ln">22</span><span class="cl">└─ [Stage 4] Log cleanup (triggered every 10 runs)
</span></span><span class="line"><span class="ln">23</span><span class="cl">   └─ Delete log files older than 7 days</span></span></code></pre></div><p>Every stage has a matching observability mechanism. No execution path is "silent".</p>
<hr>
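<h2 id="思考題">Questions for Reflection</h2>
<ol>
<li>
<p>Why does <code>_cleanup_old_logs</code> use <code>mtime</code> (modification time) rather than <code>ctime</code> (inode change time on Unix, creation time on Windows) to decide whether a file has expired? Under what circumstances do the two differ?</p>
</li>
<li>
<p>If two Hooks run at the same time (for example PreToolUse Hooks triggered together), will their logs interfere with each other? Hint: think about how <code>logging.getLogger(hook_name)</code> behaves.</p>
</li>
<li>
<p>The cleanup counter is currently implemented on the file system. Would switching to an atomic operation (for example <code>os.rename</code>) resolve the race condition under concurrent access? Is it worth the effort?</p>
</li>
</ol>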
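<h2 id="實作練習">Hands-on Exercises</h2>
<ol>
<li>
<p><strong>Write a log analysis script</strong>: scan the <code>.claude/hook-logs/</code> directory and report each Hook's average execution time, failure count, and most recent execution time.</p>
</li>
<li>
<p><strong>Implement a RotatingFileHandler version</strong>: modify <code>setup_hook_logging</code> to use a single log file with <code>RotatingFileHandler</code> (size-based rotation), then compare its pros and cons against the current approach.</p>
</li>
<li>
<p><strong>Add a health check endpoint</strong>: write a <code>hook-health-check.py</code> script that checks whether the latest log in each Hook directory contains <code>CRITICAL</code>-level records and prints a health report.</p>
</li>
</ol>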
<hr>
<p><em>Previous chapter: <a href="/blog/python/05-error-testing/error-infrastructure/" data-link-title="5.5 頂層例外處理機制" data-link-desc="run_hook_safely 與統一錯誤基礎設施">Top-Level Exception Handling</a></em>
<em>Related: <a href="/blog/python/07-refactoring/refactoring-pitfalls/" data-link-title="重構陷阱與防護" data-link-desc="三個真實重構事故的共通模式：部分更新問題與系統性防護方法">Refactoring Pitfalls and Safeguards</a> &ndash; IMP-003/005/006 analyzed from a refactoring perspective</em>
<em>Related: <a href="/blog/python/07-refactoring/scope-regression/" data-link-title="作用域迴歸案例研究" data-link-desc="從 IMP-003 事件學習 Python 變數作用域的陷阱">Scope Regression Case Study</a> &ndash; the full technical analysis of IMP-003</em></p>
]]></content:encoded></item></channel></rss>