<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Query on Tarragon</title><link>https://tarrragon.github.io/blog/tags/query/</link><description>Recent content in Query on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Sat, 20 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/query/index.xml" rel="self" type="application/rss+xml"/><item><title>Query API Design</title><link>https://tarrragon.github.io/blog/monitoring/04-collector/query-api/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/monitoring/04-collector/query-api/</guid><description>&lt;p>Queries are how monitoring data gets consumed. The collector offers two query paths: a CLI that works directly on the JSONL files (grep + jq), and an HTTP query endpoint. Each serves a different consumer: the CLI is for developers exploring in real time, and the HTTP endpoint is for automation tools and anyone not working on the command line.&lt;/p>
&lt;h2 id="cli-查詢grep--jq">CLI queries: grep + jq&lt;/h2>
&lt;p>The biggest advantage of the JSONL format is that standard Unix text-processing tools work on it directly. There is no extra query language to learn, no client tool to install, and no database to connect to.&lt;/p>
&lt;h3 id="常見查詢模式">Common query patterns&lt;/h3>
&lt;p>Filter by event type:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">grep &lt;span class="s1">&amp;#39;&amp;#34;type&amp;#34;:&amp;#34;error&amp;#34;&amp;#39;&lt;/span> events-2026-06-19.jsonl &lt;span class="p">|&lt;/span> jq .&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>按 namespace 過濾：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">grep &lt;span class="s1">&amp;#39;&amp;#34;name&amp;#34;:&amp;#34;terminal.connect&amp;#39;&lt;/span> events-2026-06-19.jsonl &lt;span class="p">|&lt;/span> jq .&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>按時間範圍過濾（跨檔案）：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">cat events-2026-06-1&lt;span class="o">{&lt;/span>8,9&lt;span class="o">}&lt;/span>.jsonl &lt;span class="p">|&lt;/span> jq &lt;span class="s1">&amp;#39;select(.ts &amp;gt;= &amp;#34;2026-06-18T18:00:00&amp;#34;)&amp;#39;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>統計每種事件的數量：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">jq -r &lt;span class="s1">&amp;#39;.name&amp;#39;&lt;/span> events-2026-06-19.jsonl &lt;span class="p">|&lt;/span> sort &lt;span class="p">|&lt;/span> uniq -c &lt;span class="p">|&lt;/span> sort -rn&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="grep-友好的-jsonl-設計">grep 友好的 JSONL 設計&lt;/h3>
&lt;p>JSONL 的每行 JSON 結構影響 grep 的查詢效率和準確性。&lt;/p>
&lt;p>&lt;strong>把常用過濾欄位放在 JSON 的前面&lt;/strong>。grep 是字串匹配，把 &lt;code>type&lt;/code> 和 &lt;code>name&lt;/code> 放在行首讓 grep pattern 更簡單、誤匹配更少。&lt;/p>
&lt;p>&lt;strong>避免 JSON 值中包含雙引號&lt;/strong>。事件名稱和型別用簡單字串（不含特殊字元），讓 grep 的 pattern 不需要處理 escape。&lt;/p>
&lt;p>&lt;strong>每行 JSON 不換行&lt;/strong>。JSONL 的定義就是每行一個 JSON，但格式化工具可能自動加換行。寫入時用 &lt;code>json.Marshal&lt;/code>（Go）或 &lt;code>JSON.stringify&lt;/code>（JS）確保單行輸出。&lt;/p>
&lt;h2 id="http-查詢-endpoint">HTTP 查詢 endpoint&lt;/h2>
&lt;p>HTTP 查詢 endpoint 讓非 CLI 使用者（dashboard、自動化腳本、其他服務）能查詢事件資料。&lt;/p>
&lt;h3 id="endpoint-設計">Endpoint 設計&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">GET /v1/events?type=error&amp;amp;name=terminal.connect.*&amp;amp;from=2026-06-18T00:00:00Z&amp;amp;to=2026-06-19T00:00:00Z&amp;amp;limit=100&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>查詢參數：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Parameter&lt;/th>
 &lt;th>Description&lt;/th>
 &lt;th>Default&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>type&lt;/td>
 &lt;td>Event type (event/error/metric/lifecycle)&lt;/td>
 &lt;td>All&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>name&lt;/td>
 &lt;td>Event name (supports the &lt;code>*&lt;/code> wildcard)&lt;/td>
 &lt;td>All&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>from&lt;/td>
 &lt;td>Start time (ISO 8601)&lt;/td>
 &lt;td>24 hours ago&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>to&lt;/td>
 &lt;td>End time (ISO 8601)&lt;/td>
 &lt;td>Now&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>limit&lt;/td>
 &lt;td>Maximum number of events returned&lt;/td>
 &lt;td>100&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>offset&lt;/td>
 &lt;td>Pagination offset&lt;/td>
 &lt;td>0&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h3 id="回應格式">Response format&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;events&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;v&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;type&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;timestamp&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;2026-06-19T08:42:00Z&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;source&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nt">&amp;#34;sdk&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;python&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nt">&amp;#34;platform&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;macos&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nt">&amp;#34;app&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;claude-hooks&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;hook.failure&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;level&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;data&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nt">&amp;#34;hook&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;branch-status-reminder&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nt">&amp;#34;step&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;validation&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;error&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nt">&amp;#34;message&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;KeyError: &amp;#39;status&amp;#39;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nt">&amp;#34;stack&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Traceback...&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nt">&amp;#34;type&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;KeyError&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;context&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nt">&amp;#34;session_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;sess-abc-123&amp;#34;&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;total&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">42&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;limit&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">100&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl"> &lt;span class="nt">&amp;#34;offset&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>events&lt;/code> 陣列按 &lt;code>timestamp&lt;/code> 降序排列。&lt;code>total&lt;/code> 是符合篩選條件的全量筆數（不受 limit 截斷），讓呼叫端計算分頁（&lt;code>total_pages = ceil(total / limit)&lt;/code>）。分頁用 offset-based（&lt;code>offset=100&lt;/code> 取第二頁），適合資料量在十萬筆以下的場景。資料量大到 offset 效能不足時，改用 cursor-based（&lt;code>after=&amp;lt;last_event_id&amp;gt;&lt;/code>），但 cursor-based 是 PostgreSQL 層的演進，SQLite 層用 offset 足夠。&lt;/p>
&lt;h3 id="實作策略">實作策略&lt;/h3>
&lt;p>HTTP 查詢 endpoint 的底層實作可以直接讀取 JSONL 檔案 — 根據 from/to 確定要讀哪些日期的檔案，逐行 parse 並過濾。這個實作在資料量小（單日萬筆以下）時足夠快。&lt;/p>
&lt;p>當查詢效能成為問題時，在 JSONL 之上加一層索引（按 type/name 建立反向索引），或演進到 SQLite 儲存（見 &lt;a href="https://tarrragon.github.io/blog/monitoring/04-collector/scaling-evolution/" data-link-title="規模演進" data-link-desc="可插拔 Storage Backend 架構 — SQLite 預設、PostgreSQL 觸發切換、時間序列 DB 長期演進">規模演進&lt;/a>）。&lt;/p></description><content:encoded><![CDATA[<p>查詢是監控資料的消費介面。Collector 提供兩種查詢方式：CLI 直接操作 JSONL 檔案（grep + jq），和 HTTP 查詢 endpoint。兩種方式服務不同的消費者 — CLI 給開發者即時探索，HTTP endpoint 給自動化工具和非 CLI 使用者。</p>
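<h2 id="cli-查詢grep--jq">CLI queries: grep + jq</h2>
<p>The biggest advantage of the JSONL format is that standard Unix text-processing tools work on it directly. There is no extra query language to learn, no client tool to install, and no database to connect to.</p>
<h3 id="常見查詢模式">Common query patterns</h3>
<p>Filter by event type:</p>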





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">grep <span class="s1">&#39;&#34;type&#34;:&#34;error&#34;&#39;</span> events-2026-06-19.jsonl <span class="p">|</span> jq .</span></span></code></pre></div><p>Filter by namespace:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">grep <span class="s1">&#39;&#34;name&#34;:&#34;terminal.connect&#39;</span> events-2026-06-19.jsonl <span class="p">|</span> jq .</span></span></code></pre></div><p>Filter by time range (across files):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">cat events-2026-06-1<span class="o">{</span>8,9<span class="o">}</span>.jsonl <span class="p">|</span> jq <span class="s1">&#39;select(.ts &gt;= &#34;2026-06-18T18:00:00&#34;)&#39;</span></span></span></code></pre></div><p>Count events by name:</p>





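<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">jq -r <span class="s1">&#39;.name&#39;</span> events-2026-06-19.jsonl <span class="p">|</span> sort <span class="p">|</span> uniq -c <span class="p">|</span> sort -rn</span></span></code></pre></div><h3 id="grep-友好的-jsonl-設計">grep-friendly JSONL design</h3>
<p>The structure of each JSON line affects how efficient and how accurate grep queries are.</p>
<p><strong>Put the frequently filtered fields at the front of the JSON object.</strong> grep matches strings, so placing <code>type</code> and <code>name</code> at the start of the line keeps grep patterns simpler and reduces false matches.</p>
<p><strong>Avoid double quotes inside JSON values.</strong> Use plain strings with no special characters for event names and types, so grep patterns never have to deal with escaping.</p>
<p><strong>Keep each JSON object on one line.</strong> JSONL is by definition one JSON object per line, but formatting tools may insert line breaks automatically. When writing, use <code>json.Marshal</code> (Go) or <code>JSON.stringify</code> (JS, called without an indentation argument) to guarantee single-line output.</p>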
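<p>The three guidelines above can be sketched in Python. This is only an illustration under assumptions: <code>to_jsonl_line</code> is a hypothetical helper, not part of the collector, and the event fields are taken from the examples in this post. Note the compact separators; Python's default serializer writes a space after each colon, which would stop the <code>grep '"type":"error"'</code> pattern from matching.</p>

```python
import json


def to_jsonl_line(event: dict) -> str:
    """Serialize one event as a single compact line with `type` and `name` first.

    Hypothetical helper for illustration; not the collector's actual writer.
    """
    # Put the common filter fields first; the remaining keys keep their order.
    ordered = {key: event[key] for key in ("type", "name") if key in event}
    ordered.update((key, value) for key, value in event.items() if key not in ordered)
    # Compact separators: the default ", " and ": " would insert spaces,
    # and grep '"type":"error"' would no longer match the line.
    return json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)


line = to_jsonl_line(
    {"ts": "2026-06-19T08:42:00Z", "name": "hook.failure", "type": "error"}
)
print(line)
```

<p>Because <code>json.dumps</code> is called without <code>indent</code>, newlines inside string values are escaped rather than emitted, so each event stays on one line.</p>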
<h2 id="http-查詢-endpoint">HTTP query endpoint</h2>
<p>The HTTP query endpoint lets consumers that are not on the command line (dashboards, automation scripts, other services) query event data.</p>
<h3 id="endpoint-設計">Endpoint design</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">GET /v1/events?type=error&amp;name=terminal.connect.*&amp;from=2026-06-18T00:00:00Z&amp;to=2026-06-19T00:00:00Z&amp;limit=100</span></span></code></pre></div><p>Query parameters:</p>
<table>
  <thead>
      <tr>
          <th>Parameter</th>
          <th>Description</th>
          <th>Default</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>type</td>
          <td>Event type (event/error/metric/lifecycle)</td>
          <td>All</td>
      </tr>
      <tr>
          <td>name</td>
          <td>Event name (supports the <code>*</code> wildcard)</td>
          <td>All</td>
      </tr>
      <tr>
          <td>from</td>
          <td>Start time (ISO 8601)</td>
          <td>24 hours ago</td>
      </tr>
      <tr>
          <td>to</td>
          <td>End time (ISO 8601)</td>
          <td>Now</td>
      </tr>
      <tr>
          <td>limit</td>
          <td>Maximum number of events returned</td>
          <td>100</td>
      </tr>
      <tr>
          <td>offset</td>
          <td>Pagination offset</td>
          <td>0</td>
      </tr>
  </tbody>
</table>
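<p>One way to read the parameter table is as a parsing step that fills in the defaults, followed by a per-event match. The Python sketch below is a minimal illustration, not the collector's implementation: <code>EventQuery</code>, <code>parse_query</code> and <code>matches</code> are hypothetical names, the time bounds are assumed to be inclusive, and all timestamps are assumed to carry a timezone. It uses <code>fnmatchcase</code> for the wildcard, which also honours <code>?</code> and <code>[]</code> and is therefore broader than the <code>*</code> the table documents.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import parse_qs


@dataclass
class EventQuery:
    """Hypothetical container for the parameters in the table above."""
    type: Optional[str]
    name: Optional[str]
    start: datetime
    end: datetime
    limit: int
    offset: int


def parse_time(value: str) -> datetime:
    # Map a trailing "Z" to an explicit offset so older Python versions parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_query(query_string: str, now: datetime) -> EventQuery:
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    return EventQuery(
        type=params.get("type"),  # None means "all types"
        name=params.get("name"),  # None means "all names"
        start=parse_time(params["from"]) if "from" in params else now - timedelta(hours=24),
        end=parse_time(params["to"]) if "to" in params else now,
        limit=int(params.get("limit", 100)),
        offset=int(params.get("offset", 0)),
    )


def matches(event: dict, query: EventQuery) -> bool:
    if query.type is not None and event.get("type") != query.type:
        return False
    # Assumption: shell-style matching; "*" matches any run of characters.
    if query.name is not None and not fnmatchcase(event.get("name", ""), query.name):
        return False
    # Assumption: both bounds are inclusive.
    return query.start <= parse_time(event["timestamp"]) <= query.end


now = datetime(2026, 6, 19, 12, 0, tzinfo=timezone.utc)
query = parse_query("type=error&name=terminal.connect.*", now)
print(query.limit, query.offset, query.start.isoformat())
```

<p>A real handler would also need to reject malformed values (a non-numeric <code>limit</code>, an unparsable time); the sketch leaves that out to keep the default handling visible.</p>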
<h3 id="回應格式">Response format</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;events&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">      <span class="nt">&#34;v&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;error&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="nt">&#34;timestamp&#34;</span><span class="p">:</span> <span class="s2">&#34;2026-06-19T08:42:00Z&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">      <span class="nt">&#34;source&#34;</span><span class="p">:</span> <span class="p">{</span> <span class="nt">&#34;sdk&#34;</span><span class="p">:</span> <span class="s2">&#34;python&#34;</span><span class="p">,</span> <span class="nt">&#34;platform&#34;</span><span class="p">:</span> <span class="s2">&#34;macos&#34;</span><span class="p">,</span> <span class="nt">&#34;app&#34;</span><span class="p">:</span> <span class="s2">&#34;claude-hooks&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">      <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;hook.failure&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">      <span class="nt">&#34;level&#34;</span><span class="p">:</span> <span class="s2">&#34;error&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">      <span class="nt">&#34;data&#34;</span><span class="p">:</span> <span class="p">{</span> <span class="nt">&#34;hook&#34;</span><span class="p">:</span> <span class="s2">&#34;branch-status-reminder&#34;</span><span class="p">,</span> <span class="nt">&#34;step&#34;</span><span class="p">:</span> <span class="s2">&#34;validation&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">      <span class="nt">&#34;error&#34;</span><span class="p">:</span> <span class="p">{</span> <span class="nt">&#34;message&#34;</span><span class="p">:</span> <span class="s2">&#34;KeyError: &#39;status&#39;&#34;</span><span class="p">,</span> <span class="nt">&#34;stack&#34;</span><span class="p">:</span> <span class="s2">&#34;Traceback...&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;KeyError&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">      <span class="nt">&#34;context&#34;</span><span class="p">:</span> <span class="p">{</span> <span class="nt">&#34;session_id&#34;</span><span class="p">:</span> <span class="s2">&#34;sess-abc-123&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">  <span class="p">],</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">  <span class="nt">&#34;total&#34;</span><span class="p">:</span> <span class="mi">42</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">  <span class="nt">&#34;limit&#34;</span><span class="p">:</span> <span class="mi">100</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">  <span class="nt">&#34;offset&#34;</span><span class="p">:</span> <span class="mi">0</span>
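</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The <code>events</code> array is sorted by <code>timestamp</code> in descending order. <code>total</code> is the full number of events matching the filters (it is not truncated by limit), so callers can compute pagination (<code>total_pages = ceil(total / limit)</code>). Pagination is offset-based (with the default limit of 100, <code>offset=100</code> fetches the second page), which suits datasets of up to roughly 100,000 events. Once the data grows to the point where offset performance falls short, switch to cursor-based pagination (<code>after=&lt;last_event_id&gt;</code>); cursor-based pagination is an evolution for the PostgreSQL tier, and offset is sufficient for the SQLite tier.</p>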
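<p>The envelope and the page arithmetic described above can be sketched as follows. <code>paginate</code> and <code>total_pages</code> are hypothetical helpers for illustration, and the sort relies on the assumption that every <code>timestamp</code> is an ISO 8601 UTC string of the same shape, so string order equals chronological order.</p>

```python
import math


def paginate(matched: list, limit: int = 100, offset: int = 0) -> dict:
    """Build the response envelope from the events that passed the filters."""
    # Same-shaped ISO 8601 UTC strings sort chronologically as plain strings.
    ordered = sorted(matched, key=lambda event: event["timestamp"], reverse=True)
    return {
        "events": ordered[offset:offset + limit],
        "total": len(ordered),  # counted before slicing, so limit never truncates it
        "limit": limit,
        "offset": offset,
    }


def total_pages(total: int, limit: int) -> int:
    return math.ceil(total / limit)


sample = [{"timestamp": f"2026-06-19T08:{minute:02d}:00Z"} for minute in range(42)]
page = paginate(sample, limit=10, offset=10)
print(page["total"], len(page["events"]), total_pages(page["total"], page["limit"]))
```

<p>The slice walks past <code>offset</code> rows on every request, which is the cost that eventually motivates a cursor.</p>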
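<h3 id="實作策略">Implementation strategy</h3>
<p>The HTTP query endpoint can be implemented by reading the JSONL files directly: use from/to to work out which dates' files to read, then parse and filter them line by line. This implementation is fast enough while the data volume is small (under about ten thousand events per day).</p>
<p>When query performance becomes a problem, add an index layer on top of JSONL (an inverted index keyed by type/name), or move to SQLite storage (see <a href="/blog/monitoring/04-collector/scaling-evolution/" data-link-title="規模演進" data-link-desc="可插拔 Storage Backend 架構 — SQLite 預設、PostgreSQL 觸發切換、時間序列 DB 長期演進">Scaling Evolution</a>).</p>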
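<p>A minimal Python sketch of the direct-read strategy, under stated assumptions: the daily file name follows the <code>events-YYYY-MM-DD.jsonl</code> pattern from the CLI examples, the file date is the UTC date of the events inside it, and each line stores its time in a <code>ts</code> field as in the jq example (swap in <code>timestamp</code> if the files use the response schema's name). <code>files_for_range</code> and <code>scan</code> are hypothetical names.</p>

```python
import json
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterator, List


def files_for_range(directory: Path, start: datetime, end: datetime) -> List[Path]:
    """List the daily files that could hold events between start and end."""
    paths = []
    day = start.date()
    while day <= end.date():
        # Assumed naming scheme, taken from the CLI examples.
        paths.append(directory / f"events-{day.isoformat()}.jsonl")
        day += timedelta(days=1)
    return paths


def scan(directory: Path, start: datetime, end: datetime) -> Iterator[dict]:
    """Parse each candidate file line by line and yield events inside the range."""
    for path in files_for_range(directory, start, end):
        if not path.exists():  # no events were written that day
            continue
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                event = json.loads(line)
                # Assumption: "ts" is ISO 8601 with a timezone designator.
                ts = datetime.fromisoformat(event["ts"].replace("Z", "+00:00"))
                if start <= ts <= end:
                    yield event
```

<p>Every query rereads and reparses whole files, so the cost grows with the number of days in the range rather than with the number of matches.</p>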
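<h2 id="聚合查詢">Aggregate queries</h2>
<p>Per-event queries answer "what happened"; aggregate queries answer "how much happened". The first step in an error investigation is finding the most frequent errors: answering "which errors occur most" needs counts grouped by name, and a per-event list carries too much information at that stage.</p>
<h3 id="endpoint-設計-1">Endpoint design</h3>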





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">GET /v1/events/summary?type=error&amp;from=2026-06-18T00:00:00Z&amp;to=2026-06-19T00:00:00Z&amp;group_by=name</span></span></code></pre></div><p>The response contains statistics grouped by name:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;groups&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;hook.failure&#34;</span><span class="p">,</span> <span class="nt">&#34;count&#34;</span><span class="p">:</span> <span class="mi">15</span><span class="p">,</span> <span class="nt">&#34;last_seen&#34;</span><span class="p">:</span> <span class="s2">&#34;2026-06-18T08:42:00Z&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;terminal.connect.failed&#34;</span><span class="p">,</span> <span class="nt">&#34;count&#34;</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="nt">&#34;last_seen&#34;</span><span class="p">:</span> <span class="s2">&#34;2026-06-18T07:10:00Z&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">  <span class="p">],</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="nt">&#34;total&#34;</span><span class="p">:</span> <span class="mi">18</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">  <span class="nt">&#34;from&#34;</span><span class="p">:</span> <span class="s2">&#34;2026-06-18T00:00:00Z&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">  <span class="nt">&#34;to&#34;</span><span class="p">:</span> <span class="s2">&#34;2026-06-19T00:00:00Z&#34;</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>The query parameters are shared with the per-event query (type, name, from, to); the additional <code>group_by</code> parameter selects the grouping field (name or type).</p>
<h3 id="sql-實作">SQL implementation</h3>
<p>With the SQLite backend, use GROUP BY directly:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="k">count</span><span class="p">,</span><span class="w"> </span><span class="k">MAX</span><span class="p">(</span><span class="k">timestamp</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">last_seen</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;error&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">timestamp</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="o">?</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">name</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">count</span><span class="w"> </span><span class="k">DESC</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">100</span></span></span></code></pre></div><p>With a composite index on (type, timestamp), this query performs about as well as the per-event query for up to 100,000 rows: the GROUP BY runs over the rows the index scan returns, so no full table scan is needed.</p>
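<p>To try the GROUP BY query end to end, the sketch below runs it against an in-memory SQLite database with Python's standard <code>sqlite3</code> module. The table layout, the index name <code>idx_events_type_ts</code> and the <code>summarize</code> helper are assumptions made for illustration; the post does not show the collector's actual schema.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real events table is not shown in this post.
conn.execute("CREATE TABLE events (type TEXT, name TEXT, timestamp TEXT)")
# Composite index so the WHERE clause narrows the rows before grouping.
conn.execute("CREATE INDEX idx_events_type_ts ON events (type, timestamp)")
conn.executemany(
    "INSERT INTO events (type, name, timestamp) VALUES (?, ?, ?)",
    [
        ("error", "hook.failure", "2026-06-18T08:42:00Z"),
        ("error", "hook.failure", "2026-06-18T09:00:00Z"),
        ("error", "terminal.connect.failed", "2026-06-18T07:10:00Z"),
        ("event", "terminal.connect.ok", "2026-06-18T07:11:00Z"),
    ],
)


def summarize(connection: sqlite3.Connection, start: str, end: str) -> list:
    """Run the summary query and shape the rows like the `groups` array."""
    rows = connection.execute(
        """
        SELECT name, COUNT(*) AS count, MAX(timestamp) AS last_seen
        FROM events
        WHERE type = 'error' AND timestamp BETWEEN ? AND ?
        GROUP BY name
        ORDER BY count DESC
        LIMIT 100
        """,
        (start, end),
    ).fetchall()
    return [{"name": n, "count": c, "last_seen": s} for n, c, s in rows]


groups = summarize(conn, "2026-06-18T00:00:00Z", "2026-06-19T00:00:00Z")
print(groups)
```

<p>Note that <code>BETWEEN</code> includes both endpoints, and that comparing timestamps as text only works while every stored value uses the same ISO 8601 shape.</p>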
<h3 id="和逐筆查詢的定位差異">How it differs from the per-event query</h3>
<table>
  <thead>
      <tr>
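          <th>Aspect</th>
          <th>Per-event query <code>/v1/events</code></th>
          <th>Aggregate query <code>/v1/events/summary</code></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Answers</td>
          <td>What happened (event list)</td>
          <td>How much happened (statistical summary)</td>
      </tr>
      <tr>
          <td>Use</td>
          <td>Inspect the stack trace of a single error</td>
          <td>Find the most frequent errors</td>
      </tr>
      <tr>
          <td>Returns</td>
          <td>Event array (full JSON)</td>
          <td>Group summaries (name + count + last_seen)</td>
      </tr>
      <tr>
          <td>Payload size</td>
          <td>Large (full event bodies)</td>
          <td>Small (statistics only)</td>
      </tr>
      <tr>
          <td>Typical workflow</td>
          <td>Used second, to inspect details for the name the aggregate query surfaced</td>
          <td>Used first, to find the problem name</td>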
      </tr>
  </tbody>
</table>
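<p>The two form a complementary workflow: the aggregate query points to where the problem is, and the per-event query digs into the details. The dashboard's error list page consumes the aggregate query results directly.</p>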
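<p>The summary-then-detail workflow can be sketched from the client side. The Python below only builds the two request URLs and picks the group to drill into; it makes no network calls. <code>BASE_URL</code> is a placeholder address, and <code>summary_url</code>, <code>detail_url</code> and <code>drill_down</code> are hypothetical helpers rather than part of any published client.</p>

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # hypothetical collector address


def summary_url(start: str, end: str) -> str:
    """Step 1: ask which error names are most frequent."""
    params = {"type": "error", "from": start, "to": end, "group_by": "name"}
    return f"{BASE_URL}/v1/events/summary?{urlencode(params)}"


def detail_url(name: str, start: str, end: str, limit: int = 100) -> str:
    """Step 2: fetch the individual events for one name."""
    params = {"type": "error", "name": name, "from": start, "to": end, "limit": limit}
    return f"{BASE_URL}/v1/events?{urlencode(params)}"


def drill_down(summary: dict, start: str, end: str) -> str:
    """Pick the most frequent group from a summary response and build the follow-up."""
    top = max(summary["groups"], key=lambda group: group["count"])
    return detail_url(top["name"], start, end)


summary = {
    "groups": [
        {"name": "terminal.connect.failed", "count": 3},
        {"name": "hook.failure", "count": 15},
    ]
}
print(summary_url("2026-06-18T00:00:00Z", "2026-06-19T00:00:00Z"))
print(drill_down(summary, "2026-06-18T00:00:00Z", "2026-06-19T00:00:00Z"))
```

<p><code>urlencode</code> percent-encodes the colons in the timestamps, so the printed URLs look different from the hand-written examples above while carrying the same parameters.</p>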
<h2 id="cli-vs-http-的定位">CLI vs. HTTP: where each fits</h2>
<table>
  <thead>
      <tr>
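          <th>Aspect</th>
          <th>CLI (grep + jq)</th>
          <th>HTTP endpoint</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Users</td>
          <td>Developers</td>
          <td>Automation tools, dashboards</td>
      </tr>
      <tr>
          <td>Best for</td>
          <td>Live exploration, ad-hoc queries</td>
          <td>Structured queries, programmatic access</td>
      </tr>
      <tr>
          <td>Strengths</td>
          <td>Nothing to install, composable</td>
          <td>Remote access, standardized</td>
      </tr>
      <tr>
          <td>Limitations</td>
          <td>Requires SSH access to the server</td>
          <td>Requires the collector to be running</td>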
      </tr>
  </tbody>
</table>
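<p>The two interfaces coexist: the CLI serves developers' day-to-day debugging, and the HTTP endpoint serves automation and remote access. Both read the same JSONL files underneath, so their results agree.</p>
<h2 id="下一步路由">Where to go next</h2>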
<ul>
<li>The design of JSONL storage → <a href="/blog/monitoring/04-collector/jsonl-storage/" data-link-title="JSONL 匯出與備份格式" data-link-desc="JSONL 作為匯出和備份格式的設計 — 人類可讀、grep 友好、SQLite 損壞時的重建來源">JSONL storage design</a></li>
<li>Automated handling with the rule engine → <a href="/blog/monitoring/04-collector/rule-engine/" data-link-title="Rule engine 設計" data-link-desc="條件 → 動作 → 模板的三段式規則結構 — 讓 collector 從被動儲存變成主動回應">Rule engine design</a></li>
<li>The collector's full architecture → <a href="/blog/monitoring/04-collector/architecture/" data-link-title="Collector 架構" data-link-desc="HTTP endpoint → JSON Schema 驗證 → 儲存 → 查詢 → rule engine 的五段式處理鏈路">Collector architecture</a></li>
</ul>
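]]></content:encoded></item><item><title>Query Consumption Patterns</title><link>https://tarrragon.github.io/blog/monitoring/04-collector/query-consumption-patterns/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/monitoring/04-collector/query-consumption-patterns/</guid><description>&lt;p>An event's value lies in being queried. Design events backwards: what fields does the query need → what data must the event carry → when must the sensor fire. Working back from the consuming side avoids the situation where a pile of events has been collected but the answer you want cannot be queried.&lt;/p>
&lt;p>Each of the five query scenarios needs its own event types, fields, and query patterns. The query pattern of each scenario also determines whether it needs the SQLite tier or the PostgreSQL tier (see &lt;a href="https://tarrragon.github.io/blog/monitoring/04-collector/feature-tier-boundary/" data-link-title="功能分層與 Backend 選擇" data-link-desc="SQLite 層和 PostgreSQL 層各自承載哪些功能 — 分界線是查詢模式而非資料量、觸發升級的是功能需求而非規模成長">Feature Tiers and Backend Selection&lt;/a>).&lt;/p>
&lt;h2 id="debug-查詢">Debug queries&lt;/h2>
&lt;p>Debug queries answer "where did it go wrong". They are triggered after a user reports a problem or an error alert fires, when a developer needs to reconstruct the context of the problem.&lt;/p>
&lt;h3 id="查詢場景">Query scenarios&lt;/h3>
&lt;h4 id="剛才使用者回報的問題">The problem a user just reported&lt;/h4>
&lt;p>Query pattern: filter by session_id, pull every event from that session, and sort by time.&lt;/p>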





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1">-- SQLite
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">type&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">data&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">events&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">session_id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;abc-123&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">ORDER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>需要的事件欄位：session_id（關聯同次使用的事件）、ts（排序）、error 的 stack trace 和 step（定位失敗點）。&lt;/p>
&lt;h4 id="這個-error-多常發生">這個 error 多常發生&lt;/h4>
&lt;p>查詢模式：按 error name 分群計數，看時間趨勢。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1">-- SQLite
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">COUNT&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">as&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">count&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">strftime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;%Y-%m-%d&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">as&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">day&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">events&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">type&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;error&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;gt;=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;now&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;-7 days&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">GROUP&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">day&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">ORDER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">day&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">count&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">DESC&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Required event fields: type=&amp;lsquo;error&amp;rsquo;, name (the grouping key), and ts (for time bucketing).&lt;/p>
&lt;h3 id="需要的事件">Required events&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Event type&lt;/th>
 &lt;th>Required fields&lt;/th>
 &lt;th>Purpose&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>error&lt;/td>
 &lt;td>stack_trace, step, session_id&lt;/td>
 &lt;td>Locate the point of failure and link it to a session&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>event&lt;/td>
 &lt;td>name, session_id&lt;/td>
 &lt;td>Reconstruct the user's path of actions&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>lifecycle&lt;/td>
 &lt;td>name, session_id&lt;/td>
 &lt;td>Reconstruct system state transitions&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="alerting-查詢">Alerting queries&lt;/h2>
&lt;p>Alerting queries answer "does this need attention?". Two mechanisms are involved: real-time evaluation by the rule engine, which matches each event against the rules as it arrives, and trend analysis through after-the-fact queries.&lt;/p>
&lt;h3 id="查詢場景-1">Query scenarios&lt;/h3>
&lt;h4 id="error-數量突然上升">Error count rises suddenly&lt;/h4>
&lt;p>Query pattern: compare the error count for the last hour with the count for the same hour on the previous day, and alert when the deviation exceeds a threshold. The query below produces only the recent count; running it again over the window shifted back 24 hours gives the baseline.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1">-- SQLite
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">COUNT&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">as&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">recent_count&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">events&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">type&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;error&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;gt;=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;now&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;-1 hour&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Real-time version in the rule engine: increment a counter on every incoming error event, and trigger an action once the counter exceeds the threshold.&lt;/p>
&lt;h4 id="特定-error-首次出現">A specific error appears for the first time&lt;/h4>
&lt;p>Query pattern: when an error arrives, check whether it has any history.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1">-- SQLite
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">COUNT&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">events&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">type&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;error&amp;#39;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">?&lt;/span>&lt;span class="w">
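&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">?&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A result of 0 means this is the first occurrence, which triggers a "new error type" alert. This check is one of Sentry's core features.&lt;/p>
&lt;h3 id="rule-engine-vs-事後查詢">Rule engine vs after-the-fact queries&lt;/h3>
&lt;p>The rule engine evaluates events one at a time with millisecond-level latency, which suits "notify as soon as an error appears". After-the-fact queries aggregate with SQL, with latency from seconds to minutes, which suits "the error trend over the past hour". The two are complementary: the rule engine handles real-time alerting, and SQL queries handle retrospective analysis.&lt;/p></description><content:encoded><![CDATA[<p>The value of an event lies in being consumed by queries. Design events backwards: which fields does the query need → what data must the event carry → when does the sensor need to fire. Working back from the consumer side avoids "we collected a pile of events but cannot get the answer we want".</p>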
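<p>Each of the five query scenarios needs different event types, fields, and query patterns. The query pattern of each scenario also determines whether the SQLite tier or the PostgreSQL tier is needed (see <a href="/blog/monitoring/04-collector/feature-tier-boundary/" data-link-title="功能分層與 Backend 選擇" data-link-desc="SQLite 層和 PostgreSQL 層各自承載哪些功能 — 分界線是查詢模式而非資料量、觸發升級的是功能需求而非規模成長">Feature tiers and backend selection</a>).</p>
<h2 id="debug-查詢">Debug queries</h2>
<p>Debug queries answer "where is the problem?". They are triggered after a user reports a problem or an error alert fires, when a developer needs to reconstruct the context of the problem.</p>
<h3 id="查詢場景">Query scenarios</h3>
<h4 id="剛才使用者回報的問題">The problem a user just reported</h4>
<p>Query pattern: filter by session_id, pull every event in that session, and sort by time.</p>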





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">type</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="k">data</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">session_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;abc-123&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">ts</span><span class="p">;</span></span></span></code></pre></div><p>Required event fields: session_id (links events from the same usage session), ts (ordering), and the error's stack trace and step (to locate the point of failure).</p>
<h4 id="這個-error-多常發生">How often does this error occur</h4>
<p>Query pattern: group and count by error name, bucketed by day, to see the trend over time.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="k">count</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">       </span><span class="n">strftime</span><span class="p">(</span><span class="s1">&#39;%Y-%m-%d&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">ts</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="k">day</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;error&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">datetime</span><span class="p">(</span><span class="s1">&#39;now&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;-7 days&#39;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">day</span><span class="w">
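</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">day</span><span class="p">,</span><span class="w"> </span><span class="k">count</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span></span></span></code></pre></div><p>Required event fields: type=&lsquo;error&rsquo;, name (the grouping key), and ts (for time bucketing).</p>
<h3 id="需要的事件">Required events</h3>
<table>
  <thead>
      <tr>
          <th>Event type</th>
          <th>Required fields</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>error</td>
          <td>stack_trace, step, session_id</td>
          <td>Locate the point of failure and link it to a session</td>
      </tr>
      <tr>
          <td>event</td>
          <td>name, session_id</td>
          <td>Reconstruct the user's path of actions</td>
      </tr>
      <tr>
          <td>lifecycle</td>
          <td>name, session_id</td>
          <td>Reconstruct system state transitions</td>
      </tr>
  </tbody>
</table>
<h2 id="alerting-查詢">Alerting queries</h2>
<p>Alerting queries answer "does this need attention?". Two mechanisms are involved: real-time evaluation by the rule engine, which matches each event against the rules as it arrives, and trend analysis through after-the-fact queries.</p>
<h3 id="查詢場景-1">Query scenarios</h3>
<h4 id="error-數量突然上升">Error count rises suddenly</h4>
<p>Query pattern: compare the error count for the last hour with the count for the same hour on the previous day, and alert when the deviation exceeds a threshold. The query below produces only the recent count; running it again over the window shifted back 24 hours gives the baseline.</p>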





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">recent_count</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;error&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">datetime</span><span class="p">(</span><span class="s1">&#39;now&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;-1 hour&#39;</span><span class="p">);</span></span></span></code></pre></div><p>Real-time version in the rule engine: increment a counter on every incoming error event, and trigger an action once the counter exceeds the threshold.</p>
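<p>The SQL query above returns only the recent count. A minimal sketch of the full comparison against the same hour on the previous day, written with Python's sqlite3 module, might look like the following. The schema, the ts text format (UTC, YYYY-MM-DD HH:MM:SS, so that string comparison matches time order), the function name error_spike, and the ratio and min_count thresholds are all assumptions made for illustration, not part of the collector.</p>

```python
import sqlite3

# Hypothetical minimal schema; ts is assumed to be UTC text 'YYYY-MM-DD HH:MM:SS'.
SCHEMA = "CREATE TABLE events (type TEXT, name TEXT, ts TEXT, session_id TEXT, data TEXT)"

# One pass over two windows: the last hour, and the same hour 24 hours earlier.
SPIKE_SQL = """
SELECT
  COALESCE(SUM(ts >= datetime(:now, '-1 hour')), 0)  AS recent_count,
  COALESCE(SUM(ts < datetime(:now, '-24 hours')), 0) AS baseline_count
FROM events
WHERE type = 'error'
  AND ((ts >= datetime(:now, '-1 hour')   AND ts < datetime(:now))
    OR (ts >= datetime(:now, '-25 hours') AND ts < datetime(:now, '-24 hours')))
"""


def error_spike(conn, now, ratio=2.0, min_count=5):
    """Return (recent, baseline, alert) for the hour ending at `now`."""
    recent, baseline = conn.execute(SPIKE_SQL, {"now": now}).fetchone()
    # max(baseline, 1) keeps a silent baseline from turning one error into an alert;
    # min_count suppresses alerts on very small absolute numbers.
    alert = recent >= min_count and recent > ratio * max(baseline, 1)
    return recent, baseline, alert
```

<p>Passing the reference time as a parameter instead of 'now' keeps the check reproducible, for example when replaying an incident.</p>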
<h4 id="特定-error-首次出現">A specific error appears for the first time</h4>
<p>Query pattern: when an error arrives, check whether it has any history.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;error&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">?</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="o">?</span><span class="p">;</span></span></span></code></pre></div><p>A result of 0 means this is the first occurrence, which triggers a "new error type" alert. This check is one of Sentry's core features.</p>
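<p>Because the check only needs to know whether any earlier row exists, an EXISTS subquery can express the same test without counting every match. The sketch below is one possible shape, assuming a hypothetical events schema, a hypothetical index on (type, name, ts), and ts stored as sortable UTC text; the helper name is_first_occurrence is illustrative.</p>

```python
import sqlite3

# Hypothetical schema and index; neither is taken from the collector itself.
SCHEMA = """
CREATE TABLE events (type TEXT, name TEXT, ts TEXT, session_id TEXT, data TEXT);
CREATE INDEX idx_events_type_name_ts ON events (type, name, ts);
"""


def is_first_occurrence(conn, name, ts):
    """True when no error event with this name was recorded before `ts`."""
    (seen,) = conn.execute(
        "SELECT EXISTS("
        "  SELECT 1 FROM events WHERE type = 'error' AND name = ? AND ts < ?"
        ")",
        (name, ts),
    ).fetchone()
    # EXISTS yields 1 when an earlier row is found and 0 otherwise.
    return seen == 0
```

<p>The strict comparison on ts means the event being evaluated does not count as its own history, even if it has already been written to the table.</p>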
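<h3 id="rule-engine-vs-事後查詢">Rule engine vs after-the-fact queries</h3>
<p>The rule engine evaluates events one at a time with millisecond-level latency, which suits "notify as soon as an error appears". After-the-fact queries aggregate with SQL, with latency from seconds to minutes, which suits "the error trend over the past hour". The two are complementary: the rule engine handles real-time alerting, and SQL queries handle retrospective analysis.</p>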
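<p>To make the per-event side concrete, here is a small sketch of a counter rule with a sliding time window. It is not the collector's rule engine: the class name ErrorRateRule, the event dictionary shape, and the assumption that events arrive in timestamp order are all hypothetical.</p>

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorRateRule:
    """Hypothetical rule: fire when more than `threshold` errors fall inside `window`."""

    def __init__(self, threshold, window=timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self.seen = deque()  # timestamps of recent error events, oldest first

    def on_event(self, event):
        """Evaluate a single event on arrival; return True when the rule fires."""
        if event["type"] != "error":
            return False
        ts = datetime.fromisoformat(event["ts"])
        self.seen.append(ts)
        # Drop timestamps that have aged out of the window (assumes in-order arrival).
        while self.seen and ts - self.seen[0] >= self.window:
            self.seen.popleft()
        return len(self.seen) > self.threshold
```

<p>All state lives in memory and each event costs a few deque operations, which is why this style of evaluation can stay at millisecond latency while the SQL path has to scan stored rows.</p>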
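<h3 id="需要的事件-1">Required events</h3>
<table>
  <thead>
      <tr>
          <th>Event type</th>
          <th>Required fields</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>error</td>
          <td>name, ts</td>
          <td>Counts and time trend</td>
      </tr>
      <tr>
          <td>error</td>
          <td>source.version</td>
          <td>Group by version to see whether a new release introduced the error</td>
      </tr>
  </tbody>
</table>
<h2 id="產品決策查詢">Product decision queries</h2>
<p>Product decision queries answer "how do users use the product?", ranging from simple feature usage rates to complex funnel analysis.</p>
<h3 id="查詢場景-2">Query scenarios</h3>
<h4 id="新功能有多少人用">How many people use the new feature</h4>
<p>Query pattern: count by event name. The SQLite tier is sufficient.</p>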





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="k">count</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">       </span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">session_id</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">unique_sessions</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;event&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">&#39;new_feature.%&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">datetime</span><span class="p">(</span><span class="s1">&#39;now&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;-7 days&#39;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">name</span><span class="p">;</span></span></span></code></pre></div><h4 id="註冊流程在哪流失">Where the signup flow loses users</h4>
<p>Query pattern: a session-level funnel; the query below builds it with a CTE and a window function. Requires the PostgreSQL tier.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- PostgreSQL
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">WITH</span><span class="w"> </span><span class="n">session_steps</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="k">SELECT</span><span class="w"> </span><span class="n">session_id</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">         </span><span class="n">ROW_NUMBER</span><span class="p">()</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">session_id</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">ts</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">step_order</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">  </span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">  </span><span class="k">WHERE</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;signup.start&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;signup.email&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;signup.verify&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;signup.complete&#39;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">&#39;30 days&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w"></span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="n">session_id</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">sessions</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">session_steps</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">name</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">MIN</span><span class="p">(</span><span class="n">step_order</span><span class="p">);</span></span></span></code></pre></div><p>For the full funnel analysis methodology, see <a href="/blog/monitoring/08-business-analytics/self-hosted-funnel/" data-link-title="從 collector 資料做基礎 funnel 分析" data-link-desc="SQLite 層能做什麼程度的 funnel、PostgreSQL 層提供什麼進階能力、JSONL 匯出後的臨時分析">Basic funnel analysis from collector data</a>.</p>
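<p>The funnel query returns one session count per step; the drop-off itself is the ratio between consecutive steps. A short post-processing sketch, assuming the result rows arrive as (name, sessions) pairs and using a hypothetical helper name funnel_dropoff, could look like this:</p>

```python
# Step names taken from the funnel query; the fixed order is the intended flow.
STEPS = ["signup.start", "signup.email", "signup.verify", "signup.complete"]


def funnel_dropoff(rows, steps=STEPS):
    """Turn (name, sessions) rows into (step, sessions, conversion_from_previous)."""
    counts = dict(rows)
    report = []
    previous = None
    for step in steps:
        sessions = counts.get(step, 0)  # a step with no rows counts as zero sessions
        # Conversion is undefined for the first step and when the previous step is empty.
        rate = sessions / previous if previous else None
        report.append((step, sessions, rate))
        previous = sessions
    return report
```

<p>The step with the lowest conversion rate is where the flow loses the most sessions. This sketch compares per-step totals only and does not verify that each session passed through the steps in order.</p>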
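<h3 id="需要的事件-2">Required events</h3>
<table>
  <thead>
      <tr>
          <th>Event type</th>
          <th>Required fields</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>event</td>
          <td>name, session_id, ts</td>
          <td>Counting and ordering funnel steps</td>
      </tr>
      <tr>
          <td>lifecycle</td>
          <td>session.start, ts</td>
          <td>Defining session boundaries</td>
      </tr>
  </tbody>
</table>
<h2 id="安全審計查詢">Security audit queries</h2>
<p>Security audit queries answer "has there been any unexpected access?". The focus is on detecting anomalous patterns rather than individual events.</p>
<h3 id="查詢場景-3">Query scenarios</h3>
<h4 id="有沒有異常登入">Are there any abnormal logins</h4>
<p>Query pattern: group auth failure events by session and count them. A large number of failures in a short time indicates a brute-force attempt.</p>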





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">session_id</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">fail_count</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">       </span><span class="k">MIN</span><span class="p">(</span><span class="n">ts</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">first_attempt</span><span class="p">,</span><span class="w"> </span><span class="k">MAX</span><span class="p">(</span><span class="n">ts</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">last_attempt</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;error&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;auth.login.failed&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">datetime</span><span class="p">(</span><span class="s1">&#39;now&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;-1 hour&#39;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">session_id</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="k">HAVING</span><span class="w"> </span><span class="n">fail_count</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">5</span><span class="p">;</span></span></span></code></pre></div><h4 id="誰存取了什麼敏感資料">Who accessed which sensitive data</h4>
<p>Query pattern: an audit trail of sensitive operations, listing every sensitive-operation event by time, newest first.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="n">session_id</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">data</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;event&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;data.export&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;admin.user_lookup&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;config.secret_read&#39;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span></span></span></code></pre></div><h3 id="需要的事件-3">Required events</h3>
<table>
  <thead>
      <tr>
          <th>Event type</th>
          <th>Required fields</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>error</td>
          <td>name=&lsquo;auth.*.failed&rsquo;, session_id</td>
          <td>Detect brute-force attempts</td>
      </tr>
      <tr>
          <td>event</td>
          <td>name of the sensitive operation, session_id</td>
          <td>Audit trail</td>
      </tr>
      <tr>
          <td>event</td>
          <td>operation target inside data (which record)</td>
          <td>Trace the scope of access</td>
      </tr>
  </tbody>
</table>
<p>The sampling rate for security events must be 1.0 (collect everything), because sampling makes attack attempts statistically invisible. See the sampling rate design section of <a href="/blog/monitoring/03-sdk-design/sensor-lifecycle-management/" data-link-title="感測器生命週期管理" data-link-desc="產品生命週期的五個階段各啟用什麼感測器 — feature flag 整合、取樣率動態調整、感測器開關的可觀察性">Sensor lifecycle management</a>.</p>
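<p>To see why, consider a 10% sampling rate: a burst of six failed logins would be recorded as fewer than one event on average, far below the fail_count &gt; 5 threshold used in the brute-force query. One way to keep security events exempt from sampling is sketched below. The prefix list, the function name should_keep, and the injectable rng parameter are hypothetical and not taken from the SDK.</p>

```python
import random

# Hypothetical name prefixes that mark security-relevant events.
SECURITY_PREFIXES = ("auth.", "admin.", "data.export", "config.secret")


def should_keep(event, sample_rate, rng=random.random):
    """Decide whether an event is sent to the collector."""
    if event["name"].startswith(SECURITY_PREFIXES):
        return True  # security events bypass sampling, i.e. an effective rate of 1.0
    # Everything else is kept with probability `sample_rate`.
    return rng() < sample_rate
```

<p>Placing the exemption ahead of the random draw means a change to the global sampling rate cannot silently reduce security coverage.</p>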
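<h2 id="效能查詢">Performance queries</h2>
<p>Performance queries answer "how fast is the system?" and "where did it get slower?".</p>
<h3 id="查詢場景-4">Query scenarios</h3>
<h4 id="p95-回應時間趨勢">P95 response time trend</h4>
<p>Query pattern: time bucketing plus percentile aggregation. Requires the PostgreSQL tier.</p>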





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- PostgreSQL
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">date_trunc</span><span class="p">(</span><span class="s1">&#39;hour&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">ts</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">hour</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">       </span><span class="n">percentile_cont</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">95</span><span class="p">)</span><span class="w"> </span><span class="n">WITHIN</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="p">(</span><span class="k">data</span><span class="o">-&gt;&gt;</span><span class="s1">&#39;duration_ms&#39;</span><span class="p">)::</span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">p95</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;metric&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;api.response.duration&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">&#39;7 days&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">hour</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">hour</span><span class="p">;</span></span></span></code></pre></div><p>SQLite has no built-in percentile function. The alternative on the SQLite tier is to sort the values and take the one at the 95% position, but this performs poorly on large data volumes.</p>
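<p>As a rough illustration of the sort-and-pick workaround, the following Python sketch uses the standard-library <code>sqlite3</code> module. It assumes a simplified <code>events</code> table with a flattened <code>duration_ms</code> column (not the collector's actual schema), and it uses the nearest-rank definition, so its result can differ slightly from the interpolated value that <code>percentile_cont</code> returns.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import sqlite3

# Assumed, simplified schema: duration_ms is a plain column, not a JSON field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (type TEXT, name TEXT, ts TEXT, duration_ms INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES ('metric', 'api.response.duration', '2026-06-19T00:00:00Z', ?)",
    [(ms,) for ms in range(1, 101)],  # sample durations 1..100 ms
)

METRIC_FILTER = "type = 'metric' AND name = 'api.response.duration'"


def p95_duration(conn):
    """Nearest-rank P95: sort ascending and read the row at rank ceil(0.95 * n)."""
    (n,) = conn.execute(f"SELECT COUNT(*) FROM events WHERE {METRIC_FILTER}").fetchone()
    if n == 0:
        return None
    rank = (95 * n + 99) // 100  # integer form of ceil(0.95 * n)
    row = conn.execute(
        f"SELECT duration_ms FROM events WHERE {METRIC_FILTER} "
        "ORDER BY duration_ms LIMIT 1 OFFSET ?",
        (rank - 1,),
    ).fetchone()
    return row[0]


p95 = p95_duration(conn)
print(p95)
</code></pre></div>
<p>Every call sorts all matching rows, which is where the cost on large data volumes comes from.</p>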
<h4 id="哪個版本變慢了">Which version got slower</h4>
<p>Query pattern: group by source.version and compare performance across groups.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SQLite 3.38+; on PostgreSQL, replace datetime(&#39;now&#39;, &#39;-7 days&#39;) with NOW() - INTERVAL &#39;7 days&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">source_version</span><span class="p">,</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="k">data</span><span class="o">-&gt;&gt;</span><span class="s1">&#39;duration_ms&#39;</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">avg_ms</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">       </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">sample_count</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;metric&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;api.response.duration&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">datetime</span><span class="p">(</span><span class="s1">&#39;now&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;-7 days&#39;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">source_version</span><span class="p">;</span></span></span></code></pre></div><h3 id="需要的事件-4">Required events</h3>
<table>
  <thead>
      <tr>
          <th>Event type</th>
          <th>Required fields</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>metric</td>
          <td>name, data.duration_ms, ts</td>
          <td>Latency trend</td>
      </tr>
      <tr>
          <td>metric</td>
          <td>source.version</td>
          <td>Comparison by version</td>
      </tr>
      <tr>
          <td>metric</td>
          <td>data.memory_mb, data.cpu_percent</td>
          <td>Resource usage trend</td>
      </tr>
  </tbody>
</table>
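<h2 id="查詢--事件反推表">Query → event reverse-lookup table</h2>
<p>Use this table to check an event design in reverse: which events, which fields, and which storage tier each query scenario needs.</p>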
<table>
  <thead>
      <tr>
          <th>Query scenario</th>
          <th>Event type</th>
          <th>Required fields</th>
          <th>Storage tier</th>
          <th>Retention</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Session replay</td>
          <td>All</td>
          <td>session_id, ts</td>
          <td>SQLite</td>
          <td>Raw, 7d</td>
      </tr>
      <tr>
          <td>Error count trend</td>
          <td>error</td>
          <td>name, ts</td>
          <td>SQLite</td>
          <td>Hourly aggregates, 90d</td>
      </tr>
      <tr>
          <td>Feature usage rate</td>
          <td>event</td>
          <td>name</td>
          <td>SQLite</td>
          <td>Daily aggregates, 365d</td>
      </tr>
      <tr>
          <td>Funnel analysis</td>
          <td>event</td>
          <td>name, session_id, ts</td>
          <td>PostgreSQL</td>
          <td>Raw, 30d</td>
      </tr>
      <tr>
          <td>Brute-force detection</td>
          <td>error</td>
          <td>auth name, session_id</td>
          <td>SQLite</td>
          <td>Raw, 30d</td>
      </tr>
      <tr>
          <td>Audit trail</td>
          <td>event</td>
          <td>sensitive-operation name, session_id</td>
          <td>SQLite</td>
          <td>Raw, 365d</td>
      </tr>
      <tr>
          <td>P95 trend</td>
          <td>metric</td>
          <td>duration_ms, ts</td>
          <td>PostgreSQL</td>
          <td>Hourly aggregates, 90d</td>
      </tr>
      <tr>
          <td>Version comparison</td>
          <td>metric</td>
          <td>duration_ms, version</td>
          <td>SQLite</td>
          <td>Daily aggregates, 365d</td>
      </tr>
  </tbody>
</table>
<p>This table complements the event table in <a href="/blog/monitoring/01-mental-model/event-enumeration-method/" data-link-title="事件枚舉與補齊檢查" data-link-desc="從操作盤點系統性地推導出完整的事件清單 — 四類補齊檢查確保沒有遺漏、粒度判準確保每個事件只記一個事實">Event enumeration and completeness checks</a>: event enumeration derives "what to collect" forward from the operations side, while this table confirms "whether enough is collected" backward from the query side.</p>
<h2 id="下一步路由">Next steps</h2>
<ul>
<li>Deriving events forward from the operations side → <a href="/blog/monitoring/01-mental-model/event-enumeration-method/" data-link-title="事件枚舉與補齊檢查" data-link-desc="從操作盤點系統性地推導出完整的事件清單 — 四類補齊檢查確保沒有遺漏、粒度判準確保每個事件只記一個事實">Event enumeration and completeness checks</a></li>
<li>How motivations map to events → <a href="/blog/monitoring/01-mental-model/motivation-to-event-mapping/" data-link-title="動機驅動的事件設計" data-link-desc="Debug / 商業 / 資安 / 效能四個動機各自需要什麼事件 — 從「為什麼收」反推「收什麼」和「什麼階段啟用」">Motivation-driven event design</a></li>
<li>Where SQLite and PostgreSQL query capabilities divide → <a href="/blog/monitoring/04-collector/feature-tier-boundary/" data-link-title="功能分層與 Backend 選擇" data-link-desc="SQLite 層和 PostgreSQL 層各自承載哪些功能 — 分界線是查詢模式而非資料量、觸發升級的是功能需求而非規模成長">Feature tiers and backend selection</a></li>
<li>Real-time evaluation in the rule engine → <a href="/blog/monitoring/04-collector/rule-engine/" data-link-title="Rule engine 設計" data-link-desc="條件 → 動作 → 模板的三段式規則結構 — 讓 collector 從被動儲存變成主動回應">Rule engine design</a></li>
</ul>
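]]></content:encoded></item><item><title>1.13 Application-Layer Query Anti-Patterns and the Query Budget</title><link>https://tarrragon.github.io/blog/backend/01-database/query-anti-patterns/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/query-anti-patterns/</guid><description>&lt;p>When an application slows down, the first instinct is often "the database isn't powerful enough." For most teams the real bottleneck is how the application queries the database, not the database itself: N+1, select *, missing indexes, ORM lazy loading, long transactions. This chapter turns these anti-patterns into a checklist that can be diagnosed and fixed, and proposes a "per-request query budget" as a pre-release yardstick, so that readers catch problems in the application layer before they hit the wall in the data layer.&lt;/p>
&lt;h2 id="為什麼查詢反模式比-vendor-細節更重要">Why query anti-patterns matter more than vendor details&lt;/h2>
&lt;p>Faced with "the database got slow," most teams look first at vendor tuning (buffer pool, instance upgrades, more replicas). That tuning usually lifts baseline performance by 1-2x, whereas a single N+1 query anti-pattern can slow response time by 10-1000x. The exact factor depends on N and the RTT: with N = 100 and an RTT of 1 ms, the request spends about 101 ms on round trips instead of about 1 ms, roughly 100 times slower. Fixing application-layer anti-patterns first and tuning vendor configuration second pays off far more than the reverse order.&lt;/p>
&lt;p>This ordering also matches the spirit of &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 Bottleneck localization workflow&lt;/a>: locate the real bottleneck before deciding whether to add resources. Application-layer queries are the most frequently overlooked source of bottlenecks.&lt;/p>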
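&lt;h2 id="n1-query最常見也最隱性的反模式">N+1 Query: the most common and most hidden anti-pattern&lt;/h2>
&lt;p>An N+1 query means "issue one query to fetch N rows, then issue one more query per row to fetch related data," for a total of 1 + N round trips. The larger N gets, the slower the whole request.&lt;/p>
&lt;p>Typical example: list 100 orders together with each order's customer. The wrong approach first runs &lt;code>SELECT * FROM orders LIMIT 100&lt;/code> to get 100 orders, then runs &lt;code>SELECT * FROM customers WHERE id = ?&lt;/code> for each order, 101 queries in total. The right approach fetches everything at once with a JOIN or an IN clause: &lt;code>SELECT o.*, c.* FROM orders o JOIN customers c ON o.customer_id = c.id LIMIT 100&lt;/code> completes in 1 query.&lt;/p>
&lt;p>N+1 is especially hidden in ORM environments because the framework's lazy loading often conceals it. In the Django ORM, &lt;code>order.customer&lt;/code> looks like an attribute access but maps to a query behind the scenes. The SQL is invisible while writing code, and the problem only surfaces in the slow log after release.&lt;/p>
&lt;p>How to diagnose: turn on the ORM's query log (debug mode) and count how many queries one API request issues. The expected number is in the single digits. If the query count grows linearly with the size of the data set (listing 100 rows triggers 100 queries, listing 1000 rows triggers 1000), that scaling signal is N+1, and it is a more reliable indicator than any fixed threshold.&lt;/p>
&lt;p>Fixes:&lt;/p>
&lt;ul>
&lt;li>On the ORM side, use eager loading (Django &lt;code>select_related&lt;/code> / &lt;code>prefetch_related&lt;/code>, Rails &lt;code>includes&lt;/code>, SQLAlchemy &lt;code>joinedload&lt;/code>)&lt;/li>
&lt;li>When writing SQL by hand, fetch in batches with a JOIN or an IN condition&lt;/li>
&lt;li>Confirm that the ORM's default is not lazy (some ORMs are designed to encourage lazy loading, and eager loading has to be requested explicitly)&lt;/li>
&lt;/ul>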
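&lt;h2 id="select--與超量讀取">Select * and over-fetching&lt;/h2>
&lt;p>&lt;code>SELECT *&lt;/code> pulls every column of the table, including columns that may be very large (content, blob, JSON) and columns that are never used. It has three costs:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Network transfer cost&lt;/strong>: query results travel between the DB and the application; more columns means more bytes.&lt;/li>
&lt;li>&lt;strong>Memory cost&lt;/strong>: the application has to deserialize the whole row; the larger the object, the more memory it takes.&lt;/li>
&lt;li>&lt;strong>Hidden coupling&lt;/strong>: when columns change (added, dropped, retyped), every &lt;code>SELECT *&lt;/code> query is affected.&lt;/li>
&lt;/ol>
&lt;p>The fix is to list the needed columns explicitly: &lt;code>SELECT id, name, status FROM orders&lt;/code>. If the column list feels too long, ask whether the query is trying to take on too many responsibilities at once.&lt;/p></description><content:encoded><![CDATA[<p>When an application slows down, the first instinct is often "the database isn't powerful enough." For most teams the real bottleneck is how the application queries the database, not the database itself: N+1, select *, missing indexes, ORM lazy loading, long transactions. This chapter turns these anti-patterns into a checklist that can be diagnosed and fixed, and proposes a "per-request query budget" as a pre-release yardstick, so that readers catch problems in the application layer before they hit the wall in the data layer.</p>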
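<h2 id="為什麼查詢反模式比-vendor-細節更重要">Why query anti-patterns matter more than vendor details</h2>
<p>Faced with "the database got slow," most teams look first at vendor tuning (buffer pool, instance upgrades, more replicas). That tuning usually lifts baseline performance by 1-2x, whereas a single N+1 query anti-pattern can slow response time by 10-1000x. The exact factor depends on N and the RTT: with N = 100 and an RTT of 1 ms, the request spends about 101 ms on round trips instead of about 1 ms, roughly 100 times slower. Fixing application-layer anti-patterns first and tuning vendor configuration second pays off far more than the reverse order.</p>
<p>This ordering also matches the spirit of <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 Bottleneck localization workflow</a>: locate the real bottleneck before deciding whether to add resources. Application-layer queries are the most frequently overlooked source of bottlenecks.</p>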
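<h2 id="n1-query最常見也最隱性的反模式">N+1 Query: the most common and most hidden anti-pattern</h2>
<p>An N+1 query means "issue one query to fetch N rows, then issue one more query per row to fetch related data," for a total of 1 + N round trips. The larger N gets, the slower the whole request.</p>
<p>Typical example: list 100 orders together with each order's customer. The wrong approach first runs <code>SELECT * FROM orders LIMIT 100</code> to get 100 orders, then runs <code>SELECT * FROM customers WHERE id = ?</code> for each order, 101 queries in total. The right approach fetches everything at once with a JOIN or an IN clause: <code>SELECT o.*, c.* FROM orders o JOIN customers c ON o.customer_id = c.id LIMIT 100</code> completes in 1 query.</p>
<p>N+1 is especially hidden in ORM environments because the framework's lazy loading often conceals it. In the Django ORM, <code>order.customer</code> looks like an attribute access but maps to a query behind the scenes. The SQL is invisible while writing code, and the problem only surfaces in the slow log after release.</p>
<p>How to diagnose: turn on the ORM's query log (debug mode) and count how many queries one API request issues. The expected number is in the single digits. If the query count grows linearly with the size of the data set (listing 100 rows triggers 100 queries, listing 1000 rows triggers 1000), that scaling signal is N+1, and it is a more reliable indicator than any fixed threshold.</p>
<p>Fixes:</p>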
<ul>
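<li>On the ORM side, use eager loading (Django <code>select_related</code> / <code>prefetch_related</code>, Rails <code>includes</code>, SQLAlchemy <code>joinedload</code>)</li>
<li>When writing SQL by hand, fetch in batches with a JOIN or an IN condition</li>
<li>Confirm that the ORM's default is not lazy (some ORMs are designed to encourage lazy loading, and eager loading has to be requested explicitly)</li>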
</ul>
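<p>The query-count signal can be reproduced without an ORM. The sketch below is a minimal illustration using Python's standard-library <code>sqlite3</code> module: it runs the 100-order example both ways and counts the SELECT statements through <code>set_trace_callback</code>. The table layout and helper names are assumptions made for the example, not part of any real application.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import sqlite3

# Hypothetical two-table schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO orders (id, customer_id) VALUES (?, ?)",
                 [(i, (i % 10) + 1) for i in range(1, 101)])
conn.commit()

executed = []


def record_select(sql):
    # Keep only SELECT statements so transaction bookkeeping is not counted.
    if sql.lstrip().upper().startswith("SELECT"):
        executed.append(sql)


conn.set_trace_callback(record_select)


def count_selects(run):
    """Run a callable and return how many SELECT statements it issued."""
    executed.clear()
    run()
    return len(executed)


def n_plus_one():
    orders = conn.execute("SELECT id, customer_id FROM orders LIMIT 100").fetchall()
    for _order_id, customer_id in orders:  # one extra query per order
        conn.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()


def single_join():
    conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON o.customer_id = c.id LIMIT 100"
    ).fetchall()


n_plus_one_count = count_selects(n_plus_one)
join_count = count_selects(single_join)
print(n_plus_one_count, join_count)
</code></pre></div>
<p>With 100 orders the loop version issues 101 SELECT statements and the JOIN version issues 1. Changing the number of seeded orders changes the first count and leaves the second at 1, which is the linear-scaling signal described above.</p>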
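<h2 id="select--與超量讀取">Select * and over-fetching</h2>
<p><code>SELECT *</code> pulls every column of the table, including columns that may be very large (content, blob, JSON) and columns that are never used. It has three costs:</p>
<ol>
<li><strong>Network transfer cost</strong>: query results travel between the DB and the application; more columns means more bytes.</li>
<li><strong>Memory cost</strong>: the application has to deserialize the whole row; the larger the object, the more memory it takes.</li>
<li><strong>Hidden coupling</strong>: when columns change (added, dropped, retyped), every <code>SELECT *</code> query is affected.</li>
</ol>
<p>The fix is to list the needed columns explicitly: <code>SELECT id, name, status FROM orders</code>. If the column list feels too long, ask whether the query is trying to take on too many responsibilities at once.</p>
<p>The exceptions are ad-hoc queries and DB tooling, where <code>SELECT *</code> is acceptable. Production code should not contain it.</p>
<h2 id="缺索引查詢計畫沒走索引">Missing indexes: the query plan does not use an index</h2>
<p>The symptom of a missing index is a query that is fast on small data and suddenly slow once the data grows. The cause is a full table scan: scanning is still quick on a small table and becomes slow at millions of rows.</p>
<p>Diagnose it by reading the query plan with <code>EXPLAIN</code>:</p>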
<ul>
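<li><code>type=ALL</code> (MySQL) or <code>Seq Scan</code> (PostgreSQL) means the query did not use an index</li>
<li>A <code>rows</code> estimate close to the actual table size means the scan range is too wide</li>
<li><code>Using filesort</code> / <code>Using temporary</code> (MySQL) indicates the cost of sorting or of materializing temporary data</li>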
</ul>
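<p>SQLite exposes the same idea through <code>EXPLAIN QUERY PLAN</code>, where a full scan is reported as <code>SCAN</code> and an index lookup as <code>SEARCH</code>. The following sketch, using Python's standard-library <code>sqlite3</code> module with a made-up <code>orders</code> table and index name, compares the plan before and after adding an index. The exact plan wording varies between SQLite versions, so the example only inspects the leading keyword.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import sqlite3

# Hypothetical table for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)")


def plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


query = "SELECT id, status FROM orders WHERE customer_id = ?"

before = plan(conn, query, (42,))  # no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query, (42,))   # the planner can now seek through the index

print(before)
print(after)
</code></pre></div>
<p>The first plan starts with <code>SCAN</code> (a full table scan) and the second with <code>SEARCH</code> and names <code>idx_orders_customer</code>. The same before/after comparison works with <code>EXPLAIN</code> on MySQL or PostgreSQL, using the keywords listed above.</p>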
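<p>The fix is not "build an index for every WHERE condition"; that slows writes and bloats the indexes. Criteria for adding an index:</p>
<ul>
<li>The query is on a hot path (high frequency, user-facing impact)</li>
<li>The column is selective enough (many distinct values)</li>
<li>The column is not already covered by another index</li>
<li>The write path can absorb the maintenance cost of one more index</li>
</ul>
<p>The column order of a composite index must also line up with the query's WHERE conditions. An index on <code>(a, b)</code> can serve <code>WHERE a = ?</code> and <code>WHERE a = ? AND b &gt; ?</code>, while an index on <code>(b, a)</code> cannot seek on <code>a</code> alone. This topic belongs to <a href="/blog/backend/01-database/schema-design/" data-link-title="1.2 Schema Design 與資料建模" data-link-desc="整理 table、index、key、partition、denormalization 與命名規則">1.2 Schema design and data modeling</a>; this chapter only marks the symptoms and the starting point for diagnosis.</p>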
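<p>One way to see the effect of column order is to compare query plans for a filter on the leading column only. The sketch below uses Python's standard-library <code>sqlite3</code> module with a hypothetical table <code>t</code>; it assumes default SQLite behavior with no <code>ANALYZE</code> statistics, so the planner does not attempt a skip-scan.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import sqlite3

# Hypothetical table; payload is not part of either index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, payload TEXT)")


def plan_detail(sql):
    """Return the detail text of the first EXPLAIN QUERY PLAN row."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]


query = "SELECT payload FROM t WHERE a = 1"

conn.execute("CREATE INDEX idx_b_a ON t (b, a)")
with_b_a = plan_detail(query)  # a is not the leading column: no seek possible

conn.execute("CREATE INDEX idx_a_b ON t (a, b)")
with_a_b = plan_detail(query)  # a leads the index: the planner can seek on it

print(with_b_a)
print(with_a_b)
</code></pre></div>
<p>With only <code>(b, a)</code> in place the plan is a <code>SCAN</code>; once <code>(a, b)</code> exists it becomes a <code>SEARCH</code> through <code>idx_a_b</code>.</p>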
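<h2 id="orm-lazy-load-陷阱">ORM lazy load pitfalls</h2>
<p>An ORM's default lazy-load behavior is "issue the query when the attribute is accessed." That keeps code clean during development, but it hides how many queries are issued.</p>
<p>Common pitfalls:</p>
<ul>
<li><strong>Accessing a lazy attribute across a transaction boundary</strong>: the query is issued after the original transaction has closed, so the connection is in the wrong state.</li>
<li><strong>Accessing lazy attributes in a template / serializer</strong>: a single page render triggers dozens of extra queries.</li>
<li><strong>Lazy loading across a service boundary</strong>: when a DTO is passed around, nobody knows which attributes are lazy and which are eager, and the front end triggers extra queries after receiving the DTO.</li>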
</ul>
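<p>Fixes:</p>
<ul>
<li>Mark the eager-loading boundary explicitly and finish loading all required data before the serializer runs</li>
<li>Configure the ORM for default eager or strict mode (too many queries raise a warning)</li>
<li>Fully materialize DTOs before they leave the service boundary</li>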
</ul>
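<h2 id="long-running-transaction">Long-Running Transaction</h2>
<p>A transaction held open for a long time blocks other queries, causes lock waits, and ties up connection-pool resources.</p>
<p>Common causes:</p>
<ul>
<li>Making an HTTP call or calling an external API inside a transaction</li>
<li>Doing file I/O or long computation inside a transaction</li>
<li>Wrapping the whole request handler in a transaction (open from the start of the request to the end of the response)</li>
<li>The ORM defaults to transaction-per-request while the business logic only needs short transactions</li>
</ul>
<p>The fix is to shrink the transaction to the minimum: wrap only the few SQL operations that need atomicity. External calls, computation, and file I/O all belong outside the transaction. See <a href="/blog/backend/01-database/transaction-boundary/" data-link-title="1.3 Transaction 與一致性邊界" data-link-desc="交易邊界、isolation level、retry 策略、distributed transaction（2PC、Saga）與跨 region 強一致取捨">1.3 Transactions and consistency boundaries</a>.</p>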
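<p>A minimal sketch of this scoping, using Python's standard-library <code>sqlite3</code> module: the slow step runs first, and the transaction wraps only the two statements that must succeed or fail together. The <code>accounts</code> table and the <code>fetch_fee</code> helper are hypothetical stand-ins; a real service would call its own external API there.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import sqlite3

# Hypothetical schema for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
INSERT INTO accounts VALUES (1, 100), (2, 0);
""")


def fetch_fee():
    """Hypothetical stand-in for a slow external call (HTTP request, file I/O)."""
    return 5


def transfer(conn, src, dst, amount):
    fee = fetch_fee()  # slow work finishes before any transaction is open
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount + fee, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))


transfer(conn, 1, 2, 40)
balances = dict(conn.execute("SELECT id, balance FROM accounts").fetchall())
print(balances)
</code></pre></div>
<p>If <code>fetch_fee</code> took several seconds, no locks would be held during that time, because the implicit transaction only begins at the first <code>UPDATE</code> inside the <code>with</code> block.</p>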
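<h2 id="其他常見反模式">Other common anti-patterns</h2>
<p>The five above are the high-frequency anti-patterns on the read path. In practice several other kinds also show up often in the slow log and should be included in the pre-release check:</p>
<ul>
<li><strong><a href="/blog/backend/knowledge-cards/cardinality-explosion/" data-link-title="Query Cardinality Explosion" data-link-desc="Query 結果行數因 join / cross product / 條件缺失爆炸性放大的反模式">Cardinality explosion</a> / misused cross join</strong>: two many-to-many relations are joined without a filter, and the result set blows up from N rows to N×M rows. Signal: the row count of the result far exceeds business intuition, and the <code>EXPLAIN</code> row estimate is abnormally large. Fix: add a filter, switch to an EXISTS / IN semi-join, or split into two queries.</li>
<li><strong>OFFSET-based pagination on large tables</strong>: on a large table, <code>LIMIT 20 OFFSET 100000</code> degrades into "scan 100020 rows and skip 100000 of them." Fix: use <a href="/blog/backend/knowledge-cards/keyset-pagination/" data-link-title="Keyset Pagination" data-link-desc="用上一頁最後一筆的 key 當下一頁起點、避開 OFFSET 大表時的線性退化">keyset / cursor pagination</a> (<code>WHERE id &gt; last_seen_id ORDER BY id LIMIT 20</code>), whose cost stays at O(LIMIT) instead of O(OFFSET + LIMIT).</li>
<li><strong>Implicit type conversion that disables an index</strong>: <code>WHERE varchar_col = 123</code> converts the column to a number for the comparison, so the index cannot be used and the query falls back to a full scan. Signal: EXPLAIN shows the index is not hit even though the schema defines one. Fix: make the type explicit (<code>WHERE varchar_col = '123'</code>).</li>
<li><strong>Sorting / aggregating a large result set in the application</strong>: pulling 1 million rows back to the application and sorting or grouping them in memory. Push the work to the DB with <code>ORDER BY</code> / <code>GROUP BY</code> + <code>LIMIT</code>. Signal: application memory usage rises linearly with the endpoint's traffic.</li>
<li><strong>N+1 write</strong>: inserting / updating one row at a time inside a loop instead of using a bulk insert. Every row costs one round trip plus a possible fsync. Fix: use <code>INSERT ... VALUES (), (), ()</code> or <code>executemany</code> / <code>bulk_create</code>.</li>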
</ul>
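<p>As an illustration of the keyset pagination item, the sketch below pages through a table by remembering the last seen <code>id</code> instead of using <code>OFFSET</code>. It uses Python's standard-library <code>sqlite3</code> module; the <code>orders</code> table and the <code>fetch_page</code> helper are assumptions made for the example.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import sqlite3

# Hypothetical table with 100 rows, ids 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (id, status) VALUES (?, 'paid')",
                 [(i,) for i in range(1, 101)])


def fetch_page(conn, last_seen_id, page_size):
    """Return one page plus the cursor (last id on the page) for the next call."""
    rows = conn.execute(
        "SELECT id, status FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


first_page, cursor = fetch_page(conn, 0, 20)
second_page, cursor = fetch_page(conn, cursor, 20)
print(first_page[0][0], second_page[0][0], cursor)
</code></pre></div>
<p>Each call seeks directly to the cursor position through the primary key, so the cost of a page does not depend on how deep into the table it is. The <code>ORDER BY</code> on the cursor column is what keeps the pages stable.</p>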
<p>NoSQL / KV DBs have sibling anti-patterns (hot partition, read amplification, scan-and-filter). They fall outside this chapter's SQL scope, but the logic is similar; see <a href="/blog/backend/01-database/kv-document-capacity-planning/" data-link-title="1.10 KV / Document DB 容量規劃" data-link-desc="DynamoDB / Cosmos DB / Bigtable / MongoDB 等 KV / Document DB 的容量設計、partition key 取捨、capacity mode 選擇">1.10 KV / Document DB capacity planning</a>.</p>
<h2 id="每請求的-query-預算">Per-request query budget</h2>
<p>The anti-patterns above condense into one criterion that can be checked before release: how many queries each API request is allowed to issue.</p>
<table>
  <thead>
      <tr>
          <th>API type</th>
          <th>Suggested query budget</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Simple read (single record)</td>
          <td>1–3</td>
          <td>1 for the main resource + related resources via join or 1–2 extra queries</td>
      </tr>
      <tr>
          <td>List read (list of records)</td>
          <td>1–5</td>
          <td>1 for the main list + filter / pagination / batched relation queries</td>
      </tr>
      <tr>
          <td>Write (single operation)</td>
          <td>2–5</td>
          <td>1 check + 1 write + any follow-up queries it triggers</td>
      </tr>
      <tr>
          <td>Complex (multi-step business flow)</td>
          <td>5–15</td>
          <td>Depends on business complexity, but every additional query needs a stated reason</td>
      </tr>
  </tbody>
</table>
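<p>Going over budget is not necessarily wrong, but it needs an explanation. CI / staging can add middleware that counts queries per endpoint and triggers a discussion in PR review when a threshold is exceeded. That works better than hunting through the slow log after the fact.</p>
<p>This table targets OLTP APIs. Dashboard / report / search endpoints often need 10-30 queries to resolve joins / aggregations, and "Complex" does not describe them precisely; batch / bulk writes (writing 1000 orders at once) should not be judged by query count but by batch size and transaction scope. The budget is a diagnostic tool, not a hard threshold.</p>
<h2 id="判讀訊號">Diagnostic signals</h2>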
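<p>A query-count check of this kind could look roughly like the following sketch, which wraps a block of code in a budget and raises when the block issues more statements than allowed. It uses Python's standard-library <code>sqlite3</code> trace callback as the counting hook; the <code>query_budget</code> helper and <code>QueryBudgetExceeded</code> exception are hypothetical names, and a real middleware would hook into the ORM or database driver that the service actually uses.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import sqlite3
from contextlib import contextmanager


class QueryBudgetExceeded(AssertionError):
    """Raised when a block issues more statements than its budget allows."""


@contextmanager
def query_budget(conn, limit):
    statements = []
    conn.set_trace_callback(statements.append)  # record every executed statement
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)  # always detach the hook
    if len(statements) > limit:
        raise QueryBudgetExceeded(f"{len(statements)} queries, budget is {limit}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.commit()

# Within budget: one query against a limit of three.
with query_budget(conn, limit=3) as seen:
    conn.execute("SELECT COUNT(*) FROM orders").fetchone()
within_budget = len(seen)

# Over budget: five queries against a limit of three.
try:
    with query_budget(conn, limit=3):
        for _ in range(5):
            conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    exceeded = False
except QueryBudgetExceeded:
    exceeded = True

print(within_budget, exceeded)
</code></pre></div>
<p>In a test suite the exception fails the test for that endpoint. Logging a warning instead of raising would match the "trigger a discussion, not a hard threshold" stance of this section.</p>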
<table>
  <thead>
      <tr>
          <th>Signal</th>
          <th>What it points to</th>
          <th>Action</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>API suddenly slows down after data volume grows</td>
          <td>Missing index or degraded query plan</td>
          <td>Run EXPLAIN and inspect the query plan</td>
      </tr>
      <tr>
          <td>A single API issues dozens of queries</td>
          <td>N+1 anti-pattern</td>
          <td>Add eager loading or rewrite as a JOIN</td>
      </tr>
      <tr>
          <td>Application memory usage rises linearly with traffic</td>
          <td><code>SELECT *</code> loads too much data</td>
          <td>Select explicit columns, add pagination</td>
      </tr>
      <tr>
          <td>DB connection wait time rises</td>
          <td>Long transaction or undersized connection pool</td>
          <td>Shrink transaction scope, evaluate the <a href="/blog/backend/knowledge-cards/connection-pool/" data-link-title="Connection Pool" data-link-desc="說明連線池如何限制下游資源並影響服務容量">connection pool</a> limit</td>
      </tr>
      <tr>
          <td>More lock wait timeouts</td>
          <td>Long transaction or hot-row contention</td>
          <td>Split the transaction, review the hot-row design</td>
      </tr>
      <tr>
          <td>Slow query log concentrates on one kind of SQL</td>
          <td>That query does a full scan or uses the wrong join order</td>
          <td>EXPLAIN, then add an index or rewrite the query</td>
      </tr>
      <tr>
          <td>ORM debug log shows hundreds of queries</td>
          <td>Lazy loading out of control</td>
          <td>Switch to an eager-loading strategy, review the serializer boundary</td>
      </tr>
  </tbody>
</table>
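<h2 id="常見誤區">Common misconceptions</h2>
<p>Reading "the database got slow" as "time to upgrade the database." Look at application-layer queries first. Most performance problems come from anti-patterns, not from undersized DB hardware.</p>
<p>Treating indexes as "add whenever you like." Every index has a write cost and a space cost. Too many indexes slow down INSERT/UPDATE and enlarge backups. Verify that the query is on a hot path before building an index.</p>
<p>Treating N+1 as "unsolvable in an ORM environment." Most ORMs offer eager loading; it just is not the default. The problem is that the team has not made it the default strategy. Setting the ORM to default eager, or adding a query-count check in CI, prevents it.</p>
<p>Treating transaction scope as "the bigger, the safer." A long transaction is a source of lock risk, not a consistency guarantee. Consistency comes from the right isolation level and business logic, not from a long transaction that locks the whole flow.</p>
<h2 id="定位邊界">Scope boundaries</h2>
<p>This chapter focuses on "anti-patterns in the queries the application sends to the database." Questions about schema design (split the table? partition?) go to <a href="/blog/backend/01-database/schema-design/" data-link-title="1.2 Schema Design 與資料建模" data-link-desc="整理 table、index、key、partition、denormalization 與命名規則">1.2 schema design</a>; transaction semantics (when to use SERIALIZABLE? how to retry?) go to <a href="/blog/backend/01-database/transaction-boundary/" data-link-title="1.3 Transaction 與一致性邊界" data-link-desc="交易邊界、isolation level、retry 策略、distributed transaction（2PC、Saga）與跨 region 強一致取捨">1.3 transaction boundary</a>; splitting query responsibility across services (which queries belong to this service?) goes to <a href="/blog/backend/01-database/state-ownership-query-boundary/" data-link-title="1.8 State Ownership 與 Query Boundary" data-link-desc="正式狀態 vs 派生狀態的責任分層、CQRS / event sourcing / materialized view、四種 query 邊界">1.8 state ownership and query boundary</a>; the engineering workflow of bottleneck localization goes to <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 Bottleneck localization workflow</a>.</p>
<h2 id="案例回寫">Cases revisited</h2>
<p>The case library in module 09 centers on scale, vendors, and capacity pressure, and few cases take "query anti-patterns" as their main subject. The cases below can be read in reverse: each one shows a team going straight to vendor migration or scale-out without first relieving pressure by fixing query anti-patterns. After reading, ask: before these migrations started, could this chapter's anti-pattern checklist have recovered part of the capacity?</p>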
<ul>
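<li><a href="/blog/backend/09-performance-capacity/cases/doordash-cockroachdb-orders-platform/" data-link-title="9.C39 DoorDash：Aurora Postgres 寫入瓶頸 → CockroachDB 多主寫入" data-link-desc="DoorDash 從 Aurora Postgres 遷到 CockroachDB、解 1.6 M QPS 單主寫入瓶頸、外送平台爆量壓力下重做 OLTP 拓樸">9.C39 DoorDash: Aurora Postgres write bottleneck → CockroachDB</a>: DoorDash hit the ceiling of Aurora's single-primary writes (the bottleneck was primary CPU + WAL flush rate) and moved to CockroachDB, which is compatible with the PostgreSQL wire protocol, to get multi-primary writes without rewriting the ORM layer. Question to ask against this chapter: did the write hotspot come with long transactions or hot-row contention? Those can be checked with this chapter's "Long-Running Transaction" list before a vendor migration.</li>
<li><a href="/blog/backend/09-performance-capacity/cases/zomato-tidb-to-dynamodb-migration/" data-link-title="9.C20 Zomato：從 TiDB 遷移到 DynamoDB、吞吐 4 倍、延遲降 90%、成本減 50%" data-link-desc="Zomato 帳單系統從 TiDB 遷移到 DynamoDB、吞吐 2K→8K RPM、延遲降 90%、成本減 50%">9.C20 Zomato: migrating from TiDB to DynamoDB</a>: Zomato judged that billing events could tolerate eventual consistency and traded consistency semantics for 4x throughput and 50% lower cost. Question to ask against this chapter: before the migration, how many queries did each business action issue on average, and were N+1 or select * amplifying the load? Read this question together with the "Per-request query budget" section.</li>
<li><a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered: Aurora 4000 TPS compliant capacity</a>: Standard Chartered runs an independent Aurora cluster in each of 7 regulated markets (data cannot cross borders); the capacity-planning unit is "per market," and the compliance boundary determines the cluster topology. Question to ask against this chapter: does the query-budget assumption enter the capacity model? If the budget is loose, the planned per-cluster TPS ceiling comes out too low.</li>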
</ul>
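<p>The DoorDash case is the most direct application of this reverse questioning: diagnosing a write bottleneck should not stop at vendor specs, but should first check transaction scope and hot-row contention. For Zomato and Standard Chartered the question steps back to "does the query-budget assumption enter the capacity model." All three questions share one diagnostic logic: application-layer queries are not a detail to explain afterwards, they are capacity that can be recovered beforehand. This reading acknowledges that the cases do not demonstrate query anti-patterns directly; reverse questioning uses them as indirect evidence of why query anti-patterns matter.</p>
<h2 id="跨模組路由">Cross-module routing</h2>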
<ol>
<li>Handoff with <a href="/blog/backend/01-database/high-concurrency-access/" data-link-title="1.1 高併發下的 SQL 讀寫邊界" data-link-desc="說明高併發服務如何共用資料庫 client、控制 transaction、管理 connection pool、避免資料庫成為瓶頸">1.1 SQL read/write boundaries under high concurrency</a>: 1.1 covers connection pools and read-replica mechanics, 1.13 covers how the queries themselves are written. Under high concurrency the two must be checked together.</li>
<li>Handoff with <a href="/blog/backend/01-database/schema-design/" data-link-title="1.2 Schema Design 與資料建模" data-link-desc="整理 table、index、key、partition、denormalization 與命名規則">1.2 schema design</a>: index design is a schema-level concern; this chapter only points out the symptoms.</li>
<li>Handoff with <a href="/blog/backend/04-observability/" data-link-title="模組四：可觀測性平台" data-link-desc="整理 log、metric、trace、dashboard 與 alert 的後端操作實務">04 observability</a>: slow query logs, APM, and query traces are the main signal sources for spotting anti-patterns.</li>
<li>Handoff with <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 Bottleneck localization workflow</a>: check for anti-patterns in the application layer first, then consider upgrading the DB configuration.</li>
<li>Handoff with <a href="/blog/backend/09-performance-capacity/scaling-axes/" data-link-title="9.13 擴展軸與 Stateless 前提" data-link-desc="整理垂直 / 水平擴展取捨、stateless vs stateful 前提、auto scaling 操作模型與兩種擴展的 hidden cost">9.13 Scaling axes</a>: on the growth path, once 9.13 has settled the choice of scaling axis, 1.13 is the next stop. Before adding machines or replicas, use this chapter's anti-pattern checklist to recover the capacity a single machine can sustain.</li>
<li>Handoff with <a href="/blog/backend/10-system-evolution/service-decomposition-boundaries/" data-link-title="10.1 服務拆分與邊界判讀" data-link-desc="整理 monolith vs microservice 取捨、服務邊界判讀訊號、拆分時機與回退路徑">10.1 Service decomposition</a>: splitting services is often used to "fix a slow DB," but the anti-pattern fixes in this chapter usually have a higher ROI and should be tried first.</li>
</ol>
<h2 id="下一步路由">Next steps</h2>
<p><strong>Next stop on the growth path → <a href="/blog/backend/01-database/high-concurrency-access/" data-link-title="1.1 高併發下的 SQL 讀寫邊界" data-link-desc="說明高併發服務如何共用資料庫 client、控制 transaction、管理 connection pool、避免資料庫成為瓶頸">1.1 SQL read/write boundaries under high concurrency</a></strong>: once query anti-patterns are cleaned up, handle scaling with connection pools and read replicas.</p>
<p>Other directions:</p>
<ul>
<li>Schema and index design → <a href="/blog/backend/01-database/schema-design/" data-link-title="1.2 Schema Design 與資料建模" data-link-desc="整理 table、index、key、partition、denormalization 與命名規則">1.2 Schema design and data modeling</a></li>
<li>Narrowing transaction scope → <a href="/blog/backend/01-database/transaction-boundary/" data-link-title="1.3 Transaction 與一致性邊界" data-link-desc="交易邊界、isolation level、retry 策略、distributed transaction（2PC、Saga）與跨 region 強一致取捨">1.3 Transactions and consistency boundaries</a></li>
<li>The full bottleneck localization workflow → <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 Bottleneck localization workflow</a></li>
</ul>
]]></content:encoded></item></channel></rss>