<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Query-Optimization on Tarragon</title><link>https://tarrragon.github.io/blog/tags/query-optimization/</link><description>Recent content in Query-Optimization on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/query-optimization/index.xml" rel="self" type="application/rss+xml"/><item><title>MySQL Query Optimization：從 EXPLAIN 看到實際執行、5 條 query 從 5 秒變 50ms 的 anatomy</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/query-optimization/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/query-optimization/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 MySQL 在 OLTP 譜系的定位、本文聚焦 &lt;em>query optimization&lt;/em> — EXPLAIN / optimizer trace / hint 三層工具跟 5 個實際 case。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="5-個常見-production-case">5 個常見 production case&lt;/h2>
&lt;p>production 上 query 慢、root cause 幾乎都是 &lt;em>optimizer 選錯 plan&lt;/em>。從以下 5 個 case 進入 query optimization：&lt;/p>
&lt;h3 id="case-15-秒--50ms--join-順序選錯">Case 1：5 秒 → 50ms — JOIN 順序選錯&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1">-- 慢 (5 秒)：optimizer 選 customers 為 outer table、scan 全 1M row
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">amount&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">orders&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">JOIN&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">customers&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ON&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">customer_id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">created_at&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;2026-05-01&amp;#39;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">region&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;TW&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>EXPLAIN 顯示：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">+----+-------------+-------+------+---------------+--------+
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">| id | select_type | table | type | possible_keys | rows |
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">+----+-------------+-------+------+---------------+--------+
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">| 1 | SIMPLE | c | ALL | NULL | 1000000|
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">| 1 | SIMPLE | o | ref | idx_cust_id | 100 |
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">+----+-------------+-------+------+---------------+--------+&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>c&lt;/code> table type=ALL（full scan）、rows=1M。問題：&lt;code>customers&lt;/code> 沒在 &lt;code>region&lt;/code> 上的 index、optimizer 預估「region=TW filter 沒效率、就 full scan」、但 region=TW 只佔 10% row（100K row）。&lt;/p>
&lt;p>修法：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">ALTER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TABLE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">customers&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ADD&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">INDEX&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">idx_region&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">region&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">ANALYZE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TABLE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">customers&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- 更新 statistics&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>加 index 後 optimizer 切 plan：先 scan &lt;code>customers&lt;/code> 用 &lt;code>idx_region&lt;/code> 篩 100K row、再 join &lt;code>orders&lt;/code>。從 5 秒降到 50ms。&lt;/p>
&lt;h3 id="case-230-秒--200ms--range-scan-退化-all">Case 2：30 秒 → 200ms — Range scan 退化 ALL&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">events&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">created_at&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BETWEEN&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;2026-05-01&amp;#39;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;2026-05-02&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">user_id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">12345&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>events&lt;/code> 有 &lt;code>idx_user_id&lt;/code> 跟 &lt;code>idx_created_at&lt;/code> 兩個 index、optimizer 應該選一個 + 二級 filter、但實際 &lt;code>type=ALL&lt;/code>（full scan）。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL</a> overview 的 implementation-layer deep article。Overview 已說明 MySQL 在 OLTP 譜系的定位、本文聚焦 <em>query optimization</em> — EXPLAIN / optimizer trace / hint 三層工具跟 5 個實際 case。</p></blockquote>
<hr>
<h2 id="5-個常見-production-case">5 個常見 production case</h2>
<p>production 上 query 慢、root cause 幾乎都是 <em>optimizer 選錯 plan</em>。從以下 5 個 case 進入 query optimization：</p>
<h3 id="case-15-秒--50ms--join-順序選錯">Case 1：5 秒 → 50ms — JOIN 順序選錯</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 慢 (5 秒)：optimizer 選 customers 為 outer table、scan 全 1M row
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">name</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">customers</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">customer_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">id</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">created_at</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="s1">&#39;2026-05-01&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">region</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;TW&#39;</span><span class="p">;</span></span></span></code></pre></div><p>EXPLAIN 顯示：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">+----+-------------+-------+------+---------------+--------+
</span></span><span class="line"><span class="ln">2</span><span class="cl">| id | select_type | table | type | possible_keys | rows   |
</span></span><span class="line"><span class="ln">3</span><span class="cl">+----+-------------+-------+------+---------------+--------+
</span></span><span class="line"><span class="ln">4</span><span class="cl">|  1 | SIMPLE      | c     | ALL  | NULL          | 1000000|
</span></span><span class="line"><span class="ln">5</span><span class="cl">|  1 | SIMPLE      | o     | ref  | idx_cust_id   | 100    |
</span></span><span class="line"><span class="ln">6</span><span class="cl">+----+-------------+-------+------+---------------+--------+</span></span></code></pre></div><p><code>c</code> table type=ALL（full scan）、rows=1M。問題：<code>customers</code> 沒在 <code>region</code> 上的 index、optimizer 預估「region=TW filter 沒效率、就 full scan」、但 region=TW 只佔 10% row（100K row）。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">customers</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">idx_region</span><span class="w"> </span><span class="p">(</span><span class="n">region</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">ANALYZE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">customers</span><span class="p">;</span><span class="w">  </span><span class="c1">-- 更新 statistics</span></span></span></code></pre></div><p>加 index 後 optimizer 切 plan：先 scan <code>customers</code> 用 <code>idx_region</code> 篩 100K row、再 join <code>orders</code>。從 5 秒降到 50ms。</p>
<h3 id="case-230-秒--200ms--range-scan-退化-all">Case 2：30 秒 → 200ms — Range scan 退化 ALL</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">events</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">created_at</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="s1">&#39;2026-05-01&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="s1">&#39;2026-05-02&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">AND</span><span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">12345</span><span class="p">;</span></span></span></code></pre></div><p><code>events</code> 有 <code>idx_user_id</code> 跟 <code>idx_created_at</code> 兩個 index、optimizer 應該選一個 + 二級 filter、但實際 <code>type=ALL</code>（full scan）。</p>
<p>EXPLAIN ANALYZE 顯示：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">-&gt; Filter: ((events.user_id = 12345) and (events.created_at between ...))  (cost=2M rows=100)
</span></span><span class="line"><span class="ln">2</span><span class="cl">    -&gt; Table scan on events  (cost=2M rows=10000000)  (actual time=0.1..30s ...)</span></span></code></pre></div><p>問題：optimizer estimated rows=100、實際 <em>cardinality estimation</em> 失準（distribution skew）、選了 ALL。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 用 composite index 直接 cover 兩個條件
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">events</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">idx_user_created</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">created_at</span><span class="p">);</span></span></span></code></pre></div><p>Composite index 讓 optimizer 看到 <em>單一 index 直接 satisfy 兩個 predicate</em>、走 range scan + index condition pushdown。30 秒降到 200ms。</p>
<h3 id="case-38-秒--30ms--subquery-沒-unnest">Case 3：8 秒 → 30ms — Subquery 沒 unnest</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">customer_id</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">customers</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">region</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;TW&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">vip_level</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">3</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="p">);</span></span></span></code></pre></div><p>5.6 之前 MySQL 把 <code>IN (subquery)</code> 寫成 <em>correlated subquery</em>、外表每 row 都 re-run subquery、極慢。5.6+ 加 subquery unnesting、轉換成 JOIN，但某些情況 unnest 失敗。</p>
<p>EXPLAIN 顯示：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">+----+--------------------+-----------+-------+
</span></span><span class="line"><span class="ln">2</span><span class="cl">| id | select_type        | table     | type  |
</span></span><span class="line"><span class="ln">3</span><span class="cl">+----+--------------------+-----------+-------+
</span></span><span class="line"><span class="ln">4</span><span class="cl">|  1 | PRIMARY            | orders    | ALL   |
</span></span><span class="line"><span class="ln">5</span><span class="cl">|  2 | DEPENDENT SUBQUERY | customers | unique_subquery |
</span></span><span class="line"><span class="ln">6</span><span class="cl">+----+--------------------+-----------+-------+</span></span></code></pre></div><p><code>DEPENDENT SUBQUERY</code> 是危險訊號。修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 手動改寫成 JOIN
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">JOIN</span><span class="w"> </span><span class="n">customers</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">customer_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">id</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">region</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;TW&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">vip_level</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span></span></span></code></pre></div><p>或用 <code>EXISTS</code>（部分 case 比 <code>IN</code> plan 好）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="k">SELECT</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">customers</span><span class="w"> </span><span class="k">c</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="k">WHERE</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">customer_id</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">region</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;TW&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">vip_level</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">3</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="p">);</span></span></span></code></pre></div><p>不同寫法 plan 差異需用 EXPLAIN 驗證、不能假設「JOIN 一定比 IN 快」。</p>
<h3 id="case-42-秒--100ms--derived-table-沒-materialize">Case 4：2 秒 → 100ms — Derived table 沒 materialize</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">JOIN</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="k">SELECT</span><span class="w"> </span><span class="n">customer_id</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">order_count</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">    </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">customer_id</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">counts</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">customer_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">counts</span><span class="p">.</span><span class="n">customer_id</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">counts</span><span class="p">.</span><span class="n">order_count</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span></span></span></code></pre></div><p>5.6 之前 derived table（FROM subquery）每次 query 都 re-run、慢。5.7+ 有 <em>derived table materialization</em>、但 optimizer 有時不觸發。</p>
<p>EXPLAIN 顯示：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">+----+-------------+-------+------+
</span></span><span class="line"><span class="ln">2</span><span class="cl">| id | select_type | table | type |
</span></span><span class="line"><span class="ln">3</span><span class="cl">+----+-------------+-------+------+
</span></span><span class="line"><span class="ln">4</span><span class="cl">|  1 | PRIMARY     | o     | ALL  |
</span></span><span class="line"><span class="ln">5</span><span class="cl">|  2 | DERIVED     | orders| ALL  |  -- 沒 materialize、每次 join 都跑
</span></span><span class="line"><span class="ln">6</span><span class="cl">+----+-------------+-------+------+</span></span></code></pre></div><p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 顯式用 CTE + 改寫
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">WITH</span><span class="w"> </span><span class="n">counts</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="k">SELECT</span><span class="w"> </span><span class="n">customer_id</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">order_count</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">customer_id</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">JOIN</span><span class="w"> </span><span class="n">counts</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">customer_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">counts</span><span class="p">.</span><span class="n">customer_id</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">counts</span><span class="p">.</span><span class="n">order_count</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span></span></span></code></pre></div><p>但記得 MySQL CTE 也不 materialize 預設、可能要 <em>temporary table</em> 才強制 cache：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TEMPORARY</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">counts</span><span class="w"> </span><span class="k">AS</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">customer_id</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">order_count</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">customer_id</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">counts</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">customer_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">counts</span><span class="p">.</span><span class="n">customer_id</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">counts</span><span class="p">.</span><span class="n">order_count</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">DROP</span><span class="w"> </span><span class="k">TEMPORARY</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">counts</span><span class="p">;</span></span></span></code></pre></div><h3 id="case-510-秒--100ms--optimizer-選-index-不對">Case 5：10 秒 → 100ms — Optimizer 選 index 不對</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">30</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span></span></span></code></pre></div><p><code>users</code> 有 <code>idx_active</code> (selectivity 高) 跟 <code>idx_age</code> (selectivity 低)。Optimizer 選 <code>idx_age</code>、scan 60% rows、慢。</p>
<p>EXPLAIN：<code>key: idx_age</code> — 但 active=1 filter 後 row 量 &lt; 5%。</p>
<p>修法選一：</p>
<ol>
<li>
<p><strong>Index hint 強制</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">USE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="p">(</span><span class="n">idx_active</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">30</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span></span></span></code></pre></div></li>
<li>
<p><strong>Composite index 取代</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">idx_active_age</span><span class="w"> </span><span class="p">(</span><span class="n">active</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">DROP</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">idx_age</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">users</span><span class="p">;</span></span></span></code></pre></div></li>
<li>
<p><strong>Optimizer hint (8.0+)</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="cm">/*+ INDEX(users idx_active) */</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">30</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span></span></span></code></pre></div></li>
</ol>
<p>Composite index 是最持久解（不依賴 hint）。Index hint 是 quick fix、但對 future schema change 脆弱。</p>
<h2 id="explain-三層工具">EXPLAIN 三層工具</h2>
<h3 id="tool-1explain--query-plan-preview">Tool 1：EXPLAIN — query plan preview</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="p">...;</span></span></span></code></pre></div><p>輸出每個 step 的 <em>估計</em> cost / row count / key used。<strong>用於 quick check plan 形狀</strong>。</p>
<p>關鍵欄位：</p>
<ul>
<li><code>type</code>：access type（ALL &lt; index &lt; range &lt; ref &lt; eq_ref &lt; const）、ALL / index 是警訊</li>
<li><code>key</code>：實際選的 index、可能跟 <code>possible_keys</code> 不同</li>
<li><code>rows</code>：估計 scan row 數</li>
<li><code>Extra</code>：<code>Using filesort</code> / <code>Using temporary</code> / <code>Using index condition</code> 等行為標記</li>
</ul>
<h3 id="tool-2explain-analyze--實際執行統計">Tool 2：EXPLAIN ANALYZE — 實際執行統計</h3>
<p>8.0+ 加的。差別：實際 run query、回實際 row count / time、跟 estimate 對比。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="p">...;</span></span></span></code></pre></div><p>輸出格式（tree format）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">-&gt; Nested loop inner join  (cost=2.4e6 rows=100000) (actual time=0.05..3.2 rows=10000 loops=1)
</span></span><span class="line"><span class="ln">2</span><span class="cl">    -&gt; Index range scan on orders using idx_created (cost=2.4e6 rows=10000) (actual time=0.04..3.0 rows=10000 loops=1)
</span></span><span class="line"><span class="ln">3</span><span class="cl">    -&gt; Single-row index lookup on customers using PRIMARY (cost=1 rows=1) (actual time=0.0001..0.0001 rows=1 loops=10000)</span></span></code></pre></div><p>關鍵：對比 <code>cost / rows</code>（estimate） vs <code>actual time / rows</code>。如果 estimate=100K / actual=10M、optimizer 嚴重低估、可能選錯 plan。</p>
<h3 id="tool-3optimizer-trace--看-optimizer-為何選這個-plan">Tool 3：Optimizer Trace — 看 optimizer 為何選這個 plan</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SET</span><span class="w"> </span><span class="n">optimizer_trace</span><span class="o">=</span><span class="s1">&#39;enabled=on&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="p">...;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">optimizer_trace</span><span class="p">;</span></span></span></code></pre></div><p>輸出 JSON、列每個 step optimizer 考慮過的 plan + cost estimate + 為什麼選最終 plan。<strong>用於：optimizer 行為跟你預期不符時、debug 為什麼</strong>。</p>
<p>複雜 query 的 optimizer trace 可能 100+ KB、要熟讀 JSON 結構。production debug tool、不是常規 tool。</p>
<h2 id="optimizer-hint-vs-index-hint">Optimizer hint vs Index hint</h2>
<p>兩種 hint、語法不同、行為不同：</p>
<h3 id="index-hint5x-就有">Index hint（5.x 就有）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">USE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="p">(</span><span class="n">idx_name</span><span class="p">)</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">...;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">FORCE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="p">(</span><span class="n">idx_name</span><span class="p">)</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">...;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">IGNORE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="p">(</span><span class="n">idx_name</span><span class="p">)</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">...;</span></span></span></code></pre></div><ul>
<li><code>USE INDEX</code>：建議 optimizer 用這 index、但 optimizer 仍可拒絕</li>
<li><code>FORCE INDEX</code>：強制用、optimizer 不能拒絕</li>
<li><code>IGNORE INDEX</code>：禁止用</li>
</ul>
<p><strong>問題</strong>：</p>
<ul>
<li>對 table name 寫死、refactor / partition 時容易斷</li>
<li><code>FORCE</code> 太強、可能讓 optimizer 跑得比沒 hint 更慢（forced index 不是最佳 plan）</li>
</ul>
<h3 id="optimizer-hint80">Optimizer hint（8.0+）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="cm">/*+ INDEX(table_name idx_name) */</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">...;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="cm">/*+ JOIN_ORDER(t1, t2, t3) */</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t1</span><span class="p">,</span><span class="w"> </span><span class="n">t2</span><span class="p">,</span><span class="w"> </span><span class="n">t3</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">...;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="cm">/*+ HASH_JOIN(t1 t2) */</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t1</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">t2</span><span class="w"> </span><span class="p">...;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="cm">/*+ NO_INDEX_MERGE(table) */</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">...;</span></span></span></code></pre></div><ul>
<li>更細粒度（join order / join method / index 選擇分開）</li>
<li>注入 query comment 內、不污染 SQL syntax</li>
<li>比 index hint 安全：optimizer 看 hint 但仍走 plan space search</li>
</ul>
<p><strong>推薦</strong>：</p>
<ul>
<li>8.0+ 用 optimizer hint</li>
<li>5.7 仍用 index hint、但謹慎 — 觀察 hint 加上去後 <em>實際 plan</em> 是否真的好</li>
</ul>
<h2 id="5-個-production-踩雷">5 個 Production 踩雷</h2>
<h3 id="1-statistics-過時--optimizer-估錯-row-count">1. Statistics 過時 — optimizer 估錯 row count</h3>
<p><code>information_schema.STATISTICS</code> 紀錄每個 index 的 cardinality。如果 <em>過 1 個月沒 ANALYZE</em>、statistics 跟實際資料 distribution 嚴重偏差、optimizer 估計錯。</p>
<p>修法：</p>
<ul>
<li>定期跑 <code>ANALYZE TABLE</code>（大表改 nightly cron）</li>
<li>8.0+ <code>innodb_stats_auto_recalc=ON</code> 預設、但變更超過 10% row 才觸發</li>
<li>設 <code>innodb_stats_persistent=ON</code>（預設、把 statistics 存 disk）+ <code>innodb_stats_persistent_sample_pages=20</code>（提高 sample 精度）</li>
</ul>
<h3 id="2-forced-index-用錯--hint-比沒-hint-還慢">2. Forced index 用錯 — Hint 比沒 hint 還慢</h3>
<p><code>FORCE INDEX (idx)</code> 強制 optimizer 用、但 <em>idx 不是最佳</em> 時、query 變慢。常見：開發 staging 試出 <code>FORCE INDEX</code> 有效、production 資料 distribution 不同、forced index 反而慢。</p>
<p>修法：</p>
<ul>
<li>用 <code>USE INDEX</code> 而不是 <code>FORCE INDEX</code>（optimizer 仍可換）</li>
<li>不依賴 hint、用 composite index / 重寫 query 達到目的</li>
<li>已用 hint 的 query 進 <em>staging review 機制</em>、確認 plan 仍合理</li>
</ul>
<h3 id="3-hash-join-沒觸發--equality-是-expression">3. Hash join 沒觸發 — Equality 是 expression</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">parent_id</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span></span></span></code></pre></div><p><code>b.parent_id + 1</code> 是 expression、不是 raw column、optimizer 不選 hash join、用 nested loop。</p>
<p>修法：</p>
<ul>
<li>Schema 改：把 <code>parent_id + 1</code> 變成 <em>generated column</em></li>
<li>Query 改：JOIN 之前 <em>預計算 expression</em> 存 temp table</li>
<li>或 <code>/*+ HASH_JOIN(a b) */</code> 顯式（但 plan 仍可能拒絕）</li>
</ul>
<h3 id="4-range-scan-退化-all--cardinality-估計太低">4. Range scan 退化 ALL — Cardinality 估計太低</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="p">...,</span><span class="w"> </span><span class="mi">1000</span><span class="p">);</span></span></span></code></pre></div><p><code>IN</code> 1000 value、optimizer 預估「range scan 太多 lookup、不如 ALL」、選 full table scan。對 <em>中型表</em>（1M row）通常 IN 仍快、但 optimizer 估錯。</p>
<p>修法：</p>
<ul>
<li>
<p><code>IN</code> 拆成 <em>temp table JOIN</em>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TEMPORARY</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">in_values</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">in_values</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">),</span><span class="w"> </span><span class="p">...,</span><span class="w"> </span><span class="p">(</span><span class="mi">1000</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">in_values</span><span class="w"> </span><span class="n">iv</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iv</span><span class="p">.</span><span class="n">val</span><span class="p">;</span></span></span></code></pre></div></li>
<li>
<p>或 <code>optimizer_switch='index_merge=on'</code>（multi-value IN 可能走 index merge）</p>
</li>
<li>
<p>或大 <code>IN</code> 改 application 層拆批 query</p>
</li>
</ul>
<h3 id="5-derived-table-materialization-off--重複-scan">5. Derived table materialization off — 重複 scan</h3>
<p><code>optimizer_switch='derived_merge=on'</code>（預設 ON、derived table 自動 inline merge）某些 query 反而慢（merge 後 plan 變複雜）。或 <em>反向問題</em>：derived table <em>沒</em> materialize、每次都 re-run。</p>
<p>修法：</p>
<ul>
<li>看 EXPLAIN 是否有 <code>DERIVED</code> row、確認 materialization 行為</li>
<li>可 <code>optimizer_switch='derived_merge=off'</code> 強制 materialize（影響整個 connection、謹慎用）</li>
<li>大 derived table 改 explicit <em>temporary table</em> 完全控制</li>
</ul>
<h2 id="跟-postgresql-explain-對比">跟 PostgreSQL EXPLAIN 對比</h2>
<table>
  <thead>
      <tr>
          <th>工具</th>
          <th>MySQL</th>
          <th>PostgreSQL</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Query plan preview</td>
          <td><code>EXPLAIN</code></td>
          <td><code>EXPLAIN</code></td>
      </tr>
      <tr>
          <td>實際執行統計</td>
          <td><code>EXPLAIN ANALYZE</code> (8.0+)</td>
          <td><code>EXPLAIN ANALYZE</code></td>
      </tr>
      <tr>
          <td>Optimizer 內部 trace</td>
          <td>optimizer_trace (JSON)</td>
          <td><code>auto_explain</code> extension</td>
      </tr>
      <tr>
          <td>Format</td>
          <td>TABLE / JSON / TREE</td>
          <td>TEXT / JSON / XML / YAML</td>
      </tr>
      <tr>
          <td>Parallel query plan</td>
          <td>受限（8.0 限 hash join）</td>
          <td>Full（PG 10+ parallel scan / aggregate / join）</td>
      </tr>
      <tr>
          <td>Index merge</td>
          <td>有</td>
          <td>有 (<code>bitmap index scan</code>)</td>
      </tr>
      <tr>
          <td>Genetic Query Optimizer</td>
          <td>無</td>
          <td>PG 有（適合 &gt; 12 table JOIN）</td>
      </tr>
      <tr>
          <td>Cost estimate accuracy</td>
          <td>中（histograms 8.0+）</td>
          <td>高（成熟 statistics）</td>
      </tr>
  </tbody>
</table>
<p>PG optimizer 整體更成熟、複雜 OLAP-style query plan 更穩定。MySQL 8.0 補了不少（histograms、hash join、derived table merge）、簡單 OLTP query 已 OK、複雜 query 仍弱。</p>
<h2 id="跟其他模組整合">跟其他模組整合</h2>
<h3 id="跟-modern-sql-features">跟 Modern SQL Features</h3>
<p>CTE / window function / lateral / hash join 都改變 query plan space、optimizer 跟著要識別新 pattern。8.0 optimizer 對新 SQL feature plan 仍有改進空間。詳見 <a href="/blog/backend/01-database/vendors/mysql/modern-sql-features/" data-link-title="MySQL 8.0 Modern SQL：CTE / window function / JSON_TABLE 不是「終於跟上 PG」、是進入 SQL 工程深度的入場券" data-link-desc="MySQL 8.0 在 SQL 特性上 *終於補齊* CTE、window function、lateral derived table、JSON_TABLE、hash join 等現代 SQL 特性。本文走 5 個關鍵特性、各自實際 production 場景、跟 PostgreSQL 對應特性的行為差異（特別是 JSON_TABLE vs PG JSONB / jsonb_path_query）、配置 / migration 注意事項、5 production 踩雷（CTE 不 materialize / window function 大量 sort spill / JSON_TABLE 跟 generated column 取捨 / hash join 預設沒開 / recursive CTE 深度上限）">Modern SQL Features</a>。</p>
<h3 id="跟-innodb-tuning">跟 InnoDB Tuning</h3>
<p>Query plan 受 <em>buffer pool hit rate</em> 影響 — optimizer 假設 random IO cost、實際資料在 buffer pool 內讀取快。Buffer pool 不夠時 plan estimate 失真。詳見 <a href="/blog/backend/01-database/vendors/mysql/innodb-tuning/" data-link-title="MySQL InnoDB Tuning：為什麼一個 100 GB DB 在 64 GB RAM server 上 query 慢 5 倍" data-link-desc="InnoDB 是 MySQL 預設 storage engine、預設值給 256 MB buffer pool（早期 default）。本文從一個常見痛點開場（DB &gt; RAM 但 server 仍 swap）、走 4 個 critical knob（buffer pool / redo log / flush method / IO capacity）、各自如何影響讀寫吞吐、配置 step-by-step、5 production 踩雷（buffer pool warm-up / log file 大小 / 設 sync_binlog=0 換速度 / IO scheduler / undo log 膨脹）、跟 SSD / NVMe / EBS 的 IO 假設">InnoDB Tuning</a>。</p>
<h3 id="跟-proxysql">跟 ProxySQL</h3>
<p>ProxySQL query rule 不影響 optimizer plan、但可以 <em>rewrite query</em>（rule engine 的 <code>replace_pattern</code>）— 用於把 application 寫不好的 query 改成 optimizer-friendly 形式、application 不必改。詳見 <a href="/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">ProxySQL 配置</a>。</p>
<h3 id="跟-lock-contention">跟 Lock Contention</h3>
<p>Slow query 持有 lock 久、其他 query wait、整個 cluster lock contention 爆。Query optimization 不只是 latency 問題、也是 <em>lock 影響範圍</em> 問題。詳見 <em>Lock Contention deep dive</em> 篇（待寫）。</p>
<h3 id="跟-partitioning">跟 Partitioning</h3>
<p>Partition pruning 是 optimizer 決定的、<code>EXPLAIN PARTITIONS</code> 看 partition 命中。partition + index 組合可能比 single big table + index 慢（cross-partition query overhead）。詳見 <em>Partitioning</em> 篇（待寫）。</p>
<h2 id="觀測-metric">觀測 metric</h2>
<p>Production 持續 monitor：</p>
<ul>
<li><code>Performance_schema.events_statements_summary_by_digest</code>：每個 query digest 的累計 time / row examined / row sent</li>
<li><code>slow_query_log</code>：slow query 進 log 檔（<code>long_query_time=1</code>）</li>
<li><code>sys.statements_with_full_table_scans</code>：列 query 用 full scan 的歷史</li>
<li><code>sys.schema_unused_indexes</code>：列從未用過的 index、可以 drop 省 write cost</li>
</ul>
<p>把這些丟進 Datadog / Percona Monitoring &amp; Management 做 trend analysis。</p>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/mysql/modern-sql-features/" data-link-title="MySQL 8.0 Modern SQL：CTE / window function / JSON_TABLE 不是「終於跟上 PG」、是進入 SQL 工程深度的入場券" data-link-desc="MySQL 8.0 在 SQL 特性上 *終於補齊* CTE、window function、lateral derived table、JSON_TABLE、hash join 等現代 SQL 特性。本文走 5 個關鍵特性、各自實際 production 場景、跟 PostgreSQL 對應特性的行為差異（特別是 JSON_TABLE vs PG JSONB / jsonb_path_query）、配置 / migration 注意事項、5 production 踩雷（CTE 不 materialize / window function 大量 sort spill / JSON_TABLE 跟 generated column 取捨 / hash join 預設沒開 / recursive CTE 深度上限）">MySQL Modern SQL Features</a>（hash join / window / CTE 的 plan 議題）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/innodb-tuning/" data-link-title="MySQL InnoDB Tuning：為什麼一個 100 GB DB 在 64 GB RAM server 上 query 慢 5 倍" data-link-desc="InnoDB 是 MySQL 預設 storage engine、預設值給 256 MB buffer pool（早期 default）。本文從一個常見痛點開場（DB &gt; RAM 但 server 仍 swap）、走 4 個 critical knob（buffer pool / redo log / flush method / IO capacity）、各自如何影響讀寫吞吐、配置 step-by-step、5 production 踩雷（buffer pool warm-up / log file 大小 / 設 sync_binlog=0 換速度 / IO scheduler / undo log 膨脹）、跟 SSD / NVMe / EBS 的 IO 假設">MySQL InnoDB Tuning</a>（buffer pool 對 plan estimate）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">MySQL ProxySQL 配置</a>（query rewrite 整合）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/online-schema-change-tools/" data-link-title="MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑" data-link-desc="MySQL ALTER TABLE 可能鎖整張表，production 需要 online schema change 流程。gh-ost（GitHub）跟 pt-online-schema-change（Percona）都用 ghost table 解決、但底層機制完全不同：pt-osc 用 trigger 同步、gh-ost 用 binlog stream 同步。本文走兩工具機制對照表 → trigger vs binlog 各自取捨 → 配置 step-by-step → 5 production 踩雷（trigger overhead / binlog 延遲 / FK constraint / hot trigger lock / 切換瞬間 deadlock）→ 何時用哪一個">MySQL Online Schema Change Tools</a>（add index 走 OSC）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/query-optimization/" data-link-title="PostgreSQL Query Optimization：EXPLAIN ANALYZE / pg_hint_plan / auto_explain 三層工具跟 4 個 case" data-link-desc="PG query 慢的根因常是 *planner 選錯 plan 或 statistics 過時*。本文從 4 個 production case 開場（seq scan vs index / hash vs nested loop / 多 column 統計缺 / parallel query 沒觸發）、走 EXPLAIN / EXPLAIN ANALYZE / auto_explain 三層工具、pg_hint_plan extension 跟 planner GUC 取捨、5 production 踩雷（ANALYZE 過時 / multi-column statistics / cost-base setting 不對齊硬體 / random_page_cost SSD 沒調 / parallel query 配置）、跟 MySQL query-optimization sibling 對比">PostgreSQL Query Optimization</a>（PG sibling、EXPLAIN ANALYZE / pg_hint_plan / auto_explain 三層工具）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/index-selection/" data-link-title="PostgreSQL Index Selection：B-tree / GIN / GiST / BRIN / Hash 對應 workload 的決策樹" data-link-desc="PG 有 6 種 index method（B-tree / Hash / GIN / GiST / SP-GiST / BRIN）跟 partial / expression / covering 三種變體、不是「都用 B-tree 就好」。每種 index 有自己的 query pattern、儲存代價、write amplification 跟 maintenance 成本。本文走 6 種 index 的適用 workload 對照、決策樹、partial / expression / covering / multi-column 變體、5 production 踩雷（過度 index / partial 條件不對 / B-tree 對 JSON 無效 / BRIN 對非 correlated 資料無效 / multi-column 順序錯）、跟 query-optimization 的 EXPLAIN 互補">PostgreSQL Index Selection</a>（B-tree / GIN / GiST / BRIN 決策樹 vs MySQL B-tree only）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor page</a>（EXPLAIN ANALYZE 對比）</li>
<li>官方：<a href="https://dev.mysql.com/doc/refman/8.0/en/optimization.html">MySQL Optimization</a> / <a href="https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html">Optimizer Hints</a></li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL Query Optimization：EXPLAIN ANALYZE / pg_hint_plan / auto_explain 三層工具跟 4 個 case</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/query-optimization/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/query-optimization/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 &lt;em>query optimization&lt;/em> — EXPLAIN ANALYZE / auto_explain / pg_hint_plan 三層工具跟 4 個實際 case。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="4-個常見-production-case">4 個常見 production case&lt;/h2>
&lt;p>PG query 慢的 root cause 多數是 &lt;em>planner 選錯 plan&lt;/em>。從以下 4 個 case 進入 query optimization：&lt;/p>
&lt;h3 id="case-15-秒--50ms--seq-scan-vs-index">Case 1：5 秒 → 50ms — Seq scan vs index&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="c1">-- 慢 (5 秒)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">amount&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">orders&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">JOIN&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">customers&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ON&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">customer_id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">region&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;TW&amp;#39;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">AND&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">created_at&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;2026-05-01&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>EXPLAIN (ANALYZE, BUFFERS)&lt;/code>：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Hash Join (cost=20000..50000 rows=100 width=...) (actual time=4900..5000 rows=10000)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> -&amp;gt; Seq Scan on customers c (cost=0..20000 rows=1000000 width=...)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> Filter: (region = &amp;#39;TW&amp;#39;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> Rows Removed by Filter: 900000
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> -&amp;gt; Hash (cost=...)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> -&amp;gt; Index Scan on orders_created_idx&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>問題：&lt;code>customers.region&lt;/code> 沒 index、planner 選 seq scan、實際 region=TW 只 10% row。修法：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">CREATE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">INDEX&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">CONCURRENTLY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">idx_customers_region&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ON&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">customers&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">region&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">ANALYZE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">customers&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- 更新 statistics、讓 planner 看到新 index&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>加完 5 秒降 50ms。&lt;/p>
&lt;h3 id="case-230-秒--200ms--hash-join-沒觸發用-nested-loop">Case 2：30 秒 → 200ms — Hash join 沒觸發、用 nested loop&lt;/h3>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">u&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">count&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">users&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">u&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">LEFT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">JOIN&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">orders&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ON&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">o&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">user_id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">u&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">GROUP&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">u&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>EXPLAIN ANALYZE 顯示 &lt;em>Nested Loop&lt;/em> 跑 1M 次 inner loop、執行 30 秒。Planner 估錯 row count、選 nested loop。Hash join 應該 &amp;lt; 200ms。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 <em>query optimization</em> — EXPLAIN ANALYZE / auto_explain / pg_hint_plan 三層工具跟 4 個實際 case。</p></blockquote>
<hr>
<h2 id="4-個常見-production-case">4 個常見 production case</h2>
<p>PG query 慢的 root cause 多數是 <em>planner 選錯 plan</em>。從以下 4 個 case 進入 query optimization：</p>
<h3 id="case-15-秒--50ms--seq-scan-vs-index">Case 1：5 秒 → 50ms — Seq scan vs index</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 慢 (5 秒)
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">name</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">customers</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">customer_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">id</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">c</span><span class="p">.</span><span class="n">region</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;TW&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">created_at</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="s1">&#39;2026-05-01&#39;</span><span class="p">;</span></span></span></code></pre></div><p><code>EXPLAIN (ANALYZE, BUFFERS)</code>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Hash Join  (cost=20000..50000 rows=100 width=...) (actual time=4900..5000 rows=10000)
</span></span><span class="line"><span class="ln">2</span><span class="cl">  -&gt;  Seq Scan on customers c  (cost=0..20000 rows=1000000 width=...)
</span></span><span class="line"><span class="ln">3</span><span class="cl">      Filter: (region = &#39;TW&#39;)
</span></span><span class="line"><span class="ln">4</span><span class="cl">      Rows Removed by Filter: 900000
</span></span><span class="line"><span class="ln">5</span><span class="cl">  -&gt;  Hash  (cost=...)
</span></span><span class="line"><span class="ln">6</span><span class="cl">      -&gt;  Index Scan on orders_created_idx</span></span></code></pre></div><p>問題：<code>customers.region</code> 沒 index、planner 選 seq scan、實際 region=TW 只 10% row。修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">CONCURRENTLY</span><span class="w"> </span><span class="n">idx_customers_region</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">customers</span><span class="p">(</span><span class="n">region</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">customers</span><span class="p">;</span><span class="w">  </span><span class="c1">-- 更新 statistics、讓 planner 看到新 index</span></span></span></code></pre></div><p>加完 5 秒降 50ms。</p>
<h3 id="case-230-秒--200ms--hash-join-沒觸發用-nested-loop">Case 2：30 秒 → 200ms — Hash join 沒觸發、用 nested loop</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="n">o</span><span class="p">.</span><span class="n">id</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">u</span><span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">id</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">u</span><span class="p">.</span><span class="n">name</span><span class="p">;</span></span></span></code></pre></div><p>EXPLAIN ANALYZE 顯示 <em>Nested Loop</em> 跑 1M 次 inner loop、執行 30 秒。Planner 估錯 row count、選 nested loop。Hash join 應該 &lt; 200ms。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ANALYZE</span><span class="w"> </span><span class="n">users</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">orders</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="c1">-- 提高 default_statistics_target 對 critical column
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="k">STATISTICS</span><span class="w"> </span><span class="mi">1000</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">orders</span><span class="p">;</span></span></span></code></pre></div><p>統計精度提升、planner 估 row count 準、自動切 hash join。</p>
<h3 id="case-38-秒--100ms--multi-column-統計缺">Case 3：8 秒 → 100ms — Multi-column 統計缺</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;pending&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">region</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;TW&#39;</span><span class="p">;</span></span></span></code></pre></div><p><code>status = 'pending'</code> 5% row、<code>region = 'TW'</code> 10% row。Planner 假設兩 column 獨立、估 0.5% (5K row)。實際 status=&lsquo;pending&rsquo; 跟 region=&lsquo;TW&rsquo; 強相關（TW 訂單多 pending）、實際 4% (40K row)。Planner 估錯 8x、選錯 plan。</p>
<p>修法（PG 10+）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">STATISTICS</span><span class="w"> </span><span class="n">stats_orders_status_region</span><span class="w"> </span><span class="p">(</span><span class="n">dependencies</span><span class="p">,</span><span class="w"> </span><span class="n">ndistinct</span><span class="p">,</span><span class="w"> </span><span class="n">mcv</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">ON</span><span class="w"> </span><span class="n">status</span><span class="p">,</span><span class="w"> </span><span class="n">region</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">orders</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- 之後 planner 知道 status+region 相關度、估準</span></span></span></code></pre></div><h3 id="case-420-秒--5-秒--parallel-query-沒觸發">Case 4：20 秒 → 5 秒 — Parallel query 沒觸發</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">region</span><span class="p">,</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">),</span><span class="w"> </span><span class="k">sum</span><span class="p">(</span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">region</span><span class="p">;</span></span></span></code></pre></div><p><code>orders</code> 100M row、預期 PG parallel scan + parallel aggregate、實際 single worker 跑 20 秒。</p>
<p>EXPLAIN：<code>Workers Planned: 0</code>。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># postgresql.conf</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">max_parallel_workers_per_gather</span> <span class="o">=</span> <span class="s">4</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="na">max_parallel_workers</span> <span class="o">=</span> <span class="s">8</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="na">max_worker_processes</span> <span class="o">=</span> <span class="s">16</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="na">parallel_setup_cost</span> <span class="o">=</span> <span class="s">100        # 預設 1000、降低讓 planner 更敢 parallel</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="na">parallel_tuple_cost</span> <span class="o">=</span> <span class="s">0.01       # 預設 0.1</span></span></span></code></pre></div><p>並行後 5 秒。</p>
<h2 id="explain-三層工具">EXPLAIN 三層工具</h2>
<h3 id="tool-1explain--plan-preview">Tool 1：EXPLAIN — Plan preview</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="p">...;</span></span></span></code></pre></div><p>輸出每個 node 的 <em>估計</em> cost / row count / width。<strong>用於 quick plan check</strong>。</p>
<p>關鍵欄位：</p>
<ul>
<li><code>Plan node 類型</code>：<code>Seq Scan</code> &lt; <code>Index Scan</code> &lt; <code>Index Only Scan</code>、警訊看 <em>unexpected</em> node type</li>
<li><code>cost=START..END</code>：planner 估的 cost、START 是 startup cost、END 是 total</li>
<li><code>rows</code>：估計 output row 數</li>
<li><code>width</code>：每 row average byte（影響 sort / hash memory）</li>
</ul>
<h3 id="tool-2explain-analyze--實際執行--對比-estimate">Tool 2：EXPLAIN ANALYZE — 實際執行 + 對比 estimate</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">EXPLAIN</span><span class="w"> </span><span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span><span class="w"> </span><span class="n">BUFFERS</span><span class="p">,</span><span class="w"> </span><span class="k">VERBOSE</span><span class="p">)</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="p">...;</span></span></span></code></pre></div><p>差別：實際 <em>跑 query</em>、輸出實際 row count / time、跟 estimate 對比：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Hash Join  (cost=20000..50000 rows=100) (actual time=400..500 rows=10000 loops=1)</span></span></code></pre></div><p><code>rows=100 (estimate)</code> vs <code>rows=10000 (actual)</code> — 估錯 100x、planner 可能選錯 plan。<code>BUFFERS</code> 顯示 disk read vs buffer cache hit。</p>
<p><strong>注意</strong>：EXPLAIN ANALYZE <em>實際跑 query</em>、修改性 query（UPDATE / DELETE）會真的改 data。讀 query 安全。修改性 query 包 transaction：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">BEGIN</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="k">UPDATE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;x&#39;</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">...;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">ROLLBACK</span><span class="p">;</span></span></span></code></pre></div><h3 id="tool-3auto_explain--production-query-自動-capture">Tool 3：auto_explain — Production query 自動 capture</h3>
<p><code>auto_explain</code> extension 自動 log slow query 的 plan：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># postgresql.conf</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">shared_preload_libraries</span> <span class="o">=</span> <span class="s">&#39;auto_explain&#39;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="na">auto_explain.log_min_duration</span> <span class="o">=</span> <span class="s">&#39;1s&#39;    # 超過 1 秒 log plan</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="na">auto_explain.log_analyze</span> <span class="o">=</span> <span class="s">on            # 含 ANALYZE 統計</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="na">auto_explain.log_buffers</span> <span class="o">=</span> <span class="s">on</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="na">auto_explain.log_format</span> <span class="o">=</span> <span class="s">&#39;json&#39;         # JSON 格式給工具消費</span></span></span></code></pre></div><p>Production slow query 自動進 log、不必手動 EXPLAIN。組合 pg_stat_statements + auto_explain 是 PG 標準 query observability。</p>
<h2 id="pg_hint_plan-vs-planner-guc">pg_hint_plan vs Planner GUC</h2>
<p>PG 兩種方式 nudge planner：</p>
<h3 id="planner-gucglobal">Planner GUC（global）</h3>
<p><code>postgresql.conf</code> 內：</p>
<ul>
<li><code>enable_seqscan = off</code> — 禁用 seq scan（force index）</li>
<li><code>enable_nestloop = off</code> — 禁用 nested loop（force hash/merge join）</li>
<li><code>random_page_cost = 1.1</code> — SSD 設低（預設 4 是 HDD assumption）</li>
<li><code>effective_cache_size = '16GB'</code> — buffer pool + OS cache 估、影響 planner</li>
</ul>
<p>GUC 是 <em>global</em> — 影響所有 query。對 <em>單一 query 用 hint</em>：</p>
<h3 id="pg_hint_plan-extensionper-query-hint">pg_hint_plan extension（per-query hint）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 強制特定 plan
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="cm">/*+ IndexScan(orders idx_orders_status) NestLoop(orders customers) */</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">customers</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">...;</span></span></span></code></pre></div><p>Hint 形態：</p>
<ul>
<li><code>IndexScan(t1 idx_name)</code> — 強制 index scan</li>
<li><code>SeqScan(t1)</code> — 強制 seq scan</li>
<li><code>HashJoin(t1 t2)</code> / <code>NestLoop(t1 t2)</code> / <code>MergeJoin(t1 t2)</code></li>
<li><code>Leading(t1 t2 t3)</code> — 強制 join order</li>
<li><code>Rows(t1 t2 #100)</code> — 強制 row 估計</li>
</ul>
<p><strong>推薦</strong>：</p>
<ul>
<li>全 cluster 行為：用 GUC（如 <code>random_page_cost</code>）</li>
<li>單 query 行為：用 pg_hint_plan（不污染其他 query）</li>
<li>不要過度 hint — planner 多數時候 <em>是對的</em>、hint 是 last resort</li>
</ul>
<h2 id="5-個-production-踩雷">5 個 Production 踩雷</h2>
<h3 id="1-statistics-過時--planner-估錯-row-count">1. Statistics 過時 — Planner 估錯 row count</h3>
<p><code>ANALYZE</code> 是 autovacuum 一部分、預設 <em>autovacuum_analyze_scale_factor=0.1</em>（10% row 變動才 analyze）。對 <em>快速 grow 的表</em>（log / event）、ANALYZE 跟不上、planner 用過時 statistics。</p>
<p>修法：</p>
<ul>
<li>
<p>對 critical table 設 <em>較 aggressive autovacuum_analyze_scale_factor</em>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">events</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="p">(</span><span class="n">autovacuum_analyze_scale_factor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">02</span><span class="p">);</span></span></span></code></pre></div></li>
<li>
<p>對 <em>大批量寫入後</em>、手動 <code>ANALYZE events;</code></p>
</li>
<li>
<p>監控 <code>pg_stat_user_tables.last_analyze</code> — 跟 row count 比、判定是否需手動 trigger</p>
</li>
</ul>
<h3 id="2-multi-column-statistics--planner-假設-column-獨立">2. Multi-column statistics — Planner 假設 column 獨立</h3>
<p>如 Case 3、單 column statistics 對 <em>相關 column</em> 估錯。</p>
<p>修法：</p>
<ul>
<li>對 <em>常一起 query 的 column 組合</em>、建 <code>CREATE STATISTICS</code>（PG 10+）</li>
<li>3 種 type：<code>dependencies</code>（functional dependency）、<code>ndistinct</code>（multi-column distinct count）、<code>mcv</code>（most common value combinations）</li>
<li>設完 <em>必須跑 ANALYZE</em> 才生效</li>
</ul>
<h3 id="3-cost-base-setting-不對齊硬體--planner-偏-seq-scan">3. Cost-base setting 不對齊硬體 — Planner 偏 seq scan</h3>
<p>預設 <code>random_page_cost = 4</code>、<code>seq_page_cost = 1</code> 是 <em>HDD assumption</em>（random IO 比 sequential 慢 4x）。SSD / NVMe random / seq IO 差別小、planner 不該 4x penalty random。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- SSD
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">random_page_cost</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">.</span><span class="mi">1</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- NVMe
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">random_page_cost</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_reload_conf</span><span class="p">();</span></span></span></code></pre></div><p><code>random_page_cost</code> 改了 planner 對 index scan 的 cost 估計更準、自動選 index 更積極。</p>
<h3 id="4-effective_cache_size-不對齊實際-ram">4. <code>effective_cache_size</code> 不對齊實際 RAM</h3>
<p><code>effective_cache_size</code> 預設 4 GB、planner 假設 buffer pool + OS cache 共 4 GB。實際 server 64 GB RAM、<code>shared_buffers = 16GB</code>、OS page cache ~30 GB、實際可用 cache 46 GB。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">effective_cache_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;46GB&#39;</span><span class="p">;</span><span class="w">  </span><span class="c1">-- shared_buffers + OS cache 估</span></span></span></code></pre></div><p>提升後 planner 估 query 多數 page 在 cache、降低 <em>估計 random IO cost</em>、選 index 更積極。</p>
<h3 id="5-parallel-query-不觸發">5. Parallel query 不觸發</h3>
<p>預設 <code>max_parallel_workers_per_gather = 2</code>、有些 workload 不夠。或 <em>table size 太小</em>、<code>min_parallel_table_scan_size = 8MB</code> 預設、小表不 parallel。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">max_parallel_workers_per_gather</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">4</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">parallel_setup_cost</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">parallel_tuple_cost</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">01</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">min_parallel_table_scan_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;0&#39;</span><span class="p">;</span><span class="w">  </span><span class="c1">-- 任何 size 都 parallel</span></span></span></code></pre></div><p>監控 <code>EXPLAIN</code> 的 <code>Workers Planned</code> 數量、看是否真 parallel。</p>
<h2 id="觀測-metric">觀測 metric</h2>
<p>Production 持續 monitor：</p>
<ul>
<li><code>pg_stat_statements</code>：每個 query digest 累計 calls / time / rows / IO</li>
<li><code>auto_explain</code> log：slow query 的實際 plan + ANALYZE 統計</li>
<li><code>pg_stat_user_tables.last_analyze</code> / <code>last_autoanalyze</code>：statistics 新鮮度</li>
<li><code>pg_stat_user_indexes.idx_scan</code>：每個 index 使用次數 — 0 表示沒用、可考慮 drop</li>
</ul>
<p>把這些丟進 Datadog / Prometheus（用 <code>postgres_exporter</code> / <code>pg_exporter</code>）做 trend analysis。</p>
<h2 id="跟-mysql-query-optimization-對照">跟 MySQL Query Optimization 對照</h2>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>PG</th>
          <th>MySQL</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Query plan preview</td>
          <td><code>EXPLAIN</code></td>
          <td><code>EXPLAIN</code></td>
      </tr>
      <tr>
          <td>實際執行統計</td>
          <td><code>EXPLAIN ANALYZE</code></td>
          <td><code>EXPLAIN ANALYZE</code> (8.0+)</td>
      </tr>
      <tr>
          <td>Auto-capture</td>
          <td><code>auto_explain</code> extension</td>
          <td><code>slow_query_log</code> + <code>pt-query-digest</code></td>
      </tr>
      <tr>
          <td>Optimizer trace</td>
          <td>log_planner_stats / log_executor_stats</td>
          <td><code>optimizer_trace</code> (JSON)</td>
      </tr>
      <tr>
          <td>Per-query hint</td>
          <td><code>pg_hint_plan</code> extension</td>
          <td>optimizer hint comment (<code>/*+ */</code>)</td>
      </tr>
      <tr>
          <td>Multi-column statistics</td>
          <td><code>CREATE STATISTICS</code></td>
          <td>無原生（依賴 index 統計）</td>
      </tr>
      <tr>
          <td>Parallel query</td>
          <td>Full (scan / agg / join, PG 9.6+)</td>
          <td>受限 (8.0 hash join)</td>
      </tr>
      <tr>
          <td>Cost-base setting</td>
          <td>random_page_cost / effective_cache_size</td>
          <td>隱性、optimizer 預設</td>
      </tr>
  </tbody>
</table>
<p>PG planner 整體成熟、複雜 OLAP-style query 處理較好。MySQL 8.0 補了不少（histograms / hash join）但複雜 query 仍弱於 PG。詳見 <a href="/blog/backend/01-database/vendors/mysql/query-optimization/" data-link-title="MySQL Query Optimization：從 EXPLAIN 看到實際執行、5 條 query 從 5 秒變 50ms 的 anatomy" data-link-desc="MySQL query 慢的根因不在「SQL 寫法」、在「optimizer 選錯 plan」。本文從 5 個常見 production case 開場（5 秒 → 50ms / 30 秒 → 200ms / 8 秒 → 30ms 等）、走 EXPLAIN / EXPLAIN ANALYZE / optimizer trace 三層分析工具、index hint vs optimizer hint 取捨、cardinality estimation 失效時的修法、5 production 踩雷（statistics 過時 / forced index 用錯 / hash join 沒觸發 / range scan 退化 ALL / derived table materialization）">MySQL Query Optimization</a>。</p>
<h2 id="跟其他模組整合">跟其他模組整合</h2>
<h3 id="跟-autovacuum-tuning">跟 Autovacuum Tuning</h3>
<p>ANALYZE 是 autovacuum 一部分、autovacuum 跟不上 → statistics 過時 → planner 估錯。詳見 <a href="/blog/backend/01-database/vendors/postgresql/autovacuum-tuning/" data-link-title="PostgreSQL autovacuum tuning：為什麼你的 autovacuum 永遠追不上 bloat" data-link-desc="MVCC 怎麼產生 dead tuple、autovacuum cost-based throttle 為什麼預設保守、per-table tuning 怎麼設、5 個 production 踩雷（cost_limit 太低 / 長 transaction blocks vacuum / anti-wraparound 在 peak / partition vacuum 滿 worker / index bloat 沒處理）、跟 partitioning &#43; monitoring 整合">Autovacuum Tuning</a>。</p>
<h3 id="跟-replication-topology">跟 Replication Topology</h3>
<p>Standby 上跑 query 用同 statistics（streaming replication copy 整個 system catalog）、planner 行為一致。但 <em>standby 有 hot_standby_feedback</em> 影響 primary autovacuum / ANALYZE 行為。詳見 <a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">Replication Topology</a>。</p>
<h3 id="跟-partitioning">跟 Partitioning</h3>
<p>Partition pruning 跟 query plan 緊密 — <code>EXPLAIN</code> 看是否 prune 對的 partition。詳見 <a href="/blog/backend/01-database/vendors/postgresql/declarative-partitioning/" data-link-title="PostgreSQL declarative partitioning：partition 不是切表、是讓 planner pruning" data-link-desc="Declarative partitioning 的真實價值是 query planner pruning &#43; maintenance scope 縮小、不是「把大表切小」；RANGE / LIST / HASH 取捨、partition key 選法、5 個 production 踩雷（key 選錯不 prune / unique 不 enforce 跨 partition / ATTACH 鎖太久 / partition 數爆 / DETACH 不 reclaim 空間）、跟 autovacuum &#43; index 設計整合">Declarative Partitioning</a>。</p>
<h2 id="何時用-pg_hint_plan-vs-guc">何時用 pg_hint_plan vs GUC</h2>
<table>
  <thead>
      <tr>
          <th>情境</th>
          <th>選擇</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>全 cluster 行為（如 SSD random_page_cost）</td>
          <td>GUC</td>
      </tr>
      <tr>
          <td>單一 critical query 強制特定 plan</td>
          <td>pg_hint_plan</td>
      </tr>
      <tr>
          <td>暫時 disable 某類 plan 給 debug</td>
          <td><code>SET enable_xxx=off</code> per-session</td>
      </tr>
      <tr>
          <td>Production stable use</td>
          <td>GUC + multi-column statistics 為主、hint 為 last resort</td>
      </tr>
  </tbody>
</table>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/postgresql/autovacuum-tuning/" data-link-title="PostgreSQL autovacuum tuning：為什麼你的 autovacuum 永遠追不上 bloat" data-link-desc="MVCC 怎麼產生 dead tuple、autovacuum cost-based throttle 為什麼預設保守、per-table tuning 怎麼設、5 個 production 踩雷（cost_limit 太低 / 長 transaction blocks vacuum / anti-wraparound 在 peak / partition vacuum 滿 worker / index bloat 沒處理）、跟 partitioning &#43; monitoring 整合">PG Autovacuum Tuning</a>（ANALYZE 跟 statistics 新鮮度）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">PG Replication Topology</a>（standby planner 行為）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/declarative-partitioning/" data-link-title="PostgreSQL declarative partitioning：partition 不是切表、是讓 planner pruning" data-link-desc="Declarative partitioning 的真實價值是 query planner pruning &#43; maintenance scope 縮小、不是「把大表切小」；RANGE / LIST / HASH 取捨、partition key 選法、5 個 production 踩雷（key 選錯不 prune / unique 不 enforce 跨 partition / ATTACH 鎖太久 / partition 數爆 / DETACH 不 reclaim 空間）、跟 autovacuum &#43; index 設計整合">PG Declarative Partitioning</a>（partition pruning）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/query-optimization/" data-link-title="MySQL Query Optimization：從 EXPLAIN 看到實際執行、5 條 query 從 5 秒變 50ms 的 anatomy" data-link-desc="MySQL query 慢的根因不在「SQL 寫法」、在「optimizer 選錯 plan」。本文從 5 個常見 production case 開場（5 秒 → 50ms / 30 秒 → 200ms / 8 秒 → 30ms 等）、走 EXPLAIN / EXPLAIN ANALYZE / optimizer trace 三層分析工具、index hint vs optimizer hint 取捨、cardinality estimation 失效時的修法、5 production 踩雷（statistics 過時 / forced index 用錯 / hash join 沒觸發 / range scan 退化 ALL / derived table materialization）">MySQL Query Optimization</a>（sibling、不同 optimizer 成熟度）</li>
<li>官方：<a href="https://www.postgresql.org/docs/current/sql-explain.html">EXPLAIN</a> / <a href="https://github.com/ossc-db/pg_hint_plan">pg_hint_plan</a> / <a href="https://www.postgresql.org/docs/current/auto-explain.html">auto_explain</a></li>
</ul>
]]></content:encoded></item><item><title>MongoDB Aggregation Pipeline Optimization：stage 順序、index 配合與 memory 邊界</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/aggregation-pipeline-optimization/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/aggregation-pipeline-optimization/</guid><description>&lt;p>MongoDB aggregation pipeline 是 document model 做 analytical query 的主要介面、stage stream 設計直觀但 production 容易踩雷 — 上線時 200ms、半年後資料量翻倍變 8s、加 index 沒用；profiler 顯示 stage 之間在 memory 累積上百 MB temp data。Aggregation pipeline 的最佳化跟 RDBMS 的 SQL planner 完全不同邏輯 — RDBMS 靠 planner 自動重排 join / filter、MongoDB 靠寫 query 的人手動排 stage 順序。本文把 stage 機制、index 配合、memory 邊界、cross-shard 限制講清楚、並對「report dashboard 跑爆 primary」這個常見 anti-pattern 給治理路徑。&lt;/p>
&lt;p>本文不重複 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview&lt;/a> 已寫過的 aggregation 簡介 — 而是 production tuning + 失敗修復的實作層教學。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>前置閱讀&lt;/strong>：MongoDB workload 適配判讀（document shape 主導 / contract layer 該放哪 / 跨雲 hedging 是否需要）見 &lt;a href="../schema-design-pattern/#%e5%95%8f%e9%a1%8c%e6%83%85%e5%a2%83document-%e8%87%aa%e7%94%b1%e7%9a%84%e5%be%8c%e5%ba%a7%e5%8a%9b">schema-design-pattern 開頭 3 軸前置判讀&lt;/a>。本文聚焦 aggregation pipeline 操作層、是 &lt;em>已選 MongoDB 後&lt;/em> 的 query 層工程議題、不重複前置判讀。&lt;/p>&lt;/blockquote>
&lt;h2 id="問題情境aggregation-是-hot-path-的反模式">問題情境：aggregation 是 hot path 的反模式&lt;/h2>
&lt;p>典型觸發場景：報表 pipeline 上線時 200ms、半年後資料量翻倍變 8s、加 index 沒用；profiler 顯示 stage 之間在 memory 累積上百 MB temp data。&lt;/p>
&lt;p>進一步徵兆：&lt;/p>
&lt;ul>
&lt;li>「OLTP collection 上跑 analytical query」的混合 workload：把 &lt;code>$group + $lookup + $sort&lt;/code> 接成長 pipeline、aggregation 把整個 working set 從 cache 擠走&lt;/li>
&lt;li>Sharded cluster 上跑 cross-shard aggregation：&lt;code>$group&lt;/code> / &lt;code>$sort&lt;/code> 必須在 mongos 合併、mongos 變單點瓶頸&lt;/li>
&lt;li>&lt;code>$lookup&lt;/code> 出現在 hot path：每筆 input doc 都要去另一個 collection 查、嚴格意義上是 N+1&lt;/li>
&lt;li>&lt;code>db.serverStatus().metrics.aggStageCounters&lt;/code> 飆、&lt;code>executionStats.executionTimeMillis&lt;/code> 跟 doc 數線性增長&lt;/li>
&lt;li>Profiler 報 &lt;code>usedDisk: true&lt;/code>、aggregation OOM kill &lt;code>QueryExceededMemoryLimitNoDiskUseAllowed&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>Case anchor：report dashboard 跑爆 primary 的具體 incident 細節需未來 case 補完、本文以「常見 anti-pattern」處理、不憑空編造 incident 數字。側面引用 &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/microsoft-365-cosmos-db-analytics/" data-link-title="9.C30 Microsoft 365：從 MongoDB 遷移到 Cosmos DB 的分析平台" data-link-desc="Microsoft 365 把使用分析平台從 MongoDB 遷移到 Cosmos DB、planet-scale 全球分散式分析">9.C30 Microsoft 365&lt;/a> — 從 MongoDB 把 analytics 分離出來的 driver。&lt;/p></description><content:encoded><![CDATA[<p>MongoDB aggregation pipeline 是 document model 做 analytical query 的主要介面、stage stream 設計直觀但 production 容易踩雷 — 上線時 200ms、半年後資料量翻倍變 8s、加 index 沒用；profiler 顯示 stage 之間在 memory 累積上百 MB temp data。Aggregation pipeline 的最佳化跟 RDBMS 的 SQL planner 完全不同邏輯 — RDBMS 靠 planner 自動重排 join / filter、MongoDB 靠寫 query 的人手動排 stage 順序。本文把 stage 機制、index 配合、memory 邊界、cross-shard 限制講清楚、並對「report dashboard 跑爆 primary」這個常見 anti-pattern 給治理路徑。</p>
<p>本文不重複 <a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a> 已寫過的 aggregation 簡介 — 而是 production tuning + 失敗修復的實作層教學。</p>
<blockquote>
<p><strong>前置閱讀</strong>：MongoDB workload 適配判讀（document shape 主導 / contract layer 該放哪 / 跨雲 hedging 是否需要）見 <a href="../schema-design-pattern/#%e5%95%8f%e9%a1%8c%e6%83%85%e5%a2%83document-%e8%87%aa%e7%94%b1%e7%9a%84%e5%be%8c%e5%ba%a7%e5%8a%9b">schema-design-pattern 開頭 3 軸前置判讀</a>。本文聚焦 aggregation pipeline 操作層、是 <em>已選 MongoDB 後</em> 的 query 層工程議題、不重複前置判讀。</p></blockquote>
<h2 id="問題情境aggregation-是-hot-path-的反模式">問題情境：aggregation 是 hot path 的反模式</h2>
<p>典型觸發場景：報表 pipeline 上線時 200ms、半年後資料量翻倍變 8s、加 index 沒用；profiler 顯示 stage 之間在 memory 累積上百 MB temp data。</p>
<p>進一步徵兆：</p>
<ul>
<li>「OLTP collection 上跑 analytical query」的混合 workload：把 <code>$group + $lookup + $sort</code> 接成長 pipeline、aggregation 把整個 working set 從 cache 擠走</li>
<li>Sharded cluster 上跑 cross-shard aggregation：<code>$group</code> / <code>$sort</code> 必須在 mongos 合併、mongos 變單點瓶頸</li>
<li><code>$lookup</code> 出現在 hot path：每筆 input doc 都要去另一個 collection 查、嚴格意義上是 N+1</li>
<li><code>db.serverStatus().metrics.aggStageCounters</code> 飆、<code>executionStats.executionTimeMillis</code> 跟 doc 數線性增長</li>
<li>Profiler 報 <code>usedDisk: true</code>、aggregation OOM kill <code>QueryExceededMemoryLimitNoDiskUseAllowed</code></li>
</ul>
<p>Case anchor：report dashboard 跑爆 primary 的具體 incident 細節需未來 case 補完、本文以「常見 anti-pattern」處理、不憑空編造 incident 數字。側面引用 <a href="/blog/backend/09-performance-capacity/cases/microsoft-365-cosmos-db-analytics/" data-link-title="9.C30 Microsoft 365：從 MongoDB 遷移到 Cosmos DB 的分析平台" data-link-desc="Microsoft 365 把使用分析平台從 MongoDB 遷移到 Cosmos DB、planet-scale 全球分散式分析">9.C30 Microsoft 365</a> — 從 MongoDB 把 analytics 分離出來的 driver。</p>
<h2 id="核心機制">核心機制</h2>
<p>Aggregation pipeline 是 stage 序列：每個 stage 接 stream of document、產出 stream of document。Stage 順序直接決定後續 stage 處理量 — 第一個 stage 是 IXSCAN 還是 COLLSCAN、<code>$match</code> 推到前面還是後面、<code>$project</code> 早 drop 還是晚 drop、都會放大或縮小後續 cost。</p>
<p><strong>Optimizer rewrite</strong>：MongoDB 會自動把 <code>$match</code> / <code>$project</code> 往前推、把 <code>$sort + $limit</code> 合併成 top-K、但不保證所有 case。用 <code>explain(&quot;executionStats&quot;)</code> 看 rewrite 後的 effective pipeline、不要靠原始 pipeline 推斷實際執行順序。</p>
<p><strong>Index 配合</strong>：pipeline 的 <em>第一個 stage</em> 若是 <code>$match</code> 或 <code>$sort</code>、且能對到 index、就走 IXSCAN。中間 stage 都是 in-memory stream、沒 index 概念。所以 <code>$match</code> 永遠該排第一、配合對應 index。</p>
<p><strong>Memory 邊界</strong>：每個 aggregation stage 預設 100MB memory 上限、超過要 <code>allowDiskUse: true</code>（4.2+ 是預設）。Disk spill 啟動後 IO 嚴重拖慢、aggregation 變慢 50-100x。</p>
<p><strong><code>$lookup</code> 在 sharded cluster</strong>：foreign collection 不能 sharded（5.0 前完全不行、5.0+ 有限放寬）；<code>$lookup</code> 本質是 nested loop join、沒 hash join / merge join — 對大 collection 不可用。</p>
<p><strong><code>$facet</code> 平行多 pipeline</strong>：但所有 facet 共享同一個 100MB 限制、複雜 facet 容易撞 memory ceiling。</p>
<p><strong><code>$merge</code> / <code>$out</code></strong>：把結果寫回 collection（pre-computed view / materialized view）— 把 hot analytical query 移出 read path、是治理 anti-pattern 的主要工具。</p>
<p>對應 knowledge card：<a href="/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">hot-partition</a>（aggregation 集中讀單 shard 的副作用）、<a href="/blog/backend/knowledge-cards/document-store/" data-link-title="Document Store" data-link-desc="說明以 JSON 文件與彈性 schema 提供資料存取的模式，以及它仍需的治理邊界">document-store</a>、<a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale-read</a>（從 secondary 跑 aggregation 的 trade-off）。</p>
<h2 id="操作流程">操作流程</h2>
<p><strong>Step 0：把壞 pipeline 跟好 pipeline 並排</strong>。看一個簡化但典型的優化：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// 壞：lookup 在 match 前、sort 沒 limit、project 在最後
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="nx">db</span><span class="p">.</span><span class="nx">orders</span><span class="p">.</span><span class="nx">aggregate</span><span class="p">([</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="p">{</span> <span class="nx">$lookup</span><span class="o">:</span> <span class="p">{</span> <span class="nx">from</span><span class="o">:</span> <span class="s2">&#34;users&#34;</span><span class="p">,</span> <span class="nx">localField</span><span class="o">:</span> <span class="s2">&#34;userId&#34;</span><span class="p">,</span> <span class="nx">foreignField</span><span class="o">:</span> <span class="s2">&#34;_id&#34;</span><span class="p">,</span> <span class="nx">as</span><span class="o">:</span> <span class="s2">&#34;user&#34;</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="p">{</span> <span class="nx">$match</span><span class="o">:</span> <span class="p">{</span> <span class="nx">status</span><span class="o">:</span> <span class="s2">&#34;completed&#34;</span><span class="p">,</span> <span class="s2">&#34;user.region&#34;</span><span class="o">:</span> <span class="s2">&#34;ap-tokyo&#34;</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="p">{</span> <span class="nx">$sort</span><span class="o">:</span> <span class="p">{</span> <span class="nx">createdAt</span><span class="o">:</span> <span class="o">-</span><span class="mi">1</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="p">{</span> <span class="nx">$project</span><span class="o">:</span> <span class="p">{</span> <span class="nx">_id</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">total</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">createdAt</span><span class="o">:</span> <span class="mi">1</span> <span class="p">}</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1">// 好：可推前的 match 寫前面、sort + limit 配對、project 早寫
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"></span><span class="nx">db</span><span class="p">.</span><span class="nx">orders</span><span class="p">.</span><span class="nx">aggregate</span><span class="p">([</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="p">{</span> <span class="nx">$match</span><span class="o">:</span> <span class="p">{</span> <span class="nx">status</span><span class="o">:</span> <span class="s2">&#34;completed&#34;</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">  <span class="p">{</span> <span class="nx">$sort</span><span class="o">:</span> <span class="p">{</span> <span class="nx">createdAt</span><span class="o">:</span> <span class="o">-</span><span class="mi">1</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">  <span class="p">{</span> <span class="nx">$limit</span><span class="o">:</span> <span class="mi">100</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">  <span class="p">{</span> <span class="nx">$lookup</span><span class="o">:</span> <span class="p">{</span> <span class="nx">from</span><span class="o">:</span> <span class="s2">&#34;users&#34;</span><span class="p">,</span> <span class="nx">localField</span><span class="o">:</span> <span class="s2">&#34;userId&#34;</span><span class="p">,</span> <span class="nx">foreignField</span><span class="o">:</span> <span class="s2">&#34;_id&#34;</span><span class="p">,</span> <span class="nx">as</span><span class="o">:</span> <span class="s2">&#34;user&#34;</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">  <span class="p">{</span> <span class="nx">$match</span><span class="o">:</span> <span class="p">{</span> <span class="s2">&#34;user.region&#34;</span><span class="o">:</span> <span class="s2">&#34;ap-tokyo&#34;</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">  <span class="p">{</span> <span class="nx">$project</span><span class="o">:</span> <span class="p">{</span> <span class="nx">_id</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">total</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">createdAt</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">&#34;user.name&#34;</span><span class="o">:</span> <span class="mi">1</span> <span class="p">}</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="p">])</span></span></span></code></pre></div><p>差別：壞 pipeline 對整個 orders 做 lookup、然後才過濾；好 pipeline 先過濾 + top-100、只對 100 筆做 lookup、再過濾 lookup 結果。實際 collection 大時兩者差 50-100x。</p>
<p><strong>Step 1：拿 explain plan</strong>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">db</span><span class="p">.</span><span class="nx">coll</span><span class="p">.</span><span class="nx">explain</span><span class="p">(</span><span class="s2">&#34;executionStats&#34;</span><span class="p">).</span><span class="nx">aggregate</span><span class="p">([...])</span></span></span></code></pre></div><p>看 <code>stages[]</code> 顯示 rewrite 後的 effective pipeline、<code>executionTimeMillis</code>、<code>totalDocsExamined / totalDocsReturned</code> 比值、是否 <code>usedDisk</code>。</p>
<p><strong>Step 2：把 <code>$match</code> 推到最前</strong>。越早過濾、後續 stage 處理量越小。Optimizer 通常自己會推、但 <code>$lookup</code> 之後的 <code>$match</code> 不會自動推到 <code>$lookup</code> 之前 — 因為 lookup 出的欄位才能被那個 match 用、邏輯依賴。寫 query 時就把能推前的 <code>$match</code> 寫前面。</p>
<p><strong>Step 3：對 <code>$match</code> 欄位建 compound index</strong>。確保 <code>executionStages</code> 顯示 <code>IXSCAN</code> 而不是 <code>COLLSCAN</code>。Compound index 順序敏感 — <code>{ status: 1, createdAt: -1 }</code> 對 <code>{ status: ..., createdAt: $gte: ... }</code> 高效、對 <code>{ createdAt: $gte: ... }</code> 走不到 index。</p>
<p><strong>Step 4：<code>$sort + $limit</code> 寫在一起</strong>。Optimizer 才會推 top-K（不需要 full sort、只需要 heap）。單 <code>$sort</code> 不限 limit 會做 full sort、容易撞 memory。</p>
<p><strong>Step 5：<code>$project</code> 早寫</strong>。把不需要的欄位早期 drop、減少後續 stage 處理 doc size。對大 document 特別有效。</p>
<p><strong>Step 6：把 hot analytical pipeline 寫成 materialized view</strong>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nx">db</span><span class="p">.</span><span class="nx">orders</span><span class="p">.</span><span class="nx">aggregate</span><span class="p">([</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="p">{</span> <span class="nx">$match</span><span class="o">:</span> <span class="p">{</span> <span class="nx">createdAt</span><span class="o">:</span> <span class="p">{</span> <span class="nx">$gte</span><span class="o">:</span> <span class="nx">ISODate</span><span class="p">(</span><span class="s2">&#34;2026-05-01&#34;</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="p">{</span> <span class="nx">$group</span><span class="o">:</span> <span class="p">{</span> <span class="nx">_id</span><span class="o">:</span> <span class="s2">&#34;$customerId&#34;</span><span class="p">,</span> <span class="nx">total</span><span class="o">:</span> <span class="p">{</span> <span class="nx">$sum</span><span class="o">:</span> <span class="s2">&#34;$amount&#34;</span> <span class="p">}</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="p">{</span> <span class="nx">$merge</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      <span class="nx">into</span><span class="o">:</span> <span class="s2">&#34;monthly_customer_summary&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="nx">on</span><span class="o">:</span> <span class="s2">&#34;_id&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">      <span class="nx">whenMatched</span><span class="o">:</span> <span class="s2">&#34;merge&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">      <span class="nx">whenNotMatched</span><span class="o">:</span> <span class="s2">&#34;insert&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  <span class="p">}}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">])</span></span></span></code></pre></div><p>定時更新（cron / 5 分鐘一次）、application 讀 materialized view 而不是即時跑 aggregation。</p>
<p><strong>Step 7：sharded cluster 處理</strong>。避免在 hot path 用 cross-shard <code>$lookup</code> / <code>$group</code>、或把這類 query 路由到 analytical replica（用 tag set + read preference）、見 <a href="../replica-set-read-preference/">replica set read preference</a>。</p>
<p>驗證點：</p>
<ul>
<li><code>executionTimeMillis</code> 在預期 budget 內</li>
<li><code>totalDocsExamined / totalDocsReturned</code> 比值接近 1（過濾效率高）</li>
<li>無 <code>usedDisk: true</code></li>
<li>無 stage 看到 <code>inMemory &gt; 50MB</code></li>
</ul>
<p>Rollback boundary：pipeline 改寫是 application code 變更、可以灰度；materialized view（<code>$merge</code>）需備份 target collection 才能還原。</p>
<h3 id="典型-tuning-過程200ms--8s--250ms">典型 tuning 過程（200ms → 8s → 250ms）</h3>
<p>一個常見的 production pipeline 演化路徑：</p>
<ol>
<li><strong>上線時 200ms</strong>：collection 100K doc、<code>$match</code> 過濾 95%、<code>$lookup</code> 只跑 5K 次、in-memory <code>$sort</code> 處理 5K row 在 100MB 內</li>
<li><strong>半年後 8s</strong>：collection 長到 2M doc、<code>$match</code> 仍過濾 95% 但變 100K row、<code>$lookup</code> 跑 100K 次（5K → 100K 是 20x）、<code>$sort</code> 在 in-memory 撞 100MB 開始 disk spill、IO 100x 退化</li>
<li><strong>加 compound index 沒用</strong>：index 是給 <code>$match</code> 用的、但 <code>$match</code> 之後的 stage（<code>$lookup</code> / <code>$sort</code>）走的是 in-memory pipeline、index 救不了</li>
<li><strong>修法到 250ms</strong>：(a) <code>$sort + $limit</code> 配對讓 optimizer 走 top-K、避免 full sort (b) 改 schema embed 把 <code>$lookup</code> 拿掉（見 <a href="../schema-design-pattern/">schema design pattern</a>）(c) hot pipeline 寫成 <code>$merge</code> materialized view、application 讀 view 不跑 aggregation</li>
</ol>
<p>關鍵教訓：aggregation 慢的原因不在 query 本身、在 <em>資料形狀演進</em>。Index 是 hot path 的第一個槓桿、但只對 <code>$match</code> / <code>$sort</code> 第一 stage 有效；後續 stage 要靠 stage 順序、materialized view、schema denormalize 來救。</p>
<h2 id="失敗模式">失敗模式</h2>
<p><strong><code>$lookup</code> 在 hot path</strong>：list page 每行去另一 collection 查、p99 隨 page size 線性增。應在 schema design 階段 denormalize、把 read-together 資料 embed 回 aggregate root（見 <a href="../schema-design-pattern/">schema design pattern</a>）。</p>
<p><strong><code>$sort</code> 不帶 limit + 沒 index</strong>：全表 in-memory sort、撞 100MB 限制 → OOM 或 disk spill。<code>allowDiskUse: true</code> 解 OOM 但 IO 100x 退化。修法是建對應 index 走 IXSCAN sort、或限 limit 走 top-K。</p>
<p><strong>Sharded cluster cross-shard aggregation</strong>：<code>$group</code> 階段所有 partial result 跑到 mongos 合併、mongos memory + CPU 爆。修法是 group key 包含 shard key prefix（讓 group 在 shard 內完成）、或路由到 analytical replica 跑。</p>
<p><strong>Stage 順序錯</strong>：<code>$lookup</code> 放在 <code>$match</code> 前、等於對全表都做 lookup 再過濾、每個 input doc 都觸發 lookup。<code>$match</code> 永遠該排第一。</p>
<p><strong>Aggregation 把 working set 擠走</strong>：OLTP 的 hot page 被 aggregation 的 cold scan 擠出 cache、整體 query latency 一起退化。修法是 analytical workload 跟 OLTP read 隔離（read preference tag）、或搬走 analytical（見下面 anti-recommendation）。</p>
<p><strong><code>$facet</code> 滿載</strong>：四個 facet 各跑大 pipeline、共享 100MB 限制立刻爆。修法是拆成獨立 query、不要硬塞 facet。</p>
<p>Anti-recommendation：</p>
<ul>
<li><strong>報表 / BI / analytics workload 跑 MongoDB primary 是反模式</strong>：應該 (a) 設定 analytical secondary + read preference tag (b) 用 <code>$merge</code> 寫到 reporting collection (c) 進階用 BI Connector / data lake / 把 analytical workload 整批搬到 <a href="https://clickhouse.com">ClickHouse</a> / BigQuery</li>
<li><strong>「report dashboard 跑爆 primary」典型 anti-pattern</strong>：BI 工具直連 MongoDB primary 跑長 pipeline、cache eviction 把 OLTP working set 擠走、p99 latency 在報表時段集體升。沒拿到具體 incident 數字、不在本文編造、改寫成「常見 anti-pattern」並推到治理路徑</li>
<li><strong>Aggregation 不能解 read scaling</strong>：aggregation 是 OLTP 的補位、不是 read scaling 的主路。Read scaling 在大規模 OLTP 走 cache + freshness token（見 <a href="../connection-management-and-cache-layer/">connection management and cache layer</a>）、不是把 aggregation 跑爆 secondary</li>
</ul>
<h2 id="容量與觀測">容量與觀測</h2>
<p>關鍵 metric：</p>
<ul>
<li>Aggregation operation time 分布</li>
<li>Disk spill 次數</li>
<li><code>opcounters.command</code> 中 aggregate 比例</li>
<li>Cache eviction rate 在 aggregation 高峰時的變化</li>
</ul>
<p>Mongo command：</p>
<ul>
<li><code>db.currentOp({ &quot;command.aggregate&quot;: { $exists: true } })</code>：當前 aggregation 在跑</li>
<li><code>db.serverStatus().metrics.aggStageCounters</code>：stage 級別 counter</li>
<li><code>explain(&quot;executionStats&quot;)</code>：單 query 詳細分析</li>
</ul>
<p>Profiler：<code>db.setProfilingLevel(1, {slowms: 200})</code>、看 <code>usedDisk</code> flag 跟 <code>numYield</code>。</p>
<p>回到 <a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 observability evidence</a>：aggregation slow log + cache hit ratio + disk spill rate 是「analytical 壓力」的 evidence 三件套。</p>
<p>回到 <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 bottleneck localization</a>：用 explain executionStats 把 pipeline stage 對到瓶頸（IXSCAN 還是 COLLSCAN、in-memory 還是 disk spill、shard-local 還是 mongos merge）。</p>
<h2 id="邊界與整合">邊界與整合</h2>
<p>Sibling deep articles：</p>
<ul>
<li><a href="../schema-design-pattern/">schema design pattern</a> — embedded 設計可消除大部分 <code>$lookup</code></li>
<li><a href="../shard-key-selection/">shard key selection</a> — 決定 aggregation 是 shard-local 還是 cross-shard</li>
<li><a href="../replica-set-read-preference/">replica set read preference</a> — aggregation 跑 secondary 的 stale read trade-off</li>
<li><a href="../connection-management-and-cache-layer/">connection management and cache layer</a> — report dashboard 跑爆 primary 時的 cache + read scaling 主路</li>
</ul>
<p>Migration playbook：analytical workload 大到不能繼續混在 MongoDB → split 出 <a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">→ Cosmos DB MongoDB API + Synapse</a> 或 <a href="/blog/backend/01-database/vendors/dynamodb/" data-link-title="DynamoDB" data-link-desc="AWS managed key-value、partition-based scaling、9000 萬 RPS sustained 實戰證據">→ DynamoDB + Athena/Glue</a>（access pattern 重設計）。</p>
<p>跟 1.x 互引：<a href="/blog/backend/01-database/kv-document-capacity-planning/" data-link-title="1.10 KV / Document DB 容量規劃" data-link-desc="DynamoDB / Cosmos DB / Bigtable / MongoDB 等 KV / Document DB 的容量設計、partition key 取捨、capacity mode 選擇">1.10 KV / Document DB 容量規劃</a> 把 aggregation 列為 read-shape 的成本維度；<a href="/blog/backend/01-database/high-concurrency-access/" data-link-title="1.1 高併發下的 SQL 讀寫邊界" data-link-desc="說明高併發服務如何共用資料庫 client、控制 transaction、管理 connection pool、避免資料庫成為瓶頸">1.1 高併發資料存取</a> 處理「OLTP + analytical 同 cluster」的反模式。</p>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a> — 本文是該頁尾「aggregation pipeline optimization」backlog 的深度展開</li>
<li><a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Vendor 深度技術文章方法論</a></li>
<li>官方：<a href="https://www.mongodb.com/docs/manual/aggregation/">Aggregation Pipeline</a>、<a href="https://www.mongodb.com/docs/manual/core/aggregation-pipeline-optimization/">Optimize Pipelines</a>、<a href="https://www.mongodb.com/docs/manual/reference/operator/aggregation/merge/">$merge</a></li>
</ul>
]]></content:encoded></item></channel></rss>