<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Pgvector on Tarragon</title><link>https://tarrragon.github.io/blog/tags/pgvector/</link><description>Recent content in Pgvector on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/pgvector/index.xml" rel="self" type="application/rss+xml"/><item><title>pgvector Deep Dive：HNSW / IVFFlat 取捨跟跟專業 Vector DB 對比</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pgvector-deep-dive/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pgvector-deep-dive/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 &lt;em>pgvector extension&lt;/em> — 用 PG 解 vector search workload 的路徑、是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/extension-ecosystem/" data-link-title="PostgreSQL Extension Ecosystem：把 PG 變成 vector DB / time-series / sharded 的 plugin 生態" data-link-desc="PG 的 extension 機制不只是 plugin、是 *結構性產品線擴張* — pgvector 讓 PG 變 vector DB、TimescaleDB 變 time-series、Citus 變 sharded、PostGIS 變 GIS。本文走 PG extension lifecycle、6 個 production-critical extension（pg_stat_statements / pg_partman / pg_repack / pgvector / TimescaleDB / PostGIS）、5 production 踩雷（extension version 跟 PG version 對齊 / managed PG 限制 / upgrade order / shared_preload_libraries 衝突 / extension 跟 logical replication 互動）、cloud vendor 對 extension 的限制">extension-ecosystem&lt;/a> 內最受關注的 extension。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="pgvector-是-pg-變-vector-db-的最短路徑">pgvector 是 PG 變 Vector DB 的最短路徑&lt;/h2>
&lt;p>pgvector 加兩件事：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="k">CREATE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">EXTENSION&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">vector&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1">-- 加 vector column（dimension 必須事先決定）
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">CREATE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TABLE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">SERIAL&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">PRIMARY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">KEY&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">content&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">TEXT&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">embedding&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">vector&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1536&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- OpenAI ada-002 維度
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1">-- 三種 distance operator
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ORDER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">embedding&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;lt;-&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;[0.1, 0.2, ...]&amp;#39;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">LIMIT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- L2
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ORDER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">embedding&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;lt;#&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;[0.1, 0.2, ...]&amp;#39;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">LIMIT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- inner product
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ORDER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">embedding&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">&amp;lt;=&amp;gt;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;[0.1, 0.2, ...]&amp;#39;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">LIMIT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- cosine&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Operator 對應：&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 <em>pgvector extension</em> — 用 PG 解 vector search workload 的路徑、是 <a href="/blog/backend/01-database/vendors/postgresql/extension-ecosystem/" data-link-title="PostgreSQL Extension Ecosystem：把 PG 變成 vector DB / time-series / sharded 的 plugin 生態" data-link-desc="PG 的 extension 機制不只是 plugin、是 *結構性產品線擴張* — pgvector 讓 PG 變 vector DB、TimescaleDB 變 time-series、Citus 變 sharded、PostGIS 變 GIS。本文走 PG extension lifecycle、6 個 production-critical extension（pg_stat_statements / pg_partman / pg_repack / pgvector / TimescaleDB / PostGIS）、5 production 踩雷（extension version 跟 PG version 對齊 / managed PG 限制 / upgrade order / shared_preload_libraries 衝突 / extension 跟 logical replication 互動）、cloud vendor 對 extension 的限制">extension-ecosystem</a> 內最受關注的 extension。</p></blockquote>
<hr>
<h2 id="pgvector-是-pg-變-vector-db-的最短路徑">pgvector 是 PG 變 Vector DB 的最短路徑</h2>
<p>pgvector 加兩件事：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">vector</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="c1">-- 加 vector column（dimension 必須事先決定）
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="n">content</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="n">embedding</span><span class="w"> </span><span class="n">vector</span><span class="p">(</span><span class="mi">1536</span><span class="p">)</span><span class="w">  </span><span class="c1">-- OpenAI ada-002 維度
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="c1">-- 三種 distance operator
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">embedding</span><span class="w"> </span><span class="o">&lt;-&gt;</span><span class="w"> </span><span class="s1">&#39;[0.1, 0.2, ...]&#39;</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w">  </span><span class="c1">-- L2
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">embedding</span><span class="w"> </span><span class="o">&lt;#&gt;</span><span class="w"> </span><span class="s1">&#39;[0.1, 0.2, ...]&#39;</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w">  </span><span class="c1">-- inner product
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">embedding</span><span class="w"> </span><span class="o">&lt;=&gt;</span><span class="w"> </span><span class="s1">&#39;[0.1, 0.2, ...]&#39;</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w">  </span><span class="c1">-- cosine</span></span></span></code></pre></div><p>Operator 對應：</p>
<table>
  <thead>
      <tr>
          <th>Operator</th>
          <th>意義</th>
          <th>適用</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>&lt;-&gt;</code></td>
          <td>L2 distance</td>
          <td>通用、空間距離</td>
      </tr>
      <tr>
          <td><code>&lt;#&gt;</code></td>
          <td>Negative inner product</td>
          <td>normalized vector、cosine 等價</td>
      </tr>
      <tr>
          <td><code>&lt;=&gt;</code></td>
          <td>Cosine distance</td>
          <td>embedding 比較最常用</td>
      </tr>
  </tbody>
</table>
<p>對 OpenAI / Cohere / sentence-transformers embedding、通常用 <code>&lt;=&gt;</code>（cosine）— embedding model 訓練時是 cosine objective。</p>
<h2 id="ann-index-是-vector-search-的核心">ANN Index 是 Vector Search 的核心</h2>
<p>不加 index 的 <code>ORDER BY embedding &lt;=&gt; ?</code> 是 <em>full scan</em>：</p>
<ul>
<li>100K row、1536 dim、每 query ~2-5s（不可用）</li>
<li>1M row 直接超時</li>
</ul>
<p>pgvector 提供兩種 <em>Approximate Nearest Neighbor</em>（ANN）index：</p>
<table>
  <thead>
      <tr>
          <th>Index</th>
          <th>Build 時間</th>
          <th>Query 時間</th>
          <th>Recall@10</th>
          <th>Memory cost</th>
          <th>Update 行為</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>IVFFlat</td>
          <td>快（分鐘級）</td>
          <td>中（10-100ms）</td>
          <td>90-95%</td>
          <td>中（lists 數量）</td>
          <td>Insert OK、需重建保持 recall</td>
      </tr>
      <tr>
          <td>HNSW</td>
          <td>慢（小時級）</td>
          <td>快（1-10ms）</td>
          <td>95-99%</td>
          <td>高（2-4x 資料）</td>
          <td>Insert OK、graph 漸進維護</td>
      </tr>
  </tbody>
</table>
<p><strong>選 IVFFlat 的場景</strong>：</p>
<ul>
<li>Embedding 量 &lt; 1M</li>
<li>Build 時間敏感（CI / batch 環境）</li>
<li>Memory 緊</li>
<li>接受重建 cost（每月 / 每季）</li>
</ul>
<p><strong>選 HNSW 的場景</strong>：</p>
<ul>
<li>Embedding 量 1M-100M</li>
<li>Query latency &lt; 50ms 要求</li>
<li>Memory 充足</li>
<li>Insert 量穩定（不會爆炸性增長）</li>
</ul>
<h2 id="ivfflat分-cluster-找鄰居">IVFFlat：分 Cluster 找鄰居</h2>
<p>IVFFlat 機制：</p>
<ol>
<li><strong>Build</strong>：跑 k-means 把所有 vector 分 <code>lists</code> 個 cluster</li>
<li><strong>Query</strong>：先找最近的 <code>probes</code> 個 cluster、再在這些 cluster 內找 nearest neighbor</li>
</ol>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Build（lists 數量重要）
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">ivfflat</span><span class="w"> </span><span class="p">(</span><span class="n">embedding</span><span class="w"> </span><span class="n">vector_cosine_ops</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">lists</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">100</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- Query 時調 probes 換 recall vs latency
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="k">SET</span><span class="w"> </span><span class="n">ivfflat</span><span class="p">.</span><span class="n">probes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">embedding</span><span class="w"> </span><span class="o">&lt;=&gt;</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span></span></span></code></pre></div><p><strong>Lists 跟 probes sizing 規則</strong>（pgvector 官方建議）：</p>
<table>
  <thead>
      <tr>
          <th>Row count</th>
          <th>lists 建議</th>
          <th>probes 建議</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>&lt; 1M</td>
          <td><code>rows / 1000</code></td>
          <td><code>sqrt(lists)</code></td>
      </tr>
      <tr>
          <td>&gt; 1M</td>
          <td><code>sqrt(rows)</code></td>
          <td><code>sqrt(lists)</code></td>
      </tr>
  </tbody>
</table>
<p>實務：100K row → lists=100 / probes=10、1M row → lists=1000 / probes=32。</p>
<p><strong>IVFFlat 的 recall drift</strong>：cluster 是 build 時固定的、新 insert 的 vector 進入「最近 cluster」、但隨資料分布改變、cluster center 可能不再代表性、recall 隨時間下降。</p>
<p>修法：定期 <code>REINDEX INDEX CONCURRENTLY ...</code>（每月 / 每 100K 新 row）。</p>
<h2 id="hnswmulti-level-graph-找鄰居">HNSW：Multi-level Graph 找鄰居</h2>
<p>HNSW（Hierarchical Navigable Small World）機制：</p>
<ol>
<li>多層 graph、上層稀疏、下層密集</li>
<li>Query 從上層 entry point 開始、逐層找近鄰、最後在底層精細搜尋</li>
<li>Insert 漸進維護 graph、不必重建</li>
</ol>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Build（兩個關鍵參數）
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hnsw</span><span class="w"> </span><span class="p">(</span><span class="n">embedding</span><span class="w"> </span><span class="n">vector_cosine_ops</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="n">m</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">16</span><span class="p">,</span><span class="w"> </span><span class="n">ef_construction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">64</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- Query 時調 ef_search
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span><span class="k">SET</span><span class="w"> </span><span class="n">hnsw</span><span class="p">.</span><span class="n">ef_search</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">embedding</span><span class="w"> </span><span class="o">&lt;=&gt;</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span></span></span></code></pre></div><p><strong>參數含義</strong>：</p>
<table>
  <thead>
      <tr>
          <th>參數</th>
          <th>含義</th>
          <th>預設</th>
          <th>Trade-off</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>m</code></td>
          <td>每 node 最多鄰居數</td>
          <td>16</td>
          <td>大 → recall 高、memory 多</td>
      </tr>
      <tr>
          <td><code>ef_construction</code></td>
          <td>Build 時 graph 質量參數</td>
          <td>64</td>
          <td>大 → build 慢、graph 質量好</td>
      </tr>
      <tr>
          <td><code>ef_search</code></td>
          <td>Query 時搜尋範圍</td>
          <td>40</td>
          <td>大 → recall 高、latency 高</td>
      </tr>
  </tbody>
</table>
<p><strong>Build cost 真實量級</strong>（1M vector × 1536 dim）：</p>
<table>
  <thead>
      <tr>
          <th>配置</th>
          <th>Build 時間</th>
          <th>Memory</th>
          <th>Recall@10</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>m=8, ef_construction=32</td>
          <td>30 min</td>
          <td>4GB</td>
          <td>92%</td>
      </tr>
      <tr>
          <td>m=16, ef_construction=64</td>
          <td>2 hour</td>
          <td>8GB</td>
          <td>96%</td>
      </tr>
      <tr>
          <td>m=32, ef_construction=200</td>
          <td>8 hour</td>
          <td>16GB</td>
          <td>98%</td>
      </tr>
  </tbody>
</table>
<p>Production 多數選中間 <code>m=16, ef_construction=64</code>、recall / cost 平衡。</p>
<h2 id="hybrid-searchvector--filter-一起">Hybrid Search：Vector + Filter 一起</h2>
<p>Vector search 加 SQL filter 是 pgvector 比專業 vector DB 強的場景：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Vector + metadata filter
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">documents</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">category</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;tech&#39;</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">created_at</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="s1">&#39;2025-01-01&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">embedding</span><span class="w"> </span><span class="o">&lt;=&gt;</span><span class="w"> </span><span class="s1">&#39;[0.1, 0.2, ...]&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span></span></span></code></pre></div><p>但這裡有個 <em>pgvector 的踩雷</em>：filter 跟 ANN index 互動有兩種模式：</p>
<ol>
<li><strong>Pre-filter</strong>（planner 選）：先 filter 出符合條件的 row、再對 subset 跑 vector ordering → 不用 ANN index、可能慢</li>
<li><strong>Post-filter</strong>：用 ANN index 找 top-N、再 filter、可能 N 不夠補</li>
</ol>
<p>pgvector 0.8+（2024-10 release）加入 <em>iterative index scan</em>：HNSW / IVFFlat 一邊掃 graph 一邊 filter、效能比 pre-filter 好 5-10x。0.7+（2024-07）加 halfvec / binary quantization / parallel HNSW build。</p>
<p>實務：filter selectivity 高（&lt; 10%）時、考慮對 filter column 加 index 走 pre-filter；selectivity 低（&gt; 50%）走 iterative scan。</p>
<h2 id="quantization-跟-dimension-reduction">Quantization 跟 Dimension Reduction</h2>
<p>1536 dim float32 vector 一筆 6KB、1M row 6GB、加 HNSW index 後 ~20GB。Memory 緊時的省法：</p>
<h3 id="half-precisionpgvector-07">Half-precision（pgvector 0.7+）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">embedding</span><span class="w"> </span><span class="n">halfvec</span><span class="p">(</span><span class="mi">1536</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="p">);</span></span></span></code></pre></div><p><code>halfvec</code> 是 float16、storage 減半、recall 損失通常 &lt; 1%。</p>
<h3 id="binary-quantization">Binary quantization</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 把每維壓成 1 bit
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hnsw</span><span class="w"> </span><span class="p">(</span><span class="n">embedding</span><span class="w"> </span><span class="n">bit_hamming_ops</span><span class="p">);</span></span></span></code></pre></div><p>Recall 下降明顯（85-90%）、但 storage 1/32、適合「先粗篩再 rerank」hybrid pipeline。</p>
<h3 id="dimension-reduction">Dimension reduction</h3>
<p>訓練 PCA / Matryoshka model 把 1536 dim 降到 256-512 dim、recall 通常損失 &lt; 3%、storage 1/3-1/6。</p>
<h2 id="5-個-production-踩雷">5 個 Production 踩雷</h2>
<h3 id="case-1dimension-超-2000-限制">Case 1：Dimension 超 2000 限制</h3>
<p><strong>情境</strong>：要用 OpenAI text-embedding-3-large（3072 dim）、<code>CREATE TABLE ... embedding vector(3072)</code> 報錯。</p>
<p>pgvector <code>vector</code> type 上限 2000 dim（IVFFlat / HNSW index 限制）。</p>
<p>修法：</p>
<ul>
<li>改用 <code>halfvec</code>（pgvector 0.7+ 支援 4000 dim）</li>
<li>用 Matryoshka 截斷到 2000 dim 以下</li>
<li>換 embedding model（OpenAI text-embedding-3-small 1536 dim / 可截斷到 256-1024）</li>
</ul>
<h3 id="case-2hnsw-build-太慢">Case 2：HNSW build 太慢</h3>
<p><strong>情境</strong>：1M row build HNSW、跑 8 小時、blocking production。</p>
<p>修法：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 用 CONCURRENTLY 不 block
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">CONCURRENTLY</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">documents</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">hnsw</span><span class="w"> </span><span class="p">(...);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- 開 maintenance_work_mem
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="k">SET</span><span class="w"> </span><span class="n">maintenance_work_mem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;8GB&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="c1">-- 開 parallel
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="c1"></span><span class="k">SET</span><span class="w"> </span><span class="n">max_parallel_maintenance_workers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="p">;</span></span></span></code></pre></div><p>仍慢的話、考慮：</p>
<ul>
<li>切分 batch insert + index（適合 read-heavy）</li>
<li>用 IVFFlat 短期上線、之後再切 HNSW</li>
<li>改用 cloud managed pgvector（提供更大 instance）</li>
</ul>
<h3 id="case-3ivfflat-不重建-recall-漂移">Case 3：IVFFlat 不重建 recall 漂移</h3>
<p><strong>情境</strong>：IVFFlat build 時資料 100K、現在 500K、新資料 recall 從 92% 降到 75%、user 抱怨「找不到相關文件」。</p>
<p>修法：</p>
<ul>
<li>Monitor recall：定期跑 ground-truth eval（brute-force 對比）</li>
<li>設定 reindex policy：每 100K 新 row 或每月 reindex</li>
<li>換 HNSW：insert 漸進維護、不需 reindex（trade-off：build 更慢）</li>
</ul>
<h3 id="case-4hybrid-search-filter-selectivity-沒設計">Case 4：Hybrid search filter selectivity 沒設計</h3>
<p><strong>情境</strong>：query <code>WHERE user_id = ? ORDER BY embedding &lt;=&gt; ?</code>、user_id 高選擇性（1/1M）、planner 選 vector index scan、掃到 top-K 全不符 user_id、補抓無止盡。</p>
<p>修法：</p>
<ul>
<li><code>EXPLAIN</code> 看 planner 選 pre-filter 還是 vector-first</li>
<li>對 <code>user_id</code> 加 B-tree index、強 planner pre-filter（hint 不容易、用 statistics）</li>
<li>pgvector 0.8+ 用 iterative scan、自動處理</li>
<li>設計 schema：高選擇性 filter（user_id）建議走 pre-filter；低選擇性（category）走 iterative</li>
</ul>
<h3 id="case-5memory-budget-沒抓">Case 5：Memory budget 沒抓</h3>
<p><strong>情境</strong>：1M vector × 1536 dim × HNSW（m=16）= ~12GB index、shared_buffers 8GB、index 不在 cache、每 query disk IO、latency 100ms+。</p>
<p>修法：</p>
<ul>
<li>算 vector + index memory：<code>row × dim × 4 bytes × (1 + index_overhead)</code></li>
<li><code>shared_buffers</code> 至少能放 hot index portion</li>
<li>不行就降 dim（halfvec）/ 升 instance / 拆 sharded</li>
</ul>
<h2 id="跟專業-vector-db-對比">跟專業 Vector DB 對比</h2>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>pgvector</th>
          <th>Pinecone</th>
          <th>Weaviate</th>
          <th>Milvus</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Query 介面</td>
          <td>SQL</td>
          <td>REST/gRPC API</td>
          <td>GraphQL / REST</td>
          <td>gRPC</td>
      </tr>
      <tr>
          <td>Recall</td>
          <td>95-99%（HNSW）</td>
          <td>95-99%</td>
          <td>95-99%</td>
          <td>95-99%</td>
      </tr>
      <tr>
          <td>Throughput</td>
          <td>中（PG 限制）</td>
          <td>高</td>
          <td>高</td>
          <td>高</td>
      </tr>
      <tr>
          <td>Hybrid search</td>
          <td>強（完整 SQL）</td>
          <td>中（metadata filter）</td>
          <td>中</td>
          <td>中</td>
      </tr>
      <tr>
          <td>跟既有 PG 整合</td>
          <td>完美（同 DB join）</td>
          <td>需 sync</td>
          <td>需 sync</td>
          <td>需 sync</td>
      </tr>
      <tr>
          <td>Multi-tenant</td>
          <td>row-level（PG 一致）</td>
          <td>內建</td>
          <td>內建</td>
          <td>partition</td>
      </tr>
      <tr>
          <td>Open source</td>
          <td>是</td>
          <td>否</td>
          <td>是</td>
          <td>是</td>
      </tr>
      <tr>
          <td>Operational cost</td>
          <td>跟 PG 一樣（管 PG 即可）</td>
          <td>Managed-only</td>
          <td>需自管或 cloud</td>
          <td>需自管或 cloud</td>
      </tr>
      <tr>
          <td>Scale 上限</td>
          <td>10M-100M vector</td>
          <td>10B+</td>
          <td>1B+</td>
          <td>10B+</td>
      </tr>
  </tbody>
</table>
<p><strong>選 pgvector 的場景</strong>：</p>
<ul>
<li>Application 已用 PG、不想多管系統</li>
<li>Vector 量 &lt; 100M</li>
<li>需要 join vector + relational</li>
<li>Team SQL 熟、不想學 API SDK</li>
<li>Cost 敏感（managed Pinecone 1M vector 月 ~$70+）</li>
</ul>
<p><strong>選專業 vector DB 的場景</strong>：</p>
<ul>
<li>Vector 量 &gt; 5-20M（依 dim / QPS / recall 要求、pgvector 在這個級別 + 高 QPS 已開始痛、不必撐到 100M 才換）</li>
<li>純 vector workload（沒 relational integration）</li>
<li>需要 multi-tenant SaaS</li>
<li>Throughput 要求極高（&gt; 10K QPS）</li>
<li>不想自管 HNSW build / memory budget / recall drift（managed Pinecone 把這層 ops 轉嫁、cost 換 ops 時間）</li>
<li>需要 dim &gt; 2000（pgvector vector type 限制、halfvec 可到 4000、再大需 dimension reduction）</li>
</ul>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/extension-ecosystem/" data-link-title="PostgreSQL Extension Ecosystem：把 PG 變成 vector DB / time-series / sharded 的 plugin 生態" data-link-desc="PG 的 extension 機制不只是 plugin、是 *結構性產品線擴張* — pgvector 讓 PG 變 vector DB、TimescaleDB 變 time-series、Citus 變 sharded、PostGIS 變 GIS。本文走 PG extension lifecycle、6 個 production-critical extension（pg_stat_statements / pg_partman / pg_repack / pgvector / TimescaleDB / PostGIS）、5 production 踩雷（extension version 跟 PG version 對齊 / managed PG 限制 / upgrade order / shared_preload_libraries 衝突 / extension 跟 logical replication 互動）、cloud vendor 對 extension 的限制">extension-ecosystem</a>：其他 PG extension</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/jsonb-deep-dive/" data-link-title="PostgreSQL JSONB Deep Dive：Binary Storage &#43; GIN Index 為什麼是結構性優勢" data-link-desc="PG JSONB（9.4&#43;）是 *binary 儲存的 JSON*、可直接 GIN index、是 PG 在 JSON workload 的結構性優勢、跟 MongoDB / MySQL 8.0 JSON_TABLE 比仍領先。本文走 JSON vs JSONB 差異、GIN index 機制（jsonb_ops vs jsonb_path_ops）、operator &#43; path query、partial JSONB indexing、5 production 踩雷（大 JSONB 跟 TOAST / nested update / index 選錯 op class / jsonb_path_query 跟 jsonb_path_exists 行為差 / partial index 條件搞錯）、何時用 JSONB vs 拆 column">jsonb-deep-dive</a>：embedding 通常配 metadata JSONB</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/index-selection/" data-link-title="PostgreSQL Index Selection：B-tree / GIN / GiST / BRIN / Hash 對應 workload 的決策樹" data-link-desc="PG 有 6 種 index method（B-tree / Hash / GIN / GiST / SP-GiST / BRIN）跟 partial / expression / covering 三種變體、不是「都用 B-tree 就好」。每種 index 有自己的 query pattern、儲存代價、write amplification 跟 maintenance 成本。本文走 6 種 index 的適用 workload 對照、決策樹、partial / expression / covering / multi-column 變體、5 production 踩雷（過度 index / partial 條件不對 / B-tree 對 JSON 無效 / BRIN 對非 correlated 資料無效 / multi-column 順序錯）、跟 query-optimization 的 EXPLAIN 互補">index-selection</a>：B-tree / GIN / HNSW 整體比較</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/query-optimization/" data-link-title="PostgreSQL Query Optimization：EXPLAIN ANALYZE / pg_hint_plan / auto_explain 三層工具跟 4 個 case" data-link-desc="PG query 慢的根因常是 *planner 選錯 plan 或 statistics 過時*。本文從 4 個 production case 開場（seq scan vs index / hash vs nested loop / 多 column 統計缺 / parallel query 沒觸發）、走 EXPLAIN / EXPLAIN ANALYZE / auto_explain 三層工具、pg_hint_plan extension 跟 planner GUC 取捨、5 production 踩雷（ANALYZE 過時 / multi-column statistics / cost-base setting 不對齊硬體 / random_page_cost SSD 沒調 / parallel query 配置）、跟 MySQL query-optimization sibling 對比">query-optimization</a>：vector query 的 EXPLAIN</li>
</ul>
<h2 id="下一步">下一步</h2>
<ul>
<li>看 <a href="/blog/backend/01-database/vendors/postgresql/extension-ecosystem/" data-link-title="PostgreSQL Extension Ecosystem：把 PG 變成 vector DB / time-series / sharded 的 plugin 生態" data-link-desc="PG 的 extension 機制不只是 plugin、是 *結構性產品線擴張* — pgvector 讓 PG 變 vector DB、TimescaleDB 變 time-series、Citus 變 sharded、PostGIS 變 GIS。本文走 PG extension lifecycle、6 個 production-critical extension（pg_stat_statements / pg_partman / pg_repack / pgvector / TimescaleDB / PostGIS）、5 production 踩雷（extension version 跟 PG version 對齊 / managed PG 限制 / upgrade order / shared_preload_libraries 衝突 / extension 跟 logical replication 互動）、cloud vendor 對 extension 的限制">extension-ecosystem</a> 探索其他 PG 擴展可能</li>
<li>回 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL overview</a> 看全圖</li>
</ul>
]]></content:encoded></item></channel></rss>