Extension on Tarragon

1.6 延伸方向：Web UI、coding agent、產圖

Mon, 11 May 2026 00:00:00 +0000

模組一前五章覆蓋了「Ollama + Continue.dev」這條最短路徑。日常路徑跑穩後，你可能會想往以下方向延伸：加裝 ChatGPT 風格的 Web UI、跑 coding agent、嘗試產圖。本章把這些延伸方向逐一列出、給優先順序、講清楚哪些是「換工具」、哪些是「換領域」。

關鍵原則：先把寫 code 跑穩、再考慮延伸。同時推進三條延伸通常會讓每條都停在半生不熟階段、累積成果有限。本章建議的順序是先 Web UI、再 coding agent、最後產圖；如果你只想嘗試一個、依自己最常用的場景挑。

本章目標

讀完本章後，你應該能：

列出三條延伸方向的代表工具與基本定位。
知道每個方向跟寫 code 主路徑的關係。
判斷自己現階段該不該往延伸方向走。
對「產圖」這條歧路建立正確認知（不是換 model 就好）。

延伸方向一：ChatGPT 風格 Web UI（Open WebUI）

定位：在瀏覽器跑一個類 ChatGPT 介面，連到本地 LLM 或雲端 LLM。屬於三層架構的介面層，跟 Continue.dev 同層、解決不同情境（瀏覽器 vs IDE）。

典型使用情境：

不在寫 code 但想跟 LLM 對話（解釋技術概念、寫文章草稿）。
跟同事 / 家人分享 LLM 使用，他們不會用 VS Code。
從手機 / iPad 連回家裡 Mac 跑的 Ollama。
多輪深度對話、希望有歷史紀錄保存。

主流選擇：Open WebUI

Open WebUI 是 open source 的 ChatGPT-clone，連 Ollama 與 OpenAI 相容 API。安裝最快路徑是 Docker：

docker run -d --name open-webui -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main

host.docker.internal 是 Docker Desktop 提供的 DNS 名稱、container 內透過它連到宿主機（macOS 本身）跑的 Ollama；Linux Docker 沒這個別名、要改用 --add-host=host.docker.internal:host-gateway 或直接填宿主 IP。啟動後開 http://localhost:3000、註冊本地帳號（資料只存本機 SQLite）、就有完整 ChatGPT 介面：

對話歷史保存（本地 SQLite）
多 model 切換、可同時對比兩個 model 回答
系統 prompt 自訂、prompt template 管理
上傳檔案分析（PDF、txt 等）
圖片支援（如果本地 model 是多模態）

陷阱：

沒裝 Docker 的話要先學 Docker，是不小的前置學習。
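Open WebUI asks you to create an account on first launch (the first account becomes the admin), but `-p 3000:8080` publishes the port on every host interface, so the login page is reachable from the whole LAN. Use `-p 127.0.0.1:3000:8080` if you only need local access. For access from outside your network, put it behind a reverse proxy with TLS and authentication.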
對話紀錄存在 Docker volume，刪 container 要小心保留 volume，否則歷史會消失。

何時做這個延伸：日常 Continue.dev + Ollama 跑穩、用了至少一週、確認本地 LLM 對你有用，再加 Open WebUI 擴展使用情境。

延伸方向二：Coding Agent（aider、Cline 等）

定位：比 Continue.dev 更主動的 LLM 寫 code 工具。Continue.dev 是「你提問、LLM 答」的對話模式；coding agent 是「你給目標、LLM 自己分多步驟改 code、跑測試、修錯誤」的代理模式。詳細的 agent loop 結構、失敗模式、人類審查協作見 4.4 Agent 架構原理。

主流選擇：

工具	介面	定位
aider	CLI	git-aware、把 LLM 改的 diff 直接 commit、支援 multi-file edit
Cline	VS Code 擴充	在 VS Code 內跑 agent、可執行 shell command
Cursor Agent	Cursor 內建	Cursor 訂閱戶可用、雲端綁定

選擇三個工具的延伸判讀：

aider：當主要工作流是「在 terminal + git 內完成」、想讓 LLM 把 diff 直接 commit 進 history、aider 的 CLI-first + git-aware 設計最對位。失敗模式：跨多檔修改超過 5 個檔時、aider 的 prompt 規劃容易斷裂；改回 Continue.dev 手動逐檔修可能更穩。
Cline：當你已在 VS Code 內工作、想要 agent 能跑 shell command（執行測試、跑 build 看錯誤）並 loop 修錯時、Cline 比 aider 更貼近「IDE 內 agent」。失敗模式：本地模型在「規劃 → 執行 shell → 解讀錯誤 → 改 code」這個 loop 上接受度不穩、常需要人工接管。
Cursor Agent：當你已是 Cursor 訂閱戶、agent 預設綁雲端旗艦（成功率最高、但 prompt / code 會送到 Cursor 雲端）。NDA / 合規場景不適用、本地 LLM 接入也是次要 surface。

為什麼是 advanced：coding agent 需要本地模型能「跟著規劃跑多步驟、用 tools、不偏離目標」。這部分是本地 LLM 的弱項（見 1.5 期望管理）；現階段本地模型跑 coding agent 的成功率明顯低於雲端旗艦。

用 aider 跑本地 LLM 的最小範例：

# Install aider
pip install aider-chat

# Point aider at the local Ollama server (aider reads OLLAMA_API_BASE),
# then start it inside a git repo
export OLLAMA_API_BASE=http://localhost:11434
aider --model ollama/gemma4:31b-coding-mtp-bf16

aider 會把當前 repo 的相關檔案打進 prompt、把 LLM 生成的 diff apply 到本機、自動 commit。簡單任務（單檔重構、加 test）成功率還行；複雜任務（跨檔案、需要規劃）失敗率高。

陷阱：

本地 LLM 跑 aider 比跑 Continue.dev 慢得多、因為每輪 agent loop 都要重新處理長 context。
coding agent 對 long context 敏感、本地 TTFT 痛點被放大。Agent loop 每輪都會 mutate prompt（前一輪結果加入下一輪的 context）、KV cache 命中率低、每輪都要重新做完整 prefill。
失敗時 agent 可能 commit 不可用的 code、要記得 git diff 審過再 push。

何時做這個延伸：本地模型在 Continue.dev 對話模式下表現穩定、且你想看看「multi-step 自動化」能幫到什麼程度。對多數讀者、這條延伸在 2026 年 5 月時是「值得試一週、但不一定留下」。

何時該停：以下訊號出現時、agent 路線在你的工作流暫時不成立、回到 Continue.dev 對話模式：

連續 5 個 multi-step 任務都需要人工接管 / 中途介入修錯
TTFT 持續 > 30 秒、agent loop 的「等待 → 接管」節奏比手寫快不了多少
agent commit 進 git history 的 diff 通過率 < 50%、審查與 revert 的成本超過自己寫
簡單任務（單檔重構、加 test）本地 agent 也常失敗、表示模型 capacity 對 agent 規劃不足

延伸方向三：產圖（Stable Diffusion、Flux 等）

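Image generation is a separate field with its own toolchain and concepts; it shares neither a server layer nor a model layer with LLM coding. Image models are diffusion models that generate by iterative denoising, whereas coding LLMs generate tokens autoregressively. Newer image models such as Flux and Stable Diffusion 3.x use Transformer backbones inside the diffusion process, so the real split is the generation method and the tooling, not "Transformer versus non-Transformer".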

四個維度上產圖跟寫 code 的工作流互不相通：

工具鏈各自獨立：Ollama 服務 Transformer LLM、Draw Things / ComfyUI 服務 Diffusion 模型、兩條路線的伺服器與生態互不通用。
prompt 風格不同：寫 code 是 instruction 形式、產圖是 descriptive prompt + negative prompt + sampler 參數。
學習成本各自獨立：產圖有自己的 LoRA、ControlNet、IP-Adapter、refiner 等概念體系、學起來等於進入新領域。
硬體最適規格不同：寫 code 看記憶體預算（跑大模型）、產圖看 GPU 算力與 VRAM 頻寬。

本章只給入口資訊、不展開教學。

主流工具：

工具	定位	適合誰
Draw Things	Mac 原生 app，GUI 友善，免費	macOS 使用者入門首選
ComfyUI	節點式工作流，跨平台，需要 Python 環境	想客製化流程、進階使用者
AUTOMATIC1111	Web UI，跨平台，需要 Python	Linux / NVIDIA 玩家為主
Diffusers	Hugging Face 的 Python library	開發者、要嵌入產品

主流模型：

模型	風格特色
Stable Diffusion 3.5	通用、社群成熟、生態最大
Flux	質感高、prompt 跟隨度高
SDXL	SD 1.5 的進階版，仍有大量 LoRA

Apple Silicon Mac 跑產圖的現實：

24GB+ Mac 可以順暢跑 SDXL / Flux。記憶體需求其實比 LLM 低（一張圖 ~ 8GB），但對 GPU 算力敏感。
M4 Max 跑 Flux 生 1024x1024 圖約 15 ~ 30 秒一張，可接受。
Draw Things 在 Mac App Store 可下載，是最簡單的入門路徑。

本指南的立場：先把寫 code 跑穩、再考慮產圖。產圖屬於獨立的學習主題、另外找專門教材會學得更有效率。

給讀者的延伸順序

如果你想嘗試延伸方向，建議的順序：

先用一個月本地 LLM 寫 code。確認 Ollama + Continue.dev 對你有用、習慣了切換。
第一個延伸：Open WebUI。加裝最低成本（只多裝 Docker），擴展使用情境到非 VS Code 場景。
第二個延伸：aider 或 Cline。試 coding agent，評估本地模型能 handle 多複雜的多步驟任務。
第三個延伸：產圖。完全獨立的學習投入，跟前面工具鏈無關。

依序進階。先讓基底穩、再疊加延伸、學習曲線最平滑。

不在本章範圍內的延伸

下列延伸方向值得知道存在，但不在本指南內展開：

方向	為什麼不展開
RAG（檢索增強生成）	需要 vector database、文件 chunking、embedding 設計、見 4.1 RAG 原理
Fine-tuning	訓練流程跟跑現成模型是不同工程；資源、資料、評估都複雜
Multi-modal（語音、影片）	工具鏈跟生態完全獨立
MCP（Model Context Protocol）伺服器整合	是工具串接協定、見 4.6 應用層協議
部署到雲端 GPU / Linux server	本指南範圍只在 Apple Silicon Mac

需要這些方向時請另尋專門資源；硬塞進來會稀釋本指南「Mac 本地寫 code」這條最短路徑。

下一步

實作範例（含 ComfyUI / Whisper / Piper TTS / RAG / MCP）見 Hands-on 章節。

讀到這裡、本指南的核心內容就完了。下一步是回到模組零或模組一任一章節做深度閱讀、或實際打開終端機跑第一個 ollama run、把概念變成肌肉記憶。

PostgreSQL Extension Ecosystem：把 PG 變成 vector DB / time-series / sharded 的 plugin 生態

Tue, 19 May 2026 00:00:00 +0000

本文是 PostgreSQL overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 extension ecosystem — PG 結構性產品線擴張的機制。

Extension 不只是 plugin、是產品線擴張

PG extension 機制讓 第三方加新 type / function / operator / index access method / planner hook、深度整合到 PG core。對比其他 DB 的 plugin model（MySQL plugin / MongoDB plugin）、PG extension 是 更深的 SPI。

結果：

pgvector → PG 變 vector similarity search DB（取代 Pinecone / Weaviate）
TimescaleDB → PG 變 time-series DB（取代 InfluxDB）
Citus → PG 變 sharded cluster
PostGIS → PG 變 GIS DB
pg_cron → PG 變 scheduled job runner
pgvectorscale → 大規模 vector index

對 vendor lock-in 敏感 / 想統一 stack 的 org、PG extension 提供 用 PG 取代多個 specialized DB 的可能。

但 統一 stack 的代價：PG 主庫 ops 風險集中（一個 PG 掛 = vector / time-series / GIS / cron 全掛）、extension 跟 PG version 對齊矩陣多一道升級顧慮、規模上限通常比專業 DB 低（pgvector 100M+ vs Pinecone 10B+ / TimescaleDB 100K rows/s vs InfluxDB 500K+）。決策框架：中小規模 + 已用 PG + 不想多管系統 → extension；大規模 + 純該 workload + 有專業 team → specialized DB。

Extension Lifecycle

-- List the extensions available on this server
SELECT * FROM pg_available_extensions;

-- Install at the OS level (the matching package must be present).
-- pg_stat_statements ships with the contrib modules, for example:
-- apt install postgresql-contrib

-- Enable in the current database
CREATE EXTENSION pg_stat_statements;

-- Verify
SELECT * FROM pg_extension;

-- Upgrade the extension
ALTER EXTENSION pg_stat_statements UPDATE;

-- Remove
DROP EXTENSION pg_stat_statements;

每個 extension 有：

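Version: each extension has its own version number (such as pg_stat_statements 1.10), and every build targets a specific PG major version.
Schema: installed into public or a dedicated schema.
Dependencies: some extensions require others (for example, earthdistance requires cube, and postgis_tiger_geocoder requires postgis and fuzzystrmatch).
Trusted vs untrusted: a trusted extension can be installed by a non-superuser who has CREATE privilege on the database (PG 13+).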

6 個 Production-Critical Extension

1. pg_stat_statements — Query stats（必裝）

任何 production PG cluster 都該裝：

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 5000
pg_stat_statements.track = all

CREATE EXTENSION pg_stat_statements;

-- Top 10 queries by total execution time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 10;

對應 MySQL events_statements_summary_by_digest。詳見 Query Optimization。

2. pg_partman — 自動 partition lifecycle

PG declarative partitioning 需要 手動建 / drop partition。pg_partman 自動化：

CREATE EXTENSION pg_partman SCHEMA partman;

-- Partition the events table by month automatically
SELECT partman.create_parent(
    p_parent_table => 'public.events',
    p_control => 'created_at',
    p_type => 'range',
    p_interval => '1 month',
    p_premake => 6  -- pre-create 6 future partitions
);

-- Run maintenance (creates upcoming partitions; drops old ones only when retention is configured)
SELECT partman.run_maintenance(p_analyze => false);
-- Schedule this call with pg_partman's background worker or with pg_cron

對 time-series partition workload 必裝。詳見 Declarative Partitioning。

3. pg_repack — Online table rewrite

詳見 Online Schema Change。

4. pgvector — Vector similarity search

LLM embedding / semantic search 場景必裝：

CREATE EXTENSION vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(1536)  -- OpenAI text-embedding-3-small, 1536 dimensions
);

-- HNSW index (pgvector 0.5+)
CREATE INDEX ON documents USING HNSW (embedding vector_cosine_ops);

-- Find the 5 most similar rows
SELECT * FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;

對 中小規模 RAG / semantic search workload、pgvector 在 PG 內跑、不必跨 Pinecone / Weaviate / Qdrant 等獨立服務。

For very large vector workloads (> 100M vectors), consider pgvectorscale (a companion extension that adds a StreamingDiskANN index on top of pgvector) or a dedicated vector DB.

5. TimescaleDB — Time-series 擴展

把 PG 變 time-series DB：

CREATE EXTENSION timescaledb;

CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    device_id INT,
    value DOUBLE PRECISION
);

-- Convert to a hypertable (auto-partitioned by time)
SELECT create_hypertable('metrics', 'time');

-- Continuous aggregate (a materialized view that refreshes incrementally)
CREATE MATERIALIZED VIEW metrics_5min
WITH (timescaledb.continuous) AS
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id, avg(value)
FROM metrics
GROUP BY bucket, device_id;

對 IoT / monitoring / financial tick data 場景、TimescaleDB 比純 PG 寫吞吐高 10x+。

6. PostGIS — GIS extension

地理 / 空間 query 業界標準：

CREATE EXTENSION postgis;

CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    name TEXT,
    location GEOGRAPHY(POINT, 4326)
);

CREATE INDEX ON stores USING GIST (location);

-- Find stores within 1 km
SELECT * FROM stores
WHERE ST_DWithin(location, ST_MakePoint(121.5, 25.05)::geography, 1000);

PostGIS 是 GIS workload 業界標準、其他 DB GIS 能力都對標 PostGIS。

其他常用 extension

除 6 個 production-critical 之外、以下是 特定場景常用 的 extension — 分四類：排程跟 utility（pg_cron / pg_trgm / uuid-ossp）、type 擴展（hstore / citext / pgcrypto）、跨 DB 整合（postgres_fdw / mysql_fdw）、observability / debug 工具（pg_buffercache / pg_visibility / auto_explain）：

Extension	用途
`pg_cron`	排程 SQL job（不必外部 cron）
`pg_trgm`	Fuzzy string match / similarity
`uuid-ossp`	UUID 產生
`hstore`	Key-value pair type
`citext`	Case-insensitive text type
`pgcrypto`	加密 / hash function
`postgres_fdw`	PG → PG foreign table
`mysql_fdw`	PG → MySQL foreign table
`pg_buffercache`	Buffer pool 內容檢視
`pg_visibility`	Visibility map 檢視（debug bloat）
`auto_explain`	Slow query 自動 log plan
`wal2json`	Logical decoding output 為 JSON
`Citus`	Distributed PG
`pgvector`	Vector similarity
`pglogical`	Logical replication（功能比 native 強）
`pg_squeeze`	pg_repack 替代

實務組合：observability 三件套（pg_stat_statements + auto_explain + pg_buffercache）幾乎是 production 標配；FDW 是「跨 DB query」的 escape hatch、但 cross-DB query 效能差、適合 reporting 不適合 OLTP。

5 個 Production 踩雷

1. Extension version 跟 PG version 對齊

PG cluster 升 14 → 15 後、extension（pg_stat_statements / pg_partman / pgvector 等）必須有對應 15 版本。早期升級 / niche extension 可能還沒釋出。

修法：

升 PG cluster 前 先確認所有 extension 都有對應 PG version 釋出版本
升完 PG cluster 立即跑 ALTER EXTENSION xxx UPDATE
Upgrade runbook 紀錄每個 extension 的版本兼容狀態

2. Managed PG 限制 extension 列表

AWS RDS / Aurora PG / Cloud SQL / Azure DB for PostgreSQL 各自有 支援 extension 白名單：

不在白名單的 extension 不能 install
部分 extension 限定特定 PG version
Untrusted extension 通常不允許

常見 managed 不支援 的 extension：

pg_repack（Aurora 有限支援、RDS 部分 version 支援）
pglogical（部分 cloud 不支援）
pg_cron（cloud 通常用 managed scheduler 取代）
Custom extension（自寫 .so）

修法：

評估 managed PG 之前、先查 vendor 支援 extension 列表
Self-hosted vs managed 的 跨雲 portability 議題：extension 是 lock-in source
如果 application 強依賴某 extension（如 PostGIS），確認 cloud 支援

3. Extension upgrade order

pg_upgrade 升 PG major version 後、extension 也要升。順序：

pg_upgrade PG binary + cluster
對每個 DB 跑 ALTER EXTENSION xxx UPDATE
部分 extension（如 PostGIS）需要 特殊升級程序（SELECT postgis_extensions_upgrade()）

修法：

升 PG 後 先測 staging cluster 確認 extension upgrade 流程
PostGIS / TimescaleDB / Citus 有自己 upgrade 程序、必須遵循 vendor doc
升完跑 \dx 看每個 extension 版本

4. `shared_preload_libraries` 衝突

部分 extension（pg_stat_statements / auto_explain / TimescaleDB / Citus / pg_cron）必須在 shared_preload_libraries 加進去、需要 重啟 PG。

衝突情境：

pg_partman + TimescaleDB 都用 background worker、worker 上限不夠
max_worker_processes 預設 8、不夠時某些 extension 起不起來

修法：

列出所有 shared_preload extension、確認 order（部分有 dependency）
提高 max_worker_processes = 16 / max_parallel_workers = 8 等
重啟 PG 才生效、計入 maintenance window

5. Extension 跟 logical replication 互動

Logical replication（pglogical / native）不自動 replicate extension state（function / type definition）。Subscriber 沒裝對應 extension、replicate event 失敗。

修法：

Subscriber 必須 先安裝 publisher 用的 extension
Extension 版本 publisher / subscriber 對齊
對 extension-heavy schema、考慮用 streaming replication（physical）而非 logical

Cloud Vendor 對 Extension 的支援

Vendor	常見 extension 支援	限制
AWS RDS PostgreSQL	pg_stat_statements / pg_partman / pgvector / pg_repack	部分 version 限制 / 不能 install custom
AWS Aurora PostgreSQL	同 RDS、加 Aurora-specific	pg_repack 限版本
GCP Cloud SQL	標準 extension 廣支援	pg_cron / pgvector OK
Azure DB for PostgreSQL	廣泛支援 + Azure 整合	Citus（managed 即 Cosmos DB for PG）
Self-hosted	全部	自己維護

對 extension-heavy application、self-hosted PG 仍是必要選擇。Managed PG 適合 標準 extension workload。

何時用 PG extension 取代專業 DB

場景	用 extension 還是專業 DB
< 100M vector + RAG / semantic search	pgvector（單一 stack 省 ops）
大規模 vector search > 10M with high QPS	專業 vector DB（Pinecone / Qdrant）
Time-series < 100 TB	TimescaleDB
Time-series > 100 TB + high cardinality	專業 TS DB（InfluxDB / VictoriaMetrics）
GIS	PostGIS（業界標準）
Sharded < 10 TB + multi-tenant	Citus
Sharded > 100 TB	distributed SQL（CockroachDB / TiDB）
Scheduled job	pg_cron（簡單）/ Airflow（複雜）

對中小規模、PG + extension 是 簡化 stack 的有效路徑。規模超過時、專業 DB 仍是首選。

跟其他模組整合

Citus Distributed：extension 一例、可看 extension model
Query Optimization：pg_stat_statements + auto_explain 必用
Online Schema Change：pg_repack 是 extension
Declarative Partitioning：pg_partman 是 extension
SQL Features Baseline：extension 是 PG 結構性領先之一

TimescaleDB Deep Dive：Hypertable / Continuous Aggregate / Compression 把 PG 變 Time-Series DB

Tue, 19 May 2026 00:00:00 +0000

本文是 PostgreSQL overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 TimescaleDB extension — 用 PG 解 time-series workload 的路徑、跟 extension-ecosystem 是 單一 extension 細節 vs ecosystem 全景 的關係。

TimescaleDB 是 PG 的 Time-Series Specialization

TimescaleDB 不是獨立 DB、是 PG extension：

CREATE EXTENSION timescaledb;

加完後、PG 多三個 time-series 專屬機制：

Hypertable：對 time column 自動 partition、應用層看是一張表
Continuous aggregate：incremental refresh 的 materialized view
Compression：對舊 chunk 壓縮（columnar-like format）

跟專業 time-series DB（InfluxDB / Prometheus / VictoriaMetrics）對比、TimescaleDB 的賣點不是「最快」而是「PG ecosystem 一致」：

維度	TimescaleDB	InfluxDB	Prometheus
Query 語言	標準 SQL	InfluxQL / Flux	PromQL
寫入效能	中（10-100K rows/s）	高（500K+ rows/s）	中（pull-based scrape）
壓縮	90%+（columnar compression）	高	高
Join	完整 SQL join	弱	不支援
跟既有 PG schema	同一個 DB、可 join	獨立	獨立
生態	完整 PG ecosystem	自家 ecosystem	自家 ecosystem
Open source	Apache 2.0（部分功能 TSL license）	MIT	Apache 2.0

何時選 TimescaleDB：

Application 已用 PG、不想多管一套 time-series DB
需要 join time-series 跟 application 表（user / device metadata）
不需 InfluxDB 級寫入速度（< 100K rows/s）
Team SQL 熟、PromQL / Flux 學習成本不想付

何時選 InfluxDB / Prometheus（不選 TimescaleDB）：

High-cardinality metric（10M+ unique series）— TSDB-purpose-built engine 在 cardinality 跟 retention 上比 hypertable 高效
Pull-based scrape model（Prometheus）跟 alerting / Grafana 生態深整合
PromQL operator（rate() / histogram_quantile()）對 metric query 比 SQL 直覺
TSL license 不能接受（TimescaleDB 部分功能在 Timescale License、不是純 Apache 2.0）
Operational team 已熟 InfluxDB / Prometheus、不想多學 PG 維運

Hypertable：自動 Time-based Partitioning

普通 PG 表變 hypertable：

CREATE TABLE sensor_data (
    time        TIMESTAMPTZ NOT NULL,
    sensor_id   INTEGER NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION
);

-- Convert to a hypertable, auto-partitioned by time
SELECT create_hypertable('sensor_data', 'time');

Hypertable 機制：

後台自動拆 chunk（child partition）by time interval（預設 7 天）
Application 看到的是 sensor_data 一張表、實際資料分散在 _timescaledb_internal._hyper_*_chunk 表
Query 自動 chunk pruning（只掃命中時間範圍的 chunk）

Chunk interval 選擇很關鍵：

Chunk interval	適用	問題
1 小時	高頻 metrics（每秒 100+ row）	Chunk 太多、catalog 膨脹
1 天	中高頻（每秒 10-100 row）	OK
7 天（預設）	中頻（每分鐘 row）	OK
30 天	低頻（每小時 row）	OK

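Rule of thumb: size chunk_interval so that the chunks currently being written (data plus indexes, across all active hypertables) fit in roughly 25% of RAM; beyond that, inserts degrade into disk I/O. In production, track chunk size against available memory and adjust chunk_interval as ingest volume changes.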

Multi-dimensional hypertable（time + space partition）：

-- Partition on two dimensions: time plus sensor_id
SELECT create_hypertable('sensor_data', 'time',
    partitioning_column => 'sensor_id',
    number_partitions => 16
);

適用 sensor 數 1000+ 的 IoT workload、單 chunk 太大時用 space partition 拆。

Continuous Aggregate（CAGG）：Incremental Materialized View

普通 PG materialized view 是 全量重算、TimescaleDB CAGG 是 incremental refresh：

-- Aggregate at 1-hour granularity
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    avg(temperature) AS avg_temp,
    max(temperature) AS max_temp,
    min(temperature) AS min_temp,
    count(*) AS sample_count
FROM sensor_data
GROUP BY hour, sensor_id;

-- Add a refresh policy (every 30 minutes, refresh the past 1 day)
SELECT add_continuous_aggregate_policy('sensor_hourly',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '30 minutes',
    schedule_interval => INTERVAL '30 minutes'
);

CAGG 機制：

記錄哪些 time bucket 已 materialize、哪些 stale
Refresh 時只重算 stale bucket、不全量
Query CAGG 自動 fallback 到原 hypertable 補最新資料（real-time aggregation）
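
The bookkeeping described above can be sketched in plain Python. This is a toy model, not TimescaleDB's implementation: the class name `ContinuousAggregateSketch`, its fields, and the `recomputed` counter are all hypothetical and exist only to show that a refresh touches stale buckets only, and that a real-time query fills in buckets that have not been materialized yet.

```python
from collections import defaultdict


class ContinuousAggregateSketch:
    """Toy model of incremental refresh; every name here is illustrative."""

    def __init__(self, bucket_width):
        self.bucket_width = bucket_width
        self.raw = defaultdict(list)   # bucket start -> raw values
        self.materialized = {}         # bucket start -> stored average
        self.stale = set()             # buckets invalidated since the last refresh
        self.recomputed = 0            # how many buckets refresh() has recomputed

    def insert(self, ts, value):
        # Every write marks its bucket as stale.
        bucket = ts - ts % self.bucket_width
        self.raw[bucket].append(value)
        self.stale.add(bucket)

    def refresh(self):
        # Recompute only the stale buckets, never the whole view.
        for bucket in sorted(self.stale):
            values = self.raw[bucket]
            self.materialized[bucket] = sum(values) / len(values)
            self.recomputed += 1
        self.stale.clear()

    def query(self, real_time=True):
        # Real-time mode overlays not-yet-materialized buckets from raw data.
        result = dict(self.materialized)
        if real_time:
            for bucket in self.stale:
                values = self.raw[bucket]
                result[bucket] = sum(values) / len(values)
        return result
```

After a refresh, inserting one new row and refreshing again recomputes a single bucket, which is why refresh cost tracks the volume of changed data and not the table size.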

CAGG vs 普通 MV 對比：

維度	TimescaleDB CAGG	普通 PG MV
Refresh 模式	Incremental	全量重算
Refresh 時間	秒級	表大時數十分鐘
Real-time fallback	自動補最新	不支援、需手動 union
Storage	多一份 aggregated	多一份 aggregated
Policy	內建排程	需 pg_cron / 外部排程

CAGG hierarchy（多層聚合）：

-- Roll the 1-hour CAGG up to 1 day
CREATE MATERIALIZED VIEW sensor_daily
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', hour) AS day,
    sensor_id,
    avg(avg_temp) AS daily_avg
FROM sensor_hourly
GROUP BY day, sensor_id;

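The application picks the view that matches the time range it needs (the raw hypertable, sensor_hourly, or sensor_daily), so long-range queries never scan raw data. Note that avg(avg_temp) weights every hour equally; when sample counts differ between hours, store the sum and the count in the hourly view and divide at the daily level.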

Compression：把舊 Chunk 壓 90%+

舊 chunk 可以開啟 compression：

-- Enable compression (segmentby must be configured first)
ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time DESC'
);

-- Automatic compression policy: compress chunks older than 7 days
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');

Compression 機制：

把 chunk 內 row 按 segmentby 分組
每組內按 orderby 排序後、把每 column 變成 columnar array
對 array 用 type-specific 壓縮（Gorilla for float / delta-of-delta for timestamp / dictionary for string）
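
To see why regularly sampled timestamps compress so well, here is a minimal delta-of-delta sketch in Python. It only illustrates the idea; the function names are hypothetical, and the real encoder additionally bit-packs the resulting small integers, which this sketch leaves out.

```python
def delta_of_delta_encode(timestamps):
    """Store the first value, the first delta, then the change between consecutive deltas."""
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps
```

A sensor that reports every 10 seconds turns into a long run of zeros after the first two entries, and runs of zeros are what the later bit-packing stage shrinks to almost nothing.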

實際壓縮率：

Workload	壓縮率
IoT sensor（重複值多）	95-98%
Application metrics	90-95%
Trade tick（隨機浮點）	70-85%
Log line（高 cardinality string）	50-70%

Compression 限制（重要）：

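Older TimescaleDB releases reject UPDATE / DELETE on rows in a compressed chunk, so the chunk has to be decompressed first.
TimescaleDB 2.11+ accepts UPDATE / DELETE on compressed chunks, but each statement decompresses the affected data behind the scenes and is much slower than on an uncompressed chunk.
Inserts into compressed chunks work, but the new rows stay uncompressed until the chunk is recompressed.
DDL is restricted: which ALTER TABLE forms (adding a column, changing an index) work without decompressing depends on the version, so check the release notes first.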

實務：compression 是 write-once cold data 的工具、active OLTP chunk 不開。

Retention Policy：自動刪舊資料

-- Automatically drop chunks older than 1 year
SELECT add_retention_policy('sensor_data', INTERVAL '1 year');

Retention drop 整個 chunk（不是 DELETE row）、O(1) 操作、不產生 bloat。

CAGG 有獨立 retention：

-- Keep raw data for 30 days, keep the aggregate for 5 years
SELECT add_retention_policy('sensor_data', INTERVAL '30 days');
SELECT add_retention_policy('sensor_hourly', INTERVAL '5 years');

這是 TimescaleDB 跟普通 PG partitioning 最大的價值差 — 普通 PG 要自己寫 cron drop partition、TimescaleDB policy 內建。

5 個 Production 踩雷

Case 1：Chunk size 不對、catalog 膨脹

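Scenario: a sensor writes 10 rows per second with chunk_interval set to 1 hour, so one year produces 8,760 chunks. Each chunk adds its own table, indexes, and constraints to the catalog, so pg_class and related catalogs grow by tens of thousands of rows (more with space partitioning) and planning slows down.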

修法：

Chunk 數量上限 ~10000、超過 catalog overhead 出現
重設 chunk_interval：SELECT set_chunk_time_interval('sensor_data', INTERVAL '1 day');
已存在 chunk 不會自動 merge、要靠 retention drop 自然消化
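
The chunk-count arithmetic in this case is easy to check before picking an interval. A minimal sketch, assuming one chunk per interval per space partition; `chunks_created` is a hypothetical helper, not a TimescaleDB function.

```python
from datetime import timedelta


def chunks_created(retention, chunk_interval, space_partitions=1):
    """Number of chunks that exist once `retention` worth of data has been written."""
    # Ceiling division on timedeltas: a partially filled interval still needs a chunk.
    time_slices = -(-retention // chunk_interval)
    return time_slices * space_partitions
```

For one year of data this gives 8,760 chunks at a 1-hour interval and 365 at a 1-day interval; multiplying by 16 space partitions shows how quickly a short interval approaches the ~10,000-chunk guideline above.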

Case 2：CAGG refresh 落後 real-time

情境：CAGG refresh policy 每 1 小時跑、application 期待「即時 dashboard」、看到的數字落後 1 小時。

修法：

縮短 schedule_interval（5 分鐘）
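Use real-time aggregation, where the CAGG unions materialized buckets with not-yet-materialized raw data at query time. The default depends on the version: it was on by default in older releases and is off by default (materialized_only = true) since TimescaleDB 2.13.
Set materialized_only = false explicitly so that real-time aggregation is enabled regardless of version.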

ALTER MATERIALIZED VIEW sensor_hourly SET (timescaledb.materialized_only = false);

Case 3：Compression 後想 UPDATE

情境：發現某個歷史 row 數值錯、想 UPDATE、報錯 cannot update/delete from compressed chunk。

修法：

-- Locate the chunk and decompress it
SELECT decompress_chunk(c) FROM show_chunks('sensor_data',
    older_than => INTERVAL '7 days') c WHERE c::text LIKE '%_5_chunk';

-- Run the UPDATE, then compress the chunk again
UPDATE sensor_data SET temperature = 22.5 WHERE ...;
SELECT compress_chunk(...);

或設計階段就避免 — compression 用在 immutable data、有可能改的留未壓。

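Case 4: Foreign keys that reference a hypertable

Scenario: a regular table (say, a hypothetical alerts table) needs a FK that references sensor_data, and TimescaleDB rejects it because foreign keys pointing to a hypertable are not supported in that version.

Fix:

The other direction works: a FK from the hypertable to a regular table is supported, so sensor_data.sensor_id can reference the sensors table.
For regular table → hypertable references on older versions, maintain referential integrity in the application layer.
TimescaleDB 2.16+ adds support for FKs from regular tables to hypertables; FKs between two hypertables remain unsupported.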

Case 5：TimescaleDB 跟 PG 主版本對齊

情境：PG 升級 14 → 16、TimescaleDB extension 沒對應升級、PG 啟動 fail。

TimescaleDB 跟 PG 版本對齊矩陣：

TimescaleDB	支援 PG version	備註
2.11+	13, 14, 15
2.13+	13, 14, 15, 16	加 PG 16 支援
2.15.x	13, 14, 15, 16	最後支援 PG 13 的 minor
2.16+	14, 15, 16	PG 13 drop
2.17+	14, 15, 16, 17	PG 17 加入（需 17.2+ binary 對齊）
2.18+	14, 15, 16, 17	PG 17 完整支援
2.23+	14, 15, 16, 17, 18	PG 18 加入

修法：

升 PG 前先升 TimescaleDB 到支援目標 PG 版本的 extension
Production 升級順序：TimescaleDB minor upgrade → PG major upgrade → TimescaleDB final upgrade
Cloud managed（Timescale Cloud）自動處理

跟 PG 原生 Partitioning 對比

PG 10+ 有 declarative partitioning、不一定要 TimescaleDB：

維度	TimescaleDB hypertable	PG declarative partitioning
自動建 chunk	是	否（需手動或 pg_partman）
Chunk pruning	自動	自動（需 partition key）
Retention 內建	是	否（pg_partman 或自寫 cron）
Compression	內建 columnar	否
Continuous aggregate	內建	否（自寫 incremental refresh）
跨 chunk index	統一 management	Per-partition index
Cardinality limit	10000+ chunk OK	1000+ partition 就慢

何時用原生 partitioning（不用 TimescaleDB）：

不需要 compression / CAGG
Partition 數 < 1000
已用 pg_partman 不想換
公司禁用 TSL license（TimescaleDB 部分功能受限）

何時用 TimescaleDB：

高頻 time-series（compression 必要）
需要 CAGG（手寫 incremental MV 成本高）
Partition 數 > 1000
IoT / metrics / observability workload

詳細 partitioning 機制看 declarative-partitioning。

下一步

看 extension-ecosystem 了解其他 PG 擴展選項
回 PostgreSQL overview 看全圖

pgvector Deep Dive: HNSW / IVFFlat trade-offs and comparison with dedicated vector DBs

Tue, 19 May 2026 00:00:00 +0000

本文是 PostgreSQL overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 pgvector extension — 用 PG 解 vector search workload 的路徑、是 extension-ecosystem 內最受關注的 extension。

pgvector 是 PG 變 Vector DB 的最短路徑

pgvector 加兩件事：

CREATE EXTENSION vector;

-- Add a vector column (the dimension must be fixed up front)
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)  -- OpenAI ada-002 dimension
);

-- Three distance operators
SELECT * FROM documents ORDER BY embedding <-> '[0.1, 0.2, ...]' LIMIT 10;  -- L2
SELECT * FROM documents ORDER BY embedding <#> '[0.1, 0.2, ...]' LIMIT 10;  -- inner product
SELECT * FROM documents ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 10;  -- cosine

Operator 對應：

Operator	意義	適用
`<->`	L2 distance	通用、空間距離
`<#>`	Negative inner product	normalized vector、cosine 等價
`<=>`	Cosine distance	embedding 比較最常用

對 OpenAI / Cohere / sentence-transformers embedding、通常用 <=>（cosine）— embedding model 訓練時是 cosine objective。
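
The three operators can be written out in a few lines of Python to make the relationship concrete. This is a plain-Python sketch of the math, not pgvector's code; the function names are chosen here for illustration.

```python
import math


def l2_distance(a, b):
    """What <-> computes: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """What <#> computes: the inner product, negated so that smaller means closer."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """What <=> computes: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]
```

For unit-length vectors, `cosine_distance(a, b)` equals `1 + negative_inner_product(a, b)`, so both operators rank rows identically; that is the sense in which `<#>` is "cosine-equivalent" for normalized embeddings, while skipping the norm computation.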

ANN Index 是 Vector Search 的核心

不加 index 的 ORDER BY embedding <=> ? 是 full scan：

100K row、1536 dim、每 query ~2-5s（不可用）
1M row 直接超時

pgvector 提供兩種 Approximate Nearest Neighbor（ANN）index：

Index	Build 時間	Query 時間	Recall@10	Memory cost	Update 行為
IVFFlat	快（分鐘級）	中（10-100ms）	90-95%	中（lists 數量）	Insert OK、需重建保持 recall
HNSW	慢（小時級）	快（1-10ms）	95-99%	高（2-4x 資料）	Insert OK、graph 漸進維護

選 IVFFlat 的場景：

Embedding 量 < 1M
Build 時間敏感（CI / batch 環境）
Memory 緊
接受重建 cost（每月 / 每季）

選 HNSW 的場景：

Embedding 量 1M-100M
Query latency < 50ms 要求
Memory 充足
Insert 量穩定（不會爆炸性增長）

IVFFlat：分 Cluster 找鄰居

IVFFlat 機制：

Build：跑 k-means 把所有 vector 分 lists 個 cluster
Query：先找最近的 probes 個 cluster、再在這些 cluster 內找 nearest neighbor

-- Build (the number of lists matters)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- At query time, tune probes to trade recall against latency
SET ivfflat.probes = 10;
SELECT * FROM documents ORDER BY embedding <=> ? LIMIT 10;
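
The build / query split can be illustrated with a toy inverted-file index in Python. It is a sketch under simplifying assumptions: the centroids are passed in by hand instead of coming from k-means, distances are L2, and all function names are hypothetical.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, centroids):
    """Build step: put each vector id into the list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, probes, k):
    """Query step: scan only the `probes` closest lists, then rank those candidates exactly."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [vid for c in order[:probes] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:k]


# Two clusters; vector 0 sits just on the left side of the boundary.
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(4.9, 0.0), (0.0, 0.5), (9.0, 0.0), (5.2, 0.0)]
lists = build_ivf(vectors, centroids)
query = (5.01, 0.0)
```

With `probes=1` the search only visits the right-hand list and returns vector 3, although vector 0 in the other list is closer; with `probes=2` it finds vector 0. That boundary effect is the recall that a higher `probes` setting buys back, at the cost of scanning more lists.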

Lists 跟 probes sizing 規則（pgvector 官方建議）：

Row count	lists 建議	probes 建議
< 1M	`rows / 1000`	`sqrt(lists)`
> 1M	`sqrt(rows)`	`sqrt(lists)`

實務：100K row → lists=100 / probes=10、1M row → lists=1000 / probes=32。
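
The sizing rule in the table is simple enough to encode as a helper. A minimal sketch; `ivfflat_params` is a hypothetical name, and the result is a starting point to validate against measured recall, not a guarantee.

```python
import math


def ivfflat_params(row_count):
    """Starting values for lists / probes, following the rule of thumb in the table above."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes
```

Calling it with 100,000 and 1,000,000 rows reproduces the two practical settings quoted above.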

IVFFlat 的 recall drift：cluster 是 build 時固定的、新 insert 的 vector 進入「最近 cluster」、但隨資料分布改變、cluster center 可能不再代表性、recall 隨時間下降。

修法：定期 REINDEX INDEX CONCURRENTLY ...（每月 / 每 100K 新 row）。

HNSW：Multi-level Graph 找鄰居

HNSW（Hierarchical Navigable Small World）機制：

多層 graph、上層稀疏、下層密集
Query 從上層 entry point 開始、逐層找近鄰、最後在底層精細搜尋
Insert 漸進維護 graph、不必重建

-- Build (two key parameters)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- At query time, tune ef_search
SET hnsw.ef_search = 100;
SELECT * FROM documents ORDER BY embedding <=> ? LIMIT 10;

參數含義：

參數	含義	預設	Trade-off
`m`	每 node 最多鄰居數	16	大 → recall 高、memory 多
`ef_construction`	Build 時 graph 質量參數	64	大 → build 慢、graph 質量好
`ef_search`	Query 時搜尋範圍	40	大 → recall 高、latency 高

Build cost 真實量級（1M vector × 1536 dim）：

配置	Build 時間	Memory	Recall@10
m=8, ef_construction=32	30 min	4GB	92%
m=16, ef_construction=64	2 hour	8GB	96%
m=32, ef_construction=200	8 hour	16GB	98%

Production 多數選中間 m=16, ef_construction=64、recall / cost 平衡。

Hybrid Search：Vector + Filter 一起

Vector search 加 SQL filter 是 pgvector 比專業 vector DB 強的場景：

-- Vector search plus a metadata filter
SELECT * FROM documents
WHERE category = 'tech' AND created_at > '2025-01-01'
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

但這裡有個 pgvector 的踩雷：filter 跟 ANN index 互動有兩種模式：

Pre-filter（planner 選）：先 filter 出符合條件的 row、再對 subset 跑 vector ordering → 不用 ANN index、可能慢
Post-filter：用 ANN index 找 top-N、再 filter、可能 N 不夠補

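pgvector 0.8.0 (released 2024-10) adds iterative index scans: when enabled through hnsw.iterative_scan or ivfflat.iterative_scan, the index keeps scanning until enough rows pass the filter, which avoids the short result sets of plain post-filtering and is often much faster than forcing a pre-filter plan. Earlier releases added the features used later in this article: 0.6.0 (2024-01) parallel HNSW builds, and 0.7.0 (2024-04) halfvec, sparsevec, and binary quantization.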

實務：filter selectivity 高（< 10%）時、考慮對 filter column 加 index 走 pre-filter；selectivity 低（> 50%）走 iterative scan。
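
The difference between post-filtering and an iterative scan shows up clearly in a toy model. The sketch below treats the ANN index as an already distance-ordered list of row ids; the function names and the `max_scan` budget are illustrative and only loosely modeled on pgvector's behavior.

```python
def post_filter(ranked_ids, matches_filter, k, ann_limit):
    """Take the ANN top-`ann_limit` first, then apply the WHERE clause."""
    return [i for i in ranked_ids[:ann_limit] if matches_filter(i)][:k]


def iterative_scan(ranked_ids, matches_filter, k, max_scan):
    """Keep walking the index order until k rows pass the filter or the scan budget runs out."""
    hits = []
    for scanned, i in enumerate(ranked_ids, start=1):
        if matches_filter(i):
            hits.append(i)
            if len(hits) == k:
                break
        if scanned >= max_scan:
            break
    return hits


# 1,000 rows already ordered by distance; only every 50th row passes the filter (2%).
ranked_ids = list(range(1000))
passes = lambda row_id: row_id % 50 == 0
```

With `k=10` and an ANN candidate limit of 40, post-filtering returns a single row, because only one of the first 40 candidates passes the filter. The iterative scan keeps going and returns the 10 nearest matching rows.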

Quantization 跟 Dimension Reduction

1536 dim float32 vector 一筆 6KB、1M row 6GB、加 HNSW index 後 ~20GB。Memory 緊時的省法：

Half-precision（pgvector 0.7+）

CREATE TABLE documents (
    embedding halfvec(1536)
);

halfvec 是 float16、storage 減半、recall 損失通常 < 1%。

Binary quantization

-- Index the 1-bit-per-dimension form of each vector (an expression index)
CREATE INDEX ON documents USING hnsw ((binary_quantize(embedding)::bit(1536)) bit_hamming_ops);

Recall 下降明顯（85-90%）、但 storage 1/32、適合「先粗篩再 rerank」hybrid pipeline。
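
What binary quantization does to a vector, and where the 1/32 storage figure comes from, can be shown in a few lines of Python. A minimal sketch that assumes the sign rule "positive becomes 1, everything else 0"; the helper names are hypothetical.

```python
def binary_quantize(vec):
    """Keep one bit per dimension: 1 for positive values, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a_bits, b_bits):
    """Number of positions where the two bit vectors differ."""
    return sum(x != y for x, y in zip(a_bits, b_bits))


def storage_bytes(dim, bits_per_dim):
    """Raw payload size of one vector, ignoring per-row overhead."""
    return dim * bits_per_dim // 8
```

A 1536-dimension float32 vector takes 6,144 bytes and its binary form 192 bytes, which is the 32x reduction. Because so much information is dropped, the usual pattern is to fetch a generous candidate set by Hamming distance and rerank it with the full-precision vectors.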

Dimension reduction

訓練 PCA / Matryoshka model 把 1536 dim 降到 256-512 dim、recall 通常損失 < 3%、storage 1/3-1/6。

5 個 Production 踩雷

Case 1：Dimension 超 2000 限制

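Scenario: you want to use OpenAI text-embedding-3-large (3072 dim). CREATE TABLE ... embedding vector(3072) succeeds, but CREATE INDEX ... USING hnsw on that column fails.

The vector type itself can store up to 16,000 dimensions; the 2,000-dimension ceiling applies to HNSW and IVFFlat indexes on vector columns.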

修法：

改用 halfvec（pgvector 0.7+ 支援 4000 dim）
用 Matryoshka 截斷到 2000 dim 以下
換 embedding model（OpenAI text-embedding-3-small 1536 dim / 可截斷到 256-1024）

Case 2：HNSW build 太慢

情境：1M row build HNSW、跑 8 小時、blocking production。

修法：

-- Use CONCURRENTLY so the build does not block writes
CREATE INDEX CONCURRENTLY ON documents USING hnsw (...);

-- Raise maintenance_work_mem
SET maintenance_work_mem = '8GB';

-- Enable a parallel build
SET max_parallel_maintenance_workers = 7;

仍慢的話、考慮：

切分 batch insert + index（適合 read-heavy）
用 IVFFlat 短期上線、之後再切 HNSW
改用 cloud managed pgvector（提供更大 instance）

Case 3：IVFFlat 不重建 recall 漂移

情境：IVFFlat build 時資料 100K、現在 500K、新資料 recall 從 92% 降到 75%、user 抱怨「找不到相關文件」。

修法：

Monitor recall：定期跑 ground-truth eval（brute-force 對比）
設定 reindex policy：每 100K 新 row 或每月 reindex
換 HNSW：insert 漸進維護、不需 reindex（trade-off：build 更慢）

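Case 4: Highly selective filter on top of a vector index

Scenario: the query is WHERE user_id = ? ORDER BY embedding <=> ? and user_id is highly selective (about 1 in 1M rows). The planner picks the vector index scan, none of the top-K candidates belong to that user, and the query returns far fewer rows than LIMIT, or none at all.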

修法：

EXPLAIN 看 planner 選 pre-filter 還是 vector-first
對 user_id 加 B-tree index、強 planner pre-filter（hint 不容易、用 statistics）
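On pgvector 0.8+, enable iterative scan (SET hnsw.iterative_scan = strict_order or relaxed_order) so the index keeps scanning until enough rows pass the filter; it is off by default.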
設計 schema：高選擇性 filter（user_id）建議走 pre-filter；低選擇性（category）走 iterative

Case 5：Memory budget 沒抓

情境：1M vector × 1536 dim × HNSW（m=16）= ~12GB index、shared_buffers 8GB、index 不在 cache、每 query disk IO、latency 100ms+。

修法：

算 vector + index memory：row × dim × 4 bytes × (1 + index_overhead)
shared_buffers 至少能放 hot index portion
不行就降 dim（halfvec）/ 升 instance / 拆 sharded
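
The budgeting formula above translates directly into a small helper. A minimal sketch: the function names are hypothetical, and `index_overhead` is an assumption you have to measure on your own data (for example by comparing index size with table size), not a constant that pgvector publishes.

```python
def vector_bytes(rows, dim, bytes_per_dim=4):
    """Raw vector payload: 4 bytes per dimension for vector, 2 for halfvec."""
    return rows * dim * bytes_per_dim


def estimated_total_bytes(rows, dim, index_overhead, bytes_per_dim=4):
    """Vectors plus index, where index_overhead is the index size as a fraction of the raw data."""
    return int(vector_bytes(rows, dim, bytes_per_dim) * (1 + index_overhead))
```

For 1M rows at 1536 dimensions the raw payload is about 6.1 GB as float32 and about 3.1 GB as halfvec; with an assumed overhead of 1.0 the float32 total doubles to roughly 12.3 GB, which is the figure to compare against shared_buffers.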

跟專業 Vector DB 對比

維度	pgvector	Pinecone	Weaviate	Milvus
Query 介面	SQL	REST/gRPC API	GraphQL / REST	gRPC
Recall	95-99%（HNSW）	95-99%	95-99%	95-99%
Throughput	中（PG 限制）	高	高	高
Hybrid search	強（完整 SQL）	中（metadata filter）	中	中
跟既有 PG 整合	完美（同 DB join）	需 sync	需 sync	需 sync
Multi-tenant	row-level（PG 一致）	內建	內建	partition
Open source	是	否	是	是
Operational cost	跟 PG 一樣（管 PG 即可）	Managed-only	需自管或 cloud	需自管或 cloud
Scale 上限	10M-100M vector	10B+	1B+	10B+

選 pgvector 的場景：

Application 已用 PG、不想多管系統
Vector 量 < 100M
需要 join vector + relational
Team SQL 熟、不想學 API SDK
Cost 敏感（managed Pinecone 1M vector 月 ~$70+）

選專業 vector DB 的場景：

Vector 量 > 5-20M（依 dim / QPS / recall 要求、pgvector 在這個級別 + 高 QPS 已開始痛、不必撐到 100M 才換）
純 vector workload（沒 relational integration）
需要 multi-tenant SaaS
Throughput 要求極高（> 10K QPS）
不想自管 HNSW build / memory budget / recall drift（managed Pinecone 把這層 ops 轉嫁、cost 換 ops 時間）
You need indexed vectors above 2,000 dimensions (the HNSW / IVFFlat limit for the vector type; halfvec indexes go up to 4,000, and anything larger needs dimension reduction)

下一步

看 extension-ecosystem 探索其他 PG 擴展可能
回 PostgreSQL overview 看全圖

PostGIS Deep Dive：Geometry / Geography 型別、GiST 空間索引跟 ST_* 函式生態

Tue, 19 May 2026 00:00:00 +0000

本文是 PostgreSQL overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 PostGIS extension — PG 變 GIS DB 的標配、跟 extension-ecosystem 是 單一 extension 細節 vs ecosystem 全景 的關係。

PostGIS 是 PG 的 GIS Specialization

PostGIS is one of the most mature PG extensions (first released in 2001, roughly 25 years of history), with an industry standing comparable to Oracle Spatial or SQL Server's geography type:

CREATE EXTENSION postgis;

加完後 PG 多兩件事：

空間型別：geometry（平面）/ geography（地球曲面）/ raster（柵格）
1000+ 函式：ST_Distance / ST_Within / ST_Buffer / ST_Intersects 等

用 PostGIS 解的典型 workload：

「離我最近的 N 家店」（k-NN）
「半徑 1km 內的所有 POI」（radius query）
「兩個 polygon 是否重疊」（intersection）
「polyline 總長度」（measurement）
「行政區包含哪些 point」（containment）

Geometry vs Geography：選錯付學費

PostGIS 提供兩種空間型別、用途完全不同：

維度	`geometry`	`geography`
座標系統	平面（笛卡兒）	地球曲面（spheroid）
距離單位	座標系統決定（meter / degree）	永遠 meter
跨經度 180°	不處理	自動處理
適用範圍	小區域（單一城市 / 國家）	全球
函式覆蓋	1000+ 函式	約 300 函式
效能	快（平面計算）	慢 2-5x（球面計算）
Index 行為	GiST 直接	GiST 直接

選 geography 的場景：

全球範圍 application（跨國 / 跨大陸）
距離精準度要求高（球面比平面誤差小）
不需要複雜空間運算（geography 函式較少）

選 geometry 的場景：

單一城市 / 國家內 application
需要完整 ST_* 函式（90% 函式只支援 geometry）
效能敏感

實務多數 production 選 geometry + 適合的 SRID（用 local projection）— 既快又精準。

SRID 跟 Projection：為什麼 4326 vs 3857 是 GIS 第一課

SRID（Spatial Reference System Identifier）定義「座標數字怎麼解讀」：

SRID	名稱	適用
4326	WGS 84（GPS）	經緯度、最常見、Google Maps API
3857	Web Mercator	Web tile map（OpenStreetMap）
3826	TWD97 / TM2 zone 121	台灣 local projection、米為單位
2272	NAD83 / Pennsylvania	美國 state plane（各州不同）

為什麼選 local projection（3826）而不是經緯度（4326）：

經緯度單位是度、不是距離 — ST_Distance 直接算出來是「度」、不是「米」
距離計算需 ST_DistanceSphere 或 geography cast、計算 cost 高
Local projection 是「平面投影」、ST_Distance 直接是米、ST_Area 直接是平方米

-- Computing directly on 4326 lon/lat: the result is not in meters
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(121.5654, 25.0330), 4326),  -- Taipei 101
    ST_SetSRID(ST_MakePoint(121.5170, 25.0478), 4326)   -- Taipei Main Station
);  -- ~0.05 (this is in degrees)

-- Transform to 3826 (Taiwan local projection) to get meters
SELECT ST_Distance(
    ST_Transform(ST_SetSRID(ST_MakePoint(121.5654, 25.0330), 4326), 3826),
    ST_Transform(ST_SetSRID(ST_MakePoint(121.5170, 25.0478), 4326), 3826)
);  -- ~5150 (meters)

-- Or cast to geography
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(121.5654, 25.0330), 4326)::geography,
    ST_SetSRID(ST_MakePoint(121.5170, 25.0478), 4326)::geography
);  -- ~5150 (meters)
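
The gap between "degrees" and "meters" can be reproduced outside the database. The sketch below uses the haversine formula on a sphere, which is an approximation: PostGIS geography works on the WGS 84 spheroid, so expect its result to differ by a few meters. The function names are chosen here for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, used for the spherical approximation


def planar_degrees(lon1, lat1, lon2, lat2):
    """Treat lon/lat as flat x/y: the result is in degrees, not a real distance."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


taipei_101 = (121.5654, 25.0330)
taipei_station = (121.5170, 25.0478)
degrees = planar_degrees(*taipei_101, *taipei_station)
meters = haversine_m(*taipei_101, *taipei_station)
```

The planar result is about 0.05 (degrees) while the spherical distance comes out a little over 5.1 km, which is why a `ST_Distance(...) < 1000` predicate on SRID 4326 geometry silently compares against degrees.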

Typical schema design (Taiwan application):

CREATE TABLE pois (
    id SERIAL PRIMARY KEY,
    name TEXT,
    -- Store 4326 (aligned with the Google Maps API)
    location_4326 geometry(Point, 4326),
    -- Precompute 3826 (for distance / area queries)
    location_3826 geometry(Point, 3826) GENERATED ALWAYS AS
        (ST_Transform(location_4326, 3826)) STORED
);

CREATE INDEX idx_pois_location_3826 ON pois USING GIST (location_3826);
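A short usage sketch for this schema follows. It assumes PostgreSQL 12 or later (stored generated columns) and reuses the two Taipei coordinates from the distance example. Only `location_4326` is written; the projected column is derived.

```sql
-- Insert only the 4326 point; location_3826 is filled in by the generated column.
INSERT INTO pois (name, location_4326)
VALUES
    ('Taipei 101',          ST_SetSRID(ST_MakePoint(121.5654, 25.0330), 4326)),
    ('Taipei Main Station', ST_SetSRID(ST_MakePoint(121.5170, 25.0478), 4326));

-- Inspect both representations side by side.
SELECT name,
       ST_AsText(location_4326) AS lonlat,
       ST_AsText(location_3826) AS twd97_xy
FROM pois;

-- Distance in meters, computed on the precomputed projected column.
SELECT ST_Distance(a.location_3826, b.location_3826) AS meters
FROM pois AS a, pois AS b
WHERE a.name = 'Taipei 101' AND b.name = 'Taipei Main Station';
```

The trade-off is extra storage and write cost for the second column, in exchange for queries that never call ST_Transform per row.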

GiST Spatial Index: the PG implementation of R-tree

PostGIS uses PG's built-in GiST for spatial indexing (internally an R-tree variant):

CREATE INDEX idx_pois_geom ON pois USING GIST (location_3826);

Spatial queries that GiST accelerates:

-- Range query (bounding-box overlap)
SELECT * FROM pois
WHERE location_3826 && ST_MakeEnvelope(290000, 2760000, 305000, 2775000, 3826);

-- Radius query (only ST_DWithin uses the index)
SELECT * FROM pois
WHERE ST_DWithin(location_3826, ST_SetSRID(ST_MakePoint(300000, 2770000), 3826), 1000);

-- k-NN (<-> operator, PostGIS 2.0+)
SELECT id, name, location_3826 <-> ST_SetSRID(ST_MakePoint(300000, 2770000), 3826) AS dist
FROM pois
ORDER BY location_3826 <-> ST_SetSRID(ST_MakePoint(300000, 2770000), 3826)
LIMIT 10;

What decides whether the index is used:

Query form	Uses the index?
`ST_DWithin(a, b, dist)`	Yes
`ST_Distance(a, b) < dist`	No (always a full scan)
`a && bbox`	Yes
`ST_Intersects(a, bbox)`	Yes
`a <-> b ORDER BY ... LIMIT n`	Yes (k-NN)
`ST_Equals(a, b)`	Yes on current versions (bounding-box prefilter); confirm with EXPLAIN

Production rule: whenever ST_DWithin fits, use it instead of ST_Distance(...) < ?. The semantics are the same, but the index behavior differs greatly.

The ST_* function ecosystem: an industrial-grade full set

PostGIS's 1000+ functions by category (the ones typically used):

Category	Representative functions
Construction	`ST_MakePoint` / `ST_MakeLine` / `ST_MakePolygon`
Relationship tests	`ST_Intersects` / `ST_Within` / `ST_Contains` / `ST_Touches`
Distance / size	`ST_Distance` / `ST_DWithin` / `ST_Length` / `ST_Area`
Geometry processing	`ST_Buffer` / `ST_Union` / `ST_Difference` / `ST_Intersection`
Projection	`ST_Transform` / `ST_SetSRID`
Format conversion	`ST_AsGeoJSON` / `ST_AsKML` / `ST_AsText` / `ST_GeomFromGeoJSON`
Path / topology	`ST_ShortestLine` / `ST_LineMerge`
Aggregation	`ST_Collect` / `ST_ConvexHull` / `ST_Centroid`
Simplification	`ST_Simplify` / `ST_SimplifyPreserveTopology`
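To show how functions from several of these categories compose, here is a small sketch that builds a buffer, tests intersection, and exports GeoJSON. It assumes the `pois` table with a `location_3826` column; the center point and the 500 m radius are arbitrary example values.

```sql
-- Find POIs inside a 500 m buffer around a point and return them as GeoJSON.
WITH zone AS (
    SELECT ST_Buffer(
        ST_SetSRID(ST_MakePoint(300000, 2770000), 3826),  -- center, in TWD97 meters
        500                                               -- buffer radius in meters
    ) AS geom
)
SELECT p.id,
       p.name,
       -- GeoJSON consumers generally expect lon/lat, so transform back to 4326.
       ST_AsGeoJSON(ST_Transform(p.location_3826, 4326)) AS geojson
FROM pois AS p
JOIN zone ON ST_Intersects(p.location_3826, zone.geom);
```

For a plain radius search, ST_DWithin is the simpler choice; the buffer form is more useful when the zone polygon itself is needed, for example to draw it on a map.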

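Typical query for the web tile scenario:

-- Given a z/x/y tile, find all POIs inside it
-- (z, x, y are placeholders; assumes pois also has a location_3857 column)
SELECT id, name, ST_AsMVTGeom(location_3857, ST_TileEnvelope(z, x, y)) AS geom
FROM pois
WHERE location_3857 && ST_TileEnvelope(z, x, y);

ST_AsMVTGeom combined with ST_AsMVT produces Mapbox Vector Tile binary directly, for front ends such as Mapbox GL JS or Leaflet (Leaflet needs a vector-tile plugin).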
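A fuller sketch of the tile pipeline, including the ST_AsMVT aggregation step, could look like this. It assumes PostGIS 3.0 or later (for ST_TileEnvelope) and a `location_3857` column on `pois`; the tile coordinates and the layer name `pois` are placeholder values.

```sql
-- Assumes pois has a location_3857 geometry(Point, 3857) column with a GiST index.
-- The tile coordinates (z = 14, x = 13724, y = 7014) are placeholder values.
WITH bounds AS (
    SELECT ST_TileEnvelope(14, 13724, 7014) AS geom   -- tile extent in SRID 3857
),
mvtgeom AS (
    SELECT p.id,
           p.name,
           ST_AsMVTGeom(p.location_3857, bounds.geom) AS geom  -- clip and scale to tile space
    FROM pois AS p, bounds
    WHERE p.location_3857 && bounds.geom                       -- index-assisted bbox filter
)
SELECT ST_AsMVT(mvtgeom.*, 'pois') AS tile   -- one bytea value holding the layer "pois"
FROM mvtgeom;
```

A tile endpoint would typically return the resulting bytea as the response body; the non-geometry columns (`id`, `name`) become feature attributes.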

5 Production Pitfalls

Case 1: Geometry with the wrong SRID

Scenario: the app writes with 4326, queries apply ST_Transform to 3826, and one column was never given an SRID, so the index is not used.

Fix:

-- Check the SRID
SELECT ST_SRID(location) FROM pois LIMIT 1;

-- Strong type constraint (SRID fixed in the column type)
ALTER TABLE pois ALTER COLUMN location TYPE geometry(Point, 4326)
USING ST_SetSRID(location, 4326);

-- Check constraint as a guard
ALTER TABLE pois ADD CONSTRAINT chk_location_srid
CHECK (ST_SRID(location) = 4326);

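Case 2: Geography does not support every ST_* function

Scenario: you run ST_Buffer on a geography value and get an error or an unexpected result.

For geography, ST_Buffer is a thin wrapper over the geometry implementation: it picks a best-fit planar projection, buffers there, and transforms back, so edge cases (very large inputs, dateline crossings) can differ from what geometry gives. Many functions (ST_VoronoiPolygons / ST_DelaunayTriangles, etc.) support geometry only.

Fix:

Use geography for simple distance queries
Use geometry plus a suitable projection for complex spatial operations
When unsure which functions support geography, check the Geography Support Functions list in the PostGIS docs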
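One way to apply the second fix is to do the heavy operation in a projected geometry SRID and cast back only at the end. The sketch below assumes the data lies in Taiwan, so SRID 3826 is a reasonable planar choice; other regions would need a different projection.

```sql
-- Sketch: buffer a geography value by doing the work in a projected geometry SRID.
-- Assumes the point is in Taiwan, so SRID 3826 (meters) is a sensible plane.
SELECT ST_Transform(
           ST_Buffer(
               ST_Transform(
                   (ST_SetSRID(ST_MakePoint(121.5654, 25.0330), 4326)::geography)::geometry,
                   3826),          -- geography -> geometry(4326) -> TWD97 meters
               500),               -- 500 m buffer computed in the plane
           4326
       )::geography AS buffered;   -- back to lon/lat geography
```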

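Case 3: GiST index has no effect on ST_Distance

Scenario: the query filters on ST_Distance(location, ?) < 1000, EXPLAIN shows a full scan, and adding an index does not help.

ST_Distance has to be computed for each row before the filter can run, so the planner cannot use GiST.

Fix:

Switch to ST_DWithin(location, ?, 1000): same semantics, and it uses GiST
Confirm the index is built on the exact column the query filters on (an index on the raw column does not serve a predicate on a transformed expression of it)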
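A quick way to verify the difference is to compare the two plans with EXPLAIN. This sketch assumes the `pois` table and the `idx_pois_location_3826` index from the schema section; the actual plan text depends on table size and statistics.

```sql
-- Distance computed per row, then filtered: typically a sequential scan.
EXPLAIN
SELECT id FROM pois
WHERE ST_Distance(location_3826,
                  ST_SetSRID(ST_MakePoint(300000, 2770000), 3826)) < 1000;

-- ST_DWithin adds a bounding-box condition the planner can answer from GiST:
-- look for an Index Scan or Bitmap Index Scan on idx_pois_location_3826.
EXPLAIN
SELECT id FROM pois
WHERE ST_DWithin(location_3826,
                 ST_SetSRID(ST_MakePoint(300000, 2770000), 3826), 1000);
```

On very small tables the planner may still prefer a sequential scan for both queries, so it is worth testing against realistic data volumes.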

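Case 4: BRIN stops working after CLUSTER on geom

Scenario: you run CLUSTER pois USING idx_pois_geom to speed up spatial lookups, but the table also has a BRIN index on created_at, and that BRIN index becomes useless.

CLUSTER rewrites the physical row order to follow the GiST index, so the correlation between created_at and physical order drops from about 1.0 toward 0.0 and BRIN block ranges lose their selectivity.

Fix:

Avoid CLUSTER on large tables (it is a one-off reorder, and it changes the physical ordering that other columns depend on)
Switch to time-based partitioning with a GiST index per partition (you get both)
See the BRIN section of index-selection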
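The correlation can be checked in `pg_stats`, and the partition-by-time alternative can be sketched as below. The `created_at` column, the `pois_by_time` table, and the partition name are assumptions for illustration; indexes on a partitioned parent require PostgreSQL 11 or later.

```sql
-- 1. Check how well physical row order tracks created_at.
--    Values near 1 or -1 suit BRIN; values near 0 mean block ranges overlap heavily.
--    Run ANALYZE pois first so the statistic is current.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'pois' AND attname = 'created_at';

-- 2. Alternative layout: partition by time, then index each partition spatially.
--    Table and partition names here are hypothetical.
CREATE TABLE pois_by_time (
    id            BIGINT NOT NULL,
    name          TEXT,
    created_at    TIMESTAMPTZ NOT NULL,
    location_3826 geometry(Point, 3826)
) PARTITION BY RANGE (created_at);

CREATE TABLE pois_2026_05 PARTITION OF pois_by_time
    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Indexes created on the partitioned parent cascade to every partition.
CREATE INDEX idx_pois_by_time_geom    ON pois_by_time USING GIST (location_3826);
CREATE INDEX idx_pois_by_time_created ON pois_by_time USING BRIN (created_at);
```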

Case 5: EWKB vs WKB cross-tool compatibility

Scenario: you export from PostGIS to other GIS tools (QGIS / Shapely / ogr2ogr), and the receiving tool complains that the format is wrong.

PostGIS uses EWKB (Extended Well-Known Binary) internally, which additionally carries the SRID. Most GIS tools read WKB (the standard).

Fix:

-- Export standard WKB
SELECT ST_AsBinary(geom) FROM pois;

-- Or GeoJSON (the most portable across tools)
SELECT ST_AsGeoJSON(geom) FROM pois;

-- Or a Shapefile via ogr2ogr
-- ogr2ogr -f "ESRI Shapefile" output.shp PG:"..." -sql "SELECT * FROM pois"
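To see the difference between the two encodings, the following sketch serializes the same point both ways. It needs no table, only a PostGIS-enabled database.

```sql
-- Same point, with and without the embedded SRID.
WITH g AS (
    SELECT ST_SetSRID(ST_MakePoint(121.5654, 25.0330), 4326) AS geom
)
SELECT ST_AsEWKT(geom)           AS ewkt,        -- text form, prefixed with SRID=4326;
       ST_AsText(geom)           AS wkt,         -- standard WKT, no SRID
       length(ST_AsEWKB(geom))   AS ewkb_bytes,  -- includes a 4-byte SRID field
       length(ST_AsBinary(geom)) AS wkb_bytes    -- standard WKB
FROM g;
```

The EWKB value should come out 4 bytes longer than the WKB value. Because standard WKB drops the SRID, it has to be passed to the consuming tool separately.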

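Comparison with specialized GIS databases

Dimension	PostGIS	Oracle Spatial	SQL Server geography	MongoDB GeoJSON
Function coverage	1000+	800+	200+	~20
Raster support	Yes	Yes	No	No
Topology	Yes (PostGIS Topology)	Yes	No	No
3D support	Yes (PostGIS SFCGAL)	Yes	Partial	No
License	GPL	Commercial	Commercial	SSPL (source-available)
Tile generation	Built in (ST_AsMVT)	No	No	No
Integration with PG	Native	Part of Oracle	Part of SQL Server	Standalone
Industry use	OpenStreetMap / national mapping agencies	Large enterprises	Microsoft ecosystem	Simple location apps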

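When to choose PostGIS (90% of GIS workloads):

The application already uses PG
You need the full GIS function ecosystem (road networks / contour lines / watershed analysis)
Open source or cost sensitivity matters
You need interoperability with OGR / GDAL / QGIS

When to choose a specialized GIS database:

You are already tied to an Oracle / SQL Server license
Highly specialized GIS work (3D city models / LIDAR / GPU acceleration)
Pure location apps that do not need relational features (MongoDB GeoJSON is enough)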

Next steps

See extension-ecosystem to explore what other PG extensions can do
Return to the PostgreSQL overview for the full picture
