<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>PostgreSQL Hands-on Track on Tarragon</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/</link><description>Recent content in PostgreSQL Hands-on Track on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 22 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL Connection Pool Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/connection-pool-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/connection-pool-lab/</guid><description>&lt;p>This PostgreSQL connection pool lab shows how connection pressure propagates from the application pool to PostgreSQL backend processes. It follows on from &lt;a href="../../connection-scaling/">Connection Scaling&lt;/a> and &lt;a href="../../pgbouncer-config/">PgBouncer Config&lt;/a>.&lt;/p>
&lt;p>The acceptance criterion for this lab: you can compare direct connections with PgBouncer transaction pooling and collect &lt;code>pg_stat_activity&lt;/code> output, PgBouncer &lt;code>SHOW POOLS&lt;/code> output, latency and error samples, and a failure note.&lt;/p>
&lt;h2 id="baseline-direct-connections">Baseline Direct Connections&lt;/h2>
&lt;p>The baseline step measures how many PostgreSQL backends exist when the application connects directly.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">DATABASE_URL&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT count(*) FROM pg_stat_activity WHERE datname = current_database();&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Use several terminals or a simple workload to open concurrent connections. While &lt;code>pg_sleep&lt;/code> runs, these sessions show up as &lt;code>active&lt;/code> rather than &lt;code>idle&lt;/code>:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">for&lt;/span> i in &lt;span class="m">1&lt;/span> &lt;span class="m">2&lt;/span> &lt;span class="m">3&lt;/span> &lt;span class="m">4&lt;/span> 5&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT pg_sleep(10);&amp;#34;&lt;/span> &lt;span class="p">&amp;amp;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This step shows that each client session occupies one PostgreSQL backend process. The count includes the session that runs the query itself.&lt;/p>
&lt;h2 id="add-pgbouncer">Add PgBouncer&lt;/h2>
&lt;p>Adding PgBouncer decouples client connections from server connections. The following compose fragment can be added to the local lab:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pgbouncer&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">edoburu/pgbouncer:latest&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">environment&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_HOST&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">postgres&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_USER&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_PASSWORD&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin_pw&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DB_NAME&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">appdb&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POOL_MODE&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">transaction&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">MAX_CLIENT_CONN&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">100&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">DEFAULT_POOL_SIZE&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">5&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;64329:5432&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>After startup, set the pooler URL:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">POOL_URL&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:64329/appdb?sslmode=disable&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="compare-pool-behavior">Compare Pool Behavior&lt;/h2>
&lt;p>This step observes what happens when many clients share a few server connections.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">for&lt;/span> i in &lt;span class="k">$(&lt;/span>seq &lt;span class="m">1&lt;/span> 20&lt;span class="k">)&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$POOL_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT pg_sleep(1);&amp;#34;&lt;/span> &lt;span class="p">&amp;amp;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then open the PgBouncer admin console. The exact command depends on how the image is configured; the connecting user must be listed in &lt;code>admin_users&lt;/code> or &lt;code>stats_users&lt;/code>:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:64329/pgbouncer?sslmode=disable&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;SHOW POOLS;&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The point to verify: as client workload grows, the number of PostgreSQL backends stays bounded by the pool size, and queueing happens at the pooler layer.&lt;/p>
&lt;h2 id="pool-exhaustion">Pool Exhaustion&lt;/h2>
&lt;p>The pool exhaustion step observes the errors and waits that appear under overload.&lt;/p></description><content:encoded><![CDATA[<p>This PostgreSQL connection pool lab shows how connection pressure propagates from the application pool to PostgreSQL backend processes. It follows on from <a href="../../connection-scaling/">Connection Scaling</a> and <a href="../../pgbouncer-config/">PgBouncer Config</a>.</p>
<p>The acceptance criterion for this lab: you can compare direct connections with PgBouncer transaction pooling and collect <code>pg_stat_activity</code> output, PgBouncer <code>SHOW POOLS</code> output, latency and error samples, and a failure note.</p>
<h2 id="baseline-direct-connections">Baseline Direct Connections</h2>
<p>The baseline step measures how many PostgreSQL backends exist when the application connects directly.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">export</span> <span class="nv">DATABASE_URL</span><span class="o">=</span><span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT count(*) FROM pg_stat_activity WHERE datname = current_database();&#34;</span></span></span></code></pre></div><p>Use several terminals or a simple workload to open concurrent connections. While <code>pg_sleep</code> runs, these sessions show up as <code>active</code> rather than <code>idle</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">for</span> i in <span class="m">1</span> <span class="m">2</span> <span class="m">3</span> <span class="m">4</span> 5<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT pg_sleep(10);&#34;</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&#34;</span></span></span></code></pre></div><p>This step shows that each client session occupies one PostgreSQL backend process. The count includes the session that runs the query itself.</p>
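<p>To keep the observing session out of the baseline, the query can filter on <code>pg_backend_pid()</code>. The following is a minimal sketch, assuming <code>DATABASE_URL</code> is exported as in the first step; the extra filter, the <code>backends</code> alias, and the ordering are additions for illustration and are not part of the original lab:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Sketch: count backends per state, excluding the session that runs this query.
psql &#34;$DATABASE_URL&#34; -c &#34;
SELECT state, count(*) AS backends
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid &lt;&gt; pg_backend_pid()
GROUP BY state
ORDER BY state;&#34;</code></pre></div>
<p>Run it while the <code>pg_sleep(10)</code> sessions are still open, and record the result as the direct-connection baseline.</p>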
<h2 id="add-pgbouncer">Add PgBouncer</h2>
<p>Adding PgBouncer decouples client connections from server connections. The following compose fragment can be added to the local lab:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="w">  </span><span class="nt">pgbouncer</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">edoburu/pgbouncer:latest</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">      </span><span class="nt">DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">      </span><span class="nt">DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin_pw</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l">appdb</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">      </span><span class="nt">POOL_MODE</span><span class="p">:</span><span class="w"> </span><span class="l">transaction</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">      </span><span class="nt">MAX_CLIENT_CONN</span><span class="p">:</span><span class="w"> </span><span class="m">100</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">      </span><span class="nt">DEFAULT_POOL_SIZE</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;64329:5432&#34;</span></span></span></code></pre></div><p>After startup, set the pooler URL:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">export</span> <span class="nv">POOL_URL</span><span class="o">=</span><span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:64329/appdb?sslmode=disable&#34;</span></span></span></code></pre></div><h2 id="compare-pool-behavior">Compare Pool Behavior</h2>
<p>This step observes what happens when many clients share a few server connections.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">for</span> i in <span class="k">$(</span>seq <span class="m">1</span> 20<span class="k">)</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$POOL_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT pg_sleep(1);&#34;</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT state, count(*) FROM pg_stat_activity WHERE datname = current_database() GROUP BY state;&#34;</span></span></span></code></pre></div><p>Then open the PgBouncer admin console. The exact command depends on how the image is configured; the connecting user must be listed in <code>admin_users</code> or <code>stats_users</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:64329/pgbouncer?sslmode=disable&#34;</span> -c <span class="s2">&#34;SHOW POOLS;&#34;</span></span></span></code></pre></div><p>The point to verify: as client workload grows, the number of PostgreSQL backends stays bounded by the pool size, and queueing happens at the pooler layer.</p>
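<p>A single <code>SHOW POOLS</code> snapshot can miss the queue, because the 20 one-second clients drain quickly. The sketch below samples the admin console once per second while the workload runs. <code>PGBOUNCER_ADMIN_URL</code> is a variable name introduced here for illustration; it holds the same admin URL used in this step, and the ten-sample window is an arbitrary choice:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Assumption: the admin URL matches the lab setup in this step.
export PGBOUNCER_ADMIN_URL=&#34;postgres://lab_admin:lab_admin_pw@localhost:64329/pgbouncer?sslmode=disable&#34;

# Sample pool state once per second for ten seconds.
for n in $(seq 1 10); do
  date -u +%H:%M:%S
  psql &#34;$PGBOUNCER_ADMIN_URL&#34; -c &#34;SHOW POOLS;&#34;
  sleep 1
done</code></pre></div>
<p>In the output, <code>cl_waiting</code> counts clients queued at the pooler and <code>sv_active</code> counts server connections in use. Start the sampler in one terminal before launching the client loop in another.</p>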
<h2 id="pool-exhaustion">Pool Exhaustion</h2>
<p>The pool exhaustion step observes the errors and waits that appear under overload.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">for</span> i in <span class="k">$(</span>seq <span class="m">1</span> 50<span class="k">)</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$POOL_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;BEGIN; SELECT pg_sleep(5); COMMIT;&#34;</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">done</span></span></span></code></pre></div><p>While the workload runs, observe:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;SELECT count(*) FROM pg_stat_activity WHERE datname = current_database();&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">psql <span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:64329/pgbouncer?sslmode=disable&#34;</span> -c <span class="s2">&#34;SHOW POOLS;&#34;</span></span></span></code></pre></div><p>Evidence of pool exhaustion includes waiting clients, timeouts, application latency, and error messages. These signals should feed production alerts.</p>
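<p>To turn the exhaustion run into latency and error samples, each client can log its own elapsed time and exit status. This is a minimal sketch, assuming <code>POOL_URL</code> is exported as in the earlier step; the log file name <code>pool_exhaustion.log</code> is hypothetical, and the one-second resolution of <code>date +%s</code> is a simplification:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Sketch: 50 clients, each logging elapsed seconds and outcome.
for i in $(seq 1 50); do
  (
    start=$(date +%s)
    if out=$(psql &#34;$POOL_URL&#34; -c &#34;BEGIN; SELECT pg_sleep(5); COMMIT;&#34; 2&gt;&amp;1); then
      status=ok
    else
      status=error
    fi
    end=$(date +%s)
    # Keep only the last output line (command tag or error text).
    echo &#34;client=$i status=$status seconds=$((end - start)) msg=$(echo &#34;$out&#34; | tail -n 1)&#34;
  ) &gt;&gt; pool_exhaustion.log &amp;
done
wait  # Block until every background client has finished.</code></pre></div>
<p>With <code>DEFAULT_POOL_SIZE</code> at 5, clients should finish in waves, so later entries are expected to show larger <code>seconds</code> values. Any <code>status=error</code> line carries the message to copy into the failure note.</p>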
<h2 id="failure-note">Failure Note</h2>
<p>The failure note turns the lab results into a runbook. Record three things:</p>
<ol>
<li>Backend count at the direct-connection baseline.</li>
<li>Server connection count under PgBouncer transaction pooling.</li>
<li>Latency, errors, and queue depth during pool exhaustion.</li>
</ol>
<p>If the application relies on session state, prepared statements, temp tables, or advisory locks, also add a transaction pooling compatibility matrix.</p>
<h2 id="下一步路由">Next Steps</h2>
<p>After finishing this lab, return to <a href="../../connection-pooler-comparison/">Connection Pooler Comparison</a> to choose a pooler; for PgBouncer production settings, read <a href="../../pgbouncer-config/">PgBouncer Config</a>.</p>
]]></content:encoded></item><item><title>PostgreSQL HA Failover Drill</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/ha-failover-drill/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/ha-failover-drill/</guid><description>&lt;p>This PostgreSQL HA failover drill lets you observe how primary promotion affects the application, the pooler, and incident decisions. It follows on from &lt;a href="../../patroni-ha/">Patroni HA&lt;/a> and &lt;a href="../../cross-region-dr/">Cross-region DR&lt;/a>.&lt;/p>
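&lt;p>The acceptance criterion for this drill: you can record a failover timeline, a replication lag snapshot, client error samples, data validation queries, and an incident decision log entry. How failover is actually triggered varies across Patroni, managed PostgreSQL, and cloud platforms; the lab focuses on evidence.&lt;/p>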
&lt;h2 id="pre-failover-baseline">Pre-Failover Baseline&lt;/h2>
&lt;p>The pre-failover baseline confirms primary and standby state and the client route. Run the following on the primary:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_is_in_recovery&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_current_wal_lsn&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">application_name&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">state&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">sync_state&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">replay_lag&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_stat_replication&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>On the standby, run:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_is_in_recovery&lt;/span>&lt;span class="p">();&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_last_wal_receive_lsn&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_last_wal_replay_lsn&lt;/span>&lt;span class="p">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The baseline should record the primary host, standby host, replication lag, application connection string, pooler route, and current timeline.&lt;/p>
&lt;h2 id="client-workload">Client Workload&lt;/h2>
&lt;p>The client workload makes the impact of failover on the application visible.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">while&lt;/span> true&lt;span class="p">;&lt;/span> &lt;span class="k">do&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> date -u
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> -c &lt;span class="s2">&amp;#34;INSERT INTO restore_markers(marker) VALUES (&amp;#39;failover-drill&amp;#39;) RETURNING id, created_at;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> sleep &lt;span class="m">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="k">done&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>During failover this loop produces a mix of successes, timeouts, connection resets, and read-only errors. A formal drill should use a synthetic workload so that real users are not affected.&lt;/p>
&lt;h2 id="trigger-failover">Trigger Failover&lt;/h2>
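&lt;p>This step triggers promotion in a controlled way. A Patroni lab can use &lt;code>patronictl failover&lt;/code>, or &lt;code>patronictl switchover&lt;/code> for a planned handover while the primary is healthy; a managed service uses the provider's failover or reboot-with-failover feature. Record the trigger in a log entry such as:&lt;/p>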





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">failover_start_time:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">trigger_method:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">old_primary:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">candidate:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">operator:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">reason:&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Before triggering failover, confirm that this is a drill and that the workload, backups, rollback plan, and stakeholders are all aligned.&lt;/p>
&lt;h2 id="observe-promotion">Observe Promotion&lt;/h2>
&lt;p>This step records the timeline as seen by both the database and the client.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Milestone&lt;/th>
 &lt;th>Evidence&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Trigger issued&lt;/td>
 &lt;td>command / provider event&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Old primary down&lt;/td>
 &lt;td>connection error / health check&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>New primary promoted&lt;/td>
 &lt;td>&lt;code>pg_is_in_recovery() = false&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Client reconnect&lt;/td>
 &lt;td>first successful write&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Pooler stable&lt;/td>
 &lt;td>pool queue / server connection&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Validation complete&lt;/td>
 &lt;td>row count / marker sequence&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Keep second-level timestamps in the promotion timeline. They are the basis for evaluating RTO, client retry behavior, and pooler behavior.&lt;/p></description><content:encoded><![CDATA[<p>This PostgreSQL HA failover drill lets you observe how primary promotion affects the application, the pooler, and incident decisions. It follows on from <a href="../../patroni-ha/">Patroni HA</a> and <a href="../../cross-region-dr/">Cross-region DR</a>.</p>
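<p>The acceptance criterion for this drill: you can record a failover timeline, a replication lag snapshot, client error samples, data validation queries, and an incident decision log entry. How failover is actually triggered varies across Patroni, managed PostgreSQL, and cloud platforms; the lab focuses on evidence.</p>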
<h2 id="pre-failover-baseline">Pre-Failover Baseline</h2>
<p>The pre-failover baseline confirms primary and standby state and the client route. Run the following on the primary:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_is_in_recovery</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_current_wal_lsn</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">application_name</span><span class="p">,</span><span class="w"> </span><span class="k">state</span><span class="p">,</span><span class="w"> </span><span class="n">sync_state</span><span class="p">,</span><span class="w"> </span><span class="n">replay_lag</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stat_replication</span><span class="p">;</span></span></span></code></pre></div><p>On the standby, run:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_is_in_recovery</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">now</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_last_wal_receive_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_last_wal_replay_lsn</span><span class="p">();</span></span></span></code></pre></div><p>The baseline should record the primary host, standby host, replication lag, application connection string, pooler route, and current timeline.</p>
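<p>The two standby LSNs are easier to compare as a byte count. The sketch below, run against the standby, uses <code>pg_wal_lsn_diff</code> and <code>pg_last_xact_replay_timestamp</code>. <code>STANDBY_URL</code> is a hypothetical variable pointing at the standby; the original lab does not define it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Assumption: STANDBY_URL is a connection string for the standby node.
psql &#34;$STANDBY_URL&#34; -c &#34;
SELECT now() AS sampled_at,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_backlog_bytes,
       now() - pg_last_xact_replay_timestamp() AS time_since_last_replay;&#34;</code></pre></div>
<p>Note that <code>time_since_last_replay</code> also grows when the primary is idle, so read it together with the byte backlog rather than on its own.</p>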
<h2 id="client-workload">Client Workload</h2>
<p>The client workload makes the impact of failover on the application visible.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">while</span> true<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  date -u
</span></span><span class="line"><span class="ln">3</span><span class="cl">  psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> -c <span class="s2">&#34;INSERT INTO restore_markers(marker) VALUES (&#39;failover-drill&#39;) RETURNING id, created_at;&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  sleep <span class="m">1</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">done</span></span></span></code></pre></div><p>During failover this loop produces a mix of successes, timeouts, connection resets, and read-only errors. A formal drill should use a synthetic workload so that real users are not affected.</p>
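<p>For the timeline it helps if every attempt lands on one log line with a timestamp and an outcome. The following variant of the loop is a sketch under assumptions: the log file name <code>failover_client.log</code> and the three-second <code>PGCONNECT_TIMEOUT</code> are illustrative choices, and <code>restore_markers</code> is the table used by the loop in this section:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Fail fast on unreachable hosts instead of hanging (libpq environment variable).
export PGCONNECT_TIMEOUT=3

while true; do
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  if out=$(psql &#34;$DATABASE_URL&#34; -At -c &#34;INSERT INTO restore_markers(marker) VALUES (&#39;failover-drill&#39;) RETURNING id;&#34; 2&gt;&amp;1); then
    # First output line is the returned id.
    echo &#34;$ts ok $(echo &#34;$out&#34; | head -n 1)&#34;
  else
    # First output line is the error text from psql.
    echo &#34;$ts error $(echo &#34;$out&#34; | head -n 1)&#34;
  fi
  sleep 1
done | tee failover_client.log</code></pre></div>
<p>Stop the loop with Ctrl-C once validation is complete. The first <code>error</code> line and the first <code>ok</code> line after it bracket the outage as the client saw it.</p>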
<h2 id="trigger-failover">Trigger Failover</h2>
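<p>This step triggers promotion in a controlled way. A Patroni lab can use <code>patronictl failover</code>, or <code>patronictl switchover</code> for a planned handover while the primary is healthy; a managed service uses the provider's failover or reboot-with-failover feature. Record the trigger in a log entry such as:</p>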





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">failover_start_time:
</span></span><span class="line"><span class="ln">2</span><span class="cl">trigger_method:
</span></span><span class="line"><span class="ln">3</span><span class="cl">old_primary:
</span></span><span class="line"><span class="ln">4</span><span class="cl">candidate:
</span></span><span class="line"><span class="ln">5</span><span class="cl">operator:
</span></span><span class="line"><span class="ln">6</span><span class="cl">reason:</span></span></code></pre></div><p>Before triggering failover, confirm that this is a drill and that the workload, backups, rollback plan, and stakeholders are all aligned.</p>
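<p>In a Patroni lab, a planned handover might look like the sketch below. <code>PATRONI_CONFIG</code> is a hypothetical variable holding the path to the lab's Patroni configuration file; run without further options, <code>patronictl switchover</code> asks interactively for the details it needs, so no cluster or node names are assumed here:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Assumption: PATRONI_CONFIG points at the lab&#39;s Patroni configuration file.
date -u +%Y-%m-%dT%H:%M:%SZ                   # failover_start_time for the log entry
patronictl -c &#34;$PATRONI_CONFIG&#34; list          # note the current leader and replicas
patronictl -c &#34;$PATRONI_CONFIG&#34; switchover    # interactive; choose the candidate
patronictl -c &#34;$PATRONI_CONFIG&#34; list          # confirm the new leader</code></pre></div>
<p>Copy the leader and candidate shown by the first <code>list</code> call into the <code>old_primary</code> and <code>candidate</code> fields of the log entry.</p>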
<h2 id="observe-promotion">Observe Promotion</h2>
<p>This step records the timeline as seen by both the database and the client.</p>
<table>
  <thead>
      <tr>
          <th>Milestone</th>
          <th>Evidence</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Trigger issued</td>
          <td>command / provider event</td>
      </tr>
      <tr>
          <td>Old primary down</td>
          <td>connection error / health check</td>
      </tr>
      <tr>
          <td>New primary promoted</td>
          <td><code>pg_is_in_recovery() = false</code></td>
      </tr>
      <tr>
          <td>Client reconnect</td>
          <td>first successful write</td>
      </tr>
      <tr>
          <td>Pooler stable</td>
          <td>pool queue / server connection</td>
      </tr>
      <tr>
          <td>Validation complete</td>
          <td>row count / marker sequence</td>
      </tr>
  </tbody>
</table>
<p>The promotion timeline must keep second-level timestamps. They are the basis for evaluating RTO, client retry, and pooler behavior.</p>
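<p>One way to obtain the "New primary promoted" timestamp is to poll the candidate once per second and log the result. This is a sketch under assumptions: <code>NEW_PRIMARY_URL</code> is a hypothetical variable holding the candidate's connection string, and <code>promotion-timeline.log</code> is an arbitrary file name.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Assumption: NEW_PRIMARY_URL (hypothetical) points at the promotion candidate.
while true; do
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # -At prints the bare value: t while in recovery, f once promoted.
  state="$(PGCONNECT_TIMEOUT=2 psql "$NEW_PRIMARY_URL" -At -c "SELECT pg_is_in_recovery();" 2&gt;/dev/null || echo unreachable)"
  echo "$ts in_recovery=$state" | tee -a promotion-timeline.log
  if [ "$state" = "f" ]; then break; fi
  sleep 1
done</code></pre></div><p>The first log line with <code>in_recovery=f</code> marks the promotion. Comparing it with the trigger time and with the first successful write from the client loop gives the observed RTO.</p>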
<h2 id="data-validation">Data Validation</h2>
<p>The core responsibility of Data validation is to confirm data consistency after failover.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">restore_markers</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">marker</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;failover-drill&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">max</span><span class="p">(</span><span class="n">created_at</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">restore_markers</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">status</span><span class="p">,</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">status</span><span class="p">;</span></span></span></code></pre></div><p>If the workload uses idempotency keys, also check for duplicates. If external side effects such as payments or queues take part in the transaction, a reconciliation query is required.</p>
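<p>The duplicate check and the write-outage window can be sketched as follows. It assumes the <code>ledger_entries</code> table from the Local Lab Quickstart exists alongside <code>restore_markers</code>; adapt the table and column names to your own workload.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">psql "$DATABASE_URL" &lt;&lt;'SQL'
-- Duplicate idempotency keys. With the UNIQUE constraint in place this
-- should return no rows; any row here means a retry was applied twice.
SELECT idempotency_key, count(*)
FROM ledger_entries
GROUP BY idempotency_key
HAVING count(*) &gt; 1;

-- Largest pauses between consecutive drill markers. The loop writes about
-- once per second, so the biggest gap approximates the write outage.
SELECT id, created_at,
       created_at - lag(created_at) OVER (ORDER BY id) AS gap_since_previous
FROM restore_markers
WHERE marker = 'failover-drill'
ORDER BY gap_since_previous DESC NULLS LAST
LIMIT 5;
SQL</code></pre></div><p>Every <code>id</code> that the client loop printed through <code>RETURNING</code> should still exist after promotion. An acknowledged id that is missing is evidence of data loss and belongs in the decision log.</p>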
<h2 id="pooler-and-client-behavior">Pooler and Client Behavior</h2>
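<p>The core responsibility of Pooler and client behavior is to confirm that connections are redirected to the new primary after failover.</p>
<p>Checklist:</p>
<ol>
<li>Whether application retries use backoff / jitter.</li>
<li>Whether PgBouncer / the proxy drops stale server connections.</li>
<li>Whether the DNS / endpoint TTL fits the RTO.</li>
<li>Whether read-only errors are classified correctly.</li>
<li>Whether migrations / background jobs are paused.</li>
</ol>
<p>Failover is complete only when the database has been promoted, clients have reconnected, and the pooler is stable. If clients stay connected to the old primary for a long time, or the pooler is stuck, the service is still unavailable.</p>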
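<p>With PgBouncer, the pooler side of this checklist can be inspected from the admin console. The following is a sketch under assumptions: PgBouncer listens on <code>localhost:6432</code>, <code>lab_admin</code> is listed in <code>admin_users</code>, and the installed release supports <code>RECONNECT</code>; <code>PGBOUNCER_ADMIN_URL</code> is a hypothetical variable name.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Assumptions: port 6432, lab_admin in admin_users, virtual database "pgbouncer".
export PGBOUNCER_ADMIN_URL="postgres://lab_admin:lab_admin_pw@localhost:6432/pgbouncer"

# Queue depth per pool: cl_waiting should drop back to 0 once the pooler is stable.
psql "$PGBOUNCER_ADMIN_URL" -c "SHOW POOLS;"

# Server-side connections: addr should point at the new primary.
psql "$PGBOUNCER_ADMIN_URL" -c "SHOW SERVERS;"

# If stale connections to the old primary remain, ask PgBouncer to close
# server connections as they are released and open fresh ones.
psql "$PGBOUNCER_ADMIN_URL" -c "RECONNECT;"</code></pre></div><p>Saving the <code>SHOW POOLS</code> output before and after the failover gives the "Pooler stable" row of the timeline a concrete artifact.</p>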
<h2 id="incident-decision-log">Incident Decision Log</h2>
<p>The core responsibility of the incident decision log is to turn the drill into a reviewable record.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Incident / drill id:
</span></span><span class="line"><span class="ln">2</span><span class="cl">Decision: promote standby
</span></span><span class="line"><span class="ln">3</span><span class="cl">Reason:
</span></span><span class="line"><span class="ln">4</span><span class="cl">Accepted data loss:
</span></span><span class="line"><span class="ln">5</span><span class="cl">RTO observed:
</span></span><span class="line"><span class="ln">6</span><span class="cl">Client impact:
</span></span><span class="line"><span class="ln">7</span><span class="cl">Validation result:
</span></span><span class="line"><span class="ln">8</span><span class="cl">Follow-up:</span></span></code></pre></div><p>Every drill must produce follow-ups. Common follow-ups are tuning retries, lowering the DNS TTL, adding pooler commands to the runbook, adding validation queries, or improving monitoring.</p>
<h2 id="下一步路由">Next Steps</h2>
<p>After finishing this lab, read <a href="../../patroni-ha/">Patroni HA</a> for HA architecture, <a href="../../cross-region-dr/">Cross-region DR</a> for cross-region disaster recovery, and <a href="../connection-pool-lab/">Connection Pool Lab</a> for connection retry and pooler behavior.</p>
]]></content:encoded></item><item><title>PostgreSQL Local Lab Quickstart</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/local-lab-quickstart/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/local-lab-quickstart/</guid><description>&lt;p>The core responsibility of the PostgreSQL local lab quickstart is to build the local environment shared by the later connection, migration, backup, and failover drills. This lab provides a rebuildable PostgreSQL instance, an app-facing user, a baseline schema, seed data, and basic evidence.&lt;/p>
&lt;p>The acceptance criteria for this lab: you can start a local PostgreSQL, apply the schema, run the sample workload, capture &lt;code>pg_stat_activity&lt;/code> / &lt;code>pg_stat_database&lt;/code> snapshots, and finally tear down and rebuild.&lt;/p>
&lt;h2 id="docker-compose">Docker Compose&lt;/h2>
&lt;p>The core responsibility of Docker Compose is to make the lab environment rebuildable. Create &lt;code>docker-compose.yml&lt;/code>:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="nt">services&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">postgres&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">postgres:16&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">environment&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POSTGRES_USER&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POSTGRES_PASSWORD&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lab_admin_pw&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">POSTGRES_DB&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">appdb&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;54329:5432&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">command&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;postgres&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;-c&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;log_min_duration_statement=100&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;-c&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;shared_preload_libraries=pg_stat_statements&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>啟動：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">docker compose up -d
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="nb">export&lt;/span> &lt;span class="nv">DATABASE_URL&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="baseline-schema">Baseline Schema&lt;/h2>
&lt;p>The core responsibility of the baseline schema is to establish a data model against which transactions, indexes, locks, and migrations can be tested.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE accounts (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s"> id bigserial PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s"> tenant_id uuid NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s"> owner_name text NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s"> status text NOT NULL CHECK (status IN (&amp;#39;active&amp;#39;, &amp;#39;closed&amp;#39;)),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at timestamptz NOT NULL DEFAULT now()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE ledger_entries (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="s"> id bigserial PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="s"> account_id bigint NOT NULL REFERENCES accounts(id),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="s"> amount_cents bigint NOT NULL CHECK (amount_cents &amp;lt;&amp;gt; 0),
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="s"> idempotency_key text NOT NULL UNIQUE,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at timestamptz NOT NULL DEFAULT now()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="s">);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE INDEX idx_ledger_entries_account_created
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="s">ON ledger_entries(account_id, created_at DESC);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這組 schema 後續可用於 migration、lock、PITR 與 pool lab。&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL local lab quickstart 的核心責任是建立後續 connection、migration、backup 與 failover 演練共用的本地環境。這個 lab 提供一個可重建的 PostgreSQL instance、app-facing user、baseline schema、seed data 與 basic evidence。</p>
<p>本文的驗收標準是：你能啟動本地 PostgreSQL，套用 schema，跑 sample workload，取得 <code>pg_stat_activity</code> / <code>pg_stat_database</code> snapshot，最後 teardown 並重建。</p>
<h2 id="docker-compose">Docker Compose</h2>
<p>Docker Compose 的核心責任是讓 lab 環境可重建。建立 <code>docker-compose.yml</code>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">  </span><span class="nt">postgres</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">postgres:16</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">lab_admin_pw</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l">appdb</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;54329:5432&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;postgres&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;-c&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;log_min_duration_statement=100&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;-c&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;shared_preload_libraries=pg_stat_statements&#34;</span></span></span></code></pre></div><p>Start it:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose up -d
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">export</span> <span class="nv">DATABASE_URL</span><span class="o">=</span><span class="s2">&#34;postgres://lab_admin:lab_admin_pw@localhost:54329/appdb?sslmode=disable&#34;</span></span></span></code></pre></div><h2 id="baseline-schema">Baseline Schema</h2>
<p>The core responsibility of the baseline schema is to establish a data model against which transactions, indexes, locks, and migrations can be tested.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">CREATE TABLE accounts (
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">  id bigserial PRIMARY KEY,
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">  tenant_id uuid NOT NULL,
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">  owner_name text NOT NULL,
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">  status text NOT NULL CHECK (status IN (&#39;active&#39;, &#39;closed&#39;)),
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">  created_at timestamptz NOT NULL DEFAULT now()
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">);
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">CREATE TABLE ledger_entries (
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">  id bigserial PRIMARY KEY,
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">  account_id bigint NOT NULL REFERENCES accounts(id),
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">  amount_cents bigint NOT NULL CHECK (amount_cents &lt;&gt; 0),
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s">  idempotency_key text NOT NULL UNIQUE,
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s">  created_at timestamptz NOT NULL DEFAULT now()
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s">);
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="s">CREATE INDEX idx_ledger_entries_account_created
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="s">ON ledger_entries(account_id, created_at DESC);
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>This schema is reused by the later migration, lock, PITR, and pool labs.</p>
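<p>The introduction lists an app-facing user, while the Compose file only creates <code>lab_admin</code>. One way to add such a user is sketched below. The role name <code>lab_app</code>, its password, and the exact privilege set are assumptions for illustration, not part of the original lab.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">psql "$DATABASE_URL" &lt;&lt;'SQL'
-- lab_app and its password are hypothetical names for this sketch.
CREATE ROLE lab_app LOGIN PASSWORD 'lab_app_pw';

-- Least privilege for the sample workload: connect, read, and write rows,
-- but no DDL and no ownership of the tables.
GRANT CONNECT ON DATABASE appdb TO lab_app;
GRANT USAGE ON SCHEMA public TO lab_app;
GRANT SELECT, INSERT, UPDATE ON accounts, ledger_entries TO lab_app;

-- bigserial columns draw from sequences, so INSERT also needs sequence access.
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO lab_app;
SQL</code></pre></div><p>Running the workload as <code>lab_app</code> and schema changes as <code>lab_admin</code> keeps application sessions distinguishable from operator sessions in <code>pg_stat_activity</code>.</p>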
<h2 id="seed-and-workload">Seed and Workload</h2>
<p>The core responsibility of Seed and workload is to produce observable data and queries.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">INSERT INTO accounts(tenant_id, owner_name, status)
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">VALUES
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">  (&#39;00000000-0000-0000-0000-000000000001&#39;, &#39;Ada&#39;, &#39;active&#39;),
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">  (&#39;00000000-0000-0000-0000-000000000002&#39;, &#39;Lin&#39;, &#39;active&#39;);
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">INSERT INTO ledger_entries(account_id, amount_cents, idempotency_key)
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">SELECT 1, 100, &#39;seed-ada-&#39; || g
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">FROM generate_series(1, 100) AS g;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">SELECT a.owner_name, SUM(l.amount_cents) AS balance_cents
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">FROM accounts a
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">JOIN ledger_entries l ON l.account_id = a.id
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">GROUP BY a.owner_name;
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Keep both the SQL and the output of the sample workload as the baseline for later migration / restore validation.</p>
<h2 id="basic-evidence">Basic Evidence</h2>
<p>The core responsibility of Basic evidence is to save the lab state as comparable snapshots.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">SELECT current_database(), current_user, version();
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">SELECT datname, numbackends, xact_commit, xact_rollback
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">FROM pg_stat_database
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">WHERE datname = current_database();
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">SELECT pid, state, wait_event_type, query
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">FROM pg_stat_activity
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">WHERE datname = current_database();
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>These queries are the minimum evidence for a PostgreSQL lab. A production service should also add slow queries, lock waits, replica lag, backup status, and pooler metrics.</p>
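<p>To make snapshots comparable, each run can be written to its own timestamped file. This is a minimal sketch; the <code>evidence/</code> directory and the file naming are assumptions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">mkdir -p evidence
snapshot="evidence/snapshot-$(date -u +%Y%m%dT%H%M%SZ).txt"

# -X skips ~/.psqlrc so the output format is the same on every machine.
psql "$DATABASE_URL" -X &lt;&lt;'SQL' | tee "$snapshot"
SELECT now() AS captured_at;
SELECT datname, numbackends, xact_commit, xact_rollback
FROM pg_stat_database
WHERE datname = current_database();
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;
SQL

echo "saved $snapshot"</code></pre></div><p>Taking one snapshot before and one after a workload, then running <code>diff</code> on the two files, shows how <code>xact_commit</code> and the backend count moved.</p>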
<h2 id="teardown">Teardown</h2>
<p>The core responsibility of Teardown is to make the lab rerunnable.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose down -v</span></span></code></pre></div><p>After a rebuild you should be able to reapply the schema and seed. If the lab needs to carry data across chapters, save a fixture with <code>pg_dump</code> before tearing down.</p>
<h2 id="下一步路由">Next Steps</h2>
<p>After finishing this lab, continue to <a href="../connection-pool-lab/">Connection Pool Lab</a> for connection pressure, <a href="../schema-migration-evidence-lab/">Schema Migration Evidence Lab</a> for migration evidence, and <a href="../pitr-restore-drill/">PITR Restore Drill</a> for backup / PITR.</p>
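<p>Saving and reloading a fixture could look like the sketch below. It assumes the Compose service is named <code>postgres</code> as in the file above; the <code>fixtures/appdb.dump</code> path is an arbitrary choice.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">mkdir -p fixtures

# Dump from inside the container so the pg_dump version matches the server.
docker compose exec -T postgres pg_dump -U lab_admin -d appdb --format=custom &gt; fixtures/appdb.dump

docker compose down -v
docker compose up -d

# Wait until the server accepts TCP connections, then load the fixture
# instead of reapplying the schema and seed by hand.
until docker compose exec -T postgres pg_isready -h localhost -U lab_admin -d appdb; do sleep 1; done
docker compose exec -T postgres pg_restore -U lab_admin -d appdb --no-owner &lt; fixtures/appdb.dump</code></pre></div><p>The custom format keeps the dump compressed and lets <code>pg_restore</code> load it selectively. Restore into the freshly created, empty <code>appdb</code>, so that existing tables do not conflict with the fixture.</p>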
]]></content:encoded></item><item><title>PostgreSQL PITR Restore Drill</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/pitr-restore-drill/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/pitr-restore-drill/</guid><description>&lt;p>The core responsibility of the PostgreSQL PITR restore drill is to prove that a backup can be restored to a specified point in time. This lab builds on &lt;a href="../../pitr-wal-archiving/">PITR + WAL Archiving&lt;/a> and moves the backup from merely existing to being backed by recovery evidence.&lt;/p>
&lt;p>The acceptance criteria for this lab: you can record the base backup time, target time, restore duration, validation queries, and RPO / RTO notes. The actual commands vary with pgBackRest, Barman, cloud snapshots, or a managed service; this lab provides a vendor-neutral drill frame.&lt;/p>
&lt;h2 id="prepare-recovery-point">Prepare Recovery Point&lt;/h2>
&lt;p>The core responsibility of Prepare recovery point is to create an identifiable transaction. First write a marker row and record the time.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="s">CREATE TABLE IF NOT EXISTS restore_markers (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="s"> id bigserial PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="s"> marker text NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="s"> created_at timestamptz NOT NULL DEFAULT clock_timestamp()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="s">);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO restore_markers(marker) VALUES (&amp;#39;before-bad-change&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT id, marker, created_at FROM restore_markers ORDER BY id DESC LIMIT 1;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>把 &lt;code>created_at&lt;/code> 記為 target time。正式 drill 要用 UTC，並記錄 timezone、operator、backup set 與 WAL archive status。&lt;/p>
&lt;h2 id="create-bad-change">Create Bad Change&lt;/h2>
&lt;p>The core responsibility of Create bad change is to simulate a mistake that requires PITR.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">INSERT INTO restore_markers(marker) VALUES (&amp;#39;bad-change-after-target&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">UPDATE accounts SET status = &amp;#39;closed&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT status, count(*) FROM accounts GROUP BY status;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>這一步在 lab 中代表誤操作。Production 事故中，bad change 可能是誤刪、錯誤 batch、壞 migration 或 application bug。&lt;/p>
&lt;h2 id="restore-workflow">Restore Workflow&lt;/h2>
&lt;p>The core responsibility of Restore workflow is to turn backup-tool operations into a fixed set of evidence. Commands differ between tools, but the flow is the same:&lt;/p>
&lt;ol>
&lt;li>Select the base backup.&lt;/li>
&lt;li>Set the recovery target time.&lt;/li>
&lt;li>Replay WAL up to the target time.&lt;/li>
&lt;li>Promote the restored instance.&lt;/li>
&lt;li>Run the validation queries.&lt;/li>
&lt;li>Run the application smoke test.&lt;/li>
&lt;/ol>
&lt;p>Example pseudo-runbook:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">restore_target_time = 2026-05-21T10:15:30Z
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">base_backup = latest backup before target
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">wal_archive = available through target
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">restore_path = isolated environment&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Restore 必須在隔離環境先完成。直接覆蓋 production 會讓 evidence 與 rollback 空間消失。&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL PITR restore drill 的核心責任是證明 backup 可以還原到指定時間點。這篇承接 <a href="../../pitr-wal-archiving/">PITR + WAL Archiving</a>，把備份從存在狀態推進到可恢復證據。</p>
<p>本文的驗收標準是：你能記錄 base backup 時間、target time、restore duration、validation query 與 RPO / RTO note。實際命令會依 pgBackRest、Barman、cloud snapshot 或 managed service 而變；本文提供 vendor-neutral drill frame。</p>
<h2 id="prepare-recovery-point">Prepare Recovery Point</h2>
<p>Prepare recovery point 的核心責任是建立可辨識 transaction。先寫入一筆 marker，記錄時間。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="s">CREATE TABLE IF NOT EXISTS restore_markers (
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">  id bigserial PRIMARY KEY,
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">  marker text NOT NULL,
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">);
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">INSERT INTO restore_markers(marker) VALUES (&#39;before-bad-change&#39;);
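</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">SELECT id, marker, created_at, clock_timestamp() AS target_time FROM restore_markers ORDER BY id DESC LIMIT 1;
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Record <code>target_time</code> as the target time rather than <code>created_at</code>. <code>created_at</code> is taken while the INSERT runs, before the marker commits, so recovering to exactly that instant would stop before the marker's commit and leave the marker out. <code>target_time</code> is read in a later statement, after the marker has committed. In a formal drill, use UTC and record the timezone, operator, backup set, and WAL archive status.</p>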
<h2 id="create-bad-change">Create Bad Change</h2>
<p>The purpose of this step is to simulate a mistake that requires PITR.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">INSERT INTO restore_markers(marker) VALUES (&#39;bad-change-after-target&#39;);
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">UPDATE accounts SET status = &#39;closed&#39;;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SELECT status, count(*) FROM accounts GROUP BY status;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>In the lab, this step stands in for an operator error. In a production incident, the bad change could be an accidental delete, a faulty batch job, a bad migration, or an application bug.</p>
<h2 id="restore-workflow">Restore Workflow</h2>
<p>The purpose of the restore workflow is to turn backup-tool operations into a fixed set of evidence. Commands differ between tools, but the flow is the same:</p>
<ol>
<li>Select the base backup.</li>
<li>Set the recovery target time.</li>
<li>Replay WAL up to the target time.</li>
<li>Promote the restored instance.</li>
<li>Run the validation queries.</li>
<li>Run the application smoke test.</li>
</ol>
<p>Example pseudo-runbook:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">restore_target_time = 2026-05-21T10:15:30Z
</span></span><span class="line"><span class="ln">2</span><span class="cl">base_backup = latest backup before target
</span></span><span class="line"><span class="ln">3</span><span class="cl">wal_archive = available through target
</span></span><span class="line"><span class="ln">4</span><span class="cl">restore_path = isolated environment</span></span></code></pre></div><p>Complete the restore in an isolated environment first. Overwriting production directly destroys the evidence and leaves no room for rollback.</p>
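<p>As one hedged illustration of steps 2 and 3 for a self-managed restore on PostgreSQL 12 or later, the sketch below appends recovery settings to a restored data directory and creates <code>recovery.signal</code>. The function name, its arguments, and the <code>cp</code>-based <code>restore_command</code> are assumptions for illustration; pgBackRest, Barman, and managed services generate or replace these settings themselves.</p>

```shell
# Sketch only: write PITR settings into a restored (not yet started) data directory.
# write_recovery_config, its arguments, and the cp-based restore_command are
# illustrative assumptions, not part of any backup tool.
write_recovery_config() {
  pgdata="$1"        # restored data directory in the isolated environment
  target_time="$2"   # e.g. 2026-05-21T10:15:30Z
  wal_archive="$3"   # directory holding archived WAL segments

  {
    printf "restore_command = 'cp %s/%%f %%p'\n" "$wal_archive"
    printf "recovery_target_time = '%s'\n" "$target_time"
    printf "recovery_target_action = 'pause'\n"
  } >> "$pgdata/postgresql.auto.conf"

  # recovery.signal tells PostgreSQL 12+ to start in targeted recovery mode.
  touch "$pgdata/recovery.signal"
}
```

<p>With <code>recovery_target_action = 'pause'</code>, the instance stops replay at the target and stays read-only, so the validation queries can run before recovery is ended with <code>pg_wal_replay_resume()</code>.</p>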
<h2 id="validation-query">Validation Query</h2>
<p>The purpose of the validation query is to confirm that the restore reached the correct point in time.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$RESTORED_DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SELECT marker, created_at
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">FROM restore_markers
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">ORDER BY id;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SELECT status, count(*)
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">FROM accounts
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">GROUP BY status;
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Expected result: <code>before-bad-change</code> is present and <code>bad-change-after-target</code> is absent. The status distribution in <code>accounts</code> should match what it was before the target time.</p>
<h2 id="rpo--rto-evidence">RPO / RTO Evidence</h2>
<p>The purpose of RPO / RTO evidence is to translate the drill results into service-level terms.</p>
<table>
  <thead>
      <tr>
          <th>Evidence</th>
          <th>What to record</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Backup timestamp</td>
          <td>Which base backup was used</td>
      </tr>
      <tr>
          <td>Target time</td>
          <td>The exact second to recover to</td>
      </tr>
      <tr>
          <td>WAL availability</td>
          <td>Whether WAL is complete around the target time</td>
      </tr>
      <tr>
          <td>Restore duration</td>
          <td>Time from starting the restore to a successful validation</td>
      </tr>
      <tr>
          <td>Data gap</td>
          <td>Transactions after the target time that need compensation</td>
      </tr>
      <tr>
          <td>Smoke test</td>
          <td>Whether the application's core workflows work</td>
      </tr>
  </tbody>
</table>
<p>PITR succeeds only when both the data and the application are usable. Getting PostgreSQL to start is not enough to deliver the service.</p>
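<p>The restore duration row can be checked against an RTO budget with a few lines of shell. This is a minimal sketch: <code>check_rto</code> and its arguments are hypothetical names, and the timestamps are assumed to be epoch seconds captured with <code>date -u +%s</code> when the restore starts and when validation first succeeds.</p>

```shell
# Sketch only: compare a measured restore duration with an RTO budget.
# check_rto and its arguments are illustrative names, not part of any tool.
check_rto() {
  restore_started="$1"   # epoch seconds when the restore began
  validation_ok="$2"     # epoch seconds when the validation query first succeeded
  rto_budget_s="$3"      # agreed RTO in seconds

  duration_s=$((validation_ok - restore_started))
  if [ "$duration_s" -le "$rto_budget_s" ]; then
    echo "restore_duration_s=$duration_s rto=met"
  else
    echo "restore_duration_s=$duration_s rto=exceeded"
  fi
}
```

<p>The printed line can be pasted into the drill record next to the target time and backup timestamp.</p>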
<h2 id="drill-retrospective">Drill Retrospective</h2>
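<p>The purpose of the drill retrospective is to turn gaps found in the drill into next steps.</p>
<p>Common gaps:</p>
<ol>
<li>The correct base backup cannot be found.</li>
<li>The WAL archive has missing segments.</li>
<li>The timezone of the target time is ambiguous.</li>
<li>The restore is too slow and exceeds the RTO.</li>
<li>Application secrets / config do not point to the restored DB.</li>
<li>The validation queries lack business invariants.</li>
</ol>
<p>After this lab, read <a href="../../cross-region-dr/">Cross-region DR</a> for cross-region recovery and <a href="../../pitr-wal-archiving/">PITR + WAL Archiving</a> for backup strategy.</p>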
]]></content:encoded></item><item><title>PostgreSQL Schema Migration Evidence Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/schema-migration-evidence-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/schema-migration-evidence-lab/</guid><description>&lt;p>The purpose of the PostgreSQL schema migration evidence lab is to turn a schema change into evidence that a release gate can use. This lab follows &lt;a href="../../online-schema-change/">Online Schema Change&lt;/a> and the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">Database Migration Playbook&lt;/a>.&lt;/p>
&lt;p>The acceptance criterion for this lab: you can design an expand migration, measure locks, run backfill validation, and define fail-forward / rollback criteria for the contract migration.&lt;/p>
&lt;h2 id="expand-migration">Expand Migration&lt;/h2>
&lt;p>The purpose of the expand migration is to add backward-compatible schema first. The example below adds &lt;code>accounts.email&lt;/code> and allows nulls initially.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">\timing on
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">BEGIN;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts ADD COLUMN email text;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">COMMIT;
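&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Adding a nullable column is usually low risk: it is a metadata-only change, though it still takes a brief &lt;code>ACCESS EXCLUSIVE&lt;/code> lock and can queue behind long-running transactions. Record timing and locks anyway. For a production service, test on staging first or run it in a low-traffic window.&lt;/p>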
&lt;h2 id="lock-evidence">Lock Evidence&lt;/h2>
&lt;p>The purpose of lock evidence is to make the migration's blocking risk visible. Open another terminal and query locks before, during, and after the migration.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT locktype, relation::regclass, mode, granted, pid
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">FROM pg_locks
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">WHERE relation IN (&amp;#39;accounts&amp;#39;::regclass, &amp;#39;ledger_entries&amp;#39;::regclass)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">ORDER BY granted, mode;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The release gate should retain the lock mode, duration, blocked sessions, and application impact. High-risk DDL should first be restructured into expand / backfill / contract steps.&lt;/p>
&lt;h2 id="backfill-and-validation">Backfill and Validation&lt;/h2>
&lt;p>The purpose of backfill and validation is to fill in the data and prove that the result satisfies the domain rules.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">UPDATE accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">SET email = lower(owner_name) || &amp;#39;@example.test&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">WHERE email IS NULL;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT count(*) AS missing_email
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="s">FROM accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="s">WHERE email IS NULL;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">9&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Backfill large tables in batches to limit pressure on WAL, replica lag, autovacuum, and locks. For each batch, record the row count, duration, errors, and lag.&lt;/p>
&lt;h2 id="add-constraint-safely">Add Constraint Safely&lt;/h2>
&lt;p>The purpose of this step is to separate validating existing data from putting the constraint into effect.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">ADD CONSTRAINT accounts_email_present
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">CHECK (email IS NOT NULL) NOT VALID;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">&lt;span class="s">VALIDATE CONSTRAINT accounts_email_present;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>NOT VALID&lt;/code> makes the constraint apply to new writes first; &lt;code>VALIDATE CONSTRAINT&lt;/code> then scans the existing rows without blocking normal reads and writes. This is a common technique for online migrations in PostgreSQL.&lt;/p></description><content:encoded><![CDATA[<p>The purpose of the PostgreSQL schema migration evidence lab is to turn a schema change into evidence that a release gate can use. This lab follows <a href="../../online-schema-change/">Online Schema Change</a> and the <a href="/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">Database Migration Playbook</a>.</p>
<p>The acceptance criterion for this lab: you can design an expand migration, measure locks, run backfill validation, and define fail-forward / rollback criteria for the contract migration.</p>
<h2 id="expand-migration">Expand Migration</h2>
<p>The purpose of the expand migration is to add backward-compatible schema first. The example below adds <code>accounts.email</code> and allows nulls initially.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">\timing on
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">BEGIN;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">ALTER TABLE accounts ADD COLUMN email text;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">COMMIT;
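</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Adding a nullable column is usually low risk: it is a metadata-only change, though it still takes a brief <code>ACCESS EXCLUSIVE</code> lock and can queue behind long-running transactions. Record timing and locks anyway. For a production service, test on staging first or run it in a low-traffic window.</p>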
<h2 id="lock-evidence">Lock Evidence</h2>
<p>The purpose of lock evidence is to make the migration's blocking risk visible. Open another terminal and query locks before, during, and after the migration.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SELECT locktype, relation::regclass, mode, granted, pid
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">FROM pg_locks
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">WHERE relation IN (&#39;accounts&#39;::regclass, &#39;ledger_entries&#39;::regclass)
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">ORDER BY granted, mode;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>The release gate should retain the lock mode, duration, blocked sessions, and application impact. High-risk DDL should first be restructured into expand / backfill / contract steps.</p>
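<p>One common way to bound the blocking risk is to run DDL under a <code>lock_timeout</code>, so the statement gives up instead of queuing behind a long transaction and stalling other sessions. The sketch below only prints the SQL; <code>ddl_with_lock_timeout</code> is a hypothetical helper name and the timeout value is an assumption to tune per service.</p>

```shell
# Sketch only: wrap one DDL statement in a transaction with a lock timeout.
# ddl_with_lock_timeout is an illustrative helper, not a psql feature.
ddl_with_lock_timeout() {
  timeout="$1"   # e.g. 2s
  ddl="$2"       # a single DDL statement without the trailing semicolon
  printf "BEGIN;\nSET LOCAL lock_timeout = '%s';\n%s;\nCOMMIT;\n" "$timeout" "$ddl"
}
```

<p>A possible invocation is <code>ddl_with_lock_timeout 2s "ALTER TABLE accounts ADD COLUMN email text" | psql -X -v ON_ERROR_STOP=1 "$DATABASE_URL"</code>. If the lock is not acquired in time, the statement fails, psql stops, and the failure itself becomes lock evidence for the release gate.</p>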
<h2 id="backfill-and-validation">Backfill and Validation</h2>
<p>The purpose of backfill and validation is to fill in the data and prove that the result satisfies the domain rules.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">UPDATE accounts
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SET email = lower(owner_name) || &#39;@example.test&#39;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">WHERE email IS NULL;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SELECT count(*) AS missing_email
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">FROM accounts
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">WHERE email IS NULL;
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Backfill large tables in batches to limit pressure on WAL, replica lag, autovacuum, and locks. For each batch, record the row count, duration, errors, and lag.</p>
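<p>A batched backfill could be sketched as below. This assumes <code>accounts</code> has a primary key column named <code>id</code>, which the lab does not state, and <code>backfill_batch_sql</code> and <code>run_backfill</code> are hypothetical helper names. Rows with a null <code>owner_name</code> are skipped on purpose, because their email would stay null and the loop would never finish.</p>

```shell
# Sketch only: emit one backfill batch as SQL. Assumes accounts has a primary
# key named id, which the lab text does not state.
backfill_batch_sql() {
  batch_size="$1"
  printf "WITH batch AS (\n"
  printf "  UPDATE accounts\n"
  printf "  SET email = lower(owner_name) || '@example.test'\n"
  printf "  WHERE id IN (\n"
  printf "    SELECT id FROM accounts\n"
  printf "    WHERE email IS NULL AND owner_name IS NOT NULL\n"
  printf "    ORDER BY id LIMIT %d\n" "$batch_size"
  printf "  )\n"
  printf "  RETURNING 1\n"
  printf ")\n"
  printf "SELECT count(*) FROM batch;\n"
}

# Run batches until one updates zero rows; print the row count of each batch
# so it can be kept as evidence. Each psql call commits its own transaction.
run_backfill() {
  batch_size="$1"
  while :; do
    rows=$(psql -X -A -t -q -c "$(backfill_batch_sql "$batch_size")" "$DATABASE_URL") || return 1
    echo "batch_rows=$rows"
    [ "$rows" -gt 0 ] || break
    sleep 1   # pause so replicas and autovacuum can keep up
  done
}
```

<p>After <code>run_backfill 500</code> finishes, the <code>missing_email</code> count from the validation query shows any rows the batches skipped, such as accounts without an <code>owner_name</code>.</p>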
<h2 id="add-constraint-safely">Add Constraint Safely</h2>
<p>The purpose of this step is to separate validating existing data from putting the constraint into effect.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">ALTER TABLE accounts
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">ADD CONSTRAINT accounts_email_present
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">CHECK (email IS NOT NULL) NOT VALID;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">ALTER TABLE accounts
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="s">VALIDATE CONSTRAINT accounts_email_present;
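</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p><code>NOT VALID</code> makes the constraint apply to new writes first; <code>VALIDATE CONSTRAINT</code> then scans the existing rows without blocking normal reads and writes. This is a common technique for online migrations in PostgreSQL.</p>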
<h2 id="query-plan-evidence">Query Plan Evidence</h2>
<p>The purpose of query plan evidence is to confirm that queries still take the expected path after the migration.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">EXPLAIN (ANALYZE, BUFFERS)
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SELECT *
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">FROM accounts
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">WHERE email = &#39;ada@example.test&#39;;
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>If lookups by email become a production query path, add an index; build it with <code>CREATE INDEX CONCURRENTLY</code> and measure its lock impact and duration.</p>
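<p>A concurrent index build for the email lookup might look like the sketch below. The index name <code>accounts_email_idx</code> and the helper name are assumptions. The statements must run outside a transaction block, and the second statement checks <code>indisvalid</code> because an interrupted concurrent build leaves an invalid index behind.</p>

```shell
# Sketch only: emit SQL for a concurrent index build plus a validity check.
# accounts_email_idx and concurrent_index_sql are illustrative names.
concurrent_index_sql() {
  printf "CREATE INDEX CONCURRENTLY IF NOT EXISTS accounts_email_idx ON accounts (email);\n"
  printf "SELECT indexrelid::regclass, indisvalid FROM pg_index WHERE indexrelid = 'accounts_email_idx'::regclass;\n"
}
```

<p>Piping the output to <code>psql -X "$DATABASE_URL"</code> with <code>\timing on</code> gives the build duration; rerunning the <code>EXPLAIN</code> above afterwards shows whether the planner actually uses the index.</p>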
<h2 id="contract-migration">Contract Migration</h2>
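<p>The purpose of the contract migration is to retire old columns or old constraints once every application has switched to the new column. Treat a contract migration with more caution than an expand migration, because there is less room to roll back.</p>
<p>Contract release gate:</p>
<ol>
<li>Every app version has stopped reading the old column / relying on the old behavior.</li>
<li>Backfill validation shows zero gaps.</li>
<li>Query plan and index evidence has been saved.</li>
<li>The rollback path is either fail-forward or restore; pick one and write it down.</li>
<li>The PITR / backup window matches the risk.</li>
</ol>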
<h2 id="release-gate-note">Release Gate Note</h2>
<p>The purpose of the release gate note is to produce a deliverable artifact.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Migration: add accounts.email
</span></span><span class="line"><span class="ln">2</span><span class="cl">Expand DDL duration:
</span></span><span class="line"><span class="ln">3</span><span class="cl">Backfill rows:
</span></span><span class="line"><span class="ln">4</span><span class="cl">Validation query:
</span></span><span class="line"><span class="ln">5</span><span class="cl">Lock evidence:
</span></span><span class="line"><span class="ln">6</span><span class="cl">Query plan:
</span></span><span class="line"><span class="ln">7</span><span class="cl">Rollback / fail-forward:
</span></span><span class="line"><span class="ln">8</span><span class="cl">Owner:</span></span></code></pre></div><p>After this lab, return to <a href="../../online-schema-change/">Online Schema Change</a> for complex migrations; for cross-DB migration, read the <a href="/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">Database Migration Playbook</a>.</p>
]]></content:encoded></item></channel></rss>