<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Pgbouncer on Tarragon</title><link>https://tarrragon.github.io/blog/tags/pgbouncer/</link><description>Recent content in Pgbouncer on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Mon, 18 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/pgbouncer/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL pgBouncer 配置 + 連線池治理</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pgbouncer-config/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pgbouncer-config/</guid><description>&lt;p>PostgreSQL 的 connection 是 &lt;em>昂貴的 process&lt;/em>、每個 connection ~10MB RAM、idle connection 也吃 backend slot。當 application instance 數量爆炸（K8s replica × 多 deployment × pool size）、直接連 PostgreSQL 會把 backend slot 耗盡、新 connection 全 refuse — 即使 active query 不多。pgBouncer 是 &lt;em>connection pool proxy&lt;/em>、把幾千個 application connection 收斂成幾百個 PostgreSQL backend connection、production-grade PostgreSQL 部署的標配。&lt;/p>
&lt;p>本文不是 pgBouncer overview（請看 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor 頁&lt;/a> 中 connection pool 段）— 而是 &lt;em>production 部署 + 故障演練&lt;/em> 的實作層教學。覆蓋三層 pool（application → pgBouncer → PostgreSQL）的對齊、transaction pooling 跟 session pooling 的選擇陷阱、跟 HA failover 的整合、容量規劃。&lt;/p>
&lt;h2 id="問題情境">問題情境&lt;/h2>
&lt;p>典型觸發場景：團隊規模從 50 人爬到 200 人、microservice 從 20 個爬到 100 個、K8s replica 從 3 個爬到每服務 5-10 個。直連 PostgreSQL 的 connection 計算：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">100 service × 6 replica × 30 application pool = 18000 connection&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>PostgreSQL 預設 &lt;code>max_connections = 100&lt;/code>、production 設 &lt;code>max_connections = 500-1000&lt;/code> 已經是上限（每多一個都加 memory + context switch cost）。18000 連線打 PostgreSQL 直接打爆。&lt;/p>
&lt;p>進一步問題：&lt;/p>
&lt;ul>
&lt;li>一半 connection 是 &lt;em>idle&lt;/em>（application pool 預留、實際沒查詢）— 浪費 backend slot&lt;/li>
&lt;li>Cold start 時所有 replica 同時建 connection、瞬間 spike&lt;/li>
&lt;li>DB failover 時所有 application 同時 reconnect、prod-test pattern 跑不通&lt;/li>
&lt;li>DNS-based failover 時 application connection pool 不知道 backend 換了&lt;/li>
&lt;/ul>
&lt;p>pgBouncer 解這四個問題。但 &lt;em>引入 pgBouncer&lt;/em> 後又會引入新的問題層（pgBouncer 跟 application pool 不對齊、transaction pooling 的 session state 限制、HA 故障時 pgBouncer 也要 failover）— 本文討論這些。&lt;/p>
&lt;h2 id="核心概念pool-mode--sizing">核心概念：pool mode + sizing&lt;/h2>
&lt;p>pgBouncer 的 first-class concept 是 &lt;em>pool mode&lt;/em>、決定 application connection 跟 PostgreSQL backend connection 的綁定方式：&lt;/p></description><content:encoded><![CDATA[<p>PostgreSQL 的 connection 是 <em>昂貴的 process</em>、每個 connection ~10MB RAM、idle connection 也吃 backend slot。當 application instance 數量爆炸（K8s replica × 多 deployment × pool size）、直接連 PostgreSQL 會把 backend slot 耗盡、新 connection 全 refuse — 即使 active query 不多。pgBouncer 是 <em>connection pool proxy</em>、把幾千個 application connection 收斂成幾百個 PostgreSQL backend connection、production-grade PostgreSQL 部署的標配。</p>
<p>本文不是 pgBouncer overview（請看 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor 頁</a> 中 connection pool 段）— 而是 <em>production 部署 + 故障演練</em> 的實作層教學。覆蓋三層 pool（application → pgBouncer → PostgreSQL）的對齊、transaction pooling 跟 session pooling 的選擇陷阱、跟 HA failover 的整合、容量規劃。</p>
<h2 id="問題情境">問題情境</h2>
<p>典型觸發場景：團隊規模從 50 人爬到 200 人、microservice 從 20 個爬到 100 個、K8s replica 從 3 個爬到每服務 5-10 個。直連 PostgreSQL 的 connection 計算：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">100 service × 6 replica × 30 application pool = 18000 connection</span></span></code></pre></div><p>PostgreSQL 預設 <code>max_connections = 100</code>、production 設 <code>max_connections = 500-1000</code> 已經是上限（每多一個都加 memory + context switch cost）。18000 連線打 PostgreSQL 直接打爆。</p>
<p>進一步問題：</p>
<ul>
<li>一半 connection 是 <em>idle</em>（application pool 預留、實際沒查詢）— 浪費 backend slot</li>
<li>Cold start 時所有 replica 同時建 connection、瞬間 spike</li>
<li>DB failover 時所有 application 同時 reconnect、prod-test pattern 跑不通</li>
<li>DNS-based failover 時 application connection pool 不知道 backend 換了</li>
</ul>
<p>pgBouncer 解這四個問題。但 <em>引入 pgBouncer</em> 後又會引入新的問題層（pgBouncer 跟 application pool 不對齊、transaction pooling 的 session state 限制、HA 故障時 pgBouncer 也要 failover）— 本文討論這些。</p>
<h2 id="核心概念pool-mode--sizing">核心概念：pool mode + sizing</h2>
<p>pgBouncer 的 first-class concept 是 <em>pool mode</em>、決定 application connection 跟 PostgreSQL backend connection 的綁定方式：</p>
<ul>
<li><strong>Session pooling</strong>：application connection 拿到 backend connection 後、整個 session 期間都綁同一個 backend。tear-down 才釋放。語義跟「直連」一樣、不破壞 session state。但 <em>idle connection 仍占 backend slot</em>、收斂效率低、適合 <em>連線數不多但要保留 session state</em>（用了 prepared statement、temporary table、advisory lock 等）的場景。</li>
<li><strong>Transaction pooling</strong>：application connection 在 <em>transaction 邊界</em> 才綁 backend、commit / rollback 後立即釋放。同一個 application connection 不同 transaction 可能拿到不同 backend。收斂效率高（idle connection 完全不占 backend slot）、但 <em>session state 限制嚴</em> — 不能用 <code>SET</code> 改 session-level setting、不能用 prepared statement（除非 application 端禁用）、不能用 advisory lock 跨 transaction。</li>
<li><strong>Statement pooling</strong>：每個 statement 完就釋放 backend。極端高收斂但 <em>連 transaction 都不能跨 statement</em>、絕大多數 application 用不了、只在 batch query 場景。</li>
</ul>
<p><strong>Production 預設選 transaction pooling</strong>、application 端禁用 prepared statement（或用 <a href="https://www.pgbouncer.org/config.html#max_prepared_statements">PgBouncer-supported prepared statement</a>、需 pgBouncer 1.21+）。例外場景才開 session pooling。</p>
<p><strong>Pool sizing 公式</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">PostgreSQL max_connections     = pgBouncer N × default_pool_size + reserve
</span></span><span class="line"><span class="ln">2</span><span class="cl">pgBouncer default_pool_size    = per-database backend connection 上限
</span></span><span class="line"><span class="ln">3</span><span class="cl">Application pool size          = 每 application instance 拿幾個 pgBouncer connection</span></span></code></pre></div><p>實例：50 個 application replica、每 instance pool 30 個、pgBouncer 後 default_pool_size = 20（per database）、3 個 database。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Total application → pgBouncer = 50 × 30 = 1500 connection
</span></span><span class="line"><span class="ln">2</span><span class="cl">pgBouncer → PostgreSQL        = 3 × 20 = 60 connection
</span></span><span class="line"><span class="ln">3</span><span class="cl">PostgreSQL max_connections    = 60 + reserve (50 預留 admin / migration) = 110</span></span></code></pre></div><p>1500 → 110 收斂 13.6 倍、PostgreSQL 還在合理上限內。</p>
<h2 id="step-by-step-配置">Step-by-step 配置</h2>
<p><strong>pgBouncer.ini</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">[databases]</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="na">mydb</span> <span class="o">=</span> <span class="s">host=postgres-primary.internal port=5432 dbname=mydb auth_user=pgbouncer</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">[pgbouncer]</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="na">listen_port</span> <span class="o">=</span> <span class="s">6432</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="na">listen_addr</span> <span class="o">=</span> <span class="s">0.0.0.0</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="na">auth_type</span> <span class="o">=</span> <span class="s">scram-sha-256</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="na">auth_file</span> <span class="o">=</span> <span class="s">/etc/pgbouncer/userlist.txt</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="na">auth_query</span> <span class="o">=</span> <span class="s">SELECT usename, passwd FROM pg_shadow WHERE usename=$1</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="na">pool_mode</span> <span class="o">=</span> <span class="s">transaction</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="na">default_pool_size</span> <span class="o">=</span> <span class="s">20</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="na">min_pool_size</span> <span class="o">=</span> <span class="s">5</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="na">reserve_pool_size</span> <span class="o">=</span> <span class="s">10</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="na">reserve_pool_timeout</span> <span class="o">=</span> <span class="s">5</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="na">max_client_conn</span> <span class="o">=</span> <span class="s">2000</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="na">max_db_connections</span> <span class="o">=</span> <span class="s">100</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="na">server_idle_timeout</span> <span class="o">=</span> <span class="s">600</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="na">server_lifetime</span> <span class="o">=</span> <span class="s">3600</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="na">server_connect_timeout</span> <span class="o">=</span> <span class="s">15</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="na">server_login_retry</span> <span class="o">=</span> <span class="s">5</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="na">client_idle_timeout</span> <span class="o">=</span> <span class="s">0</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="na">client_login_timeout</span> <span class="o">=</span> <span class="s">60</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="na">stats_period</span> <span class="o">=</span> <span class="s">60</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="na">log_connections</span> <span class="o">=</span> <span class="s">0</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="na">log_disconnections</span> <span class="o">=</span> <span class="s">0</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="na">log_pooler_errors</span> <span class="o">=</span> <span class="s">1</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">
</span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="na">admin_users</span> <span class="o">=</span> <span class="s">pgbouncer_admin</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="na">stats_users</span> <span class="o">=</span> <span class="s">pgbouncer_stats</span></span></span></code></pre></div><p>關鍵欄位解釋：</p>
<ul>
<li><code>pool_mode = transaction</code>：絕大多數 production 場景</li>
<li><code>default_pool_size = 20</code>：每 database 對 PostgreSQL 的 backend connection 上限、調整時要算進 PostgreSQL <code>max_connections</code></li>
<li><code>reserve_pool_size = 10</code> + <code>reserve_pool_timeout = 5</code>：當 default_pool_size 用滿、等 5 秒還拿不到 connection 才用 reserve pool — 是 <em>突發 spike</em> 的 buffer、不是 baseline</li>
<li><code>max_client_conn = 2000</code>：application 端能連 pgBouncer 的最大數</li>
<li><code>server_lifetime = 3600</code>：每 1 小時強制 recycle backend connection、避免 long-lived connection 累積 memory bloat（PostgreSQL <code>pg_stat_activity</code> 看 connection age）</li>
<li><code>auth_query</code>：pgBouncer 直接從 PostgreSQL <code>pg_shadow</code> 拉密碼、不需要在 pgBouncer 本地維護 userlist — production 推薦做法</li>
</ul>
<p><strong>Application 端 pool 設定</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># 例：Spring Boot HikariCP</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">spring.datasource.url</span><span class="p">:</span><span class="w"> </span><span class="l">jdbc:postgresql://pgbouncer.internal:6432/mydb</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nt">spring.datasource.hikari.maximum-pool-size</span><span class="p">:</span><span class="w"> </span><span class="m">30</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w"></span><span class="nt">spring.datasource.hikari.minimum-idle</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w"></span><span class="nt">spring.datasource.hikari.connection-timeout</span><span class="p">:</span><span class="w"> </span><span class="m">30000</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w"></span><span class="nt">spring.datasource.hikari.idle-timeout</span><span class="p">:</span><span class="w"> </span><span class="m">600000</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w"></span><span class="nt">spring.datasource.hikari.max-lifetime</span><span class="p">:</span><span class="w"> </span><span class="m">1800000</span><span class="w">  </span><span class="c"># 30 min &lt; pgBouncer server_lifetime 60 min</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w"></span><span class="c"># 例：SQLAlchemy</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="l">engine = create_engine(</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">    </span><span class="s2">&#34;postgresql://pgbouncer.internal:6432/mydb&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">    </span><span class="l">pool_size=30,</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">    </span><span class="l">max_overflow=5,</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">    </span><span class="l">pool_pre_ping=True,       </span><span class="w"> </span><span class="c"># 必開、檢測 stale connection</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">    </span><span class="l">pool_recycle=1800,        </span><span class="w"> </span><span class="c"># 30 min、跟 pgBouncer server_lifetime 對齊</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w"></span><span class="l">)</span></span></span></code></pre></div><p><strong>Application 跟 pgBouncer 對齊</strong>：</p>
<ul>
<li>application <code>max-lifetime</code> &lt; pgBouncer <code>server_lifetime</code>：避免 application 拿到已被 pgBouncer recycle 的 connection</li>
<li><code>pool_pre_ping = True</code>：每次 checkout 前 send <code>SELECT 1</code>、檢測 stale connection — 對 transaction pooling 是必要的</li>
<li>application 端 <em>不要</em> 用 prepared statement（除非 pgBouncer 1.21+ 設 <code>max_prepared_statements</code>）</li>
</ul>
<h2 id="故障演練--邊界-case">故障演練 / 邊界 case</h2>
<h3 id="case-1pool-exhaustiondefault_pool_size-用滿">Case 1：Pool exhaustion（default_pool_size 用滿）</h3>
<p>徵兆：application log <code>ERROR: no more connections allowed</code>、pgBouncer log <code>pool is full</code>、pgBouncer admin console <code>SHOW POOLS</code> 顯示 <code>cl_waiting &gt; 0</code>。</p>
<p>Debug：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 連 pgBouncer admin
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="err">\</span><span class="k">c</span><span class="w"> </span><span class="n">pgbouncer</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">POOLS</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- 看 cl_active / cl_waiting / sv_active / sv_idle
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">SERVERS</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="c1">-- 看 server connection state（active / idle / used）</span></span></span></code></pre></div><p>修：</p>
<ul>
<li>短期：調高 <code>default_pool_size</code> 跟 PostgreSQL <code>max_connections</code>、配合 reserve pool</li>
<li>中期：找 <em>long-running query</em>（PostgreSQL <code>pg_stat_activity</code> 看 <code>query_start</code>、kill 過長 query）</li>
<li>長期：拆 database / 改 read replica / 移 OLAP query 到 data warehouse</li>
</ul>
<h3 id="case-2transaction-pooling-下-session-state-漏洞">Case 2：Transaction pooling 下 session state 漏洞</h3>
<p>徵兆：random 失敗 <code>prepared statement &quot;S_3&quot; does not exist</code>、<code>relation &quot;tmp_xxx&quot; does not exist</code>、advisory lock 不釋放。</p>
<p>原因：application 用了 prepared statement / temporary table / advisory lock、但 transaction commit 後 backend connection 釋放、下一個 transaction 拿到不同 backend、session state 不存在。</p>
<p>修：</p>
<ul>
<li>Application 框架禁用 prepared statement（JDBC <code>prepareThreshold=0</code>、SQLAlchemy <code>use_native_prepared_statements=False</code>）</li>
<li>temporary table 改 <a href="https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-UNLOGGED-TABLES">unlogged table</a> + cleanup</li>
<li>advisory lock 改 row-level lock 或 application-level lock（Redis）</li>
<li>或：切到 session pooling、犧牲收斂效率</li>
</ul>
<h3 id="case-3dns-based-failover-後-application-連到舊-master">Case 3：DNS-based failover 後 application 連到舊 master</h3>
<p>徵兆：PostgreSQL 切換 master 後、application 寫操作 <em>時好時壞</em>（看連到哪台）。</p>
<p>原因：pgBouncer 在 application 跟 PostgreSQL 之間、application 不知道 backend 換了；pgBouncer 自己也需要 reload config 才會連新 master。</p>
<p>修：</p>
<ul>
<li>pgBouncer 用 <code>RECONNECT</code> admin command 強制 close all backend connection、重連</li>
<li>配 Patroni / Stolon 等 HA 工具自動 trigger pgBouncer reconnect</li>
<li>application 端 <code>pool_pre_ping</code> 開啟、stale connection 自動踢</li>
</ul>
<h3 id="case-4server-lifetime-recycle-跟-in-flight-transaction-衝突">Case 4：Server lifetime recycle 跟 in-flight transaction 衝突</h3>
<p>徵兆：偶發 <code>server closed the connection unexpectedly</code>、跟 long-running transaction 重疊。</p>
<p>原因：pgBouncer <code>server_lifetime = 3600</code> 強制 recycle、但有 transaction 在跑時 pgBouncer 不會切、超過時間後仍會切。</p>
<p>修：</p>
<ul>
<li>確認沒有 <em>超過 1 小時</em> 的 transaction（PostgreSQL <code>pg_stat_activity</code> 看 <code>xact_start</code>）</li>
<li>必要時調高 <code>server_lifetime</code>、但 memory bloat 風險上升</li>
<li>application 端做 transaction timeout</li>
</ul>
<h3 id="case-5pgbouncer-自己-crash--oom">Case 5：pgBouncer 自己 crash / OOM</h3>
<p>徵兆：所有 application 同時失去 PostgreSQL 連線。</p>
<p>原因：pgBouncer 是 single-process（除非 1.21+ 用 <code>so_reuseport</code> 多 process）、memory leak / OOM / 部署事件都會打掉整個 connection layer。</p>
<p>修：</p>
<ul>
<li>多 pgBouncer instance + load balancer（HAProxy / Envoy）前置、application 連 LB</li>
<li><code>so_reuseport = 1</code>（1.21+）讓多個 pgBouncer process 共用 port</li>
<li>Resource limit 跟 alert：RSS &gt; N、connection count &gt; M</li>
<li>HA mode：active-passive 配 keepalived</li>
</ul>
<h2 id="容量--cost-規劃">容量 / cost 規劃</h2>
<p><strong>單一 pgBouncer 容量上限</strong>：</p>
<ul>
<li><code>max_client_conn</code>：實務 &lt; 5000 per instance（再高 CPU 跟 file descriptor 緊）</li>
<li><code>default_pool_size × database 數</code>：實務 &lt; 200 per instance</li>
<li>single process CPU bound：在 10K QPS 等級已經是瓶頸、要橫向 scale</li>
</ul>
<p><strong>何時加 pgBouncer instance</strong>：</p>
<ul>
<li>application connection 數突破 3000 / pgBouncer instance</li>
<li>pgBouncer CPU usage &gt; 60%（baseline、不算 spike）</li>
<li>跨 region application 需要 region-local pgBouncer</li>
</ul>
<p><strong>何時改架構（pgBouncer 不夠用）</strong>：</p>
<ul>
<li>PostgreSQL backend connection 數突破 500（即使有 pgBouncer 也撐不住）→ 改 read replica / partitioning / sharding</li>
<li>write 量太大（每秒 50K+ TPS）→ 改 sharding（<a href="https://vitess.io">Vitess</a> / <a href="https://www.citusdata.com">Citus</a>）或全球分散式 SQL（<a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 全球分散式 OLTP</a>）</li>
<li>application 大量 prepared statement / session state 需求 → 改 <a href="https://github.com/postgresml/pgcat">PgCat</a>（Rust 寫、支援更完整的 session feature）或回 session pooling</li>
</ul>
<h2 id="整合--下一步">整合 / 下一步</h2>
<p><strong>跟 HA failover 整合</strong>（<a href="https://github.com/zalando/patroni">Patroni</a>）：</p>
<ul>
<li>Patroni 切換 master 後 trigger pgBouncer <code>RECONNECT</code></li>
<li>pgBouncer 透過 service discovery（Consul / etcd）拿新 master 位址、不是寫死在 config</li>
<li>application 不需感知 failover、connection 從 pgBouncer 拿到新 master 的 backend</li>
</ul>
<p><strong>跟監控整合</strong>：</p>
<ul>
<li>pgBouncer admin console <code>SHOW STATS</code> / <code>SHOW POOLS</code> / <code>SHOW SERVERS</code> 拉到 Prometheus（<a href="https://github.com/jbub/pgbouncer_exporter">pgbouncer_exporter</a>）</li>
<li>必看 metric：<code>cl_waiting</code>（等 backend 的 client 數）、<code>sv_active</code>（active backend 數）、<code>avg_query_time</code>、<code>avg_xact_time</code></li>
<li>Alert：<code>cl_waiting &gt; 0 持續 30s</code>、<code>server connection error rate &gt; 0</code></li>
</ul>
<p><strong>跟 application observability 整合</strong>：</p>
<ul>
<li>Application APM（<a href="/blog/backend/04-observability/vendors/datadog/" data-link-title="Datadog" data-link-desc="All-in-one SaaS 觀測平台、APM / Logs / Metrics / RUM / Security">Datadog</a> / Honeycomb / OpenTelemetry）的 DB span 顯示 <em>application 看到的 latency</em>、pgBouncer metric 顯示 <em>pgBouncer ↔ PostgreSQL latency</em> — 兩者差異揭露 connection wait time</li>
</ul>
<p><strong>何時 revisit 這個配置</strong>：</p>
<ul>
<li>application 數量倍增（trigger pool sizing 重算）</li>
<li>PostgreSQL 升級（pgBouncer 跟 PostgreSQL 版本相容性）</li>
<li>跨 region 部署（要不要 region-local pgBouncer）</li>
<li>切換到 RDS Proxy / Aurora Cluster Endpoint（managed alternative）</li>
</ul>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor overview</a> — 本文是該頁尾「pgBouncer / PgCat 配置 best practice」backlog 的深度展開</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/connection-scaling/" data-link-title="PostgreSQL Connection Scaling：process-per-connection model 跟為什麼 pooler 是必裝" data-link-desc="PG 每個 client connection fork 一個 backend process（不是 thread）、RAM 成本 5-15MB/connection、context switch 跟 fork() cost 在 100&#43; connection 後線性放大、所以 pooler 不是 *optional optimization* 而是 *production prerequisite*。本文走 process-per-connection model 跟 MySQL thread-per-connection 對比、max_connections &#43; shared_buffers &#43; work_mem 三 GUC 互動、application-side pool vs middleware pool vs RDS Proxy 三層選擇、5 production 踩雷（connection storm / fork() cost 在 burst 流量 / shared_buffers 跟 connection 數壓縮 / double-pool 配置錯誤 / max_connections 設太大反而慢）、跟 PgBouncer config 互補不重複">Connection Scaling Deep Dive</a> — connection-per-process model 跟為什麼 pooler 是必裝（根因 vs 配置）</li>
<li><a href="/blog/backend/01-database/high-concurrency-access/" data-link-title="1.1 高併發下的 SQL 讀寫邊界" data-link-desc="說明高併發服務如何共用資料庫 client、控制 transaction、管理 connection pool、避免資料庫成為瓶頸">1.1 高併發資料存取</a> — 上游：什麼時候需要 connection pool</li>
<li><a href="/blog/backend/knowledge-cards/connection-pool/" data-link-title="Connection Pool" data-link-desc="說明連線池如何限制下游資源並影響服務容量">Connection Pool 卡片</a> — 概念基底</li>
<li><a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Vendor 深度技術文章方法論</a> — 本文是該方法論的 demo #1</li>
<li><a href="/blog/backend/09-performance-capacity/cases/ntt-docomo-lemino-japanese-streaming/" data-link-title="9.C29 NTT DOCOMO Lemino：3 個月達 500 萬 MAU 的串流後端" data-link-desc="Lemino 用 DynamoDB &#43; AWS Media Services 撐 30 channels live &#43; 5M MAU、工程工時下降 90%">9.C29 Lemino RDB connection limit case</a> — connection 爆是 streaming surge 場景的 vendor-switch 主因</li>
<li>官方：<a href="https://www.pgbouncer.org/usage.html">pgBouncer Documentation</a></li>
</ul>
]]></content:encoded></item></channel></rss>