<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Middleware on Tarragon</title><link>https://tarrragon.github.io/blog/tags/middleware/</link><description>Recent content in Middleware on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Sat, 20 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/middleware/index.xml" rel="self" type="application/rss+xml"/><item><title>Rate Limit 實作</title><link>https://tarrragon.github.io/blog/backend/09-performance-capacity/rate-limit-implementation/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/09-performance-capacity/rate-limit-implementation/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/rate-limit/" data-link-title="Rate Limit" data-link-desc="說明限流如何保護服務入口、下游依賴與租戶公平性">Rate limit&lt;/a> 的實作分成三個層次：單機 middleware（一個 server instance 內的限速）、分散式限速（多個 instance 共用的限速狀態）、配額設計（不同 client 和 endpoint 的差異化配額）。Rate limit 的概念基礎（token bucket / sliding window / 和背壓的區別）見 &lt;a href="https://tarrragon.github.io/blog/devops/03-traffic-management/rate-limiting/" data-link-title="Rate Limiting" data-link-desc="主動限制每個來源的請求速率 — per-client vs global、token bucket vs sliding window、優先級豁免">DevOps 流量管控&lt;/a>，本章聚焦後端的程式碼實作。&lt;/p>
&lt;h2 id="單機-middleware-實作">單機 Middleware 實作&lt;/h2>
&lt;p>Rate limit middleware 在 HTTP handler 之前攔截請求。每個 request 過一次 limiter，通過就進入 handler，超限就回 429。&lt;/p>
&lt;h3 id="go-實作">Go 實作&lt;/h3>
&lt;p>Go 標準生態的 &lt;code>golang.org/x/time/rate&lt;/code> 提供 token bucket 的 &lt;code>rate.Limiter&lt;/code>。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="s">&amp;#34;golang.org/x/time/rate&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="c1">// 全域 limiter：每秒 100 個 request、burst 上限 200&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">globalLimiter&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">rate&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewLimiter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">100&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">200&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">rateLimitMiddleware&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">next&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Handler&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Handler&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">HandlerFunc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ResponseWriter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Request&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">globalLimiter&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Allow&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="nx">w&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Header&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;Retry-After&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Error&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;Too Many Requests&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">http&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StatusTooManyRequests&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl"> &lt;span class="nx">next&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ServeHTTP&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="per-client-限速">Per-client 限速&lt;/h3>
&lt;p>全域 limiter 對所有 client 共用一個配額。Per-client 限速讓每個 client（by API key、IP、或 tenant ID）有各自的配額。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">clients&lt;/span> &lt;span class="nx">sync&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Map&lt;/span> &lt;span class="c1">// map[string]*rate.Limiter&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">getClientLimiter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">clientID&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">rate&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Limiter&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">limiter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ok&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">clients&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Load&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">clientID&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="nx">ok&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">limiter&lt;/span>&lt;span class="p">.(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">rate&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Limiter&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> &lt;span class="nx">limiter&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">rate&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewLimiter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">20&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1">// 每 client 每秒 10 個&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="nx">clients&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Store&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">clientID&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">limiter&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">limiter&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Per-client limiter 用 &lt;code>sync.Map&lt;/code> 存、首次出現的 client 自動建立 limiter。長期運行的服務需要定期清理不再活躍的 client limiter（用 goroutine + ticker 掃描最後使用時間）。&lt;/p>
&lt;h3 id="回應格式">回應格式&lt;/h3>
&lt;p>超限時的 HTTP response 需要帶足夠資訊讓 client 做正確的重試決策。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">HTTP/1.1 429 Too Many Requests
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">Retry-After: 1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">X-RateLimit-Limit: 100
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">X-RateLimit-Remaining: 0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">X-RateLimit-Reset: 1719014400&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>Retry-After&lt;/code> 告訴 client 等多久再試（秒數或 HTTP date）。&lt;code>X-RateLimit-*&lt;/code> headers 不是 RFC 標準但被廣泛使用（GitHub API、Stripe API 都用），讓 client 在被限速前就知道剩餘配額。&lt;/p>
&lt;h2 id="分散式限速redis-backed">分散式限速（Redis-backed）&lt;/h2>
&lt;p>單機 limiter 的計數存在 process 記憶體中。多個 server instance 各自有獨立的 limiter，client 的請求被 load balancer 分配到不同 instance 時，每個 instance 只看到部分請求 — 全域限速失效。&lt;/p>
&lt;p>Redis 做共用的計數儲存，所有 instance 查同一個 counter。&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/backend/knowledge-cards/rate-limit/" data-link-title="Rate Limit" data-link-desc="說明限流如何保護服務入口、下游依賴與租戶公平性">Rate limit</a> 的實作分成三個層次：單機 middleware（一個 server instance 內的限速）、分散式限速（多個 instance 共用的限速狀態）、配額設計（不同 client 和 endpoint 的差異化配額）。Rate limit 的概念基礎（token bucket / sliding window / 和背壓的區別）見 <a href="/blog/devops/03-traffic-management/rate-limiting/" data-link-title="Rate Limiting" data-link-desc="主動限制每個來源的請求速率 — per-client vs global、token bucket vs sliding window、優先級豁免">DevOps 流量管控</a>，本章聚焦後端的程式碼實作。</p>
<h2 id="單機-middleware-實作">單機 Middleware 實作</h2>
<p>Rate limit middleware 在 HTTP handler 之前攔截請求。每個 request 過一次 limiter，通過就進入 handler，超限就回 429。</p>
<h3 id="go-實作">Go 實作</h3>
<p>Go 標準生態的 <code>golang.org/x/time/rate</code> 提供 token bucket 的 <code>rate.Limiter</code>。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kn">import</span> <span class="s">&#34;golang.org/x/time/rate&#34;</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1">// 全域 limiter：每秒 100 個 request、burst 上限 200</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kd">var</span> <span class="nx">globalLimiter</span> <span class="p">=</span> <span class="nx">rate</span><span class="p">.</span><span class="nf">NewLimiter</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">func</span> <span class="nf">rateLimitMiddleware</span><span class="p">(</span><span class="nx">next</span> <span class="nx">http</span><span class="p">.</span><span class="nx">Handler</span><span class="p">)</span> <span class="nx">http</span><span class="p">.</span><span class="nx">Handler</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">return</span> <span class="nx">http</span><span class="p">.</span><span class="nf">HandlerFunc</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">w</span> <span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span> <span class="nx">r</span> <span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="k">if</span> <span class="p">!</span><span class="nx">globalLimiter</span><span class="p">.</span><span class="nf">Allow</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">            <span class="nx">w</span><span class="p">.</span><span class="nf">Header</span><span class="p">().</span><span class="nf">Set</span><span class="p">(</span><span class="s">&#34;Retry-After&#34;</span><span class="p">,</span> <span class="s">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="nx">http</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="s">&#34;Too Many Requests&#34;</span><span class="p">,</span> <span class="nx">http</span><span class="p">.</span><span class="nx">StatusTooManyRequests</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="nx">next</span><span class="p">.</span><span class="nf">ServeHTTP</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span> <span class="nx">r</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="p">})</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><h3 id="per-client-限速">Per-client 限速</h3>
<p>全域 limiter 對所有 client 共用一個配額。Per-client 限速讓每個 client（by API key、IP、或 tenant ID）有各自的配額。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="kd">var</span> <span class="nx">clients</span> <span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span> <span class="c1">// map[string]*rate.Limiter</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kd">func</span> <span class="nf">getClientLimiter</span><span class="p">(</span><span class="nx">clientID</span> <span class="kt">string</span><span class="p">)</span> <span class="o">*</span><span class="nx">rate</span><span class="p">.</span><span class="nx">Limiter</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="k">if</span> <span class="nx">limiter</span><span class="p">,</span> <span class="nx">ok</span> <span class="o">:=</span> <span class="nx">clients</span><span class="p">.</span><span class="nf">Load</span><span class="p">(</span><span class="nx">clientID</span><span class="p">);</span> <span class="nx">ok</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        <span class="k">return</span> <span class="nx">limiter</span><span class="p">.(</span><span class="o">*</span><span class="nx">rate</span><span class="p">.</span><span class="nx">Limiter</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="nx">limiter</span> <span class="o">:=</span> <span class="nx">rate</span><span class="p">.</span><span class="nf">NewLimiter</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span> <span class="c1">// 每 client 每秒 10 個</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nx">clients</span><span class="p">.</span><span class="nf">Store</span><span class="p">(</span><span class="nx">clientID</span><span class="p">,</span> <span class="nx">limiter</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">return</span> <span class="nx">limiter</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Per-client limiter 用 <code>sync.Map</code> 存、首次出現的 client 自動建立 limiter。長期運行的服務需要定期清理不再活躍的 client limiter（用 goroutine + ticker 掃描最後使用時間）。</p>
<h3 id="回應格式">回應格式</h3>
<p>超限時的 HTTP response 需要帶足夠資訊讓 client 做正確的重試決策。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">HTTP/1.1 429 Too Many Requests
</span></span><span class="line"><span class="ln">2</span><span class="cl">Retry-After: 1
</span></span><span class="line"><span class="ln">3</span><span class="cl">X-RateLimit-Limit: 100
</span></span><span class="line"><span class="ln">4</span><span class="cl">X-RateLimit-Remaining: 0
</span></span><span class="line"><span class="ln">5</span><span class="cl">X-RateLimit-Reset: 1719014400</span></span></code></pre></div><p><code>Retry-After</code> 告訴 client 等多久再試（秒數或 HTTP date）。<code>X-RateLimit-*</code> headers 不是 RFC 標準但被廣泛使用（GitHub API、Stripe API 都用），讓 client 在被限速前就知道剩餘配額。</p>
<h2 id="分散式限速redis-backed">分散式限速（Redis-backed）</h2>
<p>單機 limiter 的計數存在 process 記憶體中。多個 server instance 各自有獨立的 limiter，client 的請求被 load balancer 分配到不同 instance 時，每個 instance 只看到部分請求 — 全域限速失效。</p>
<p>Redis 做共用的計數儲存，所有 instance 查同一個 counter。</p>
<h3 id="sliding-window-counter">Sliding Window Counter</h3>
<p>用 Redis 的 INCR + EXPIRE 實作 sliding window counter。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-lua" data-lang="lua"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- Redis Lua script（原子操作）</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kd">local</span> <span class="n">key</span> <span class="o">=</span> <span class="n">KEYS</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kd">local</span> <span class="n">limit</span> <span class="o">=</span> <span class="n">tonumber</span><span class="p">(</span><span class="n">ARGV</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kd">local</span> <span class="n">window</span> <span class="o">=</span> <span class="n">tonumber</span><span class="p">(</span><span class="n">ARGV</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kd">local</span> <span class="n">current</span> <span class="o">=</span> <span class="n">redis.call</span><span class="p">(</span><span class="s1">&#39;INCR&#39;</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="kr">if</span> <span class="n">current</span> <span class="o">==</span> <span class="mi">1</span> <span class="kr">then</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="n">redis.call</span><span class="p">(</span><span class="s1">&#39;EXPIRE&#39;</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">window</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kr">end</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="kr">if</span> <span class="n">current</span> <span class="o">&gt;</span> <span class="n">limit</span> <span class="kr">then</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="kr">return</span> <span class="mi">0</span>  <span class="c1">-- 超限</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="kr">end</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="kr">return</span> <span class="mi">1</span>      <span class="c1">-- 通過</span></span></span></code></pre></div><p>Key 的設計：<code>ratelimit:{client_id}:{endpoint}:{window_start}</code>。Window start 用當前時間截斷到秒或分鐘（如 <code>1719014400</code>），每個窗口一個 key，EXPIRE 自動清理過期窗口。</p>
<h3 id="現成套件">現成套件</h3>
<p>自己寫 Lua script 適合學習，production 用現成套件更可靠：</p>
<table>
  <thead>
      <tr>
          <th>語言</th>
          <th>套件</th>
          <th>特點</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Go</td>
          <td><code>go-redis/redis_rate</code></td>
          <td>Token bucket 演算法、原子操作、直接整合 go-redis</td>
      </tr>
      <tr>
          <td>Node</td>
          <td><code>rate-limit-redis</code> + <code>express-rate-limit</code></td>
          <td>Express middleware、Redis store 外掛</td>
      </tr>
      <tr>
          <td>Python</td>
          <td><code>limits</code> + Redis backend</td>
          <td>多演算法支援（fixed window / sliding window / token bucket）</td>
      </tr>
  </tbody>
</table>
<h2 id="配額設計">配額設計</h2>
<h3 id="差異化配額">差異化配額</h3>
<p>不同的 endpoint 和 client 有不同的配額需求。搜尋 API 比列表 API 消耗更多計算資源，應該有更低的速率上限。</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>配額範例</th>
          <th>理由</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Per-API key</td>
          <td>1000 req/min</td>
          <td>每個 client 的公平上限</td>
      </tr>
      <tr>
          <td>Per-endpoint</td>
          <td>搜尋 100 req/min、列表 500 req/min</td>
          <td>搜尋比列表貴</td>
      </tr>
      <tr>
          <td>Per-tenant</td>
          <td>免費 100 req/min、付費 10000 req/min</td>
          <td>商業差異化</td>
      </tr>
  </tbody>
</table>
<h3 id="配額溢出的處理">配額溢出的處理</h3>
<p>超限時的處理策略依業務需求決定：</p>
<p><strong>Reject（429）</strong>：直接拒絕。最簡單，適合 API 服務。Client 收到 429 後按 Retry-After 重試。</p>
<p><strong>Queue（排隊等）</strong>：超限的請求進入等待隊列，按順序處理。適合不能丟棄的操作（付款確認、訂單建立）。代價是 client 端等待時間增加。</p>
<p><strong>Degrade（降級回應）</strong>：超限時回傳簡化版的回應（cached 結果、摘要而非完整資料）。適合讀取操作。</p>
<h2 id="和-monitoring-的整合">和 Monitoring 的整合</h2>
<p>Rate limit 的命中事件應該記入監控系統，讓團隊知道哪些 client 在撞限速、哪些 endpoint 的配額是否合理。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">// Rate limit hit 時送 metric 事件</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nx">monitor</span><span class="p">.</span><span class="nf">Metric</span><span class="p">(</span><span class="s">&#34;ratelimit.hit&#34;</span><span class="p">,</span> <span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;client_id&#34;</span><span class="p">:</span> <span class="nx">clientID</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="s">&#34;endpoint&#34;</span><span class="p">:</span>  <span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Path</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s">&#34;limit&#34;</span><span class="p">:</span>     <span class="mi">100</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="s">&#34;window&#34;</span><span class="p">:</span>    <span class="s">&#34;1m&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">})</span></span></span></code></pre></div><p>Dashboard 視圖：rate limit hit 的時間趨勢 + 按 client 和 endpoint 分群。Hit 數持續上升代表配額設太低（正常使用被限速）或某個 client 在濫用。</p>
<h2 id="下一步路由">下一步路由</h2>
<ul>
<li>Rate limit 的概念基礎 → <a href="/blog/devops/03-traffic-management/rate-limiting/" data-link-title="Rate Limiting" data-link-desc="主動限制每個來源的請求速率 — per-client vs global、token bucket vs sliding window、優先級豁免">DevOps 流量管控 — Rate Limiting</a></li>
<li>背壓機制（被動的流量控制）→ <a href="/blog/devops/03-traffic-management/backpressure/" data-link-title="背壓機制" data-link-desc="下游處理慢時上游怎麼減速 — 有限 buffer &#43; 回壓訊號的設計、和 rate limit 的區別">DevOps 背壓機制</a></li>
<li>Rate limit 知識卡 → <a href="/blog/backend/knowledge-cards/rate-limit/" data-link-title="Rate Limit" data-link-desc="說明限流如何保護服務入口、下游依賴與租戶公平性">Rate Limit</a></li>
<li>監控系統中的 ingestion 限速 → <a href="/blog/monitoring/04-collector/ingestion-scaling/" data-link-title="Ingestion Scaling" data-link-desc="四層防線應對 ingestion 端的流量擴展 — SDK 取樣、Collector 背壓、水平擴展、Queue 解耦">Monitoring Ingestion Scaling</a></li>
</ul>
]]></content:encoded></item></channel></rss>