Rate Limit Implementation

2026-06-20

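Rate limit implementation splits into three layers: single-instance middleware (limiting within one server instance), distributed rate limiting (limiter state shared across multiple instances), and quota design (differentiated quotas per client and endpoint). The conceptual foundations (token bucket, sliding window, and how rate limiting differs from backpressure) are covered in DevOps Traffic Control; this chapter focuses on the backend code.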

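Single-Instance Middleware Implementation

Rate limit middleware intercepts requests before they reach the HTTP handler. Every request passes through the limiter once: if it is allowed, it proceeds to the handler; if it exceeds the limit, the server responds with 429.

Go Implementation

The golang.org/x/time/rate package, part of Go's extended standard ecosystem, provides rate.Limiter, a token bucket implementation.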

import (
    "net/http"

    "golang.org/x/time/rate"
)

// Global limiter: 100 requests per second, burst of up to 200
var globalLimiter = rate.NewLimiter(100, 200)

// rateLimitMiddleware answers 429 once the global limiter has no tokens left.
func rateLimitMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !globalLimiter.Allow() {
            w.Header().Set("Retry-After", "1")
            http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

Per-Client Rate Limiting

A global limiter makes every client share one quota. Per-client limiting gives each client (identified by API key, IP, or tenant ID) its own quota.

var clients sync.Map // map[string]*rate.Limiter

func getClientLimiter(clientID string) *rate.Limiter {
    if limiter, ok := clients.Load(clientID); ok {
        return limiter.(*rate.Limiter)
    }
    // LoadOrStore keeps a single limiter per client even when two first
    // requests race; a plain Load followed by Store could create two.
    limiter, _ := clients.LoadOrStore(clientID, rate.NewLimiter(10, 20)) // 10 req/s per client, burst 20
    return limiter.(*rate.Limiter)
}

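The per-client limiters live in a sync.Map, and a limiter is created automatically the first time a client appears. A long-running service needs to periodically evict limiters that belong to clients that are no longer active (a goroutine plus a ticker that scans last-used times).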

Response Format

When a request is rejected, the HTTP response needs to carry enough information for the client to make a correct retry decision.

HTTP/1.1 429 Too Many Requests
Retry-After: 1
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1719014400

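Retry-After tells the client how long to wait before retrying (a number of seconds or an HTTP date). The X-RateLimit-* headers are not an RFC standard but are widely used (by the GitHub API, for example); they let a client see its remaining quota before it gets limited.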

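Distributed Rate Limiting (Redis-backed)

A single-instance limiter keeps its counters in process memory. When several server instances each run their own limiter and the load balancer spreads one client's requests across them, every instance sees only part of the traffic, so the global limit is no longer enforced: with N instances a client can reach up to N times the configured rate.

Redis serves as the shared counter store, so every instance checks the same counter.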

Sliding Window Counter

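The Lua script below uses Redis INCR + EXPIRE to count requests per window. On its own this is a fixed-window counter; a sliding window counter builds on it by also weighting the previous window's count.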

-- Redis Lua script (runs atomically)
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('INCR', key)
if current == 1 then
    -- First request in this window: start the expiry clock
    redis.call('EXPIRE', key, window)
end

if current > limit then
    return 0  -- over the limit
end
return 1      -- allowed

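Key design: ratelimit:{client_id}:{endpoint}:{window_start}. The window start is the current time truncated to the second or minute (e.g. 1719014400). Each window gets its own key, and EXPIRE cleans up expired windows automatically.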

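Off-the-Shelf Packages

Writing your own Lua script is good for learning; in production an off-the-shelf package is more reliable:

Language	Package	Notes
Go	`go-redis/redis_rate`	GCRA (a leaky bucket variant), atomic operations, integrates directly with go-redis
Node	`rate-limit-redis` + `express-rate-limit`	Express middleware with a pluggable Redis store
Python	`limits` + Redis backend	Multiple strategies (fixed window / moving window / sliding window counter)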

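Quota Design

Differentiated Quotas

Different endpoints and clients have different quota needs. A search API consumes more compute than a list API, so it should have a lower rate ceiling.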

Dimension	Example quota	Rationale
Per-API key	1000 req/min	Fair-use ceiling for each client
Per-endpoint	Search 100 req/min, list 500 req/min	Search costs more than list
Per-tenant	Free 100 req/min, paid 10000 req/min	Commercial differentiation
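
One way to apply several dimensions at once is to give each its own counter and admit a request only when all of them are under their ceiling. The sketch below lists the counters to check for one request; the counter key formats, endpoint paths and tier names are assumptions, and the numbers mirror the example table.

```go
package main

// limitCheck names one counter and the per-minute ceiling it must stay under.
type limitCheck struct {
	Key       string
	PerMinute int
}

// Hypothetical quota tables using the example values from the table above.
const perKeyLimit = 1000

var endpointLimits = map[string]int{"/search": 100, "/list": 500}

var tierLimits = map[string]int{"free": 100, "paid": 10000}

// checksFor lists every counter a request has to pass.
// Unknown endpoints and tiers fall back to the per-key ceiling only.
func checksFor(apiKey, tenantID, tier, endpoint string) []limitCheck {
	checks := []limitCheck{{Key: "key:" + apiKey, PerMinute: perKeyLimit}}
	if limit, ok := endpointLimits[endpoint]; ok {
		checks = append(checks, limitCheck{Key: "key:" + apiKey + ":" + endpoint, PerMinute: limit})
	}
	if limit, ok := tierLimits[tier]; ok {
		checks = append(checks, limitCheck{Key: "tenant:" + tenantID, PerMinute: limit})
	}
	return checks
}
```

Each returned Key would then be run through the limiter (in-process or Redis-backed) with its own PerMinute value.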

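Handling Quota Overflow

The strategy for handling over-limit requests depends on business requirements:

Reject (429): refuse the request outright. This is the simplest option and a good fit for API services. The client receives a 429 and retries according to Retry-After.

Queue: over-limit requests enter a wait queue and are processed in order. This suits operations that must not be dropped (payment confirmation, order creation). The cost is longer wait time on the client side.

Degrade: return a simplified response when over the limit (a cached result, or a summary instead of the full data). This suits read operations.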

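Integration with Monitoring

Rate limit hits should be recorded in the monitoring system so the team can see which clients are hitting limits and whether each endpoint's quota is reasonable.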

// Emit a metric event on every rate limit hit
monitor.Metric("ratelimit.hit", map[string]any{
    "client_id": clientID,
    "endpoint":  r.URL.Path,
    "limit":     100,
    "window":    "1m",
})

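Dashboard view: the rate limit hit trend over time, grouped by client and by endpoint. A steadily rising hit count means either the quota is set too low (normal usage is being limited) or a particular client is abusing the API.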
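
Grouping by client is what separates those two causes. A rough in-process sketch of that grouping, with hypothetical names throughout (a real deployment would more likely do this in the monitoring backend's query layer):

```go
package main

import (
	"sort"
	"sync"
)

// hitKey identifies one (client, endpoint) pair.
type hitKey struct{ ClientID, Endpoint string }

// hitCounter tallies rate limit hits per client and endpoint.
type hitCounter struct {
	mu   sync.Mutex
	hits map[hitKey]int
}

func newHitCounter() *hitCounter {
	return &hitCounter{hits: make(map[hitKey]int)}
}

// record counts one rate limit hit.
func (c *hitCounter) record(clientID, endpoint string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hits[hitKey{clientID, endpoint}]++
}

// byClient sums hits per client and returns the client IDs ordered by
// hit count, highest first (ties broken alphabetically), plus the totals.
func (c *hitCounter) byClient() ([]string, map[string]int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	totals := make(map[string]int)
	for k, n := range c.hits {
		totals[k.ClientID] += n
	}
	ids := make([]string, 0, len(totals))
	for id := range totals {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if totals[ids[i]] != totals[ids[j]] {
			return totals[ids[i]] > totals[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids, totals
}
```

If one client accounts for most of the hits, abuse is the likelier explanation; hits spread evenly across many clients point to a quota that is too low.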

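Next Steps

Conceptual foundations of rate limiting → DevOps Traffic Control: Rate Limiting
Backpressure (passive flow control) → DevOps Backpressure
Rate limit knowledge card → Rate Limit
Ingestion rate limiting in monitoring systems → Monitoring Ingestion Scaling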

#backend #performance #rate-limit #middleware #redis