Dragonflydb on Tarragon

DragonflyDB → Redis / Valkey：回退到標準生態的遷移路徑

Mon, 22 Jun 2026 00:00:00 +0000

本文是跨 vendor migration playbook、cross-link 到 DragonflyDB（source）跟 Redis / Valkey（target）。反向路徑見 Redis → DragonflyDB。跑 6 維 diff dimension audit 後判定為 Type B drop-in（RESP 協定相容），但 HA 和持久化有差異需要處理。

為什麼從 DragonflyDB 遷回

DragonflyDB 遷回 Redis/Valkey 的 driver 跟正向遷移互為鏡像：

Redis Modules 需求：業務開始需要 RedisJSON、RediSearch 或 RedisTimeSeries，DragonflyDB 不支援 Redis Modules 生態
Cluster mode 需求：DragonflyDB 設計為單機 scale-up，當資料量超過單機記憶體上限（數 TB）或需要跨 node sharding 時，Redis Cluster 或 Valkey Cluster 是成熟選擇
Sentinel / HA 生態：DragonflyDB 的 HA 用自家 replication，不支援 Sentinel。若團隊已有 Sentinel 或 Operator 基礎設施，回到 Redis/Valkey 整合成本更低
BSL 授權疑慮：DragonflyDB 是 BSL 1.1（4 年後轉 Apache 2.0），部分組織偏好 BSD（Valkey）或即使是 RSALv2（Redis）的已知授權

6 維 diff dimension audit

維度	評估	等級
Schema / API	RESP 相容、data types 一致	Low
Operational model	DragonflyDB replication → Sentinel/Cluster；snapshotting → RDB+AOF	Medium
Abstraction / paradigm	相同（key-value cache）	Low
Number of components	DragonflyDB 1-2 nodes → Redis primary + replica + Sentinel（或 Cluster 6 nodes）	Medium
Application change	endpoint 換、client config 微調（無 API 差異）	Low
Data topology	DragonflyDB snapshot → Redis RDB 相容	Low

全域 Low-Medium → Type B drop-in，工作重心在 HA 架構切換和持久化模式對齊。

相容性確認

DragonflyDB → Redis 的相容方向跟 Redis → DragonflyDB 相反 — Redis 是 superset，回到 Redis 不會有功能缺失。但有幾個操作面差異需要處理：

DragonflyDB 行為	Redis 行為	處理方式
Multi-threaded 吞吐量	單主線程（I/O threads 輔助）	回到 Redis 後 throughput 下降是預期行為；若單機不夠需要 Cluster 分片
Fork-less snapshot	BGSAVE fork + COW	關注 persistence fork latency，大 dataset 的 fork 會造成延遲 spike
自家 replication	Redis replication + Sentinel 或 Cluster	需要重建 HA 架構，見下方階段二
無 AOF	AOF + RDB 混合持久化	依需求決定是否開 AOF；純 cache 場景可只用 RDB
無 Cluster mode	Redis Cluster 或 Valkey Cluster	資料量大時需要規劃 sharding

階段一：資料匯出

DragonflyDB 支援 SAVE / BGSAVE 產生 RDB 格式 snapshot，跟 Redis RDB 相容。

1# 在 DragonflyDB 觸發 snapshot
2redis-cli -h dragonfly-host BGSAVE
3
4# 等 BGSAVE 完成
5redis-cli -h dragonfly-host LASTSAVE
6
7# 複製 snapshot 檔案到 Redis 資料目錄
8cp /dragonfly-data/dump.rdb /redis-data/dump.rdb

RDB 載入驗證：

1# 啟動 Redis 載入 RDB
2redis-server --dbfilename dump.rdb --dir /redis-data
3
4# 驗證 key count
5redis-cli DBSIZE

若 DragonflyDB 跑的是較新版本產出的 RDB，先在測試環境驗證 Redis 能正常載入。DragonflyDB 的 RDB 基於 Redis 6.x 格式，Redis 7.x 和 Valkey 8.x 向下相容無問題。

階段二：HA 架構重建

DragonflyDB 回到 Redis/Valkey 後，HA 需要從 DragonflyDB replication 切換到 Sentinel 或 Cluster。

Sentinel 路徑（適合非分片場景）

1 primary + N replica + 3 Sentinel nodes。配置見 Sentinel HA Failover。

Cluster 路徑（適合需要分片的場景）

最小 3 primary + 3 replica。配置見 Redis Cluster Resharding。

選擇依據：資料量 < 單機記憶體的 70% 用 Sentinel，需要水平擴展用 Cluster。

階段三：Client 切換

Application 的 Redis client 不需要改 API — DragonflyDB 跟 Redis 用同一套 RESP 協定。需要改的只有：

Endpoint：從 DragonflyDB host:port 改為 Redis primary（或 Sentinel/Cluster endpoint）
認證：若 DragonflyDB 用 requirepass，Redis 同參數；若要升級到 ACL 趁此機會配置
Sentinel/Cluster 配置：client library 需要啟用 Sentinel discovery 或 Cluster mode

1# 切換前：直連 DragonflyDB
2r = redis.Redis(host="dragonfly-host", port=6379, password="secret")
3
4# 切換後：Sentinel 模式
5sentinel = redis.Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379), ("sentinel-3", 26379)])
6r = sentinel.master_for("mymaster", password="secret")

階段四：效能 baseline 與回退

效能預期

回到 Redis 後，單機 throughput 會低於 DragonflyDB（Redis 單主線程 vs DragonflyDB 多線程）。建立 baseline 時要跟 Redis 的歷史數據比，不是跟 DragonflyDB 比。

指標	預期變化	應對
吞吐量	下降（單線程限制）	Cluster 分片或 read replica 分散
Latency p99	BGSAVE 期間可能有 spike	調整 BGSAVE 排程避開高峰
記憶體使用	上升 ~30%（Redis 記憶體效率較低）	預先調整 maxmemory 和 eviction policy

回退路徑

回退到 DragonflyDB：把 Redis 的 RDB dump 回 DragonflyDB 載入，endpoint 改回。Cache 資料可重建，即使 RDB 不搬，DragonflyDB 重啟後 cache miss 回源到 DB 即可。

DragonflyDB 在遷移完成後保留 7 天再下線。

交接路由

Source vendor：DragonflyDB
Target vendor：Redis / Valkey
反向路徑：Redis → DragonflyDB
HA 重建：Sentinel HA Failover、Cluster Resharding
持久化注意：Persistence Fork Latency

DragonflyDB shared-nothing 多核架構：用 scale-up 取代 Redis Cluster

Tue, 16 Jun 2026 00:00:00 +0000

本文是 DragonflyDB overview 的 implementation-layer deep article。選型層（為何選 DragonflyDB、BSL 授權、相容度）見 overview；本文只處理「決定用 DragonflyDB 後，多核架構怎麼用、相容邊界在哪」。命令實機驗證於 dragonfly df-v1.39.0（redis_version:7.4.0）、最後檢查日 2026-06-16；效能數字以 DragonflyDB 官方 benchmark 為準。

scale-up 還是 scale-out：一個架構賭注

把一台 32 核機器交給 Redis，Redis 的主執行緒只用得到其中一核處理命令——要榨乾這台機器，你得在同一台上跑好幾個 Redis 進程、組成 Cluster、用 hash slot 把 key 分片。多核利用變成了一個分散式系統問題（cluster re-sharding、cross-slot transaction、hash tag 治理全都來了）。

DragonflyDB 賭的是相反方向：一個進程、thread-per-core、shared-nothing，讓單機在不分片的情況下用滿所有核。它的論點是——多數「需要 Redis Cluster」的場景，真正的需求是吞吐與記憶體，不是跨機器分散；如果單機就能撐到那個規模，Cluster 的複雜度就不必付。實機可以看到這個架構：

1redis-cli INFO server | grep -E "thread_count|redis_version|dragonfly_version"
2# thread_count:8               ← 自動對齊 CPU 核數
3# redis_version:7.4.0          ← 對 client 裝成 Redis 7.4
4# dragonfly_version:df-v1.39.0

thread_count:8 在一個進程內，不是 8 個 Redis 進程組 Cluster。這就是賭注的核心：把 Redis Cluster 的水平分片，收進單一進程的垂直多核。理解 DragonflyDB 就是理解這個賭注成立的條件與它撞牆的地方。

對高吞吐單機 workload，這個賭注有現成的對照。Snap 在 multi-cloud 用 KeyDB（Redis 的 multi-threaded fork、單實例吞吐提升 5-10x）撐超高吞吐 cache，正是「不想為了多核去組 Cluster」的同一類需求；DragonflyDB 是這條路線更激進的版本（從零用 C++ 重寫、不是在 Redis 上加 thread）。

核心概念：thread-per-core 與 shared-nothing

DragonflyDB 的多核不是「多個執行緒搶同一份資料」，而是把資料切給各個執行緒、彼此不共享——這是它能線性擴展到多核的關鍵。

thread-per-core + 資料分區。每個 thread 綁一個核，keyspace 被 hash 切成多個 slice，每個 slice 只由一個 thread 擁有。一個命令進來，被路由到擁有該 key 的 thread 處理。因為一個 key 只有一個 thread 碰，單 key 操作不需要鎖——這消除了 Redis 多執行緒方案最大的開銷（lock contention）。

dashtable 取代 Redis 的 dict。DragonflyDB 用自製的 dashtable（一種 hash table）取代 Redis 的 dictionary，記憶體佈局更緊湊、resize 時不需要像 Redis 那樣漸進式 rehash 全表，同樣的 dataset 通常比 Redis 省 20-40% 記憶體（依資料形狀，以官方 benchmark 為準）。

fork-less snapshot。Redis 的持久化靠 fork()，大記憶體下會凍結主執行緒並讓記憶體接近翻倍（見 Redis persistence deep article）。DragonflyDB 不用 fork——它用自己的快照演算法在不複製整個進程的前提下做一致性快照，大記憶體場景不付 fork 的延遲尖峰與記憶體翻倍代價。這是它對「fork 是 Redis 結構性瓶頸」這個痛點的直接回答。

多執行緒的代價：沒有 Redis Cluster mode。資料分區在單進程內，DragonflyDB 不提供 Redis Cluster mode（它的哲學是單機撐大、不跨機器分片）。這個取捨決定了它的相容邊界與容量天花板，是後面踩坑的根源。

配置：多核與持久化的設定路徑

1docker run -d --name dragonfly -p 6379:6379 \
2  docker.dragonflydb.io/dragonflydb/dragonfly \
3    --threads 8 \              # thread 數、預設等於 CPU 核數（一般不需手動設）
4    --maxmemory 4gb \          # 記憶體上限、行為類似 Redis maxmemory
5    --cache_mode true \        # 純 cache 模式：記憶體滿時自動 evict（類似 allkeys-lru）
6    --snapshot_cron "0 3 * * *" # fork-less snapshot 排程（cron 格式、這裡每天 3 點）

調校判讀：

--threads 預設對齊 CPU 核數，多數情況不需手動設；設小於核數會浪費核，設大於核數沒有意義
--cache_mode true 讓 DragonflyDB 在記憶體滿時自動淘汰（純 cache 行為）；不開則記憶體滿時拒絕寫入（類似 Redis noeviction）
--maxmemory 留 headroom，但因為 fork-less，headroom 不需要像 Redis 留那麼多給 fork copy-on-write
snapshot 用 --snapshot_cron 排程，fork-less 機制讓大記憶體快照不產生延遲尖峰

Production 故障演練

Case 1：client 配 Cluster mode、連不上

徵兆：從 Redis Cluster 遷來，application 的 client library 還配著 cluster mode，連 DragonflyDB 報錯或 hang，CLUSTER 相關命令行為不如預期。

根因：DragonflyDB 不提供 Redis Cluster mode（單進程多核、不跨機器分片）。cluster-aware client 會嘗試 CLUSTER SLOTS 之類的拓樸發現，跟 standalone 的 DragonflyDB 對不上。

修法：

client 改回 standalone 配置（不要 cluster mode）
評估原本用 Cluster 的理由：若是為了多核吞吐，DragonflyDB 單進程多核已涵蓋，不需要 cluster mode
若原本用 Cluster 是為了超過單機的容量 / 跨機器分散，DragonflyDB 的 scale-up 模型撐不住，該留在 Redis Cluster
確認 application 沒有依賴 cluster-specific 行為（hash tag 的跨 slot 語意等）

Case 2：某些 Redis 命令 / module 不支援

徵兆：核心 SET/GET/HASH 等正常，但某個命令報 unknown command 或行為跟 Redis 不同，特別是 module 命令（RedisJSON / RedisSearch）與部分冷門命令。

根因：DragonflyDB 相容大多數 Redis 命令但不是 100%；它宣稱相容 redis_version:7.4.0，但部分 module、部分冷門命令、部分 Lua 行為有差異。

修法：

遷移前盤點 application 用到的命令，對照 DragonflyDB 的 API 相容清單（官方 docs）
module 重度依賴（RedisJSON / RedisSearch）要特別確認——DragonflyDB 的 module 生態比 Redis 淺
Lua script 行為差異要實測，不要假設跟 Redis 完全一致
相容性是遷移的主要風險，跟 Valkey 的相容性驗證同理但 DragonflyDB 邊界更寬（重寫而非 fork）

Case 3：thread 沒對齊核數、多核優勢沒發揮

徵兆：吞吐沒有達到預期、CPU 使用率不均（部分核閒置），thread_count 跟機器核數對不上。

根因：--threads 被手動設成小於 CPU 核數，或容器的 CPU limit 限制了實際可用核數，DragonflyDB 沒能用滿所有核。

修法：

redis-cli INFO server | grep thread_count 確認 thread 數對齊實體核數
容器環境確認 CPU limit 沒有卡住 DragonflyDB 的核數（cgroup CPU quota）
不要手動把 --threads 設小，預設對齊核數就是最佳
吞吐沒到預期也可能是 workload 本身（大命令、網路 RTT），用連線 / pipeline 的 RTT 分析交叉判斷

Case 4：跨 partition 的多 key 操作有額外成本

徵兆：大量多 key 命令（MGET 跨很多 key、跨 key 的 Lua）的延遲比預期高，單 key 操作則很快。

根因：shared-nothing 下 key 分散在不同 thread，多 key 操作要跨 thread 協調——單 key 免鎖的好處在多 key 跨 partition 時要付協調成本。這跟 Redis Cluster 的 cross-slot 是類似的本質（資料分散的代價），只是發生在單進程內。

修法：

高頻的多 key 操作盡量讓 key 落在同 partition（DragonflyDB 的 key 分布規則）
評估能否用單 key 結構（hash）取代多個 key 的聚合
跨 partition 協調是分區架構的固有成本，不是 bug，量大時要設計繞過
對照 Redis Cluster 的 cross-slot 限制，兩者都是「資料分散換吞吐」的代價

Case 5：BSL 授權踩到商業使用限制

徵兆：準備把 DragonflyDB 包成對外的 managed service 提供給客戶，法務 review 卡關。

根因：DragonflyDB 用 BSL（Business Source License），商業使用受限——具體限制是不可把 DragonflyDB 當成 managed service 對外提供（4 年後該版本轉 Apache 2.0）。內部使用無限制，但 SaaS 對外提供 DragonflyDB 即服務受限。

修法：

內部使用（多數企業場景）無限制，直接用
要把 DragonflyDB 當 managed service 對外賣，聯絡 DragonflyDB 取得商業 license
開源合規敏感（公部門 / 企業 OSI 政策）走 OSI 認可的 Valkey（BSD）
授權法律解讀諮詢法務，不要憑技術判斷

Capacity / cost 邊界

DragonflyDB 的容量判讀，核心在 scale-up 的天花板與多核效率：

訊號	健康區間	警戒與動作
`thread_count`	= CPU 實體核數	< 核數 → 沒用滿多核、查 –threads / cgroup
單機吞吐	遠高於單 Redis 進程	撞單機網路 / CPU 上限 → scale-up 到頂
記憶體效率	比 Redis 省 20-40%（依形狀）	以官方 benchmark + 自己量為準
snapshot 延遲尖峰	接近 0（fork-less）	有尖峰 → 確認用的是 DragonflyDB 快照不是相容路徑
單機容量 / 跨 AZ 需求	單機 + replica 撐得住	超單機 / 要跨機器分散 → DragonflyDB 撐不住

撞牆後的路由判斷：

超過單機容量、需要跨機器分散：DragonflyDB 的 scale-up 賭注在這裡輸——它沒有 Cluster mode。要跨機器分片走 Redis / Valkey Cluster。
需要 OSI 認可開源授權：BSL 不是 OSI 認可，合規敏感走 Valkey（BSD）。
不想自管：DragonflyDB 目前沒有 fully managed offering（無 ElastiCache for Dragonfly），必須自管。要 managed 走 ElastiCache（Redis / Valkey / Memcached）。
跨 AZ / 跨 region HA：DragonflyDB 有 replica 模式（primary-replica）跨 AZ 可行，但跨 region 需自建——大規模跨區走 managed 的 Global Datastore。

整合 / 下一步

DragonflyDB 的定位是「Redis 相容 + 激進多核」，它在 Redis 相容服務的光譜上有明確座標：

跟 Valkey：兩者都打「Redis 相容 + 更好的多核」，但 Valkey 是 fork（同源、最高相容、漸進加 thread），DragonflyDB 是 C++ 重寫（相容核心但架構激進、多核更徹底）。相容度要極致選 Valkey，多核吞吐要極致選 DragonflyDB。
跟 KeyDB / Garnet：KeyDB 是 Redis 的 multi-threaded fork（Snap 採用、Snap 收購後相對停滯）；Garnet 是 Microsoft 的研究型高吞吐 store（生態淺）。DragonflyDB 是這個「高吞吐 Redis 替代」群裡商業化最積極、生態最活躍的。
跟 Redis Cluster re-sharding：如果你的 Redis Cluster re-sharding 頻繁觸發、運維負擔重，DragonflyDB 的 scale-up 模型可能用單機取代整個 Cluster——這是評估遷移的主要動機。
跟 Shopify write-through：write-through 在 DragonflyDB 上行為一致，但單進程多核能承接比單 Redis 進程更大的 throughput，是 read-heavy + write-through 場景的 scale-up 選項。

Redis → DragonflyDB：drop-in 相容下的容量躍升 + 5 個踩雷

Tue, 19 May 2026 00:00:00 +0000

本文是跨 vendor migration playbook、cross-link 到 Redis（source）跟 DragonflyDB（target）。跟前一篇 Splunk → Elastic Security 的 6-phase playbook 對照、Redis → DragonflyDB 是 drop-in 相容 形態的 migration、結構更接近 vendor deep article methodology 的 6-section flow + 一段「相容性驗證」前置。

為什麼遷：cost / single-thread / multi-tenancy 三條 driver

Driver	觸發場景
Memory cost	Redis 6.x cluster 跑 1-10 TB 時、機器成本爆；DragonflyDB 記憶體效率提升 ~30%、相同 dataset 少 30% RAM
Single-thread bottleneck	Redis 主執行緒在單一 hot key 寫入時是瓶頸、scale-up 受限；DragonflyDB 多執行緒 + shared-nothing 設計、單機 throughput 號稱 25x
Multi-tenancy	Redis Cluster 多 namespace 需要 cluster-per-tenant、運維成本爆；DragonflyDB 設計上 namespace 隔離成本低

反向 driver（DragonflyDB → Redis）也存在 — 主要是 Redis Modules 依賴（RedisJSON / RedisSearch / RedisGraph）DragonflyDB 不支援、或 Lua script 用了 redis.call 進階 API。

跟 phased migration 的對照：drop-in 不需要 phased

跟前一篇 Splunk → Elastic 的 6-phase playbook 不同、Redis → DragonflyDB 的 migration 結構接近 standard deep article：

維度	Splunk → Elastic（phased）	Redis → DragonflyDB（drop-in）
Schema 對位	需要（SPL ↔ KQL / CIM ↔ ECS）	不需要（RESP protocol 相容）
Rule translation	4-12 週 SOC engineering 工作	不需要（command 直接相容）
Parallel run	4-8 週 dual-SIEM 跑	1-7 天 dual-write 觀察
Cutover 邊界	軟邊界（routing 切換、可逆 30 分鐘）	硬邊界（client 配置切換、單次完成）
不可逆 cleanup	1 年後 archive	立刻（DragonflyDB 接管後 Redis 可關）
整體週期	4-9 個月	1-4 週

判斷依據：migration 結構由 source 跟 target 的 schema / protocol 差異程度 決定、不是 universal phased playbook。本批第 2 篇驗證 deep article methodology 的 6-section 框架 在 drop-in migration 仍適用（只需前置 相容性驗證 段、其他 6 段對位）。

相容性驗證：在 cutover 前要確認的清單

DragonflyDB 號稱 Redis drop-in、但「drop-in」涵蓋範圍依 Redis feature 使用程度而定。Pre-migration 必跑的相容性 audit：

Redis feature	DragonflyDB 支援程度	Action
Basic data types (String / Hash / List / Set / ZSet)	完全相容	無需處理
RESP protocol v2 / v3	完全相容	無需處理
RDB load	Redis 6.x RDB 完全相容；7.x 部分 feature 待測	用 BGSAVE → 切換 → load 驗證
AOF	DragonflyDB 不用 AOF、改 snapshotting 模式	不直接 import AOF、需經 RDB 中介
Lua scripts	90% 相容、部分 redis.call API + EVAL 邊界 case 差異	Lua script audit 必跑、不能假設全相容
Pub/Sub	相容、但 message fanout 行為差異（多 thread 處理）	高 fanout pub/sub 場景需測 latency
Cluster mode	DragonflyDB 單機即可達 cluster throughput、不必 cluster；emulated cluster mode 部分相容	評估是否仍需 cluster
Sentinel HA	不直接支援、用 DragonflyDB 自家 replication	HA 架構重設計
Redis Modules (RedisJSON / Search / Graph)	不支援	必須前置改寫 application
Streams	相容、但 consumer group 行為部分差異	Stream consumer 跑 dual-write 觀察
Keyspace notifications	相容	無需處理

Audit 的關鍵 output：列「不相容功能」清單 + 對應 application code 修改範圍；若 Modules 在 production 使用、migration 退役。

Step-by-step cutover

 1# 1. 部署 DragonflyDB
 2docker run -d --name dragonfly -p 6380:6379 \
 3  -v /data/dragonfly:/data \
 4  docker.dragonflydb.io/dragonflydb/dragonfly:latest \
 5  --logtostderr --requirepass=
 6
 7# 2. Redis 端 BGSAVE
 8redis-cli -h redis-primary BGSAVE
 9# 等到 BGSAVE 完成
10redis-cli -h redis-primary INFO Persistence | grep rdb_last_save_time
11
12# 3. 把 dump.rdb 拷到 DragonflyDB
13scp redis-primary:/var/lib/redis/dump.rdb dragonfly-host:/data/dragonfly/
14
15# 4. 重啟 DragonflyDB 載入 RDB
16docker restart dragonfly
17
18# 5. 驗證資料一致
19redis-cli -h dragonfly-host -p 6380 DBSIZE
20redis-cli -h redis-primary DBSIZE
21# 兩端 key 數對齊
22
23# 6. Dual-write 1-7 天（application 同時寫兩端）
24# 7. Read 切換到 DragonflyDB、Redis 端只寫不讀
25# 8. Write 切換、Redis 端 standby
26# 9. 觀察 1-2 週、無異常後 Redis decommission

關鍵時間點：

BGSAVE → load：100GB RDB 約 5-15 分鐘、跨網路 SCP 時間另算
Dual-write window：1-7 天觀察、application 寫兩端、read 仍走 Redis
Cutover：read switch → write switch、每步間隔 24 小時
Decom：Redis 保留 standby 1-2 週、無異常後關閉

Production 故障演練

Case 1：RDB 版本差，DragonflyDB load 失敗

徵兆：Redis 7.2 端 BGSAVE 出的 dump.rdb 在 DragonflyDB load 時報 Unsupported RDB version、DragonflyDB 啟動失敗。

根因：Redis 7.2 RDB version 11 含新 feature（function library / sharded pubsub）DragonflyDB 當前 release 沒支援；版本相容性需逐 release 確認。

修法：

Pre-migration 版本相容矩陣 audit：DragonflyDB release note 對照 Redis version、確認 RDB version 支援
降級 BGSAVE：Redis 端設 rdb-version 9（Redis 6.x 兼容版本）、犧牲 Redis 7.x 新 feature
替代方案：用 redis-cli --scan + MIGRATE 命令 incremental 搬、不用 RDB；速度慢 100x 但相容性好

Case 2：Lua script 跑進 EVAL 不一致

徵兆：dual-write 階段、發現某些 EVAL script 在 Redis 跟 DragonflyDB 結果不同；具體是某個 redis.call("OBJECT", "ENCODING", key) 在 DragonflyDB 回不一樣的 encoding 字串。

根因：DragonflyDB 內部不用 Redis 的 ziplist / listpack encoding（dashtable 不需要）、OBJECT ENCODING 返回值不對等；script 邏輯依賴 encoding 來決定行為、結果不同。

修法：

Audit Lua script：grep 所有 redis.call("OBJECT"、列出依賴 encoding 的 script
改寫 application：不依賴 encoding、改用 MEMORY USAGE 或 high-level check
接受差異：DragonflyDB 不會回 encoding 但 functional 結果對等、SOC review 確認可接受

Case 3：Pub/Sub fanout 高負載 latency

徵兆：production 切到 DragonflyDB 後、Pub/Sub 訂閱端 latency p99 從 5ms 漲到 20-50ms；topic fanout >10K subscriber 場景。

根因：DragonflyDB 多 thread 設計、Pub/Sub message 在 thread 間 dispatch 需要 routing；Redis single-thread 沒這個 overhead。高 fanout 是 DragonflyDB 設計取捨。

修法：

架構：高 fanout Pub/Sub 不用 DragonflyDB、改 NATS / Redis Streams + consumer group
DragonflyDB 配置調整：--proactor_threads 對 Pub/Sub 影響大、調到符合 CPU 核心數
接受 latency：< 10K subscriber 差異可忽略、不必動

Case 4：Cluster mode 看似相容但 slot routing 行為差

徵兆：application 用 Redis Cluster client（lettuce / Jedis cluster mode）連 DragonflyDB emulated cluster、運行幾天後 MOVED redirect 異常、key 找不到。

根因：DragonflyDB emulated cluster mode 是 single node 模擬、CLUSTER SLOTS 返回固定 mapping；某些 client 端 cluster topology cache 跟實際 routing 不對齊、發 redirect。

修法：

Application 改 standalone client：DragonflyDB single node 已能達 cluster 級 throughput、不必用 cluster client
Client config：lettuce 端 clusterTopologyRefreshOptions(...) 設較長 refresh、減少 redirect 機會
長期：等 DragonflyDB cluster 正式 GA 後再評估

Case 5：Modules 用了沒注意，migration 卡住

徵兆：cutover 後幾天、application 某個功能完全壞、log 顯示 ERR unknown command 'JSON.SET'；DragonflyDB 不支援 RedisJSON。

根因：Pre-migration audit 漏掉 application 用了 RedisJSON（透過某 client library 抽象）；DragonflyDB 不支援該 Module 命令、application 直接壞。

修法：

Pre-migration audit 必跑：MONITOR 抓 1 小時 production traffic、grep 非 standard command（JSON.* / FT.* / GRAPH.*）
應急回退：Redis standby 還在、application client config 切回
長期：JSON 改用 standard Hash + serialization、Search 改 Elasticsearch / Meilisearch、Graph 改 Neo4j

Capacity / cost 對照

維度	Redis（self-managed）	DragonflyDB	取捨
Single-node throughput	~100K-200K ops/s	~2-5M ops/s（號稱 25x）	DragonflyDB 領先、實測依 workload 而定
Memory efficiency	baseline	-30% 平均、依資料分佈	DragonflyDB 領先
Persistence	RDB / AOF 雙模式	Snapshotting 為主、不用 AOF	Redis 對 durability 要求高的 workload 仍領先
HA / Replication	Sentinel + Cluster 成熟	自家 replication、HA 文件相對少	Redis 領先
Modules ecosystem	RedisJSON / Search / Graph / TimeSeries	不支援	Redis 領先
Cluster scaling	Cluster mode 成熟	單機效能高、cluster 仍 emerging	Redis 領先、但 DragonflyDB 單機已能 cover 多數 use case
Total cost (10TB cache)	$8-15K USD / month	$2-5K USD / month	DragonflyDB 顯著便宜
Operational maturity	高（10+ 年 production）	中（2022+、production 案例 1000+）	Redis 領先

判讀：cache use case 簡單（pure cache / session store）走 DragonflyDB；複雜 use case（Modules / Pub/Sub fanout / strict durability）保留 Redis。

整合 / 下一步

跟 client library 整合

主流 Redis client（lettuce / Jedis / redis-py / node-redis / go-redis）都直接相容 DragonflyDB；唯一例外是 cluster client 模式行為差（見 Case 4）。

跟 monitoring 整合

DragonflyDB exporter 提供 Prometheus metric、跟 Redis exporter 對應 metric 名稱 80% 相同；grafana dashboard 需小改：

redis_memory_used_bytes → dragonfly_memory_used_bytes
redis_commands_processed_total → dragonfly_commands_processed_total

跟 Redis Sentinel HA 對位

DragonflyDB 不直接支援 Sentinel、HA 走自家 master-replica + DNS-based failover：

DragonflyDB primary + replica
K8s 用 StatefulSet + Service + readiness probe
失敗 failover 比 Sentinel 慢（30s-2min vs 5-15s）

下一步議題

DragonflyDB Cluster GA：正式 cluster mode 出來後重評估
Stream + consumer group 細節：dual-write 期間驗證每個 consumer pattern
Modules 替代方案：JSON / Search / Graph 各自的 cloud-native 替代評估

Dragonflydb on Tarragon

DragonflyDB → Redis / Valkey：回退到標準生態的遷移路徑

為什麼從 DragonflyDB 遷回

6 維 diff dimension audit

相容性確認

階段一：資料匯出

階段二：HA 架構重建

Sentinel 路徑（適合非分片場景）

Cluster 路徑（適合需要分片的場景）

階段三：Client 切換

階段四：效能 baseline 與回退

效能預期

回退路徑

交接路由

DragonflyDB shared-nothing 多核架構：用 scale-up 取代 Redis Cluster

scale-up 還是 scale-out：一個架構賭注

核心概念：thread-per-core 與 shared-nothing

配置：多核與持久化的設定路徑

Production 故障演練

Case 1：client 配 Cluster mode、連不上

Case 2：某些 Redis 命令 / module 不支援

Case 3：thread 沒對齊核數、多核優勢沒發揮

Case 4：跨 partition 的多 key 操作有額外成本

Case 5：BSL 授權踩到商業使用限制

Capacity / cost 邊界

整合 / 下一步

相關連結

Redis → DragonflyDB：drop-in 相容下的容量躍升 + 5 個踩雷

為什麼遷：cost / single-thread / multi-tenancy 三條 driver

跟 phased migration 的對照：drop-in 不需要 phased

相容性驗證：在 cutover 前要確認的清單

Step-by-step cutover

Production 故障演練

Case 1：RDB 版本差，DragonflyDB load 失敗

Case 2：Lua script 跑進 EVAL 不一致

Case 3：Pub/Sub fanout 高負載 latency

Case 4：Cluster mode 看似相容但 slot routing 行為差

Case 5：Modules 用了沒注意，migration 卡住

Capacity / cost 對照

整合 / 下一步

跟 client library 整合

跟 monitoring 整合

跟 Redis Sentinel HA 對位

下一步議題

相關連結