"Managed"
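- AWS ElastiCache's responsibility boundary: what managed takes over, and what it quietly leaves to you
ElastiCache takes over failover, patching, snapshots, and cross-AZ replication, but cache stampedes, client reconnection, key design, and eviction policy are still your job. This article uses the shared responsibility model to break down where managed really ends, walks through engine selection and cluster mode configuration, covers 5 production pitfalls where "assuming AWS handles everything" turned into an incident, and explains the durability boundary between ElastiCache and MemoryDB.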
- MongoDB → Atlas: Atlas is not MongoDB + managed; it is a different product
Atlas is billed as "managed MongoDB", but its operational model is entirely different (auto-scaling / VPC peering / IAM-driven access / built-in backup / billing model). This article follows the Type C operational redesign hybrid structure: a 4-phase operational migration plus a drop-in cutover, and 5 production pitfalls (connection limits / IP whitelist / backup retention / IAM token expiry / billing spikes).
- Self-managed Prometheus → Grafana Cloud Metrics: a feature × ops × cost comparison
Self-managed Prometheus → Grafana Cloud Metrics (Mimir-backed) is a Type C operational redesign: the Prometheus query API is fully compatible, and the operational stack (HA / retention / scaling) is fully managed. This article opens with a three-dimension feature / ops / cost comparison table, then covers 5 production pitfalls.
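- RabbitMQ → AWS SQS: handing off broker operations and folding routing into the application
Migrating a self-managed RabbitMQ cluster to AWS SQS is an operational redesign: the protocols are incompatible, the application has to move from manual ack to visibility timeout + delete, and exchange routing collapses into SNS fan-out or multiple queues. This article runs a 6-dimension diff audit (the operational dimension differs the most), clarifies what should and should not be migrated, and walks through 5 production failure drills (DLX → redrive policy / prefetch → batch + visibility / fan-out → SNS-to-SQS / the 256KB size limit / the throughput trade-off of moving ordering to FIFO), followed by a gradual cutover.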
- Self-managed ELK → Elastic Cloud: closing out the lifecycle of a 5-year-old ELK cluster
Self-managed ELK Stack → Elastic Cloud is a Type C operational redesign: the protocol is drop-in, and the operational stack (cluster sizing / shard management / upgrades / backup) is fully managed. This article is organized around the 5-year ELK lifecycle (build → scale → degrade → save → migrate) and covers 5 production pitfalls.
- Self-managed Kafka → AWS MSK: breaking down a $15K/month operational cost against managed
Self-managed Kafka → MSK is a Type C operational redesign: the protocol is fully compatible, and the operational stack (ZooKeeper / brokers / monitoring / patching) is fully managed. This article opens with a cost breakdown, then covers 5 production pitfalls (client connection pattern / version pinning / metric pipeline / IAM auth / cross-cluster mirror).
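- Self-managed Redis / Valkey → AWS ElastiCache: the engine stays the same; what changes is who operates it
What makes migrating self-managed Redis/Valkey to ElastiCache unusual is that the engine does not change (Redis is still Redis), the data model does not change, and the API does not change. The only thing that changes is who owns operations. This article runs a 6-dimension diff audit and maps it to the Type C operational hybrid, walks through the actual VPC / security / cutover work, examines the responsibility boundary ("when you hand off failover/patching, which controls do you hand off with them?"), and covers 5 production pitfalls.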