1. Smooth Operator: Large-Scale Automated Storage with Kubernetes ● Celina Ward (@shaleenaa) ● Matt Schallert (@mattschallert)
2. What is M3?
3. M3DB Scale ● 31M writes per second ● 50 gigabits per second ● 1000+ instances running M3DB ● 9B unique metric IDs
4. 2016 ● 2 clusters ● 1 configuration
5. 2018 ● 40+ clusters ● 10+ configurations
6. M3DB Features ● Sharding: metrics are sharded at ingestion time
7. M3DB Features ● Sharding: metrics are sharded at ingestion time ● Replication: replicates in 3 separate failure domains
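A minimal sketch of sharding at ingestion plus replication across three failure domains. Everything named here is an assumption for illustration: the shard count, the domain and instance names, the CRC32 hash (a standard-library stand-in, not necessarily the hash M3DB uses), and the modulo-based owner selection.

```python
import zlib
from typing import Dict, List

NUM_SHARDS = 64  # hypothetical; the slides do not state a shard count


def shard_for(metric_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a metric ID to a shard with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(metric_id.encode("utf-8")) % num_shards


def replicas_for(shard: int, instances_by_domain: Dict[str, List[str]]) -> List[str]:
    """Pick one owning instance per failure domain for the given shard."""
    owners = []
    for domain in sorted(instances_by_domain):
        instances = instances_by_domain[domain]
        owners.append(instances[shard % len(instances)])
    return owners


# Hypothetical topology: three failure domains with two instances each.
instances = {
    "domain-a": ["a-0", "a-1"],
    "domain-b": ["b-0", "b-1"],
    "domain-c": ["c-0", "c-1"],
}

shard = shard_for("cpu.user{host=web01}")
owners = replicas_for(shard, instances)
```

Because the hash is stable, every write for the same metric ID lands on the same shard, and each shard has exactly one owner in each failure domain.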
8. Managing M3DB Lifecycle Reactive Proactive 1 hour per day, 2 hours per week 5 hours per week
9. Managing Complexity
11. Performant Stateful Primitives Requirement #1: Support a high-throughput, latency-sensitive workload
12. Ephemeral Instances? ● No durability ● Streaming terabytes of data on restart ● Dangerous reliability implications
13. Remote: Block Store? ● Increased latency ● We already replicate 3x ● Less portable (+ no on-prem)
14. Remote: Object Store? ● Deduplicate, store remotely ● Even worse latency ● Terabytes of data transfer
16. Data Centers & Cloud Requirement #2
17. Embrace the Community Requirement #3
21. Local Volumes Performant Stateful Primitives (storage disk attached to host)
23. Node Affinity + StatefulSets Data Centers and Cloud
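One way to picture node affinity combined with StatefulSets: a sketch that builds one StatefulSet manifest per zone, pinned to that zone and claiming a volume from a local-volume storage class. The Kubernetes field names are real; the image, mount path, storage size, and storage class name are hypothetical placeholders, and this is not the manifest the m3db-operator generates.

```python
from typing import Any, Dict

# Well-known node label for zones; older clusters used
# failure-domain.beta.kubernetes.io/zone instead.
ZONE_LABEL = "topology.kubernetes.io/zone"


def statefulset_for_zone(name: str, zone: str, replicas: int,
                         storage_class: str = "local-storage") -> Dict[str, Any]:
    """Build a StatefulSet manifest whose pods may only schedule in `zone`."""
    labels = {"app": name}
    zone_affinity = {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [
                        {"key": ZONE_LABEL, "operator": "In", "values": [zone]}
                    ]
                }]
            }
        }
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": zone_affinity,
                    "containers": [{
                        "name": "m3db",
                        "image": "example/m3dbnode:placeholder",  # hypothetical image
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/m3db"}  # hypothetical path
                        ],
                    }],
                },
            },
            # Each pod gets its own claim, bound to a disk on the host it runs on.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": "100Gi"}},  # hypothetical size
                },
            }],
        },
    }


zones = ["us-east1-b", "us-east1-c", "us-east1-d"]
manifests = [
    statefulset_for_zone(f"east1-prod-a-rep{i}", zone, replicas=2)
    for i, zone in enumerate(zones)
]
```

StatefulSet pods are named `<statefulset-name>-<ordinal>`, so a set named `east1-prod-a-rep0` with two replicas yields pods `east1-prod-a-rep0-0` and `east1-prod-a-rep0-1`.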
25. Results
27. $ kubectl get pods
    NAME                   ZONE
    east1-prod-a-rep0-0    us-east1-b
    east1-prod-a-rep0-1    us-east1-b
    ...
    east1-prod-a-rep1-0    us-east1-c
    east1-prod-a-rep1-1    us-east1-c
    ...
    east1-prod-a-rep2-0    us-east1-d
    east1-prod-a-rep2-1    us-east1-d
    ...
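The pod listing suggests each replica group (`rep0`, `rep1`, `rep2`) lives entirely in one zone, with no two groups sharing a zone. A small sketch of checking that property from pod names and zones; the naming pattern is inferred from the listing and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Inferred from the listing: <cluster>-rep<group>-<ordinal>
POD_NAME = re.compile(r"^(?P<cluster>.+)-rep(?P<group>\d+)-(?P<ordinal>\d+)$")


def zones_by_replica_group(pod_zones: Dict[str, str]) -> Dict[int, Set[str]]:
    """Collect the set of zones used by each replica group."""
    groups = defaultdict(set)
    for pod, zone in pod_zones.items():
        match = POD_NAME.match(pod)
        if match is None:
            raise ValueError(f"unexpected pod name: {pod}")
        groups[int(match.group("group"))].add(zone)
    return dict(groups)


def is_zone_isolated(pod_zones: Dict[str, str]) -> bool:
    """True if every group sits in exactly one zone and no zone is shared."""
    groups = zones_by_replica_group(pod_zones)
    if any(len(zones) != 1 for zones in groups.values()):
        return False
    used = [next(iter(zones)) for zones in groups.values()]
    return len(used) == len(set(used))


pods = {
    "east1-prod-a-rep0-0": "us-east1-b",
    "east1-prod-a-rep0-1": "us-east1-b",
    "east1-prod-a-rep1-0": "us-east1-c",
    "east1-prod-a-rep1-1": "us-east1-c",
    "east1-prod-a-rep2-0": "us-east1-d",
    "east1-prod-a-rep2-1": "us-east1-d",
}
isolated = is_zone_isolated(pods)
```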
28. Where does our operator replace human effort? Reactive Proactive 0 minutes / day 20 minutes / week 0 minutes / week
34. Lessons Learned
35. Broken Assumptions ● Kubernetes revealed assumptions we made ● Instance identity ≠ host ● Made M3DB more robust
37. kubectl apply -f m3db_operator.yaml
39. Advice for Large Stateful Workloads
40. Out-of-Cluster Reliability ● Years invested in M3DB reliability & tooling ● Considered Kubernetes once we faced operational scaling challenge ● Be mindful of adding complexity
41. Declarative > Imperative ● Core to Kubernetes, great for stateful ● Operator exchanged desired states between Kubernetes and M3DB ● Storing topology externally → no hard dependency on Kubernetes API
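The declarative idea can be sketched as a reconcile step: compare the desired set of instances with what the database topology currently holds, and emit only the actions needed to close the gap. This is an illustrative sketch with hypothetical names and a simplified state model (plain sets of instance names), not the m3db-operator's actual logic.

```python
from typing import List, Set, Tuple

Action = Tuple[str, str]  # (verb, instance name)


def reconcile(desired: Set[str], current: Set[str]) -> List[Action]:
    """Return the actions that move `current` toward `desired`."""
    actions = [("add", inst) for inst in sorted(desired - current)]
    actions += [("remove", inst) for inst in sorted(current - desired)]
    return actions


def apply_actions(actions: List[Action], current: Set[str]) -> Set[str]:
    """Apply actions to a copy of the current state and return the result."""
    updated = set(current)
    for verb, inst in actions:
        if verb == "add":
            updated.add(inst)
        else:
            updated.discard(inst)
    return updated


desired = {"rep0-0", "rep1-0", "rep2-0"}
current = {"rep0-0", "rep1-0", "rep9-0"}

actions = reconcile(desired, current)
converged = apply_actions(actions, current)
```

Running `reconcile` again on the converged state yields no actions, which is the property that makes a declarative loop safe to repeat.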
42. Iterate on Each Stateful Interaction ● Don’t try to do everything at once ● Edge case scenarios still need humans
43. Next Steps ● Data centers… ● Auto-scale M3DB clusters
46. Thank You to the Team Special shout out to Paul Schooss
47. github.com/m3db/m3db-operator m3db.io/talks eng.uber.com/m3 @shaleenaa @mattschallert