Global Architect Summit (ArchSummit) 2018

Hongliang Xu: How Uber Builds a Cross-Data-Center Replication Platform on Apache Kafka

1. How Uber Builds a Cross Data Center Replication Platform on Apache Kafka. Hongliang Xu, Uber Streaming Data Team, Dec 07, 2018
2. Agenda 01 Apache Kafka at Uber 02 Apache Kafka pipeline & replication 03 uReplicator 04 Data loss detection 05 Q&A
3. Real-time Dynamic Pricing (diagram: App Views and Vehicle Information → Apache Kafka → Stream Processing → Dynamic Pricing)
4. Uber Eats - Real-Time ETAs
5. A bunch more ...
● Fraud Detection
● Driver & Rider Sign-ups, etc.
6. Apache Kafka - Use Cases
● General pub-sub, messaging queue
● Stream processing
  ○ AthenaX - self-service streaming analytics platform (Apache Samza & Apache Flink)
● Database changelog transport
  ○ Cassandra, MySQL, etc.
● Ingestion
  ○ HDFS, S3
● Logging
7. Data Infrastructure @ Uber (architecture diagram)
PRODUCERS: Rider App, Driver App, Payment, (Internal) Services, Mobile App Surge, API / Services, Etc.
DATABASES: Cassandra, MySQL
→ Apache Kafka →
CONSUMERS: ELK (Debugging), Samza / Flink (Real-time Analytics, Alerts, Dashboards), Hadoop (Applications, Data Science, Ad-hoc Exploration), AWS S3, Vertica / Hive (Analytics Reporting)
8. Scale: trillions of messages / day, PBs of data, tens of thousands of topics (excluding replication)
9. Agenda 01 Apache Kafka at Uber 02 Apache Kafka pipeline & replication 03 uReplicator 04 Data loss detection 05 Q&A
10. Apache Kafka Pipeline @ Uber (architecture diagram)
In each data center (DC1, DC2): Applications [ProxyClient] → Kafka REST Proxy → Regional Kafka → uReplicator → Aggregate Kafka, with a Secondary Apache Kafka cluster alongside; an Offset Sync Service connects the data centers.
11. Aggregation (diagram: each DC's Regional Kafka is replicated by uReplicator into the Aggregate Kafka clusters in both DC1 and DC2, with the Offset Sync Service between them)
● Global view
12. Cross-Data Center Failover
● During runtime
  ○ uReplicator reports offset mappings to the offset sync service
  ○ The offset sync service is all-active, and the offset info is replicated across data centers
● During failover
  ○ Consumers ask the offset sync service for offsets to resume consumption from, based on their last committed offsets
  ○ The offset sync service translates offsets between aggregate clusters
(Diagram: Regional Kafka → uReplicator → Aggregate Kafka in DC1 and DC2, with the Offset Sync Service spanning both)
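A minimal sketch of one plausible translation scheme behind these bullets, assuming uReplicator reports (regional offset → aggregate offset) checkpoints per topic-partition and per aggregate cluster; the class and method names are illustrative, not the actual offset sync service API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch: on failover, a consumer's last committed offset in aggregate
 * cluster A is mapped back to a regional offset and then to the closest earlier
 * checkpoint in aggregate cluster B, so consumption resumes without loss
 * (possibly with a few duplicates).
 */
public class OffsetSyncSketch {
  // key: aggregateCluster + "/" + topic + "-" + partition
  // value: regional offset -> aggregate offset checkpoints, sorted by regional offset
  private final Map<String, TreeMap<Long, Long>> checkpoints = new ConcurrentHashMap<>();

  /** Called by the replicator: regional offset X ended up at aggregate offset Y. */
  public void report(String aggCluster, String topicPartition, long regionalOffset, long aggOffset) {
    checkpoints
        .computeIfAbsent(aggCluster + "/" + topicPartition, k -> new TreeMap<>())
        .put(regionalOffset, aggOffset);
  }

  /** Translate a committed offset in cluster `from` into a resume offset in cluster `to`. */
  public long translate(String from, String to, String topicPartition, long committedAggOffset) {
    // Assumes checkpoints exist for both aggregate clusters.
    TreeMap<Long, Long> fromMap = checkpoints.get(from + "/" + topicPartition);
    TreeMap<Long, Long> toMap = checkpoints.get(to + "/" + topicPartition);
    // Latest checkpoint in `from` that the consumer has fully consumed.
    long regionalOffset = fromMap.firstKey();
    for (Map.Entry<Long, Long> e : fromMap.entrySet()) {
      if (e.getValue() <= committedAggOffset) {
        regionalOffset = e.getKey();
      }
    }
    // Closest earlier checkpoint in `to`: replay a little rather than risk skipping messages.
    Map.Entry<Long, Long> resume = toMap.floorEntry(regionalOffset);
    return resume != null ? resume.getValue() : toMap.firstEntry().getValue();
  }
}
```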
13. Ingestion (diagram: in each DC, Aggregate Kafka → HDFS/S3)
● HDFS
● S3
14. Topic Migration (initial state: producer and consumer both use the old Regional Kafka in DC1; the new Regional Kafka is in DC2)
● Setup uReplicator from new cluster to old cluster
● Move producer
● Move consumer
15. Topic Migration (uReplicator replicates from the new cluster in DC2 back to the old cluster in DC1; the producer now writes to the new cluster while the consumer still reads from the old one)
● Setup uReplicator from new cluster to old cluster
● Move producer
● Move consumer
● Remove uReplicator
16. Topic Migration (the consumer is being moved: consumers appear on both the old and the new Regional Kafka)
● Setup uReplicator from new cluster to old cluster
● Move producer
● Move consumer
● Remove uReplicator
17. Topic Migration (producer and consumer are both on the new Regional Kafka in DC2; uReplicator still replicates back to DC1)
● Setup uReplicator from new cluster to old cluster
● Move producer
● Move consumer
● Remove uReplicator
18. Topic Migration (uReplicator removed; the migration is complete)
● Setup uReplicator from new cluster to old cluster
● Move producer
● Move consumer
● Remove uReplicator
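The slides do not spell out how a moved consumer picks its starting position on the new cluster. One common approach, sketched here under that assumption (it is not stated to be Uber's), is to seek by the timestamp of the last message processed on the old cluster using the standard offsetsForTimes API, accepting some reprocessing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

/** Illustrative helper: after moving a consumer to the new cluster, resume by timestamp. */
public class ConsumerMoveSketch {
  /**
   * Seek every assigned partition to the first offset at or after resumeTimestampMs.
   * Call after subscribe()/poll() or assign(), so assignment() is populated.
   */
  public static void seekByTimestamp(KafkaConsumer<byte[], byte[]> consumer, long resumeTimestampMs) {
    Map<TopicPartition, Long> query = new HashMap<>();
    for (TopicPartition tp : consumer.assignment()) {
      query.put(tp, resumeTimestampMs);
    }
    Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);
    for (Map.Entry<TopicPartition, OffsetAndTimestamp> e : offsets.entrySet()) {
      if (e.getValue() != null) {
        consumer.seek(e.getKey(), e.getValue().offset());   // replay from just before the cutover
      } else {
        consumer.seekToEnd(Collections.singleton(e.getKey())); // no data after the timestamp yet
      }
    }
  }
}
```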
19. Replication - Use Cases
● Aggregation
  ○ Global view
  ○ All-active
  ○ Ingestion
● Migration
  ○ Move topics between clusters/DCs
20. Agenda 01 Apache Kafka at Uber 02 Apache Kafka pipeline & replication 03 uReplicator 04 Data loss detection 05 Q&A
21. Motivation - MirrorMaker
● Pain point
  ○ Expensive rebalancing
  ○ Difficulty adding topics
  ○ Possible data loss
  ○ Metadata sync issues
22. Requirements
● Stable replication
● Simple operations
● High throughput
● No data loss
● Auditing
23. Design - uReplicator
● Apache Helix
● uReplicator controller
  ○ Stable replication
    ■ Assign topic partitions to each worker process
    ■ Handle topic/worker changes
  ○ Simple operations
    ■ Handle adding/deleting topics
24. Design - uReplicator
● uReplicator worker
  ○ Apache Helix agent
  ○ Dynamic Simple Consumer
  ○ Apache Kafka producer
  ○ Commit after flush
(Worker diagram: the Apache ZooKeeper / Apache Helix Controller assigns topic: testTopic, partition: 0 to the worker's Apache Helix Agent; the Worker Thread's FetcherManager runs a LeaderFind Thread and a FetcherThread around a SimpleConsumer, feeding a LinkedBlockingQueue that the Apache Kafka Producer drains)
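A minimal sketch of the "commit after flush" idea using the modern Java clients; the real worker is Helix-driven and built on a SimpleConsumer-based fetcher, so the cluster addresses, group id, and topic here are placeholders. Offsets are committed only after the producer has flushed the corresponding messages to the destination cluster, so a crash causes duplicates rather than loss.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Illustrative replication loop: consume from source, produce to destination, commit after flush. */
public class ReplicationWorkerSketch {
  public static void main(String[] args) {
    Properties cProps = new Properties();
    cProps.put("bootstrap.servers", "source-kafka:9092");      // placeholder source cluster
    cProps.put("group.id", "ureplicator-sketch");
    cProps.put("enable.auto.commit", "false");                  // commit manually, after flush
    cProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    cProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    Properties pProps = new Properties();
    pProps.put("bootstrap.servers", "destination-kafka:9092");  // placeholder destination cluster
    pProps.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    pProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cProps);
         KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pProps)) {
      consumer.subscribe(Collections.singletonList("testTopic"));
      while (true) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<byte[], byte[]> r : records) {
          // Forward to the same topic on the destination cluster (the real worker also checks send errors).
          producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
        }
        producer.flush();      // make sure everything reached the destination brokers
        consumer.commitSync(); // only then advance the source offsets ("commit after flush")
      }
    }
  }
}
```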
25. Requirements
● Stable replication
● Simple operations
● High throughput
● No data loss
● Auditing
26. Performance Issues
● Catch-up time is too long (4 hours failover)
  ○ Full-speed phase: 2~3 hours
  ○ Long tail phase: 5~6 hours
27. Problem 1: Full-Speed Phase
● Destination brokers are CPU-bound
28. Solution 1: Increase Batch Size
● Destination brokers are CPU-bound
  ○ Increase throughput by batching more per request
  ○ producer.batch.size: 64KB => 128KB
  ○ producer.linger.ms: 100 => 1000
● Effect
  ○ Batch size increases: 10~22KB => 50~90KB
  ○ Compression ratio (compressed size): 40~62% => 27~35%
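A sketch of the corresponding Java producer settings. Only the two tuned properties come from the slide; the broker address is a placeholder and the compression codec is an assumption.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchTuningSketch {
  public static KafkaProducer<byte[], byte[]> newProducer() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "aggregate-kafka:9092"); // placeholder
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.ByteArraySerializer");
    // Larger batches plus a longer linger give the compressor more data per request,
    // which lowers CPU per byte on the destination brokers.
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, 128 * 1024);   // 64KB => 128KB
    props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);          // 100 => 1000
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip"); // assumption: compression enabled
    return new KafkaProducer<>(props);
  }
}
```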
29. Solution 2: 1-1 Partition Mapping
● Round robin
  ○ Topic with N partitions: N^2 connections
  ○ DoS-like traffic
● Deterministic partition mapping
  ○ Topic with N partitions: N connections
  ○ Reduces contention
(Diagram: source partitions p0..p3 fan out across workers MM 1..4 to all destination partitions under round robin, versus mapping one-to-one to the same destination partition under deterministic mapping)
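A minimal sketch of the 1-1 mapping, assuming source and destination topics have the same partition count: the worker produces each record to the same partition number it was read from instead of letting the default partitioner spread it round-robin. The class name is illustrative.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Forward a record to the identical partition on the destination cluster (1-1 mapping). */
public final class IdentityPartitionForwarder {
  private final KafkaProducer<byte[], byte[]> producer;

  public IdentityPartitionForwarder(KafkaProducer<byte[], byte[]> producer) {
    this.producer = producer;
  }

  public void forward(ConsumerRecord<byte[], byte[]> record) {
    // Pin the destination partition to the source partition: each worker then talks to one
    // destination partition leader per topic instead of all N of them.
    producer.send(new ProducerRecord<>(record.topic(), record.partition(), record.key(), record.value()));
  }
}
```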
30. Full-Speed Phase Throughput
● First hour: 27.2MB/s => 53.6MB/s per aggregate broker (before vs. after charts)
31. Problem 2: Long Tail Phase
● During the full-speed phase, all workers are busy
● Some workers catch up and become idle, but the others are still busy
32. Solution 1: Dynamic Workload Balance
● During the full-speed phase, all workers are busy
● Original partition assignment
  ○ Based only on the number of partitions
  ○ Heavy partitions can land on the same worker
● Workload-based assignment (sketched below)
  ○ Total workload when a topic is added
    ■ Source cluster bytes-in rate
    ■ Retrieved from Chaperone3
  ○ Dynamic rebalance periodically
    ■ Rebalance workers that exceed 1.5 times the average workload
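A minimal sketch of workload-based assignment, with made-up types standing in for the Helix state and Chaperone3 metrics: partitions are placed on the currently lightest worker according to their source bytes-in rate, and a worker is flagged for rebalance when its load exceeds 1.5x the average.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative workload-based assignment; the real uReplicator drives this through Helix. */
public class WorkloadAssignmentSketch {
  /** partitionBytesInRate: bytes-in rate per topic-partition, e.g. as reported by an audit/metrics service. */
  public static Map<String, List<String>> assign(Map<String, Double> partitionBytesInRate, int numWorkers) {
    // Min-heap of (worker index, current load): place the next partition on the lightest worker.
    PriorityQueue<double[]> heap = new PriorityQueue<>(Comparator.comparingDouble(w -> w[1]));
    for (int w = 0; w < numWorkers; w++) heap.add(new double[] {w, 0.0});
    Map<String, List<String>> assignment = new HashMap<>();

    partitionBytesInRate.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue().reversed()) // heaviest first
        .forEach(e -> {
          double[] lightest = heap.poll();
          int worker = (int) lightest[0];
          assignment.computeIfAbsent("worker-" + worker, k -> new ArrayList<>()).add(e.getKey());
          lightest[1] += e.getValue();
          heap.add(lightest);
        });
    return assignment;
  }

  /** A worker needs rebalancing when its load exceeds 1.5x the average load. */
  public static boolean needsRebalance(double workerLoad, double totalLoad, int numWorkers) {
    return workerLoad > 1.5 * (totalLoad / numWorkers);
  }
}
```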
33. Solution 2: Lag-Feedback Rebalance
● During the full-speed phase, all workers are busy
● The controller monitors topic lags (see the sketch after this slide)
  ○ Balance the workers based on lags
    ■ Larger lag = heavier workload
  ○ Dedicated workers for lagging topics
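A minimal sketch of measuring per-partition lag with the standard Java consumer APIs (the real worker is SimpleConsumer-based, so this is illustrative): the resulting lag numbers can feed the same assignment logic as above, weighting lagging partitions more heavily or isolating them on dedicated workers.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

/** Illustrative lag measurement: lag = log end offset - committed offset, summed over partitions. */
public class LagMonitorSketch {
  public static long totalLag(KafkaConsumer<byte[], byte[]> consumer, Set<TopicPartition> partitions) {
    Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
    Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(partitions, Duration.ofSeconds(10));
    long lag = 0;
    for (TopicPartition tp : partitions) {
      OffsetAndMetadata c = committed.get(tp);
      long committedOffset = (c == null) ? 0L : c.offset();
      lag += Math.max(0, endOffsets.getOrDefault(tp, committedOffset) - committedOffset);
    }
    return lag;
  }
}
```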
34. Dynamic Workload Rebalance
● Periodically adjust workload every 10 minutes
● Multiple lagging topics on the same worker are spread across multiple workers
35. Requirements
● Stable replication
● Simple operations
● High throughput
● No data loss
● Auditing
36. Requirements
● Stable replication
● Simple operations
● High throughput
● No data loss
● Auditing
37. More Issues
● Scalability
  ○ Number of partitions
  ○ Up to 1 hour for a full rebalance during rolling restart
● Operation
  ○ Number of deployments
  ○ New cluster
● Sanity
  ○ Loop
  ○ Double route
38. Design - Federated uReplicator
● Scalability
  ○ Multiple routes
● Operation
  ○ Deployment group
  ○ One manager
39. Design - Federated uReplicator
● uReplicator Manager
  ○ Decides when to create a new route
  ○ Decides how many workers each route needs
  ○ Auto-scaling
    ■ Total workload of the route
    ■ Total lag in the route
    ■ Total workload / expected workload per worker
    ■ Automatically adds workers to the route
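A rough sketch of the worker-count heuristic implied by these bullets; the threshold fields and the lag term are assumptions for illustration, not the actual Manager logic. The idea: size a route from its total workload relative to what one worker is expected to handle, and add extra capacity while the route is lagging.

```java
/** Illustrative route auto-scaling heuristic for a federated replication manager. */
public class RouteAutoScalerSketch {
  private final double expectedBytesPerWorker;  // expected sustainable workload per worker
  private final double lagBytesPerExtraWorker;  // assumption: one extra worker per this much backlog

  public RouteAutoScalerSketch(double expectedBytesPerWorker, double lagBytesPerExtraWorker) {
    this.expectedBytesPerWorker = expectedBytesPerWorker;
    this.lagBytesPerExtraWorker = lagBytesPerExtraWorker;
  }

  /** Workers needed = workload-based baseline + lag-based boost, with at least one worker. */
  public int desiredWorkers(double totalRouteBytesInRate, double totalRouteLagBytes) {
    int baseline = (int) Math.ceil(totalRouteBytesInRate / expectedBytesPerWorker);
    int lagBoost = (int) Math.ceil(totalRouteLagBytes / lagBytesPerExtraWorker);
    return Math.max(1, baseline + lagBoost);
  }

  /** Scale up by the difference; shrinking can be left to a separate, slower decision. */
  public int workersToAdd(int currentWorkers, double bytesInRate, double lagBytes) {
    return Math.max(0, desiredWorkers(bytesInRate, lagBytes) - currentWorkers);
  }
}
```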
40. Design - Federated uReplicator
● uReplicator Front
  ○ Whitelists/blacklists topics as (topic, src, dst) tuples
  ○ Persists them in a DB
  ○ Runs sanity checks (a loop / double-route check is sketched below)
  ○ Assigns them to a uReplicator Manager
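A minimal sketch of the two sanity problems named on the "More Issues" slide, loop and double route, applied to hypothetical (topic, src, dst) requests; the data structures are illustrative, not the Front's actual schema.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sanity checks for replication requests: reject duplicates and routing loops. */
public class RouteSanityCheckSketch {
  // topic -> set of "src->dst" edges already whitelisted
  private final Map<String, Set<String>> edgesByTopic = new HashMap<>();

  public synchronized void add(String topic, String src, String dst) {
    if (src.equals(dst)) {
      throw new IllegalArgumentException("src and dst cluster must differ");
    }
    Set<String> edges = edgesByTopic.computeIfAbsent(topic, t -> new HashSet<>());
    if (edges.contains(src + "->" + dst)) {
      throw new IllegalArgumentException("double route: " + topic + " already replicated " + src + " -> " + dst);
    }
    if (reachable(edges, dst, src)) {
      throw new IllegalArgumentException("loop: " + topic + " would be replicated back from " + dst + " to " + src);
    }
    edges.add(src + "->" + dst);
  }

  /** DFS over existing edges: can we already get from `from` back to `to`? (Edges stay acyclic because loops are rejected above, so this terminates.) */
  private boolean reachable(Set<String> edges, String from, String to) {
    if (from.equals(to)) return true;
    for (String edge : edges) {
      String[] parts = edge.split("->");
      if (parts[0].equals(from) && reachable(edges, parts[1], to)) return true;
    }
    return false;
  }
}
```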
41. Requirements
● Stable replication
● Simple operations
● High throughput
● No data loss
● Auditing
42. Agenda 01 Apache Kafka at Uber 02 Apache Kafka pipeline & replication 03 uReplicator 04 Data loss detection 05 Q&A
43. Detect Data Loss (Chaperone)
● It checks data as it flows through each tier of the pipeline
● It keeps a timestamp-offset index to support timestamp-based queries
● It also checks latency from the point of view of the consumer
● It also provides the workload for each topic
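A minimal sketch of the tier-by-tier counting idea, assuming each tier emits audit records of (topic, time bucket, message count) in the Chaperone style; comparing counts for the same bucket across tiers flags suspected loss. The bucket size and class names are assumptions, and the real system additionally tracks latency and builds the timestamp-offset index mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative audit aggregation: count messages per (topic, time bucket) at each tier and compare. */
public class AuditSketch {
  private static final long BUCKET_MS = 10 * 60 * 1000L; // 10-minute buckets (assumed)

  // tier -> (topic + "@" + bucketStart) -> message count
  private final Map<String, Map<String, Long>> counts = new HashMap<>();

  /** Called once per message observed at a tier (e.g. proxy, regional Kafka, aggregate Kafka). */
  public void record(String tier, String topic, long eventTimeMs) {
    long bucket = (eventTimeMs / BUCKET_MS) * BUCKET_MS;
    counts.computeIfAbsent(tier, t -> new HashMap<>())
          .merge(topic + "@" + bucket, 1L, Long::sum);
  }

  /** Data loss is suspected when a downstream tier saw fewer messages than the upstream tier. */
  public long missing(String upstreamTier, String downstreamTier, String topic, long bucketStartMs) {
    String key = topic + "@" + bucketStartMs;
    long upstream = counts.getOrDefault(upstreamTier, Map.of()).getOrDefault(key, 0L);
    long downstream = counts.getOrDefault(downstreamTier, Map.of()).getOrDefault(key, 0L);
    return Math.max(0, upstream - downstream);
  }
}
```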
44. Requirements
● Stable replication
● Simple operations
● High throughput
● No data loss
● Auditing
45. Blogs & Open Source
● uReplicator
  ○ Running in production for 2+ years
  ○ Open sourced: https://github.com/uber/uReplicator
  ○ Blog: https://eng.uber.com/ureplicator/
● Chaperone
  ○ Running in production for 2+ years, auditing almost all topics in our clusters
  ○ Open sourced: https://github.com/uber/chaperone
  ○ Blog: https://eng.uber.com/chaperone/
46. Agenda 01 Apache Kafka at Uber 02 Apache Kafka pipeline & replication 03 uReplicator 04 Data loss detection 05 Q&A
