unified log zk nov15

Razor

Published 2020/04/25 in the Technology category

Text content
1. The good, the bad, and the ugly of Apache ZooKeeper. Flavio Junqueira, Apache ZooKeeper Committer and PMC member, Confluent. fpj@confluent.io, Twitter: @fpjunqueira
2. What’s ZooKeeper?
3. Building resilient distributed systems [image source: Ashish via Flickr]
4. Leader election [diagram: a master and three workers]
5. Leader election [diagram: a master and three workers, labeled "E.g., replication"]
6. Leader election [diagram: three masters, each with its own workers]
7. [image source: Ki Young Lee via Flickr]
8. Leader election [diagram: three processes, each asking "Who’s the leader?"]
9. Leader election [diagram: three processes exchanging messages to answer "Who’s the leader?"]
10. Leader election • What if a process doesn’t hear from another? • Is a process allowed to change its vote? • For how many rounds do I need to exchange messages? • Is this even correct? [diagram: processes exchanging messages, as on slide 9]
11. Leader election [diagram: three processes ask a single "leader dude" who the leader is]
12. Leader election [same diagram as slide 11]
13. Leader election [diagram: the "leader dude" is replicated; each process asks one of the replicas who the leader is]
14. Leader election • Replicas need to give consistent answers • Protocol to replicate the state … essentially a consensus protocol [diagram as on slide 13]
15. Leader election • The dudes are ZooKeeper servers [diagram: three processes ask a ZooKeeper ensemble of three ZK servers "Who’s the leader?"]
16. … and more • Membership • Synchronization primitives: locks, barriers, atomic counters, CAS • Configuration metadata
17. How does ZooKeeper work?
18. Basics • Hierarchy of simple files called znodes: persistent, ephemeral, sequential • File-system-like API: writes (create, delete, setData); reads (exists, getChildren, getData) • Watches: enable clients to observe changes to znodes; one-shot, not a subscription
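To make the znode API concrete, here is a minimal in-memory sketch. It is a toy model, not the ZooKeeper client: the class ZNodeTree, its snake_case method names, and the event strings are hypothetical, and there is no replication, versioning, or session handling. It keeps separate data watches (left by exists/get_data) and child watches (left by get_children), and sequential names get a 10-digit zero-padded counter suffix, as ZooKeeper's do.

```python
from collections import defaultdict


class ZNodeTree:
    """Toy, single-process model of the znode API; NOT the real ZooKeeper client."""

    def __init__(self):
        self.data = {"/": b""}                  # path -> bytes
        self.ephemeral_owner = {}               # path -> owning session id
        self.seq = defaultdict(int)             # parent -> next sequence number
        self.data_watches = defaultdict(list)   # left by exists / get_data
        self.child_watches = defaultdict(list)  # left by get_children

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    @staticmethod
    def _fire(watches, path, event):
        # One-shot: pop the callbacks first, so each one fires at most once.
        for callback in watches.pop(path, []):
            callback(event, path)

    # ---- writes: create, delete, set_data ----
    def create(self, path, data=b"", ephemeral=False, sequential=False, session=None):
        parent = self._parent(path)
        if parent not in self.data:
            raise KeyError(f"parent of {path} does not exist")
        if sequential:  # append a monotonically increasing, zero-padded counter
            path = f"{path}{self.seq[parent]:010d}"
            self.seq[parent] += 1
        if path in self.data:
            raise FileExistsError(path)
        self.data[path] = data
        if ephemeral:
            self.ephemeral_owner[path] = session
        self._fire(self.data_watches, path, "created")
        self._fire(self.child_watches, parent, "children_changed")
        return path

    def delete(self, path):
        if any(self._parent(p) == path for p in self.data if p != path):
            raise OSError(f"{path} has children")
        del self.data[path]
        self.ephemeral_owner.pop(path, None)
        self._fire(self.data_watches, path, "deleted")
        self._fire(self.child_watches, self._parent(path), "children_changed")

    def set_data(self, path, data):
        if path not in self.data:
            raise KeyError(path)
        self.data[path] = data
        self._fire(self.data_watches, path, "data_changed")

    # ---- reads: exists, get_children, get_data (each may leave a watch) ----
    def exists(self, path, watch=None):
        if watch:
            self.data_watches[path].append(watch)
        return path in self.data

    def get_data(self, path, watch=None):
        value = self.data[path]
        if watch:
            self.data_watches[path].append(watch)
        return value

    def get_children(self, path, watch=None):
        if watch:
            self.child_watches[path].append(watch)
        return sorted(p.rsplit("/", 1)[1] for p in self.data
                      if p != path and self._parent(p) == path)
```

Because watches are one-shot, a client that wants to keep observing a znode has to read it again after each notification, which also re-registers the watch.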
19. Recipes • ZooKeeper doesn’t expose primitives explicitly • Primitives implemented using recipes • Simple algorithms based on the ZooKeeper API • Many have been implemented and battle-tested over time
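As an example of a recipe, an atomic counter can be built from a read followed by a conditional write that only succeeds if the znode's version has not changed since the read; ZooKeeper's setData performs this check when given an expected version. The sketch below assumes a hypothetical in-memory VersionedStore and BadVersionError in place of a real ensemble; the on_read hook exists only to simulate a concurrent writer.

```python
class BadVersionError(Exception):
    """Raised when a conditional write sees a stale version (toy stand-in)."""


class VersionedStore:
    """Hypothetical in-memory stand-in for znodes: each path holds (data, version)."""

    def __init__(self):
        self._nodes = {}

    def create(self, path, data):
        self._nodes[path] = (data, 0)

    def get_data(self, path):
        return self._nodes[path]

    def set_data(self, path, data, expected_version):
        _, version = self._nodes[path]
        if version != expected_version:
            raise BadVersionError(path)
        self._nodes[path] = (data, version + 1)


def increment(store, path, on_read=None):
    """Atomic counter via compare-and-set: read, compute, conditional write, retry."""
    while True:
        raw, version = store.get_data(path)
        if on_read is not None:
            on_read()  # test hook: lets a concurrent writer sneak in here
        new_value = int(raw.decode()) + 1
        try:
            store.set_data(path, str(new_value).encode(), version)
            return new_value
        except BadVersionError:
            continue  # another writer won the race; re-read and retry
```

The retry loop is what makes the recipe safe without locks: a writer that loses the race simply sees a newer version and tries again.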
20. Leader election with ZooKeeper • Each process: 1. creates an ephemeral znode with path /election; 2. if the create call succeeds, leads; 3. otherwise, watches /election
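The three steps can be sketched as follows, assuming a hypothetical single-process ToyEnsemble in place of a real ensemble (session ids are just candidate names, and every node is ephemeral). When the leader's session expires, its /election znode disappears, the pending watches fire, and the waiting processes race to create it again.

```python
class ToyEnsemble:
    """Hypothetical single-process stand-in for a ZooKeeper ensemble."""

    def __init__(self):
        self.nodes = {}     # path -> owning session (all nodes ephemeral here)
        self.watches = {}   # path -> list of one-shot callbacks

    def create_ephemeral(self, path, session):
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = session

    def exists(self, path, watch):
        self.watches.setdefault(path, []).append(watch)
        return path in self.nodes

    def expire_session(self, session):
        # Expiry deletes the session's ephemerals and fires their watches once.
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]
            for callback in self.watches.pop(path, []):
                callback(path)


class Candidate:
    def __init__(self, name, zk, path="/election"):
        self.name, self.zk, self.path = name, zk, path
        self.is_leader = False

    def run(self):
        try:
            self.zk.create_ephemeral(self.path, session=self.name)  # step 1
            self.is_leader = True                                    # step 2
        except FileExistsError:
            # Step 3: watch /election; if it vanished in the meantime, retry now.
            if not self.zk.exists(self.path, watch=lambda _path: self.run()):
                self.run()
```

In this simple form every waiting process wakes up on each change (the herd effect). The ZooKeeper recipes documentation describes a variant based on sequential ephemeral znodes in which each process watches only its predecessor.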
21. Sessions and Ephemerals • Sessions: abstraction of a connection to the ensemble; they start on a single server in the ensemble and can move to different servers over time • The ensemble leader expires sessions using a timeout scheme • An ephemeral znode is associated with a session: if the session expires, its ephemerals are automatically deleted [diagram: a client connected to a ZooKeeper ensemble]
22. Sessions and Ephemerals (same points as slide 21) [diagram: a client and a ZooKeeper ensemble of three servers]
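The session rules can be sketched with explicit timestamps instead of real clocks. SessionTracker and its methods are hypothetical; real ZooKeeper negotiates the session timeout with each client and tracks expirations more efficiently than the linear scan used here. The sketch shows the two properties from the slide: moving a session to another server keeps its ephemerals, and expiry removes them.

```python
class SessionTracker:
    """Toy model of leader-side session expiry; times are plain numbers (e.g. ms)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}   # session id -> time of last ping or request
        self.server = {}       # session id -> server the client is connected to
        self.ephemerals = {}   # znode path -> owning session id

    def connect(self, session, server, now):
        self.last_heard[session] = now
        self.server[session] = server

    def move(self, session, new_server, now):
        # Reconnecting to another server keeps the same session and its ephemerals.
        self.connect(session, new_server, now)

    def ping(self, session, now):
        self.last_heard[session] = now

    def create_ephemeral(self, path, session):
        self.ephemerals[path] = session

    def expire(self, now):
        """Expire sessions not heard from within `timeout`; drop their ephemerals."""
        expired = {s for s, t in self.last_heard.items() if now - t > self.timeout}
        for s in expired:
            del self.last_heard[s], self.server[s]
        self.ephemerals = {p: s for p, s in self.ephemerals.items() if s not in expired}
        return expired
```

Because expiry is decided in one place (the ensemble leader), all servers agree on when an ephemeral such as /election disappears, which is what the leader election recipe relies on.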
23. … but could we have done it ourselves?
24. Implement your own screwdriver… [image source: Florinda Chan via Flickr]
25. Use case: Apache Kafka Replication
26. Kafka basics • Pub-sub messaging, implemented as a distributed commit log • Topics: app-specific element of organization, e.g., user clicks, search queries, likes, friendship connections, tweets • Topics are sharded into partitions • Each partition has a replica set
28. ZooKeeper • Stores the metadata of replica groups • Leadership and in-sync replicas
29. Partition replication and ZooKeeper
30. Partition replication and ZooKeeper
31. Partition replication and ZooKeeper
32. Partition replication and ZooKeeper
33. ZooKeeper • Stores the metadata of replica groups: leadership and in-sync replicas • Advantages: source of truth (precise information about the replica group); flexibility (no need to rely on majority quorums)
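A sketch of the replica-group metadata idea, in the spirit of Kafka replication but not Kafka's actual code: PartitionState, shrink_isr, and committed_up_to are hypothetical names, and the version field stands in for the znode version that makes updates conditional. A message counts as committed once every in-sync replica has it, so the group does not need a majority; Kafka's documentation notes that with f+1 replicas this model tolerates f failures without losing committed messages, where a majority quorum would need 2f+1 replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical replica-group record, as it might be stored in a znode."""
    leader: int
    isr: tuple      # broker ids of the in-sync replicas
    version: int    # znode version, used to make updates conditional


def shrink_isr(state, lagging, expected_version):
    """Drop a lagging replica; rejected if someone else changed the record first."""
    if state.version != expected_version:
        raise ValueError("stale version: re-read the partition state")
    isr = tuple(b for b in state.isr if b != lagging)
    return PartitionState(state.leader, isr, state.version + 1)


def committed_up_to(log_end, state):
    """Offsets below this are on every in-sync replica, hence committed."""
    return min(log_end[b] for b in state.isr)
```

Removing a slow replica from the ISR lets the committed point advance without waiting for it, and the conditional update keeps two parties from overwriting each other's view of the group.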
34. But why use a replicated system to build another replicated system?
35. Rationale • Write throughput to ZooKeeper is bounded: lower write throughput with more replicas … higher read throughput though • Management of replica groups: easier with a component like ZooKeeper around
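The throughput trade-off follows from majority quorums. The ZooKeeper leader orders every write and needs acknowledgments from a majority of the ensemble before committing it, while reads are answered locally by whichever server the client is connected to. The arithmetic below uses an illustrative helper, quorum_stats, to show how the write quorum grows with ensemble size.

```python
def quorum_stats(n):
    """Majority-quorum arithmetic for an ensemble of n voting servers."""
    quorum = n // 2 + 1        # acknowledgments every write must collect
    tolerated = n - quorum     # servers that may fail while the ensemble stays live
    return quorum, tolerated


for n in (3, 5, 7):
    q, f = quorum_stats(n)
    print(f"{n} servers: each write waits for {q} acks, tolerates {f} failures; "
          f"reads can be served by any of the {n} servers")
```

Adding servers adds read capacity, but every write must reach more of them, which is why larger ensembles trade write throughput for read throughput and fault tolerance.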
36. Other examples • Apache HBase: large-scale key-value store • Apache BookKeeper: high-performance, distributed logging
37. The project
38. Apache ZooKeeper • Apache top-level project since 2010 • Committers: 15, across 9 different companies • PMC members: 9, across 8 different companies • http://zookeeper.apache.org
39. Good, bad, and ugly • Good: what made the project successful, what users like • Bad: what users don’t like • Ugly: what we ZooKeeper devs don’t like
40. The good • See previous slides… • Simple API • It works • Battle tested
41. The bad • Dependency-phobia • Server footprint: requires additional hardware (or VMs); hard to embed • Making operations harder • Fat client • Dedicated device for the txn log
42. The ugly • Requests under disconnection: no really good way to tell whether a request has been executed • Multi-tenancy: security and performance isolation are OK, but not stellar
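One common mitigation for the first point, sketched under simplifying assumptions: tag each create with a unique token and, after an ambiguous connection loss, read the znode back to see whether the stored data is ours. FlakyServer, ConnectionLossError, and create_once are hypothetical; the server is a local stand-in that applies a create and then drops the reply. Apache Curator's protected creates follow a similar idea by embedding a GUID in the znode name. This only helps for creates whose effect can be recognized afterwards; it does not make arbitrary requests idempotent.

```python
import uuid


class ConnectionLossError(Exception):
    """Reply lost: the request may or may not have been executed."""


class FlakyServer:
    """Hypothetical server that applies a create but can lose the reply once."""

    def __init__(self, drop_reply_once=False):
        self.nodes = {}
        self.drop_reply_once = drop_reply_once

    def create(self, path, data):
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = data
        if self.drop_reply_once:
            self.drop_reply_once = False
            raise ConnectionLossError(path)  # executed, but the client can't tell

    def get_data(self, path):
        return self.nodes.get(path)


def create_once(server, path, payload):
    """Tag the write with a unique token so a retry can tell 'mine' from 'not mine'."""
    token = uuid.uuid4().hex.encode()
    data = token + b":" + payload
    while True:
        try:
            server.create(path, data)
            return True
        except ConnectionLossError:
            if server.get_data(path) == data:
                return True  # the earlier attempt did go through
            # Not visible as ours: retry the create.
        except FileExistsError:
            return server.get_data(path) == data  # True only if the node is ours
```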
43. Wrap up
44. Apache ZooKeeper • Distributed coordination • Master election, membership, metadata, locks, barriers, etc. • Battle-tested in production across a number of companies • Consider contributing • Subscribe to user@zookeeper.apache.org or dev@zookeeper.apache.org • Check http://zookeeper.apache.org