1. Zuul - Netflix API Gateway that Handles 2M Requests Per Second Susheel Aroskar Cloud Gateway - Netflix
5. How??
7. Zuul shifted all the traffic from the affected us-east-1 region to other healthy AWS regions https://github.com/Netflix/zuul
8. Susheel Aroskar Senior Software Engineer Cloud Gateway saroskar@netflix.com github.com/raksoras @susheelaroskar
13. Agenda ● What is Zuul? ● What can Zuul do for you? ● How does Zuul do it? ● Future roadmap
14. What Is Zuul? Zuul Introduction
19. Zuul is Netflix’s API Gateway: ~2M RPS, 72 clusters, 6000 servers, 150 backend services
20. What Can Zuul Do for you? Zuul Features
25. Core Responsibilities: Route Traffic, Load Balance, Protect Origins, Metrics and Monitoring
26. 1. Route Traffic CORE RESPONSIBILITIES
34. [Diagram: Login Service and Playback Service register with the Service Registry (Eureka); Zuul loads service addresses from the registry. The final build adds a zuul-eu node with the labels "register" and "mTLS".]
37. You can change routes dynamically at runtime
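To make the idea of runtime route changes concrete, here is a minimal sketch, not Zuul's actual routing code. The `RouteTable` class, the path prefixes, and the origin names are all hypothetical; the only point is that a lookup table consulted on every request can be updated while the gateway keeps running.

```python
import threading


class RouteTable:
    """Hypothetical route table: maps URL path prefixes to origin names."""

    def __init__(self):
        self._lock = threading.Lock()
        self._routes = {}  # path prefix -> origin name

    def set_route(self, prefix, origin):
        # Called at runtime, e.g. by an operator or a config watcher.
        with self._lock:
            self._routes[prefix] = origin

    def resolve(self, path):
        # Longest matching prefix wins; None means no route is configured.
        with self._lock:
            matches = [p for p in self._routes if path.startswith(p)]
            if not matches:
                return None
            return self._routes[max(matches, key=len)]


routes = RouteTable()
routes.set_route("/login", "login-service")
routes.set_route("/play", "playback-service")

before = routes.resolve("/play/title/42")
# Re-point the same prefix without restarting anything.
routes.set_route("/play", "playback-service-canary")
after = routes.resolve("/play/title/42")
```

In a real deployment the origin name would then be resolved to concrete server addresses through the service registry; that step is left out here.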
38. 2. Load Balance CORE RESPONSIBILITIES
39. Load balancing notation ZUUL Incoming Traffic Server Connections
42. Round Robin (ZUUL): easiest to implement, no locking/synchronization; unequal resource utilization
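A minimal round-robin sketch follows, assuming a fixed list of hypothetical server names. It shows why the approach is simple: the balancer only remembers its position in the list and never looks at how busy a server is, which is also why utilization can end up unequal.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


rr = RoundRobin(["server-a", "server-b", "server-c"])
picks = [rr.pick() for _ in range(5)]
```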
45. Least Connections (ZUUL): more balanced resource utilization; selecting the next server is expensive, an O(N) operation
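The O(N) cost can be seen in a small sketch of least connections, assuming a single balancer that tracks its own in-flight count per server (class and server names are hypothetical). Every selection scans all servers to find the minimum.

```python
class LeastConnections:
    """Picks the server with the fewest in-flight connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> in-flight count

    def acquire(self):
        # Linear scan over every server: this is the O(N) step.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


lc = LeastConnections(["server-a", "server-b", "server-c"])
first_three = [lc.acquire() for _ in range(3)]  # one connection on each
lc.release("server-b")                          # server-b is now the least loaded
next_pick = lc.acquire()
```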
49. Power of 2 (ZUUL): more balanced resource utilization; inexpensive next server selection
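"Power of 2" here is read as the technique usually called power of two choices: sample two servers at random and send the request to the less loaded of the pair. The sketch below assumes that reading and uses hypothetical names; it is not Zuul's implementation. Selection cost does not grow with the number of servers, since only two counters are compared.

```python
import random


class PowerOfTwoChoices:
    """Samples two distinct servers and picks the one with fewer connections."""

    def __init__(self, servers, rng=None):
        self._servers = list(servers)          # needs at least two servers
        self.active = {s: 0 for s in self._servers}
        self._rng = rng or random.Random()

    def acquire(self):
        a, b = self._rng.sample(self._servers, 2)
        server = a if self.active[a] <= self.active[b] else b
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


# With only two servers both are always sampled, so the idle one must win.
p2c = PowerOfTwoChoices(["busy", "idle"])
p2c.active["busy"] = 5
chosen = p2c.acquire()

# A larger pool: every acquire still compares just two counters.
pool = PowerOfTwoChoices([f"server-{i}" for i in range(10)], rng=random.Random(0))
for _ in range(1000):
    pool.acquire()
total_assigned = sum(pool.active.values())
```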
50. 3. Protect Origins CORE RESPONSIBILITIES
53. Protection from slow clients - timeouts: Default = 100 ms; Origin Billing = 150 ms; Path /payment = 200 ms
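The three timeout levels can be sketched as a lookup with overrides. The values come from the slide; the precedence (path over origin over default) is an assumption that the most specific setting wins, and the function and dictionary names are hypothetical.

```python
DEFAULT_TIMEOUT_MS = 100
ORIGIN_TIMEOUT_MS = {"billing": 150}   # per-origin override
PATH_TIMEOUT_MS = {"/payment": 200}    # per-path override


def resolve_timeout_ms(origin, path):
    # Assumed precedence: path override, then origin override, then default.
    for prefix, timeout in PATH_TIMEOUT_MS.items():
        if path.startswith(prefix):
            return timeout
    return ORIGIN_TIMEOUT_MS.get(origin, DEFAULT_TIMEOUT_MS)


default_case = resolve_timeout_ms("login", "/signin")
origin_case = resolve_timeout_ms("billing", "/invoices")
path_case = resolve_timeout_ms("billing", "/payment/card")
```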
55. Surviving sudden surge in traffic - throttling and load shedding [diagram: Login service and Playback service; all is well]
56. Per origin concurrency throttling [diagram: Login service and Playback service; Playback service is overloaded]
57. Global concurrency throttling at Zuul frontend [diagram: Login service and Playback service; Playback service is overloaded]
58. Zuul CPU load shedding [diagram: Login service and Playback service; Zuul CPU hits the wall]
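The two concurrency throttles can be sketched as counters checked before a request is proxied. This is an illustration under assumed limits, with a hypothetical `ConcurrencyThrottle` class; it rejects a request immediately rather than queueing it, and it does not cover the CPU-based load shedding, which would need a load signal from the host.

```python
import threading


class ConcurrencyThrottle:
    """Rejects requests once per-origin or global in-flight limits are hit."""

    def __init__(self, per_origin_limit, global_limit):
        self._per_origin_limit = per_origin_limit
        self._global_limit = global_limit
        self._in_flight = {}   # origin -> requests currently in flight
        self._total = 0
        self._lock = threading.Lock()

    def try_acquire(self, origin):
        with self._lock:
            current = self._in_flight.get(origin, 0)
            if self._total >= self._global_limit or current >= self._per_origin_limit:
                return False   # caller should fail fast, e.g. with a 503
            self._in_flight[origin] = current + 1
            self._total += 1
            return True

    def release(self, origin):
        with self._lock:
            self._in_flight[origin] -= 1
            self._total -= 1


throttle = ConcurrencyThrottle(per_origin_limit=2, global_limit=3)
results = [
    throttle.try_acquire("playback"),  # accepted
    throttle.try_acquire("playback"),  # accepted
    throttle.try_acquire("playback"),  # rejected: per-origin limit reached
    throttle.try_acquire("login"),     # accepted: total is now 3
    throttle.try_acquire("login"),     # rejected: global limit reached
]
throttle.release("playback")
after_release = throttle.try_acquire("login")
```

The per-origin limit keeps one overloaded origin from consuming every slot, so requests to healthy origins can still get through.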
63. Other side of protection - security: DDoS protection, single sign-on, client auth certificates, whitelisted internal CORP access
64. 4. Metrics and Monitoring CORE RESPONSIBILITIES
65. Historic and near real time aggregate metrics - Atlas [diagram labels: Metrics, Total request counts, Latency, Geo location, Tags, Status, Host name, Device type]
69. How does Zuul do it? Zuul Architecture
70. C10K challenge
72. Thread per Connection vs. Async I/O [diagram: with thread per connection, Thread-1 and Thread-2 each read and write their own socket; with async I/O, a single thread serves multiple sockets through read and write callbacks]
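The async I/O side of the comparison can be illustrated with Python's `asyncio`, used here only as a stand-in for an event-loop framework. In this sketch a single thread runs a small uppercase-echo server on a local port and serves several client connections concurrently; the handler records the thread it runs on to show that no per-connection threads are created. All names are hypothetical.

```python
import asyncio
import threading

threads_seen = set()  # thread ids that ever ran the connection handler


async def handle(reader, writer):
    threads_seen.add(threading.get_ident())
    data = await reader.readline()   # yields to the event loop while waiting
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, message):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def main(n_clients=20):
    # Port 0 lets the OS choose a free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(
            *(client(port, b"req-%d" % i) for i in range(n_clients))
        )


replies = asyncio.run(main())
```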
73. Core abstraction at the heart of Zuul - Filter [diagram: Input -> Filter -> Output]
76. Zuul is made up of different types of filters: Inbound Filter (Request -> Request), Outbound Filter (Response -> Response), Proxy Filter (Request -> Response)
82. Simple, extensible Zuul core architecture [diagram: REQUEST -> Inbound Filters -> Proxy Filter -> Backend origin; RESPONSE returns through Proxy Filter -> Outbound Filters]
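The filter pipeline can be sketched with plain functions. This is not Zuul's filter API (Zuul is a Java project with its own filter classes); the `Request` and `Response` types, the filter names, and `run_pipeline` are hypothetical and only mirror the three filter shapes named on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str
    headers: dict = field(default_factory=dict)


def add_request_id(req):
    # Inbound filter: Request -> Request
    req.headers["x-request-id"] = "demo-1"
    return req


def proxy_to_origin(req):
    # Proxy filter: Request -> Response (stand-in for the real network call)
    return Response(200, f"origin saw {req.path}")


def add_via_header(resp):
    # Outbound filter: Response -> Response
    resp.headers["via"] = "gateway-sketch"
    return resp


def run_pipeline(req, inbound, proxy, outbound):
    for f in inbound:
        req = f(req)
    resp = proxy(req)
    for f in outbound:
        resp = f(resp)
    return resp


request = Request("/play")
response = run_pipeline(request, [add_request_id], proxy_to_origin, [add_via_header])
```

Because each stage is just a list of filters, adding behavior such as authentication or metrics means appending another function rather than changing the core loop.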
83. Future roadmap Plans for future
87. Planned features • End to end HTTP/2 proxying • WebSocket proxying • gRPC proxying • Edge caching
88. In conclusion
89. You can deploy Zuul at the edge of your network to gain higher availability and better visibility with unparalleled flexibility in handling your traffic.
90. Thank you.
91. Least Connections (ZUUL): more balanced resource utilization; tricky to track the number of connections when multiple load balancing nodes are involved
93. Determining number of connections with more than one LB [diagram: Zuul 1 and Zuul 2, each with its own connection counts to the same backend origin servers]
95. Zuul’s improvement to least connections algorithm [diagram: Zuul 1 and Zuul 2 with connection counts to the shared backend origin servers]