How Uber Applies DevOps Principles to Accelerate Network Infrastructure Automation

微风

Published 2019/03/24 in the Technology category

Text content
1. Autonomous Network System at Uber Scale. Cao, Bo (曹博), Network Engineer @ Uber. QCon Shanghai 2018
3. Outline ● Who Is Uber? ● Uber Global Network Infrastructure ● What Is NetDevOps? ● Autonomous Network System ● Monitoring & Alerting
4. Who Is Uber?
5. Uber By the Numbers ● 600+ Cities ● 75+ M Monthly Active Riders ● 3+ M Monthly Active Drivers ● 15 M Trips/Day ● 1B Trips, Dec 2015 ● 2B Trips, Jun 2016 ● 5B Trips, May 2017 ● 10B Trips, Jun 2018
6. Uber Rides ● Uber Freight ● Uber for Business ● UberEats ● Uber Health ● Advanced Technology Group ● Uber Bike ● ...
7. Uber Global Network Infrastructure
9. Data Center Network Fabric Example: 6-plane, M-Pod Fabric (diagram labels: Fabric Agg Switch, Fabric Edge Switch, Pod Switch, Rack Switch)
10. Multi-Region, Multi-Zone Design (diagram labels: Zone, Region)
11. What is NetDevOps?
12. What Is DevOps? Infrastructure == Code
13. What Is NetDevOps? Network Infrastructure == Code
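To make "Network Infrastructure == Code" concrete, here is a minimal sketch, assuming a declarative device model that is rendered into configuration text. The `Interface` class, the `render_device` function, and the generic config syntax are illustrative assumptions, not Uber's actual model or any vendor's CLI.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """Hypothetical declarative description of one switch interface."""
    name: str
    description: str
    ipv4: str  # address in CIDR notation


def render_interface(intf: Interface) -> str:
    """Render one interface stanza in a made-up, vendor-neutral syntax."""
    return "\n".join([
        f"interface {intf.name}",
        f"  description {intf.description}",
        f"  ip address {intf.ipv4}",
    ])


def render_device(hostname: str, interfaces: list[Interface]) -> str:
    """Render a full device config; sorting keeps the output deterministic."""
    lines = [f"hostname {hostname}"]
    for intf in sorted(interfaces, key=lambda i: i.name):
        lines.append(render_interface(intf))
    return "\n".join(lines) + "\n"
```

Because the rendered text depends only on the model, the model can be kept in version control and reviewed like any other code change.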
14. Autonomous Network System (ANS)
15. Traditional Network Infrastructure ● Manual Configuration ● Manual Validation ● Manual Operation Source: https://www.opservices.com/what-it-infrastructure-remote-monitoring-noc-is/
16. Goal of ANS ● Zero Touch Provisioning ● Zero Touch Validation ● Zero Touch Operation
17. ANS Three Layers ● Network Operation ● Network Management ● Network Infrastructure
18. Infrastructure Layer ● API ● Network Operating System ● Hardware Platform
19. Management Layer ● Change Management ● Circuit Management ● Device Management ● Network Provision ● Configuration Management ● Network Software Model ● Asset Management
20. Operation Layer ● Auto-Remediation ● Network Validation ● Network Monitoring
21. Operation: Auto-Remediation, Network Validation, Network Monitoring ● Management: Change Management, Circuit Management, Device Management, Network Provision, Configuration Management, Network Software Model, Asset Management ● Infrastructure: API, Network Operating System, Hardware Platform
22. Plan Phase (diagram components: Auto-Remediation, Network Validation, Network Monitoring, Change Management, Circuit Management, Device Management, Network Provision, Configuration Management, Network Software Model, Asset Management, API, Network Operating System, Hardware Platform)
23-26. Implementation Phase (the Operation and Management components from slide 21, repeated on four slides with one numbered step marker added per slide: 1 beside Configuration Management / Network Software Model, 2 and 3 beside Change Management / Circuit Management, 4 beside Auto-Remediation / Network Validation)
27-29. Operation Phase (the same component diagram repeated on three slides with one numbered step marker added per slide: 1 beside Change Management / Circuit Management, 2 beside Network Validation / Network Monitoring, 3 beside Device Management / Network Provision)
30-32. Incident Response (the same component diagram repeated on three slides with one numbered step marker added per slide: 1 beside Network Validation / Network Monitoring, 2 beside Auto-Remediation, 3 beside Change Management / Circuit Management)
33. Monitoring
34. Monitoring & Alerting Framework (diagram components: Data Source with SNMP, Syslog, Streaming, RPC, CLI; Provider Maintenance; Data Collection; Data Storage; Graph Visualization; Event Detection; Event Correlation; Alert Manager; Event Reaction; Alerting Block)
35. Data Source ● Determines the protocols for data collection: SNMP, Syslog, Streaming, RPC, CLI.
36. Provider Maintenance ● Maintenance notifications from external sources. ● Non-standard formats across providers need special treatment. ● Email parser + API
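The "non-standard formats" point can be sketched as a registry of per-provider parsers that all normalize to one record shape. This is a minimal illustration only: the sender domains, both email body formats, and every function name below are hypothetical, not real provider formats.

```python
import re
from datetime import datetime
from typing import Callable, Dict, Optional


def _parse_provider_a(body: str) -> Optional[dict]:
    # Hypothetical format: "Circuit ID: X", "Start: YYYY-MM-DD HH:MM", "End: ..."
    m = re.search(
        r"Circuit ID:\s*(\S+).*?Start:\s*(\S+ \S+).*?End:\s*(\S+ \S+)",
        body, re.S)
    if m is None:
        return None
    fmt = "%Y-%m-%d %H:%M"
    return {"circuit": m.group(1),
            "start": datetime.strptime(m.group(2), fmt),
            "end": datetime.strptime(m.group(3), fmt)}


def _parse_provider_b(body: str) -> Optional[dict]:
    # Hypothetical format: "... circuit X from DD/MM/YYYY HH:MM to DD/MM/YYYY HH:MM"
    m = re.search(r"circuit (\S+) from (\S+ \S+) to (\S+ \S+)", body)
    if m is None:
        return None
    fmt = "%d/%m/%Y %H:%M"
    return {"circuit": m.group(1),
            "start": datetime.strptime(m.group(2), fmt),
            "end": datetime.strptime(m.group(3), fmt)}


# Sender domain -> parser; the domains are placeholders.
PARSERS: Dict[str, Callable[[str], Optional[dict]]] = {
    "provider-a.example": _parse_provider_a,
    "provider-b.example": _parse_provider_b,
}


def parse_notice(sender: str, body: str) -> Optional[dict]:
    """Pick a parser by sender domain; return None for unknown senders."""
    domain = sender.rsplit("@", 1)[-1].lower()
    parser = PARSERS.get(domain)
    return parser(body) if parser else None
```

Adding a provider then means writing one parser function and one registry entry, while downstream consumers keep seeing the same record fields.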
37. Data Collection ● Implements the data collection methods. ● Saves the data to storage.
38. Data Storage ● Converts the collected raw data into formatted data. ● Saves the processed data into the database.
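As one illustration of converting raw data into formatted data, the sketch below parses a BSD-style syslog line into a structured record. The assumption that raw input looks like this is mine, not the deck's; the only fixed rule used is the syslog convention that the priority value equals facility * 8 + severity. `parse_syslog` is a hypothetical name.

```python
import re
from typing import Optional

# "<PRI>Mmm dd hh:mm:ss host message", the classic BSD syslog layout.
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<message>.*)$"
)


def parse_syslog(line: str) -> Optional[dict]:
    """Return a structured record, or None if the line does not match."""
    m = SYSLOG_RE.match(line.strip())
    if m is None:
        return None
    pri = int(m.group("pri"))
    return {
        "facility": pri // 8,   # priority = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": m.group("timestamp"),
        "host": m.group("host"),
        "message": m.group("message"),
    }
```

Returning None for unparseable lines lets the caller decide whether to drop them or store them unprocessed.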
39. Graph & Visualization ● Presents collected data in various display formats: chart, graph, table. ● Network topology visualization.
40. Event Detection ● Designed to detect abnormal events. ● Metrics-based detector ● Log-based detector ● Trend-based detector
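The three detector types can be sketched as follows. The deck does not say how each detector works, so this assumes a simple threshold for metrics, a regular-expression match for logs, and a least-squares slope for trends; all function names, thresholds, and the default pattern are hypothetical.

```python
import re
from statistics import mean
from typing import Sequence


def metric_detector(value: float, threshold: float) -> bool:
    """Metrics-based: fire when a single sample exceeds a threshold."""
    return value > threshold


def log_detector(message: str, pattern: str = r"changed state to down") -> bool:
    """Log-based: fire when a log message matches a pattern."""
    return re.search(pattern, message) is not None


def trend_detector(samples: Sequence[float], max_slope: float) -> bool:
    """Trend-based: fire when the least-squares slope of equally spaced
    samples exceeds max_slope (units per sampling interval)."""
    n = len(samples)
    if n < 2:
        return False
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den > max_slope
```

A trend detector of this kind can flag a steadily filling link or disk before any single sample crosses a static threshold.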
41. Event Correlation Performs analysis across multiple detection sources for the purpose of: ● Deduplication & grouping. ● Surface the root cause. ● Suppress the symptoms. ● Reduce notification noise.
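A minimal sketch of deduplication, grouping, and symptom suppression, assuming events are `(device, kind)` tuples and that a static table says which event kinds are symptoms of which cause on the same device. The `SYMPTOMS_OF` table and the event kind names are hypothetical; a real correlator would likely also use topology and time windows.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (device, kind)

# Hypothetical cause -> symptoms mapping for events on the same device.
SYMPTOMS_OF: Dict[str, Set[str]] = {
    "link_down": {"bgp_session_down", "high_packet_loss"},
}


def correlate(events: List[Event]) -> List[Event]:
    """Deduplicate, group by device, and drop symptoms of a present cause."""
    unique = list(dict.fromkeys(events))  # order-preserving deduplication

    by_device: Dict[str, List[str]] = {}
    for device, kind in unique:
        by_device.setdefault(device, []).append(kind)

    alerts: List[Event] = []
    for device, kinds in by_device.items():
        suppressed: Set[str] = set()
        for kind in kinds:
            suppressed |= SYMPTOMS_OF.get(kind, set())
        # Keep only events that are not explained by another event.
        alerts.extend((device, kind) for kind in kinds if kind not in suppressed)
    return alerts
```

In this sketch a BGP session alarm survives only when no link-down event on the same device explains it, which is one way to cut notification noise.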
42. Alert Manager ● Single pane of glass system. ● Graphical User Interface. ● A state machine.
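The "state machine" bullet can be illustrated with an explicit transition table. The deck does not list the states, so the four states and the allowed transitions below are assumptions chosen for illustration.

```python
from enum import Enum


class AlertState(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    REMEDIATING = "remediating"
    RESOLVED = "resolved"


# Hypothetical transition table: state -> states it may move to.
TRANSITIONS = {
    AlertState.OPEN: {AlertState.ACKNOWLEDGED, AlertState.REMEDIATING,
                      AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.REMEDIATING, AlertState.RESOLVED},
    AlertState.REMEDIATING: {AlertState.RESOLVED, AlertState.OPEN},
    AlertState.RESOLVED: set(),
}


class Alert:
    """One alert whose lifecycle is constrained by TRANSITIONS."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = AlertState.OPEN
        self.history = [AlertState.OPEN]

    def transition(self, new_state: AlertState) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.history.append(new_state)
```

Keeping the history list gives an audit trail of how each alert moved through its lifecycle.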
43. Event Reaction ● Email ● Pager ● Task creation ● Auto-remediation
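One way to wire the four reaction types together is a policy table that maps event severity to a list of reactions. The severity names, the policy contents, and `make_dispatcher` are hypothetical; the handlers are plain callables so real email, paging, ticketing, or remediation clients could be plugged in.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


def make_dispatcher(handlers: Dict[str, Handler],
                    policy: Dict[str, List[str]]) -> Callable[[dict], List[str]]:
    """Build a dispatch function from reaction handlers and a severity policy."""

    def dispatch(event: dict) -> List[str]:
        fired: List[str] = []
        # Unknown severities map to no reaction at all.
        for reaction in policy.get(event["severity"], []):
            handlers[reaction](event)
            fired.append(reaction)
        return fired

    return dispatch
```

A usage sketch: with a policy of `{"critical": ["pager", "auto_remediation"], "warning": ["email", "task"]}`, a critical event pages the on-call engineer and triggers remediation, while a warning only sends an email and creates a task.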
44. Auto-Remediation
45. Remediation Policy Example - Link Down
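The content of this slide did not survive in the transcript, so the following is not the policy from the talk. It is only a hypothetical sketch of what a link-down remediation policy could look like, expressed as a pure decision function; every action name and parameter is an assumption.

```python
def link_down_action(in_maintenance: bool,
                     has_redundant_path: bool,
                     attempts: int,
                     max_attempts: int = 2) -> str:
    """Return the next action for a link-down alert (hypothetical policy)."""
    if in_maintenance:
        # Planned work: do not react to an expected outage.
        return "suppress"
    if attempts < max_attempts:
        # Try a cheap automated fix a limited number of times.
        return "bounce_interface"
    if has_redundant_path:
        # Traffic can move elsewhere; fix the link without urgency.
        return "drain_and_open_task"
    # No automated option left and no redundancy: involve a human.
    return "page_on_call"
```

Writing the policy as a pure function keeps it easy to test and review before it is allowed to act on production devices.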
46. Tools & Programming Language ● Git ● Puppet ● Jenkins ● Ansible ● Napalm ● Arachne ● ... ● Golang ● Python
47. Takeaways ● Know about Uber ● Understand NetDevOps ● Good network design simplifies automation framework ● Understand the end-to-end flow before development ● Automation is the future
48. We ignite opportunity by setting the world in motion