Cracking Observability for Cloud-Native Applications, Liu Zheng (刘征), PHPCON2019

Cloudwu

Published 2019/08/20 in the Technology category

PHPConChina2019 

Text content
1. 2019-8, elastic, @martinliu
2. PHPCon PPT https://github.com/ThinkDevelopers/PHPConChina PPT PHPCon
3. elastic • DevOps • DevOps Handbook • The Reliability Workbook https://martinliu.cn
4. 1 2 3 4 5 Q&A
5. • • • DevOps • • Kubernetes • • • IaaS • PaaS CaaS
6. Monolith at scale
7. Microservices at scale: Wheel of Doom. This diagram, taken from late 2014, illustrates the interactions between services running on Hailo’s platform.
8. vs. / Docker
9. Rethink Service Monitoring
14. Baron Schwartz, CTO of VividCortex
15. Monitoring (Ops): Alerting, Overview. Observability (Dev): Debugging, Profiling, dependency
16. Log, Metric, Tracing
18. 0 1 2 3
19. Level 0:
20. Level 0 : • • • • • • ip/ / • etcd Zk • •
21. Level 0
22. Elastic Heartbeat Level 0 1 2 3 4 Heartbeat
23. Elastic Heartbeat Level 0: • • • DDoS
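A Level 0 health check can be pictured as a simple "is the port answering?" probe. The following is a minimal sketch using only the Python standard library. It is not how Elastic Heartbeat is implemented, and the function name `tcp_check` and its return fields are hypothetical.

```python
import socket
import time


def tcp_check(host: str, port: int, timeout: float = 2.0) -> dict:
    """Try to open a TCP connection and report up/down plus the elapsed time.

    Hypothetical helper for illustration; field names are assumptions.
    """
    started = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            status = "up"
    except OSError:
        status = "down"
    elapsed_ms = (time.monotonic() - started) * 1000.0
    return {
        "host": host,
        "port": port,
        "status": status,
        "rtt_ms": round(elapsed_ms, 2),
    }
```

Running such a probe on a schedule and shipping each result as one event gives the up/down history that a Level 0 dashboard needs.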
24. KPI Level 1: CPU load, sampled every 10 minutes
07/Jan/2019 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA
07/Jan/2019 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66 server2 containerY regionB
07/Jan/2019 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50 server2 containerZ regionC
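The metric rows on this slide combine a timestamp, numeric values, and tags (host, container, region). A rough sketch of turning such rows into structured records follows. The column names in `VALUE_COLUMNS` are an assumption modeled on `sar -u` output, since the slide does not label them, and `parse_row` is a hypothetical helper.

```python
from datetime import datetime

SAMPLE = """\
07/Jan/2019 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA
07/Jan/2019 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66 server2 containerY regionB
07/Jan/2019 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50 server2 containerZ regionC
"""

# Assumed column names (the slide shows only the numbers).
VALUE_COLUMNS = ["user", "nice", "system", "iowait", "steal", "idle"]


def parse_row(line: str) -> dict:
    """Split one whitespace-separated metric row into timestamp, values, and tags."""
    parts = line.split()
    date, clock, cpu = parts[0], parts[1], parts[2]
    values = [float(v) for v in parts[3:9]]
    host, container, region = parts[9:12]
    return {
        "timestamp": datetime.strptime(f"{date} {clock}", "%d/%b/%Y %H:%M:%S"),
        "cpu": cpu,
        "values": dict(zip(VALUE_COLUMNS, values)),
        "tags": {"host": host, "container": container, "region": region},
    }


rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
```

Keeping the tags as separate fields is what later allows filtering or grouping the same metric by host, container, or region.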
25. / Level 1 1 • CPU • • 2 • • • 3 • • •
26. Level 1 / /metrics Agent /Infra /vm /metrics Agent /metrics /
27. Level 1 / /metrics /Infra /metrics /vm /metrics /
28. Level 1 CustomerID
29. Elastic Level 1 / Beat /Infra Beat /vm Beat Logstash Kibana Elasticsearch + Beat
30. Level 1: SRE, SLO, SLI / CustomerID; Beats, Logstash, Elasticsearch; Elastic Stack; Elasticsearch
31. Level 2:
64.242.88.10 - - [07/Jan/2019:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Jan/2019:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
64.242.88.10 - - [07/Jan/2019:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
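Access log lines like these become searchable once each one is split into named fields. The sketch below parses the common log format with a regular expression. The pattern and the helper name `parse_access_line` are illustrative assumptions, not taken from the talk or from any Elastic component.

```python
import re
from datetime import datetime

# Common log format: ip ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line: str):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A size of "-" means no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


record = parse_access_line(
    '64.242.88.10 - - [07/Jan/2019:16:11:58 -0800] '
    '"POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352'
)
```

With `status` stored as a number, a query such as "all responses with status 400 or above in the last hour" becomes a simple filter instead of a text search.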
32. Level 2:
33. Level 2:
34. Level 2: Centralized, Searchable, Correlated
35. Level 2: (service diagram with nodes A, B, C, D, E, F, G, H, J; trace ID x5y6z7 is marked on E, F, G, and H)
ERROR [svc=A] [trace=x5y6z7] Failed to process order Cause: Order process manager responded with 500
ERROR [svc=F] [trace=x5y6z7] Failed to complete order Cause: Shipping service responded with 500
INFO [svc=G] [trace=x5y6z7] Items verified in stock
ERROR [svc=H] [trace=x5y6z7] Failed to serve order Cause: Cassandra timeout exception
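The log lines on this slide share one trace ID, which is what makes them correlatable across services. A minimal sketch of that correlation step is shown here: parse the `[svc=...]` and `[trace=...]` markers and group the lines by trace. The regular expression and `group_by_trace` are hypothetical names for illustration.

```python
import re
from collections import defaultdict

LINE = re.compile(
    r"(?P<level>[A-Z]+) \[svc=(?P<svc>\w+)\] \[trace=(?P<trace>\w+)\] (?P<message>.*)"
)

LOGS = [
    "ERROR [svc=A] [trace=x5y6z7] Failed to process order "
    "Cause: Order process manager responded with 500",
    "ERROR [svc=F] [trace=x5y6z7] Failed to complete order "
    "Cause: Shipping service responded with 500",
    "INFO [svc=G] [trace=x5y6z7] Items verified in stock",
    "ERROR [svc=H] [trace=x5y6z7] Failed to serve order "
    "Cause: Cassandra timeout exception",
]


def group_by_trace(lines):
    """Map each trace ID to the (service, level, message) entries that carry it."""
    traces = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if match:
            traces[match["trace"]].append(
                (match["svc"], match["level"], match["message"])
            )
    return dict(traces)


grouped = group_by_trace(LOGS)
failing_services = [svc for svc, level, _ in grouped["x5y6z7"] if level == "ERROR"]
```

Reading the grouped entries for `x5y6z7` shows the failure chain at a glance: A and F report upstream 500s, G succeeded, and H names the Cassandra timeout.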
36. Level 2:
Raw line: 2018-02-20T16:38:23+00:00 ERROR Read time out !
As JSON fields:
Timestamp: 2018-02-20T16:38:23+00:00
Level: Error
Log: Read time out
Service: registration-service
commit: 5938deumw4
Region: ap-northeast-1
customerID: 87762
traceID: x5y6z7
team: Beijing-T1
Build: 5938deumw4
Node: Reg-host012
userID: 87762
Runtime: java-1.8.0_160
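The contrast on this slide, a bare "Read time out" line versus the same event with context fields, can be sketched as a tiny structured logger that emits one JSON object per event. The helper name `structured_log` and the lower-cased key names are assumptions. The field values are the ones shown on the slide.

```python
import json


def structured_log(timestamp: str, level: str, message: str, **context) -> str:
    """Serialize one log event, plus arbitrary context fields, as a JSON line."""
    record = {"timestamp": timestamp, "level": level, "log": message}
    record.update(context)
    return json.dumps(record, sort_keys=True)


line = structured_log(
    "2018-02-20T16:38:23+00:00",
    "ERROR",
    "Read time out",
    service="registration-service",
    commit="5938deumw4",
    region="ap-northeast-1",
    customerID=87762,
    traceID="x5y6z7",
    team="Beijing-T1",
    build="5938deumw4",
    node="Reg-host012",
    userID=87762,
    runtime="java-1.8.0_160",
)
```

Because every field is a named key, the event can be filtered by `customerID`, joined to other services by `traceID`, or compared across builds without any text parsing.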
37. Level 3
03:43:45 Request "GET cyclops.ESProductDetailView"
03:43:57 Response "cyclops.ESProductDetailView 200 OK"
12 seconds zzzzZZZZzzz
38. Level 3
03:43:59 Request "POST /api/checkout"
03:43:59 Response "/api/checkout 500 ERROR"
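The two request/response pairs above illustrate the two signals that Level 3 focuses on: latency and errors. As a small worked example, the following sketch computes the elapsed time for each pair from the timestamps on the slides. `latency_seconds` is a hypothetical helper, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def latency_seconds(request_time: str, response_time: str) -> float:
    """Seconds between two HH:MM:SS timestamps (assumes no midnight rollover)."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(request_time, fmt)
    end = datetime.strptime(response_time, fmt)
    return (end - start).total_seconds()


# GET cyclops.ESProductDetailView: succeeded with 200 OK, but slowly.
slow_request = latency_seconds("03:43:45", "03:43:57")

# POST /api/checkout: answered within the same second, but with 500 ERROR.
failed_request = latency_seconds("03:43:59", "03:43:59")
```

The first request is healthy by status code yet takes 12 seconds, while the second is fast yet fails, which is why both duration and outcome need to be recorded per request.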
39. Level 3 Trace-
40. Elastic APM Level 3
41. Level 3 (architecture diagram: Web and UI services, each with an Agent, sending to apm-server, then Elasticsearch and Kibana)
42. Level 3 • • • • OpenTracing API W3C
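The slide only lists "OpenTracing API" and "W3C". Assuming the latter refers to the W3C Trace Context `traceparent` header, the sketch below shows how a trace ID can be carried from one service to the next. The function names are hypothetical, and the validation is deliberately simplified (for example, it does not reject the reserved version `ff`).

```python
import re
import secrets

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str):
    """Return the header fields as a dict, or None if the header is malformed."""
    match = TRACEPARENT.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # All-zero IDs are defined as invalid.
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        return None
    return fields


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace ID, but issue a fresh span ID for the outgoing call."""
    parent = parse_traceparent(incoming)
    if parent is None:
        return new_traceparent()
    return f"00-{parent['trace_id']}-{secrets.token_hex(8)}-{parent['flags']}"
```

A service that receives a request with this header would call `child_traceparent` before each outgoing call, so every hop shares one trace ID while reporting its own span.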
44. L2 L1 L3
46. • • APM • • APM • • • 45 • •
48. PHPCON http://www.phpconchina.com PHPCon https://k.weidian.com/H3=4lVho PHPConChina 11643
49. Level 0 : THANK YOU