Global Operations Technology Conference

Zhu Wei: Weibo Advertising Operations Practices for Supporting Tens of Billions of Requests

1. Weibo Advertising Operations Practices for Supporting Tens of Billions of Requests. Zhu Wei, Head of the Weibo Advertising SRE Team
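2. • The value of operations in the advertising system • Building SRE for complex business scenarios • Oops, a monitoring platform for massive metric volumes, in practice. Zhu Wei (Kimi), Head of the Weibo Advertising SRE Team; co-author of the book "Intelligent Operations: Building a Large-Scale Distributed AIOps System from Scratch"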
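3. Stages in the evolution of the operations system: manual stage (commands); tool stage (scripts, open-source tools); DevOps (platforms, standards); AIOps (big data, machine learning)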
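4. The value of SRE at Weibo Advertising: business availability; improving efficiency; optimizing systems; system performance evaluation; rapid fault localization; emergency incident handling; request link tracing; fast code iteration; metric trend forecasting; ……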
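5. • The value of operations in the advertising system • Building SRE for complex business scenarios • Oops, a monitoring platform for massive metric volumes, in practice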
6. Service governance
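7. Service governance: fewer faults; performance optimization; link tracing; sensible resource utilization; fast degradation; platformization; automation; intelligence; fast scale-out; fast change; efficiency optimization; environment deployment; security authentication; service quality; standards and conventions; questioning requirements; fault tracking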
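8. Service governance: single-point service deployment; uneven traffic distribution; limited capacity in a single data center; cross-data-center requests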
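9. Service governance ✓ Services deployed evenly across multiple data centers ✓ Spread across different network carriers ✓ Redundant data-center capacity ✓ Traffic requests distributed evenly ✓ Upstream and downstream requests stay within the same data center. 100+ services; 3+ data centers/carriers; 1.5x traffic redundancy; standardized environments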
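The 1.5x traffic redundancy target on this slide can be checked with simple arithmetic. The sketch below is illustrative only: the data-center names, capacities, and both function names are hypothetical, and the single-data-center-loss check is an added assumption rather than something the slide states.

```python
def has_redundancy(peak_qps, idc_capacity_qps, factor=1.5):
    """Return True if total capacity across data centers covers factor x peak traffic."""
    return sum(idc_capacity_qps.values()) >= factor * peak_qps


def survives_single_idc_loss(peak_qps, idc_capacity_qps):
    """Return True if peak traffic still fits after the largest data center is lost."""
    total = sum(idc_capacity_qps.values())
    return total - max(idc_capacity_qps.values()) >= peak_qps


# Hypothetical capacities (QPS) for three data centers.
capacity = {"idc_a": 50_000, "idc_b": 50_000, "idc_c": 50_000}

print(has_redundancy(100_000, capacity))
print(survives_single_idc_loss(100_000, capacity))
```

With three equally sized data centers, meeting the 1.5x target also means the remaining two can carry the full peak when one is lost.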
10. Service governance • Performance stress test of one product line
11. Automated operations platform: fast scale-out; fast degradation; fast change; orderly management; improved efficiency; safe operation
12. Automated operations platform: Kunkka; Visualization; Terminal; Consul; CMDB; DCP; Nexus; Git; Jenkins; Celery; SaltStack; containers; physical machines
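13. Automated operations platform: Audit; Jenkins Slave; nexus; PreOnline; Git; Jenkins Server; ……; Hosts; Jenkins Slave; Registry; DCP; Kunkka • Multi-environment builds • Automated packaging • Automated testing • Automated deployment • Dynamic scale-out and scale-in • Multi-level review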
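14. Effective alerting. How can alerts be made more effective and false alarms reduced? Question whether each requirement is reasonable; aggregate alerts; trace problems back to their source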
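Alert aggregation, one of the measures named on this slide, can be sketched as grouping raw alerts that share a service and rule within a fixed time window, so that one notification replaces many. The field names (`ts`, `service`, `rule`, `host`), the 300-second window, and the sample data are all assumptions for illustration; the slides do not describe the actual aggregation logic.

```python
from collections import defaultdict


def aggregate_alerts(alerts, window_s=300):
    """Group alerts by (service, rule) within fixed time windows of window_s seconds."""
    groups = defaultdict(list)
    for alert in alerts:
        bucket = alert["ts"] // window_s
        groups[(alert["service"], alert["rule"], bucket)].append(alert["host"])
    return [
        {
            "service": service,
            "rule": rule,
            "window_start": bucket * window_s,
            "hosts": sorted(set(hosts)),
            "count": len(hosts),
        }
        for (service, rule, bucket), hosts in sorted(groups.items())
    ]


# Hypothetical raw alerts; ts is seconds since an arbitrary origin.
alerts = [
    {"ts": 10, "service": "ad-engine", "rule": "latency", "host": "h1"},
    {"ts": 20, "service": "ad-engine", "rule": "latency", "host": "h2"},
    {"ts": 200, "service": "ad-engine", "rule": "latency", "host": "h1"},
    {"ts": 400, "service": "ad-engine", "rule": "latency", "host": "h3"},
    {"ts": 30, "service": "ad-index", "rule": "5xx", "host": "h9"},
]

notifications = aggregate_alerts(alerts)
for n in notifications:
    print(n)
```

Here five raw alerts collapse into three notifications, and the first one still records how many times the rule fired and on which hosts.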
15. Effective alerting. [Chart: alert count trend in 2017, by month, y-axis from 0 to 9000. Data labels in extraction order, apparently January through December: 8885, 6818, 5814, 6253, 5818, 7445, 6067, 4057, 2226, 1203, 1807, 1222]
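16. Full-link Trace system • Log format and parsing
Legend: fields that need to be transmitted through the protocol layer
Span fields:
  Traceid: globally unique tracing ID for a single request
  Spanid: ID of a single network call
  sr: time at which the service, acting as the Server side, received the request, in ms
  duration: time from receiving the request to returning the response, with the service acting as the Server side, in ms
  cspan list: list of requests to downstream services
  annotation: custom fields
cspan fields:
  cspanid: spanid generated for the call to the downstream service
  cs: time at which the service, acting as the Client side, sent the request downstream
  duration: time from sending the request to receiving the response, with the service acting as the Client side
  rh: downstream host
  rs: downstream service
  rp: downstream port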
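The log format on this slide can be sketched as a pair of record types. The field names follow the slide (with `cspan list` written as `cspan_list`), but the JSON serialization, the Python types, and the sample values are assumptions; the slides do not say how the log line is actually encoded.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict


@dataclass
class CSpan:
    cspanid: str   # spanid generated for the downstream call
    cs: int        # client send time, ms
    duration: int  # client-side time from send to response, ms
    rh: str        # downstream host
    rs: str        # downstream service
    rp: int        # downstream port


@dataclass
class Span:
    traceid: str   # globally unique ID for one request
    spanid: str    # ID of this network call
    sr: int        # server receive time, ms
    duration: int  # server-side time from receive to return, ms
    cspan_list: list = field(default_factory=list)
    annotation: dict = field(default_factory=dict)

    def to_log_line(self):
        """Serialize the span as one JSON log line (format is an assumption)."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical span with one downstream call.
span = Span(
    traceid=uuid.uuid4().hex,
    spanid="1",
    sr=1_500_000_000_000,
    duration=12,
    cspan_list=[CSpan("1.1", 1_500_000_000_002, 8, "10.0.0.2", "ad-index", 8080)],
    annotation={"uid": "12345"},
)
line = span.to_log_line()
print(line)
```

Keeping the downstream calls nested inside the parent span means a single log line is enough to reconstruct one hop of the call graph.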
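17. Full-link Trace system • Data collection and processing. Log production: Application (business logic, Trace component, logging component); Filebeat; kafka; queue for logs that failed to send. Streaming log processing and storage: Flink; ClickHouse. Data analysis and visualization: service topology; log search; troubleshooting; statistical analysis; metric monitoring; performance analysis; traffic evaluation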
18. Full-link Trace system • Business query. UID; query the IDX index (ClickHouse); sort by time in descending order; match TraceIDs; show the most recent TraceIDs (TraceID 1, 2, 3, … N); associate and sort; ClickHouse log processing; link display
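The query flow on this slide (look up an index by UID, order by time descending, then fetch and sort the spans for a matching TraceID) can be sketched in memory as follows. Plain lists stand in for the ClickHouse tables, and the row layout, function names, and sample data are hypothetical.

```python
def recent_trace_ids(index_rows, uid, limit=10):
    """Return the most recent trace IDs for a UID, newest first."""
    rows = [r for r in index_rows if r["uid"] == uid]
    rows.sort(key=lambda r: r["ts"], reverse=True)
    return [r["traceid"] for r in rows[:limit]]


def load_trace(span_rows, traceid):
    """Collect all spans of one trace, ordered by server receive time."""
    spans = [s for s in span_rows if s["traceid"] == traceid]
    return sorted(spans, key=lambda s: s["sr"])


# Hypothetical index and span rows.
index_rows = [
    {"uid": "u1", "ts": 100, "traceid": "t-old"},
    {"uid": "u1", "ts": 300, "traceid": "t-new"},
    {"uid": "u2", "ts": 200, "traceid": "t-other"},
]
span_rows = [
    {"traceid": "t-new", "spanid": "1.1", "sr": 305},
    {"traceid": "t-new", "spanid": "1", "sr": 300},
    {"traceid": "t-old", "spanid": "1", "sr": 100},
]

trace_ids = recent_trace_ids(index_rows, "u1")
chain = load_trace(span_rows, trace_ids[0])
print(trace_ids, [s["spanid"] for s in chain])
```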
19. Full-link Trace system
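20. • The value of operations in the advertising system • Building SRE for complex business scenarios • Oops, a monitoring platform for massive metric volumes, in practice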
21. Challenges for the monitoring platform: ETL; Metric; Event; Log; Graph; Metrics DB; Alert • Latency • Deviation • Instability
22. Goals for the monitoring platform: simple; stable; available; real-time; accurate
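23. Overall architecture: Oops; Graph; full-link Trace; log system; ElasticSearch; ClickHouse; Druid; Prometheus; Graphite; MySQL; alerting system; management system; Redis; Hive; metric visualization; metric storage; Zookeeper; Logstash; Filebeat; system logs; Flink; service logs; HDFS; HBase; Kafka; business logs; ……; API data; metric cleaning; data collection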
24. Automated collection: efficient; lightweight; complete; flexible
25. Automated collection
• Performance stress test • Single node • 1 CPU core • Snappy compression • kafka partition = 10 • required_acks = 1 • 24000 TPS • 27 Mb/s • Number of failed logs: 0
• Configuration parameters • flush.min_events • close_inactive • scan_frequency • ignore_older • clean_inactive • required_acks • bulk_max_size • compression
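26. Configurable cleaning. Log collection. Flink data association: parse the mark_id of impression logs (kafka spout), sink that persists impression logs, Write to Hbase (persistence); parse the mark_id of interaction logs (kafka spout), Search Hbase, associate impressions with interactions, impression-interaction association sink. Data cleaning: Flink (input, Filter, output, ……); Logstash (input, Filter, output, ……); Kafka; Sentinel; HDFS offline computation (input, Filter, output). Data storage: ClickHouse; Graphite; ES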
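The impression-interaction association on this slide can be sketched as a keyed lookup: impressions are persisted under their mark_id, and each interaction searches for the matching impression before the joined record is emitted. In the sketch below a dictionary stands in for HBase, and the class, function, and field names other than `mark_id` are hypothetical.

```python
class ImpressionStore:
    """In-memory stand-in for the persistent impression store."""

    def __init__(self):
        self._rows = {}

    def put(self, mark_id, record):
        self._rows[mark_id] = record

    def get(self, mark_id):
        return self._rows.get(mark_id)


def handle_impression(store, event):
    """Persist an impression log keyed by its mark_id."""
    store.put(event["mark_id"], event)


def handle_interaction(store, event):
    """Join an interaction with its impression; return None if none was stored."""
    impression = store.get(event["mark_id"])
    if impression is None:
        return None
    return {
        "mark_id": event["mark_id"],
        "impression": impression,
        "interaction": event,
    }


store = ImpressionStore()
handle_impression(store, {"mark_id": "m1", "ad_id": "ad-7", "ts": 100})
joined = handle_interaction(store, {"mark_id": "m1", "action": "click", "ts": 130})
orphan = handle_interaction(store, {"mark_id": "m2", "action": "click", "ts": 140})
print(joined, orphan)
```

A real pipeline would also need to decide what to do with interactions that arrive before their impression; this sketch simply drops them.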
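27. Real-time metrics warehouse • Queries at different time granularities • Combinations of different business dimensions • Faster query response • Customization for complex business logic • Data reuse. Layers: visualization; application layer (impressions per day, behavior and interactions per second); aggregation layer (requests per second, impressions per second, interactions per second); raw layer (requests, impressions, interactions); logs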
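Serving queries at different time granularities from reused data amounts to rolling fine-grained points up into coarser buckets. The sketch below shows the idea for per-second counts; the function name and the sample points are hypothetical, and timestamps are assumed to be integer epoch seconds.

```python
from collections import Counter


def rollup(points, granularity_s):
    """Sum (timestamp, value) points into buckets of granularity_s seconds."""
    buckets = Counter()
    for ts, value in points:
        buckets[ts - ts % granularity_s] += value
    return dict(buckets)


# Hypothetical per-second impression counts.
per_second = [(0, 1), (1, 2), (59, 4), (60, 3), (86_400, 5)]

per_minute = rollup(per_second, 60)
per_day = rollup(per_second, 86_400)
print(per_minute)
print(per_day)
```

Because sums compose, the per-day view could equally be computed from the per-minute buckets, which is what makes a layered warehouse cheap to query.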
28. Real-time metrics warehouse: ClickHouse real-time metrics warehouse framework. Visualization; Analytics; Monitor; ReplicatedAggregatingMergeTree; Distributed; ReplicatedSummingMergeTree; ReplicatedMergeTree; Materialized View
29. Real-time metrics warehouse: raw table; aggregation on specified fields; aggregated table
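Going from a raw table to an aggregated table by aggregating on specified fields can be sketched as a group-by that sums metric columns over chosen key columns. This is a plain Python illustration of the idea, not ClickHouse code; the column names and rows are hypothetical.

```python
from collections import defaultdict


def aggregate(rows, key_fields, metric_fields):
    """Sum metric_fields over rows that share the same values for key_fields."""
    totals = defaultdict(lambda: dict.fromkeys(metric_fields, 0))
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        for metric in metric_fields:
            totals[key][metric] += row[metric]
    return [
        dict(zip(key_fields, key), **metrics)
        for key, metrics in sorted(totals.items())
    ]


# Hypothetical raw rows: one per request.
raw_table = [
    {"ts": 1, "idc": "a", "service": "engine", "requests": 1, "errors": 0},
    {"ts": 1, "idc": "a", "service": "engine", "requests": 1, "errors": 1},
    {"ts": 1, "idc": "b", "service": "engine", "requests": 1, "errors": 0},
]

aggregated_table = aggregate(raw_table, ["ts", "idc"], ["requests", "errors"])
print(aggregated_table)
```

Dropping the `service` column from the key is what shrinks the table: rows that differ only in unlisted fields are merged into one.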
30. Real-time metrics warehouse
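31. Real-time metrics warehouse: 466.7 billion rows of data; peak QPS of 600,000; 5 servers; up to 150 fields in a single table; second-level visualization granularity; arbitrary multidimensional aggregate queries. Uses: data analysis; metric monitoring; link tracing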
32. Metric visualization
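33. Some results from the monitoring platform: 80+ businesses; 2000+ machines; 100+ data sources; 1.5+ million metrics; 200,000+ QPS; 80,000+ alerts • Status-code distribution at different QPS levels • Status-code distribution per data center • Status-code distribution per node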
34. Contact us
