Architecture Design and Evolution of a Trillion-Scale Big Data Platform

1. Architecture Design and Evolution of a Trillion-Scale Big Data Platform. Chen Chao (陈超), Technical Director, Qiniu Cloud
6. • Introduction to Pandora • Workflow design and optimization • Summary
8. Big data platform: Pandora. Simple, efficient, open.
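9. Big data platform: Pandora architecture diagram [Diagram labels: API / logkit / HTTP as entry points; a workflow engine that chains message, compute, and export stages; export targets include the time-series database, the log search service, the object storage service, Report Studio, and XSpark]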
10. Real-time workflow engine
11. Offline workflow engine
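12. Log search service (LogDB) • A highly scalable, highly stable log search SaaS service • Compatible with ES query syntax • Tightly integrated with the open-source ecosystem • Already proven on very large data volumes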
13. Other services • TSDB: a high-performance distributed time-series database • BI Studio: a reporting workspace • XSpark: a Spark service
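14. Special introduction (Logkit) • Logkit is a general-purpose data collection tool developed by Qiniu Pandora. It makes it easy to send data from different data sources to Pandora for analysis. Beyond basic data sending, Logkit also provides fault tolerance, concurrency, monitoring, and deletion. • Logkit supports a range of data sources, including files, MySQL, MSSQL, ES, MongoDB, Kafka, and Redis. • github.com/qiniu/logkit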
15. • Introduction to Pandora • Workflow design and optimization • Summary
16. Architecture overview [Diagram labels: Logkit/SDK, apiserver, scheduler, compute engine, message queue, mongodb, export service, LogDB, reporting service, TSDB, Qiniu Cloud storage, HTTP]
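17. Data flow analysis. Common influencing factors • Resource utilization • Processing efficiency • Bucket effect (the slowest stage caps overall throughput) • Link loss • Others [Diagram, in order: user, message queue, compute task, message queue, export, downstream systems]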
18. Data ingestion layer [Diagram labels: user, apiserver cluster with three apiserver instances, scheduler, message queue, mongodb, three server nodes]
19. Containerization [Diagram labels: user, container cloud running apiserver pods, monitoring, scheduler, mongodb, message queue, three server nodes]
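20. Dynamic scaling • Monitoring based on time-series data • Automatic scale-out and scale-in driven by monitoring data [Diagram labels: user, container cloud running four apiserver pods, monitoring, scheduler, mongodb, message queue, three server nodes]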
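The scale-out and scale-in decision on this slide can be sketched as a small function. This is a minimal illustration, not Pandora's actual policy: the load metric, the target value, and the pod limits are all hypothetical, and `desiredPods` is a name invented for this example.

```go
package main

import (
	"fmt"
	"math"
)

// desiredPods returns the pod count that would bring the average per-pod
// load back to target, clamped to [minPods, maxPods]. The load metric and
// every threshold here are hypothetical.
func desiredPods(current int, avgLoad, target float64, minPods, maxPods int) int {
	if current < 1 || target <= 0 {
		return minPods
	}
	n := int(math.Ceil(float64(current) * avgLoad / target))
	if n < minPods {
		n = minPods
	}
	if n > maxPods {
		n = maxPods
	}
	return n
}

func main() {
	// Four pods at 75% load with a 50% target: scale out.
	fmt.Println(desiredPods(4, 0.75, 0.5, 2, 20))
	// Four pods at 25% load with the same target: scale in.
	fmt.Println(desiredPods(4, 0.25, 0.5, 2, 20))
}
```

A monitoring loop would feed `avgLoad` from the time-series data and ask the container platform to converge on the returned count.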
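21. Data write optimization [Diagram, before: the receiver reads one line, the parser appends the data to records, the receiver reads the next line, and the writer writes the records to the message queue. Diagram, after: the receiver reads lines and writes them into a channel; several parsers read data from the channel and append it to records; writers write the records to the message queue]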
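One reading of the optimized diagram is a receiver that pushes lines into a channel, several parsers that drain it in parallel, and a writer that ships batches of records. The sketch below follows that reading. The `k=v` line format, the batch size, and the names `ingest` and `parseLine` are assumptions made for illustration, and the `write` callback stands in for the message queue.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"sync"
)

type record map[string]string

// parseLine turns "k=v k=v" into a record. The line format is hypothetical.
func parseLine(line string) record {
	r := record{}
	for _, f := range strings.Fields(line) {
		if k, v, ok := strings.Cut(f, "="); ok {
			r[k] = v
		}
	}
	return r
}

// ingest wires receiver -> channel -> parsers -> batches -> writer.
// It assumes parsers >= 1 and batchSize >= 1.
func ingest(input string, parsers, batchSize int, write func([]record)) {
	lines := make(chan string, 64)
	batches := make(chan []record, parsers)

	// Receiver: read lines and write them into the channel.
	go func() {
		defer close(lines)
		sc := bufio.NewScanner(strings.NewReader(input))
		for sc.Scan() {
			lines <- sc.Text()
		}
	}()

	// Parsers: read from the channel and append to a local batch of records.
	var wg sync.WaitGroup
	for i := 0; i < parsers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			batch := make([]record, 0, batchSize)
			for line := range lines {
				batch = append(batch, parseLine(line))
				if len(batch) == batchSize {
					batches <- batch
					batch = make([]record, 0, batchSize)
				}
			}
			if len(batch) > 0 {
				batches <- batch
			}
		}()
	}
	go func() { wg.Wait(); close(batches) }()

	// Writer: the callback stands in for writing to the message queue.
	for b := range batches {
		write(b)
	}
}

func main() {
	input := "uid=1 method=GET\nuid=2 method=POST\nuid=3 method=GET"
	total := 0
	ingest(input, 2, 2, func(batch []record) { total += len(batch) })
	fmt.Println("records written:", total)
}
```

The channel decouples reading from parsing, so a slow parse no longer stalls the receiver until the buffer fills.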
22. Compute [Diagram labels: scheduler, mongodb, transform server, cluster manager, containers each running a spark driver, zookeeper cluster; actions shown: manage job, submit job, launch task, record job metadata and state]
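23. Export: connecting upstream and downstream • Task partitioning • Scheduling • Automatic task balancing • Horizontal scaling • Resource isolation • High availability [Diagram labels: scheduler, mongodb (records scheduling information), export cluster with three masters (primary/standby election) and several servers, zookeeper, message queue, downstream systems; actions shown: dispatch tasks, report status]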
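24. Task partitioning and management [Diagram: the master handles server management, message queue information collection, and task scheduling; it dispatches tasks to server1, server2, and server3, which report heartbeats; tasks T1 to T7 run on the servers against message queues Q1 to QN]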
25. Scheduling approach • Resource-oriented • Makes full use of heterogeneous machines • Automatic adjustment
26. Task assignment • Tasks are distributed evenly across the servers • T8 and T9 join [Diagram: three masters; server1: T1 T2 T8; server2: T3 T9 T4; server3: T5 T6 T7]
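Even assignment can be sketched as "place each new task on the server that currently holds the fewest tasks". This is only an illustration: the slides describe the scheduler as resource-oriented, whereas this sketch simply counts tasks, and `assign` is a hypothetical helper.

```go
package main

import "fmt"

// assign places each task on the server currently holding the fewest tasks.
// Ties go to the server listed first. It assumes servers is non-empty.
// Counting tasks is a simplification of resource-based scheduling.
func assign(load map[string][]string, servers []string, tasks ...string) {
	for _, t := range tasks {
		best := servers[0]
		for _, s := range servers[1:] {
			if len(load[s]) < len(load[best]) {
				best = s
			}
		}
		load[best] = append(load[best], t)
	}
}

func main() {
	servers := []string{"server1", "server2", "server3"}
	load := map[string][]string{
		"server1": {"T1", "T2"},
		"server2": {"T3", "T4"},
		"server3": {"T5", "T6", "T7"},
	}
	assign(load, servers, "T8", "T9")
	for _, s := range servers {
		fmt.Println(s, load[s])
	}
}
```

Starting from the layout on the slide, this rule sends T8 to server1 and T9 to server2, so every server ends up with three tasks.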
27. Automatic adjustment • Tasks are distributed evenly across 3 servers [Diagram: three masters; server1: T1 T2 T8; server2: T3 T9 T4; server3: T5 T10 T6 T7]
28. Automatic adjustment • Tasks are distributed evenly across 3 servers • T2, T8, T4, and T9 are deleted [Diagram: three masters; server1: T1; server2: T3; server3: T5 T10 T6 T7]
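29. Automatic adjustment • Tasks are distributed evenly across 3 servers • T2, T8, T4, and T9 are deleted • Resources become unbalanced, which triggers automatic task rebalancing [Diagram: three masters; server1: T1; server2: T3; server3: T5 T10 T6 T7]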
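30. Automatic adjustment • Tasks are distributed evenly across 3 servers • T2, T8, T4, and T9 are deleted • Resources become unbalanced, which triggers automatic task rebalancing • Tasks are scheduled onto idle machines [Diagram: three masters; tasks T1, T5, T3, T6, T10, and T7 spread across server1, server2, and server3]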
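The rebalancing step can be sketched as repeatedly moving one task from the busiest server to the idlest one until their task counts differ by at most one. This is an assumption about the policy, not the deck's stated algorithm, and the `rebalance` function is hypothetical.

```go
package main

import "fmt"

// rebalance moves tasks from the most loaded server to the least loaded one
// until the counts differ by at most one, and returns the number of moves.
// It assumes servers is non-empty. Task count stands in for real resource usage.
func rebalance(load map[string][]string, servers []string) int {
	moves := 0
	for {
		hi, lo := servers[0], servers[0]
		for _, s := range servers[1:] {
			if len(load[s]) > len(load[hi]) {
				hi = s
			}
			if len(load[s]) < len(load[lo]) {
				lo = s
			}
		}
		if len(load[hi])-len(load[lo]) <= 1 {
			return moves
		}
		// Move the last task of the busiest server to the idlest server.
		n := len(load[hi])
		t := load[hi][n-1]
		load[hi] = load[hi][:n-1]
		load[lo] = append(load[lo], t)
		moves++
	}
}

func main() {
	servers := []string{"server1", "server2", "server3"}
	// Layout after T2, T8, T4, and T9 were deleted.
	load := map[string][]string{
		"server1": {"T1"},
		"server2": {"T3"},
		"server3": {"T5", "T10", "T6", "T7"},
	}
	moves := rebalance(load, servers)
	fmt.Println("moves:", moves)
	for _, s := range servers {
		fmt.Println(s, load[s])
	}
}
```

Which tasks move is a policy choice. A real scheduler would likely also weigh the cost of restarting a task on another machine.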
31. Horizontal scaling • All 3 servers are already fully loaded • The newly added task T13 cannot be processed effectively [Diagram: three masters; server1: T1 T2 T3 T4; server2: T5 T6 T7 T8; server3: T10 T10 T11 T12; T13: ?]
32. Horizontal scaling [Diagram: the masters dispatch tasks and the servers report heartbeats; server1: T1 T2 T3 T4; server2: T5 T6 T7 T8; server3: T10 T10; server4: T11 T12 T13]
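33. Resource isolation • Isolate special types of tasks • Make use of special hardware resources [Diagram labels: task discovery, server management, scheduling group management, task distribution, scheduler 1, scheduler 2, scheduler 3]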
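34. Master high availability • Masters compete for a lock to decide which is primary and which is standby • The primary master registers its identity in zk [Diagram: master1 acquires the lock, master2 fails to acquire it; zookeeper]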
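35. Master high availability • Masters compete for a lock to decide which is primary and which is standby • The primary master registers its identity in zk [Diagram: master1 loses the lock, master2 acquires it; zookeeper]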
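The lock-based election on these two slides can be sketched with an in-memory lock. The `lockService` type below is only a stand-in for ZooKeeper, written for illustration: it is not a ZooKeeper client, and a real deployment would rely on ZooKeeper sessions and ephemeral nodes to detect a lost lock.

```go
package main

import (
	"fmt"
	"sync"
)

// lockService is an in-memory stand-in for the ZooKeeper lock. It is a
// hypothetical type for illustration, not a ZooKeeper client.
type lockService struct {
	mu     sync.Mutex
	holder string
}

// tryAcquire makes id the primary if the lock is free and reports whether
// id holds the lock afterwards.
func (l *lockService) tryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" {
		l.holder = id // the primary registers its identity
	}
	return l.holder == id
}

// release models the primary losing its lock. Calls from a non-holder are ignored.
func (l *lockService) release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == id {
		l.holder = ""
	}
}

func main() {
	zk := &lockService{}
	fmt.Println("master1 primary:", zk.tryAcquire("master1"))
	fmt.Println("master2 primary:", zk.tryAcquire("master2"))
	zk.release("master1") // master1 loses the lock
	fmt.Println("master2 primary:", zk.tryAcquire("master2"))
}
```

The standby keeps retrying `tryAcquire`, so it takes over as soon as the primary's lock disappears.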
36. Server high availability [Diagram: three masters; server1: T1 T2; server2: T3 T4; server3: T5 T6]
37. Server high availability. When heartbeats are lost, the master detects that the server is down. [Diagram: three masters; server1: T1 T2; server2: T3 T4; server3: T5 T6]
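38. System-level horizontal scaling • A single kafka cluster has a ceiling • Scale at the cluster level • Flexible routing [Diagram labels: apiserver, compute engine, kafka, export service]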
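39. System-level horizontal scaling • A single kafka cluster has a ceiling • Scale at the cluster level • Flexible routing [Diagram labels: apiserver, compute engine, scheduler, monitoring, three kafka clusters, export service]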
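Routing across several kafka clusters could look like the sketch below: explicit pins for selected data sources, with a hash-based default for everything else. This is one possible design, not the deck's stated mechanism. The `router` type, the cluster names, and the idea of routing by repository name are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// router picks a kafka cluster for a data source. Hypothetical type.
type router struct {
	clusters []string          // all available clusters, assumed non-empty
	pinned   map[string]string // explicit overrides, e.g. for very large sources
}

// route returns the pinned cluster if one exists, otherwise a stable
// hash-based choice among the available clusters.
func (r *router) route(repo string) string {
	if c, ok := r.pinned[repo]; ok {
		return c
	}
	h := fnv.New32a()
	h.Write([]byte(repo))
	return r.clusters[h.Sum32()%uint32(len(r.clusters))]
}

func main() {
	r := &router{
		clusters: []string{"kafka-a", "kafka-b", "kafka-c"},
		pinned:   map[string]string{"big-tenant": "kafka-c"},
	}
	fmt.Println(r.route("big-tenant"))
	fmt.Println(r.route("some-repo"))
}
```

With plain modulo hashing, adding a cluster remaps many unpinned sources. A routing table kept by the scheduler would avoid that.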
40. Upstream/downstream protocol optimization • Json vs Protobuf
    type Test struct {
        Uid            string `json:"uid"`
        BatchSize      int64  `json:"batchSize"`
        Hostname       string `json:"hostname"`
        Method         string `json:"method"`
        Operation      string `json:"operation"`
        Instance       string `json:"instance"`
        ReqBodyLength  int64  `json:"reqBodyLength"`
        ReqId          string `json:"reqId"`
        RespBodyLength int64  `json:"respBodyLength"`
        RespCode       int64  `json:"respCode"`
        RespTime       int64  `json:"respTime"`
        Timestamp      int64  `json:"timestamp"`
    }
    Item                        Json     Protobuf
    Serialization (ns/op)       82161    67833
    Deserialization (ns/op)     36380    7705
    Serialized length (bytes)   259      100
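The JSON side of this comparison can be reproduced with the standard library alone. The sketch below round-trips the struct from the slide through `encoding/json` and reports the encoded size. The field values and the `roundTrip` helper are made up for illustration, so the size will not match the 259 bytes on the slide. The Protobuf side needs generated code and is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Test mirrors the struct shown on the slide.
type Test struct {
	Uid            string `json:"uid"`
	BatchSize      int64  `json:"batchSize"`
	Hostname       string `json:"hostname"`
	Method         string `json:"method"`
	Operation      string `json:"operation"`
	Instance       string `json:"instance"`
	ReqBodyLength  int64  `json:"reqBodyLength"`
	ReqId          string `json:"reqId"`
	RespBodyLength int64  `json:"respBodyLength"`
	RespCode       int64  `json:"respCode"`
	RespTime       int64  `json:"respTime"`
	Timestamp      int64  `json:"timestamp"`
}

// roundTrip serializes in to JSON, decodes it again, and returns the decoded
// value together with the encoded length in bytes.
func roundTrip(in Test) (Test, int, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return Test{}, 0, err
	}
	var out Test
	if err := json.Unmarshal(data, &out); err != nil {
		return Test{}, 0, err
	}
	return out, len(data), nil
}

func main() {
	// Sample values are invented for this example.
	in := Test{Uid: "u-1", BatchSize: 100, Hostname: "host-1", Method: "POST",
		Operation: "postData", Instance: "i-1", ReqBodyLength: 2048,
		ReqId: "req-1", RespBodyLength: 16, RespCode: 200, RespTime: 12}
	out, size, err := roundTrip(in)
	fmt.Println("lossless:", out == in, "bytes:", size, "err:", err)
}
```

Much of the JSON size comes from repeating every field name in every message, which is one reason a binary format like Protobuf encodes the same record in fewer bytes.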
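41. Pipeline processing • Export processing model • Pipelined parallel processing [Diagram: each batch (batch1, batch2, batch3) goes through pull, process, and push; in the pipelined model the pull and push steps of successive batches overlap instead of running strictly one after another]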
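A pipelined export can be sketched with one goroutine per stage connected by channels, so that batch N+1 is being pulled while batch N is processed or pushed. The stage functions are passed in as parameters, and `runPipeline` is a hypothetical name. This is a minimal sketch of the general technique rather than the export service's code.

```go
package main

import "fmt"

// runPipeline pulls n batches, processes them, and pushes the results.
// Each stage runs concurrently, and batch order is preserved because every
// stage has exactly one goroutine.
func runPipeline(n int, pull func(i int) []int, process func([]int) []int, push func([]int)) {
	pulled := make(chan []int, 1)
	processed := make(chan []int, 1)

	// Stage 1: pull batches from upstream.
	go func() {
		defer close(pulled)
		for i := 0; i < n; i++ {
			pulled <- pull(i)
		}
	}()

	// Stage 2: process each batch.
	go func() {
		defer close(processed)
		for b := range pulled {
			processed <- process(b)
		}
	}()

	// Stage 3: push to downstream, in the caller's goroutine.
	for b := range processed {
		push(b)
	}
}

func main() {
	pull := func(i int) []int { return []int{i, i + 10} }
	process := func(b []int) []int {
		out := make([]int, len(b))
		for j, v := range b {
			out[j] = v * 2
		}
		return out
	}
	runPipeline(3, pull, process, func(b []int) { fmt.Println("pushed", b) })
}
```

The buffered channels bound how far one stage can run ahead of the next, which keeps memory use predictable when the downstream system is slow.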
42. Golang GC • stop the world • sync.Pool • Reuse objects • Upgrade the Golang version
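Object reuse with `sync.Pool` can be sketched as follows. The example pools `bytes.Buffer` values so that a hot encoding path does not allocate a fresh buffer per call, which reduces garbage collector pressure. The `encode` function and its comma-joined output are invented for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers instead of allocating one per call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encode joins fields with commas using a pooled buffer. Hypothetical helper.
func encode(fields []string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old data
	defer bufPool.Put(buf) // return it for reuse once we are done
	for i, f := range fields {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.WriteString(f)
	}
	// String copies the bytes, so the result stays valid after Put.
	return buf.String()
}

func main() {
	fmt.Println(encode([]string{"uid", "method", "respCode"}))
	fmt.Println(encode([]string{"a", "b"}))
}
```

The pool may drop idle objects at any garbage collection, so it suits short-lived scratch objects rather than anything that must persist.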
43. Limited-resource assumption • Service capacity per unit of resource • Resource usage assessment • Resource planning
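The planning step can be illustrated with simple arithmetic: measure what one server can sustain, reserve some headroom, and divide the expected peak by the usable capacity. All numbers below are invented, and `serversNeeded` is a hypothetical helper, not a formula taken from the deck.

```go
package main

import "fmt"

// serversNeeded returns how many servers cover peak load when each server
// sustains perServer units and headroomPct percent of that is held in reserve.
// It returns 0 when the inputs leave no usable capacity.
func serversNeeded(peak, perServer, headroomPct int) int {
	usable := perServer * (100 - headroomPct) / 100
	if usable <= 0 {
		return 0
	}
	return (peak + usable - 1) / usable // integer ceiling division
}

func main() {
	// Hypothetical figures: 1,000,000 points/s at peak, 50,000 points/s per
	// server, 20% headroom.
	fmt.Println(serversNeeded(1000000, 50000, 20))
}
```

With these figures each server contributes 40,000 usable points per second, so the plan calls for 25 servers.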
44. • Introduction to Pandora • Workflow design and optimization • Summary
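45. Results • Supports trillions of data points and PB-scale data volume every day • Supports a massive number of users • Extremely low system latency • Automated operations • Availability reaches 99.9%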
46. Results • The logs of all Qiniu Cloud services have been migrated to Pandora • A large number of external customers have onboarded
47. Contact us