TiDB with TiFlash Extension: A Venture Towards the True HTAP Platform (Wei Wan)
1. TiDB with TiFlash Extension A Venture Towards the True HTAP Platform weiwan@pingcap.com
3. About me ● Wei Wan (韦万) ● R&D @ PingCAP ● Used to be a game / Android / big-data dev, now a database core dev. ● Focused on storage engine & performance optimization
4. About this talk ● The TP & AP challenge ● What is TiFlash? ● How is TiFlash built? ● TiDB data platform
5. Data Platform - What You Think It Is BI Reporting Ad hoc App Databases Console
6. Data Platform - What It Really Is BI ETL Reporting Analytical DBs Ad hoc App OLTP DBs Console Data Warehouse / Data Lake
7. Why VS
8. Fundamental Conflicts ● Large / batch process vs point / short access ○ Row format for OLTP ○ Columnar format for OLAP ● Workload Interference ○ A single large analytical query might cause disaster for your OLTP workload
9. A Popular Solution ● Use different types of databases ○ For live and fast data, use an OLTP specialized database or NoSQL ○ For historical data, use Hadoop / analytical database ● Offload data via the ETL process into your Hadoop cluster or analytical database ○ Per hour or even per day ○ Complex offload procedures
10. Good enough, really?
11. Complexity or
12. Freshness or
13. Consistency or
14. TiFlash Extension
15. What Is TiFlash? ● An extended analytical engine for TiDB ○ Columnar storage and vectorized processing ○ Based on ClickHouse with tons of proprietary modifications ● Data sync via extended Raft consensus algorithm ○ Strong consistency ○ Trivial overhead ● Clear workload isolation, so OLTP is not impacted ● Tight integration with TiDB
16. What Is TiFlash? Distributed database with MySQL protocol + TiFlash AP extension = Real HTAP database!
17. What Is TiFlash? Spark Cluster TiSpark TiSpark Worker Worker TiFlash Node 2 TiFlash Node 1 TiFlash Extension Cluster TiDB TiDB TiKV Node 1 TiKV Node 2 TiKV Node 3 Store 1 Store 2 Store 3 Region 1 Region 2 Region 3 Region 4 Region 4 Region 3 Region 2 Region 1 Region 2 Region 3 Region 4 Region 1 TiKV Cluster
18. Columnstore vs Rowstore ● Columnar storage stores data in columns instead of rows ○ Suitable for analytical workloads ■ Column pruning becomes possible ○ Compression becomes practical and reduces IO further ■ Far smaller storage requirement ○ Poor at small random reads and writes ■ Which is the typical workload for OLTP ● Rowstore is the classic format for databases ○ Researched and optimized for OLTP scenarios for decades ○ Cumbersome in analytical use cases
19. Columnstore vs Rowstore SELECT avg(age) from emp; Rowstore / Columnstore (the same emp table shown in both layouts):
id name age
0962 Jane 30
7658 John 45
3589 Jim 20
5523 Susan 52
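A minimal sketch of why the slide's query favours a columnstore, using the four emp rows from the slide. The function names and the "fields read" counter are illustrative assumptions, not TiDB or TiFlash internals; the counter only stands in for IO.

```python
# The four emp rows from the slide: (id, name, age).
rows = [
    ("0962", "Jane", 30),
    ("7658", "John", 45),
    ("3589", "Jim", 20),
    ("5523", "Susan", 52),
]

# Columnstore layout: each column is stored contiguously.
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}


def avg_age_rowstore(rows):
    """Scan whole records; every field of every row is read."""
    total = 0
    fields_read = 0
    for record in rows:
        fields_read += len(record)
        total += record[2]
    return total / len(rows), fields_read


def avg_age_columnstore(columns):
    """Read only the age column (column pruning)."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)


print(avg_age_rowstore(rows))        # (36.75, 12)
print(avg_age_columnstore(columns))  # (36.75, 4)
```

Both layouts return the same average, but the columnar scan touches 4 values instead of 12. The gap grows with the number of columns the query does not need.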
20. Columnstore vs Rowstore “If your mother and your wife fell into a river at the same time, who would you save?” “Why not both?”
21. Low-cost Data Replication ● TiDB replicates log via Raft consensus protocol ● TiFlash replicates data in columnstore via Raft Learner ● Learner is a special read-only role in Raft ● Data is replicated to learner asynchronously ○ A write operation does not wait for the learner to finish replicating data ● Introduces almost no additional latency for the OLTP workload
22. Low-cost Data Replication Leader Learner TiFlash Region A TiKV Region A TiKV TiKV Region A Region A Follower Follower
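The reason a learner adds almost no write latency can be sketched as follows: only voters (leader and followers) count towards the commit majority, so a slow or unreachable learner never blocks a write. This is a toy model with hypothetical names (`Peer`, `replicate`, `tikv1`, `tiflash1`), not the TiKV Raft implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    role: str  # "leader", "follower" or "learner"
    log: list = field(default_factory=list)


def replicate(peers, entry, reachable):
    """Append entry on reachable peers and report whether it is committed.

    Only voters count towards the majority; learner acks are ignored.
    """
    voters = [p for p in peers if p.role != "learner"]
    acks = 0
    for peer in peers:
        if peer.name in reachable:
            peer.log.append(entry)
            if peer.role != "learner":
                acks += 1
    return acks > len(voters) // 2


peers = [
    Peer("tikv1", "leader"),
    Peer("tikv2", "follower"),
    Peer("tikv3", "follower"),
    Peer("tiflash1", "learner"),
]

# The learner is lagging, yet the write commits with 2 of 3 voters.
print(replicate(peers, "put a=1", {"tikv1", "tikv2"}))     # True
# A learner ack cannot replace a missing voter.
print(replicate(peers, "put b=2", {"tikv1", "tiflash1"}))  # False
```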
23. Strong Consistency ● Although data replication is asynchronous ● Read operation guarantees strong consistency ● Raft Learner read protocol + MVCC do the trick ○ Check readIndex on read and wait for necessary log ○ Read according to Timestamp within each log
24. Learner Read Timestamp : 17 Raft Leader 4 Raft Learner 3
25. Learner Read Raft Leader 4 Raft Learner 4
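The learner read steps (check readIndex, wait for the necessary log, then serve locally) can be sketched as below. It mirrors the slide's picture of a leader at index 4 and a learner at index 3. The classes are hypothetical, the "wait" is simulated by pulling entries synchronously, and the timestamp (MVCC) part of the read is left out.

```python
class Leader:
    def __init__(self):
        # (key, value) entries; all treated as committed in this sketch.
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))

    def commit_index(self):
        return len(self.log)


class Learner:
    def __init__(self, leader):
        self.leader = leader
        self.applied_index = 0
        self.data = {}

    def catch_up(self, target_index):
        # Stand-in for waiting on replication: apply entries until the
        # local applied index reaches the target.
        while self.applied_index < target_index:
            key, value = self.leader.log[self.applied_index]
            self.data[key] = value
            self.applied_index += 1

    def read(self, key):
        read_index = self.leader.commit_index()  # 1. ask the leader
        self.catch_up(read_index)                # 2. wait for the log
        return self.data.get(key)                # 3. serve locally


leader = Leader()
for i in range(1, 5):
    leader.write(f"k{i}", f"v{i}")

learner = Learner(leader)
learner.catch_up(3)            # learner lags: leader at 4, learner at 3
print(learner.read("k4"))      # v4
print(learner.applied_index)   # 4
```

Replication stays asynchronous; the only waiting happens on the read path, and only for the entries the learner has not yet applied.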
26. Update support ● Updates are harder on a columnar storage engine than on a row-based engine. ○ Block structure ○ Rough index maintenance ○ Scan speed ● Supporting ONLINE, TRANSACTIONAL updates is even harder
27. Update support Versioned rows (MVCC):
key ts del value
a 102 0 bob
a 104 0 alice
a 108 1 alice
b 105 0 kevin
b 107 0 joe
MutableMergeTree Storage Engine (based on MergeTree of ClickHouse, LSM-Tree like design) ● Levels L0, L1, L2 ● In memory, row-based (raft, transaction, cache) ● On disk, columnar (MVCC, AP performance)
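A minimal sketch of how a read at a given timestamp could be resolved against the versioned rows on this slide: pick the newest version whose ts is not later than the read timestamp, and treat a row with del = 1 as a deletion. The function name `mvcc_read` is an assumption for illustration and says nothing about how MutableMergeTree lays out or scans the data.

```python
# (key, ts, del, value) rows copied from the slide.
VERSIONS = [
    ("a", 102, 0, "bob"),
    ("a", 104, 0, "alice"),
    ("a", 108, 1, "alice"),
    ("b", 105, 0, "kevin"),
    ("b", 107, 0, "joe"),
]


def mvcc_read(versions, key, read_ts):
    """Return the value of key visible at read_ts, or None if absent."""
    visible = [v for v in versions if v[0] == key and v[1] <= read_ts]
    if not visible:
        return None
    _, _, deleted, value = max(visible, key=lambda v: v[1])
    return None if deleted else value


print(mvcc_read(VERSIONS, "a", 103))  # bob
print(mvcc_read(VERSIONS, "a", 105))  # alice
print(mvcc_read(VERSIONS, "a", 110))  # None (deleted at ts 108)
print(mvcc_read(VERSIONS, "b", 106))  # kevin
```

Because old versions are kept rather than overwritten, an update becomes an append, which suits an LSM-Tree like design.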
28. TiFlash is beyond columnar format
29. Scalability ● An HTAP database needs to store huge amount of data ● Scalability is very important ● TiDB relies on multi-raft for scalability ○ One command to add / remove node ○ Scaling is fully automatic ○ Smooth and painless data rebalance ● TiFlash adopts the same design
30. Isolation ● Perfect Resource Isolation ● Data rebalance based on the “label” mechanism ○ Dedicated nodes for TiFlash / Columnstore ○ Nodes are differentiated by “label” ● Computation Isolation is possible by nature ○ Use a different set of compute nodes ○ Read only from nodes with AP label
31. Isolation Peer 1 Peer 2 Peer 3 Peer 4 TiDB TiSpark TiFlash TiFlash Node 2 Region 2 TiKV TiKV TiKV TiFlash TiKV TiFlash Node 1 TiFlash Extension Cluster TiKV Node 1 TiKV Node 2 TiKV Node 3 Store 1 Store 2 Store 3 Region 1 Region 4 Region 2 Region 2 Region 3 Region 3 Region 3 Region 2 Region 4 Region 4 Region 1 Region 1 TiKV Cluster
32. Integration ● Tightly Integrated Interaction ○ TiDB / TiSpark might choose to read from either side ■ Based on cost ■ Columnstore is treated as a special kind of index ○ Upon TiFlash replica failure, read TiKV replica transparently ○ Join data from both sides in a single query
33. Integration SELECT AVG(s.price) FROM prod p, sales s WHERE p.pid = s.pid AND p.batch_id = 'B1328'; TiDB / TiSpark Index Scan(batch_id = B1328) TableScan(price,pid) TiFlash Node 2 TiFlash Node 1 TiFlash Extension Cluster TiKV Node 1 TiKV Node 2 Store 1 Store 2 Store 3 Region 1 Region 4 Region 2 Region 2 Region 3 Region 3 Region 3 Region 2 Region 4 Region 4 Region 1 Region 1 TiKV Cluster TiKV Node 3
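The "read from either side, based on cost, with transparent fallback" idea can be sketched as a tiny selection function. The cost numbers and the engine names used as dictionary keys are made up for illustration; the real optimizer's cost model is not described in these slides.

```python
def choose_engine(cost, available):
    """Pick the cheapest engine among those with a live replica.

    cost: dict mapping engine name to an estimated cost.
    available: set of engine names that can currently serve the read.
    """
    candidates = {e: c for e, c in cost.items() if e in available}
    if not candidates:
        raise RuntimeError("no replica available")
    return min(candidates, key=candidates.get)


# Large scan: the columnar replica is estimated to be cheaper.
scan_cost = {"tikv": 100.0, "tiflash": 10.0}
print(choose_engine(scan_cost, {"tikv", "tiflash"}))  # tiflash

# Same query while the columnar replica is down: fall back to TiKV.
print(choose_engine(scan_cost, {"tikv"}))             # tikv

# Point lookup via an index: the row store wins on cost.
lookup_cost = {"tikv": 1.0, "tiflash": 50.0}
print(choose_engine(lookup_cost, {"tikv", "tiflash"}))  # tikv
```

In the query on the slide, the same choice would be made per table: the batch_id filter on prod suits an index scan on the row store, while the price and pid scan on sales suits the columnstore.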
34. MPP Support ● TiFlash nodes form an MPP cluster by themselves ● Full computation support on MPP layer ○ Speeds up TiDB, since TiDB itself is not an MPP design ○ Speeds up TiSpark by avoiding disk writes during shuffle
35. MPP Support TiFlash nodes exchange data and enable complex operators like distributed join. TiDB / TiSpark Coordinator Plan Segment TiFlash Node 1 TiFlash Node 2 MPP Worker MPP Worker TiFlash Node 3 MPP Worker
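A distributed join built on data exchange can be sketched as a hash-partitioned shuffle followed by a local hash join on each worker. This is a generic illustration of the technique, run in a single process with in-memory lists; the function names and the sample prod / sales rows are hypothetical, not the TiFlash MPP implementation.

```python
from collections import defaultdict


def exchange(rows, key_index, n_workers):
    """Hash-partition rows so equal join keys land on the same worker."""
    parts = [[] for _ in range(n_workers)]
    for row in rows:
        parts[hash(row[key_index]) % n_workers].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key):
    """Join one worker's share: build on left, probe with right."""
    table = defaultdict(list)
    for l_row in left:
        table[l_row[left_key]].append(l_row)
    return [l_row + r_row
            for r_row in right
            for l_row in table.get(r_row[right_key], [])]


def mpp_join(left, right, left_key, right_key, n_workers=3):
    left_parts = exchange(left, left_key, n_workers)
    right_parts = exchange(right, right_key, n_workers)
    out = []
    # Each (left, right) pair is the data held by one MPP worker.
    for l_part, r_part in zip(left_parts, right_parts):
        out.extend(local_hash_join(l_part, r_part, left_key, right_key))
    return out


prod = [(1, "B1328"), (2, "B1328"), (3, "B9")]            # (pid, batch_id)
sales = [(1, 10.0), (1, 20.0), (3, 5.0), (4, 7.0)]        # (pid, price)
print(sorted(mpp_join(prod, sales, 0, 0)))
```

Since matching keys always hash to the same worker, no worker needs to see the whole of either table, and the exchange happens in memory rather than through disk.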
36. Performance ● Underlying storage engine supports Multi-Raft + MVCC ● Performance is still comparable to Parquet ● Benchmark against Apache Spark 2.3 on Parquet ○ Pre-POC version of TiFlash + Spark
37. Performance
38. Performance A new storage engine is on the way, delivering at least a 3x performance boost.
39. TiDB Data Platform
40. Traditional Data Platform BI ETL Reporting Analytical DBs Ad hoc App OLTP DBs Console Data Warehouse / Data Lake A traditional data platform relies on a complex architecture that moves data around via ETL. This introduces maintenance cost and delays the arrival of data in the data warehouse.
41. TiDB Data Platform BI Reporting Ad hoc App TiDB with TiFlash Console
42. Fundamental Change ● “What happened yesterday” vs “What’s going on right now” ○ Real-time reports for sales campaigns, with prices adjusted in no time ○ Risk management that always uses up-to-date info ○ Very fast-paced replenishment based on live data and prediction
43. Project status ● Beta / User POC in May 2019 ● GA by the end of 2019