Liu Qi: CockroachDB Design and Implementation

卓乐儿

Published 2018/05/13 in the Technology category

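On the storage engine: reuse existing work; it is not the focus of the overall system. RocksDB is already fast enough. The design allows for multiple storage engines. On HLC: hybrid logical clocks keep the properties of a logical clock while staying close to real time. At the 2015 Database Technology Conference China, Liu Qi, a senior system architect at Wandoujia, presented the design and implementation of CockroachDB.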

Text content
1. CockroachDB Design and Implementation, by Liu Qi (Weibo: @goroutine)
2. The evolution of databases ● SQL era: MySQL, PostgreSQL ● NoSQL: MongoDB, Redis, HBase… ● NewSQL: Google F1, FoundationDB, CockroachDB
3. Why Transaction + Scale
4. Google's journey ● 2004: BigTable: eventually consistent NoSQL ● 2006: Megastore (on top of BigTable): transactional, slow, complex ● 2010: remove sharded MySQL ● 2012: Spanner (+F1 on top of it): semi-relational, fully linearizable
5. Earlier efforts ● cobar: just sharding; simple ● vitess: from YouTube; complex ● Plus countless wheels reinvented at the big companies … ● No distributed transactions!
6. CockroachDB ● Cockroach is CP in CAP ● “A”vailable is not the same as “H”ighly “A”vailable
9. Transaction principles ● Variation of two-phase commit ● Txn writes stored as MVCC “intents” ● Txn table has a single key per txn ○ Stores txn status, timestamp, priority ○ Modified by concurrent txns: first writer wins ○ The single source of truth ● 2nd phase more efficient: 1 write ● Intents resolved after commit
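
The flow on this slide can be sketched in a few lines. This is a minimal single-process illustration, not CockroachDB's code: the names `Store`, `TxnRecord`, `begin`, `write`, `commit`, `abort`, `resolve` and `read` are all hypothetical, and conflict handling is reduced to raising an error. Readers here simply ignore pending intents, which is a simplification of what a real system would do.

```python
from dataclasses import dataclass, field

PENDING, COMMITTED, ABORTED = "PENDING", "COMMITTED", "ABORTED"


@dataclass
class TxnRecord:
    # One record per transaction: status, timestamp, priority.
    status: str = PENDING
    timestamp: int = 0
    priority: int = 0


@dataclass
class Store:
    txn_table: dict = field(default_factory=dict)  # txn_id -> TxnRecord
    committed: dict = field(default_factory=dict)  # key -> [(ts, value)]
    intents: dict = field(default_factory=dict)    # key -> (txn_id, value)

    def begin(self, txn_id, ts, priority=0):
        self.txn_table[txn_id] = TxnRecord(PENDING, ts, priority)

    def write(self, txn_id, key, value):
        # Phase 1: stage the write as an intent next to the data.
        owner = self.intents.get(key)
        if owner is not None and owner[0] != txn_id:
            raise RuntimeError("conflicting intent on %r" % key)
        self.intents[key] = (txn_id, value)

    def _finish(self, txn_id, new_status):
        # First writer wins: only a PENDING record can change state.
        rec = self.txn_table[txn_id]
        if rec.status == PENDING:
            rec.status = new_status
        return rec.status

    def commit(self, txn_id):
        # Phase 2 is a single write to the transaction record.
        return self._finish(txn_id, COMMITTED)

    def abort(self, txn_id):
        return self._finish(txn_id, ABORTED)

    def resolve(self, key):
        # After commit or abort, turn the intent into a plain version or drop it.
        intent = self.intents.get(key)
        if intent is None:
            return
        txn_id, value = intent
        rec = self.txn_table[txn_id]
        if rec.status == COMMITTED:
            self.committed.setdefault(key, []).append((rec.timestamp, value))
        if rec.status != PENDING:
            del self.intents[key]

    def read(self, key):
        # The transaction record, not the intent, decides visibility.
        intent = self.intents.get(key)
        if intent is not None and self.txn_table[intent[0]].status == COMMITTED:
            return intent[1]
        versions = self.committed.get(key, [])
        return max(versions, key=lambda v: v[0])[1] if versions else None
```

Because the commit is one write to one record, a reader that finds an unresolved intent can always learn the transaction's fate by looking up that record, which is why intent cleanup can happen lazily after the commit.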
10. Where is the txn table? ● It is also stored as KV data inside some range; which range is determined by txn.Key
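
As an illustration of how a key can determine the range that holds a transaction record, here is a small sketch that assumes ranges are contiguous and identified by sorted start keys. `RangeMap` and `range_for` are hypothetical names, and real range addressing is considerably more involved.

```python
import bisect


class RangeMap:
    """Maps a key to the index of the range whose span contains it."""

    def __init__(self, split_keys):
        # The first range starts at the empty key; each split key starts a new range.
        self.starts = [""] + sorted(split_keys)

    def range_for(self, key):
        # Find the last range whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1


ranges = RangeMap(["g", "n"])
txn_key = "apple"  # stands in for txn.Key
record_range = ranges.range_for(txn_key)
```

Under this assumption the transaction record needs no special storage: it is addressed like any other key and lives wherever `txn.Key` falls.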
11. Linearizability ● Serializable for all cases ● Temporal reverse? ● Client decision: perf / correctness ● Max Timestamp ○ Passed to order causal txns ● Commit wait ○ Always waiting means true linearizability ○ More accurate clocks = less wait
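
The commit-wait point can be made concrete with a little arithmetic. The sketch below assumes a known bound `max_offset` on clock skew between nodes and a hypothetical helper `commit_wait_seconds`; it only computes how long a committer would hold its acknowledgement so that every node's clock has passed the commit timestamp.

```python
def commit_wait_seconds(commit_ts, now, max_offset):
    """Remaining wait before acknowledging a commit.

    commit_ts and now are readings of the committer's clock in seconds;
    max_offset is the assumed bound on clock skew between nodes.
    """
    return max(0.0, commit_ts + max_offset - now)


# Acknowledging right at the commit timestamp: wait the full offset.
full_wait = commit_wait_seconds(100.0, 100.0, 0.25)
# Some time has already passed since commit_ts: nothing left to wait.
no_wait = commit_wait_seconds(100.0, 100.5, 0.25)
# A tighter clock bound means a shorter wait.
short_wait = commit_wait_seconds(100.0, 100.0, 0.125)
```

The wait is bounded by `max_offset`, which is the sense in which more accurate clocks mean less waiting.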
12. Choosing the read timestamp ● Spanner: TrueTime; always reads committed value ● Cockroach: doesn’t wait on writes; sometimes waits on reads ● For txn, choose Tstart, Tmax ● If read encounters timestamp in “uncertainty” window, restart read-only txn ● Shrinking window means max wait is clock skew
13. Uncertainty ● ---------ts--------------txn.MaxTs-----------> ● The arrow points in the direction of absolute time
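
The uncertainty window between ts and txn.MaxTs can be sketched as follows. This is a simplified model for a single key: `read_once` and `read_txn` are hypothetical names, timestamps are plain integers, and the choice to restart at the newest uncertain timestamp is an assumption made for the sketch.

```python
def read_once(versions, ts, max_ts):
    """One read attempt over versions, a list of (commit_ts, value) for one key."""
    # A value committed in (ts, max_ts] might really have happened before
    # this txn started, because clocks can differ by up to the window size.
    uncertain = [t for t, _ in versions if ts < t <= max_ts]
    if uncertain:
        return ("restart", max(uncertain))
    visible = [(t, v) for t, v in versions if t <= ts]
    if not visible:
        return ("ok", None)
    return ("ok", max(visible, key=lambda tv: tv[0])[1])


def read_txn(versions, t_start, max_offset):
    """Retry the read, moving ts forward while max_ts stays fixed."""
    ts, max_ts = t_start, t_start + max_offset
    restarts = 0
    while True:
        outcome, payload = read_once(versions, ts, max_ts)
        if outcome == "ok":
            return payload, ts, restarts
        ts = payload  # the window (ts, max_ts] shrinks on every restart
        restarts += 1


result = read_txn([(5, "old"), (12, "new")], t_start=10, max_offset=5)
```

Since ts only moves forward and max_ts never moves, the total amount a read can be pushed is bounded by `max_offset`, matching the slide's point that the maximum wait is the clock skew.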
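14. On the storage engine ● Reuse existing work; it is not the focus of the overall system ● RocksDB is already fast enough ● The design allows for multiple storage engines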
15. On HLC ● Hybrid logical clocks ● Keep the properties of a logical clock while staying close to real time
16. The HLC algorithm
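
The body of this slide did not survive in the transcript. As a rough guide, the commonly published hybrid logical clock update rules look like the sketch below; it is not necessarily what the slide showed, and the class name `HLC`, its methods and the injected `physical_clock` callable are assumptions for illustration.

```python
class HLC:
    """Hybrid logical clock: a (wall, logical) pair that never goes backwards."""

    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning physical time
        self.wall = 0
        self.logical = 0

    def now(self):
        """Timestamp for a local or send event."""
        pt = self.physical_clock()
        if pt > self.wall:
            # Physical time moved ahead: adopt it and reset the counter.
            self.wall, self.logical = pt, 0
        else:
            # Physical time stalled or lagging: advance the logical part only.
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        r_wall, r_logical = remote
        pt = self.physical_clock()
        prev = self.wall
        self.wall = max(prev, r_wall, pt)
        if self.wall == prev and self.wall == r_wall:
            self.logical = max(self.logical, r_logical) + 1
        elif self.wall == prev:
            self.logical += 1
        elif self.wall == r_wall:
            self.logical = r_logical + 1
        else:
            self.logical = 0
        return (self.wall, self.logical)


t = [10]                      # fake physical clock, controlled by the example
clock = HLC(lambda: t[0])
first = clock.now()           # physical time is ahead of the initial state
second = clock.now()          # same physical reading, so the counter ticks
merged = clock.update((15, 3))  # remote is ahead: adopt its wall time
```

The wall component tracks the largest physical time seen so far, so timestamps stay close to real time, while the logical component preserves causal ordering when physical readings tie or lag.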