A Specialized B-tree for Concurrent Datalog Evaluation

1. A Specialized B-tree for Concurrent Datalog Evaluation Herbert Jordan1, Pavle Subotić3, David Zhao2, and Bernhard Scholz2 PPoPP 2019, 16-20 February 2019, Washington, DC 1) 2) 3) University of Innsbruck University of Sydney University College London
2. Datalog (by Example) a c b d e g f graph from to a b a c b f c e d a d c … … Which nodes are connected? edge relation 2
3. Datalog (by Example) a c b d e g f graph from to a b a c b f c e d a d c … … edge relation path(X,Y) :- edge(X,Y). path(X,Z) :- path(X,Y), edge(Y,Z). Datalog query 3
4. Datalog › Benefits: – a concise formalism for powerful data analysis – lately major performance improvements and tool support › Applications: – – – – data base queries program analysis security vulnerability analysis network analysis 100s of relations and rules, billions of tuples, all in-memory 4
5. Query Processing relations set of integer tuples rules sequence of relational algebra operations on sets 5
6. Example path(X,Z) :- path(X,Y), edge(Y,Z). ,(-#" ← !"#ℎ while ( ,(-#" ≠ ∅ ) { '() ← *(,(-#" ⋈ (,/() ∖ !"#ℎ !"#ℎ ← !"#ℎ ∪ '() } computational expensive and dominating part ,(-#" ← '() 6
7. Example !"# ← %('"()* ⋈ "',") ∖ /*)ℎ Relation new; for t1 ∈ delta { auto l = edge.lower_bound( { t1[1], 0 } ); auto u = edge.upper_bound( { t1[1]+1, 0 } ); for t2 ∈ [l,u] { Tuple t3 = { t1[0], t2[1] }; if ( t3 ∉ path ) { new.insert(t3); } } } 7
8. Example !"# ← %('"()* ⋈ "',") ∖ /*)ℎ Relation new; #pragma omp parallel for for t1 ∈ delta { auto l = edge.lower_bound( { t1[1], 0 } ); auto u = edge.upper_bound( { t1[1]+1, 0 } ); for t2 ∈ [l,u] { Tuple t3 = { t1[0], t2[1] }; if ( t3 ∉ path ) { new.insert(t3); } one write access } (assignment) } all read accesses (right hand side) But: write target is never read on right hand side! 8
9. Needed › efficient data structure for relations – maintain set of n-dimensional tuples – efficient support for › › › › › insertion, scans, range queries, membership tests, emptiness checks – efficient synchronization of concurrent inserts well supported by B-trees not so much … 9
10. B-tree (1,1) (1,2) (3,2) (4,7) (5,3) (8,2) (6,9) (7,4) (8,7) (9,2) (9,4) › Insertion: – locate target leaf node – split leaf node if necessary, may propagate up – insert element in sorted leaf-node element array 10
11. B-tree Locking Strategies global lock btree fine grained btree read write btree million insertions / second 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 number of threads random order, on 4x8 core Intel Xeon E5-4650 11
12. Seqlocks / Optimistic Locking SN: 1234 sequence number: • even: no writer active • odd: one writer active Shared Data 12
13. SN: 1234 Shared Data Seqlocks › Reader: <not even> read seq number <seq number changed> perform read access check seq number perform write access inc seq number › Writer: <was not even> set last bit in SN 13
14. SN: 1234 Shared Data Optimistic Read/Write Lock › Reader <seq number changed> <not even> read seq number perform read need write perform read access access access? check seq number <yes> <fail> set last bit in SN perform write access inc seq number 14
15. Optimistic B-tree › Protect nodes and root pointer with optimistic R/W lock › Synchronize insert operation – read access on inner nodes, update to write when necessary › Key challenge: – pointer indirection – concurrency memory model SN: 4 (1,1) (1,2) (3,2) (4,7) SN: 1234 (5,3) (8,2) SN: 12 (6,9) (8,7) (7,4) SN: 321 (9,2) (9,4) SN: 254 15
16. B-tree Locking Strategies (cont) global lock btree fine grained btree read write btree optimistic btree million insertions / second 14 12 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 number of threads random order, on 4x8 core Intel Xeon E5-4650 16
17. google btree seq btree btree google btree seq btree btree STL rbtset STL hashset TBB hashset STL rbtset STL hashset TBB hashset 25 20 15 10 5 0 1000² 2000² 5000² total elements inserted ordered insertion 10000² million insertions/s million insertions/s Sequential Performance 5 4 3 2 1 0 1000² 2000² 5000² 10000² total elements inserted random order insertion (additional data structures covered in paper) 17
18. Parallel Performance btree google btree reduction btree TBB hashset 50 40 30 20 10 0 1 3 5 7 up to 10 x faster th an TBB 9 11 13 15 17 19 21 23 25 27 29 31 million insertions/s million insertions/s up to 2 7x btree google btree reduction btree TBB hashset faster t han TBB 10 8 6 4 2 0 1 3 number of threads 5 7 9 11 13 15 17 19 21 23 25 27 29 31 number of threads ordered insertion random order insertion 4x8 core Intel Xeon E5-4650 18
19. Other Concurrent Tree Data Structures 1. 5 - 2. 9 B-tree Masstree B-slack PALM tree x faster B-tree B-slack x faster PALM tree 120 million insertions/second million insertions/second 20 Masstree 1. 5 - 2. 6 15 10 5 0 100 80 60 40 20 0 ordered random Insertion order single-threaded ordered random Insertion order 8 threads 19
20. Datalog Query Processing btree STL rbtset google btree TBB hashset 1. 4 - 7. 7 x faster STL hashset btree STL rbtset google btree TBB hashset 3000 700 2500 600 total query time [s] total query time [s] 1. 4 - 6. 3 2000 1500 1000 500 0 x faster STL hashset 500 400 300 200 100 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 number of threads number of threads context sensitive var-points-to analysis security vulnerability analysis 20
21. Conclusion › Developed concurrent set for Datalog relations: – B-tree foundation › good sequential performance, cache friendly – Fine-grained synchronization › based on customized seqlock variant › Results: – up to 59x faster than state-of-the-art hash based sets – up to 2.9x faster than state-of-the-art tree based sets – up to 7.7x faster for real-world query processing › Future work: – investigate other data structures for specialized use cases 21
22. Thank you! visit us on https://souffle-lang.github.io sources: https://github.com/souffle-lang/souffle 22
23. B-tree Locking Strategies global lock btree fine grained btree read write btree tbb::hash_set reduction btree million insertions / second 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 number of threads random order, on 4x8 core Intel Xeon E5-4650 44
24. B-tree Locking Strategies (cont) global lock btree fine grained btree read write btree tbb::hash_set reduction btree optimistic btree million insertions / second 14 12 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 number of threads random order, on 4x8 core Intel Xeon E5-4650 45