Postgres-XL_XC, Scale-out Approach in PostgreSQL

国文栋

2018/05/13 发布于 技术 分类

今年是中国数据库技术大会第六个年头,大会将继续秉承分享IT最佳应用实践的宗旨,围绕传统数据库和大数据两条技术主线,在目前IT技术和管理快速的大背景下,更加深入地探讨数据库技术的现状和未来的发展方向,以及我们在这个转型过程中的实践经验和教训。

文字内容
1. DTCC 2015 Database Technology Conference China 2015 Postgres-XL/XC Scale-out Approach in PostgreSQL May 18th, 2015 NTT DATA INTELLILINK Corporation Koichi Suzuki Copyright © 2015 NTT DATA INTELLILINK Corporation
2. Introduction Copyright © 2015 NTT DATA INTELLILINK Corporation 2
3. About the Speaker ● Fellow at NTT DATA Intellilink Corporation ● Principal Technology Professionals at NTT DATA Group In Charge Of ● General Database Technology ● Database in huge data warehouse and its design ● PostgreSQL and its cluster technology In The Past ● Character Set Standard (Extended Unix Code, Unicode, etc) ● Heisei-font development (Technical Committee) ● Oracle Porting ● Object-Relational Database Copyright © 2015 NTT DATA INTELLILINK Corporation 3
4. Motivation ● Growing Database Workload both for OLTP (OnLine Transaction Processing) and OLAP (OnLine Analytical Processing) applications. ● Shared-Nothing Approach ● Performance with commodity hardware/software ● Extension to existing PostgreSQL ● Transparent API ● Internal API could be different ● Transparent libpq Interface ● No significant restriction on transaction ACID properties and SQL language. Copyright © 2015 NTT DATA INTELLILINK Corporation 4
5. Scale-out approach ● Distribution/Replication of table rows among different database “nodes” ● Parallelism ● Local join operation ● SQL planning for row distribution/replication ● Consistent and synchronous transaction management among “nodes” ● Performance with commodity hardware/software Copyright © 2015 NTT DATA INTELLILINK Corporation 5
6. Read Scale-out in PostgreSQL Master/Slave Read/Write Transactions Possible time delay Read-only Transactions Master WAL (or Redo Log) Copyright © 2015 NTT DATA INTELLILINK Corporation Slave 6
7. Scaling Out in Postgres XL/XC Read/Write Transactions No Delay in Update Visibility Local Disk Local Disk Local Disk Local Disk Backend Transaction Synchronization Copyright © 2015 NTT DATA INTELLILINK Corporation 7
8. OLTP Workload Scalability and Table Design Copyright © 2015 NTT DATA INTELLILINK Corporation 8
9. DBT-1 Workload Scalability Copyright © 2015 NTT DATA INTELLILINK Corporation DBT-1 (Rev) 9
10. Table Design in DBT-1 Benchmark CUSTOMER ORDERS ORDER_LINE ITEM SHOPPING_CART C_ID C_UNAME C_PASSWD C_FNAME C_LNAME C_ADDR_ID C_PHONE C_EMAIL C_SINCE C_LAST_VISIT C_LOGIN C_EXPIRATION C_DISCOUNT C_BALANCE C_YTD_PMT C_BIRTHDATE C_DATA ADDRESS ADDR_ID ADDR_STREET1 ADDR_STREET2 ADDR_CITY ADDR_STATE ADDR_ZIP ADDR_CO_ID ADDR_C_ID O_ID O_C_ID O_DATE O_SUB_TOTAL O_TAX O_TOTAL O_SHIP_TYPE O_BILL_ADDR_ID O_SHIP_ADDR_ID O_STATUS Distributed with Customer ID OL_ID OL_O_ID OL_I_ID OL_QTY OL_DISCOUNT OL_COMMENTS OL_C_ID CC_XACTS CX_I_ID CX_TYPE CX_NUM CX_NAME CX_EXPIRY CX_AUTH_ID CX_XACT_AMT CX_XACT_DATE CX_CO_ID CX_C_ID Replicated COUNTRY CO_ID CO_NAME CO_EXCHANGE CO_CURRENCY AUTHOR OL_ID OL_O_ID OL_I_ID OL_QTY OL_DISCOUNT OL_COMMENTS OL_C_ID I_ID I_TITLE I_A_ID I_PUB_DATE I_PUBLISHER I_SUBJECT I_DESC I_RELATED1 I_RELATED2 I_RELATED3 I_RELATED4 I_RELATED5 I_THUMBNAIL I_IMAGE I_SRP I_COST I_AVAIL I_ISBN I_PAGE I_BACKING I_DIMENASIONS STOCK ST_I_ID ST_STOCK SC_ID SC_C_ID SC_DATE SC_SUB_TOTAL SC_TAX SC_SHIPPING_COST SC_TOTAL SC_C_FNAME SC_C_LNAME SC_C>DISCOUNT Distributed with Shopping Cart ID SHOPPING_CART_LINE SCL_SC_ID SCL_I_ID SCL_QTY SCL_COST SCL_SRP SCL_TITLE SCL_BACKING SCL_C_ID Distributed with ItemID Copyright © 2015 NTT DATA INTELLILINK Corporation 10
11. Scale Out Approach (1): Table Distribution/Replication Categorize tables into two groups: Large and frequently-updated tables → Distribute rows among nodes (Distributed Tables) → Based on a column value (distribution key) → Hash, modulo or round-robin → Parallelism among transactions (OLTP) or in SQL processing (OLAP) Smaller and stable tables → Replicate among nodes (Replicated Tables) → Join Pushdown Avoid joins between Distributed Tables with join keys different from distribution key as possible. Copyright © 2015 NTT DATA INTELLILINK Corporation 11
12. Scale Out Table Design in DBT-1 Three distribution keys: ● Customer ID ● Shopping Cart ID ● Item ID Some transactions involve joins across distributed tables with non-distribution join keys. Copyright © 2015 NTT DATA INTELLILINK Corporation 12
13. Some More in XL/XC Node Configuration Copyright © 2015 NTT DATA INTELLILINK Corporation 13
14. Node Configuration: Two-Tier Approach Coordinator: ● Holds catalog information ● Build global SQL plan ● Tell Datanode what to do in SQL statements Datanode ● Holds actual data ● Run local SQL statement from Coordiator (In XL, datanode may ask other datanodes for their local data) Copyright © 2015 NTT DATA INTELLILINK Corporation 14
15. Coordinator and Datanode Read/Write Transactions Coordinator Datanode Copyright © 2015 NTT DATA INTELLILINK Corporation 15
16. Node Configuration: Yet Another Node: GTM GTM: Global Transaction Manager Synchronizes each node's transaction status Copyright © 2015 NTT DATA INTELLILINK Corporation 16
17. Why GTM? Two-Phase Commit Protocol doesn't work? Two-Phase Commit Protocol Does: ● Maintain database consistency in transactions updating more than one node. Two-Phase Commit Protocol Doesn't: ● Maintain Atomic Visibility of Updates to other transactions (next slide) Copyright © 2015 NTT DATA INTELLILINK Corporation 17
18. Atomic Visibility and GTM TXN 1 Updates A and B Node A Prepares A and B Commits A and B Node B Inconsistent Read! TXN 2 Reads B and gets old value Reads A and gets new value GTM monitors TXN activity and make new value available at this timing. Copyright © 2015 NTT DATA INTELLILINK Corporation 18
19. Final Configuration: GTM, Coordinator and Datanode Read/Write Transactions Coordinator GTM Datanode Copyright © 2015 NTT DATA INTELLILINK Corporation 19
20. Configuration in Practice Just like configuring many database servers to talk each other ● Many pitfalls ● Pgxc_ctl provides simpler way to configure the whole cluster ● Provide only needed parameters ● Pgxc_ctl will do the rest to issue needed commands and SQL statements. – Visit http://sourceforge.net/p/postgres-xc/xc-wiki/PGOpen2013_Postgres_Open_2013/ Copyright © 2015 NTT DATA INTELLILINK Corporation 20
21. Scalability in OLTP Workloads Copyright © 2015 NTT DATA INTELLILINK Corporation 21
22. OLTP Workload Characteristics Number of Transactions: Many Number of Involved Table Rows: Small Locality of Row Allocation: High Update Frequency: High Copyright © 2015 NTT DATA INTELLILINK Corporation 22
23. Scaling Out OLTP Workload Read/Write Transactions Run Transactions in Parallel Coordinator GTM High workload Datanode Copyright © 2015 NTT DATA INTELLILINK Corporation 23
24. Scalability in OLAP (Analytic) Workloads Copyright © 2015 NTT DATA INTELLILINK Corporation 24
25. OLAP Workload Characteristics Number of Transactions: Small Number of Involved Table Rows: Huge Locality of Row Allocation: Low Update Frequency: Low Copyright © 2015 NTT DATA INTELLILINK Corporation 25
26. Scaling Out OLAP Workload Coordinator GTM SQL Top level aggregation May need less coordinators Low workload Datanode Copyright © 2015 NTT DATA INTELLILINK Corporation Run Small Local SQLs for each Datanode in Parallel 26
27. Join Offloading Copyright © 2015 NTT DATA INTELLILINK Corporation 27
28. Join Offloading: When row allocation is available ● Replicated Table and Partitioned Table – Can determine which datanode to go from WHERE clause Copyright © 2015 NTT DATA INTELLILINK Corporation 28
29. Join Offloading: When row allocation is available ● Replicated Table and Partitioned Table – When the coordinator cannot determine which datanode to go from WHERE clause Copyright © 2015 NTT DATA INTELLILINK Corporation 29
30. Parallel Aggregation Copyright © 2015 NTT DATA INTELLILINK Corporation 30
31. Aggregate Functions in PostgreSQL Finalize Function State Transition Function Copyright © 2015 NTT DATA INTELLILINK Corporation 31
32. Aggregate Functions in Postgres-XC/XL Finalize Function ← AVG (Sum, Count) Collector Function Coordinator (Sum, Count) StSatStFaetutFaeTnutFerTcnautrTcniaotrscninaoitstninioiostniniotinon Datanode Similar to Map Reduce! Copyright © 2015 NTT DATA INTELLILINK Corporation 32
33. Community status and future Copyright © 2015 NTT DATA INTELLILINK Corporation 33
34. XC and XL community ● Postgres-XC is the original community – Based upon PostgreSQL 9.3 – Tested more for OLPT workload ● Postgres-XL was became separate community for more product-oriented and better stability – Based upon PostgreSQL 9.2 – Shares most of XC code base – Tested more for OLAP workload ● Direct data capture between datanodes – Provide many fixes. Most of them apply to XL as well ● Unified again? Copyright © 2015 NTT DATA INTELLILINK Corporation 34
35. Product status ● Source code inherits all the PostgreSQL repository (at some point) ● Fundamental features are all available – Global transaction management – SQL statements – Utilities ● Further challenges – Subtransaction (needed for full function support) – Catching up PostgreSQL (needed?) – Copyright © 2015 NTT DATA INTELLILINK Corporation 35
36. XC and XL community ● Both communities need much more resource to move forward – Developer – Tester – Real workload Copyright © 2015 NTT DATA INTELLILINK Corporation 36
37. XC and XL community sites Postgres-XC http://sourceforge.net/projects/postgres-xc/ koichi.dbms@gmail.com Postgres-XL http://www.postgres-xl.org/ Copyright © 2015 NTT DATA INTELLILINK Corporation 37
38. Copyright © 2015 NTT DATA INTELLILINK Corporation