Large-Scale Real-Time Graph Computing in PayPal's Risk Management System, Zhang Pengshan (张彭善) (2)

Razor

Published 2019/10/19 in the Technology category

Text content
1. Large Scale Online Graph on Aerospike at PayPal Risk
3. Agenda
• PayPal Risk & Online Graph Applications
• Online Graph Platform
• Online Graph Linking Cases
• Aerospike at PayPal
©2019 PayPal Inc. Confidential and proprietary.
5. PayPal Risk: Building Trust in a New World
Industry trends are redefining the way PayPal builds trust between buyers and sellers.
• Trust & Protection: CHIEF RISK OFFICER = CHIEF TRUST OFFICER
• 500M to 1B identities stolen globally; $32M in U.S. retail fraud losses (1)
• Financial Risk, Security, Compliance
Sources: (1) Nielsen, Dept of Commerce, JP Morgan 2015
6. Data Analytical Solution of Risk Fraud Detection
• Models: different kinds of models are adopted for different fraud cases (linear/logistic regression, neural networks, tree ensemble models, ensemble/embedding models).
• Strategies: tree-based rules built on machine learning model scores.
• Rules: cover fraud trends that the models cannot reflect in time.
• Graph: adds a new dimension for models, strategies, rules, and agents.
• Data sources: Historical Behavior Data, Streaming Behavior Data, Graph Linking Behavior Data.
7. Real Time Graph Risk Opportunities - Repeated Offenders
Linking is an efficient way to identify repeated offenders: it finds accounts that share the same private assets (e.g., a bank account) with bad accounts that have previously offended on our platform.
Southeast Asia sample case (Bank Account xx7846):
• 7 bad seller accounts sell counterfeit shoes on websites.
• The 7 bad sellers share the same MY bank account.
• Buyers file INR/SNAD claims, and the sellers are identified as bad sellers.
• An unknown MY account that adds this bank account on 2018/11/21 is suspicious.
• Bad dates of the linked accounts: 2017/07/09, 2017/08/09, 2017/12/18, 2018/01/11, 2018/01/16, 2018/01/31, 2018/02/07.
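The linking idea on this slide can be sketched in a few lines of Python. This is only an illustration, not PayPal's implementation: the function name `accounts_linked_to_bad`, the account names, and the second bank account are hypothetical, and the data is a toy version of the sample case.

```python
from collections import defaultdict

def accounts_linked_to_bad(links, bad_accounts):
    """Return {account: shared assets} for non-bad accounts that share
    an asset with at least one known bad account."""
    by_asset = defaultdict(set)
    for account, asset in links:
        by_asset[asset].add(account)
    flagged = defaultdict(set)
    for asset, accounts in by_asset.items():
        if accounts & bad_accounts:            # asset already used by a bad account
            for account in accounts - bad_accounts:
                flagged[account].add(asset)
    return dict(flagged)

# Toy data: 7 bad sellers and one new account share bank account xx7846.
links = [(f"seller{i}", "bank:xx7846") for i in range(1, 8)]
links += [("new_my_account", "bank:xx7846"), ("other", "bank:xx0001")]
bad = {f"seller{i}" for i in range(1, 8)}
result = accounts_linked_to_bad(links, bad)
print(result)
```

Only the new account that added the shared bank account is flagged; the unrelated account is not.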
8. Real Time Graph Risk Opportunities - Fast Hit & Run
Graph real-time features; bad IT account; online solution available.
Timeline events: create account (DOF=0) with bad cookie linked; receive $680; balance sent $680. Timestamps shown: 2019/01/04 16:54, 2019/01/04 20:50, 2019/01/06 23:37.
1. A new account is linked to a bad account by cookie on 2019/01/04.
2. Fraudulent transactions later come from this new account.
3. With online graph linking features, a new rule detected these fraudulent transactions on 2019/01/06, when the money was about to exit.
10. Overview of PayPal Risk Real Time Graph
• Vertex types: Account, IP, Cookie, XXX, YYY
• Vertex properties (Account): ID, creation_time, last_modified_time, last_name, account_status, …
• Edge types: Account -> IP, …
• Edge properties: src_id_dst_id, creation_time, last_modified_time, link_count, …
11. Dynamic Evolving Graph & Its Vertex Centric Storage Model
Schema:
• Vertex1, Vertex2, …: id, property1, property2, …
• Edge1, Edge2, …: id, property1, property2, …
Storage Model:
• Account Vertex: Key: Account ID; Column1: Properties; Column2: Edge Account -> IP; Column3: Edge Account -> Cookie; Column4: Edge Account -> XXX
• IP Vertex: Key: IP; Column1: Properties; Column2: Edge IP <- Account
• Cookie Vertex: Key: Cookie ID; Column1: Properties; Column2: Edge Cookie <- Account
• XXX Vertex: Key: XXX ID; Column1: Properties; Column2: Edge XXX <- Account
Dynamic Evolving Graph:
• Nov 2018: initial loading of Account and IP vertices & edges
• March 2019: loading new XXX/YYY vertices & edges
• May 2019, July 2019, Sep 2019: NRT updating; new properties in Account & IP; loading new properties in Account, new vertex, …
* Graph Write Policy is merge-update to support new vertices/edges.
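A minimal sketch of the vertex-centric layout and the merge-update write policy, using a plain Python dict in place of the key-value store. The function `merge_update`, the key tuples, and the sample values are assumptions for illustration; the slide does not show the real record format.

```python
def merge_update(store, key, properties=None, edges=None):
    """Merge new properties and edge columns into a vertex record
    without dropping what is already stored."""
    record = store.setdefault(key, {"properties": {}})
    record["properties"].update(properties or {})
    for column, neighbours in (edges or {}).items():
        record.setdefault(column, {}).update(neighbours)
    return record

store = {}
# Initial load: an Account vertex with one edge column, and the reverse edge on the IP vertex.
merge_update(store, ("Account", "A1"), {"account_status": "open"},
             {"Account->IP": {"1.2.3.4": {"link_count": 1}}})
merge_update(store, ("IP", "1.2.3.4"), {},
             {"IP<-Account": {"A1": {"link_count": 1}}})
# Later: a new property and a new edge type arrive; existing columns are kept.
merge_update(store, ("Account", "A1"), {"last_name": "YYY"},
             {"Account->Cookie": {"c-9": {"link_count": 1}}})
print(store[("Account", "A1")])
```

Because each write merges into the existing record, a new vertex or edge type only adds a column and does not require rewriting the rest of the vertex.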
12. Offline Batch Graph Loading & Near Real Time Graph Updates
Offline path:
• Offline Generic Graph -> Offline-to-online Table (Hive) -> Hourly/Daily/… Publish Job -> Kafka -> Graph Load Daemon -> Aerospike
• Graph Load Daemon steps: 1. Schema & Data Parsing; 2. Vertex Creation or Update; 3. Edge Creation or Update; 4. WAL Dump
• Supporting jobs: Periodical Graph Compute Job, Offline Graph Backfill Job, Data Repair By Offline Data
Near real time path:
• Events (Signup, Login, Add Bank, Status Change, Transaction) -> Near Real Time Kafka -> Graph Process Daemon -> Aerospike
• Graph Process Daemon steps: 1. Event VO Mapping; 2. Vertex Creation or Update; 3. Edge Creation or Update; 4. WAL Dump
Recovery:
• Failed write operations are handled by the Graph Data Recovery Job.
• NRT & Offline loading merge update by same timestamp.
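One possible reading of "merge update by same timestamp" plus the failed-write recovery path is sketched below. It assumes a per-record timestamp where a stale update is ignored, and it logs only failed writes to the WAL; both are simplifications chosen for illustration, and every name (`merge_write`, `apply_update`, `replay`, `failing_write`) is hypothetical.

```python
def merge_write(store, update):
    """Merge an update into the store unless it is older than what is stored."""
    current = store.get(update["key"])
    if current is not None and update["ts"] < current["ts"]:
        return  # stale update: keep the newer stored state
    props = dict(current["props"]) if current else {}
    props.update(update["props"])
    store[update["key"]] = {"ts": update["ts"], "props": props}

def apply_update(store, wal, update, write=merge_write):
    """Try the write; on failure keep the update in the WAL for recovery."""
    try:
        write(store, update)
    except OSError:
        wal.append(update)

def replay(store, wal):
    """Recovery job: re-apply everything that previously failed."""
    pending = list(wal)
    wal.clear()
    for update in pending:
        apply_update(store, wal, update)

def failing_write(store, update):
    raise OSError("simulated storage timeout")

store, wal = {}, []
apply_update(store, wal, {"key": "A1", "ts": 200, "props": {"status": "restricted"}})  # NRT event
apply_update(store, wal, {"key": "A1", "ts": 100, "props": {"status": "open"}})        # older offline row
apply_update(store, wal, {"key": "A2", "ts": 50, "props": {"status": "open"}}, write=failing_write)
failed_before_replay = len(wal)
replay(store, wal)
print(store, wal)
```

With this policy the order in which the offline and near-real-time paths deliver an update does not change the final record, which is what lets both paths write to the same store.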
13. Gremlin-Based Traversal Queries (Query Engine)
Gremlin is a powerful, scalable graph DSL from http://tinkerpop.apache.org
Example query:
    g.V().hasLabel('IP').has('ip', '192.168.1.1').in('IP2Account').out('Cookie2Account').count()
Result is 5; the logic counts the cookies at the 2nd hop from the IP.
Steps: MilkywayGraphStep, MilkywayEdgeStep, MilkywayVertexStep, CountStep
Query Optimizer: Strategies, Schema Parsing, AsyncVertexEdgeLoader, Aerospike
1. Query Aerospike using the ID as the key.
2. Populate the result as a Vertex or Edge with properties by parsing the schema.
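To make the two-hop logic concrete, here is a small in-memory Python equivalent of the traversal (IP to accounts, accounts to cookies, then count). The adjacency dictionaries and the helper `count_two_hop` are hypothetical stand-ins for key lookups against the store; the toy data is chosen so that the count is 5, as on the slide.

```python
def count_two_hop(in_edges, out_edges, ip):
    """Count cookies reachable via IP <- Account -> Cookie.
    Like Gremlin's count(), this counts paths, so a cookie reached
    through two accounts is counted twice."""
    accounts = in_edges.get(("IP2Account", ip), [])
    return sum(len(out_edges.get(("Cookie2Account", a), [])) for a in accounts)

# Toy graph: one IP used by two accounts, which hold 3 and 2 cookies.
in_edges = {("IP2Account", "192.168.1.1"): ["acct1", "acct2"]}
out_edges = {
    ("Cookie2Account", "acct1"): ["c1", "c2", "c3"],
    ("Cookie2Account", "acct2"): ["c4", "c5"],
}
count = count_two_hop(in_edges, out_edges, "192.168.1.1")
print(count)
```

Each hop is a lookup by key, which matches the "By ID as Key" note on the slide.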
14. Gremlin Query Performance Optimizations
• Query in Parallel: linked vertices are loaded from the DB in parallel.
• Seed Query with Larger Timeout: the seed query timeout is 1.5x that of the other queries (30ms vs. 20ms).
• Request Level Cache: vertices and edges are cached in local memory for the duration of a request, so repeated reads are served from the cache.
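The three optimizations can be sketched together with `asyncio`. This is an assumption-laden illustration, not the platform's Java/Netty code: `load`, `fetch`, `two_hop`, and the dict-based `db` are hypothetical, and `asyncio.sleep(0)` stands in for a non-blocking database read.

```python
import asyncio

HOP_TIMEOUT = 0.020                 # 20ms for linked vertices
SEED_TIMEOUT = 1.5 * HOP_TIMEOUT    # seed query gets 1.5x the timeout

async def load(db, key):
    db["reads"] += 1
    await asyncio.sleep(0)          # stand-in for an async DB read
    return db["records"].get(key, [])

async def fetch(db, cache, key, timeout):
    # Request-level cache: one in-flight task per key, shared by all callers.
    if key not in cache:
        cache[key] = asyncio.create_task(load(db, key))
    # shield() keeps a timeout in one caller from cancelling the shared task.
    return await asyncio.wait_for(asyncio.shield(cache[key]), timeout)

async def two_hop(db, seed):
    cache = {}                      # lives only for this request
    first = await fetch(db, cache, seed, SEED_TIMEOUT)
    # Linked vertices are queried in parallel.
    hops = await asyncio.gather(*(fetch(db, cache, k, HOP_TIMEOUT) for k in first))
    return [n for hop in hops for n in hop]

db = {"reads": 0, "records": {
    "ip:1": ["acct:1", "acct:2", "acct:1"],
    "acct:1": ["cookie:a"],
    "acct:2": ["cookie:b"],
}}
result = asyncio.run(two_hop(db, "ip:1"))
print(result, db["reads"])
```

The repeated `acct:1` lookup is served from the request cache, so only three reads reach the store for four fetches.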
15. Overview of Online Graph Platform
• Solutions: Risk Fraud Detection, Compliance Investigation, Compliance Detection, Credit, Marketing
• Platform: Graph Visualization, Gremlin API, Risk TM Tools, API Gateway, Graph Query Service
• Online: Real-time Events, Online Event Consumer, Graph Process Engine, Graph Query Engine, Graph Storage Engine, Graph Persist Engine, Real-time Graph DB, KV Storage, Graph WAL logs
• Offline: Offline Event Consumer, Graph Query Engine, Graph Persist Engine, Milkyway Graph DB, Graph WAL logs, Hadoop/Hive
• Shared: Unified Graph Data Model, Offline-to-Online Data Pipeline
• Data flows labeled in the diagram: Query, Write, Update
17. Typical Real Time Graph Traversal Queries
Bad Account Rate from IP:
    g.V().hasLabel('IP')
      .has('ip', '192.168.1.1')
      .in('IP2Account').fold()
      .project('totalCount', 'badCount')
      .by(unfold().count())
      .by(unfold().has('badTag', 1).count())
      .math('badCount / totalCount')
Max Bad Account Rate from Account->IP->Account:
    g.V().hasLabel('Account')
      .has('CustID', 123).in('IP2Account')
      .map(local(
        out('IP2Account').fold()
        .project('totalCount', 'badCount')
        .by(unfold().count())
        .by(unfold().has('badTag', 1).count())))
      .math('badCount / totalCount').inject(-1.0).max()
# of Permanent Cookies Linked to One IP:
    g.V().hasLabel('IP').has('ip', '192.168.1.1')
      .in('IP2Account')
      .out('Cookie2Account')
      .filter { it.get().value('type').equals('permanent') }
      .count()
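The arithmetic behind the first two queries is easy to check in plain Python. The sketch below assumes the linked accounts have already been fetched as dictionaries with a `badTag` field; the function names and the sample data are hypothetical.

```python
def bad_account_rate(accounts):
    """badCount / totalCount for the accounts linked to one IP."""
    total = len(accounts)
    if total == 0:
        return None                       # no linked accounts, no rate
    bad = sum(1 for a in accounts if a.get("badTag") == 1)
    return bad / total

def max_bad_account_rate(ips, accounts_by_ip):
    """Highest per-IP bad rate across the IPs of one account.
    Starts from -1.0, mirroring inject(-1.0).max() in the query."""
    rates = [-1.0]
    for ip in ips:
        rate = bad_account_rate(accounts_by_ip.get(ip, []))
        if rate is not None:
            rates.append(rate)
    return max(rates)

accounts_by_ip = {
    "ip1": [{"badTag": 1}, {"badTag": 0}, {"badTag": 0}, {"badTag": 0}],
    "ip2": [{"badTag": 1}, {"badTag": 0}],
}
rate_ip1 = bad_account_rate(accounts_by_ip["ip1"])
max_rate = max_bad_account_rate(["ip1", "ip2"], accounts_by_ip)
no_ip_rate = max_bad_account_rate([], accounts_by_ip)
print(rate_ip1, max_rate, no_ip_rate)
```

The -1.0 default gives downstream rules a sentinel value when an account has no linked IPs, instead of an empty result.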
18. Real Time Subgraph Query for Complicated Linking Computation
Subgraph Query (linked IP, Cookie, XXX, YYY vertices):
    g.V().hasLabel('Account')
      .has('CustID', 123)
      .in().store('sg')
      .out().store('sg')
      .cap('sg').subgraph()
Real Time Clustering on Subgraph:
    g.V().hasLabel('Account')
      .has('CustID', 123)
      .out('txn').store('sg')
      .out('txn').store('sg')
      .cap('sg').subgraph()
      .clustering('algorithm1')
19. Real Time User-Defined Vertex to Scale Linking
1. A user-defined vertex (UDV) can be defined dynamically (e.g., UDV1, UDV2, UDV3, UDV4).
2. A user-defined vertex can be defined as a combination such as property 1 + property 2 + prefix of property 3.
3. Examples: 1) UDV1 = f1(x, y); 2) UDV2 = f2(x, y); 3) …
* A user-defined vertex should be defined carefully to avoid hot spots in the graph; the extreme cases are one->all and one->one.
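A user-defined vertex of the "property 1 + property 2 + prefix of property 3" form, together with a simple degree check for hot spots, might look like the sketch below. The helper names `udv_key` and `hot_spots`, the `|` separator, the prefix length, and the sample records are all assumptions for illustration.

```python
from collections import Counter

def udv_key(record, fields, prefix_field=None, prefix_len=0):
    """Build a UDV id from whole properties plus an optional property prefix."""
    parts = [str(record[f]) for f in fields]
    if prefix_field is not None:
        parts.append(str(record[prefix_field])[:prefix_len])
    return "|".join(parts)

def hot_spots(records, key_fn, max_degree):
    """Return UDV ids that would link more than max_degree records."""
    counts = Counter(key_fn(r) for r in records)
    return {key: n for key, n in counts.items() if n > max_degree}

records = [
    {"property1": "smith", "property2": "94105", "property3": "10.1.2.33"},
    {"property1": "smith", "property2": "94105", "property3": "10.1.2.78"},
    {"property1": "lee", "property2": "10001", "property3": "172.16.0.5"},
]

def key_fn(record):
    return udv_key(record, ["property1", "property2"], "property3", 7)

keys = [key_fn(r) for r in records]
crowded = hot_spots(records, key_fn, max_degree=1)
print(keys, crowded)
```

Running a degree check like this on sample data before enabling a definition is one way to spot keys that are too coarse (one->all) or too fine (one->one).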
20. Performance, Scalability, Availability
• 50 billion vertices/edges
• 20ms average latency
• 110ms P99 latency
• 99.9% availability
How?
22. Aerospike at PayPal: Storage Growth Trend
[Chart: Data Growth (TB) by year, 2011 to 2019; series: In-Memory Caching product, Hybrid Memory Aerospike; labeled values: 0.5 (2011), 3 (2012), then 12, 18, 30, 90, 210, 450, 650]
23. Why Aerospike as Real-time Graph Storage
o High density storage with Hybrid Memory Architecture
o Linear horizontal scale – CPU, Memory, Disk
o Highly available, shared nothing architecture, XDR replication
o Persistent Memory/Shared Memory support for fast DB restarts/OS reboots
o Flexible data model
o Async non-blocking IO support for Java client (Netty)
24. Aerospike Anatomy
Server:
• NoSQL KV database
• Written in C
• AP (Eventual) and CP (Strong) modes
• In-Memory or Hybrid-Memory modes
• Uses Linux Shared Mem or Persistent Memory for quick restarts
• Low disk write amplification (up to 2)
• SSD optimized - block storage
• UDF for server side computes
Storage:
• Designed for SSDs (even wear and tear on device)
• Proprietary file system
• Key, Value
• Hybrid Storage – predictable capacity (Key=64Bytes, Value=value, RIPEMD-160 Hash)
Client:
• C, C++, Java, Go, C#, NodeJS, PHP, Python, Ruby, Perl, Rust
Console:
• AMC
25. Storage Architecture
Three architectures are compared, each with a client, database memory, and database disk:
• In-Memory Cache/DB: read path with cache hit; read path with cache miss
• Hybrid-Memory: read path
• Memory-First: read path; async/sync persistence to database disk
Latency and throughput labels from the diagram: Latency - Ultra Low, Throughput - Consistent; Latency - Varying, Throughput - Varying; Latency - High, Throughput - Low or Consistent
26. Performance, Cost, High Storage Density, Space/Power, Availability
Before: In-Memory Cache (50TB)
• 18 racks; # of servers = 1024
• Total cost = $12.5m
• Inconsistent performance
• Average throughput: 200K TPS
• Low latency (~1ms avg)
• 99.5 ATB
After: Aerospike (Hybrid Memory NoSQL) (50TB)
• 3 racks; # of servers = 120
• Total cost = $1.8m
• Consistent performance
• Average throughput: 2M TPS
• Ultra low latency (~200us avg)
• 99.99+ ATB
Improvement factors shown on the slide: 5x, 8x, 10x
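As a quick check, the before/after figures on this slide can be turned into ratios. The dictionary keys below are illustrative names, and the slide does not say which dimension each of its multipliers belongs to, so no such mapping is assumed here.

```python
before = {"servers": 1024, "racks": 18, "cost_musd": 12.5,
          "tps": 200_000, "latency_us": 1000}
after = {"servers": 120, "racks": 3, "cost_musd": 1.8,
         "tps": 2_000_000, "latency_us": 200}

ratios = {
    "servers": before["servers"] / after["servers"],        # fewer servers
    "racks": before["racks"] / after["racks"],              # fewer racks
    "cost": before["cost_musd"] / after["cost_musd"],       # lower total cost
    "throughput": after["tps"] / before["tps"],             # higher throughput
    "latency": before["latency_us"] / after["latency_us"],  # lower latency
}
for name, value in ratios.items():
    print(f"{name}: {value:.1f}x")
```

The server, throughput, and latency ratios come out near 8.5, 10, and 5, which is in line with the rounded multipliers on the slide.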
27. Linear Scale – CPU, Memory
Node: 40 cores, 384GB memory available, 1.92TB x 1 SATA RI-SSD available
• Load 250M, Max Write = 200K TPS: CPU 5%; memory 15GB; disk 268GB; 100% utilization
• Load 1B, Max Write = 200K TPS: CPU 5%; memory 60GB; disk 1TB; 100% utilization
28. Linear Scale – Disk
Node: 40 cores, 384GB memory available
• Load 1B, Max Write = 400K TPS: CPU 10%; memory 60GB; 100% utilization
• NVMe: 500GB on each of two devices labeled 1.92TB x 1 NVMe RI-SSD
29. Highly Available, Shared Nothing, XDR
• Multi-DC deployment topology
• High speed replication with XDR
• Data availability
30. Indexes on DRAM or Persistent Memory
• DB restarts: < 1 min (index in shared memory); < 1 min (index in persistent memory)
• OS reboots: < 40 min (index in shared memory); < 1 min (index in persistent memory)
• Persistent memory: NVDIMM-P
31. Flexible Schema, Async Event Loop Support
• Hierarchy: DB > Namespace > Set > key > bins (e.g., Namespace1 > Set1 > key1 > bin1, bin2, bin3)
• Strongly typed
• Rows contain cells (bins) with strings, integers, blobs, lists, maps, and serialized objects
• Async programming interfaces: single record, batch, scan, query
33. Thanks!