JD.com, Wu Guoxiao: Spark With Cloud Native JVM Profiling

CodeWarrior

Published 2019/07/08 in the Programming category

GIAC2019 

Text content
2. About Me • wuguoxiao@jd.com • Spark • 15 & TPS
3. Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling
4. Apache Spark at JD • 100,000+ Spark workloads • 100,000+ Hive workloads • [Chart: Spark App Scale, applications (x 10000), quarterly from 2018 Q1 to 2019 Q1]
5. Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling
6. JVM • JVM • JVM • JVM
7. JVM • High CPU / Hot Method
8. JVM • Memory Leak ✓ Live Set ✓ Used Heap GC Pause
9. JVM • Allocations ✓ Allocations
10. Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling
11. Cloud Native JVM Profiling • [Diagram: Spark Runtime with a Driver Program and an Executor Program, each on JD-JVM with a javaagent; Troubleshooting Service; JVM Profiling Web]
12. Cloud Native JVM Profiling • [Diagram: Kubernetes; Profiling Service; K8s API Server; Troubleshooting Service; JVM Profiling Web; Pod with JD-JVM]
13. Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling
14. JDK Flight Recorder
15. JDK Flight Recorder • 7u4 Oracle JDK • JVM 3% • Ad-hoc • Always-running • “after-the-fact”
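As an illustration of the ad-hoc recording mode named on this slide, here is a minimal sketch that starts a short Flight Recorder session from inside the JVM and dumps it to a file. It assumes a JDK where the `jdk.jfr` API is available (JDK 11+, or 8u262+), which is newer than the Oracle JDK 7u4 mentioned above. The class name `JfrSketch`, the recording name, the chosen events, and the 20 ms sampling period are illustrative choices and are not taken from the talk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

public class JfrSketch {
    /** Records for roughly the given duration and returns the path of the dumped .jfr file. */
    public static Path recordFor(Duration duration) {
        try (Recording recording = new Recording()) {
            // Hypothetical output location; a real agent would choose its own path.
            Path out = Files.createTempDirectory("jfr-sketch").resolve("adhoc.jfr");
            recording.setName("adhoc-sketch");
            // Two standard JDK events: method sampling (hot methods) and GC activity.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
            recording.enable("jdk.GarbageCollection");
            recording.start();
            Thread.sleep(duration.toMillis());
            recording.stop();
            recording.dump(out);
            return out;
        } catch (IOException e) {
            throw new IllegalStateException("Could not write the recording", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while recording", e);
        }
    }

    public static void main(String[] args) {
        Path file = recordFor(Duration.ofSeconds(2));
        System.out.println("Wrote " + file + " (" + file.toFile().length() + " bytes)");
    }
}
```

The resulting file can be opened in JDK Mission Control. An always-running setup would instead keep a recording active and dump it only after a problem is observed.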
16. JDK Flight Recorder • Application • JVM Internal • OS
17. JDK Flight Recorder (layers: Application / JVM Internal / OS) • High JVM CPU Load • Method Profiling • Free Memory • GC Pressure • File I/O • Socket I/O • Thrown Exceptions/Errors
18. JDK Flight Recorder (layers: Application / JVM Internal / OS) • Fatal Error • GC Pauses • Metaspace Live Set Trend • TLAB Allocation Ratio
19. JDK Flight Recorder (layers: Application / JVM Internal / OS) • Competing CPU Usage • Competing Processes • Passwords in Environment • Passwords in Properties
20. Container Awareness • Backport to JD-JDK • Container.metrics() • thread thread gc jit compile
21. AppCDS • Spark SQL workload startup footprint container /day • [Chart: startup, Normal vs. With AppCDS, y-axis from 0 to 0.45]
22. [Diagram: Kubernetes; Spark, Flink, Storm; K8s API Server; Profiling Service; Binlog Sync]
23. Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling
24. Hot Method • Driver Hang 6+ • Spark • JVM Profiling
25. Hot Method
26. Hot Method • LZO
27. Allocations • Spark Job Failed • Mem • Container Killed • RSS
28. Allocations • map outputs Reduce Task Buffer 2 3
29. Native Memory Tracking • Task • kill • NMT • Cloud Native JVM Profiling NMT
30. Native Memory Tracking • NMT: -XX:NativeMemoryTracking=[summary | detail] • jcmd <pid> VM.native_memory baseline • jcmd <pid> VM.native_memory detail.diff • Spark • javaagent / NMT
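The slide pairs the jcmd commands with a javaagent. One way code running inside the JVM could issue the same commands is HotSpot's DiagnosticCommand MBean, sketched below. This is an assumption about how such an integration might look, not the speaker's implementation. The class name `NmtInProcess` is hypothetical, and the call only works on HotSpot-based JVMs that expose this MBean. If the JVM was started without -XX:NativeMemoryTracking, the command returns a message saying tracking is not enabled.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class NmtInProcess {
    /**
     * Runs the equivalent of "jcmd <pid> VM.native_memory <args>" inside the current JVM
     * and returns the textual report.
     */
    public static String nativeMemory(String... args) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.sun.management:type=DiagnosticCommand");
            // The MBean operation takes a single String[] holding the command arguments.
            Object result = server.invoke(
                    name,
                    "vmNativeMemory",
                    new Object[] { args },
                    new String[] { String[].class.getName() });
            return String.valueOf(result);
        } catch (Exception e) {
            throw new IllegalStateException("VM.native_memory is not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        // Take a baseline first, then ask for the difference against it later.
        System.out.println(nativeMemory("baseline"));
        System.out.println(nativeMemory("summary.diff"));
    }
}
```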
31. Native Memory Tracking

Native Memory Tracking:

Total: reserved=28483532KB +9683KB, committed=27185624KB +9683KB   <--- total memory changes vs. earlier baseline

-  Java Heap (reserved=25165824KB, committed=25165824KB)
             (mmap: reserved=25165824KB, committed=25165824KB)   <--- Java Heap

-  Class (reserved=1107296KB, committed=65888KB)   <--- class metadata
         (classes #9815)   <--- number of loaded classes
         (malloc=1376KB #15226)   <--- malloc'd memory, #number of malloc
         (mmap: reserved=1105920KB, committed=64512KB)

-  Thread (reserved=104593KB, committed=104593KB)
          (thread #102)   <--- number of threads
          (stack: reserved=103668KB, committed=103668KB)   <--- memory used by thread stacks
          (malloc=326KB #519)
          (arena=599KB #203)   <--- resource and handle areas

-  Code (reserved=256427KB, committed=49079KB)
        (malloc=6827KB #11197)
        (mmap: reserved=249600KB, committed=42252KB)

-  GC (reserved=1034218KB +54KB, committed=1034218KB +54KB)
      (malloc=67562KB +54KB #30687)
      (mmap: reserved=966656KB, committed=966656KB)

-  Compiler (reserved=300KB, committed=300KB)
            (malloc=169KB #410)
            (arena=131KB #3)

-  Internal (reserved=745258KB +8330KB, committed=745258KB +8330KB)
            (malloc=745226KB +8330KB #17436 +976)
            (mmap: reserved=32KB, committed=32KB)

-  Symbol (reserved=15547KB, committed=15547KB)
          (malloc=14068KB #125300)
          (arena=1479KB #1)

-  Native Memory Tracking (reserved=3483KB +84KB, committed=3483KB +84KB)
                          (malloc=275KB +56KB #4257 +821)
                          (tracking overhead=3209KB +29KB)

-  Arena Chunk (reserved=1435KB +1215KB, committed=1435KB +1215KB)
               (malloc=1435KB +1215KB)

-  Unknown (reserved=49152KB, committed=0KB)
           (mmap: reserved=49152KB, committed=0KB)
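To make a diff report like the one on this slide easier to scan, the sketch below extracts the committed-memory change per category and keeps only the categories that moved. In the report above, Internal accounts for +8330KB of the +9683KB total. The class name `NmtDiffParser` and the regular expression are illustrative assumptions based on the line shapes shown on the slide. NMT output varies between JDK versions, so treat this as a starting point rather than a complete parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NmtDiffParser {
    // Matches a category header such as
    // "- Internal (reserved=745258KB +8330KB, committed=745258KB +8330KB)".
    // Groups: 1 = category name, 2 = reserved delta, 3 = committed size, 4 = committed delta.
    private static final Pattern CATEGORY = Pattern.compile(
            "^-\\s*([A-Za-z ]+?)\\s*\\(reserved=\\d+KB(?: ([+-]\\d+)KB)?, "
            + "committed=(\\d+)KB(?: ([+-]\\d+)KB)?\\)");

    /** Returns category name -> committed delta in KB, only for categories that changed. */
    public static Map<String, Long> committedDeltas(String report) {
        Map<String, Long> deltas = new LinkedHashMap<>();
        for (String line : report.split("\\R")) {
            Matcher m = CATEGORY.matcher(line.trim());
            // Categories without a "+NKB" / "-NKB" suffix did not change since the baseline.
            if (m.find() && m.group(4) != null) {
                deltas.put(m.group(1), Long.parseLong(m.group(4)));
            }
        }
        return deltas;
    }

    public static void main(String[] args) {
        String sample =
                "-  GC (reserved=1034218KB +54KB, committed=1034218KB +54KB)\n"
              + "-  Internal (reserved=745258KB +8330KB, committed=745258KB +8330KB)\n"
              + "-  Symbol (reserved=15547KB, committed=15547KB)\n";
        committedDeltas(sample).forEach(
                (category, deltaKb) -> System.out.println(category + ": " + deltaKb + "KB"));
    }
}
```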
32. Native Memory Tracking • NMT • GC Heap • container overhead • https://shipilev.net/jvm/anatomy-quarks/12-native-memory-tracking/