Monitoring Kubernetes with eBPF and Prometheus

1. Monitoring Kubernetes with eBPF and Prometheus (KubeCon North America 2018)
2. Agenda
• Flow monitoring: benefits
• Getting flow data
• Technology: eBPF
• Tour of our staging cluster
• Productizing: challenges
3. Flow monitoring: benefits
• Architecture, HA, environment isolation
• Health
• Cost
(Icons: CC-BY-4.0)
4. Getting flow data
Pod metadata from the API server:
$ kubectl describe pod $POD
Name: ...
Namespace: staging
Status: Running
IP: ...
Controlled By: ReplicaSet/A

Per-socket TCP statistics from inside the pod's network namespace:
# PID=$(docker inspect -f '{{.State.Pid}}' $CONTAINER)
# nsenter -t $PID -n ss -ti
ESTAB 0 0 cubic wscale:9,9 rto:204 rtt:0.003/0 mss:1448 cwnd:19 ssthresh:19 bytes_acked:2525112 segs_out:15664 segs_in:15578 data_segs_out:15662 send 73365.3Mbps lastsnd:384 lastrcv:10265960 lastack:384 rcv_space:29200 minrtt:0.002

NAT translations from conntrack, to map a connection (A,X) through a service VIP to the backend pod (A,B):
# conntrack -L
tcp 6 86399 ESTABLISHED src= dst= sport=34940 dport=8000 src= dst= sport=8000 dport=34940 [ASSURED] mark=0 use=1
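The conntrack step above can be sketched as a small parsing pass that turns one entry into a flow tuple. The sample entry below, including its IP addresses, is made up for illustration (the slide elides the real ones):

```shell
# Parse a conntrack entry into "src -> dst:dport", keeping only the
# first (pre-NAT) src/dst/dport seen on the line.
sample='tcp 6 86399 ESTABLISHED src=10.1.0.4 dst=10.1.0.9 sport=34940 dport=8000 src=10.1.0.9 dst=10.1.0.4 sport=8000 dport=34940 [ASSURED] mark=0 use=1'
flow=$(echo "$sample" | awk '{
  for (i = 1; i <= NF; i++) {
    n = split($i, kv, "=")
    if (n == 2) {
      if (kv[1] == "src" && src == "") src = kv[2]
      if (kv[1] == "dst" && dst == "") dst = kv[2]
      if (kv[1] == "dport" && dport == "") dport = kv[2]
    }
  }
  print src " -> " dst ":" dport
}')
echo "$flow"   # 10.1.0.4 -> 10.1.0.9:8000
```

Joining this tuple with the kubectl metadata (pod IP, owning ReplicaSet) is what turns raw connections into service-to-service flows.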
5. Technology: eBPF
• Linux bpf() system call, in the kernel since 3.18
• Run code on kernel events
• Report only changes, yet capture more data
• Safe: in-kernel verifier, read-only
• Fast: JIT-compiled
→ 100% coverage + no app changes + low overhead ftw!
(Unofficial BPF mascot by Deirdré Straughan)
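The "run code on kernel events" idea can be sketched with bpftrace: a kprobe on tcp_connect fires once per outbound connection attempt, so you see changes as they happen rather than polling socket tables. The probe and output format below are our own illustration, not the speaker's actual tooling; running the script requires root and bpftrace installed, so here it is only written to a file:

```shell
# Write a minimal bpftrace program that logs every TCP connect attempt.
cat > /tmp/tcp_connect.bt <<'EOF'
// Print the process name and pid for each outbound TCP connection
kprobe:tcp_connect
{
    printf("connect by %s (pid %d)\n", comm, pid);
}
EOF
# To run: sudo bpftrace /tmp/tcp_connect.bt
```

The in-kernel verifier checks a program like this before it is attached, which is what makes the "safe, read-only" claim hold.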
7. [Screenshot] k8s nodes: kubelet health checks
8. [Screenshot] Out-of-cluster APIs: frontend
9.–10. [Screenshots] Flow analysis in Prometheus
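As a hypothetical sketch of what the flow data could look like once exported to Prometheus, here is a fragment in the text exposition format; the metric name, labels, and values are invented for illustration:

```shell
# Example Prometheus text exposition for per-service-pair flow bytes.
metrics='# HELP flow_bytes_total Bytes observed between service pairs
# TYPE flow_bytes_total counter
flow_bytes_total{src="frontend",dst="api"} 2525112
flow_bytes_total{src="api",dst="db"} 884213'
echo "$metrics"
```

A PromQL query such as rate(flow_bytes_total[1m]) over a counter like this would then give per-pair throughput over time.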
11. Productizing: challenges
• CPU overhead: profile + iterate → 0.1% CPU
• Network overhead: encode efficiently and compress
• Security: TLS, OAuth everywhere
• Real-time: stream, don't batch → 2-second latency
• Pre-aggregate to manage cardinality
• Workload baselining for automatic alerting

Jonathan Perry <>
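The "pre-aggregate to manage cardinality" point can be sketched in one pass: collapse per-connection flow records into per-service-pair byte totals before export, so label cardinality tracks service pairs rather than individual connections. The input records below are made up:

```shell
# Aggregate (src, dst, bytes) flow records by service pair.
flows='frontend api 1200
frontend api 800
api db 500'
agg=$(echo "$flows" | awk '{ bytes[$1 " -> " $2] += $3 }
  END { for (k in bytes) print k, bytes[k] }' | sort)
echo "$agg"
# api -> db 500
# frontend -> api 2000
```

Doing this reduction close to the source is also what keeps the network overhead mentioned above low.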