Multi-Cloud Ingress LB: Gimbal Use Case in Actapio and Yahoo Japan

1. Multi-Cloud Ingress LB: Gimbal Use Case in Actapio and Yahoo Japan
2. Who is Hirotaka?
Hirotaka “HIRO” Ichikawa
Actapio, Inc. (Yahoo Japan Corp.)
Cloud Infrastructure Engineer
● Kubernetes Full Stack
● OpenStack Infrastructure Engineer
● Networking
● Data Center
3. What is Actapio?
● US subsidiary of Yahoo Japan
● 2 data centers in WA
  ○ 16MW DC (next week)
● SW infrastructure acceleration
  ○ Load Balancing
  ○ Application Monitoring
  ○ Storage
  ○ etc.
4. Who is Ryutaro?
Ryutaro Inoue
Senior Manager @ Yahoo! JAPAN
Specialty:
● Network
● Security
● HW
● OS
● OpenStack, Kubernetes
5. About Yahoo! JAPAN
● Joint venture of SoftBank and Yahoo! Inc.
● 100+ services (news, auctions, weather, etc.)
● 72 billion PV per month
● 10 data centers
● 100,000+ servers
● 60+ OpenStack clusters
● 100+ Kubernetes clusters
6. GIMBAL
7. What is Gimbal?
Ingress Load Balancing Platform
● Built by Heptio with Actapio
● Works on the Kubernetes ecosystem
● Scalable, multi-team, and API-driven
● Supports multi-cloud upstreams
  ○ Kubernetes
  ○ OpenStack
https://github.com/heptio/gimbal
9. Why #1: Necessity of Gimbal
BRIDGING THE GAP BETWEEN VM AND CONTAINER
Scalable Multi-Cloud Load Balancer
● Supports both VM and container clouds
● Makes migration from VMs to containers easy
● Optimizes operation cost by providing a single, large load-balancing tier
10. Modernizing application release
RAPID RELEASE AT SCALE FOR ANY CLOUD
● Service Discovery
● Performance Measurement
● Canary Deployment
● Instant Rollback
11. Why #2: Collaborating with Heptio
● Deep expertise around K8s and Cloud Native
● Tight connection to the OSS community
13. Technical Overview
14. Gimbal Architecture
15. Service Discovery
Gimbal discovers upstream cluster services. Developers can see those services in the Gimbal cluster and define routing rules against the discovered services.
Supported upstream clouds:
● Kubernetes
  ○ Service & Endpoints
● OpenStack
  ○ LBaaSv2 Pool & Member
[Diagram: upstream Service/Endpoints (Kubernetes) and LBaaS Pool/Member (OpenStack) are mirrored as Service/Endpoints objects in the Gimbal cluster]
16. K8S Service Discovery
# Gimbal K8S Cluster
$ kubectl get svc,ep --all-namespaces
NAMESPACE   NAME                        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
team01      service/up01-team01-app     ClusterIP   None         <none>        80/TCP    136m
team01      service/up02-<LB_UUID>      ClusterIP   None         <none>        80/TCP    20h
team02      service/up01-team02-app     ClusterIP   None         <none>        80/TCP    42s

NAMESPACE   NAME                          ENDPOINTS                              AGE
team01      endpoints/up01-team01-app     10.1.0.10:80,10.1.0.5:80,10.1.0.8:80   136m
team01      endpoints/up02-<LB_UUID>      172.17.165.214:80,172.17.165.215:80    20h
team02      endpoints/up01-team02-app     10.1.0.12:80                           42s

# Upstream K8S Cluster
$ kubectl get svc,ep --all-namespaces
NAMESPACE   NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
team01      service/team01-app   ClusterIP   10.96.236.146   <none>        80/TCP    15h
team02      service/team02-app   ClusterIP   10.98.14.219    <none>        80/TCP    7m51s

NAMESPACE   NAME                   ENDPOINTS                              AGE
team01      endpoints/team01-app   10.1.0.10:80,10.1.0.5:80,10.1.0.8:80   15h
team02      endpoints/team02-app   10.1.0.12:80                           7m51s
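For illustration, each discovered upstream service above corresponds to a headless Service plus an Endpoints object that the discoverer maintains in the Gimbal cluster. A minimal sketch of those objects, reconstructed from the kubectl output (the exact fields and labels Gimbal sets may differ):

apiVersion: v1
kind: Service
metadata:
  name: up01-team01-app        # <upstream cluster>-<upstream service name>
  namespace: team01
spec:
  clusterIP: None              # headless: Envoy routes straight to the endpoint IPs
  ports:
    - port: 80
      protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: up01-team01-app
  namespace: team01
subsets:
  - addresses:                 # mirrored from the upstream cluster's endpoints
      - ip: 10.1.0.10
      - ip: 10.1.0.5
      - ip: 10.1.0.8
    ports:
      - port: 80
        protocol: TCP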
19. Routing and Policy
● Ingress controller for Envoy
● Advanced capabilities
  ○ Weight shifting
  ○ Load-balancing method
  ○ Cross-cluster backends
https://github.com/heptio/contour
20. Ingress Routing
$ kubectl get ingressroute --namespace team01
NAME    FQDN   ALIASES   TLS SECRET   FIRST ROUTE   STATUS   STATUS DESCRIPTION
app01                                 /             valid    valid IngressRoute

apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: team01-app
spec:
  virtualhost:
    fqdn: team01-app.example.com
  routes:
    - match: /
      services:
        - name: up01-team01-app
          port: 80
          weight: 75
        - name: up02-team01-<UUID>
          port: 80
          weight: 25
          healthCheck:
            path: /status.html
            intervalSeconds: 60

[Diagram: ingress traffic → Envoy, split by weight; weight A → discovered Service → Pods (equally balanced); weight B → LBaaS-backed Service → VMs (equally balanced)]
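Weight shifting in the IngressRoute is what enables the canary deployment and instant rollback mentioned on slide 10: traffic moves between the VM and container backends simply by editing the weights, with no change to the applications. An illustrative step in migrating traffic toward the Kubernetes backend (the target weights are assumptions, not values from the talk):

$ kubectl edit ingressroute team01-app --namespace team01
# then adjust the weights on the route's services, e.g.:
      services:
        - name: up01-team01-app
          port: 80
          weight: 90            # was 75: more traffic to the container backend
        - name: up02-team01-<UUID>
          port: 80
          weight: 10            # was 25: VM backend drained gradually

Rolling back is the same operation in reverse: restore the previous weights and Envoy applies them immediately.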
21. Observability
Activity monitoring
● Upstream/downstream
  ○ Requests
  ○ Connections
  ○ Latency
Developers can adjust balancing weights while watching the state of each upstream service.
22. Observability
For operators: backend clusters / replication status / IngressRoutes
23. ※ Current Limitations
● Pods must be reachable from outside the cluster
  ○ use “hostNetwork: true”, or
  ○ advertise the pod network with Calico
These methods are not applicable to all K8s users; further improvement is needed.
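As a rough sketch of the hostNetwork workaround for upstream workloads: the pod shares the node's network namespace, so its IP is reachable from the Gimbal cluster without advertising the pod network. The deployment name and image below are placeholders, not from the talk:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: team01-app               # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: team01-app
  template:
    metadata:
      labels:
        app: team01-app
    spec:
      hostNetwork: true          # pod uses the node's IP and ports
      containers:
        - name: app
          image: nginx:1.15      # placeholder image
          ports:
            - containerPort: 80

Note that with hostNetwork each node can run only one pod listening on a given port, which is part of why this approach does not suit every cluster.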
24. Case study at Yahoo! JAPAN
25. Gimbal cluster structure in Yahoo! JAPAN production env
● 1 cluster per data center
● L4 LB uses DSR (Direct Server Return)
● Envoy pods in the Gimbal cluster use hostNetwork
  ○ 1 Envoy pod per node

                      HW / Spec                                      OS                   Number
L4LB                  A10 Thunder 3030S                              ACOS 2.7.2-P12-SP1   2 (HA pair)
Worker nodes          Xeon E5-2683v4 * 2 / 512GB mem / 10G NIC * 2   Ubuntu 16.04.5 LTS   10
Master & etcd nodes   Xeon E5-2683v4 * 2 / 512GB mem / 10G NIC * 2   Ubuntu 16.04.5 LTS   3
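"One Envoy pod per node, on hostNetwork" is the layout a DaemonSet provides. A minimal sketch of that deployment shape, assuming a DaemonSet is used (namespace, image tag, and ports are assumptions; the actual Gimbal manifests may differ):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: envoy
  namespace: gimbal-contour            # assumed namespace
spec:
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      hostNetwork: true                # Envoy binds directly to the node's 10G NICs (one pod per node)
      containers:
        - name: envoy
          image: envoyproxy/envoy:v1.6.0   # matches the Envoy version used in the performance tests
          ports:
            - containerPort: 80
            - containerPort: 443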
26. Using Gimbal
● Applications are developed per service
  ○ each service requires its own namespace
Service admin:
● Manages endpoints (VMs/Pods)
● Manages the Gimbal config in their namespace
Gimbal admin:
● Manages the entire Gimbal cluster
  ○ admin namespace, nodes, L4 LB, etc.
27. Using Gimbal at Yahoo! JAPAN
Beginning of use
1. Service admin sends a request to the Gimbal admin via the ticketing system (JIRA)
   ○ Namespace & FQDN
2. Gimbal admin creates a kubeconfig file with authority over the service namespace
3. Gimbal admin creates the FQDN in the admin (default) namespace
4. Gimbal admin sends the kubeconfig to the service admin
5. Service admin modifies the Gimbal config using their kubeconfig
6. Service admin points their DNS record at the Gimbal cluster's IP
Modifying IngressRoutes
1. Service admin modifies the Gimbal cluster config
Adding or deleting endpoints
1. Service admin modifies their backend (OpenStack or Kubernetes)
   a. Gimbal discovers the change automatically and applies it to the IngressRoute backends
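Step 2's per-namespace kubeconfig can be backed by namespace-scoped RBAC. A hedged sketch (role name, verbs, and the bound user are assumptions, not the actual Yahoo! JAPAN policy) that lets a service admin manage IngressRoutes and read discovered Services only in their own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gimbal-service-admin              # assumed name
  namespace: news-team
rules:
  - apiGroups: ["contour.heptio.com"]
    resources: ["ingressroutes"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]  # read-only view of discovered backends
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gimbal-service-admin
  namespace: news-team
subjects:
  - kind: User
    name: news-team-admin                 # identity embedded in the issued kubeconfig
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: gimbal-service-admin
  apiGroup: rbac.authorization.k8s.io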
28. Config sample (delegation)
# admin namespace (default)
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: root-news-yahoo-co-jp
  namespace: default
spec:
  virtualhost:
    fqdn: news.yahoo.co.jp
  routes:
    - match: /
      delegate:
        name: news-yahoo-co-jp
        namespace: news-team

# service namespace (news-team)
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: news-yahoo-co-jp
  namespace: news-team
spec:
  routes:
    - match: /
      services:
        - name: os1-6f671e13-1fbd-4c6d-a1db-31a565bb019c
          port: 80
          weight: 60
        - name: k8s1-7877d3b4-a471-4f38-9459-69659a593b38
          port: 80
          weight: 40
29. Performance Testing
30. Performance Testing - environment
Workload cluster:
● Kubernetes on VMs
● VM (4 vCPU / 8GB mem) * 10 nodes
● Load generator: hey (https://github.com/rakyll/hey)
Gimbal cluster:
● Kubernetes on physical machines
● Spec: Xeon E5-2683v4 * 2 (64 threads) / 512GB mem
● 1 Envoy pod
● Envoy 1.6.0
Backend cluster:
● Kubernetes on VMs
● VM (4 vCPU / 8GB mem) * 10 nodes
● Application: nginx
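hey drives the HTTP load from the workload cluster toward the FQDN served by the Gimbal Envoy pod. An illustrative invocation (duration, concurrency, rate, and URL are assumptions, not the exact parameters used in these tests):

$ hey -z 60s -c 100 -q 500 http://team01-app.example.com/
#  -z  test duration
#  -c  number of concurrent workers
#  -q  rate limit in queries per second, per worker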
31. Performance Testing - Latency
Latency by data size and request rate (columns are requests per second):

Data Size   10k      20k      50k      100k     150k     200k     250k     300k     Comment
1B          1.3ms    1.4ms    1.3ms    2.1ms    6.2ms    16.2ms   16.3ms   14.6ms
1KB         1.4ms    1.5ms    1.3ms    2.3ms    6.4ms    11.3ms   17.0ms   15.7ms
22KB        3.8ms    5.4ms    12.5ms   -        -        -        -        -        saturation of bandwidth (10Gbps)
32. Performance Testing - CPU usage
CPU usage by data size and request rate (columns are requests per second):

Data Size   10k      20k      50k      100k     150k     200k     250k     300k     Comment
1B          5.33%    8.66%    25.33%   35.10%   38.85%   50.25%   64.22%   81.88%
1KB         5.66%    8.16%    20.88%   22.84%   36.28%   51.71%   59.05%   80.89%
22KB        7.17%    11.02%   23.45%   -        -        -        -        -        saturation of bandwidth (10Gbps)
33. Performance Testing - # of IngressRoutes / Endpoints (@22KB, 45k rps)

# of IngressRoutes   100    250    500    2500   5000
P99 latency (ms)     3.35   2.77   2.73   2.73   2.72

# of Endpoints       1      10     50     100
P99 latency (ms)     2.83   2.80   2.83   2.84
34. Performance Testing - Conclusion
● A single Envoy pod achieves 300k req/s at 1KB data size
  ○ it might achieve even more
● The architecture scales linearly, so 10 Envoy pods should reach 3M req/s
● No latency degradation was observed when increasing the number of IngressRoutes or Endpoints
35. Future activities
● Expand usage in the Yahoo! JAPAN production environment
● Develop a faster OpenStack discoverer
● Replace the hardware L4 LB in front of the Gimbal cluster with a software LB
● Develop a GUI
36. Conclusion
● Gimbal overview
  ○ Bridges the gap between VMs and containers
  ○ Modernizes traditional application release
● Production-ready performance at scale
● Yahoo! JAPAN will expand Gimbal use in production
● PRs are welcome
37. Thank you
Check it out: https://github.com/heptio/gimbal
Office Hour: 12/13 12:15 - 13:45 @ Heptio booth