Yarn on Docker: Best Practices for Docker in Big Data

1. Docker Yarn on Docker TalkingData Hierarch.pan
2. TalkingData - data processing capability #1 30 2.5 6.5 Device Coverage Daily Active Devices Monthly Active Devices 34 370 14T #2 Daily Ingested Data 80 #3 2 Cities Daily Sessions Daily Events 3,000 400 Shopping Malls WIFI Fingerprints WIFI 12 800+ #4 Partner Ecosystem #5
3. Topic 1. 2. 3. 4. 5. 6. 7.
4. Hadoop Docker plugin shipyard Docker etcd swarm vxlan
5. Docker1.9 Libnetwork
6. TalkingData Docker plugin
7. Network topology (diagram)
   etcd cluster: etcd1, etcd2, etcd3
   Services: static ip pool, DNS, dns-auto-discover, Docker-ipam-plugin
   Swarm Cluster hosts: Host1: 172.18.0.1, Host2: 172.18.0.2, Host3: 172.18.0.3, Host4: 172.18.0.4
   Per host: br0 bridge with containers attached via veth; container addresses 172.18.1.1 to 172.18.1.8
   Interconnect: Cisco/h3c, vxlan
8. vxlan • • Docker 4096 vlan static ip • Docker • DNS docker API mac 16m
9. iPerf: bandwidth by window size
   a) 4K: 137 Mbits/sec; 16K: 770 Mbits/sec; 64K: 2.32 Gbits/sec; 256K: 4.03 Gbits/sec; 1M: 6.30 Gbits/sec
   b) 4K: 1.06 Gbits/sec; 16K: 5.13 Gbits/sec; 64K: 13.3 Gbits/sec; 256K: 32.2 Gbits/sec; 1M: 37.8 Gbits/sec
   c) 4K: 157 Mbits/sec; 16K: 690 Mbits/sec; 64K: 2.13 Gbits/sec; 256K: 5.63 Gbits/sec; 1M: 7.37 Gbits/sec
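A window-size sweep like the one on this slide could be scripted roughly as follows. This is only a sketch: the deck does not show the commands that were used, the server address is a placeholder, and the script prints the iperf client commands instead of running them. The `-c` (client) and `-w` (window size) options are standard iperf flags.

```shell
#!/bin/sh
# Sketch: print the iperf client commands for a TCP window-size sweep.
# The server address passed in is a placeholder, not taken from the deck.
# Commands are printed (dry run), not executed.

iperf_sweep() {
    server=$1
    for window in 4K 16K 64K 256K 1M; do
        echo "iperf -c $server -w $window"
    done
}

iperf_sweep 172.18.1.2
```

Piping the output to `sh` on a client container would run the sweep against a host where `iperf -s` is already listening.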
10. Yarn • - Hadoop yarn Yarn - • yarn - Yarn NodeManager CPU CPU • - NodeManager NodeManager yarn
11. Yarn on Docker (architecture diagram)
    HDFS DataNodes (DN) share the hosts with containers from three Yarn clusters: Y1 (Yarn1), Y2 (Yarn2), Y3 (Yarn3)
    Workloads: SPARK, MAPREDUCE
    Resources: host 24 CPU, 64G; container 5 CPU, 12G; NM 4 CPU, 10G; OS and DN 4 CPU, 16G
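The resource figures on this slide add up as follows, shown as a small sketch. The helper name `containers_per_host` is hypothetical, and the reading of the numbers (24 CPU / 64G host, 4 CPU / 16G reserved for the OS and DataNode, 5 CPU / 12G per container) is an interpretation of the diagram labels.

```shell
#!/bin/sh
# Sketch: how many fixed-size containers fit on one host after reserving
# resources for the OS and the HDFS DataNode.
# Usage: containers_per_host HOST_CPU HOST_MEM_GB RESERVED_CPU RESERVED_MEM_GB CONTAINER_CPU CONTAINER_MEM_GB

containers_per_host() (
    by_cpu=$(( ($1 - $3) / $5 ))   # limit imposed by CPU cores
    by_mem=$(( ($2 - $4) / $6 ))   # limit imposed by memory
    # The scarcer resource decides the container count.
    if [ "$by_cpu" -lt "$by_mem" ]; then
        echo "$by_cpu"
    else
        echo "$by_mem"
    fi
)

# Host: 24 CPU / 64G; OS + DataNode: 4 CPU / 16G; container: 5 CPU / 12G.
containers_per_host 24 64 4 16 5 12   # prints 4
```

With these numbers both limits agree: (24 - 4) / 5 = 4 by CPU and (64 - 16) / 12 = 4 by memory, so the host is fully allocated with four containers.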
12. • Swarm - shipyard Yarn Hadoop CPU 80% - For datacenter - For cluster - IO IO
13. Swarm YARN ResourceManager register 1. swarm node swarm master swarm cluster 2. swarm master cluster 3. swarm Node Node Node Node Node assign node swarm docker container 4. swarm node Node docker daemon master join NodeManager 5. NodeManager NodeManager YARN ResourceManager Swarm cluster
14. Swarm Node Swarm node • Swarm node • • Docker container docker container container • NodeManager CPU Swarm YARN master Node Manager 4CPU 10G 5CPU 12G • 24CPU 64G •
15. Source Code Gitlab Build Artifact Docker Image Harbor Docker Registry Jenkins Build Environment Docker container SWARM Node Node Node Node
16. Dockerfile and startup.sh
    Dockerfile:
        ……
        ENV HA
        ENV NAMESERVICE
        ENV ACTIVE_NAMENODE_IP
        ENV …
        ……
        ADD files to image
        ENTRYPOINT ["startup.sh"]
        CMD ["nodemanager"]
    startup.sh:
        sed -i -E "s/NAMESERVICE/$NAMESERVICE/g" $HADOOP_HOME/etc/hadoop/core-site.xml
    Run:
        docker run -e NAMESERVICE=ns1
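The placeholder substitution that startup.sh performs can be sketched as a small reusable function. This is an illustration under assumptions, not the deck's actual script: `render_template` and the demo file are hypothetical, and the sketch writes through a temporary file instead of using `sed -i`.

```shell
#!/bin/sh
# Sketch of the substitution step in a startup.sh-style entrypoint:
# fill placeholder tokens in a config file from shell variables.
# render_template and the demo file are hypothetical names.

# render_template FILE VAR...: replace each literal VAR token in FILE with $VAR.
render_template() (
    file=$1
    shift
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            exit 1
        fi
        # '|' is the sed delimiter, so values must not contain '|', '&' or '\'.
        sed "s|$name|$value|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
)

# Demo on a throwaway file standing in for core-site.xml.
demo=$(mktemp)
echo '<value>hdfs://NAMESERVICE</value>' > "$demo"
NAMESERVICE=ns1
render_template "$demo" NAMESERVICE
cat "$demo"   # prints <value>hdfs://ns1</value>
rm -f "$demo"
```

Failing fast on an unset variable means a container started without its `-e` settings stops at startup instead of running with a half-rendered configuration.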
17. docker run -d --net=mynet \
      -e NAMESERVICE=nameservice \
      -e ACTIVE_NAMENODE_ID=namenode29 \
      -e STANDBY_NAMENODE_ID=namenode63 \
      -e HA_ZOOKEEPER_QUORUM=zk1:2181,zk2:2181,zk3:2181 \
      -e YARN_ZK_DIR=rmstore \
      -e YARN_CLUSTER_ID=yarnRM \
      -e YARN_RM1_IP=rm1 \
      -e YARN_RM2_IP=rm2 \
      -e CPU_CORE_NUM=5 \
      -e NODEMANAGER_MEMORY_MB=12288 \
      -e YARN_JOBHISTORY_IP=jobhistory \
      -e ACTIVE_NAMENODE_IP=active-namenode \
      -e STANDBY_NAMENODE_IP=standby-namenode \
      -e HA=yes \
      docker-registry/library/hadoop-yarn:v0.1 resourcemanager
18. shipyard scale swarm
19. Shipyard
20. Docker Docker --storage-opt dm.basesize=50G --storage-opt dm.basesize=50G --storage-opt dm.loopdatasize=480G /usr/lib/systemd/system/docker.service /var/lib/docker/ network mapperdevice mynet 12G CPU 5 storage driver overlay
21. Hadoop configuration
    yarn.nodemanager.localizer.fetch.thread-count: 4 -> 8
    yarn.resourcemanager.amliveliness-monitor.interval-ms: 1
    yarn.resourcemanager.resource-tracker.client.thread-count
    yarn.nodemanager.pmem-check-enabled: true -> false
    hadoop ulimit: 4096 -> 655350
    yarn.nodemanager.resource.cpu-vcores: 4
    yarn.nodemanager.resource.memory-mb: 10240
    10 50 100
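Settings like these end up as `<property>` elements in yarn-site.xml. A minimal sketch of generating them from a script follows; `yarn_property` is a hypothetical helper, and only the two NodeManager resource values stated on the slide are used.

```shell
#!/bin/sh
# Sketch: print Hadoop <property> elements for yarn-site.xml.
# yarn_property is a hypothetical helper, not part of Hadoop.

# yarn_property NAME VALUE: emit one <property> element.
yarn_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

echo '<configuration>'
yarn_property yarn.nodemanager.resource.cpu-vcores 4
yarn_property yarn.nodemanager.resource.memory-mb 10240
echo '</configuration>'
```

Redirecting the output to a file gives a fragment that could be merged into the image's configuration; values containing XML special characters would need escaping first.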
22. Mini yarn
23. Yarn
24. Docker
25. 1. Docker 2. DNS Docker 3. docker DNS bug bug 4. Docker 5. Spark IP overlay shuffle mac CentOS 7.2 Docker apr
26. Future • Service Control Panel web • Load balance • Dynamic Scale REST API/Web UI Scale up/down Yarn cluster • Scheduler swarm • • • API yarn
27. GPL Yarn on docker 9 github