Distributed Task System

rfyiamcool

Published 2020/11/20 in the Technology category

xiaorui.cc

Kubernetes  golang k8s 

Text content
1. Building a Distributed Task System - xiaorui.cc - github.com/rfyiamcool
2. Section 1: Motivation
3. shark? Kubernetes: deploy k8s cluster; manage k8s state; backup k8s config; migrate k8s zone. Manage nodes like ansible: template + scripts; execute shell; custom module; custom plugin.
4. shark? Standard: os init; base init; extend init; cloud init; … Deploy & manage servers: kafka, rabbitmq; rocketmq cluster; redis, mysql, mongodb cluster; … More features.
5. Section 2: Architecture Design
6. Architecture design
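7. gRPC advantages: protobuf supports many languages; built on HTTP/2, so compatibility is good; supports the bidi (full-duplex) streaming mode; protobuf gives high-performance serialization, with field tags and data compression; type constraints. Stack shown on the slide: protobuf, HTTP/2, TLS, TCP.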
8. Gateway: clients connect to gateways; a gateway dispatches by zone, redirects to the leader master, and wraps request logic; each zone (beijing, shanghai, …) runs its own group of masters.
9. Leader election: masters elect a leader through redis + lua, with acquire, release, and ttl.
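The acquire / release / ttl cycle from this slide can be sketched without Redis. The `Lease` type below is an in-memory stand-in for a single Redis key, and its names and millisecond timestamps are assumptions made for illustration; the slide does not show the actual Lua scripts. The point it illustrates is that both acquire and release must be atomic check-and-set operations, which is what a Lua script would provide on the Redis side.

```go
package main

import (
	"fmt"
	"sync"
)

// Lease is an in-memory stand-in for one Redis key with a TTL.
// All times are Unix milliseconds supplied by the caller.
type Lease struct {
	mu        sync.Mutex
	owner     string
	expiresAt int64
}

// Acquire takes or renews the lease when it is free, expired, or already
// held by id. In Redis this check-and-set would run inside one Lua script.
func (l *Lease) Acquire(id string, nowMs, ttlMs int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.owner == "" || l.owner == id || nowMs >= l.expiresAt {
		l.owner = id
		l.expiresAt = nowMs + ttlMs
		return true
	}
	return false
}

// Release frees the lease only if id still owns it, so a slow ex-leader
// cannot delete a lease that another master has since acquired.
func (l *Lease) Release(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.owner != id {
		return false
	}
	l.owner = ""
	return true
}

func main() {
	lease := &Lease{}
	fmt.Println(lease.Acquire("master-1", 0, 3000))    // master-1 becomes leader
	fmt.Println(lease.Acquire("master-2", 1000, 3000)) // refused: lease still valid
	fmt.Println(lease.Acquire("master-2", 4000, 3000)) // granted: lease expired
}
```

A leader would call `Acquire` periodically with an interval shorter than the TTL to keep its lease alive.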
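10. Interceptor (client sdk-api): 1. the client sends a request to a master; 2. that master responds "not leader"; 3. the client interceptor queries for the leader; 4. the request goes to the master that is the leader, which talks to the minion.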
11. Replica sync between masters: an event stream plus a grpc unary snapshot, tracked by revision; fullsync transfers the full data set, psync transfers incremental data.
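One way to read the fullsync / psync split is as a decision based on the replica's last known revision. The sketch below is an assumption-laden illustration: `Store`, `Event`, `Sync`, and the bounded event log are hypothetical names, and the transport (gRPC in the slides) is left out. A replica whose revision is still covered by the retained log gets only the missing events; otherwise it gets a snapshot.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is one change, tagged with the revision that produced it.
type Event struct {
	Rev   int64
	Key   string
	Value string
}

// Store keeps the current data plus a bounded log of recent events.
type Store struct {
	mu     sync.RWMutex
	rev    int64
	data   map[string]string
	log    []Event
	maxLog int
}

func NewStore(maxLog int) *Store {
	return &Store{data: make(map[string]string), maxLog: maxLog}
}

// Put applies a write, bumps the revision, and trims the log to maxLog entries.
func (s *Store) Put(key, value string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rev++
	s.data[key] = value
	s.log = append(s.log, Event{Rev: s.rev, Key: key, Value: value})
	if len(s.log) > s.maxLog {
		s.log = s.log[len(s.log)-s.maxLog:]
	}
	return s.rev
}

// Sync returns "psync" with the missing events when the log still covers
// replicaRev, and "fullsync" with a snapshot of all keys otherwise.
func (s *Store) Sync(replicaRev int64) (mode string, events []Event) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if replicaRev == s.rev {
		return "psync", nil // already up to date
	}
	if replicaRev < s.rev && len(s.log) > 0 && s.log[0].Rev <= replicaRev+1 {
		for _, e := range s.log {
			if e.Rev > replicaRev {
				events = append(events, e)
			}
		}
		return "psync", events
	}
	for k, v := range s.data {
		events = append(events, Event{Rev: s.rev, Key: k, Value: v})
	}
	return "fullsync", events
}

func main() {
	s := NewStore(2)
	s.Put("a", "1")
	s.Put("b", "2")
	s.Put("c", "3")
	mode, events := s.Sync(1)
	fmt.Println(mode, len(events))
	mode, events = s.Sync(0)
	fmt.Println(mode, len(events))
}
```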
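12. Graceful upgrade: Go has no native standalone fork and exec, so the upgrade uses fork-exec and passes the listening fd along. The master signals "need upgrade"; the minion downloads the new binary.
    Step 1: the new minion asks whether an upgrade is allowed.
    Step 2: download the upgrade file.
    Step 3: verify the upgrade file's md5 and version, then do a trial run as a self-check.
    Step 4: fork + exec the new file, carrying the listen fd.
    Step 5: the main process adds a timeout event and listens for the WINCH signal.
    Step 6: after serving for 30s, the child process sends the WINCH signal to the main process.
    Step 7: on WINCH, the main process triggers the grpc graceful shutdown.
    Main process (diagram): watch SIGWINCH; set procname; add timeout event; graceful shutdown; if the timeout fires, send SIGKILL to force exit.
    Child process (diagram): check if pid is running; wait max 30s; send SIGWINCH; set procname.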
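The "fork + exec with the listen fd" step can be sketched with the standard library on Unix-like systems: a `net.TCPListener` exposes a duplicated `*os.File`, `exec.Cmd.ExtraFiles` hands that file to the child as fd 3, and the child rebuilds the listener with `net.FileListener`. The function names and the `UPGRADE_LISTEN_FD` variable below are hypothetical, and the sketch leaves out the md5 check, the WINCH handshake, and the timeout from the slide. To stay runnable as one program, `main` prepares the child command without starting it and performs the child's import step in-process.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"os/exec"
)

// exportListener opens a TCP listener and returns it together with a
// duplicated *os.File that can be handed to a child process.
func exportListener(addr string) (net.Listener, *os.File, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, err
	}
	f, err := ln.(*net.TCPListener).File()
	if err != nil {
		ln.Close()
		return nil, nil, err
	}
	return ln, f, nil
}

// childCommand prepares the new binary. ExtraFiles[0] becomes fd 3 in the child.
func childCommand(binary string, f *os.File) *exec.Cmd {
	cmd := exec.Command(binary)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.ExtraFiles = []*os.File{f}
	// Hypothetical variable name telling the child which fd to adopt.
	cmd.Env = append(os.Environ(), "UPGRADE_LISTEN_FD=3")
	return cmd
}

// importListener is what the child would run on the inherited file,
// typically obtained with os.NewFile(3, "listener").
func importListener(f *os.File) (net.Listener, error) {
	return net.FileListener(f)
}

func main() {
	ln, f, err := exportListener("127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	defer f.Close()

	cmd := childCommand(os.Args[0], f) // prepared only; cmd.Start() would launch it
	fmt.Println("inherited files:", len(cmd.ExtraFiles))

	child, err := importListener(f)
	if err != nil {
		fmt.Println("import failed:", err)
		return
	}
	defer child.Close()
	fmt.Println("same address:", child.Addr().String() == ln.Addr().String())
}
```

Because both processes hold the same socket, the old process can stop accepting and drain its connections while the new one is already serving.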
13. Section 3: Feature Design
14. K8S operations: add or delete a k8s-master; add or delete a worker node. Diagram: zone 1 and zone 2 each contain k8s-master, k8s-etcd, and several k8s-node instances.
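15. K8S: more precondition checks added; built as secondary development on top of rancher rke, with heavy modifications; k8s components run through hyperkube; k8s master and node counts can be increased or decreased; real-time progress feedback; more detailed parameter customization; custom plugins can be passed in; free choice of image versions; …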
16. Asynchronous tasks: 0. ping/pong; 1. the client submits to the master; 2. init; 3. push pending; 4. input task, ack; 5. output result; 6. push res; 7. get or poll. Task states: init, pending, running, then ok or failed. Workers on the minion run the tasks.
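The task states on this slide can be enforced with a small transition table. The sketch below assumes a linear order (init, pending, running, then ok or failed); the slide lists the states but not every allowed transition, so the table and the `Task` type are illustrative guesses rather than the system's actual model.

```go
package main

import "fmt"

// State is one of the task states named on the slide.
type State string

const (
	StateInit    State = "init"
	StatePending State = "pending"
	StateRunning State = "running"
	StateOK      State = "ok"
	StateFailed  State = "failed"
)

// allowed lists the transitions assumed from the slide's state list.
var allowed = map[State][]State{
	StateInit:    {StatePending},
	StatePending: {StateRunning},
	StateRunning: {StateOK, StateFailed},
}

// Task is a hypothetical record tracked by the master.
type Task struct {
	ID    string
	State State
}

// Move applies a transition or reports why it is not allowed.
func (t *Task) Move(to State) error {
	for _, next := range allowed[t.State] {
		if next == to {
			t.State = to
			return nil
		}
	}
	return fmt.Errorf("task %s: illegal transition %s -> %s", t.ID, t.State, to)
}

func main() {
	task := &Task{ID: "task-1", State: StateInit}
	for _, s := range []State{StatePending, StateRunning, StateOK} {
		if err := task.Move(s); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(task.ID, "is", task.State)
	}
	fmt.Println(task.Move(StateRunning)) // rejected: ok is terminal
}
```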
17. Watch API: a client update or posted result passes through a channel and is pushed to watchers over a grpc stream, tagged with a revision; depending on the watcher's revision, it receives incremental data or the full data set.
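18. Golang plugin: the master syncs plugins and the minion downloads them. Advantages of plugins: less pollution of the main code; hot loading of dynamic libraries; faster compilation. Design: run unsafe plugins in a child process; compress Go plugins for transfer; declare interface{} types for multiple task types.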
19. Golang modules: the minion fork + execs module loaders (module-1, module-2, module-3) and reaches them through an EventHandler using a grpc stream over a unix domain socket; each business module registers itself; the module loader exposes the eventHandler method; checker: check if pid is running, ping, idle timeout; run modes: safe mode and unsafe mode.
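21. Timer: modeled on the rocketmq implementation; producers push into a channel per delay range (< 15s, 15 - 30s, 30s - 1m, 1m - 5m, 5m - 10m), each driven by its own timer (10s, 20s, 3m shown on the slide); precision is lowered on purpose; this reduces timer heap complexity and lock contention.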
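The slide is terse, so the following is only one possible reading of the RocketMQ-style design: round each delay up to a fixed level and keep one FIFO queue per level. Since every entry in a level waits the same amount of time, insertion order equals expiry order and only the head of each queue needs checking, which avoids a shared timer heap. The level values (upper bounds of the slide's ranges, in seconds), the type names, and the caller-supplied clock are all assumptions.

```go
package main

import "fmt"

// entry is a delayed task with its absolute fire time in seconds.
type entry struct {
	id     string
	fireAt int64
}

// LevelQueues keeps one FIFO queue per fixed delay level.
type LevelQueues struct {
	levels []int64   // delay levels in seconds, ascending
	queues [][]entry // queues[i] holds tasks rounded up to levels[i]
}

func NewLevelQueues(levels ...int64) *LevelQueues {
	return &LevelQueues{levels: levels, queues: make([][]entry, len(levels))}
}

// Add rounds delay up to the nearest level and appends to that level's queue.
// It reports false when the delay exceeds the largest level. Callers must
// pass a non-decreasing now so each queue stays ordered by fire time.
func (q *LevelQueues) Add(id string, now, delay int64) (int64, bool) {
	for i, level := range q.levels {
		if delay <= level {
			fireAt := now + level
			q.queues[i] = append(q.queues[i], entry{id: id, fireAt: fireAt})
			return fireAt, true
		}
	}
	return 0, false
}

// Due pops every task whose fire time has passed, inspecting only queue heads.
func (q *LevelQueues) Due(now int64) []string {
	var due []string
	for i := range q.queues {
		for len(q.queues[i]) > 0 && q.queues[i][0].fireAt <= now {
			due = append(due, q.queues[i][0].id)
			q.queues[i] = q.queues[i][1:]
		}
	}
	return due
}

func main() {
	q := NewLevelQueues(15, 30, 60, 300, 600)
	q.Add("a", 0, 10) // rounded up to the 15s level
	q.Add("b", 0, 20) // rounded up to the 30s level
	fmt.Println(q.Due(14))
	fmt.Println(q.Due(15))
	fmt.Println(q.Due(30))
}
```

The cost of this layout is precision: a 10s delay fires after 15s, which matches the slide's note about deliberately lowering precision.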
22. kvstore: minion-local storage implemented with go-badger; complex data structures are designed after redis; a simple interface; a gcworker polls periodically to clean up expired data. Key layout:
    string: +keyname,S = expire length; S[keyname] = value
    list: +keyname,L = expire length; L[key] = value
    hash: +keyname,H = expire length; H[keyname]fieldname = value
    sorted set: +key,Z = expire length; Z[keyname]m_member = score; Z[keyname]s_score = "member value"
    …
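The key layout above maps Redis-like types onto a flat, ordered key space. The helpers below build keys that follow the slide's notation literally; treating the brackets as literal characters and zero-padding scores so that lexical key order matches numeric order are both assumptions. An ordinary map with sorted keys stands in for go-badger so the sketch stays dependency-free.

```go
package main

import (
	"fmt"
	"sort"
)

// Type tags used in meta keys, as named on the slide.
const (
	TypeString = 'S'
	TypeList   = 'L'
	TypeHash   = 'H'
	TypeZSet   = 'Z'
)

// metaKey holds expire and length information: "+keyname,T".
func metaKey(name string, typ byte) string {
	return "+" + name + "," + string(typ)
}

// hashKey addresses one field of a hash: "H[keyname]fieldname".
func hashKey(name, field string) string {
	return "H[" + name + "]" + field
}

// zsetMemberKey maps a member to its score: "Z[keyname]m_member".
func zsetMemberKey(name, member string) string {
	return "Z[" + name + "]m_" + member
}

// zsetScoreKey maps a score to its member: "Z[keyname]s_score".
// Zero padding is an assumption so that key order follows score order.
func zsetScoreKey(name string, score uint64) string {
	return fmt.Sprintf("Z[%s]s_%020d", name, score)
}

func main() {
	store := map[string]string{} // stand-in for the badger database
	store[metaKey("rank", TypeZSet)] = "expire=0,length=2"
	store[zsetMemberKey("rank", "bob")] = "10"
	store[zsetScoreKey("rank", 10)] = "bob"
	store[zsetMemberKey("rank", "amy")] = "9"
	store[zsetScoreKey("rank", 9)] = "amy"

	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	sort.Strings(keys) // an ordered store would iterate keys in this order
	for _, k := range keys {
		fmt.Println(k, "=", store[k])
	}
}
```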
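23. Security: forward path client -> master -> minion: the link uses certificates, plus an extra token check for the client; reverse path minion -> master -> client: a second set of certificates; anti-gdb: strip symbols and debug information; IP blacklist and whitelist; lock out after auth errors exceed a threshold; minion token mechanism: at startup the minion builds a token with the master's public key and delivers it to the master; the master decodes the token with its private key, and the minion can be accessed only with that token.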
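24. High availability, fallbacks: if the minion process crashes? systemd; if a minion node crashes? no way; if a master node crashes? re-elect and correct the nodes' state, while minions watch the election state and connect to the leader node; if master logic panics? wrapper recovery; if the master process crashes? systemd.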
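The "wrapper recovery" answer for a panic in master logic usually means running handlers inside a function with a deferred `recover`. A minimal sketch, with `SafeCall` as a hypothetical name:

```go
package main

import (
	"errors"
	"fmt"
)

// SafeCall runs fn and converts a panic into an ordinary error, so one
// faulty handler cannot take down the whole process.
func SafeCall(name string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%s panicked: %v", name, r)
		}
	}()
	return fn()
}

func main() {
	err := SafeCall("dispatch", func() error {
		var tasks map[string]int
		tasks["a"] = 1 // writing to a nil map panics
		return nil
	})
	fmt.Println("recovered:", err)

	err = SafeCall("validate", func() error { return errors.New("bad input") })
	fmt.Println("plain error:", err)
}
```

A recovered panic only protects the goroutine that ran the wrapper; each goroutine the master starts needs its own wrapper.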
25. High availability, protection: rate limiter (by ip and by service/method); backoff algorithm; failfast; failover.
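The backoff item can be illustrated with capped exponential backoff, one common choice; the slide does not say which variant is used. The `Backoff` function, its parameters, and the absence of jitter are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff returns base * 2^attempt, capped at max. The doubling stops
// before it could exceed max, which also rules out integer overflow.
func Backoff(attempt int, base, max time.Duration) time.Duration {
	if base <= 0 || max <= 0 {
		return 0
	}
	d := base
	for i := 0; i < attempt; i++ {
		if d > max/2 {
			return max
		}
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, Backoff(attempt, 100*time.Millisecond, 2*time.Second))
	}
}
```

In production code a random jitter is often added to the returned delay so that many clients do not retry at the same instant.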
26. Section 3: Optimization
27. On-CPU profile
28. Off-CPU profile: use fgprof
29. Reduce lock granularity: split the cache into several caches, each guarded by its own RWLock; less lock contention, higher concurrency.
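Splitting one locked cache into several independently locked shards might look like the sketch below. The shard count, the FNV hash, and the `ShardedCache` API are illustrative choices, not details taken from the slides.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shard is one slice of the cache with its own lock.
type shard struct {
	mu    sync.RWMutex
	items map[string]string
}

// ShardedCache spreads keys over shards so writers to different shards
// never contend on the same lock.
type ShardedCache struct {
	shards []*shard
}

func NewShardedCache(n int) *ShardedCache {
	if n < 1 {
		n = 1
	}
	c := &ShardedCache{shards: make([]*shard, n)}
	for i := range c.shards {
		c.shards[i] = &shard{items: make(map[string]string)}
	}
	return c
}

// shardFor hashes the key to pick its shard.
func (c *ShardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[h.Sum32()%uint32(len(c.shards))]
}

func (c *ShardedCache) Set(key, value string) {
	s := c.shardFor(key)
	s.mu.Lock()
	s.items[key] = value
	s.mu.Unlock()
}

func (c *ShardedCache) Get(key string) (string, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.items[key]
	return v, ok
}

func main() {
	cache := NewShardedCache(8)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			cache.Set(fmt.Sprintf("task-%d", i), "ok")
		}(i)
	}
	wg.Wait()
	v, ok := cache.Get("task-42")
	fmt.Println(v, ok)
}
```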
30. GC optimization: cache result-set objects in sync.Pool; cache bytes.Buffer objects in sync.Pool; preallocate maps and slices; …
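Two of these GC measures are easy to show in a few lines: reusing `bytes.Buffer` objects through `sync.Pool` and preallocating a slice with a known capacity. `FormatResult` and `CollectIDs` are hypothetical helpers written for this sketch.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so hot paths do not allocate a new
// bytes.Buffer for every result.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// FormatResult builds a result line in a pooled buffer. buf.String()
// copies the bytes, so returning the buffer to the pool afterwards is safe.
func FormatResult(taskID string, exitCode int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	fmt.Fprintf(buf, "task=%s exit=%d", taskID, exitCode)
	return buf.String()
}

// CollectIDs preallocates the slice so append never has to grow it.
func CollectIDs(n int) []string {
	ids := make([]string, 0, n)
	for i := 0; i < n; i++ {
		ids = append(ids, fmt.Sprintf("task-%d", i))
	}
	return ids
}

func main() {
	fmt.Println(FormatResult("task-1", 0))
	fmt.Println(CollectIDs(3))
}
```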
31. Reduce syscalls: goroutines send tasks into pipelines selected by fnv(task_id) & (cap - 1); each pipeline has its own goroutine, and a merge goroutine combines their output.
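The expression `fnv(task_id) & (cap - 1)` picks a pipeline with a bit mask, which works only when the pipeline count is a power of two. The sketch below routes task IDs that way and funnels every pipeline's output into one merge goroutine; the function names, the buffer size, and the in-memory batch (standing in for one combined write) are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shardIndex maps a task ID to a pipeline. n must be a power of two so
// that the mask n-1 keeps exactly the low bits of the hash.
func shardIndex(taskID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(taskID))
	return h.Sum32() & (n - 1)
}

// RunPipelines fans tasks out to n pipelines and merges the results in a
// single goroutine, which could then flush them with one write.
func RunPipelines(taskIDs []string, n uint32) []string {
	pipes := make([]chan string, n)
	merged := make(chan string)
	var wg sync.WaitGroup
	for i := range pipes {
		pipes[i] = make(chan string, 16)
		wg.Add(1)
		go func(id int, in <-chan string) {
			defer wg.Done()
			for task := range in {
				merged <- fmt.Sprintf("pipe-%d:%s", id, task)
			}
		}(i, pipes[i])
	}
	go func() {
		wg.Wait()
		close(merged)
	}()

	done := make(chan []string)
	go func() { // merge goroutine: collect everything into one batch
		var batch []string
		for r := range merged {
			batch = append(batch, r)
		}
		done <- batch
	}()

	for _, id := range taskIDs {
		pipes[shardIndex(id, n)] <- id
	}
	for _, p := range pipes {
		close(p)
	}
	return <-done
}

func main() {
	results := RunPipelines([]string{"task-1", "task-2", "task-3", "task-1"}, 4)
	fmt.Println(len(results), "results merged")
}
```

Because the same task ID always lands on the same pipeline, work for one task stays ordered without a global lock.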
32. Section 4: Some Bugs
33. Data race: one goroutine sets a string while another goroutine gets it, which leads to data confusion, because a string is a two-word header: type StringHeader struct { Data uintptr; Len int }. Fixes: copy and set, or atomic store & load.
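For the "atomic store & load" fix, `sync/atomic.Value` lets one goroutine publish a string while others read it, without a reader ever observing a pointer from one string paired with the length of another. The `Status` type is a hypothetical example, not code from the system.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Status holds a string that is written and read concurrently.
type Status struct {
	msg atomic.Value // always stores a string
}

// Set publishes a new value atomically.
func (s *Status) Set(m string) { s.msg.Store(m) }

// Get returns the latest value, or "" if nothing was stored yet.
func (s *Status) Get() string {
	v, _ := s.msg.Load().(string)
	return v
}

func main() {
	st := &Status{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				st.Set(fmt.Sprintf("writer-%d", i))
				_ = st.Get()
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(st.Get() != "")
}
```

Running the unsynchronized version with `go run -race` reports the race; this version does not.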
34. Shell: golang os/exec fork & execs bash -c "sleep 3; tailf log" (bash pid: 18534, pgid: 18534); bash in turn fork & execs sleep 3 (pid: 18535) and tailf log (pid: 18536, pgid: 18534). When the shell runs multiple commands, killing the os/exec pid does not clean everything up. Fix: set the child process's process group id and kill the process group.
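On Linux and other Unix-like systems, the process-group fix can be sketched with `syscall.SysProcAttr{Setpgid: true}` and a `kill` on the negated pid, which targets the whole group. `StartInGroup` and `KillGroup` are hypothetical names, and the sketch uses `sh` rather than `bash` purely for portability.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// StartInGroup runs the script through a shell placed in a new process
// group whose id equals the shell's pid, so its children share that group.
func StartInGroup(script string) (*exec.Cmd, error) {
	cmd := exec.Command("sh", "-c", script)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// KillGroup signals the whole group: a negative pid addresses the
// process group instead of a single process.
func KillGroup(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}

func main() {
	cmd, err := StartInGroup("sleep 30; sleep 30")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	time.Sleep(100 * time.Millisecond)
	if err := KillGroup(cmd); err != nil {
		fmt.Println("kill failed:", err)
	}
	fmt.Println("wait returned:", cmd.Wait())
}
```

Calling `cmd.Process.Kill()` alone would stop only the shell and leave the running `sleep` behind, which is the leftover-process problem the slide describes.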
36. “ Q&A ” - xiaorui.cc