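Failure when installing high-availability k8s and KubeSphere on openEuler
[Title description] A brief description of the problem:
When installing k8s 1.19.8 and KubeSphere v3.1.0 with the kk tool provided by KubeSphere, the KubeSphere component redis-ha-haproxy never manages to start. After checking the kernel log and the kubelet log, I found that the container is being OOM-killed (the kernel log shows a memory cgroup OOM kill).
However, the server nodes have plenty of memory: the 3 master nodes have 16G, 32G, and 16G respectively, and a lot of it is still free. Details are as follows.
kubelet events
Operating system kernel log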
$$
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633785] haproxy invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=994
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633788] CPU: 1 PID: 192200 Comm: haproxy Kdump: loaded Not tainted 5.10.0-60.18.0.50.oe2203.x86_64 #1
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633789] Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633790] Call Trace:
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633799] dump_stack+0x57/0x6a
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633802] dump_header+0x4a/0x1f0
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633804] oom_kill_process.cold+0xb/0x10
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633808] out_of_memory+0x100/0x310
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633812] mem_cgroup_out_of_memory+0x134/0x150
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633813] mem_cgroup_oom+0x14d/0x180
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633815] try_charge+0x2b1/0x580
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633817] ? __pagevec_lru_add_fn+0x183/0x2e0
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633819] mem_cgroup_charge+0xf1/0x250
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633821] do_anonymous_page+0x1f2/0x560
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633823] __handle_mm_fault+0x3dd/0x6d0
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633824] handle_mm_fault+0xbe/0x290
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633827] exc_page_fault+0x273/0x550
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633830] ? asm_exc_page_fault+0x8/0x30
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633831] asm_exc_page_fault+0x1e/0x30
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633833] RIP: 0033:0x55a4a16d2a9a
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633835] Code: 1b 00 ff ff ff ff c7 05 2c 92 1b 00 ff ff ff ff c7 05 1e 92 1b 00 ff ff ff ff 85 ed 7e 34 66 90 48 63 c2 83 c2 01 48 c1 e0 06 <48> c7 04 03 00 00 00 00 48 8b 1d 57 3b 1b 00 48 01 d8 c7 40 18 fd
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633836] RSP: 002b:00007ffe490870e0 EFLAGS: 00010206
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633838] RAX: 000000001f254000 RBX: 00007f68f22b6010 RCX: 00007f62f22b6010
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633839] RDX: 00000000007c9501 RSI: 0000000000000000 RDI: 0000000000000000
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633840] RBP: 000000003ffffff7 R08: 00007f62f22b6010 R09: 0000000000000000
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633841] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f66f22b6010
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633842] R13: 000055a4a225ec30 R14: 0000000000000001 R15: 0000000000000001
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633844] memory: usage 512000kB, limit 512000kB, failcnt 2537
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633844] memory+swap: usage 512000kB, limit 9007199254740988kB, failcnt 0
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633845] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633845] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc9f031a9_b682_459c_ad9d_8fbb491edbdb.slice:
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633855] anon 524288000#012file 0#012kernel_stack 0#012percpu 0#012sock 0#012shmem 0#012file_mapped 0#012file_dirty 0#012file_writeback 0#012anon_thp 511705088#012inactive_anon 524242944#012active_anon 8192#012inactive_file 0#012active_file 0#012unevictable 0#012slab_reclaimable 0#012slab_unreclaimable 0#012slab 0#012workingset_refault_anon 0#012workingset_refault_file 14598#012workingset_activate_anon 0#012workingset_activate_file 6#012workingset_restore_anon 0#012workingset_restore_file 0#012workingset_nodereclaim 0#012pgfault 35036#012pgmajfault 561#012pgrefill 10434#012pgscan 29409#012pgsteal 17231#012pgactivate 9817#012pgdeactivate 10021#012pglazyfree 0#012pglazyfreed 0#012thp_fault_alloc 1967#012thp_collapse_alloc 0
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633856] Tasks state (memory values in pages):
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633857] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633859] [ 186685] 1000 186685 243 1 28672 0 -998 pause
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633861] [ 192200] 1000 192200 23072522 127962 1118208 0 994 haproxy
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633862] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=docker-be7545b7c95b6f6f011944a5e3c693363db0a18e9a4e4e089a5b4bfea1a12e28.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc9f031a9_b682_459c_ad9d_8fbb491edbdb.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc9f031a9_b682_459c_ad9d_8fbb491edbdb.slice/docker-be7545b7c95b6f6f011944a5e3c693363db0a18e9a4e4e089a5b4bfea1a12e28.scope,task=haproxy,pid=192200,uid=1000
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.633870] Memory cgroup out of memory: Killed process 192200 (haproxy) total-vm:92290088kB, anon-rss:511720kB, file-rss:128kB, shmem-rss:0kB, UID:1000 pgtables:1092kB oom_score_adj:994
Dec 15 21:04:20 ahsshjz-k8smaster01 kernel: [16704.637227] oom_reaper: reaped process 192200 (haproxy), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
$$
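As a reading aid for the log above, here is a minimal sketch that converts its key figures into more familiar units. The numbers are copied from the log; the 4 KiB page size is an assumption (the usual x86_64 default), and the helper names are hypothetical.

```python
# Figures copied from the kernel log above.
CGROUP_LIMIT_KB = 512000    # "memory: usage 512000kB, limit 512000kB"
TOTAL_VM_PAGES = 23072522   # total_vm column of the haproxy task
RSS_PAGES = 127962          # rss column of the haproxy task
PAGE_KB = 4                 # assumption: 4 KiB pages


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the 'Tasks state' table to kB."""
    return pages * page_kb


def kb_to_mib(kb):
    """Convert kB (as printed by the kernel, 1 kB = 1024 bytes) to MiB."""
    return kb / 1024


limit_mib = kb_to_mib(CGROUP_LIMIT_KB)
total_vm_kb = pages_to_kb(TOTAL_VM_PAGES)
rss_kb = pages_to_kb(RSS_PAGES)

print(f"cgroup memory limit: {limit_mib:.0f} MiB")
print(f"haproxy total_vm:    {total_vm_kb} kB "
      f"(~{kb_to_mib(total_vm_kb) / 1024:.1f} GiB)")
print(f"haproxy rss:         {rss_kb} kB")
```

The converted total_vm agrees with the `total-vm:92290088kB` value on the "Killed process" line, and the rss equals anon-rss plus file-rss from the same line. The kill is reported with `constraint=CONSTRAINT_MEMCG`, so it is the pod's memory cgroup limit that was reached, not the node's total memory.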
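With the memory limit set to 1500M (the log below shows limit 1536000kB), haproxy still reaches the memory limit on startup and is killed: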
$$
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159234] haproxy invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=997
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159237] CPU: 0 PID: 269263 Comm: haproxy Kdump: loaded Not tainted 5.10.0-60.18.0.50.oe2203.x86_64 #1
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159238] Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159239] Call Trace:
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159246] dump_stack+0x57/0x6a
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159250] dump_header+0x4a/0x1f0
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159252] oom_kill_process.cold+0xb/0x10
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159255] out_of_memory+0x100/0x310
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159259] mem_cgroup_out_of_memory+0x134/0x150
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159261] mem_cgroup_oom+0x14d/0x180
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159262] try_charge+0x2b1/0x580
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159265] ? __pagevec_lru_add_fn+0x183/0x2e0
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159266] mem_cgroup_charge+0xf1/0x250
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159268] do_anonymous_page+0x1f2/0x560
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159271] __handle_mm_fault+0x3dd/0x6d0
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159272] handle_mm_fault+0xbe/0x290
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159275] exc_page_fault+0x273/0x550
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159278] ? asm_exc_page_fault+0x8/0x30
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159279] asm_exc_page_fault+0x1e/0x30
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159281] RIP: 0033:0x562cc41baa9a
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159283] Code: 1b 00 ff ff ff ff c7 05 2c 92 1b 00 ff ff ff ff c7 05 1e 92 1b 00 ff ff ff ff 85 ed 7e 34 66 90 48 63 c2 83 c2 01 48 c1 e0 06 <48> c7 04 03 00 00 00 00 48 8b 1d 57 3b 1b 00 48 01 d8 c7 40 18 fd
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159284] RSP: 002b:00007ffcc9b32e70 EFLAGS: 00010206
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159286] RAX: 000000005da52000 RBX: 00007f58a8a43010 RCX: 00007f52a8a43010
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159286] RDX: 0000000001769481 RSI: 0000000000000000 RDI: 0000000000000000
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159287] RBP: 000000003ffffff7 R08: 00007f52a8a43010 R09: 0000000000000000
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159288] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f56a8a43010
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159289] R13: 0000562cc4afcc30 R14: 0000000000000001 R15: 0000000000000001
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159290] memory: usage 1536000kB, limit 1536000kB, failcnt 1668
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159291] memory+swap: usage 1536000kB, limit 9007199254740988kB, failcnt 0
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159292] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159292] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod655eac53_2a9d_4db8_b083_b4776263d81b.slice:
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159302] anon 1572851712#012file 12288#012kernel_stack 0#012percpu 0#012sock 0#012shmem 0#012file_mapped 0#012file_dirty 0#012file_writeback 0#012anon_thp 1560281088#012inactive_anon 1572843520#012active_anon 8192#012inactive_file 4096#012active_file 8192#012unevictable 0#012slab_reclaimable 0#012slab_unreclaimable 0#012slab 0#012workingset_refault_anon 0#012workingset_refault_file 8857#012workingset_activate_anon 0#012workingset_activate_file 0#012workingset_restore_anon 0#012workingset_restore_file 0#012workingset_nodereclaim 0#012pgfault 20892#012pgmajfault 293#012pgrefill 5615#012pgscan 14815#012pgsteal 8858#012pgactivate 5191#012pgdeactivate 5192#012pglazyfree 0#012pglazyfreed 0#012thp_fault_alloc 2987#012thp_collapse_alloc 0
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159303] Tasks state (memory values in pages):
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159303] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159305] [ 267934] 1000 267934 243 1 28672 0 -998 pause
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159307] [ 269263] 1000 269263 23072522 383961 3166208 0 997 haproxy
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159308] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=docker-4db120f25d7ff16eba42ef22d3daef5201050bb61ff51f460f843c2e8cc62975.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod655eac53_2a9d_4db8_b083_b4776263d81b.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod655eac53_2a9d_4db8_b083_b4776263d81b.slice/docker-4db120f25d7ff16eba42ef22d3daef5201050bb61ff51f460f843c2e8cc62975.scope,task=haproxy,pid=269263,uid=1000
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.159317] Memory cgroup out of memory: Killed process 269263 (haproxy) total-vm:92290088kB, anon-rss:1535832kB, file-rss:12kB, shmem-rss:0kB, UID:1000 pgtables:3092kB oom_score_adj:997
Dec 15 22:22:52 ahsshjz-k8snode2 kernel: [23041.164890] oom_reaper: reaped process 269263 (haproxy), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
$$
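To put the two OOM reports side by side, the following rough sketch compares the figures they contain. All numbers are copied from the two logs; the dictionary layout and the `summarize` helper are hypothetical and only serve the illustration.

```python
# Values copied from the two kernel logs (limit in kB, anon/anon_thp in bytes,
# total_vm in pages as printed in the "Tasks state" table).
runs = {
    "ahsshjz-k8smaster01": {
        "limit_kb": 512000,
        "anon": 524288000,
        "anon_thp": 511705088,
        "total_vm_pages": 23072522,
    },
    "ahsshjz-k8snode2": {
        "limit_kb": 1536000,
        "anon": 1572851712,
        "anon_thp": 1560281088,
        "total_vm_pages": 23072522,
    },
}


def summarize(run):
    """Return (limit in MiB, share of anonymous memory backed by THP)."""
    limit_mib = run["limit_kb"] / 1024
    thp_share = run["anon_thp"] / run["anon"]
    return limit_mib, thp_share


for host, run in runs.items():
    limit_mib, thp_share = summarize(run)
    print(f"{host}: limit {limit_mib:.0f} MiB, "
          f"anon_thp share {thp_share:.1%}, "
          f"total_vm {run['total_vm_pages']} pages")

# Both runs report the same virtual size for haproxy.
same_total_vm = len({r["total_vm_pages"] for r in runs.values()}) == 1
print("same total_vm in both runs:", same_total_vm)
```

In both reports haproxy has the same virtual size, anonymous memory fills the cgroup limit almost exactly, and nearly all of that anonymous memory is accounted as `anon_thp`.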
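That is the problem. How should I deal with it? It does not occur in a CentOS 7 environment.
[Environment information]
Hardware information:
1) Virtual machine, x86 architecture
2) The officially provided virtual machine image
Software information:
1) openEuler 22.03 LTS
2) 5.10.0-60.18.0.50.oe2203.x86_64
3) k8s v1.19.8, KubeSphere v3.1.0, haproxy image registry.cn-beijing.aliyuncs.com/kubesphereio/haproxy:2.0.4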
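[Steps to reproduce]
3 machines running openEuler 22.03 LTS, all 3 as master nodes
Create the cluster with the kk tool officially provided by KubeSphere
./kk create config --with-kubernetes v1.19.8 --with-kubesphere v3.1.0 -f config-sample.yaml
./kk create cluster -f config-sample.yaml
Occurrence probability: always reproducible
The kernel sysctl.conf contents are as follows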
$$
kernel.sysrq=0
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects=0
net.ipv4.conf.default.send_redirects=0
net.ipv4.conf.all.accept_source_route=0
net.ipv4.conf.default.accept_source_route=0
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0
net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.default.secure_redirects=0
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.icmp_ignore_bogus_error_responses=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
net.ipv4.tcp_syncookies=1
kernel.dmesg_restrict=1
net.ipv6.conf.all.accept_redirects=0
net.ipv6.conf.default.accept_redirects=0
net.ipv4.tcp_timestamps=0
kernel.sched_autogroup_enabled=0
net.ipv4.icmp_ignore_bogus_error_responses=0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.forwarding=1
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=6144
net.ipv4.neigh.default.gc_thresh3=8192
net.ipv4.neigh.default.gc_interval=60
net.ipv4.neigh.default.gc_stale_time=120
kernel.perf_event_paranoid=-1
net.ipv4.tcp_slow_start_after_idle=0
net.core.rmem_max=16777216
fs.inotify.max_user_watches=524288
kernel.softlockup_all_cpu_backtrace=1
kernel.softlockup_panic=0
kernel.watchdog_thresh=30
fs.file-max=2097152
$$
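The listing above sets a few keys more than once, and net.ipv4.icmp_ignore_bogus_error_responses appears with two different values (1, then 0). When such a file is applied from top to bottom, the later assignment would normally be the one that takes effect. Below is a minimal sketch for spotting such repeats; `find_duplicate_keys` is a hypothetical helper, and `sample` is only an excerpt of the lines above.

```python
from collections import defaultdict


def find_duplicate_keys(text):
    """Return {key: [values in file order]} for keys assigned more than once."""
    seen = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key=value line.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Excerpt of the sysctl.conf lines shown above.
sample = """\
net.ipv4.icmp_ignore_bogus_error_responses=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
net.ipv4.icmp_ignore_bogus_error_responses=0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
"""

duplicates = find_duplicate_keys(sample)
for key, values in duplicates.items():
    note = "conflicting" if len(set(values)) > 1 else "repeated"
    print(f"{key}: {values} ({note})")
```

To check the whole file, the same function could be given the contents of /etc/sysctl.conf instead of the excerpt.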