Since posts are limited to 65,535 characters, some of the content below has been trimmed.
Environment
Physical environment
- Five virtual machines on a single server:
  - two master nodes, 4C/8G;
  - three worker nodes, 8C/16G;
Operating system
- CentOS 7.9
Installation details
Docker version
```
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
 containerd:
  Version:          1.4.12
 runc:
  Version:          1.0.2
```
Kubernetes version
```
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:10:45Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:04:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
```
KubeSphere version and installation method
Version: v3.2.1
Installation method:
Installed on clean CentOS machines using kk (All-in-One style); the kk configuration is as follows:
```yaml
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: k8s-test-master1, address: 192.168.10.10, internalAddress: 192.168.10.10, user: root, password: "root"}
  - {name: k8s-test-master2, address: 192.168.10.11, internalAddress: 192.168.10.11, user: root, password: "root"}
  - {name: k8s-test-worker1, address: 192.168.10.12, internalAddress: 192.168.10.12, user: root, password: "root"}
  - {name: k8s-test-worker2, address: 192.168.10.13, internalAddress: 192.168.10.13, user: root, password: "root"}
  - {name: k8s-test-worker3, address: 192.168.10.14, internalAddress: 192.168.10.14, user: root, password: "root"}
  roleGroups:
    etcd:
    - k8s-test-master1
    - k8s-test-master2
    control-plane:
    - k8s-test-master1
    - k8s-test-master2
    worker:
    - k8s-test-worker1
    - k8s-test-worker2
    - k8s-test-worker3
  controlPlaneEndpoint:
    internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.21.5
    clusterName: cluster.local
    autoRenewCerts: true
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    multusCNI:
      enabled: false
  registry:
    plainHTTP: false
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: ["https://registry.docker-cn.com", "https://docker.mirrors.ustc.edu.cn", "http://hub-mirror.c.163.com"]
  addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.2.1
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  etcd:
    monitoring: true
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    redis:
      enabled: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
  alerting:
    enabled: true
  auditing:
    enabled: true
  devops:
    enabled: true
    jenkinsMemoryLim: 2Gi
    jenkinsMemoryReq: 1500Mi
    jenkinsVolumeSize: 8Gi
    jenkinsJavaOpts_Xms: 512m
    jenkinsJavaOpts_Xmx: 512m
    jenkinsJavaOpts_MaxRAM: 2g
  events:
    enabled: true
  logging:
    enabled: true
    containerruntime: docker
    logsidecar:
      enabled: true
      replicas: 2
  metrics_server:
    enabled: true
  monitoring:
    storageClass: ""
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: calico
    topology:
      type: weave-scope
  openpitrix:
    store:
      enabled: true
  servicemesh:
    enabled: true
  kubeedge:
    enabled: true
    cloudCore:
      nodeSelector: {"node-role.kubernetes.io/worker": ""}
      tolerations: []
      cloudhubPort: "10000"
      cloudhubQuicPort: "10001"
      cloudhubHttpsPort: "10002"
      cloudstreamPort: "10003"
      tunnelPort: "10004"
      cloudHub:
        advertiseAddress:
        - "192.168.10.10"
        - "192.168.10.11"
        nodeLimit: "800"
      service:
        cloudhubNodePort: "30000"
        cloudhubQuicNodePort: "30001"
        cloudhubHttpsNodePort: "30002"
        cloudstreamNodePort: "30003"
        tunnelNodePort: "30004"
    edgeWatcher:
      nodeSelector: {"node-role.kubernetes.io/worker": ""}
      tolerations: []
      edgeWatcherAgent:
        nodeSelector: {"node-role.kubernetes.io/worker": ""}
        tolerations: []
```
KubeEdge version
v1.7.2
Problem description
Background
The edge node is another virtual machine on the same server, used for edge-node testing;
Tolerations were added so that the iptables pods can also run on the master nodes, which fixed the inability to log/exec into the edge node from KubeSphere:
```shell
kubectl edit iptables -n kubeedge
```
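For reference, the toleration involved is roughly the following (a sketch of the DaemonSet pod-template fragment; the exact spec of the KubeSphere-managed iptables DaemonSet may differ):

```yaml
# Hypothetical fragment: tolerate the master taint so the
# DaemonSet's pods may also be scheduled onto master nodes.
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
```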
The following script was executed on a master node to patch the workloads running on the edge node, so that pods with broad tolerations (calico, kube-proxy, etc.) are no longer scheduled onto it:
```shell
#!/bin/bash

NodeSelectorPatchJson='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/master": "","node-role.kubernetes.io/worker": ""}}}}}'
NoShedulePatchJson='{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"node-role.kubernetes.io/edge","operator":"DoesNotExist"}]}]}}}}}}}'

edgenode="edge1"
if [ $1 ]; then
    edgenode="$1"
fi

namespaces=($(kubectl get pods -A -o wide | egrep -i $edgenode | awk '{print $1}'))
pods=($(kubectl get pods -A -o wide | egrep -i $edgenode | awk '{print $2}'))
length=${#namespaces[@]}

for ((i = 0; i < $length; i++)); do
    ns=${namespaces[$i]}
    pod=${pods[$i]}
    resources=$(kubectl -n $ns describe pod $pod | grep "Controlled By" | awk '{print $3}')
    echo "Patching for ns: $ns, resources: $resources"
    kubectl -n $ns patch $resources --type merge --patch "$NoShedulePatchJson"
    sleep 1
done
```
At this point, log, exec, and metrics all work against the edge node.
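One quick way to verify the patches took effect is to list what is actually scheduled on the edge node; with the node name `edge1` used in this post, only the intended edge workloads should remain (command sketch, requires cluster access):

```shell
kubectl get pods -A -o wide --field-selector spec.nodeName=edge1
```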
Problem 1: cloudcore errors after installing KubeSphere
No edge node had joined yet at this point, and I am not sure whether these errors matter. cloudcore logs:
```
W0527 10:09:10.897412 1 validation.go:168] TLSTunnelPrivateKeyFile does not exist in /etc/kubeedge/certs/server.key, will load from secret
W0527 10:09:10.897547 1 validation.go:171] TLSTunnelCertFile does not exist in /etc/kubeedge/certs/server.crt, will load from secret
W0527 10:09:10.897556 1 validation.go:174] TLSTunnelCAFile does not exist in /etc/kubeedge/ca/rootCA.crt, will load from secret
I0527 10:09:10.897566 1 server.go:73] Version: v1.7.2
W0527 10:09:10.897580 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0527 10:09:11.910684 1 module.go:34] Module cloudhub registered successfully
I0527 10:09:11.922923 1 module.go:34] Module edgecontroller registered successfully
I0527 10:09:11.923136 1 module.go:34] Module devicecontroller registered successfully
I0527 10:09:11.923769 1 module.go:34] Module synccontroller registered successfully
I0527 10:09:11.924170 1 module.go:34] Module cloudStream registered successfully
W0527 10:09:11.924183 1 module.go:37] Module router is disabled, do not register
W0527 10:09:11.924188 1 module.go:37] Module dynamiccontroller is disabled, do not register
I0527 10:09:11.924299 1 core.go:24] Starting module cloudhub
I0527 10:09:11.924344 1 core.go:24] Starting module edgecontroller
I0527 10:09:11.924389 1 core.go:24] Starting module devicecontroller
I0527 10:09:11.924408 1 upstream.go:121] start upstream controller
I0527 10:09:11.924428 1 core.go:24] Starting module synccontroller
I0527 10:09:11.924451 1 core.go:24] Starting module cloudStream
I0527 10:09:11.924497 1 downstream.go:870] Start downstream devicecontroller
I0527 10:09:11.925257 1 downstream.go:566] start downstream controller
E0527 10:09:11.998958 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)
***E0527 10:09:12.096627 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)***
I0527 10:09:12.124506 1 server.go:243] Ca and CaKey don't exist in local directory, and will read from the secret
I0527 10:09:12.126786 1 server.go:247] Ca and CaKey don't exist in the secret, and will be created by CloudCore
I0527 10:09:12.201854 1 server.go:288] CloudCoreCert and key don't exist in local directory, and will read from the secret
I0527 10:09:12.203132 1 server.go:292] CloudCoreCert and key don't exist in the secret, and will be signed by CA
I0527 10:09:12.207185 1 tunnelserver.go:136] Succeed in loading TunnelCA from CloudHub
I0527 10:09:12.207531 1 tunnelserver.go:149] Succeed in loading TunnelCert and Key from CloudHub
I0527 10:09:12.207700 1 tunnelserver.go:169] Prepare to start tunnel server ...
I0527 10:09:12.209109 1 streamserver.go:280] Prepare to start stream server ...
I0527 10:09:12.210027 1 signcerts.go:100] Succeed to creating token
I0527 10:09:12.210065 1 server.go:44] start unix domain socket server
I0527 10:09:12.210225 1 uds.go:71] listening on: //var/lib/kubeedge/kubeedge.sock
I0527 10:09:12.210407 1 server.go:64] Starting cloudhub websocket server
***E0527 10:09:13.124710 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)
E0527 10:09:13.450656 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)***
I0527 10:09:13.924726 1 upstream.go:63] Start upstream devicecontroller
***E0527 10:09:15.639730 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)
E0527 10:09:16.432358 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)***
```
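The repeating `Failed to watch *v1alpha2.Device / *v1alpha2.DeviceModel ... the server could not find the requested resource` entries suggest the KubeEdge device CRDs are not installed in the cluster. A way to check, and if confirmed to install them (the URLs are an assumption based on the `kubeedge/kubeedge` repo's release-1.7 layout; verify them against your KubeEdge version first):

```shell
kubectl get crd devices.devices.kubeedge.io devicemodels.devices.kubeedge.io
# If they are missing, apply the CRDs matching the KubeEdge version:
kubectl apply -f https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.7/build/crds/devices/devices_v1alpha2_device.yaml
kubectl apply -f https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.7/build/crds/devices/devices_v1alpha2_devicemodel.yaml
```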
Problem 2: errors when the edge node joins
Using the command generated by KubeSphere for adding an edge node (with the address changed to the internal IP), the following was executed on the edge node to join the cluster:
```shell
arch=$(uname -m)
curl -LO https://kubeedge.pek3b.qingstor.com/bin/v1.7.2/$arch/keadm-v1.7.2-linux-$arch.tar.gz \
  && tar xvf keadm-v1.7.2-linux-$arch.tar.gz \
  && chmod +x keadm \
  && ./keadm join --kubeedge-version=1.7.2 --region=zh \
       --cloudcore-ipport=192.168.10.40:30000 \
       --quicport 30001 --certport 30002 --tunnelport 30004 \
       --edgenode-name edge1 --edgenode-ip 192.168.10.141 \
       --token e467ec90405bd002fcbda2594c62683f0e2ed3694ccd6e07439fb1d8be94572e.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NTM3MDM3NTJ9.nhjRfMZdj17wzLd8zsnHto9aEqGESTEwM2BWT-mQxxk \
       --with-edge-taint
```
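As an aside, the `--token` value embeds a JWT, and a join that fails with an authentication error is often just an expired token. The expiry can be inspected locally by base64-decoding the token's middle segment, e.g. for the token above:

```shell
# Payload (second dot-separated segment) of the join token above.
# base64url data may need '=' padding; this segment's length is already a multiple of 4.
payload="eyJleHAiOjE2NTM3MDM3NTJ9"
echo "$payload" | base64 -d && echo
# Prints {"exp":1653703752} - a unix timestamp to compare against `date +%s`.
```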
After running this command, the edge node joins the cluster successfully, but its logs show several errors, and I am not sure whether they matter:
Error 1:
```
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998159 6149 core.go:24] Starting module edgemesh
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998240 6149 core.go:24] Starting module metaManager
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998307 6149 core.go:24] Starting module edgestream
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998390 6149 core.go:24] Starting module twin
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998480 6149 core.go:24] Starting module edged
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998537 6149 edged.go:290] Starting edged...
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998642 6149 http.go:40] tlsConfig InsecureSkipVerify true
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998502 6149 process.go:113] Begin to sync sqlite
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998554 6149 core.go:24] Starting module websocket
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.999086 6149 core.go:24] Starting module eventbus
***May 27 02:27:33 k8s-test-edge2 edgecore[6149]: E0527 02:27:33.998678 6149 csi_plugin.go:226] kubernetes.io/csi: CSIDriverLister not found on KubeletVolumeHost***
May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.999471 6149 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000315 6149 client.go:86] parsed scheme: "unix"
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000332 6149 client.go:86] scheme "unix" not registered, fallback to default scheme
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000382 6149 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000389 6149 clientconn.go:948] ClientConn switching balancer to "pick_first"
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000823 6149 common.go:96] start connect to mqtt server with client id: hub-client-sub-1653618454
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000855 6149 common.go:98] client hub-client-sub-1653618454 isconnected: false
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998396 6149 log.go:181] DEBUG: Installed strategy plugin: [RoundRobin].
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001331 6149 log.go:181] DEBUG: ConfigurationFactory Initiated
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001354 6149 log.go:181] INFO: Configuration files: []
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001403 6149 log.go:181] WARN: empty configurtion from [FileSource]
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001420 6149 log.go:181] INFO: invoke dynamic handler:FileSource
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001463 6149 log.go:181] INFO: archaius init success
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001847 6149 log.go:181] INFO: create new watcher
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.003303 6149 client.go:150] finish hub-client sub
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.003342 6149 common.go:96] start connect to mqtt server with client id: hub-client-pub-1653618454
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.003357 6149 common.go:98] client hub-client-pub-1653618454 isconnected: false
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010648 6149 client.go:166] finish hub-client pub
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010671 6149 eventbus.go:63] Init Sub And Pub Client for externel mqtt broker tcp://127.0.0.1:1883 successfully
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010707 6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/upload/#
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010924 6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/device/+/state/update
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011051 6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/device/+/twin/+
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011163 6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/node/+/membership/get
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011272 6149 client.go:91] edge-hub-cli subscribe topic to SYS/dis/upload_records
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011406 6149 client.go:91] edge-hub-cli subscribe topic to +/user/#
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011506 6149 client.go:99] list edge-hub-cli-topics status, no record, skip sync
***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.014687 6149 proxy.go:143] [EdgeMesh] open file /run/edgemesh-iptables err: open /run/edgemesh-iptables: no such file or directory***
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.018287 6149 proxy.go:95] [EdgeMesh] chain EDGE-MESH not exists
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.024132 6149 proxy.go:103] [EdgeMesh] inbound rule -p tcp -d 9.251.0.0/16 -i docker0 -j EDGE-MESH not exists
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.034342 6149 certmanager.go:159] Certificate rotation is enabled.
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.034366 6149 websocket.go:51] Websocket start to connect Access
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.035274 6149 proxy.go:111] [EdgeMesh] outbound rule -p tcp -d 9.251.0.0/16 -o docker0 -j EDGE-MESH not exists
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.058096 6149 proxy.go:119] [EdgeMesh] dnat rule -p tcp -j DNAT --to-destination 172.17.0.1:40001 not exists
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.058842 6149 ws.go:46] dial wss://192.168.10.40:30000/e632aba927ea4ac2b575ec1603d56f10/edge1/events successfully
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059074 6149 websocket.go:93] Websocket connect to cloud access successful
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059368 6149 process.go:513] node connection event occur: cloud_connected
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: W0527 02:27:34.059464 6149 eventbus.go:148] Action not found
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059541 6149 process.go:513] node connection event occur: cloud_connected
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059601 6149 process.go:282] DeviceTwin receive msg
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059671 6149 process.go:66] Send msg to the CommModule module in twin
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.111086 6149 cpu_manager.go:184] [cpumanager] starting with none policy
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.111469 6149 cpu_manager.go:185] [cpumanager] reconciling every 0s
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.111640 6149 state_mem.go:36] [cpumanager] initializing new in-memory state store
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.113127 6149 policy_none.go:43] [cpumanager] none policy: Start
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.139178 6149 record.go:19] Normal NodeAllocatableEnforced Updated Node Allocatable limit across pods
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.139511 6149 volume_manager.go:265] Starting Kubelet Volume Manager
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.141891 6149 desired_state_of_world_populator.go:139] Desired state populator starts to run
***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.153820 6149 imitator.go:222] failed to unmarshal message content to unstructured obj: Object 'Kind' is missing in '{"metadata":{"name":"edge1","creationTimestamp":null,"labels":{"kubernetes.io/arch":"amd64","kubernetes.io/hostname":"edge1","kubernetes.io/os":"linux","node-role.kubernetes.io/agent":"","node-role.kubernetes.io/edge":""}},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}'***
```
Error 2:
```
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156121 6149 status_manager.go:53] Starting to sync pod status with apiserver
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156307 6149 edged.go:890] start pod addition queue work 0
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156592 6149 edged.go:890] start pod addition queue work 1
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156692 6149 edged.go:890] start pod addition queue work 2
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156755 6149 edged.go:890] start pod addition queue work 3
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156819 6149 edged.go:890] start pod addition queue work 4
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156895 6149 edged.go:356] starting plugin manager
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156896 6149 server.go:35] starting to listen read-only on 127.0.0.1:10350
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.157240 6149 plugin_manager.go:114] Starting Kubelet Plugin Manager
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.158731 6149 server.go:425] Adding debug handlers to kubelet server.
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.159897 6149 edged_status.go:390] Attempting to register node edge1
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160798 6149 cpu_manager.go:184] [cpumanager] starting with none policy
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160809 6149 cpu_manager.go:185] [cpumanager] reconciling every 1s
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160823 6149 state_mem.go:36] [cpumanager] initializing new in-memory state store
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160924 6149 state_mem.go:88] [cpumanager] updated default cpuset: ""
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160931 6149 state_mem.go:96] [cpumanager] updated cpuset assignments: "map[]"
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160941 6149 policy_none.go:43] [cpumanager] none policy: Start
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160957 6149 edged.go:368] starting syncPod
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.170464 6149 edged_status.go:409] Successfully registered node edge1
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.179008 6149 edged_status.go:198] Sync VolumesInUse: []
***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.211687 6149 imitator.go:222] failed to unmarshal message content to unstructured obj: json: cannot unmarshal array into Go value of type map[string]interface {}***
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.258100 6149 listener.go:316] [EdgeMesh] update services: 50 resource: namespace/servicelist/service
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.258392 6149 listener.go:327] [EdgeMesh] update svc kubesphere-logging-system.ks-events-ruler in cache
***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.259566 6149 imitator.go:222] failed to unmarshal message content to unstructured obj: json: cannot unmarshal array into Go value of type map[string]interface {}
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.263035 6149 imitator.go:222] failed to unmarshal message content to unstructured obj: json: cannot unmarshal array into Go value of type map[string]interface {}
May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.359786 6149 imitator.go:222] failed to unmarshal message content to unstructured obj: Object 'Kind' is missing in 'null'***
```
Problem 3: pods fail when deployed to the edge node
Description:
The application runs successfully when created on the in-cluster nodes;
When the same application is scheduled onto the edge node, it fails to run:
cloudcore logs:
```
I0527 10:27:37.900475 1 session.go:125] Add a new apiserver connection APIServer_MetricsConnection MessageID 3 in to Tunnel session [edge1]
I0527 10:27:37.908200 1 containermetrics_connection.go:117] APIServer_MetricsConnection MessageID 3 find edge peer done, so stop this connection
I0527 10:27:37.908216 1 containermetrics_connection.go:93] APIServer_MetricsConnection MessageID 3 end successful
I0527 10:27:37.908223 1 session.go:133] Delete a apiserver connection APIServer_MetricsConnection MessageID 3 from Tunnel session [edge1]
I0527 10:27:37.908227 1 streamserver.go:189] Delete APIServer_MetricsConnection MessageID 3 from Tunnel session [edge1]
I0527 10:27:40.126371 1 upstream.go:88] Dispatch message: 5f765cde-43e3-4fad-b4a9-4dbe0d4fa21f
W0527 10:27:40.127293 1 upstream.go:92] Parse message: 5f765cde-43e3-4fad-b4a9-4dbe0d4fa21f resource type with error: unknown resource
***E0527 10:27:49.999931 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)
E0527 10:28:02.091135 1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)***
I0527 10:28:27.573267 1 session.go:125] Add a new apiserver connection APIServer_LogsConnection MessageID 4 in to Tunnel session [edge1]
I0527 10:28:27.577977 1 containerlog_connection.go:116] APIServer_LogsConnection MessageID 4 find edge peer done, so stop this connection
I0527 10:28:27.577988 1 containerlog_connection.go:92] APIServer_LogsConnection MessageID 4 end successful
I0527 10:28:27.577995 1 session.go:133] Delete a apiserver connection APIServer_LogsConnection MessageID 4 from Tunnel session [edge1]
I0527 10:28:27.577999 1 streamserver.go:139] Delete APIServer_LogsConnection MessageID 4 from Tunnel session [edge1]
***E0527 10:28:29.251566 1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:default,name:pvc-a1e452be-a4a5-451a-8d1f-3569a8289835), default "pvc-a1e452be-a4a5-451a-8d1f-3569a8289835" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "default" in API group "" at the cluster scope
E0527 10:28:29.257808 1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:edge-edgex1,name:consul-config), edge-edgex1 "consul-config" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "edge-edgex1" in API group "" at the cluster scope
E0527 10:28:29.262112 1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:edge-edgex1,name:consul-data), edge-edgex1 "consul-data" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "edge-edgex1" in API group "" at the cluster scope
E0527 10:28:29.274941 1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:edge-edgex1,name:db-data), edge-edgex1 "db-data" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "edge-edgex1" in API group "" at the cluster scope
E0527 10:28:29.303132 1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:default,name:pvc-ec504cbf-a4b7-4a04-9b38-932c50bf2607), default "pvc-ec504cbf-a4b7-4a04-9b38-932c50bf2607" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "default" in API group "" at the cluster scope***
```
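The `failed to get obj(gvr:/, Resource=, ...) ... is forbidden` entries show cloudcore issuing requests with an empty group/version/resource, so the API server ends up treating the namespace as the resource name. These appear to come from stale ObjectSync records (KubeEdge's reliable-sync bookkeeping); one way to inspect them (a sketch; confirm an entry is actually stale before deleting anything):

```shell
kubectl get objectsyncs.reliablesyncs.kubeedge.io -A
# kubectl -n <namespace> delete objectsync <name>   # only for entries confirmed stale
```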
edgecore logs on the edge node:
```
May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.184467 6149 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.184817 6149 record.go:24] Warning MissingClusterDNS pod: "edgex-core-metadata-78788c8c48-2cmkz_edge-edgex1(eb4a0914-f8c4-4a23-9ea4-b8c077dc9a7d)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.184953 6149 edged.go:1015] consume added pod [edgex-core-metadata-78788c8c48-2cmkz] successfully
May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.189078 6149 edged.go:900] worker [2] get pod addition item [edgex-support-notifications-598f7f85d-q6bfw]
***May 27 03:03:09 k8s-test-edge2 edgecore[6149]: E0527 03:03:09.189313 6149 edged.go:903] consume pod addition backoff: Back-off consume pod [edgex-support-notifications-598f7f85d-q6bfw] addition error, backoff: [20s]***
May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.189413 6149 edged.go:905] worker [2] backoff pod addition item [edgex-support-notifications-598f7f85d-q6bfw] failed, re-add to queue
May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.344129 6149 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-api-access-5pbcc
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.007517 6149 edged.go:900] worker [3] get pod addition item [edgex-support-scheduler-5f9c499574-4jj64]
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.007606 6149 edged.go:968] start to consume added pod [edgex-support-scheduler-5f9c499574-4jj64]
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.008306 6149 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.008359 6149 record.go:24] Warning MissingClusterDNS pod: "edgex-support-scheduler-5f9c499574-4jj64_edge-edgex1(da0167dc-a3cb-4760-842e-ee4601a139f7)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.009382 6149 record.go:24] Warning BackOff Back-off restarting failed container
***May 27 03:03:10 k8s-test-edge2 edgecore[6149]: E0527 03:03:10.009558 6149 edged.go:919] worker [3] handle pod addition item [edgex-support-scheduler-5f9c499574-4jj64] failed: failed to "StartContainer" for "edgex-support-scheduler" with CrashLoopBackOff: "back-off 5m0s restarting failed container=edgex-support-scheduler pod=edgex-support-scheduler-5f9c499574-4jj64_edge-edgex1(da0167dc-a3cb-4760-842e-ee4601a139f7)", re-add to queue***
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.146028 6149 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-api-access-zv6l9
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.720984 6149 edged.go:900] worker [1] get pod addition item [edgex-core-data-5bb8bcc584-95zzk]
***May 27 03:03:10 k8s-test-edge2 edgecore[6149]: E0527 03:03:10.721058 6149 edged.go:903] consume pod addition backoff: Back-off consume pod [edgex-core-data-5bb8bcc584-95zzk] addition error, backoff: [1m20s]***
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.721118 6149 edged.go:905] worker [1] backoff pod addition item [edgex-core-data-5bb8bcc584-95zzk] failed, re-add to queue
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.727149 6149 edged.go:900] worker [4] get pod addition item [edgex-core-command-6fb8d849bc-q9qvr]
***May 27 03:03:10 k8s-test-edge2 edgecore[6149]: E0527 03:03:10.727166 6149 edged.go:903] consume pod addition backoff: Back-off consume pod [edgex-core-command-6fb8d849bc-q9qvr] addition error, backoff: [20s]***
May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.727184 6149 edged.go:905] worker [4] backoff pod addition item [edgex-core-command-6fb8d849bc-q9qvr] failed, re-add to queue
May 27 03:03:11 k8s-test-edge2 edgecore[6149]: I0527 03:03:11.327897 6149 edged_status.go:198] Sync VolumesInUse: []
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.945156 6149 edged.go:900] worker [0] get pod addition item [edgex-sys-mgmt-agent-76698f698-g2n7w]
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.945856 6149 edged.go:968] start to consume added pod [edgex-sys-mgmt-agent-76698f698-g2n7w]
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.946594 6149 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.946878 6149 record.go:24] Warning MissingClusterDNS pod: "edgex-sys-mgmt-agent-76698f698-g2n7w_edge-edgex1(5e5563dd-1d8e-48eb-aab6-708f4c12d1d9)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949592 6149 record.go:19] Normal Pulled Container image "edgexfoundry/sys-mgmt-agent:2.1.0" already present on machine
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949665 6149 edged_pods.go:321] container: edge-edgex1/edgex-sys-mgmt-agent-76698f698-g2n7w/edgex-sys-mgmt-agent podIP: "172.17.0.13" creating hosts mount: true
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949693 6149 edged_pods.go:403] Pod "edgex-sys-mgmt-agent-76698f698-g2n7w_edge-edgex1(5e5563dd-1d8e-48eb-aab6-708f4c12d1d9)" container "edgex-sys-mgmt-agent" mount "system-claim0" has propagation "PROPAGATION_HOST_TO_CONTAINER"
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949721 6149 edged_pods.go:403] Pod "edgex-sys-mgmt-agent-76698f698-g2n7w_edge-edgex1(5e5563dd-1d8e-48eb-aab6-708f4c12d1d9)" container "edgex-sys-mgmt-agent" mount "kube-api-access-f7s2p" has propagation "PROPAGATION_HOST_TO_CONTAINER"
May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.978162 6149 record.go:19] Normal Created Created container edgex-sys-mgmt-agent
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.053832 6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: I0527 03:03:19.058828 6149 record.go:19] Normal Started Started container edgex-sys-mgmt-agent
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: I0527 03:03:19.058926 6149 edged.go:1015] consume added pod [edgex-sys-mgmt-agent-76698f698-g2n7w] successfully
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: E0527 03:03:19.067279 6149 dns.go:290] [EdgeMesh] service edgex-core-consul is not found in this cluster
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.067297 6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.067311 6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: I0527 03:03:19.069355 6149 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-api-access-f7s2p
***May 27 03:03:19 k8s-test-edge2 edgecore[6149]: E0527 03:03:19.078343 6149 dns.go:290] [EdgeMesh] service edgex-core-consul is not found in this cluster***
May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.078520 6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
May 27 03:03:20 k8s-test-edge2 edgecore[6149]: I0527 03:03:20.721558 6149 edged.go:900] worker [1] get pod addition item [edgex-sys-mgmt-agent-76698f698-g2n7w]
May 27 03:03:20 k8s-test-edge2 edgecore[6149]: I0527 03:03:20.721595 6149 edged.go:968] start to consume added pod [edgex-sys-mgmt-agent-76698f698-g2n7w]
```
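The repeated `MissingClusterDNS` warnings mean edged has no cluster DNS configured, so pods fall back to the node's resolver and in-cluster service names such as `edgex-core-consul` cannot resolve. In KubeEdge 1.7 this is set in the edge node's edgecore configuration (fragment sketch; the path and the DNS address are examples, not values taken from this cluster), followed by a restart of edgecore:

```yaml
# /etc/kubeedge/config/edgecore.yaml (fragment)
modules:
  edged:
    clusterDNS: "169.254.96.16"     # e.g. the edge DNS address in use
    clusterDomain: "cluster.local"
```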
Problem 4: after the physical host reboots, log/exec/metrics on the edge node no longer work
- After rebooting the server that hosts the KubeSphere VMs and the edge-node VM, the edge node is still visible in KubeSphere, but log, exec, and metrics no longer work; even CPU and memory figures are not displayed. Attempting log or exec on a pod on the edge node fails with: ip:port connect refused;
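When this happens, a first check is whether the cloudcore stream tunnel came back up after the reboot; a `connect refused` usually points at the NodePorts (30000-30004 in this setup) not serving, or at edgecore not having reconnected. Command sketch:

```shell
# On a master node: is cloudcore running and are its NodePorts serving?
kubectl -n kubeedge get pods -o wide
ss -tlnp | grep -E ':3000[0-4]'
# On the edge node: has edgecore re-established the websocket tunnel?
systemctl status edgecore
journalctl -u edgecore --since -10m | grep -iE 'connect|tunnel|stream'
```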