Since posts are limited to 65,535 characters, this one cannot cover every detail; some content has been trimmed.

Environment Information

Physical Environment

  • Five virtual machines hosted on a single physical server:
  • 2 master nodes, 4 vCPUs / 8 GB RAM each;
  • 3 worker nodes, 8 vCPUs / 16 GB RAM each;

Operating System

  • CentOS 7.9

Installation Information

Docker Version

Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
 containerd:
  Version:          1.4.12
 runc:
  Version:          1.0.2

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:10:45Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:04:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

KubeSphere Version and Installation Method

  • Version: v3.2.1

  • Installation method:

    • Installed on a clean CentOS system using the kk All-in-One method, with the following kk configuration:

        apiVersion: kubekey.kubesphere.io/v1alpha2
        kind: Cluster
        metadata:
          name: sample
        spec:
          hosts:
          - {name: k8s-test-master1, address: 192.168.10.10, internalAddress: 192.168.10.10, user: root, password: "root"}
          - {name: k8s-test-master2, address: 192.168.10.11, internalAddress: 192.168.10.11, user: root, password: "root"}
          - {name: k8s-test-worker1, address: 192.168.10.12, internalAddress: 192.168.10.12, user: root, password: "root"}
          - {name: k8s-test-worker2, address: 192.168.10.13, internalAddress: 192.168.10.13, user: root, password: "root"}
          - {name: k8s-test-worker3, address: 192.168.10.14, internalAddress: 192.168.10.14, user: root, password: "root"}
          roleGroups:
            etcd:
            - k8s-test-master1
            - k8s-test-master2   
            control-plane: 
            - k8s-test-master1
            - k8s-test-master2   
            worker:
            - k8s-test-worker1
            - k8s-test-worker2
            - k8s-test-worker3
          controlPlaneEndpoint:
            internalLoadbalancer: haproxy
            domain: lb.kubesphere.local
            address: ""
            port: 6443
          kubernetes:
            version: v1.21.5
            clusterName: cluster.local
            autoRenewCerts: true
          etcd:
            type: kubekey
          network:
            plugin: calico
            kubePodsCIDR: 10.233.64.0/18
            kubeServiceCIDR: 10.233.0.0/18
            multusCNI:
              enabled: false
          registry:
            plainHTTP: false
            privateRegistry: ""
            namespaceOverride: ""
            registryMirrors: ["https://registry.docker-cn.com", "https://docker.mirrors.ustc.edu.cn", "http://hub-mirror.c.163.com"]
          addons: []
        
        ---
        apiVersion: installer.kubesphere.io/v1alpha1
        kind: ClusterConfiguration
        metadata:
          name: ks-installer
          namespace: kubesphere-system
          labels:
            version: v3.2.1
        spec:
          persistence:
            storageClass: ""
          authentication:
            jwtSecret: ""
          zone: ""
          local_registry: ""
          namespace_override: ""
          etcd:
            monitoring: true
            endpointIps: localhost
            port: 2379
            tlsEnable: true
          common:
            core:
              console:
                enableMultiLogin: true
                port: 30880
                type: NodePort
            redis:
              enabled: false
              volumeSize: 2Gi
            openldap:
              enabled: false
              volumeSize: 2Gi
            minio:
              volumeSize: 20Gi
            monitoring:
              endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
              GPUMonitoring:
                enabled: false
            gpu:
              kinds:
              - resourceName: "nvidia.com/gpu"
                resourceType: "GPU"
                default: true
            es:
              logMaxAge: 7
              elkPrefix: logstash
              basicAuth:
                enabled: false
                username: ""
                password: ""
              externalElasticsearchHost: ""
              externalElasticsearchPort: ""
          alerting:
            enabled: true
          auditing:
            enabled: true
          devops:
            enabled: true
            jenkinsMemoryLim: 2Gi
            jenkinsMemoryReq: 1500Mi
            jenkinsVolumeSize: 8Gi
            jenkinsJavaOpts_Xms: 512m
            jenkinsJavaOpts_Xmx: 512m
            jenkinsJavaOpts_MaxRAM: 2g
          events:
            enabled: true
          logging:
            enabled: true
            containerruntime: docker
            logsidecar:
              enabled: true
              replicas: 2
          metrics_server:
            enabled: true
          monitoring:
            storageClass: ""
            gpu:
              nvidia_dcgm_exporter:
                enabled: false
                # resources: {}
          multicluster:
            clusterRole: none
          network:
            networkpolicy:
              enabled: false
            ippool:
              type: calico
            topology:
              type: weave scope
          openpitrix:
            store:
              enabled: true
          servicemesh:
            enabled: true
          kubeedge:
            enabled: true   
            cloudCore:
              nodeSelector: {"node-role.kubernetes.io/worker": ""}
              tolerations: []
              cloudhubPort: "10000"
              cloudhubQuicPort: "10001"
              cloudhubHttpsPort: "10002"
              cloudstreamPort: "10003"
              tunnelPort: "10004"
              cloudHub:
                advertiseAddress:
                  - "192.168.10.10"
                  - "192.168.10.11"
                nodeLimit: "800"
              service:
                cloudhubNodePort: "30000"
                cloudhubQuicNodePort: "30001"
                cloudhubHttpsNodePort: "30002"
                cloudstreamNodePort: "30003"
                tunnelNodePort: "30004"
            edgeWatcher:
              nodeSelector: {"node-role.kubernetes.io/worker": ""}
              tolerations: []
              edgeWatcherAgent:
                nodeSelector: {"node-role.kubernetes.io/worker": ""}
                tolerations: []
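
    • For reference, a configuration like the one above is typically applied with kk roughly as follows (a sketch; the file name is assumed):

        ./kk create cluster -f config-sample.yaml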

KubeEdge Version

v1.7.2

Problem Description

Background

  • The edge node is another virtual machine on the same server, used for edge-node testing;

  • Tolerations were added so that the iptables pods can also run on the master nodes, which fixed being unable to run log/exec against edge nodes from KubeSphere:

      kubectl edit iptables -n kubeedge
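
  • What that edit amounts to, expressed as a patch (a sketch, assuming the workload is a DaemonSet named iptables in the kubeedge namespace; the added toleration targets the master NoSchedule taint):

      kubectl -n kubeedge patch daemonset iptables --type merge -p \
        '{"spec":{"template":{"spec":{"tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]}}}}'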

  • The following script was run on a master node to patch workloads so that pods with broad tolerations, such as calico and kube-proxy, are no longer scheduled onto the edge node:

      #!/bin/bash
      # Patch the controllers of all pods currently scheduled onto the given edge
      # node so that they require node-role.kubernetes.io/edge to be absent.
      # Takes the edge node name as $1, defaulting to "edge1".

      # Alternative approach (kept for reference, not used below): pin pods to
      # master/worker nodes via a nodeSelector instead of node affinity.
      NodeSelectorPatchJson='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/master": "","node-role.kubernetes.io/worker": ""}}}}}'
      NoSchedulePatchJson='{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"node-role.kubernetes.io/edge","operator":"DoesNotExist"}]}]}}}}}}}'

      edgenode="edge1"
      if [ -n "$1" ]; then
          edgenode="$1"
      fi

      # Collect the namespace and name of every pod currently on the edge node.
      namespaces=($(kubectl get pods -A -o wide | egrep -i "$edgenode" | awk '{print $1}'))
      pods=($(kubectl get pods -A -o wide | egrep -i "$edgenode" | awk '{print $2}'))
      length=${#namespaces[@]}

      for ((i = 0; i < length; i++)); do
          ns=${namespaces[$i]}
          pod=${pods[$i]}
          # Find the owning controller (e.g. DaemonSet/calico-node) and patch it.
          resources=$(kubectl -n "$ns" describe pod "$pod" | grep "Controlled By" | awk '{print $3}')
          echo "Patching for ns: $ns, resources: $resources"
          kubectl -n "$ns" patch "$resources" --type merge --patch "$NoSchedulePatchJson"
          sleep 1
      done
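
  • Usage sketch for the script above (the file name is hypothetical; the first argument is the edge node name and defaults to edge1):

      ./patch-edge-workloads.sh edge1
      # afterwards, confirm only intended pods remain scheduled on the edge node
      kubectl get pods -A -o wide | grep -i edge1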
  • At this point, log, exec, and metrics all work successfully against the edge node;

Problem 1: cloudcore errors after installing KubeSphere

  • At this point no edge node had been joined yet, and it is unclear whether these errors matter. The cloudcore log is as follows:

      W0527 10:09:10.897412       1 validation.go:168] TLSTunnelPrivateKeyFile does not exist in /etc/kubeedge/certs/server.key, will load from secret
      W0527 10:09:10.897547       1 validation.go:171] TLSTunnelCertFile does not exist in /etc/kubeedge/certs/server.crt, will load from secret
      W0527 10:09:10.897556       1 validation.go:174] TLSTunnelCAFile does not exist in /etc/kubeedge/ca/rootCA.crt, will load from secret
      I0527 10:09:10.897566       1 server.go:73] Version: v1.7.2
      W0527 10:09:10.897580       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
      I0527 10:09:11.910684       1 module.go:34] Module cloudhub registered successfully
      I0527 10:09:11.922923       1 module.go:34] Module edgecontroller registered successfully
      I0527 10:09:11.923136       1 module.go:34] Module devicecontroller registered successfully
      I0527 10:09:11.923769       1 module.go:34] Module synccontroller registered successfully
      I0527 10:09:11.924170       1 module.go:34] Module cloudStream registered successfully
      W0527 10:09:11.924183       1 module.go:37] Module router is disabled, do not register
      W0527 10:09:11.924188       1 module.go:37] Module dynamiccontroller is disabled, do not register
      I0527 10:09:11.924299       1 core.go:24] Starting module cloudhub
      I0527 10:09:11.924344       1 core.go:24] Starting module edgecontroller
      I0527 10:09:11.924389       1 core.go:24] Starting module devicecontroller
      I0527 10:09:11.924408       1 upstream.go:121] start upstream controller
      I0527 10:09:11.924428       1 core.go:24] Starting module synccontroller
      I0527 10:09:11.924451       1 core.go:24] Starting module cloudStream
      I0527 10:09:11.924497       1 downstream.go:870] Start downstream devicecontroller
      I0527 10:09:11.925257       1 downstream.go:566] start downstream controller
      E0527 10:09:11.998958       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)
      ***E0527 10:09:12.096627       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)***
      I0527 10:09:12.124506       1 server.go:243] Ca and CaKey don't exist in local directory, and will read from the secret
      I0527 10:09:12.126786       1 server.go:247] Ca and CaKey don't exist in the secret, and will be created by CloudCore
      I0527 10:09:12.201854       1 server.go:288] CloudCoreCert and key don't exist in local directory, and will read from the secret
      I0527 10:09:12.203132       1 server.go:292] CloudCoreCert and key don't exist in the secret, and will be signed by CA
      I0527 10:09:12.207185       1 tunnelserver.go:136] Succeed in loading TunnelCA from CloudHub
      I0527 10:09:12.207531       1 tunnelserver.go:149] Succeed in loading TunnelCert and Key from CloudHub
      I0527 10:09:12.207700       1 tunnelserver.go:169] Prepare to start tunnel server ...
      I0527 10:09:12.209109       1 streamserver.go:280] Prepare to start stream server ...
      I0527 10:09:12.210027       1 signcerts.go:100] Succeed to creating token
      I0527 10:09:12.210065       1 server.go:44] start unix domain socket server
      I0527 10:09:12.210225       1 uds.go:71] listening on: //var/lib/kubeedge/kubeedge.sock
      I0527 10:09:12.210407       1 server.go:64] Starting cloudhub websocket server
      ***E0527 10:09:13.124710       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)
      E0527 10:09:13.450656       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)***
      I0527 10:09:13.924726       1 upstream.go:63] Start upstream devicecontroller
      ***E0527 10:09:15.639730       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)
      E0527 10:09:16.432358       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)***
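
  • The "the server could not find the requested resource (get devices.devices.kubeedge.io)" errors usually mean the KubeEdge device CRDs are not installed. A hedged check and fix (the URLs assume the standard layout of the KubeEdge repo on its release-1.7 branch):

      kubectl get crd | grep devices.kubeedge.io
      # if nothing is returned, install the Device / DeviceModel CRDs:
      kubectl apply -f https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.7/build/crds/devices/devices_v1alpha2_device.yaml
      kubectl apply -f https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.7/build/crds/devices/devices_v1alpha2_devicemodel.yaml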

Problem 2: Errors when the edge node joins

  • Using the join command generated by KubeSphere (modified to use the internal IP), the following was executed on the edge node to join the cluster:

      arch=$(uname -m); curl -LO https://kubeedge.pek3b.qingstor.com/bin/v1.7.2/$arch/keadm-v1.7.2-linux-$arch.tar.gz  && tar xvf keadm-v1.7.2-linux-$arch.tar.gz && chmod +x keadm && ./keadm join --kubeedge-version=1.7.2 --region=zh --cloudcore-ipport=192.168.10.40:30000 --quicport 30001 --certport 30002 --tunnelport 30004 --edgenode-name edge1 --edgenode-ip 192.168.10.141 --token e467ec90405bd002fcbda2594c62683f0e2ed3694ccd6e07439fb1d8be94572e.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NTM3MDM3NTJ9.nhjRfMZdj17wzLd8zsnHto9aEqGESTEwM2BWT-mQxxk --with-edge-taint
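
  • Whether the join actually succeeded can be confirmed from a master node (node name as in the command above):

      kubectl get node edge1 -o wide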
  • After running the command above, the edge node joins the cluster successfully, but the edge node's logs contain several errors, and it is unclear whether they matter:

    • Error 1:

          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998159    6149 core.go:24] Starting module edgemesh
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998240    6149 core.go:24] Starting module metaManager
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998307    6149 core.go:24] Starting module edgestream
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998390    6149 core.go:24] Starting module twin
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998480    6149 core.go:24] Starting module edged
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998537    6149 edged.go:290] Starting edged...
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998642    6149 http.go:40] tlsConfig InsecureSkipVerify true
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998502    6149 process.go:113] Begin to sync sqlite
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998554    6149 core.go:24] Starting module websocket
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.999086    6149 core.go:24] Starting module eventbus
          ***May 27 02:27:33 k8s-test-edge2 edgecore[6149]: E0527 02:27:33.998678    6149 csi_plugin.go:226] kubernetes.io/csi: CSIDriverLister not found on KubeletVolumeHost***
          May 27 02:27:33 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.999471    6149 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000315    6149 client.go:86] parsed scheme: "unix"
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000332    6149 client.go:86] scheme "unix" not registered, fallback to default scheme
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000382    6149 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000389    6149 clientconn.go:948] ClientConn switching balancer to "pick_first"
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000823    6149 common.go:96] start connect to mqtt server with client id: hub-client-sub-1653618454
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.000855    6149 common.go:98] client hub-client-sub-1653618454 isconnected: false
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:33.998396    6149 log.go:181] DEBUG: Installed strategy plugin: [RoundRobin].
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001331    6149 log.go:181] DEBUG: ConfigurationFactory Initiated
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001354    6149 log.go:181] INFO: Configuration files: []
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001403    6149 log.go:181] WARN: empty configurtion from [FileSource]
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001420    6149 log.go:181] INFO: invoke dynamic handler:FileSource
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001463    6149 log.go:181] INFO: archaius init success
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.001847    6149 log.go:181] INFO: create new watcher
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.003303    6149 client.go:150] finish hub-client sub
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.003342    6149 common.go:96] start connect to mqtt server with client id: hub-client-pub-1653618454
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.003357    6149 common.go:98] client hub-client-pub-1653618454 isconnected: false
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010648    6149 client.go:166] finish hub-client pub
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010671    6149 eventbus.go:63] Init Sub And Pub Client for externel mqtt broker tcp://127.0.0.1:1883 successfully
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010707    6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/upload/#
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.010924    6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/device/+/state/update
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011051    6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/device/+/twin/+
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011163    6149 client.go:91] edge-hub-cli subscribe topic to $hw/events/node/+/membership/get
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011272    6149 client.go:91] edge-hub-cli subscribe topic to SYS/dis/upload_records
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011406    6149 client.go:91] edge-hub-cli subscribe topic to +/user/#
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.011506    6149 client.go:99] list edge-hub-cli-topics status, no record, skip sync
          ***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.014687    6149 proxy.go:143] [EdgeMesh] open file /run/edgemesh-iptables err: open /run/edgemesh-iptables: no such file or directory***
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.018287    6149 proxy.go:95] [EdgeMesh] chain EDGE-MESH not exists
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.024132    6149 proxy.go:103] [EdgeMesh] inbound rule -p tcp -d 9.251.0.0/16 -i docker0 -j EDGE-MESH not exists
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.034342    6149 certmanager.go:159] Certificate rotation is enabled.
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.034366    6149 websocket.go:51] Websocket start to connect Access
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.035274    6149 proxy.go:111] [EdgeMesh] outbound rule -p tcp -d 9.251.0.0/16 -o docker0 -j EDGE-MESH not exists
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.058096    6149 proxy.go:119] [EdgeMesh] dnat rule -p tcp -j DNAT --to-destination 172.17.0.1:40001 not exists
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.058842    6149 ws.go:46] dial wss://192.168.10.40:30000/e632aba927ea4ac2b575ec1603d56f10/edge1/events successfully
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059074    6149 websocket.go:93] Websocket connect to cloud access successful
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059368    6149 process.go:513] node connection event occur: cloud_connected
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: W0527 02:27:34.059464    6149 eventbus.go:148] Action not found
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059541    6149 process.go:513] node connection event occur: cloud_connected
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059601    6149 process.go:282] DeviceTwin receive msg
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.059671    6149 process.go:66] Send msg to the CommModule module in twin
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.111086    6149 cpu_manager.go:184] [cpumanager] starting with none policy
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.111469    6149 cpu_manager.go:185] [cpumanager] reconciling every 0s
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.111640    6149 state_mem.go:36] [cpumanager] initializing new in-memory state store
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.113127    6149 policy_none.go:43] [cpumanager] none policy: Start
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.139178    6149 record.go:19] Normal NodeAllocatableEnforced Updated Node Allocatable limit across pods
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.139511    6149 volume_manager.go:265] Starting Kubelet Volume Manager
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.141891    6149 desired_state_of_world_populator.go:139] Desired state populator starts to run
          ***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.153820    6149 imitator.go:222] failed to unmarshal message content to unstructured obj: Object 'Kind' is missing in '{"metadata":{"name":"edge1","creationTimestamp":null,"labels":{"kubernetes.io/arch":"amd64","kubernetes.io/hostname":"edge1","kubernetes.io/os":"linux","node-role.kubernetes.io/agent":"","node-role.kubernetes.io/edge":""}},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}'***
    • Error 2:

          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156121    6149 status_manager.go:53] Starting to sync pod status with apiserver
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156307    6149 edged.go:890] start pod addition queue work 0
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156592    6149 edged.go:890] start pod addition queue work 1
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156692    6149 edged.go:890] start pod addition queue work 2
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156755    6149 edged.go:890] start pod addition queue work 3
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156819    6149 edged.go:890] start pod addition queue work 4
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156895    6149 edged.go:356] starting plugin manager
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.156896    6149 server.go:35] starting to listen read-only on 127.0.0.1:10350
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.157240    6149 plugin_manager.go:114] Starting Kubelet Plugin Manager
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.158731    6149 server.go:425] Adding debug handlers to kubelet server.
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.159897    6149 edged_status.go:390] Attempting to register node edge1
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160798    6149 cpu_manager.go:184] [cpumanager] starting with none policy
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160809    6149 cpu_manager.go:185] [cpumanager] reconciling every 1s
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160823    6149 state_mem.go:36] [cpumanager] initializing new in-memory state store
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160924    6149 state_mem.go:88] [cpumanager] updated default cpuset: ""
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160931    6149 state_mem.go:96] [cpumanager] updated cpuset assignments: "map[]"
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160941    6149 policy_none.go:43] [cpumanager] none policy: Start
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.160957    6149 edged.go:368] starting syncPod
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.170464    6149 edged_status.go:409] Successfully registered node edge1
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.179008    6149 edged_status.go:198] Sync VolumesInUse: []
          ***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.211687    6149 imitator.go:222] failed to unmarshal message content to unstructured obj: json: cannot unmarshal array into Go value of type map[string]interface {}***
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.258100    6149 listener.go:316] [EdgeMesh] update services: 50 resource: namespace/servicelist/service
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: I0527 02:27:34.258392    6149 listener.go:327] [EdgeMesh] update svc kubesphere-logging-system.ks-events-ruler in cache
          ***May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.259566    6149 imitator.go:222] failed to unmarshal message content to unstructured obj: json: cannot unmarshal array into Go value of type map[string]interface {}
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.263035    6149 imitator.go:222] failed to unmarshal message content to unstructured obj: json: cannot unmarshal array into Go value of type map[string]interface {}
          May 27 02:27:34 k8s-test-edge2 edgecore[6149]: E0527 02:27:34.359786    6149 imitator.go:222] failed to unmarshal message content to unstructured obj: Object 'Kind' is missing in 'null'***

Problem 3: Pods fail to run when deployed to the edge node

  • Description:

    • The application runs successfully when created inside the cluster;

    • When the same application is scheduled to the edge node, it fails to run;

  • cloudcore log:

      I0527 10:27:37.900475       1 session.go:125] Add a new apiserver connection APIServer_MetricsConnection MessageID 3 in to Tunnel session [edge1]
      I0527 10:27:37.908200       1 containermetrics_connection.go:117] APIServer_MetricsConnection MessageID 3 find edge peer done, so stop this connection
      I0527 10:27:37.908216       1 containermetrics_connection.go:93] APIServer_MetricsConnection MessageID 3 end successful
      I0527 10:27:37.908223       1 session.go:133] Delete a apiserver connection APIServer_MetricsConnection MessageID 3 from Tunnel session [edge1]
      I0527 10:27:37.908227       1 streamserver.go:189] Delete APIServer_MetricsConnection MessageID 3 from Tunnel session [edge1]
      I0527 10:27:40.126371       1 upstream.go:88] Dispatch message: 5f765cde-43e3-4fad-b4a9-4dbe0d4fa21f
      W0527 10:27:40.127293       1 upstream.go:92] Parse message: 5f765cde-43e3-4fad-b4a9-4dbe0d4fa21f resource type with error: unknown resource
      ***E0527 10:27:49.999931       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.DeviceModel: failed to list *v1alpha2.DeviceModel: the server could not find the requested resource (get devicemodels.devices.kubeedge.io)
      E0527 10:28:02.091135       1 reflector.go:127] github.com/kubeedge/kubeedge/cloud/pkg/client/informers/externalversions/factory.go:119: Failed to watch *v1alpha2.Device: failed to list *v1alpha2.Device: the server could not find the requested resource (get devices.devices.kubeedge.io)***
      I0527 10:28:27.573267       1 session.go:125] Add a new apiserver connection APIServer_LogsConnection MessageID 4 in to Tunnel session [edge1]
      I0527 10:28:27.577977       1 containerlog_connection.go:116] APIServer_LogsConnection MessageID 4 find edge peer done, so stop this connection
      I0527 10:28:27.577988       1 containerlog_connection.go:92] APIServer_LogsConnection MessageID 4 end successful
      I0527 10:28:27.577995       1 session.go:133] Delete a apiserver connection APIServer_LogsConnection MessageID 4 from Tunnel session [edge1]
      I0527 10:28:27.577999       1 streamserver.go:139] Delete APIServer_LogsConnection MessageID 4 from Tunnel session [edge1]
      ***E0527 10:28:29.251566       1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:default,name:pvc-a1e452be-a4a5-451a-8d1f-3569a8289835), default "pvc-a1e452be-a4a5-451a-8d1f-3569a8289835" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "default" in API group "" at the cluster scope
      E0527 10:28:29.257808       1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:edge-edgex1,name:consul-config), edge-edgex1 "consul-config" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "edge-edgex1" in API group "" at the cluster scope
      E0527 10:28:29.262112       1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:edge-edgex1,name:consul-data), edge-edgex1 "consul-data" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "edge-edgex1" in API group "" at the cluster scope
      E0527 10:28:29.274941       1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:edge-edgex1,name:db-data), edge-edgex1 "db-data" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "edge-edgex1" in API group "" at the cluster scope
      E0527 10:28:29.303132       1 objectsync.go:38] failed to get obj(gvr:/, Resource=,namespace:default,name:pvc-ec504cbf-a4b7-4a04-9b38-932c50bf2607), default "pvc-ec504cbf-a4b7-4a04-9b38-932c50bf2607" is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot get resource "default" in API group "" at the cluster scope***
  • edgecore log on the edge node:

      May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.184467    6149 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
      May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.184817    6149 record.go:24] Warning MissingClusterDNS pod: "edgex-core-metadata-78788c8c48-2cmkz_edge-edgex1(eb4a0914-f8c4-4a23-9ea4-b8c077dc9a7d)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
      May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.184953    6149 edged.go:1015] consume added pod [edgex-core-metadata-78788c8c48-2cmkz] successfully
      May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.189078    6149 edged.go:900] worker [2] get pod addition item [edgex-support-notifications-598f7f85d-q6bfw]
      ***May 27 03:03:09 k8s-test-edge2 edgecore[6149]: E0527 03:03:09.189313    6149 edged.go:903] consume pod addition backoff: Back-off consume pod [edgex-support-notifications-598f7f85d-q6bfw] addition  error, backoff: [20s]***
      May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.189413    6149 edged.go:905] worker [2] backoff pod addition item [edgex-support-notifications-598f7f85d-q6bfw] failed, re-add to queue
      May 27 03:03:09 k8s-test-edge2 edgecore[6149]: I0527 03:03:09.344129    6149 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-api-access-5pbcc
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.007517    6149 edged.go:900] worker [3] get pod addition item [edgex-support-scheduler-5f9c499574-4jj64]
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.007606    6149 edged.go:968] start to consume added pod [edgex-support-scheduler-5f9c499574-4jj64]
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.008306    6149 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.008359    6149 record.go:24] Warning MissingClusterDNS pod: "edgex-support-scheduler-5f9c499574-4jj64_edge-edgex1(da0167dc-a3cb-4760-842e-ee4601a139f7)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.009382    6149 record.go:24] Warning BackOff Back-off restarting failed container
      ***May 27 03:03:10 k8s-test-edge2 edgecore[6149]: E0527 03:03:10.009558    6149 edged.go:919] worker [3] handle pod addition item [edgex-support-scheduler-5f9c499574-4jj64] failed: failed to "StartContainer" for "edgex-support-scheduler" with CrashLoopBackOff: "back-off 5m0s restarting failed container=edgex-support-scheduler pod=edgex-support-scheduler-5f9c499574-4jj64_edge-edgex1(da0167dc-a3cb-4760-842e-ee4601a139f7)", re-add to queue***
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.146028    6149 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-api-access-zv6l9
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.720984    6149 edged.go:900] worker [1] get pod addition item [edgex-core-data-5bb8bcc584-95zzk]
      ***May 27 03:03:10 k8s-test-edge2 edgecore[6149]: E0527 03:03:10.721058    6149 edged.go:903] consume pod addition backoff: Back-off consume pod [edgex-core-data-5bb8bcc584-95zzk] addition  error, backoff: [1m20s]***
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.721118    6149 edged.go:905] worker [1] backoff pod addition item [edgex-core-data-5bb8bcc584-95zzk] failed, re-add to queue
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.727149    6149 edged.go:900] worker [4] get pod addition item [edgex-core-command-6fb8d849bc-q9qvr]
      ***May 27 03:03:10 k8s-test-edge2 edgecore[6149]: E0527 03:03:10.727166    6149 edged.go:903] consume pod addition backoff: Back-off consume pod [edgex-core-command-6fb8d849bc-q9qvr] addition  error, backoff: [20s]***
      May 27 03:03:10 k8s-test-edge2 edgecore[6149]: I0527 03:03:10.727184    6149 edged.go:905] worker [4] backoff pod addition item [edgex-core-command-6fb8d849bc-q9qvr] failed, re-add to queue
      May 27 03:03:11 k8s-test-edge2 edgecore[6149]: I0527 03:03:11.327897    6149 edged_status.go:198] Sync VolumesInUse: []
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.945156    6149 edged.go:900] worker [0] get pod addition item [edgex-sys-mgmt-agent-76698f698-g2n7w]
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.945856    6149 edged.go:968] start to consume added pod [edgex-sys-mgmt-agent-76698f698-g2n7w]
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.946594    6149 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.946878    6149 record.go:24] Warning MissingClusterDNS pod: "edgex-sys-mgmt-agent-76698f698-g2n7w_edge-edgex1(5e5563dd-1d8e-48eb-aab6-708f4c12d1d9)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949592    6149 record.go:19] Normal Pulled Container image "edgexfoundry/sys-mgmt-agent:2.1.0" already present on machine
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949665    6149 edged_pods.go:321] container: edge-edgex1/edgex-sys-mgmt-agent-76698f698-g2n7w/edgex-sys-mgmt-agent podIP: "172.17.0.13" creating hosts mount: true
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949693    6149 edged_pods.go:403] Pod "edgex-sys-mgmt-agent-76698f698-g2n7w_edge-edgex1(5e5563dd-1d8e-48eb-aab6-708f4c12d1d9)" container "edgex-sys-mgmt-agent" mount "system-claim0" has propagation "PROPAGATION_HOST_TO_CONTAINER"
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.949721    6149 edged_pods.go:403] Pod "edgex-sys-mgmt-agent-76698f698-g2n7w_edge-edgex1(5e5563dd-1d8e-48eb-aab6-708f4c12d1d9)" container "edgex-sys-mgmt-agent" mount "kube-api-access-f7s2p" has propagation "PROPAGATION_HOST_TO_CONTAINER"
      May 27 03:03:18 k8s-test-edge2 edgecore[6149]: I0527 03:03:18.978162    6149 record.go:19] Normal Created Created container edgex-sys-mgmt-agent
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.053832    6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: I0527 03:03:19.058828    6149 record.go:19] Normal Started Started container edgex-sys-mgmt-agent
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: I0527 03:03:19.058926    6149 edged.go:1015] consume added pod [edgex-sys-mgmt-agent-76698f698-g2n7w] successfully
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: E0527 03:03:19.067279    6149 dns.go:290] [EdgeMesh] service edgex-core-consul is not found in this cluster
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.067297    6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.067311    6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: I0527 03:03:19.069355    6149 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-api-access-f7s2p
      ***May 27 03:03:19 k8s-test-edge2 edgecore[6149]: E0527 03:03:19.078343    6149 dns.go:290] [EdgeMesh] service edgex-core-consul is not found in this cluster***
      May 27 03:03:19 k8s-test-edge2 edgecore[6149]: W0527 03:03:19.078520    6149 dns.go:125] [EdgeMesh] failed to resolve dns: get from real dns
      May 27 03:03:20 k8s-test-edge2 edgecore[6149]: I0527 03:03:20.721558    6149 edged.go:900] worker [1] get pod addition item [edgex-sys-mgmt-agent-76698f698-g2n7w]
      May 27 03:03:20 k8s-test-edge2 edgecore[6149]: I0527 03:03:20.721595    6149 edged.go:968] start to consume added pod [edgex-sys-mgmt-agent-76698f698-g2n7w]
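
  • A hedged way to narrow down the errors above: on the cloud side, check what the cloudcore ServiceAccount may read (the PVC guess comes from the pvc-* names in the log; the empty gvr in the message does not identify the resource), and on the edge side, the MissingClusterDNS warnings mean edged has no clusterDNS configured:

      # cloud side: RBAC granted to cloudcore
      kubectl auth can-i get persistentvolumeclaims -n default \
        --as=system:serviceaccount:kubeedge:cloudcore
      kubectl describe clusterrole cloudcore   # role name assumed from the ServiceAccount

      # edge side: set clusterDNS/clusterDomain under modules.edged in
      # /etc/kubeedge/config/edgecore.yaml, then restart edgecore
      systemctl restart edgecore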

Problem 4: log, exec, and metrics against the edge node stop working after the physical host restarts

  • After restarting the server that hosts the KubeSphere and edge-node VMs, the edge node is still visible in KubeSphere, but log, exec, and metrics no longer work; even its CPU and memory information is not displayed. Running log or exec against pods on the edge node fails with: ip:port connection refused;
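
  • A hedged first round of checks for this case (ports as in the join command above):

      # on a master node: did cloudcore come back up, and are its NodePorts still exposed?
      kubectl -n kubeedge get pods -o wide
      kubectl -n kubeedge get svc

      # on the edge node: is edgecore running and reconnecting to cloudcore?
      systemctl status edgecore
      journalctl -u edgecore --no-pager | grep -iE "connect|refused"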