KubeSphere 4.1.1: error when deploying the KubeEdge component, edge nodes cannot be added


Operating system
Physical machine, Kylin V10, arm64

Kubernetes version

1.28.8

Container runtime

# containerd version

[root@localhost ~]# containerd -v

containerd github.com/containerd/containerd v1.7.16 83031836b2cf55637d7abf847b17134c51b38e53

KubeSphere version
4.1.1, offline installation, installed on an existing Kubernetes cluster.

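What is the problem
While deploying the KubeEdge component, I found that edge nodes cannot be added: the join always fails. Changing the Service type, the ports, and similar settings made no difference.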

# Local images on the edge node

[root@localhost ~]# nerdctl images

REPOSITORY                       TAG        IMAGE ID        CREATED         PLATFORM       SIZE         BLOB SIZE
kubeedge/installation-package    v1.13.1    129895c7f5ca    23 hours ago    linux/arm64    199.7 MiB    193.5 MiB
kubeedge/pause                   3.6        8d94e670ae98    23 hours ago    linux/arm64    480.0 KiB    475.8 KiB
eclipse-mosquitto                1.6.15     a8772266908e    23 hours ago    linux/arm64    12.3 MiB     11.3 MiB
registry.k8s.io/pause            3.9        1a190d4cc73c    22 hours ago    linux/arm64    508.0 KiB    504.9 KiB

# The command shown in the console when adding an edge node, copied below:

arch=$(uname -m); if [[ $arch != x86_64 ]]; then arch='arm64'; else arch='amd64'; fi; curl -LO https://kubeedge.pek3b.qingstor.com/bin/v1.13.1/$arch/keadm-v1.13.1-linux-$arch.tar.gz && tar xvf keadm-v1.13.1-linux-$arch.tar.gz && mv keadm-v1.13.1-linux-$arch/keadm/keadm . && chmod +x keadm && ./keadm join --kubeedge-version=1.13.1 --cloudcore-ipport=192.168.160.130:10000 --quicport 10001 --certport 10002 --tunnelport 10004 --edgenode-name edgenode-a0qs --token e0befa54dcede093089d62c673f98692aa3240fdefa9483e0717b87fcf775881.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NTAzNzgxMzV9.YXZ1ZEAjIr1yOvFGBmjRjphSRbnVugTQp6c-K6rnQFk --with-edge-taint
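The architecture check in the command above treats every machine that is not x86_64 as arm64. As a minimal sketch (the function name `keadm_arch` is hypothetical and not part of keadm), the same mapping can be written explicitly so that an unexpected architecture fails early instead of downloading the wrong package:

```shell
# Hypothetical helper: map `uname -m` output to the architecture suffix
# used in the keadm download URL. Only the two suffixes that appear in
# the command above (amd64, arm64) are handled.
keadm_arch() {
  case "$1" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage on the edge node:
#   arch=$(keadm_arch "$(uname -m)") || exit 1
#   curl -LO "https://kubeedge.pek3b.qingstor.com/bin/v1.13.1/$arch/keadm-v1.13.1-linux-$arch.tar.gz"
```

On the Kylin V10 arm64 machine described in this post, `uname -m` would be expected to print `aarch64`, which both versions map to `arm64`.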

# After deploying cloudcore I changed the LoadBalancer IP; the Service now looks like this:

[root@master1 ~]# kubectl get svc -n kubeedge

NAME             TYPE           CLUSTER-IP       EXTERNAL-IP       PORT(S)                                                                           AGE
cloudcore        LoadBalancer   10.102.21.106    192.168.160.130   10000:30000/TCP,10001:30001/TCP,10002:30002/TCP,10003:30003/TCP,10004:30004/TCP   26m
kubeedge-proxy   ClusterIP      10.101.203.199   <none>            80/TCP                                                                            26m
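The join command passes ports 10000, 10001, 10002 and 10004 to keadm, so those ports have to be exposed by the cloudcore Service. The sketch below (the helper name `svc_exposes_ports` is hypothetical) checks a PORT(S) value copied from the `kubectl get svc` output against a list of expected ports. It only inspects the Service definition and says nothing about whether the ports are reachable from the edge node.

```shell
# Hypothetical helper: succeed if every requested port appears as a
# Service port in a PORT(S) string such as "10000:30000/TCP,80/TCP".
# The body runs in a subshell so the IFS change does not leak out.
svc_exposes_ports() (
  ports_field=$1
  shift
  IFS=,
  for want in "$@"; do
    found=0
    # Each entry looks like PORT:NODEPORT/PROTO or PORT/PROTO.
    for entry in $ports_field; do
      if [ "${entry%%[:/]*}" = "$want" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "port $want is not exposed" >&2
      exit 1
    fi
  done
)

# Usage with the cloudcore row shown above:
#   svc_exposes_ports "10000:30000/TCP,10001:30001/TCP,10002:30002/TCP,10003:30003/TCP,10004:30004/TCP" 10000 10001 10002 10004
```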

# Command executed on the edge node to be added

[root@localhost ~]# keadm join --kubeedge-version=1.13.1 --cloudcore-ipport=192.168.160.130:10000 --quicport 10001 --certport 10002 --tunnelport 10004 --edgenode-name edgenode-lnum --token e0befa54dcede093089d62c673f98692aa3240fdefa9483e0717b87fcf775881.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NTAzNzgxMzV9.YXZ1ZEAjIr1yOvFGBmjRjphSRbnVugTQp6c-K6rnQFk --with-edge-taint

I0619 08:15:08.197219 4265 command.go:845] 1. Check KubeEdge edgecore process status

I0619 08:15:08.208677 4265 command.go:845] 2. Check if the management directory is clean

I0619 08:15:08.208930 4265 join.go:110] 3. Create the necessary directories

I0619 08:15:08.221154 4265 join.go:202] 4. Pull Images

Pulling kubeedge/installation-package:v1.13.1 ...

Successfully pulled kubeedge/installation-package:v1.13.1

Pulling eclipse-mosquitto:1.6.15 ...

Successfully pulled eclipse-mosquitto:1.6.15

Pulling kubeedge/pause:3.6 ...

Successfully pulled kubeedge/pause:3.6

I0619 08:15:08.224059 4265 join.go:202] 5. Copy resources from the image to the management directory

I0619 08:15:11.981120 4265 join.go:202] 6. Start the default mqtt service

I0619 08:15:11.981635 4265 join.go:110] 7. Generate systemd service file

I0619 08:15:11.981911 4265 join.go:110] 8. Generate EdgeCore default configuration

I0619 08:15:11.981953 4265 join.go:288] The configuration does not exist or the parsing fails, and the default configuration is generated

I0619 08:15:52.014966 4265 join.go:110] 9. Run EdgeCore daemon

I0619 08:15:52.875831 4265 join.go:505]

I0619 08:15:52.875863 4265 join.go:506] KubeEdge edgecore is running, For logs visit: journalctl -u edgecore.service -xe

Error: edge node join failed: timed out waiting for the condition

execute keadm command failed: edge node join failed: timed out waiting for the condition

# Checking the edgecore service

[root@localhost ~]# journalctl -u edgecore.service -xe

Jun 19 08:20:35 localhost systemd[1]: Started edgecore.service.

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.754628 11199 server.go:102] Version: v1.13.1

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.761288 11199 sql.go:21] Begin to register twin db model

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.761536 11199 module.go:52] Module twin registered successfully

Jun 19 08:21:35 localhost systemd[1]: Started Kubernetes systemd probe.

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.782137 11199 module.go:52] Module edged registered successfully

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.782185 11199 module.go:52] Module websocket registered successfully

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.782200 11199 module.go:52] Module eventbus registered successfully

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.782239 11199 metamanager.go:41] Begin to register metamanager db model

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.782343 11199 module.go:52] Module metamanager registered successfully

Jun 19 08:21:35 localhost edgecore[11199]: W0619 08:21:35.782362 11199 module.go:55] Module servicebus is disabled, do not register

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.782394 11199 module.go:52] Module edgestream registered successfully

Jun 19 08:21:35 localhost edgecore[11199]: W0619 08:21:35.782406 11199 module.go:55] Module testManager is disabled, do not register

Jun 19 08:21:35 localhost systemd[1]: run-rf9bc3a13c68c4e97bb96003b514e138f.scope: Succeeded.

Jun 19 08:21:35 localhost edgecore[11199]: table `device` already exists, skip

Jun 19 08:21:35 localhost edgecore[11199]: table `device_attr` already exists, skip

Jun 19 08:21:35 localhost edgecore[11199]: table `device_twin` already exists, skip

Jun 19 08:21:35 localhost edgecore[11199]: table `sub_topics` already exists, skip

Jun 19 08:21:35 localhost edgecore[11199]: table `meta` already exists, skip

Jun 19 08:21:35 localhost edgecore[11199]: table `meta_v2` already exists, skip

Jun 19 08:21:35 localhost edgecore[11199]: table `target_urls` already exists, skip

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935298 11199 core.go:46] starting module eventbus

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935357 11199 core.go:46] starting module metamanager

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935391 11199 core.go:46] starting module edgestream

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935425 11199 core.go:46] starting module twin

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935457 11199 core.go:46] starting module edged

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935493 11199 core.go:46] starting module websocket

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935657 11199 process.go:117] Begin to sync sqlite

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935725 11199 http.go:40] tlsConfig InsecureSkipVerify true

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935819 11199 common.go:97] start connect to mqtt server with client id: hub-client-sub-1750292495

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.935841 11199 common.go:99] client hub-client-sub-1750292495 isconnected: false

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.936095 11199 edged.go:121] Starting edged...

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.936274 11199 dmiworker.go:67] dmi worker start

Jun 19 08:21:35 localhost mosquitto[719]: 1750292495: New connection from 127.0.0.1 on port 1883.

Jun 19 08:21:35 localhost mosquitto[719]: 1750292495: New client connected from 127.0.0.1 as hub-client-sub-1750292495 (p2, c1, k30).

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.936463 11199 dmiworker.go:215] success to init device model info from db

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.936530 11199 dmiworker.go:235] success to init device info from db

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.936585 11199 dmiworker.go:255] success to init device mapper info from db

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.936608 11199 server.go:183] init uds socket: /etc/kubeedge/dmi.sock

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.937324 11199 client.go:134] finish hub-client sub

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.937410 11199 common.go:97] start connect to mqtt server with client id: hub-client-pub-1750292495

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.937429 11199 common.go:99] client hub-client-pub-1750292495 isconnected: false

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.937851 11199 client.go:89] edge-hub-cli subscribe topic to $hw/events/upload/#

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.937944 11199 client.go:153] finish hub-client pub

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.937957 11199 eventbus.go:71] Init Sub And Pub Client for external mqtt broker tcp://127.0.0.1:1883 successfully

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.938028 11199 client.go:89] edge-hub-cli subscribe topic to $hw/events/device/+/state/update

Jun 19 08:21:35 localhost mosquitto[719]: 1750292495: New connection from 127.0.0.1 on port 1883.

Jun 19 08:21:35 localhost mosquitto[719]: 1750292495: New client connected from 127.0.0.1 as hub-client-pub-1750292495 (p2, c1, k30).

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.938459 11199 client.go:89] edge-hub-cli subscribe topic to $hw/events/device/+/twin/+

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.938670 11199 client.go:89] edge-hub-cli subscribe topic to $hw/events/node/+/membership/get

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.938885 11199 client.go:89] edge-hub-cli subscribe topic to SYS/dis/upload_records

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.939068 11199 client.go:89] edge-hub-cli subscribe topic to +/user/#

Jun 19 08:21:35 localhost edgecore[11199]: I0619 08:21:35.939263 11199 client.go:97] list edge-hub-cli-topics status, no record, skip sync

Jun 19 08:21:36 localhost edgecore[11199]: F0619 08:21:36.002217 11199 certmanager.go:96] Error: failed to get edge certificate from the cloudcore, error: Get "https://192.168.160.130:10002/edge.crt": x509: certificate is valid for 10.244.136.10, not 192.168.160.130

Jun 19 08:21:36 localhost mosquitto[719]: 1750292496: Socket error on client hub-client-pub-1750292495, disconnecting.

Jun 19 08:21:36 localhost mosquitto[719]: 1750292496: Socket error on client hub-client-sub-1750292495, disconnecting.

Jun 19 08:21:36 localhost systemd[1]: edgecore.service: Main process exited, code=exited, status=1/FAILURE

Jun 19 08:21:36 localhost systemd[1]: edgecore.service: Failed with result 'exit-code'.

Jun 19 08:21:46 localhost systemd[1]: edgecore.service: Service RestartSec=10s expired, scheduling restart.

Jun 19 08:21:46 localhost systemd[1]: edgecore.service: Scheduled restart job, restart counter is at 5.

Jun 19 08:21:46 localhost systemd[1]: Stopped edgecore.service.

Jun 19 08:21:46 localhost systemd[1]: Started edgecore.service.
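Most of the journal output above is informational. edgecore logs in klog format, where each message starts with a severity letter (I, W, E or F) followed by the month and day, so the one line that makes the process exit is the `F0619` entry. A small filter such as the following sketch (the function name `klog_errors` is hypothetical) keeps only error and fatal lines from edgecore:

```shell
# Hypothetical helper: read journal lines on stdin and keep only
# edgecore entries whose klog severity is E (error) or F (fatal).
klog_errors() {
  grep -E 'edgecore\[[0-9]+\]: [EF][0-9]{4} [0-9:.]+ '
}

# Usage on the edge node:
#   journalctl -u edgecore.service --no-pager | klog_errors
```

Applied to the output above, this would leave only the `certmanager.go:96` line about the x509 certificate.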

# System log output:

[root@localhost ~]# tail -f /var/log/messages

Jun 19 08:35:41 localhost edgecore[27858]: I0619 08:35:41.771219 27858 client.go:97] list edge-hub-cli-topics status, no record, skip sync

Jun 19 08:35:41 localhost edgecore[27858]: F0619 08:35:41.818255 27858 certmanager.go:96] Error: failed to get edge certificate from the cloudcore, error: Get "https://192.168.160.130:10002/edge.crt": x509: certificate is valid for 10.244.136.10, not 192.168.160.130

Jun 19 08:35:41 localhost mosquitto[719]: 1750293341: Socket error on client hub-client-pub-1750293341, disconnecting.

Jun 19 08:35:41 localhost mosquitto[719]: 1750293341: Socket error on client hub-client-sub-1750293341, disconnecting.

Jun 19 08:35:41 localhost systemd[1]: edgecore.service: Main process exited, code=exited, status=1/FAILURE

Jun 19 08:35:41 localhost systemd[1]: edgecore.service: Failed with result 'exit-code'.

Jun 19 08:35:51 localhost systemd[1]: edgecore.service: Service RestartSec=10s expired, scheduling restart.

Jun 19 08:35:51 localhost systemd[1]: edgecore.service: Scheduled restart job, restart counter is at 17.

Jun 19 08:35:51 localhost systemd[1]: Stopped edgecore.service.

Jun 19 08:35:51 localhost systemd[1]: Started edgecore.service.

# Checking the mosquitto log

[root@localhost ~]# journalctl -u mosquitto.service

6月 19 08:34:31 localhost.localdomain mosquitto[719]: 1750293271: New connection from 127.0.0.1 on port 1883.

6月 19 08:34:31 localhost.localdomain mosquitto[719]: 1750293271: New client connected from 127.0.0.1 as hub-client-sub-1750293271 (p2, c1, k30).

6月 19 08:34:31 localhost.localdomain mosquitto[719]: 1750293271: New connection from 127.0.0.1 on port 1883.

6月 19 08:34:31 localhost.localdomain mosquitto[719]: 1750293271: New client connected from 127.0.0.1 as hub-client-pub-1750293271 (p2, c1, k30).

6月 19 08:34:31 localhost.localdomain mosquitto[719]: 1750293271: Socket error on client hub-client-pub-1750293271, disconnecting.

6月 19 08:34:31 localhost.localdomain mosquitto[719]: 1750293271: Socket error on client hub-client-sub-1750293271, disconnecting.

6月 19 08:35:41 localhost.localdomain mosquitto[719]: 1750293341: New connection from 127.0.0.1 on port 1883.

6月 19 08:35:41 localhost.localdomain mosquitto[719]: 1750293341: New client connected from 127.0.0.1 as hub-client-sub-1750293341 (p2, c1, k30).

6月 19 08:35:41 localhost.localdomain mosquitto[719]: 1750293341: New connection from 127.0.0.1 on port 1883.

6月 19 08:35:41 localhost.localdomain mosquitto[719]: 1750293341: New client connected from 127.0.0.1 as hub-client-pub-1750293341 (p2, c1, k30).

6月 19 08:35:41 localhost.localdomain mosquitto[719]: 1750293341: Socket error on client hub-client-pub-1750293341, disconnecting.

6月 19 08:35:41 localhost.localdomain mosquitto[719]: 1750293341: Socket error on client hub-client-sub-1750293341, disconnecting.

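First, 10.244.136.10 is not configured anywhere in the cloudcore configuration file. 10.244.136.10 is a container IP, yet no pod in the cluster has this IP. I have no idea where this IP comes from. Is it hard-coded somewhere?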

Jun 19 08:35:41 localhost edgecore[27858]: F0619 08:35:41.818255 27858 certmanager.go:96] Error: failed to get edge certificate from the cloudcore, error: Get "https://192.168.160.130:10002/edge.crt": x509: certificate is valid for 10.244.136.10, not 192.168.160.130
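The error states that the certificate served on port 10002 lists only 10.244.136.10 as a valid address, while the edge node connects to 192.168.160.130. As a rough sketch for narrowing this down (the function name `x509_mismatch` is hypothetical), the two addresses can be pulled out of the log line. The commented `openssl` pipeline, which assumes `openssl` is installed on the edge node, prints the addresses actually embedded in the certificate that cloudcore serves:

```shell
# Hypothetical helper: read log lines on stdin and print the addresses
# the certificate is valid for next to the address that was requested.
x509_mismatch() {
  sed -n 's/.*x509: certificate is valid for \(.*\), not \([^ ]*\).*/cert_sans=\1 requested=\2/p'
}

# Usage on the edge node:
#   journalctl -u edgecore.service --no-pager | x509_mismatch | sort -u
#
# Inspect the certificate served on the cert port directly:
#   openssl s_client -connect 192.168.160.130:10002 </dev/null 2>/dev/null \
#     | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
```

For the log line above the helper prints `cert_sans=10.244.136.10 requested=192.168.160.130`. Comparing that with the `openssl` output shows whether the certificate itself carries the old address or whether the mismatch is introduced somewhere between the edge node and cloudcore.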

The KubeEdge version is 1.13.1, and keadm was downloaded with the command copied from the console. The images involved are the ones shown in the nerdctl images output near the top of this post.

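I also cannot find the source project for the kubeedge/installation-package:v1.13.1 image. I wanted to investigate further, but I have no way to see where the edgecore binary inside it comes from.

So I would like to ask for more detailed instructions on deploying KubeEdge cloudcore offline and deploying edgecore on edge nodes. The cause is very hard to track down, and the deployment instructions on the official site are rather thin.

I followed the official steps for adding an edge node during deployment, but the edge node still cannot be added.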