BlueKing CI (蓝盾) v7.1 on an Intel Mac build machine: "business build process exited abnormally"

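Background#

For a special use case, I set up an old Intel Mac mini as a build machine. When it ran a job, however, the job failed with the error 构建进程执行错误 ("build process execution error").

[Image: image-20251015223017295]

The error message itself already contains some important hints.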

Check the agent log: devopsAgent.log#

The agent log contains the key information, and it matches what the BK-CI frontend shows.

2025-10-14 21:14:26.848|info|build[b-a5082d1c7fc848c8a896349ea2450453] finish, delete:/Users/m100101825/bkci/build_tmp/b-a5082d1c7fc848c8a896349ea2450453_1_build_msg.log, err:%!s(<nil>)
2025-10-14 21:14:26.848|info|url:[https://devops.bk.diezhi.net/ms/dispatch/api/buildAgent/agent/thirdPartyAgent/workerBuildFinish]|request body: {"projectId":"client","buildId":"b-a5082d1c7fc848c8a896349ea2450453","vmSeqId":"1","workspace":"","pipelineId":"p-ccf93f681a7543cab29f16fa59569ccd","dockerBuildInfo":null,"executeCount":1,"containerHashId":"c-9228fbfd76154bef961d3fba0272f1ad","success":false,"message":"业务构建进程异常退出,可能被操作系统或其他程序杀掉,需自查并降低负载后重试,或解压 agent.zip 还原安装后重启agent再重试。","error":{"errorType":"USER","errorMessage":"构建进程执行错误","errorCode":2128040}}

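Look at the message field of this error, which translates to: "The business build process exited abnormally. It may have been killed by the operating system or another program. Check and reduce the load, then retry; or unzip agent.zip to restore the installation, restart the agent, and retry."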
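When the agent log is long, it can help to pull only the error code out of the workerBuildFinish request line. The following is a minimal sketch, assuming the log format shown above; extract_error_code is a hypothetical helper name, not something BK-CI ships.

```shell
# Hypothetical helper: read agent log lines on stdin and print every errorCode value found.
extract_error_code() {
  grep -o '"errorCode":[0-9]*' | cut -d: -f2
}

# Example with a shortened copy of the request body from the log above.
sample='request body: {"success":false,"error":{"errorType":"USER","errorCode":2128040}}'
printf '%s\n' "$sample" | extract_error_code
```

On the build machine the same function could be fed the real file, for example extract_error_code < devopsAgent.log, from whichever directory holds the log.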

Agent installation flow#

Importing a build machine actually happens in several steps:

step1: Generate the install link#

# Generate the macOS link
https://devops.bk.diezhi.net/ms/environment/api/user/environment/thirdPartyAgent/projects/client/os/MACOS/generateLink
# Generate the Linux link
https://devops.bk.diezhi.net/ms/environment/api/user/environment/thirdPartyAgent/projects/client/os/LINUX/generateLink
# Generate the Windows link
https://devops.bk.diezhi.net/ms/environment/api/user/environment/thirdPartyAgent/projects/client/os/WINDOWS/generateLink

The install link contains the agent ID, which matters later. In the command below it is dnprdwpe:

curl -H "X-DEVOPS-PROJECT-ID: client" https://devops.bk.diezhi.net/ms/environment/api/external/thirdPartyAgent/dnprdwpe/install | bash

step2: Install the agent#

Run the install command on the build machine. Taking macOS as an example:

step2.1: Detect the build machine's CPU architecture#

function initArch() {
  ARCH=$(uname -m)
  case $ARCH in
    aarch64) ARCH="arm64";;
    arm64) ARCH="arm64";;
    mips64) ARCH="mips64";;
    *) ARCH="";;
  esac
}

Resulting ARCH values:

  • On an Apple Silicon Mac: ARCH="arm64"
  • On an x86_64 machine (and anything else not listed): ARCH="" (an empty string, which needs extra handling)
  • On a MIPS64 machine: ARCH="mips64"
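To see the mapping without needing each kind of hardware, the same case statement can be wrapped in a function that takes the machine string as an argument. This is a minimal sketch; map_arch is a hypothetical name used only for illustration, and the real script reads uname -m directly.

```shell
# Hypothetical helper: same mapping as initArch, but the machine string is passed in.
map_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    mips64)        echo "mips64" ;;
    *)             echo "" ;;      # x86_64 and everything else collapse to an empty string
  esac
}

# Print the mapping for a few typical uname -m values.
for machine in arm64 aarch64 x86_64 mips64; do
  printf '%s -> "%s"\n' "$machine" "$(map_arch "$machine")"
done
```

The point to notice is that an Intel Mac is not identified positively: it simply falls into the catch-all branch and gets an empty ARCH.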

step2.2: Download the agent#

function download_agent()
{
  echo "start download agent install package"
  if [[ -f "agent.zip" ]]; then
    echo "agent.zip already exist, skip download"
    return
  fi
  # "exists" is a helper that is not shown in this excerpt
  if exists curl; then
    curl -H "X-DEVOPS-PROJECT-ID: client" -o agent.zip "https://devops.bk.diezhi.net/ms/environment/api/external/thirdPartyAgent/ydmdkgpw/agent?arch=${ARCH}"
    if [[ $? -ne 0 ]]; then
      echo "fail to use curl to download the agent, use wget"
      wget --header="X-DEVOPS-PROJECT-ID: client" -O agent.zip "https://devops.bk.diezhi.net/ms/environment/api/external/thirdPartyAgent/ydmdkgpw/agent?arch=${ARCH}"
    fi
  elif exists wget; then
    wget --header="X-DEVOPS-PROJECT-ID: client" -O agent.zip "https://devops.bk.diezhi.net/ms/environment/api/external/thirdPartyAgent/ydmdkgpw/agent?arch=${ARCH}"
  else
    echo "curl & wget command don't exist, download fail"
    exit 1
  fi
}

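The download URL is assembled from the agent ID and ARCH. The build machine is x86_64, so ARCH is empty. With the agent ID that appears in the script above (ydmdkgpw; the install command in step1 shows dnprdwpe, presumably because the two were captured from different generated links), the final download URL is: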

https://devops.bk.diezhi.net/ms/environment/api/external/thirdPartyAgent/ydmdkgpw/agent?arch=
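The way the URL is put together can be sketched as a small function. This is only an illustration based on the URLs in the script above; build_agent_url is a hypothetical helper, and the real install script has the agent ID baked in rather than passed as an argument.

```shell
# Hypothetical helper: assemble the agent download URL from its parts.
build_agent_url() {
  base=$1      # gateway prefix, e.g. https://devops.bk.diezhi.net
  agent_id=$2  # agent ID taken from the install link
  arch=$3      # value produced by initArch; empty on x86_64
  printf '%s/ms/environment/api/external/thirdPartyAgent/%s/agent?arch=%s\n' "$base" "$agent_id" "$arch"
}

# Intel Mac: ARCH is empty, so the query string ends in "arch=".
build_agent_url "https://devops.bk.diezhi.net" "ydmdkgpw" ""
# Apple Silicon Mac: ARCH is arm64.
build_agent_url "https://devops.bk.diezhi.net" "ydmdkgpw" "arm64"
```

Comparing the two printed URLs makes it clear that the server only learns "not arm64, not mips64" from an Intel Mac, nothing more specific.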

step2.3: Extract the agent resources#

First extract everything from agent.zip, then call unzip_jdk:

unzip -o agent.zip
unzip_jdk

unzip_jdk then extracts the JDK from jre.zip, one of the files that came out of agent.zip:

function unzip_jdk()
{
  echo "start unzipping the jdk package"
  if [[ -d "jdk" ]]; then
    echo "jdk already exists, skip unzip"
    return
  fi
  unzip -q -o jre.zip -d jdk
}

step2.4: Verify the JDK version#

echo "check java version"
jdk/Contents/Home/bin/java -version

The check runs java from jdk/Contents/Home/bin/. This path is exactly what causes the problem described in this post.
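Because everything hinges on where the java binary ends up, a quick layout probe is useful. The sketch below is an assumption-laden illustration: detect_jdk_layout is a hypothetical helper, and the labels "bundle" and "flat" are made up here to name the two layouts (jdk/Contents/Home/bin/ versus jdk/bin/).

```shell
# Hypothetical helper: report which JDK layout exists under an agent install directory.
detect_jdk_layout() {
  agent_dir=$1
  if [ -e "$agent_dir/jdk/Contents/Home/bin/java" ]; then
    echo "bundle"    # the path the install script checks on macOS
  elif [ -e "$agent_dir/jdk/bin/java" ]; then
    echo "flat"      # bin/ sits directly under jdk/
  else
    echo "missing"
  fi
}

# Demo on a throwaway directory that imitates a flat layout.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/jdk/bin"
touch "$demo_dir/jdk/bin/java"
detect_jdk_layout "$demo_dir"
rm -rf "$demo_dir"
```

On a real build machine the argument would be the agent install directory, for example /Users/m100101825/bkci.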

step2.5: Try to uninstall the agent#

This step removes any old agent environment that already exists.

function uninstallAgentService()
{
  if [[ "$user" != "root" && -f ~/Library/LaunchAgents/$(getServiceName).plist ]]; then
    echo "remove run at load"
    rm -f ~/Library/LaunchAgents/$(getServiceName).plist
  fi
  cd ${workspace}
  chmod +x *.sh
  ${workspace}/stop.sh
}

It mainly cleans up the daemon's plist file and stops the agent.

step2.6: Install the agent#

function installAgentService()
{
  if [[ "$user" != "root" ]]; then
    echo "add run at load with user ${user}"
    addRunAtLoad
  fi
  cd ${workspace}
  chmod +x *.sh
  ${workspace}/start.sh
}
function addRunAtLoad()
{
  mkdir -p ~/Library/LaunchAgents
  cat > ~/Library/LaunchAgents/$(getServiceName).plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>$(getServiceName)</string>
<key>Program</key>
<string>${workspace}/devopsDaemon</string>
<key>RunAtLoad</key>
<true/>
<key>WorkingDirectory</key>
<string>${workspace}</string>
<key>KeepAlive</key>
<false/>
</dict>
</plist>
EOF
}

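It mainly creates the daemon's plist file and starts the agent.

Troubleshooting#

Look at the error again: "The business build process exited abnormally. It may have been killed by the operating system or another program. Check and reduce the load, then retry; or unzip agent.zip to restore the installation, restart the agent, and retry."

七言 has discussed this issue with people on the BK-CI team before. It mostly occurs on older Intel-based Macs, and the root cause is a wrong directory layout in the packaged BK-CI agent.

The JDK is expected at: jdk/Contents/Home/bin/

So the first thing to verify is whether that path exists on the machine: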

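# Enter the BK-CI agent installation directory
cd /Users/m100101825/bkci
# Enter the JDK directory (named jdk here; it may be named jre instead)
cd jdk
# Inspect the directory layout
ls -la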

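The check confirmed that jdk/Contents/Home/bin/ is missing.

So the root cause is that the downloaded agent package does not match this platform: the agent cannot find the JDK, and the build fails.

The next step is to check the directory layout of the agent packages stored on the BK-CI server: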

# Find the environment pod (either of the two commands below works)
kubectl get pod -n blueking |grep bk-ci-bk-ci-environment
kubectl get pod -n blueking -l app.kubernetes.io/name=environment
# Open a shell in the environment pod: replace bk-ci-bk-ci-environment-985dc6b59-hhtbn with the actual pod name
kubectl exec -it pod/bk-ci-bk-ci-environment-985dc6b59-hhtbn -n blueking -- bash
# Enter the agent package directory
cd /data/workspace/agent-package/
# Enter the jre directory and list the platforms that have a jre.zip
cd jre && ls -l

Platforms covered by the JRE packages:

total 24
drwxr-xr-x 1 root root 4096 Sep 7 2023 linux
drwxr-xr-x 1 root root 4096 Sep 7 2023 linux_arm64
drwxr-xr-x 2 root root 4096 Sep 7 2023 linux_mips64
drwxr-xr-x 1 root root 4096 Sep 7 2023 macos
drwxr-xr-x 1 root root 4096 Sep 7 2023 macos_arm64
drwxr-xr-x 1 root root 4096 Sep 7 2023 windows

Given the operating system, agent ID, and ARCH value described earlier, this Intel macOS build machine gets the jre.zip from the macos directory.
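The selection implied here can be written down as a small function. This is only a sketch inferred from the directory names in the listing above; how the BK-CI server actually picks the directory is not shown in this post, and jre_dir_for is a hypothetical name.

```shell
# Hypothetical helper: guess the jre subdirectory from the OS name and the ARCH value.
# Assumption: the directory is "<os>" when ARCH is empty and "<os>_<arch>" otherwise.
jre_dir_for() {
  os=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # MACOS -> macos
  arch=$2
  if [ -n "$arch" ]; then
    echo "${os}_${arch}"
  else
    echo "$os"
  fi
}

jre_dir_for MACOS ""        # Intel Mac
jre_dir_for MACOS arm64     # Apple Silicon Mac
jre_dir_for LINUX mips64
```

Under this assumption, an empty ARCH is what routes the Intel Mac to the plain macos directory.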

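Now look at the layout of the current jre.zip:

# Create a test directory and extract the current jre.zip into it
mkdir test && unzip jre.zip -d test
# Inspect the directory layout
cd test && ls -l

Directory listing: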

total 212
-r-xr-xr-x 1 root root 1522 May 9 2023 ASSEMBLY_EXCEPTION
drwxrwxrwx 2 root root 4096 Jul 4 2023 bin
drwxrwxrwx 10 root root 4096 Jul 4 2023 demo
drwxrwxrwx 3 root root 4096 Jul 4 2023 include
drwxrwxrwx 4 root root 4096 Jul 4 2023 jre
drwxrwxrwx 2 root root 4096 May 9 2023 lib
-r-xr-xr-x 1 root root 19274 May 9 2023 LICENSE
drwxrwxrwx 5 root root 4096 Jul 4 2023 man
-rwxrwxrwx 1 root root 105 May 9 2023 release
drwxrwxrwx 11 root root 4096 Jul 4 2023 sample
-r-xr-xr-x 1 root root 157063 May 9 2023 THIRD_PARTY_README

The Contents/Home/bin/ directory is indeed missing.
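The same check can be done without extracting anything, by looking only at the entry names of the archive. The sketch below assumes the names arrive one per line on stdin; has_bundle_layout is a hypothetical helper, and the entry names in the demo are made up to imitate the two layouts.

```shell
# Hypothetical helper: read archive entry names on stdin, succeed if Contents/Home/bin/ is present.
has_bundle_layout() {
  grep -q '^Contents/Home/bin/'
}

# Entry names imitating the flat archive seen above.
if printf '%s\n' 'bin/' 'bin/java' 'lib/' | has_bundle_layout; then
  echo "flat archive: layout ok"
else
  echo "flat archive: Contents/Home/bin/ missing"
fi

# Entry names imitating a repaired archive.
if printf '%s\n' 'Contents/' 'Contents/Home/bin/java' | has_bundle_layout; then
  echo "repaired archive: layout ok"
else
  echo "repaired archive: Contents/Home/bin/ missing"
fi
```

With Info-ZIP's unzip, the real entry names can be produced with unzip -Z1 jre.zip and piped into the function.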

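Solutions#

Permanent fix:#

To fix the problem permanently, so that every Intel macOS machine added later can be installed with the one-line command, the agent.zip resource package has to be rebuilt with the correct layout.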

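# Log in to the jump host (10.212.32.32) and switch to the root user
kubecm switch platform-pops
# Find the environment pod (either of the two commands below works)
kubectl get pod -n blueking |grep bk-ci-bk-ci-environment
kubectl get pod -n blueking -l app.kubernetes.io/name=environment
# Open a shell in the environment pod: replace bk-ci-bk-ci-environment-985dc6b59-hhtbn with the actual pod name
kubectl exec -it pod/bk-ci-bk-ci-environment-985dc6b59-hhtbn -n blueking -- bash
# Enter the macOS jre package directory
cd /data/workspace/agent-package/jre/macos
# Keep a backup of the original archive before changing it
cp jre.zip jre.zip.bak
# Create the missing directories
mkdir -p Contents/Home
# Extract jre.zip into them
unzip jre.zip -d Contents/Home
# Repack jre.zip (zip adds the Contents tree to the existing archive; the old top-level entries stay in place)
zip -r jre.zip Contents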

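Temporary fix on the build machine:#

This is essentially the same thing that was done in the BK-CI environment pod: build the correct jre directory layout, this time locally.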

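# Enter the BK-CI agent installation directory
cd /Users/m100101825/bkci
# Enter the JDK directory (named jdk here; it may be named jre instead)
cd jdk
# Copy jre.zip here
cp ../jre.zip ./
# Create the missing Contents/Home directory
mkdir -p Contents/Home
# Extract jre.zip into Contents/Home so the agent can find the JDK via jdk/Contents/Home/bin/
unzip jre.zip -d Contents/Home
# Repack jre.zip so the repaired JDK is not overwritten the next time the agent is installed from the local package
# (zip -r adds the Contents tree; the original "zip jre.zip -d Contents" would delete entries instead)
zip -r jre.zip Contents && mv jre.zip ../jre.zip
# Return to the agent installation directory and restart the agent
cd /Users/m100101825/bkci && bash stop.sh && bash start.sh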
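Since the jdk directory already holds the extracted files in the flat layout, a variation on the steps above is to move those files into Contents/Home instead of extracting a second copy. This is a minimal sketch of that alternative, not the procedure used in this post; wrap_in_bundle_layout is a hypothetical helper, and it does not touch jre.zip.

```shell
# Hypothetical helper: move a flat JDK directory's contents under Contents/Home in place.
wrap_in_bundle_layout() {
  jdk_dir=$1
  if [ -e "$jdk_dir/Contents/Home/bin/java" ]; then
    echo "layout already correct"
    return 0
  fi
  if [ ! -e "$jdk_dir/bin/java" ]; then
    echo "no bin/java under $jdk_dir, nothing to do" >&2
    return 1
  fi
  mkdir -p "$jdk_dir/Contents/Home"
  for entry in "$jdk_dir"/*; do
    # Skip the Contents directory that was just created
    [ "$(basename "$entry")" = "Contents" ] && continue
    mv "$entry" "$jdk_dir/Contents/Home/"
  done
}

# Demo on a throwaway directory that imitates the flat layout.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/jdk/bin" "$demo_dir/jdk/lib"
touch "$demo_dir/jdk/bin/java"
wrap_in_bundle_layout "$demo_dir/jdk"
ls "$demo_dir/jdk/Contents/Home"
rm -rf "$demo_dir"
```

If this variant were used on a real build machine, the agent should be stopped first (bash stop.sh) and started again afterwards, and jre.zip would still need repacking to survive a reinstall.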
https://hua-ri.cn/posts/蓝盾v71mac构建机业务构建进程异常退出/
Author: 花日
Published: 2026-01-05
License: CC BY-NC-SA 4.0
