Argo Workflows: A Complete Hands-On Guide

1. Introduction to Argo Workflows

1.1 What Is Argo Workflows

Argo Workflows is an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes. It is implemented with Kubernetes CRDs (Custom Resource Definitions) and is a CNCF graduated project.

Official documentation: https://argo-workflows.readthedocs.io
Source code: https://github.com/argoproj/argo-workflows

1.2 Core Features

  • Container-native: every workflow step runs in a container, without the overhead of a traditional VM environment
  • DAG support: directed acyclic graphs (DAGs) express complex task dependencies
  • Parallel execution: independent tasks run in parallel automatically, improving throughput
  • Artifact management: files and data can be passed between steps
  • Parameterization: workflows accept parameters, enabling reusable templates
  • Conditional execution: branches can run based on conditions
  • Retries: built-in failure retries and timeout control
  • Web UI: an intuitive web interface for inspecting workflow execution status

1.3 Use Cases

| Scenario | Description | Typical use cases |
| --- | --- | --- |
| CI/CD pipelines | Build, test, and deployment automation | Code compilation, unit tests, image builds, releases |
| Data processing | Batch processing and ETL | Data cleaning, transformation, aggregation, import/export |
| Machine learning | ML training and inference | Data preprocessing, model training, hyperparameter tuning, model evaluation |
| Infrastructure automation | Cluster management and operations tasks | Backup/restore, resource cleanup, health checks, bulk operations |

1.4 Argo Workflows vs. Other Workflow Engines

| Feature | Argo Workflows | Jenkins | Airflow |
| --- | --- | --- | --- |
| Runtime environment | Kubernetes-native | Standalone server/container | Standalone server/container |
| Definition style | YAML (declarative) | Groovy/Pipeline (imperative) | Python (imperative) |
| Container support | Native, one container per step | Requires plugins | Requires KubernetesExecutor |
| Parallel execution | Automatic | Manual configuration | Supported, but complex to configure |
| Scalability | Scales automatically with K8s | Manual node configuration | Requires worker configuration |
| Learning curve | Moderate (requires K8s knowledge) | — | Moderate (requires Python knowledge) |

2. Environment Setup

2.1 Create a Kind Cluster

Before starting, we need a lab environment. If you don't already have a Kubernetes cluster, you can quickly create a test cluster with kind:

Config file (huari.yaml):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: "10.10.151.201"
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 6443
    hostPort: 6443
    listenAddress: "10.10.151.201"
    protocol: tcp
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker

Create the highly available cluster:

sudo kind create cluster --config=huari.yaml --name huari-test --image kindest/node:v1.34.0 --retain; sudo kind export logs --name huari-test

Verify the new kubectl context:

sudo kubectl cluster-info --context kind-huari-test

Inspect the cluster:

# List the cluster nodes
sudo kubectl get nodes
# List every pod in the cluster
sudo kubectl get pods -A -owide

Delete the cluster:

sudo kind delete cluster --name huari-test

3. Installing Argo Workflows

3.1 Install with Helm

# Add the Argo Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
# Install Argo Workflows (test-environment settings)
helm upgrade --install argo-workflows argo/argo-workflows \
  --version 0.47.4 \
  --namespace argo \
  --create-namespace \
  --set server.secure=false \
  --set server.extraArgs[0]="--auth-mode=server" \
  --set workflow.serviceAccount.create=true \
  --set workflow.serviceAccount.name=argo-workflow \
  --set workflow.rbac.create=true \
  --set 'workflow.rbac.rules[0].apiGroups[0]=' \
  --set 'workflow.rbac.rules[0].apiGroups[1]=apps' \
  --set 'workflow.rbac.rules[0].resources[0]=*' \
  --set 'workflow.rbac.rules[0].verbs[0]=*'

Configuration notes:

  • server.secure=false: serve plain HTTP instead of HTTPS (test environments)
  • --auth-mode=server: server-side authentication (no login required)
  • workflow.serviceAccount.create=true: create the workflow ServiceAccount automatically
  • workflow.serviceAccount.name=argo-workflow: the ServiceAccount's name
  • workflow.rbac.create=true: create the RBAC rules automatically
  • workflow.rbac.rules[0]: grant workflows full access to resources in the argo namespace (test environments)

If network access is an issue, you can also install from a local chart:

# Download the chart
helm pull argo/argo-workflows --version 0.47.4
# Install from the local file
helm upgrade --install argo-workflows ./argo-workflows-0.47.4.tgz \
  --namespace argo \
  --create-namespace \
  --set server.secure=false \
  --set server.extraArgs[0]="--auth-mode=server" \
  --set workflow.serviceAccount.create=true \
  --set workflow.serviceAccount.name=argo-workflow \
  --set workflow.rbac.create=true \
  --set 'workflow.rbac.rules[0].apiGroups[0]=' \
  --set 'workflow.rbac.rules[0].apiGroups[1]=apps' \
  --set 'workflow.rbac.rules[0].resources[0]=*' \
  --set 'workflow.rbac.rules[0].verbs[0]=*'

3.2 Verify the Installation

# Watch pod status
kubectl get pods -n argo -w

Expected output:

NAME READY STATUS RESTARTS AGE
argo-workflows-server-695449f55-jbqvt 1/1 Running 0 76s
argo-workflows-workflow-controller-858ff4bbc7-pjl9h 1/1 Running 0 76s

3.4 Configure an Artifact Repository (Optional)

If workflow steps need to exchange files (artifacts), an artifact repository must be configured. S3, GCS, OSS, and other object stores are supported.

Use MinIO (Local Testing)

# Install MinIO (standalone mode, suitable for test environments)
helm repo add minio https://charts.min.io/
helm install minio minio/minio \
  --namespace argo \
  --set mode=standalone \
  --set replicas=1 \
  --set persistence.enabled=false \
  --set rootUser=admin \
  --set rootPassword=password123 \
  --set resources.requests.memory=512Mi
# Create the credentials Secret
kubectl create secret generic minio-credentials -n argo \
  --from-literal=accesskey=admin \
  --from-literal=secretkey=password123
# Configure the artifact repository
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  namespace: argo
data:
  default-v1: |
    s3:
      bucket: argo-artifacts
      endpoint: minio.argo.svc.cluster.local:9000
      insecure: true
      accessKeySecret:
        name: minio-credentials
        key: accesskey
      secretKeySecret:
        name: minio-credentials
        key: secretkey
EOF
# Create the bucket
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: minio-setup
  namespace: argo
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: mc
        image: minio/mc:latest
        command:
        - /bin/sh
        - -c
        - |
          mc alias set minio http://minio.argo.svc.cluster.local:9000 admin password123
          mc mb minio/argo-artifacts || true
          mc ls minio/
EOF
# Check the Job result
kubectl wait --for=condition=complete --timeout=60s job/minio-setup -n argo
kubectl logs job/minio-setup -n argo
# Clean up the Job
kubectl delete job minio-setup -n argo
# Point the Workflow Controller at the artifact repository
kubectl patch configmap argo-workflows-workflow-controller-configmap -n argo --type merge -p '{
  "data": {
    "config": "nodeEvents:\n  enabled: true\nworkflowEvents:\n  enabled: true\nartifactRepository:\n  archiveLogs: false\n  s3:\n    bucket: argo-artifacts\n    endpoint: minio.argo.svc.cluster.local:9000\n    insecure: true\n    accessKeySecret:\n      name: minio-credentials\n      key: accesskey\n    secretKeySecret:\n      name: minio-credentials\n      key: secretkey\n"
  }
}'
# Wait for the Workflow Controller to restart and load the new config
kubectl get pods -n argo -l app=workflow-controller -w

Important notes:

  • The Workflow Controller's ConfigMap can hold only a single config key
  • The artifactRepository settings must live inside that config key's contents
  • After the configuration is updated, the Workflow Controller restarts automatically

Use Alibaba Cloud OSS

apiVersion: v1
kind: Secret
metadata:
  name: oss-credentials
  namespace: argo
type: Opaque
stringData:
  accessKey: YOUR_OSS_ACCESS_KEY_ID
  secretKey: YOUR_OSS_ACCESS_KEY_SECRET
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  namespace: argo
data:
  default-v1: |
    s3:
      bucket: argo-artifacts
      endpoint: oss-cn-hangzhou.aliyuncs.com
      insecure: false
      accessKeySecret:
        name: oss-credentials
        key: accessKey
      secretKeySecret:
        name: oss-credentials
        key: secretKey

Notes:

  • endpoint: change to match your OSS bucket's region, for example:
    • Hangzhou: oss-cn-hangzhou.aliyuncs.com
    • Beijing: oss-cn-beijing.aliyuncs.com
    • Shanghai: oss-cn-shanghai.aliyuncs.com
    • Shenzhen: oss-cn-shenzhen.aliyuncs.com
  • bucket: replace with the name of your OSS bucket
  • OSS is S3-compatible, so the s3 configuration block works as-is
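If the artifact-repositories ConfigMap holds more than one entry, a workflow can select one explicitly through spec.artifactRepositoryRef. A minimal sketch, assuming the key default-v1 defined in the ConfigMaps above:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: use-oss-repo-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  # Select a specific entry from the artifact-repositories ConfigMap
  artifactRepositoryRef:
    configMap: artifact-repositories
    key: default-v1
  templates:
  - name: main
    container:
      image: busybox
      command: [echo]
      args: ["using the referenced artifact repository"]
```

Without artifactRepositoryRef, workflows fall back to the repository configured in the Workflow Controller's ConfigMap.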

4. Installing the Argo CLI

macOS

# With Homebrew
brew install argo
# Or download manually
ARGO_WORKFLOWS_VERSION=v3.7.10
curl -sLO https://github.com/argoproj/argo-workflows/releases/download/${ARGO_WORKFLOWS_VERSION}/argo-darwin-arm64.gz
gunzip argo-darwin-arm64.gz
chmod +x argo-darwin-arm64
sudo mv argo-darwin-arm64 /usr/local/bin/argo

Linux

# Set the version (note: this is the Argo Workflows application version, not the Helm chart version)
ARGO_WORKFLOWS_VERSION=v3.7.10
# Download and install
curl -sLO https://github.com/argoproj/argo-workflows/releases/download/${ARGO_WORKFLOWS_VERSION}/argo-linux-amd64.gz
gunzip argo-linux-amd64.gz
chmod +x argo-linux-amd64
sudo mv argo-linux-amd64 /usr/local/bin/argo

Verify the installation:

argo version

5. Accessing the Argo Server UI

Port Forward (Test Environments)

kubectl -n argo port-forward deployment/argo-workflows-server 2746:2746 --address 0.0.0.0

Open http://localhost:2746 or http://YOUR_HOST_IP:2746.


6. Core Concepts

6.1 Workflow

Workflow is the core resource in Argo Workflows; it defines the workflow to execute. It plays a dual role:

  1. Defining the workflow: it describes the workflow's structure and execution logic
  2. Storing state: it records the workflow's execution status and results
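The state-storing half of this dual role is visible in the resource itself: after submission, the controller writes progress back into the Workflow's status field. A trimmed illustration of what kubectl get workflow <name> -o yaml might show (the values are examples; the full status also carries per-node details):

```yaml
status:
  phase: Succeeded                 # Pending / Running / Succeeded / Failed / Error
  startedAt: "2026-03-04T03:17:23Z"
  finishedAt: "2026-03-04T03:18:19Z"
  progress: 4/4                    # completed nodes / total nodes
```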

Basic structure:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-   # generateName yields a unique name per submission
spec:
  serviceAccountName: argo-workflow
  entrypoint: main             # entry template
  templates:
  - name: main
    container:
      image: busybox
      command: [echo]
      args: ["Hello World"]

6.2 Template Types

Argo Workflows provides nine template types in two broad categories:

Template Definitions

These define the actual work to perform:

| Type | Description | Use cases |
| --- | --- | --- |
| Container | Runs a container | Execute commands, run applications |
| Script | Runs a script | Python/Bash script execution |
| Resource | Operates on K8s resources | Create/delete/update resources |
| Suspend | Pauses execution | Manual approval, waiting on a condition |
| HTTP | Sends an HTTP request | Call APIs, webhooks |
| Plugin | Executes a plugin | Extended functionality |
| Container Set | Runs multiple containers | Tasks needing multiple cooperating containers |

Template Invocators

These control the workflow's execution logic:

| Type | Description | Use cases |
| --- | --- | --- |
| Steps | Runs steps in sequence | Linear flows, multi-stage tasks |
| DAG | Directed acyclic graph | Complex dependencies, parallel tasks |
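Of the definition types above, Suspend is the only one not demonstrated later in this guide. A minimal sketch of a manual-approval gate (the build/release template names are illustrative); the paused workflow continues after argo resume <workflow-name>:

```yaml
templates:
- name: main
  steps:
  - - name: build
      template: do-build            # illustrative build step
  - - name: approve                 # execution pauses at this step
      template: wait-for-approval
  - - name: release
      template: do-release          # illustrative release step
- name: wait-for-approval
  suspend: {}                       # suspend indefinitely; use duration: "1h" for a timed pause
```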

6.3 Parameters

Parameters pass data between templates:

spec:
  arguments:
    parameters:
    - name: message
      value: "Hello Argo"
  templates:
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: busybox
      command: [echo]
      args: ["{{inputs.parameters.message}}"]

6.4 Artifacts

Artifacts pass files between steps:

templates:
- name: generate-artifact
  container:
    image: busybox
    command: [sh, -c]
    args: ["echo 'hello world' > /tmp/hello.txt"]
  outputs:
    artifacts:
    - name: hello-art
      path: /tmp/hello.txt
- name: consume-artifact
  inputs:
    artifacts:
    - name: hello-art
      path: /tmp/hello.txt
  container:
    image: busybox
    command: [cat]
    args: ["/tmp/hello.txt"]

7. Hands-On Examples

7.1 Example 1: Hello World

The simplest possible workflow:

hello-world.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  templates:
  - name: main
    container:
      image: busybox
      command: [echo]
      args: ["Hello World"]

Submit the workflow:

# With kubectl
kubectl create -f hello-world.yaml -n argo
# With the argo CLI
argo submit hello-world.yaml -n argo --watch
# View the logs
argo logs @latest -n argo

7.2 Example 2: Sequential Steps

Use Steps to run multiple steps in order:

steps-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: step1
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Step 1"
    - - name: step2a
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Step 2a"
      - name: step2b
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Step 2b"
    - - name: step3
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Step 3"
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: busybox
      command: [echo]
      args: ["{{inputs.parameters.message}}"]

Execution flow:

  1. Step 1 runs
  2. Step 2a and Step 2b run in parallel
  3. Step 3 runs

Submit the workflow:

argo submit steps-workflow.yaml -n argo --watch

The final result should look roughly like this:

Name: steps-fsc4w
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 11:17:23 +0800 (56 seconds ago)
Started: Wed Mar 04 11:17:23 +0800 (56 seconds ago)
Finished: Wed Mar 04 11:18:19 +0800 (now)
Duration: 56 seconds
Progress: 4/4
ResourcesDuration: 0s*(1 cpu),20s*(100Mi memory)
STEP TEMPLATE PODNAME DURATION MESSAGE
steps-fsc4w main
├───✔ step1 echo steps-fsc4w-echo-3409550729 4s
├─┬─✔ step2a echo steps-fsc4w-echo-454416900 4s
│ └─✔ step2b echo steps-fsc4w-echo-504749757 33s
└───✔ step3 echo steps-fsc4w-echo-3563388113 5s

7.3 Example 3: Parallel Execution with a DAG

Use a DAG to define complex dependencies:

dag-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: A
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Task A"
      - name: B
        dependencies: [A]
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Task B"
      - name: C
        dependencies: [A]
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Task C"
      - name: D
        dependencies: [B, C]
        template: echo
        arguments:
          parameters:
          - name: message
            value: "Task D"
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: busybox
      command: [echo]
      args: ["{{inputs.parameters.message}}"]

Execution flow:

  A
 / \
B   C
 \ /
  D

Submit the workflow:

argo submit dag-workflow.yaml -n argo --watch

The final result should look roughly like this:

Name: dag-bhp6s
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 11:20:08 +0800 (30 seconds ago)
Started: Wed Mar 04 11:20:08 +0800 (30 seconds ago)
Finished: Wed Mar 04 11:20:38 +0800 (now)
Duration: 30 seconds
Progress: 4/4
ResourcesDuration: 19s*(100Mi memory),0s*(1 cpu)
STEP TEMPLATE PODNAME DURATION MESSAGE
dag-bhp6s main
├─✔ A echo dag-bhp6s-echo-1335943234 4s
├─✔ B echo dag-bhp6s-echo-1319165615 4s
├─✔ C echo dag-bhp6s-echo-1302387996 5s
└─✔ D echo dag-bhp6s-echo-1285610377 4s

7.4 Example 4: Script Execution

Use a Script template to run a Python script. The script's stdout is captured and exposed to later steps as {{steps.generate.outputs.result}}:

script-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: script-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: generate
        template: gen-random-int
    - - name: print
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "{{steps.generate.outputs.result}}"
  - name: gen-random-int
    script:
      image: python:alpine3.23
      command: [python]
      source: |
        import random
        i = random.randint(1, 100)
        print(i)
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: busybox
      command: [echo]
      args: ["Random number: {{inputs.parameters.message}}"]

Submit the workflow:

argo submit script-workflow.yaml -n argo --watch

7.5 Example 5: Passing Artifacts

Pass files between steps.

Prerequisite: artifact passing requires an artifact repository; complete the MinIO setup from section 3.4 first.

artifact-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: generate
        template: generate-file
    - - name: consume
        template: consume-file
        arguments:
          artifacts:
          - name: input-file
            from: "{{steps.generate.outputs.artifacts.output-file}}"
  - name: generate-file
    container:
      image: busybox
      command: [sh, -c]
      args: ["echo 'Hello from artifact' > /tmp/output.txt"]
    outputs:
      artifacts:
      - name: output-file
        path: /tmp/output.txt
  - name: consume-file
    inputs:
      artifacts:
      - name: input-file
        path: /tmp/input.txt
    container:
      image: busybox
      command: [cat]
      args: ["/tmp/input.txt"]

Submit the workflow:

argo submit artifact-workflow.yaml -n argo --watch

The final result should look roughly like this:

Name: artifact-t69b6
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 11:35:08 +0800 (20 seconds ago)
Started: Wed Mar 04 11:35:08 +0800 (20 seconds ago)
Finished: Wed Mar 04 11:35:28 +0800 (now)
Duration: 20 seconds
Progress: 2/2
ResourcesDuration: 0s*(1 cpu),8s*(100Mi memory)
STEP TEMPLATE PODNAME DURATION MESSAGE
artifact-t69b6 main
├───✔ generate generate-file artifact-t69b6-generate-file-149832164 4s
└───✔ consume consume-file artifact-t69b6-consume-file-3273100530 4s

7.6 Example 6: Conditional Execution

Run different branches based on a condition:

conditional-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: conditional-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  arguments:
    parameters:
    - name: environment
      value: "production"
  templates:
  - name: main
    steps:
    - - name: check-env
        template: check-environment
    - - name: prod-deploy
        template: deploy
        arguments:
          parameters:
          - name: env
            value: "production"
        when: "{{steps.check-env.outputs.result}} == production"
      - name: dev-deploy
        template: deploy
        arguments:
          parameters:
          - name: env
            value: "development"
        when: "{{steps.check-env.outputs.result}} != production"
  - name: check-environment
    script:
      image: python:alpine3.23
      command: [python]
      source: |
        print("{{workflow.parameters.environment}}")
  - name: deploy
    inputs:
      parameters:
      - name: env
    container:
      image: busybox
      command: [echo]
      args: ["Deploying to {{inputs.parameters.env}}"]

Submit the workflow:

# Production
argo submit conditional-workflow.yaml -n argo -p environment=production --watch
# Development
argo submit conditional-workflow.yaml -n argo -p environment=development --watch

Expected final result for production (roughly):

Name: conditional-ltr9x
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 11:42:20 +0800 (34 seconds ago)
Started: Wed Mar 04 11:42:20 +0800 (34 seconds ago)
Finished: Wed Mar 04 11:42:54 +0800 (now)
Duration: 34 seconds
Progress: 2/2
ResourcesDuration: 1s*(1 cpu),18s*(100Mi memory)
Parameters:
environment: production
STEP TEMPLATE PODNAME DURATION MESSAGE
conditional-ltr9x main
├───✔ check-env check-environment conditional-ltr9x-check-environment-504116037 14s
└─┬─○ dev-deploy deploy when 'production != production' evaluated false
└─✔ prod-deploy deploy conditional-ltr9x-deploy-754292413 4s

Expected final result for development (roughly):

Name: conditional-px5t2
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 11:43:19 +0800 (30 seconds ago)
Started: Wed Mar 04 11:43:19 +0800 (30 seconds ago)
Finished: Wed Mar 04 11:43:49 +0800 (now)
Duration: 30 seconds
Progress: 2/2
ResourcesDuration: 1s*(1 cpu),19s*(100Mi memory)
Parameters:
environment: development
STEP TEMPLATE PODNAME DURATION MESSAGE
conditional-px5t2 main
├───✔ check-env check-environment conditional-px5t2-check-environment-1115641911 16s
└─┬─✔ dev-deploy deploy conditional-px5t2-deploy-12024433 5s
└─○ prod-deploy deploy

7.7 Example 7: Retries

Configure retries on failure:

retry-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  templates:
  - name: main
    retryStrategy:
      limit: "3"
      retryPolicy: "Always"
      backoff:
        duration: "5s"
        factor: 2
        maxDuration: "1m"
    container:
      image: python:alpine3.23
      command: [python]
      args:
      - -c
      - |
        import random
        import sys
        # fail with 70% probability
        if random.random() < 0.7:
            print("Task failed!")
            sys.exit(1)
        else:
            print("Task succeeded!")

Submit the workflow:

argo submit retry-workflow.yaml -n argo --watch

The final result should look roughly like this:

Name: retry-cf29c
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 11:44:21 +0800 (10 seconds ago)
Started: Wed Mar 04 11:44:21 +0800 (10 seconds ago)
Finished: Wed Mar 04 11:44:31 +0800 (now)
Duration: 10 seconds
Progress: 1/1
ResourcesDuration: 0s*(1 cpu),4s*(100Mi memory)
STEP TEMPLATE PODNAME DURATION MESSAGE
retry-cf29c(0) main retry-cf29c-main-439993472 4s

7.8 Example 8: CI/CD Pipeline

This example demonstrates a CI/CD pipeline against a real open-source project: it checks out, lints, tests, and "deploys" the Flask repository.

Note: for convenience, everything is deployed into the argo namespace.

cicd-pipeline.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: cicd-pipeline-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  arguments:
    parameters:
    - name: repo
      value: "https://github.com/pallets/flask.git"
    - name: branch
      value: "main"
    - name: app-name
      value: "flask-demo"
  volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 500Mi
  templates:
  - name: main
    steps:
    - - name: checkout
        template: git-clone
    - - name: lint
        template: run-lint
    - - name: test
        template: run-tests
    - - name: deploy
        template: deploy-app
  - name: git-clone
    container:
      image: alpine/git
      command: [sh, -c]
      args:
      - |
        git clone -b {{workflow.parameters.branch}} --depth 1 {{workflow.parameters.repo}} /work/repo
        cd /work/repo
        echo "Cloned repository: {{workflow.parameters.repo}}"
        echo "Branch: {{workflow.parameters.branch}}"
        echo "Commit: $(git rev-parse HEAD)"
      volumeMounts:
      - name: workdir
        mountPath: /work
  - name: run-lint
    container:
      image: python:3.11-slim
      command: [sh, -c]
      args:
      - |
        cd /work/repo
        pip install --quiet flake8
        echo "Running code linting..."
        flake8 src/flask --count --select=E9,F63,F7,F82 --show-source --statistics || true
        echo "Linting completed"
      volumeMounts:
      - name: workdir
        mountPath: /work
  - name: run-tests
    container:
      image: python:3.11-slim
      command: [sh, -c]
      args:
      - |
        cd /work/repo
        pip install --quiet -e .
        pip install --quiet pytest
        echo "Running tests..."
        pytest tests/ -v || echo "Tests completed with some failures (demo purpose)"
      volumeMounts:
      - name: workdir
        mountPath: /work
  - name: deploy-app
    resource:
      action: apply
      manifest: |
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: {{workflow.parameters.app-name}}-config
          namespace: argo
        data:
          app.info: |
            Application: {{workflow.parameters.app-name}}
            Repository: {{workflow.parameters.repo}}
            Branch: {{workflow.parameters.branch}}
            Deployed: $(date)

提交工作流:

# 使用默认参数
argo submit cicd-pipeline.yaml -n argo --watch
# 或指定自定义参数
argo submit cicd-pipeline.yaml -n argo \
-p repo=https://github.com/pallets/flask.git \
-p branch=main \
-p app-name=my-flask-app \
--watch

期望类似的最终结果:

Name: cicd-pipeline-pfg6f
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 12:35:31 +0800 (2 minutes ago)
Started: Wed Mar 04 12:35:31 +0800 (2 minutes ago)
Finished: Wed Mar 04 12:37:53 +0800 (now)
Duration: 2 minutes 22 seconds
Progress: 4/4
ResourcesDuration: 5s*(1 cpu),1m15s*(100Mi memory)
Parameters:
repo: https://github.com/pallets/flask.git
branch: main
app-name: flask-demo
STEP TEMPLATE PODNAME DURATION MESSAGE
cicd-pipeline-pfg6f main
├───✔ checkout git-clone cicd-pipeline-pfg6f-git-clone-2106511117 47s
├───✔ lint run-lint cicd-pipeline-pfg6f-run-lint-717278403 26s
├───✔ test run-tests cicd-pipeline-pfg6f-run-tests-655980579 14s
└───✔ deploy deploy-app cicd-pipeline-pfg6f-deploy-app-2393920939 26s

Verify the deployment:

# Inspect the created ConfigMap
kubectl get configmap flask-demo-config -n argo -o yaml

8. WorkflowTemplate: Reusable Templates

A WorkflowTemplate defines a reusable workflow that multiple workflows can share. This example uses the gin-gonic/gin project to demonstrate creating and using a template.

8.1 Create a WorkflowTemplate

workflow-template.yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: go-build-test
  namespace: argo
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  arguments:
    parameters:
    - name: repo
      value: "https://github.com/gin-gonic/gin.git"
    - name: branch
      value: "master"
  volumes:
  - name: workdir
    emptyDir: {}
  templates:
  - name: main
    steps:
    - - name: checkout
        template: git-clone
    - - name: build
        template: go-build
    - - name: test
        template: go-test
  - name: git-clone
    container:
      image: alpine/git
      command: [sh, -c]
      args:
      - |
        git clone -b {{workflow.parameters.branch}} --depth 1 {{workflow.parameters.repo}} /work/repo
        cd /work/repo
        echo "Repository: {{workflow.parameters.repo}}"
        echo "Branch: {{workflow.parameters.branch}}"
        echo "Commit: $(git rev-parse HEAD)"
      volumeMounts:
      - name: workdir
        mountPath: /work
  - name: go-build
    container:
      image: golang:1.21-alpine
      command: [sh, -c]
      args:
      - |
        cd /work/repo
        echo "Building Go project..."
        go build -v ./...
        echo "Build completed successfully"
      volumeMounts:
      - name: workdir
        mountPath: /work
  - name: go-test
    container:
      image: golang:1.21-alpine
      command: [sh, -c]
      args:
      - |
        cd /work/repo
        echo "Running Go tests..."
        go test -v ./... -short
        echo "Tests completed"
      volumeMounts:
      - name: workdir
        mountPath: /work

Create the template:

kubectl apply -f workflow-template.yaml

Verify it was created:

# List WorkflowTemplates
kubectl get workflowtemplate -n argo
# Show details
kubectl describe workflowtemplate go-build-test -n argo

8.2 Use the WorkflowTemplate

Run the template with its default parameters:

# Submit a workflow directly from the WorkflowTemplate
argo submit --from workflowtemplate/go-build-test -n argo --watch

Run it with custom parameters:

# Use a different branch
argo submit --from workflowtemplate/go-build-test -n argo \
  -p branch=v1.9.1 \
  --watch

Or reference the template from a YAML file:

use-workflow-template.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: use-go-template-
spec:
  serviceAccountName: argo-workflow
  workflowTemplateRef:
    name: go-build-test
  arguments:
    parameters:
    - name: repo
      value: "https://github.com/gin-gonic/gin.git"
    - name: branch
      value: "master"

Submit the workflow:

argo submit use-workflow-template.yaml -n argo --watch

The final result should look roughly like this:

Name: go-build-test-r89hh
Namespace: argo
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Wed Mar 04 12:52:22 +0800 (1 minute ago)
Started: Wed Mar 04 12:52:22 +0800 (1 minute ago)
Finished: Wed Mar 04 12:53:38 +0800 (now)
Duration: 1 minute 16 seconds
Progress: 3/3
ResourcesDuration: 56s*(100Mi memory),4s*(1 cpu)
Parameters:
repo: https://github.com/gin-gonic/gin.git
branch: master
STEP TEMPLATE PODNAME DURATION MESSAGE
go-build-test-r89hh main
├───✔ checkout git-clone go-build-test-r89hh-git-clone-1757181677 5s
├───✔ build go-build go-build-test-r89hh-go-build-3721440394 47s
└───✔ test go-test go-build-test-r89hh-go-test-2594344963 4s

9. CronWorkflow: Scheduled Tasks

9.1 Create a CronWorkflow

cron-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: backup-daily
  namespace: argo
spec:
  schedule: "0 13 * * *"        # run every day at 13:00
  timezone: "Asia/Shanghai"
  concurrencyPolicy: "Replace"  # Allow (run concurrently), Forbid (skip if still running), Replace (supersede the previous run)
  startingDeadlineSeconds: 0
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  workflowSpec:
    entrypoint: backup
    templates:
    - name: backup
      steps:
      - - name: database-backup
          template: backup-db
      - - name: upload-to-s3
          template: upload-backup
    - name: backup-db
      container:
        image: postgres:15
        command: [sh, -c]
        args:
        - |
          pg_dump -h postgres-host -U postgres mydb > /backup/backup.sql
        volumeMounts:
        - name: backup-volume
          mountPath: /backup
    - name: upload-backup
      container:
        image: amazon/aws-cli
        command: [sh, -c]
        args:
        - |
          aws s3 cp /backup/backup.sql s3://my-bucket/backups/$(date +%Y%m%d).sql
        volumeMounts:
        - name: backup-volume
          mountPath: /backup
    volumeClaimTemplates:
    - metadata:
        name: backup-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi

Create the scheduled task:

kubectl apply -f cron-workflow.yaml

9.2 Manage CronWorkflows

# List all CronWorkflows
kubectl get cronworkflow -n argo
# Show details
kubectl describe cronworkflow backup-daily -n argo
# Suspend a CronWorkflow
kubectl patch cronworkflow backup-daily -n argo -p '{"spec":{"suspend":true}}'
# Resume a CronWorkflow
kubectl patch cronworkflow backup-daily -n argo -p '{"spec":{"suspend":false}}'
# Delete a CronWorkflow
kubectl delete cronworkflow backup-daily -n argo

10. Common Commands

10.1 Workflow Management

# Submit a workflow
argo submit workflow.yaml -n argo
# Submit and watch
argo submit workflow.yaml -n argo --watch
# Submit with parameters
argo submit workflow.yaml -n argo -p param1=value1 -p param2=value2
# List workflows
argo list -n argo
# Show workflow details
argo get <workflow-name> -n argo
# View workflow logs
argo logs <workflow-name> -n argo
# View logs of the most recent workflow
argo logs @latest -n argo
# Delete a workflow
argo delete <workflow-name> -n argo
# Delete all completed workflows
argo delete --completed -n argo
# Resubmit a workflow
argo resubmit <workflow-name> -n argo
# Retry a failed workflow
argo retry <workflow-name> -n argo
# Suspend a workflow
argo suspend <workflow-name> -n argo
# Resume a workflow
argo resume <workflow-name> -n argo
# Terminate a workflow (immediately; exit handlers do not run)
argo terminate <workflow-name> -n argo
# Stop a workflow (exit handlers still run)
argo stop <workflow-name> -n argo
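The stop/terminate distinction matters mainly when a workflow declares an exit handler: stop still runs it, terminate does not. A minimal sketch of an onExit handler (the notify template and its message are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: exit-handler-
spec:
  serviceAccountName: argo-workflow
  entrypoint: main
  onExit: notify                 # runs after main succeeds, fails, or is stopped
  templates:
  - name: main
    container:
      image: busybox
      command: [echo]
      args: ["doing work"]
  - name: notify
    container:
      image: busybox
      command: [echo]
      args: ["workflow finished with status {{workflow.status}}"]
```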

10.2 Managing with kubectl

# List workflows
kubectl get workflow -n argo
# Show workflow details
kubectl describe workflow <workflow-name> -n argo
# View the workflow YAML
kubectl get workflow <workflow-name> -n argo -o yaml
# Delete a workflow
kubectl delete workflow <workflow-name> -n argo
# List WorkflowTemplates
kubectl get workflowtemplate -n argo
# List CronWorkflows
kubectl get cronworkflow -n argo

11. Troubleshooting

11.1 Workflow Stuck in Pending

Causes:

  • Insufficient resources (CPU/memory)
  • Image pull failures
  • PVC cannot be mounted

Steps to diagnose:

# Show workflow details
argo get <workflow-name> -n argo
# Check pod status
kubectl get pods -n argo -l workflows.argoproj.io/workflow=<workflow-name>
# Inspect the pod
kubectl describe pod <pod-name> -n argo
# Check pod events
kubectl get events -n argo --field-selector involvedObject.name=<pod-name>

Fixes:

  • Add cluster resources
  • Check the image address and pull credentials
  • Check the StorageClass configuration

11.2 Workflow Fails

Causes:

  • Container execution errors
  • Timeouts
  • Resource limits

Steps to diagnose:

# View workflow logs
argo logs <workflow-name> -n argo
# View logs for a specific container
argo logs <workflow-name> -n argo -c <step-name>
# Check the workflow phase
kubectl get workflow <workflow-name> -n argo -o jsonpath='{.status.phase}'

Fixes:

  • Check the container logs and fix code errors
  • Increase the timeout
  • Raise the resource limits

11.3 Artifact Passing Fails

Causes:

  • No artifact repository configured
  • Wrong S3 credentials
  • Network issues

Steps to diagnose:

# Inspect the artifact repository configuration
kubectl get configmap artifact-repositories -n argo -o yaml
# View workflow logs
argo logs <workflow-name> -n argo

Fixes:

  • Configure a correct artifact repository
  • Verify the S3 credentials
  • Check network connectivity

11.4 Permission Issues

Causes:

  • Insufficient ServiceAccount permissions
  • Misconfigured RBAC

Steps to diagnose:

# List ServiceAccounts
kubectl get sa -n argo
# List RoleBindings
kubectl get rolebinding -n argo
# Show which ServiceAccount a workflow uses
kubectl get workflow <workflow-name> -n argo -o jsonpath='{.spec.serviceAccountName}'

Fixes:

  • Create the right ServiceAccount and RBAC rules
  • Specify the ServiceAccount in the workflow spec

12. Best Practices

12.1 Resource Management

Set resource limits:

templates:
- name: resource-limited
  container:
    image: busybox
    command: [sh, -c]
    args: ["echo 'Running with resource limits'"]
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

Use node selectors:

templates:
- name: gpu-task
  container:
    image: tensorflow/tensorflow:latest-gpu
    command: [python, train.py]
  nodeSelector:
    gpu: "true"
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"

12.2 Security

Use a dedicated ServiceAccount:

spec:
  serviceAccountName: workflow-sa

Avoid running as root:

templates:
- name: non-root
  container:
    image: busybox
    command: [sh, -c]
    args: ["whoami"]
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000

Keep sensitive data in Secrets:

templates:
- name: use-secret
  container:
    image: busybox
    command: [sh, -c]
    args: ["echo $PASSWORD"]
    env:
    - name: PASSWORD
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: password

12.3 Performance

Run independent tasks in parallel:

# Use a DAG rather than Steps
templates:
- name: parallel-tasks
  dag:
    tasks:
    - name: task1
      template: worker
    - name: task2
      template: worker
    - name: task3
      template: worker

Set sensible timeouts:

spec:
  activeDeadlineSeconds: 3600   # 1-hour timeout for the whole workflow
  templates:
  - name: task-with-timeout
    activeDeadlineSeconds: 600  # 10-minute timeout for this template
    container:
      image: busybox
      command: [sleep, "300"]

Share a PVC across steps:

spec:
  volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

12.4 Maintainability

Use WorkflowTemplate for reuse:

# Define a shared template
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: common-tasks
spec:
  templates:
  - name: notify
    inputs:
      parameters:
      - name: message
    container:
      image: curlimages/curl
      command: [sh, -c]
      args:
      - |
        curl -X POST https://hooks.slack.com/services/xxx \
          -d '{"text":"{{inputs.parameters.message}}"}'

Add labels and annotations:

metadata:
  generateName: my-workflow-
  labels:
    app: myapp
    env: production
    team: platform
  annotations:
    description: "Daily backup workflow"
    owner: "platform-team@example.com"

Use meaningful names:

templates:
- name: checkout-source-code   # a clear, descriptive name
  container:
    image: alpine/git
    command: [git, clone, "{{workflow.parameters.repo}}"]

13. Summary

Argo Workflows is a powerful Kubernetes-native workflow engine that fits CI/CD, data processing, machine learning, and many other scenarios.

Core strengths:

  • ✅ Container-native, deeply integrated with Kubernetes
  • ✅ Complex DAGs and parallel execution
  • ✅ Rich, reusable template types
  • ✅ Solid web UI and CLI tooling
  • ✅ Active community and ecosystem

Good fits:

  • CI/CD pipeline automation
  • Batch data processing and ETL
  • Machine learning model training
  • Infrastructure automation and operations

Learning path:

  1. Master the basic concepts (Workflow, Template, Parameters)
  2. Practice simple examples (Hello World, Steps, DAG)
  3. Learn the advanced features (artifacts, conditionals, retries)
  4. Build real projects (CI/CD pipelines, data processing)
  5. Optimize and monitor (resource management, performance, alerting)


Argo Workflows: A Complete Hands-On Guide
https://hua-ri.cn/posts/argo-workflows-完全实战指南/
Author: 花日
Published: 2026-03-04
License: CC BY-NC-SA 4.0
