Argo Workflows Reference: DAG, Steps, Artifacts, CronWorkflow & K8s ML Pipelines
Argo Workflows is a Kubernetes-native workflow engine for running DAG (directed acyclic graph) or step-based workflows as K8s pods. Widely used for ML pipelines, data processing, CI/CD, and batch jobs. Not to be confused with Argo CD (GitOps) or Argo Rollouts (progressive delivery).
1. Argo Workflows vs Other Tools
When Argo Workflows vs Tekton vs Airflow vs Prefect
| Tool | K8s-native | DAG | Primary use case |
|---|---|---|---|
| Argo Workflows | Yes (runs as pods) | Yes | K8s batch jobs, ML pipelines, data processing, complex CI |
| Tekton | Yes (CRD-based) | Yes (via runAfter) | CI/CD pipelines, build systems |
| Airflow | Optional (KubernetesPodOperator) | Yes | Data pipelines, scheduled ETL, long-running workflows |
| Prefect | Optional | Yes | Python-first data workflows, cloud + hybrid |
# Install Argo Workflows:
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/latest/download/install.yaml

# Install the argo CLI:
brew install argo   # macOS
# Or: curl -sLO https://github.com/argoproj/argo-workflows/releases/latest/download/argo-linux-amd64.gz

# Expose the UI:
kubectl -n argo port-forward deploy/argo-server 2746:2746
# Open: https://localhost:2746 (argo-server serves self-signed TLS by default)

# List workflows:
argo list -n argo
argo get my-workflow -n argo
argo logs my-workflow -n argo
2. Step-Based Workflow
Sequential and parallel steps — hello-world pattern
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: pipeline- # auto-generates unique name: pipeline-xxxxx
namespace: argo
spec:
entrypoint: main # which template to run first
arguments:
parameters:
- name: env
value: production # default value; override: argo submit --parameter env=staging
templates:
# Main template — orchestrates other templates as steps:
- name: main
steps:
- - name: fetch-data # sequential: step 1
template: fetch
arguments:
parameters: [{name: env, value: "{{workflow.parameters.env}}"}]
- - name: process-a # parallel: runs simultaneously
template: process
arguments: {parameters: [{name: shard, value: "a"}]}
- name: process-b # also in step 2 (same indent = parallel)
template: process
arguments: {parameters: [{name: shard, value: "b"}]}
- - name: aggregate # sequential: step 3 (after both 2a + 2b)
template: aggregate
# Leaf templates — actual work (each runs as a K8s pod):
- name: fetch
inputs:
parameters:
- name: env
container:
image: my-fetcher:latest
command: [python, fetch.py]
args: ["--env", "{{inputs.parameters.env}}"]
- name: process
inputs:
parameters: [{name: shard}]
container:
image: my-processor:latest
command: [python, process.py, "--shard", "{{inputs.parameters.shard}}"]
- name: aggregate
container:
image: my-aggregator:latest
command: [python, aggregate.py]
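The parallel process-a/process-b pair above can also be written as a fan-out loop, where one step expands into one pod per item. A minimal sketch reusing the process template (the item list is illustrative):

```yaml
# Fan-out: one step expands into one pod per item in withItems.
- name: main
  steps:
    - - name: process-shard
        template: process           # same leaf template as above
        arguments:
          parameters: [{name: shard, value: "{{item}}"}]
        withItems: ["a", "b", "c"]  # three pods, run in parallel
# withParam does the same from a JSON list produced by an earlier step:
#   withParam: "{{steps.fetch-data.outputs.result}}"
```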
3. DAG Workflow
Explicit dependency graph — more flexible than steps
templates:
- name: main-dag
dag:
tasks:
- name: ingest
template: run-job
arguments: {parameters: [{name: cmd, value: "python ingest.py"}]}
- name: validate
dependencies: [ingest] # runs after ingest completes
template: run-job
arguments: {parameters: [{name: cmd, value: "python validate.py"}]}
- name: transform-a
dependencies: [validate] # runs after validate
template: run-job
arguments: {parameters: [{name: cmd, value: "python transform.py --part a"}]}
- name: transform-b
dependencies: [validate] # also runs after validate (parallel with transform-a)
template: run-job
arguments: {parameters: [{name: cmd, value: "python transform.py --part b"}]}
- name: load
dependencies: [transform-a, transform-b] # waits for BOTH to finish
template: run-job
arguments: {parameters: [{name: cmd, value: "python load.py"}]}
- name: run-job
inputs:
parameters: [{name: cmd}]
container:
image: my-pipeline:latest
command: [sh, -c]
args: ["{{inputs.parameters.cmd}}"]
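Tasks can also run conditionally. A hedged sketch building on the run-job template above (task names and the condition are illustrative; `outputs.result` is the captured stdout of a container or script template):

```yaml
# Conditional task: runs only when a previous task's stdout matches.
- name: flag-check
  template: run-job
  arguments: {parameters: [{name: cmd, value: "echo ok"}]}
- name: conditional-load
  dependencies: [flag-check]
  template: run-job
  when: "{{tasks.flag-check.outputs.result}} == ok"
  arguments: {parameters: [{name: cmd, value: "python load.py"}]}
# The `depends` field expresses status-based logic directly:
#   depends: "transform-a && (transform-b.Succeeded || transform-b.Skipped)"
```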
4. Artifacts, Volumes & Resource Templates
Pass data between steps, mount PVCs, create K8s resources
# Artifacts — pass files between steps (S3, GCS, or Minio):
templates:
- name: produce
outputs:
artifacts:
- name: result
path: /tmp/result.json # file written by the container
container:
image: my-producer:latest
command: [python, produce.py] # writes to /tmp/result.json
- name: consume
  inputs:
    artifacts:
      - name: result            # bound by the calling task, not here
        path: /tmp/input.json   # where the artifact is placed in this container
  container:
    image: my-consumer:latest
    command: [python, consume.py, --input, /tmp/input.json]
# The calling DAG task binds the artifact via arguments:
#   - name: consume
#     dependencies: [produce]
#     template: consume
#     arguments:
#       artifacts:
#         - name: result
#           from: "{{tasks.produce.outputs.artifacts.result}}"
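Artifacts move whole files; for small string values between steps, output parameters are lighter. A sketch (image, path, and parameter names are illustrative):

```yaml
- name: produce-value
  container:
    image: my-producer:latest
    command: [sh, -c, "echo -n 42 > /tmp/count"]
  outputs:
    parameters:
      - name: count
        valueFrom: {path: /tmp/count}   # read from a file the container writes
# Consume in a later DAG task:
#   arguments:
#     parameters: [{name: count, value: "{{tasks.produce-value.outputs.parameters.count}}"}]
```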
# Artifacts config (minio/S3):
spec:
artifactRepositoryRef:
configMap: artifact-repositories
key: default
# ConfigMap contains S3/GCS endpoint + bucket + credentials
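A sketch of what that ConfigMap can look like for a MinIO-style S3 backend (endpoint, bucket, and secret names are illustrative assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  namespace: argo
data:
  default: |
    s3:
      endpoint: minio.argo.svc:9000
      bucket: my-artifacts
      insecure: true   # plain HTTP to the in-cluster endpoint
      accessKeySecret: {name: minio-creds, key: accesskey}
      secretKeySecret: {name: minio-creds, key: secretkey}
```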
# Volume claim template (PVC per workflow run):
spec:
volumeClaimTemplates:
- metadata: {name: work}
spec:
accessModes: [ReadWriteOnce]
resources: {requests: {storage: 10Gi}}
templates:
- name: my-step
container:
volumeMounts:
- name: work
mountPath: /work
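Because the PVC lives for the whole run, any two templates that mount it share files without going through the artifact repository. A sketch (images and paths are illustrative):

```yaml
- name: writer
  container:
    image: my-image:latest
    command: [sh, -c, "python prep.py > /work/data.csv"]
    volumeMounts: [{name: work, mountPath: /work}]
- name: reader          # run after writer; sees the same /work contents
  container:
    image: my-image:latest
    command: [python, train.py, --data, /work/data.csv]
    volumeMounts: [{name: work, mountPath: /work}]
```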
# Resource template (create/delete K8s resources as workflow steps):
templates:
- name: create-job
resource:
action: create
successCondition: status.succeeded > 0
failureCondition: status.failed > 0
manifest: |
apiVersion: batch/v1
kind: Job
metadata: {generateName: my-job-}
spec:
template:
spec:
containers: [{name: main, image: my-image:latest, command: [python, run.py]}]
restartPolicy: Never
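A resource template can also export fields of the created object as output parameters; the jsonPath is evaluated against the live resource. A sketch (the parameter name is illustrative):

```yaml
- name: create-job-capture
  resource:
    action: create
    successCondition: status.succeeded > 0
    manifest: |
      apiVersion: batch/v1
      kind: Job
      metadata: {generateName: my-job-}
      spec:
        template:
          spec:
            containers: [{name: main, image: my-image:latest}]
            restartPolicy: Never
  outputs:
    parameters:
      - name: job-name
        valueFrom: {jsonPath: "{.metadata.name}"}   # name of the created Job
```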
5. CLI & Operations
Submit, monitor, and manage workflow runs
# Submit a workflow:
argo submit workflow.yaml -n argo # submit
argo submit workflow.yaml -n argo --watch # submit and watch progress
argo submit workflow.yaml -n argo --wait # wait for completion
argo submit workflow.yaml -n argo -p env=staging # override parameter
argo submit workflow.yaml -n argo --from=cronworkflow/my-cron # run a cron now
# Monitor:
argo list -n argo # all workflows + status
argo get my-workflow-xxx -n argo # detailed status
argo logs my-workflow-xxx -n argo # all pod logs
argo logs my-workflow-xxx -n argo -c main # specific container
argo watch my-workflow-xxx -n argo # live status updates
# Retry failed workflows:
argo retry my-workflow-xxx -n argo # retry from last failed node
argo retry my-workflow-xxx -n argo --restart-successful # retry ALL nodes from scratch
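Manual `argo retry` has a declarative counterpart: a retryStrategy on the template (or workflow-wide) retries failed pods automatically. A sketch with illustrative values:

```yaml
templates:
  - name: flaky-step
    retryStrategy:
      limit: "3"                # up to 3 retries
      retryPolicy: OnFailure    # OnFailure | OnError | Always | OnTransientError
      backoff:
        duration: "30s"
        factor: "2"             # 30s, 60s, 120s between attempts
    container:
      image: my-image:latest
      command: [python, run.py]
```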
# CronWorkflow (schedule a workflow):
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
name: daily-pipeline
namespace: argo
spec:
schedule: "0 2 * * *" # 2am daily
timezone: Europe/London
concurrencyPolicy: Forbid # don't start new run if previous still running
workflowSpec:
entrypoint: main
templates: [...]
# Suspend/resume a cron:
argo cron suspend daily-pipeline -n argo
argo cron resume daily-pipeline -n argo
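Beyond schedule and concurrencyPolicy, a few more CronWorkflow knobs are commonly set (values here are illustrative):

```yaml
spec:
  startingDeadlineSeconds: 300    # still start if the scheduled time was missed by < 5 min
  successfulJobsHistoryLimit: 3   # keep the last 3 succeeded Workflows
  failedJobsHistoryLimit: 5       # keep the last 5 failed Workflows
  suspend: false                  # same effect as `argo cron suspend/resume`
```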