
Argo Workflows Reference: DAG, Steps, Artifacts, CronWorkflow & K8s ML Pipelines

Argo Workflows is a Kubernetes-native workflow engine that runs DAG (directed acyclic graph) or step-based workflows, with each step executing as a Kubernetes pod. It is widely used for ML pipelines, data processing, CI/CD, and batch jobs. It is distinct from ArgoCD (GitOps delivery) and Argo Rollouts (progressive delivery).

1. Argo Workflows vs Other Tools

When Argo Workflows vs Tekton vs Airflow vs Prefect
Tool           | K8s-native                       | DAG support        | Primary use case
Argo Workflows | Yes (runs as pods)               | Yes                | K8s batch jobs, ML pipelines, data processing, complex CI
Tekton         | Yes (CRD-based)                  | Yes (via runAfter) | CI/CD pipelines, build systems
Airflow        | Optional (KubernetesPodOperator) | Yes                | Data pipelines, scheduled ETL, long-running workflows
Prefect        | Optional                         | Yes                | Python-first data workflows, cloud + hybrid
# Install Argo Workflows:
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/latest/download/install.yaml

# Install argo CLI:
brew install argo           # macOS
# Or on Linux:
# curl -sLO https://github.com/argoproj/argo-workflows/releases/latest/download/argo-linux-amd64.gz
# gunzip argo-linux-amd64.gz && chmod +x argo-linux-amd64 && sudo mv argo-linux-amd64 /usr/local/bin/argo

# Expose the UI:
kubectl -n argo port-forward deploy/argo-server 2746:2746
# Open: https://localhost:2746 (argo-server serves TLS with a self-signed cert by default — accept the browser warning)

# List workflows:
argo list -n argo
argo get my-workflow -n argo
argo logs my-workflow -n argo

2. Step-Based Workflow

Sequential and parallel steps — hello-world pattern
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pipeline-   # auto-generates unique name: pipeline-xxxxx
  namespace: argo
spec:
  entrypoint: main          # which template to run first
  arguments:
    parameters:
      - name: env
        value: production   # default value; override: argo submit --parameter env=staging

  templates:
    # Main template — orchestrates other templates as steps:
    - name: main
      steps:
        - - name: fetch-data            # sequential: step 1
            template: fetch
            arguments:
              parameters: [{name: env, value: "{{workflow.parameters.env}}"}]
        - - name: process-a             # parallel: runs simultaneously
            template: process
            arguments: {parameters: [{name: shard, value: "a"}]}
          - name: process-b             # also in step 2 (same indent = parallel)
            template: process
            arguments: {parameters: [{name: shard, value: "b"}]}
        - - name: aggregate             # sequential: step 3 (after both 2a + 2b)
            template: aggregate

    # Leaf templates — actual work (each runs as a K8s pod):
    - name: fetch
      inputs:
        parameters:
          - name: env
      container:
        image: my-fetcher:latest
        command: [python, fetch.py]
        args: ["--env", "{{inputs.parameters.env}}"]

    - name: process
      inputs:
        parameters: [{name: shard}]
      container:
        image: my-processor:latest
        command: [python, process.py, "--shard", "{{inputs.parameters.shard}}"]

    - name: aggregate
      container:
        image: my-aggregator:latest
        command: [python, aggregate.py]
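Steps also support loop-style fan-out with `withItems`, which runs one parallel pod per list item. A sketch reusing the `process` template above (the shard names are illustrative):

```yaml
# Fan out over a list — one parallel pod per item:
- name: main
  steps:
    - - name: process-shards
        template: process
        arguments:
          parameters: [{name: shard, value: "{{item}}"}]
        withItems: [a, b, c, d]       # expands into 4 parallel process pods
```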

3. DAG Workflow

Explicit dependency graph — more flexible than steps
templates:
  - name: main-dag
    dag:
      tasks:
        - name: ingest
          template: run-job
          arguments: {parameters: [{name: cmd, value: "python ingest.py"}]}

        - name: validate
          dependencies: [ingest]       # runs after ingest completes
          template: run-job
          arguments: {parameters: [{name: cmd, value: "python validate.py"}]}

        - name: transform-a
          dependencies: [validate]     # runs after validate
          template: run-job
          arguments: {parameters: [{name: cmd, value: "python transform.py --part a"}]}

        - name: transform-b
          dependencies: [validate]     # also runs after validate (parallel with transform-a)
          template: run-job
          arguments: {parameters: [{name: cmd, value: "python transform.py --part b"}]}

        - name: load
          dependencies: [transform-a, transform-b]   # waits for BOTH to finish
          template: run-job
          arguments: {parameters: [{name: cmd, value: "python load.py"}]}

  - name: run-job
    inputs:
      parameters: [{name: cmd}]
    container:
      image: my-pipeline:latest
      command: [sh, -c]
      args: ["{{inputs.parameters.cmd}}"]
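DAG tasks can also be gated with a `when` expression, so a task only runs if a condition holds. A sketch assuming a hypothetical `status` output parameter emitted by the `validate` task (the templates above don't declare one):

```yaml
# Conditional task — skipped unless validate emitted status=ok:
- name: load-if-ok
  dependencies: [validate]
  template: run-job
  when: "{{tasks.validate.outputs.parameters.status}} == ok"
  arguments: {parameters: [{name: cmd, value: "python load.py"}]}
```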

4. Artifacts, Volumes & Resource Templates

Pass data between steps, mount PVCs, create K8s resources
# Artifacts — pass files between steps (S3, GCS, or Minio):
templates:
  - name: produce
    outputs:
      artifacts:
        - name: result
          path: /tmp/result.json         # file written by the container
    container:
      image: my-producer:latest
      command: [python, produce.py]      # writes to /tmp/result.json

  - name: consume
    inputs:
      artifacts:
        - name: result
          path: /tmp/input.json          # where the artifact is placed in this container
    container:
      image: my-consumer:latest
      command: [python, consume.py, --input, /tmp/input.json]

# Wire producer to consumer in the calling DAG task (via arguments.artifacts, not template inputs):
dag:
  tasks:
    - name: consume-task
      dependencies: [produce]
      template: consume
      arguments:
        artifacts:
          - name: result
            from: "{{tasks.produce.outputs.artifacts.result}}"   # DAG reference

# Artifacts config (minio/S3):
spec:
  artifactRepositoryRef:
    configMap: artifact-repositories
    key: default
  # ConfigMap contains S3/GCS endpoint + bucket + credentials
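The referenced ConfigMap might look like the following. A sketch assuming a MinIO deployment reachable at `minio:9000` and a secret named `my-minio-cred`; adjust the endpoint, bucket, and key names for your setup:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  namespace: argo
data:
  default: |
    s3:
      endpoint: minio:9000
      insecure: true                     # MinIO over plain HTTP
      bucket: my-artifacts
      accessKeySecret: {name: my-minio-cred, key: accesskey}
      secretKeySecret: {name: my-minio-cred, key: secretkey}
```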

# Volume claim template (PVC per workflow run):
spec:
  volumeClaimTemplates:
    - metadata: {name: work}
      spec:
        accessModes: [ReadWriteOnce]
        resources: {requests: {storage: 10Gi}}
  templates:
    - name: my-step
      container:
        volumeMounts:
          - name: work
            mountPath: /work

# Resource template (create/delete K8s resources as workflow steps):
templates:
  - name: create-job
    resource:
      action: create
      successCondition: status.succeeded > 0
      failureCondition: status.failed > 0
      manifest: |
        apiVersion: batch/v1
        kind: Job
        metadata: {generateName: my-job-}
        spec:
          template:
            spec:
              containers: [{name: main, image: my-image:latest, command: [python, run.py]}]
              restartPolicy: Never
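Besides container and resource templates, Argo also supports script templates that inline the code to run, avoiding a custom image for small steps. A minimal sketch (image and logic are illustrative):

```yaml
templates:
  - name: gen-shards
    script:
      image: python:3.12
      command: [python]
      source: |
        import json
        # stdout is captured as {{tasks.gen-shards.outputs.result}}
        print(json.dumps(["a", "b", "c"]))
```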

5. CLI & Operations

Submit, monitor, and manage workflow runs
# Submit a workflow:
argo submit workflow.yaml -n argo                          # fire-and-forget submit
argo submit workflow.yaml -n argo --watch                  # submit and watch live progress
argo submit workflow.yaml -n argo --wait                   # block until completion
argo submit workflow.yaml -n argo -p env=staging           # override parameter
argo submit workflow.yaml -n argo --from=cronworkflow/my-cron  # run a cron now

# Monitor:
argo list -n argo                       # all workflows + status
argo get my-workflow-xxx -n argo        # detailed status
argo logs my-workflow-xxx -n argo       # all pod logs
argo logs my-workflow-xxx -n argo -c main  # specific container
argo watch my-workflow-xxx -n argo      # live status updates

# Retry failed workflows:
argo retry my-workflow-xxx -n argo      # retry from last failed node
argo retry my-workflow-xxx -n argo --restart-successful  # retry ALL nodes from scratch
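Retries can also be declared on the template itself, so failed pods are retried automatically without a manual `argo retry`. A sketch with assumed limits and backoff values:

```yaml
templates:
  - name: flaky-step
    retryStrategy:
      limit: "3"                # up to 3 retries
      retryPolicy: OnFailure    # retry when the main container fails
      backoff:
        duration: "30s"
        factor: "2"             # 30s, 60s, 120s between attempts
    container:
      image: my-pipeline:latest
      command: [python, run.py]
```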

# CronWorkflow (schedule a workflow):
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: daily-pipeline
  namespace: argo
spec:
  schedule: "0 2 * * *"               # 2am daily
  timezone: Europe/London
  concurrencyPolicy: Forbid            # don't start new run if previous still running
  workflowSpec:
    entrypoint: main
    templates: [...]

# Suspend/resume a cron:
argo cron suspend daily-pipeline -n argo
argo cron resume daily-pipeline -n argo
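Scheduled runs accumulate quickly, so it helps to garbage-collect finished Workflows and their pods. A sketch of the relevant `workflowSpec` fields, with assumed retention values:

```yaml
workflowSpec:
  ttlStrategy:
    secondsAfterCompletion: 86400   # delete the Workflow object 24h after it finishes
  podGC:
    strategy: OnWorkflowSuccess     # delete pods once the whole workflow succeeds
```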


