Argo Workflow (3)

1 Artifacts

When running workflows, it is very common for steps to generate or use artifacts. Typically, an output artifact from one step may be used as an input artifact by a subsequent step.

The following workflow specification contains two sequential steps. The first step, named generate-artifact, uses the argosay template to generate an artifact; the second step, named consume-artifact, uses the print-message template to consume that artifact.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: argosay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

  - name: argosay
    container:
      image: yky8/argosay:v2
      # sh -c allows you to provide a string that sh will execute as a complete shell command
      command: [sh, -c]
      # The tee command reads from standard input and writes to both standard output and files simultaneously
      # Useful for saving command output to a file while viewing it in the terminal
      args: ["/usr/local/bin/argosay echo 'hello world' | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art # Used by consume-artifact step, specified with "from" field
        path: /tmp/hello_world.txt

  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message # put it at /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]

argo submit artifacts.yaml -n argo

(Screenshot: argo-workflow-artifacts)

Handling Large Artifacts

Input artifacts are loaded into a pod by its init container. For very large artifacts, the init container's default resource requests may not be enough, so you can use podSpecPatch to raise its memory and CPU requests, as the following (abridged) template does.

<... snipped ...>
  - name: print-large-artifact
    # below patch gets merged with the actual pod spec and increases the memory
    # request of the init container.
    podSpecPatch: |
      initContainers:
        - name: init
          resources:
            requests:
              memory: 2Gi
              cpu: 300m      
    inputs:
      artifacts:
      - name: data
        path: /tmp/large-file
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/large-file"]
<... snipped ...>

Artifact Archiving Strategy

In Argo Workflows, artifacts are by default archived as tarballs and compressed with gzip. You can customize this behavior by specifying the archive strategy. Below is an example showing how to customize the archiving strategy using the archive field:

<... snipped ...>
    outputs:
      artifacts:
        # default behavior - tar+gzip default compression.
      - name: hello-art-1
        path: /tmp/hello_world.txt

        # disable archiving entirely - upload the file / directory as is.
        # this is useful when the container layout matches the desired target repository layout.   
      - name: hello-art-2
        path: /tmp/hello_world.txt
        archive:
          none: {}

        # customize the compression behavior (disabling it here).
        # this is useful for files with varying compression benefits, 
        # e.g. disabling compression for a cached build workspace and large binaries, 
        # or increasing compression for "perfect" textual data - like a json/xml export of a large database.
      - name: hello-art-3
        path: /tmp/hello_world.txt
        archive:
          tar:
            # no compression (also accepts the standard gzip 1 to 9 values)
            compressionLevel: 0
<... snipped ...>

Artifact Garbage Collection

Documentation for supported storage engines: https://argo-workflows.readthedocs.io/en/latest/configure-artifact-repository/

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion  # default Strategy set here applies to all Artifacts by default
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt            
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact.txt
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this.txt
            artifactGC:
              strategy: Never   # optional override for an Artifact

Naming Artifacts – Parameterization

When there may be concurrent runs of the same workflow, consider using parameterized S3 keys, such as {{workflow.uid}}. This avoids situations where one workflow deletes an artifact while another generates an artifact with the same S3 key.

For example, if two concurrent workflows store artifacts under the same S3 key, one workflow may delete its artifacts while the other is still producing them, so the second workflow can find its artifacts missing or end up overwriting the first workflow's output. Parameterizing the key with a per-run value such as {{workflow.uid}} gives each workflow a unique key and avoids both problems.
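
For example, a minimal sketch of an output artifact whose S3 key is parameterized per run (the artifact name and key prefix are placeholders):

    outputs:
      artifacts:
        - name: result
          path: /tmp/result.txt
          s3:
            # {{workflow.uid}} makes the key unique for every run of the workflow
            key: artifacts/{{workflow.uid}}/result.txt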

Service Account or IAM Annotations for Storage Services

If the artifact GC pods need a dedicated service account or IAM annotations (for example, an EKS IAM role) to access the storage bucket, you can specify them in the workflow specification. The annotations are applied to the artifact GC pods so that those pods can authenticate to the storage service.

You can specify a service account or IAM annotations for the entire workflow or for each artifact.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion 
    ##############################################################################################
    #    Workflow Level Service Account and Metadata
    ##############################################################################################
    serviceAccountName: my-sa
    podMetadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-iam-role
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt            
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact-{{workflow.uid}}.txt
            artifactGC:
              ####################################################################################
              #    Optional override capability
              ####################################################################################
              serviceAccountName: artifact-specific-sa
              podMetadata:
                annotations:
                  eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/artifact-specific-iam-role
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this-{{workflow.uid}}.txt
            artifactGC:
              strategy: Never

To use a custom service account for artifact GC, you need to create a Role with the required permissions and bind it to the ServiceAccount with a RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    workflows.argoproj.io/description: |
      This is the minimum recommended permissions needed if you want to use artifact GC.      
  name: artifactgc
rules:
- apiGroups:
  - argoproj.io
  resources:
  - workflowartifactgctasks
  verbs:
  - list
  - watch
- apiGroups:
  - argoproj.io
  resources:
  - workflowartifactgctasks/status
  verbs:
  - patch

If you used the quick start manifest file to install, you get a role named artifactgc. If you have installed it using the Helm Chart, you need to install this role manually. You can bind this role to a service account using a role binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: artifactgc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: artifactgc
subjects:
- kind: ServiceAccount
  name: artifact-specific-sa
  # a ServiceAccount subject requires a namespace; "argo" is assumed here to match the -n argo examples above
  namespace: argo

What Happens When Garbage Collection (GC) Fails in Argo Workflow?

If artifact deletion fails for some reason (except when the artifact has already been deleted, which is not considered a failure), the workflow’s status will be marked with a new condition to indicate an “Artifact GC Failure”. Additionally, Kubernetes will emit an event, and the Argo Server UI will display the failure information. To further debug, users should find one or more Pods named <wfName>-artgc-* and check their logs.
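
For example, assuming the workflow runs in the argo namespace and is named my-wf (both are assumptions), you could locate and inspect those pods with:

kubectl get pods -n argo | grep my-wf-artgc
kubectl logs -n argo my-wf-artgc-xxxxx   # substitute the actual pod name from the previous command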

If users need to delete the workflow and its child CRD objects, they need to patch the workflow to remove the finalizer that prevents deletion:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  finalizers:
    - workflows.argoproj.io/artifact-gc

You can remove the finalizer with the following command:

kubectl patch workflow my-wf \
    --type json \
    --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

Alternatively, for easier operation, you can use the argo delete command with the --force flag, which removes the finalizer before performing the delete operation.
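
For example, reusing the workflow name from the patch command above:

argo delete my-wf -n argo --force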

In version 3.5 and higher, a forceFinalizerRemoval field has been added under artifactGC in the Workflow spec, allowing the finalizer to be removed even if Artifact GC fails:

spec:
  artifactGC:
    strategy: OnWorkflowDeletion 
    forceFinalizerRemoval: true

This means the workflow can be deleted even if artifact garbage collection fails.

2 Built-in Artifacts

Argo Workflows provides built-in support for several common artifact types: git repositories, HTTP resources, GCS buckets, and S3 buckets. While you could fetch these yourself from within any container, the built-in artifact types make these common cases much easier to work with.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hardwired-artifact-
spec:
  entrypoint: hardwired-artifact
  templates:
    - name: hardwired-artifact
      podSpecPatch: |
        initContainers:
          - name: init
            resources:
              requests:
                memory: 2Gi
                cpu: 300m        
      inputs:
        artifacts:
          # Check out the main branch of the argo repo and place it at /src
          # revision can be anything that git checkout accepts: branch, commit, tag, etc.
          - name: argo-source
            path: /src
            git:
              repo: https://github.com/argoproj/argo-workflows.git
              revision: "main"
          # Download kubectl 1.8.0 and place it at /bin/kubectl
          - name: kubectl
            path: /bin/kubectl
            mode: 0755
            http:
              url: https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl
        # Copy an s3 compatible artifact repository bucket (such as AWS, GCS and MinIO) and place it at /s3
        # - name: objects
        #   path: /s3
        #   s3:
        #     endpoint: storage.googleapis.com
        #     bucket: my-bucket-name
        #     key: path/in/bucket
        #     accessKeySecret:
        #       name: my-s3-credentials
        #       key: accessKey
        #     secretKeySecret:
        #       name: my-s3-credentials
        #       key: secretKey
      container:
        image: debian
        command: [sh, -c]
        args: ["ls -l /src /bin/kubectl"] # /s3 is not tested here

These are examples of using artifacts in Argo Workflows. Artifacts are files or directories generated or used during workflow execution. In this example, the workflow template defines three input artifacts (the third is commented out in the listing above):

  1. argo-source: This artifact uses the git type. It checks out a specified revision (in this case, the “main” branch) from the specified git repository (in this case, https://github.com/argoproj/argo-workflows.git) and places it in the /src directory.

  2. kubectl: This artifact uses the HTTP type. It downloads a file from the specified URL (in this case, https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl) and saves it as /bin/kubectl. The mode: 0755 setting makes the file executable.

  3. objects: This artifact uses the S3 type (commented out in the listing above). It downloads files or directories from the specified S3-compatible storage service (in this case, storage.googleapis.com) using the given bucket (my-bucket-name) and key (path/in/bucket), and places them in the /s3 directory. The accessKeySecret and secretKeySecret fields reference the Kubernetes secrets that store the access key and secret key for the storage service.

The template then defines a container that runs ls -l /src /bin/kubectl to list the downloaded artifacts; /s3 is omitted from the command because the S3 artifact above is commented out.

3 Script

Sometimes you may want a workflow step to execute a script rather than run a container command directly. In such cases, you can use a script template. The script body can be written in any interpreted language available in the container image, such as shell, Python, or JavaScript. In this example, we will demonstrate how to use script templates.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: scripts-
spec:
  entrypoint: bash-script-example
  templates:
  - name: bash-script-example
    steps:
    - - name: generate-from-bash
        template: gen-random-int-bash
      - name: generate-from-python
        template: gen-random-int-python
      - name: generate-from-javascript
        template: gen-random-int-javascript
    - - name: print-for-bash
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "[BASH] {{steps.generate-from-bash.outputs.result}}"  # The result of the here-script
      - name: print-for-python
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "[PY] {{steps.generate-from-python.outputs.result}}"  # The result of the Python script
      - name: print-for-javascript
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "[JS] {{steps.generate-from-javascript.outputs.result}}"  # The result of the JavaScript script

  - name: gen-random-int-bash
    script:
      image: debian:9.4
      command: [bash]
      source: |                                         # Contents of the here-script
        cat /dev/urandom | od -N2 -An -i | awk -v f=1 -v r=100 '{printf "%i\n", f + r * $1 / 65536}'

  - name: gen-random-int-python
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        i = random.randint(1, 100)
        print(i)        

  - name: gen-random-int-javascript
    script:
      image: node:9.1-alpine
      command: [node]
      source: |
        var rand = Math.floor(Math.random() * 100);
        console.log(rand);        

  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo result was: {{inputs.parameters.message}}"]

The script template lets you specify the script body in the source field. Argo writes the script body to a temporary file and passes the name of that file as the last argument to command, so command should be an interpreter that executes the script body.

Using the script feature also assigns the standard output of the script to a special output parameter named result. This allows you to use the result of running the script itself in other parts of the workflow specification. In this example, the result is simply echoed by the print-message template.
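
Beyond simply echoing it, the result can also drive control flow. Below is a sketch of an extra step that could be appended to the bash-script-example steps above (the step name and the threshold are made up for illustration):

    - - name: print-if-large
        template: print-message
        # only runs when the Python script above printed a value greater than 50
        when: "{{steps.generate-from-python.outputs.result}} > 50"
        arguments:
          parameters:
          - name: message
            value: "[PY > 50] {{steps.generate-from-python.outputs.result}}"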

argo submit scripts-parallel.yaml -n argo --watch
(Screenshot: argo-workflow-script)

4 Output Parameters

Output parameters provide a general mechanism to use the results of steps as parameters (not just as artifacts). This allows you to use the results of any type of step, not just scripts, for conditional testing, looping, and parameterization. Output parameters work similarly to script results, except the value of the output parameter is set to the contents of a generated file, rather than the contents of stdout.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: output-parameter-
spec:
  entrypoint: output-parameter
  templates:
    - name: output-parameter
      steps:
        - - name: generate-parameter
            template: argosay
        - - name: consume-parameter
            template: print-message
            arguments:
              parameters:
                # Pass the hello-param output from the generate-parameter step as the message input to print-message
                - name: message
                  value: "{{steps.generate-parameter.outputs.parameters.hello-param}}"

    - name: argosay
      container:
        image: yky8/argosay:v2
        command: [ sh, -c ]
        args: [ "echo -n hello world > /tmp/hello_world.txt" ]  # generate the content of hello_world.txt
      outputs:
        parameters:
          - name: hello-param  # name of output parameter
            valueFrom:
              path: /tmp/hello_world.txt # set the value of hello-param to the contents of /tmp/hello_world.txt

    - name: print-message
      inputs:
        parameters:
          - name: message
      container:
        image: yky8/argosay:v2
        command: [ "/usr/local/bin/argosay" ]
        args: [ "echo","{{inputs.parameters.message}}" ]

Here, a step's output parameter is used as an input parameter rather than an artifact. The generate-parameter step produces an output parameter named hello-param, and the consume-parameter step passes it as the message input parameter to the print-message template.

If using a DAG, you can access the output parameter with {{tasks.generate-parameter.outputs.parameters.hello-param}}.
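
For illustration, a minimal DAG template reusing the argosay and print-message templates defined above might look like this (the template name output-parameter-dag is just a placeholder):

    - name: output-parameter-dag
      dag:
        tasks:
          - name: generate-parameter
            template: argosay
          - name: consume-parameter
            # run only after generate-parameter so its output parameter exists
            dependencies: [generate-parameter]
            template: print-message
            arguments:
              parameters:
                - name: message
                  value: "{{tasks.generate-parameter.outputs.parameters.hello-param}}"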

argo submit output-params.yaml -n argo --watch

outputs.result Captures Standard Output

Only 256 KB of the standard output stream will be captured.

  • The output of a script is captured using outputs.result. Refer to the previous section for details.

  • The standard output of container steps and tasks is also captured and stored in a result parameter.

    • For example, if there is a task named log-int, its result can be accessed using {{tasks.log-int.outputs.result}}. If you are using steps, replace tasks with steps, i.e., {{steps.log-int.outputs.result}}. This way, you can use the output result of a step or task in other parts of the workflow.
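
For instance, a minimal container template whose stdout could be consumed this way (the log-int name follows the example above; the image and command are assumptions):

  - name: log-int
    container:
      image: alpine:latest
      command: [sh, -c]
      # whatever this prints (up to 256 KB) becomes available as {{tasks.log-int.outputs.result}}
      args: ["echo 42"]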