Argo Workflow (3)
1 Artifacts
When running workflows, it is very common for steps to generate or use artifacts. Typically, an output artifact from one step may be used as an input artifact by a subsequent step.
The following workflow specification contains two sequential steps. The first step, generate-artifact, uses the argosay template to generate an artifact; the second step, consume-artifact, uses the print-message template to consume that artifact.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: argosay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
  - name: argosay
    container:
      image: yky8/argosay:v2
      # sh -c allows you to provide a string that sh will execute as a complete shell command
      command: [sh, -c]
      # The tee command reads from standard input and writes to both standard output and files simultaneously
      # Useful for saving command output to a file while viewing it in the terminal
      args: ["/usr/local/bin/argosay echo 'hello world' | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art # Used by consume-artifact step, specified with "from" field
        path: /tmp/hello_world.txt
  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]
argo submit artifacts.yaml -n argo
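To verify the run, you can inspect the workflow and its logs; assuming it is the most recent workflow in the argo namespace, the @latest alias works:
argo get @latest -n argo
argo logs @latest -n argo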
Handling Large Artifacts
Artifacts are loaded into the main container by an init container, so a very large artifact may need more memory than the init container requests by default. You can use podSpecPatch to raise the init container's resource requests, as in the snippet below.
<... snipped ...>
- name: print-large-artifact
  # below patch gets merged with the actual pod spec and increases the memory
  # request of the init container.
  podSpecPatch: |
    initContainers:
      - name: init
        resources:
          requests:
            memory: 2Gi
            cpu: 300m
  inputs:
    artifacts:
    - name: data
      path: /tmp/large-file
  container:
    image: alpine:latest
    command: [sh, -c]
    args: ["cat /tmp/large-file"]
<... snipped ...>
Artifact Archiving Strategy
By default, Argo Workflows archives output artifacts as tar archives compressed with gzip. You can customize this behavior per artifact with the archive field, as the following example shows:
<... snipped ...>
outputs:
  artifacts:
    # default behavior - tar+gzip default compression.
    - name: hello-art-1
      path: /tmp/hello_world.txt
    # disable archiving entirely - upload the file / directory as is.
    # this is useful when the container layout matches the desired target repository layout.
    - name: hello-art-2
      path: /tmp/hello_world.txt
      archive:
        none: {}
    # customize the compression behavior (disabling it here).
    # this is useful for files with varying compression benefits,
    # e.g. disabling compression for a cached build workspace and large binaries,
    # or increasing compression for "perfect" textual data - like a json/xml export of a large database.
    - name: hello-art-3
      path: /tmp/hello_world.txt
      archive:
        tar:
          # no compression (also accepts the standard gzip 1 to 9 values)
          compressionLevel: 0
<... snipped ...>
Artifact Garbage Collection
Argo Workflows can garbage-collect artifacts automatically, either when the workflow completes or when it is deleted, depending on the artifactGC strategy. Documentation for supported storage backends: https://argo-workflows.readthedocs.io/en/latest/configure-artifact-repository/
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion # default Strategy set here applies to all Artifacts by default
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact.txt
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this.txt
            artifactGC:
              strategy: Never # optional override for an Artifact
Naming Artifacts – Parameterization
When concurrent runs of the same workflow are possible, use parameterized S3 keys, for example keys that include {{workflow.uid}}. If two concurrent workflows write to the same S3 key, one workflow's garbage collection can delete an artifact the other still needs, or one run can overwrite the other's data. Giving each run a unique key, such as by embedding {{workflow.uid}}, avoids both problems.
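For example, an output artifact can be declared with a per-run key (the artifact name and path here are illustrative):
outputs:
  artifacts:
    - name: report
      path: /tmp/report.txt
      s3:
        # {{workflow.uid}} makes the key unique for every workflow run
        key: report-{{workflow.uid}}.txt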
Service Account or IAM Annotations for Storage Services
If access to your storage service is controlled through a ServiceAccount or IAM annotations, you can specify them in the workflow specification; they are applied to the artifact GC pods so those pods can reach the storage bucket. You can set a ServiceAccount and pod annotations for the entire workflow, or override them per artifact.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion
    ##############################################################################################
    # Workflow Level Service Account and Metadata
    ##############################################################################################
    serviceAccountName: my-sa
    podMetadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-iam-role
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact-{{workflow.uid}}.txt
            artifactGC:
              ####################################################################################
              # Optional override capability
              ####################################################################################
              serviceAccountName: artifact-specific-sa
              podMetadata:
                annotations:
                  eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/artifact-specific-iam-role
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this-{{workflow.uid}}.txt
            artifactGC:
              strategy: Never
To support a custom ServiceAccount, you need to create a Role with the permissions artifact GC requires and bind it to the ServiceAccount with a RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    workflows.argoproj.io/description: |
      This is the minimum recommended permissions needed if you want to use artifact GC.
  name: artifactgc
rules:
  - apiGroups:
      - argoproj.io
    resources:
      - workflowartifactgctasks
    verbs:
      - list
      - watch
  - apiGroups:
      - argoproj.io
    resources:
      - workflowartifactgctasks/status
    verbs:
      - patch
If you installed with the quick-start manifests, a Role named artifactgc is already present. If you installed with the Helm chart, you need to create this Role yourself. You can then bind the Role to a ServiceAccount with a RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: artifactgc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: artifactgc
subjects:
  - kind: ServiceAccount
    name: artifact-specific-sa
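Both manifests are applied to the namespace the workflows run in, for example (the file name is illustrative):
kubectl apply -n argo -f artifactgc-rbac.yaml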
What Happens When Garbage Collection (GC) Fails in Argo Workflow?
If artifact deletion fails for some reason (except when the artifact has already been deleted, which is not considered a failure), the workflow’s status will be marked with a new condition to indicate an “Artifact GC Failure”. Additionally, Kubernetes will emit an event, and the Argo Server UI will display the failure information. To further debug, users should find one or more Pods named <wfName>-artgc-* and check their logs.
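For example, to find the artifact GC pods of a workflow named my-wf and read their logs (names are illustrative):
kubectl get pods -n argo | grep my-wf-artgc
kubectl logs -n argo <one-of-the-pods-listed-above>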
If users need to delete the workflow and its child CRD objects, they need to patch the workflow to remove the finalizer that prevents deletion:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  finalizers:
    - workflows.argoproj.io/artifact-gc
You can remove the finalizer with the following command:
kubectl patch workflow my-wf \
--type json \
--patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
Alternatively, for easier operation, you can use the argo delete command with the --force flag, which removes the finalizer before performing the delete operation.
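For example, assuming the workflow is named my-wf:
argo delete my-wf -n argo --force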
In version 3.5 and higher, a forceFinalizerRemoval field has been added under artifactGC in the Workflow spec, allowing the finalizer to be removed even if Artifact GC fails:
spec:
  artifactGC:
    strategy: OnWorkflowDeletion
    forceFinalizerRemoval: true
This means the workflow can be deleted even if artifact garbage collection fails.
2 Built-in Artifacts
Argo Workflows provides built-in support for several common artifact sources: git repositories, HTTP resources, GCS buckets, and S3 buckets. You could fetch any of these yourself inside a container, but the built-in artifact types make working with them much easier.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hardwired-artifact-
spec:
  entrypoint: hardwired-artifact
  templates:
  - name: hardwired-artifact
    podSpecPatch: |
      initContainers:
        - name: init
          resources:
            requests:
              memory: 2Gi
              cpu: 300m
    inputs:
      artifacts:
      # Check out the main branch of the argo repo and place it at /src
      # revision can be anything that git checkout accepts: branch, commit, tag, etc.
      - name: argo-source
        path: /src
        git:
          repo: https://github.com/argoproj/argo-workflows.git
          revision: "main"
      # Download kubectl 1.8.0 and place it at /bin/kubectl
      - name: kubectl
        path: /bin/kubectl
        mode: 0755
        http:
          url: https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl
      # Copy an s3 compatible artifact repository bucket (such as AWS, GCS and MinIO) and place it at /s3
      # - name: objects
      #   path: /s3
      #   s3:
      #     endpoint: storage.googleapis.com
      #     bucket: my-bucket-name
      #     key: path/in/bucket
      #     accessKeySecret:
      #       name: my-s3-credentials
      #       key: accessKey
      #     secretKeySecret:
      #       name: my-s3-credentials
      #       key: secretKey
    container:
      image: debian
      command: [sh, -c]
      args: ["ls -l /src /bin/kubectl"] # /s3 is not tested here
This example shows how built-in artifact types are used in an Argo Workflow. Artifacts are files or directories generated or consumed during workflow execution. The workflow template above defines three input artifacts:
- argo-source: a git artifact. It checks out the specified revision (here, the main branch) of the specified repository (https://github.com/argoproj/argo-workflows.git) and places it in the /src directory.
- kubectl: an http artifact. It downloads the file at the specified URL (https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl) and places it at /bin/kubectl. The mode: 0755 setting makes the file executable.
- objects (commented out in the spec above): an s3 artifact. It downloads objects from an S3-compatible storage service (here, storage.googleapis.com), using the given bucket (my-bucket-name) and key (path/in/bucket), and places them in the /s3 directory. The accessKeySecret and secretKeySecret fields reference the Kubernetes Secret that holds the storage service's access key and secret key.
The template then defines a container that runs a command to list the contents of the /src, /bin/kubectl, and /s3 directories, where the input artifacts are placed.
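For the commented-out s3 artifact to work, the referenced Secret must exist in the workflow's namespace; a sketch of creating it with placeholder credentials:
kubectl create secret generic my-s3-credentials -n argo \
  --from-literal=accessKey=<your-access-key> \
  --from-literal=secretKey=<your-secret-key>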
3 Script
Sometimes you may want a step to run a script rather than a container's command directly. For this, use the script template type: a script template embeds the script body, which can be a shell, Python, JavaScript, or other interpreted script, directly in the workflow specification. In this example, we demonstrate how to use script templates.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: scripts-
spec:
  entrypoint: bash-script-example
  templates:
  - name: bash-script-example
    steps:
    - - name: generate-from-bash
        template: gen-random-int-bash
      - name: generate-from-python
        template: gen-random-int-python
      - name: generate-from-javascript
        template: gen-random-int-javascript
    - - name: print-for-bash
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "[BASH] {{steps.generate-from-bash.outputs.result}}" # The result of the here-script
      - name: print-for-python
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "[PY] {{steps.generate-from-python.outputs.result}}" # The result of the Python script
      - name: print-for-javascript
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "[JS] {{steps.generate-from-javascript.outputs.result}}" # The result of the JavaScript script
  - name: gen-random-int-bash
    script:
      image: debian:9.4
      command: [bash]
      source: | # Contents of the here-script
        cat /dev/urandom | od -N2 -An -i | awk -v f=1 -v r=100 '{printf "%i\n", f + r * $1 / 65536}'
  - name: gen-random-int-python
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        i = random.randint(1, 100)
        print(i)
  - name: gen-random-int-javascript
    script:
      image: node:9.1-alpine
      command: [node]
      source: |
        var rand = Math.floor(Math.random() * 100);
        console.log(rand);
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo result was: {{inputs.parameters.message}}"]
The script keyword lets you supply the script body in the source field. Argo writes the script body to a temporary file and passes that file's name as the final argument to command, so command should be an interpreter that can execute the script body.
Using the script feature also assigns the standard output of the script to a special output parameter named result. This allows you to use the result of running the script itself in other parts of the workflow specification. In this example, the result is simply echoed by the print-message template.
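The result can also drive control flow, not just messages. A sketch (not part of the example above) of an extra step group that only runs when the Bash-generated number is greater than 50:
    - - name: print-if-large
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "large number: {{steps.generate-from-bash.outputs.result}}"
        # when evaluates the captured stdout of the earlier step
        when: "{{steps.generate-from-bash.outputs.result}} > 50"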
argo submit scripts-parallel.yaml -n argo --watch
4 Output Parameters
Output parameters provide a general mechanism to use the results of steps as parameters (not just as artifacts). This allows you to use the results of any type of step, not just scripts, for conditional testing, looping, and parameterization. Output parameters work similarly to script results, except the value of the output parameter is set to the contents of a generated file, rather than the contents of stdout.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: output-parameter-
spec:
  entrypoint: output-parameter
  templates:
  - name: output-parameter
    steps:
    - - name: generate-parameter
        template: argosay
    - - name: consume-parameter
        template: print-message
        arguments:
          parameters:
          # Pass the hello-param output from the generate-parameter step as the message input to print-message
          - name: message
            value: "{{steps.generate-parameter.outputs.parameters.hello-param}}"
  - name: argosay
    container:
      image: yky8/argosay:v2
      command: [sh, -c]
      args: ["echo -n hello world > /tmp/hello_world.txt"] # generate the content of hello_world.txt
    outputs:
      parameters:
      - name: hello-param # name of output parameter
        valueFrom:
          path: /tmp/hello_world.txt # set the value of hello-param to the contents of hello_world.txt
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: yky8/argosay:v2
      command: ["/usr/local/bin/argosay"]
      args: ["echo", "{{inputs.parameters.message}}"]
Here, a step's output parameter is used as an input parameter rather than as an artifact. The generate-parameter step produces an output parameter named hello-param, and the consume-parameter step passes it as the message input parameter to the print-message template.
If using a DAG, you can access the output parameter with {{tasks.generate-parameter.outputs.parameters.hello-param}}.
argo submit output-params.yaml -n argo --watch
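For reference, a minimal sketch of the same flow written as a DAG template (the other templates are unchanged), illustrating the tasks.* reference mentioned above:
  - name: output-parameter-dag
    dag:
      tasks:
      - name: generate-parameter
        template: argosay
      - name: consume-parameter
        template: print-message
        dependencies: [generate-parameter]
        arguments:
          parameters:
          - name: message
            value: "{{tasks.generate-parameter.outputs.parameters.hello-param}}"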
outputs.result Captures Standard Output
Only 256 KB of the standard output stream will be captured.
- The output of a script is captured in outputs.result, as described in the previous section.
- The standard output of container steps and tasks is likewise captured and stored in the result output parameter.
- For example, if a task is named log-int, its result can be accessed as {{tasks.log-int.outputs.result}}. If you are using steps, replace tasks with steps, i.e. {{steps.log-int.outputs.result}}. In this way, the output of a step or task can be used in other parts of the workflow.
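A small sketch of capturing a container step's standard output this way (names are illustrative): a log-int template that echoes a number, whose stdout is then read through outputs.result:
  - name: log-int
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo 42"] # stdout is captured into outputs.result
  # referenced elsewhere in a steps template as:
  # value: "{{steps.log-int.outputs.result}}"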