Batch workflow syntax
Batch workflow
Batch workflows are located in the .apolo/<batch-name>.yml or .apolo/<batch-name>.yaml file under the flow's root. The config filename should be lowercase and not start with a digit if the id attribute is not specified. The following YAML attributes are supported:
kind
Required. The workflow kind, must be batch for batch workflows.
id
Expression contexts: This attribute cannot contain expressions.
Identifier of the workflow. By default, the id is generated from the filename of the workflow config, with hyphens (-) replaced with underscores (_). It's available as ${{ flow.flow_id }} in expressions.
title
Workflow title.
Expression contexts: This attribute only allows expressions that don't access contexts.
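A minimal sketch of the top-level attributes described so far. The id and title values are placeholders chosen for illustration.

```yaml
kind: batch               # required for batch workflows
id: train_pipeline        # hypothetical; defaults to the config filename
title: Training pipeline  # hypothetical title
```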
defaults
A map of default settings that will be applied to all tasks in the workflow. You can override these global default settings for specific tasks.
defaults.env
A mapping of environment variables that are available to all tasks in the workflow. You can also set environment variables that are only available to a task. For more information, see tasks.env.
When two or more environment variables are defined with the same name, apolo-flow uses the most specific environment variable. For example, an environment variable defined in a task will override the workflow default.
Example:
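A minimal sketch of a workflow-wide environment variable. The variable name and value are illustrative only.

```yaml
defaults:
  env:
    SERVER: production  # hypothetical variable, visible to every task
```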
This attribute also supports dictionaries as values:
Expression contexts: flow context, params context.
defaults.life_span
The default lifespan for jobs run by the workflow. It can be overridden by tasks.life_span. If not set, the default task lifespan is 1 day. The lifespan value can be one of the following:
- A float number representing the amount of seconds (3600 represents an hour)
- A string of the following format: 1d6h15m (1 day, 6 hours, 15 minutes)
For lifespan-disabling emulation, use an arbitrary large value (e.g. 365d). Keep in mind that this may be dangerous, as a forgotten job will consume cluster resources.
A lifespan shorter than 1 minute is forbidden.
Example:
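A sketch using the string form of the duration. The value is arbitrary.

```yaml
defaults:
  life_span: 14d  # every task may live up to 14 days unless overridden
```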
Expression contexts: flow context, params context.
defaults.preset
The default preset name used by all tasks if not overridden by tasks.preset. The system-wide default preset is used if both defaults.preset and tasks.preset are omitted.
Example:
defaults.volumes
Volumes that will be mounted to all tasks by default.
Example:
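A sketch of two default volumes written as plain volume strings. The storage paths and mount points are placeholders.

```yaml
defaults:
  volumes:
    - storage:some/dir:/mnt/some/dir                  # hypothetical paths
    - storage:some/another/dir:/mnt/some/another/dir
```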
Default volumes are not passed to actions.
Expression contexts: flow context, params context.
defaults.schedule_timeout
The default timeout for task scheduling. See tasks.schedule_timeout for more information.
The attribute accepts the following values:
- A float number representing the amount of seconds (3600 represents an hour)
- A string of the following format: 1d6h15m45s (1 day, 6 hours, 15 minutes, 45 seconds)
The cluster-wide timeout is used if both defaults.schedule_timeout and tasks.schedule_timeout are omitted.
Example:
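A sketch with an arbitrary one-day timeout.

```yaml
defaults:
  schedule_timeout: 1d  # tasks may wait in the queue for up to one day
```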
Expression contexts: flow context, params context.
defaults.tags
A list of tags that are added to every task created by the workflow. A particular task definition can extend this global list by tasks.tags.
Example:
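A sketch with two illustrative tag names.

```yaml
defaults:
  tags: [tag-a, tag-b]  # hypothetical tags added to every task
```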
This attribute supports lists as values.
defaults.workdir
The default working directory for tasks created by this workflow. See tasks.workdir for more information.
Expression contexts: flow context, params context.
defaults.fail_fast
When set to true, the system cancels all in-progress tasks if any one of them fails. It can be overridden by tasks.strategy.fail_fast. This is set to true by default.
Example:
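A sketch that turns the behavior off, so remaining tasks keep running after a failure.

```yaml
defaults:
  fail_fast: false
```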
Expression contexts: flow context, params context, env context, tags context, volumes context, images context.
defaults.max_parallel
The maximum number of tasks that can run simultaneously during flow execution. By default, there is no limit.
Example:
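A sketch with an arbitrary limit of four concurrent tasks.

```yaml
defaults:
  max_parallel: 4
```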
Expression contexts: flow context, params context, env context, tags context, volumes context, images context.
defaults.cache
A mapping that defines how task outputs are cached. It can be overridden by tasks.cache.
defaults.cache.strategy
The default strategy to use for caching. Available options are:
- "none": Don't use caching at all.
- "default": The basic caching algorithm that reuses cached outputs in case task definitions and all expression contexts available in the task's expressions are the same.
Default: "default"
Example:
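A sketch that sets the strategy explicitly to one of the two values listed above.

```yaml
defaults:
  cache:
    strategy: "default"
```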
Expression contexts: flow context, params context, env context, tags context, volumes context, images context.
defaults.cache.life_span
The default cache invalidation duration. The attribute accepts the following values:
- A float number representing the amount of seconds (3600 represents an hour)
- A string of the following format: 1d6h15m45s (1 day, 6 hours, 15 minutes, 45 seconds)
Default: 14d (two weeks).
If you decrease this value and re-run the flow, apolo-flow will ignore cache entries that are older than the new cache.life_span value.
Example:
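A sketch with an arbitrary 31-day cache lifetime.

```yaml
defaults:
  cache:
    life_span: 31d  # cached outputs stay valid for about one month
```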
Expression contexts: flow context, params context, env context, tags context, volumes context, images context.
images
A mapping of image definitions used by the batch workflow.
If the specified image reference is available at the Apolo registry and the .force_rebuild flag is not set, then Apolo Flow will not attempt to build the image from scratch. If this flag is set or the image is not in the registry, then the platform will start building the image.
The images section is not required. A task can specify the image name in a plain string without referring to the ${{ images.my_image.ref }} context.
images.<image-id>
The key image-id is a string and its value is a map of an image's configuration data. You must replace <image-id> with a string that is unique to the images object. <image-id> must start with a letter and contain only alphanumeric characters or underscore symbols (_). The dash symbol (-) is not allowed.
images.<image-id>.ref
Required. Image reference that can be used in the tasks.image expression.
Example of a self-hosted image:
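A sketch of an image stored in the Apolo registry. The image name and tag are placeholders.

```yaml
images:
  my_image:
    ref: image:my_image:latest  # hypothetical image in the Apolo registry
```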
This can only use locally accessible functions (such as hash_files). Its value will be calculated before the remote executor starts.
Example of external image:
Use the embedded hash_files() function to generate the built image's tag based on its content.
Example of an auto-calculated stable hash:
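A sketch of a content-based tag. The file patterns passed to hash_files() are assumptions about a typical project layout, not required names.

```yaml
images:
  my_image:
    # The tag changes whenever any of the listed files changes.
    ref: image:my_image:${{ hash_files('Dockerfile', 'requirements/*.txt', 'modules/**/*.py') }}
```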
Expression contexts: flow context.
images.<image-id>.context
The Docker context used to build an image. Can be either a local path (e.g. ${{ flow.workspace / 'some/dir' }}) or a remote volume spec (e.g. storage:subdir/${{ flow.flow_id }}). If it's a local path, it cannot use anything that's not available at the beginning of a bake (for example, action inputs). If it's a remote path, usage of dynamic values is allowed. Local context will be automatically uploaded to storage during the "local actions" step of the bake.
Example:
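A sketch of a local build context, reusing the path expression from the description above. The directory name is a placeholder.

```yaml
images:
  my_image:
    ref: image:my_image:latest
    context: ${{ flow.workspace / 'some/dir' }}  # hypothetical local directory
```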
apolo-flow cannot build images without the context, but can address pre-built images using images.<image-id>.ref.
Expression contexts: flow context.
images.<image-id>.dockerfile
An optional Docker file name used for building images, Dockerfile by default.
Works almost the same as .context: if it's a local path, dynamic values are forbidden and it's automatically uploaded. If it's a remote path, then dynamic values are allowed.
Example:
Expression contexts: flow context.
images.<image-id>.build_args
A list of optional build arguments passed to the image builder. See Docker documentation for details. Supports dynamic values such as action inputs.
Example:
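A sketch of two build arguments in NAME=value form. The argument names and values are placeholders.

```yaml
images:
  my_image:
    build_args:
      - ARG1=val1  # hypothetical build argument
      - ARG2=val2
```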
Expression contexts: flow context.
images.<image-id>.env
A mapping of environment variables passed to the image builder. Supports dynamic values such as action inputs.
Example:
This attribute also supports dictionaries as values:
Expression contexts: flow context.
images.<image-id>.volumes
A list of volume references mounted to the image building process. Supports dynamic values such as action inputs.
Example:
This attribute also supports lists as values:
Expression contexts: flow context.
images.<image-id>.force_rebuild
If this flag is enabled, the referenced image will be rebuilt from scratch for each bake.
Example:
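A sketch that forces a rebuild on every bake. The image name is a placeholder.

```yaml
images:
  my_image:
    force_rebuild: true
```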
params
Mapping of key-value pairs that have default values.
This attribute describes a set of names and default values of parameters accepted by a flow.
Parameters can be specified in short and long forms.
The short form is compact but only lets you specify the names and default values of parameters:
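A sketch of the short form. The parameter names and default values are illustrative.

```yaml
params:
  name1: default1
  name2: ~   # null by default
  name3: ""  # empty string by default
```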
The long form additionally lets you specify parameter descriptions. This can be useful for apolo-flow bake command introspection, shell autocompletion, and generation of more detailed error messages:
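A sketch of the long form. It assumes the default and descr keys for the default value and the description; check the key names against your apolo-flow version.

```yaml
params:
  name1:
    default: default1             # assumed key for the default value
    descr: The name1 description  # assumed key for the description
  name2:
    default: ~
    descr: The name2 description
```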
This attribute can be overridden from the command line in two ways while running a batch in Apolo CLI:
- Specifying the parameters through --param.
- Pointing to a YAML file with parameter descriptions through --meta-from-file.
The file should have the following structure:
Expression contexts: This attribute only allows expressions that don't access contexts.
images
A mapping of image definitions used by this workflow.
Unlike live flow images, batch flow images cannot be built using apolo-flow build <image-id>.
The images section is not required. A task can specify the image name in a plain string without referring to the ${{ images.my_image.ref }} context.
However, this section exists for convenience: there is no need to repeat yourself if you can just point the image reference everywhere in the YAML.
The following fields are disabled in batch workflows and will be ignored:
- images.<image-id>.context
- images.<image-id>.dockerfile
- images.<image-id>.build_args
- images.<image-id>.env
- images.<image-id>.volumes
images.<image-id>
The key image-id is a string and its value is a map of the image's configuration data. You must replace <image-id> with a string that is unique to the images object. <image-id> must start with a letter and contain only alphanumeric characters or underscore symbols (_). The dash symbol (-) is not allowed.
images.<image-id>.ref
Required. Image reference that can be used in the tasks.image expression.
You can use the image definition to address images hosted either on the Apolo registry or Docker Hub.
Example:
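A sketch with one image from the Apolo registry and one from Docker Hub. The image IDs and tags are placeholders.

```yaml
images:
  my_image:
    ref: image:my_image:latest  # hypothetical image in the Apolo registry
  python:
    ref: python:3.9.0           # image hosted on Docker Hub
```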
Expression contexts: flow context, params context.
volumes
A mapping of volume definitions available in this workflow. A volume defines a link between the Apolo storage folder and a remote folder that can be mounted to a task.
Unlike live flow volumes, batch flow volumes cannot be synchronized by the apolo-flow upload and apolo-flow download commands. They can only be mounted to a task by using the tasks.volumes attribute.
The following fields are disabled in batch workflows and will cause an error:
- volumes.<volume-id>.local
volumes.<volume-id>
The key volume-id is a string and its value is a map of the volume's configuration data. You must replace <volume-id> with a string that is unique to the volumes object. The <volume-id> must start with a letter and contain only alphanumeric characters or underscore symbols (_). The dash symbol (-) is not allowed.
volumes.<volume-id>.remote
Required. Volume URI on the Apolo Storage.
Example:
Expression contexts: flow context, params context.
volumes.<volume-id>.mount
Required. Mount path inside a task.
Example:
Expression contexts: flow context, params context.
volumes.<volume-id>.read_only
The volume is mounted as read-only if this attribute is enabled; read-write mode is used otherwise.
Example:
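A sketch of a complete read-only volume definition combining remote, mount, and read_only. The volume ID and paths are placeholders.

```yaml
volumes:
  dataset:
    remote: storage:path/to/dataset  # hypothetical storage folder
    mount: /mnt/dataset              # path inside the task
    read_only: true
```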
Expression contexts: flow context, params context.
tasks
List of tasks and action calls that this batch workflow contains. Unlike jobs in live workflows, all tasks are executed with one command in the order specified by tasks.needs. To start execution, run apolo-flow bake <batch-id>.
Example:
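A sketch of a two-task workflow. The task IDs, image, and commands are placeholders; with no needs specified, the second task waits for the first.

```yaml
kind: batch
tasks:
  - id: prepare
    image: ubuntu:20.04
    bash: echo "prepare data"
  - id: train            # runs after prepare by default
    image: ubuntu:20.04
    bash: echo "train model"
```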
Attributes for tasks and action calls
The attributes described in this section can be applied both to plain tasks and action calls. To simplify reading, this section uses the term "task" instead of "task or action call".
tasks.id
A unique identifier for the task. It's used to reference the task in tasks.needs. The value must start with a letter and only contain alphanumeric characters or underscore symbols (_). The dash symbol (-) is not allowed.
It is impossible to refer to tasks without an ID inside the workflow file, but you can refer to them as task-<num> in the command line output. Here, <num> is an index from the tasks list.
Expression contexts: matrix context.
tasks.needs
An array of strings identifying all tasks that must be completed or running before this task will run. If a task fails, all tasks that need it are skipped unless the task uses a tasks.enable statement that causes it to ignore the dependency failure.
By default, tasks.needs is set to the previous task in the tasks list. In case the previous task has matrix enabled, the current task will only run after all matrix tasks are completed.
This property also specifies what entries are available in the needs context.
Example 1:
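A sketch of an explicit linear chain. The image and command attributes are omitted for brevity.

```yaml
tasks:
  - id: task_1
  - id: task_2
    needs: [task_1]
  - id: task_3
    needs: [task_2]
```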
In this case, tasks will be executed in the following order:
1. task_1
2. task_2
3. task_3
The order is the same as in the default behavior without needs.
Example 2:
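A sketch in which task_2 declares an empty needs list, so it does not wait for task_1. The image and command attributes are omitted for brevity.

```yaml
tasks:
  - id: task_1
  - id: task_2
    needs: []                # no dependencies: starts together with task_1
  - id: task_3
    needs: [task_1, task_2]
```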
In this case, tasks will be executed in the following order:
1. task_1 and task_2 (simultaneously)
2. task_3
Example 3
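A sketch that uses the mapping form of needs to wait for the running state instead of completion. The image and command attributes are omitted for brevity.

```yaml
tasks:
  - id: task_1
  - id: task_2
    needs: []
  - id: task_3
    needs:
      task_1: running
      task_2: running
```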
Here, task_3 will only be executed if task_1 and task_2 are already running.
The following are two different ways to specify needed tasks:
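A sketch of both notations side by side: a list of task IDs, and a mapping from task ID to the state to wait for. The task IDs are placeholders.

```yaml
tasks:
  - id: task_1
  - id: task_2
    needs: []
  - id: task_3
    needs: [task_1, task_2]  # list form: no state given
  - id: task_4
    needs:                   # mapping form: state given per task
      task_1: completed
      task_2: running
```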
The possible task states are running and completed.
You can use apolo-flow inspect --view BAKE_ID to view the graph of running batch tasks converted to a PDF file.
Expression contexts: matrix context.
tasks.enable
The flag that prevents a task from running unless a condition is met. To learn how to write conditions, refer to expression syntax. Default: ${{ success() }}
Example:
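A sketch of a cleanup task that runs even if an earlier task fails. It assumes an always() function is available alongside success() in the expression syntax; the task IDs and commands are placeholders.

```yaml
tasks:
  - id: train
    image: ubuntu:20.04
    bash: echo "train"
  - id: cleanup
    enable: ${{ always() }}  # assumed function: ignore upstream failures
    image: ubuntu:20.04
    bash: echo "cleanup"
```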
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.strategy
A mapping that defines a matrix and some auxiliary attributes to run multiple instances of the same task.
tasks.strategy.matrix
The matrix attribute defines a set of configurations with which to run a task. Each key in this mapping sets some variable that will be available in the matrix context in expressions of the corresponding task. Each value should be an array, and apolo-flow will start task variants with all possible combinations of items from these arrays. The matrix can generate 256 different tasks at most.
Example 1:
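A sketch of a one-dimensional matrix. The parameter name, image, and command are placeholders.

```yaml
tasks:
  - id: example_${{ matrix.param }}
    image: ubuntu:20.04
    strategy:
      matrix:
        param: [a, b]
    bash: echo ${{ matrix.param }}
```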
In this example, tasks with IDs example_a and example_b will be generated.
Example 2:
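A sketch of a two-dimensional matrix. The parameter names, image, and command are placeholders.

```yaml
tasks:
  - id: ${{ matrix.param1 }}_${{ matrix.param2 }}
    image: ubuntu:20.04
    strategy:
      matrix:
        param1: [a, b]
        param2: [x, y]
    bash: echo ${{ matrix.param1 }} ${{ matrix.param2 }}
```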
In this example, tasks with IDs a_x, a_y, b_x, b_y will be generated.
Auto-generated IDs for matrix tasks will have suffixes in the form of -<param-1>-<param-2>.
Expression contexts: Matrix values only allow expressions that don't access contexts.
tasks.strategy.matrix.exclude
exclude is a list of combinations to remove from the matrix. These entries can be partial, in which case all matching combinations are excluded.
Example:
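A sketch with one full and one partial exclusion. The parameter names, image, and command are placeholders.

```yaml
tasks:
  - id: ${{ matrix.param1 }}_${{ matrix.param2 }}_${{ matrix.param3 }}
    image: ubuntu:20.04
    strategy:
      matrix:
        param1: [a, b]
        param2: [x, y]
        param3: [1, 2]
        exclude:
          - param1: a   # full entry: removes a_x_1 only
            param2: x
            param3: 1
          - param1: b   # partial entry: removes b_y_1 and b_y_2
            param2: y
    bash: echo ${{ matrix.param1 }} ${{ matrix.param2 }} ${{ matrix.param3 }}
```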
In this example, tasks with IDs a_x_2, a_y_1, a_y_2, b_x_1, b_x_2 will be generated.
Expression contexts: Matrix values only allow expressions that don't access contexts.
tasks.strategy.matrix.include
include is a list of combinations to add to the matrix. These entries cannot be partial. In case exclude is also set, include will be applied after it.
Example:
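A sketch that adds one extra combination to a 2x2 matrix. The parameter names, image, and command are placeholders.

```yaml
tasks:
  - id: ${{ matrix.param1 }}_${{ matrix.param2 }}
    image: ubuntu:20.04
    strategy:
      matrix:
        param1: [a, b]
        param2: [x, y]
        include:
          - param1: a   # adds the a_z combination
            param2: z
    bash: echo ${{ matrix.param1 }} ${{ matrix.param2 }}
```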
In this example, tasks with IDs a_x, a_y, b_x, b_y, a_z will be generated.
Expression contexts: Matrix values only allow expressions that don't access contexts.
tasks.strategy.fail_fast
When set to true, the system cancels all in-progress tasks if this task or any of its matrix tasks fail. Default: true.
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, strategy context (contains flow global values).
tasks.strategy.max_parallel
The maximum number of matrix tasks that can run simultaneously. By default, there is no limit.
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, strategy context (contains flow global values).
tasks.cache
A mapping that defines how task outputs are cached.
tasks.cache.strategy
The strategy to use for caching of this task. Available options are:
- "none": Don't use caching at all.
- "inherit": Use the flow default value from defaults.cache.strategy.
- "default": The basic caching algorithm that reuses cached outputs in case task definition and all expression contexts available in task expressions are the same.
Default: "inherit"
Example:
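A sketch that disables caching for a single task whose output should never be reused. The task ID, image, and command are placeholders.

```yaml
tasks:
  - id: timestamp
    image: ubuntu:20.04
    cache:
      strategy: "none"  # always re-run this task
    bash: date
```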
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context.
tasks.cache.life_span
The cache invalidation duration. This attribute can accept the following values:
- A float number representing an amount of seconds
- A string in the following format: 1d6h15m45s (1 day, 6 hours, 15 minutes, 45 seconds)
Defaults to defaults.cache.life_span if not specified.
If you decrease this value and re-run the flow, apolo-flow will ignore cache entries that are older than the new cache.life_span value.
Example:
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context.
Attributes for tasks
The attributes in this section are only applicable to plain tasks that are executed by running Docker images on the Apolo platform.
tasks.image
Required. Each task is executed remotely on the Apolo cluster using a Docker image. The image can be hosted on Docker Hub (python:3.9 or ubuntu:20.04) or on the Apolo Registry (image:my_image:v2.3).
Example with a constant image string:
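A sketch with the image given as a plain string. The task ID is a placeholder.

```yaml
tasks:
  - id: train
    image: image:my_image:v2.3
```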
You may often want to use the reference to images.<image-id>.
Example with a reference to the images section:
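A sketch that defines the image once and refers to it from a task. The image and task IDs are placeholders.

```yaml
images:
  my_image:
    ref: image:my_image:v2.3
tasks:
  - id: train
    image: ${{ images.my_image.ref }}
```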
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.cmd
A task executes either a command, a bash script, or a python script. The cmd, bash, and python attributes are mutually exclusive: only one of the three is allowed at the same time. If none of these three attributes are specified, the CMD from the tasks.image is used.
The cmd attribute points to the command with optional arguments that is available in the executed tasks.image.
Example:
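A sketch that runs a single command with an argument. The task ID, image, and command are placeholders.

```yaml
tasks:
  - id: list_root
    image: ubuntu:20.04
    cmd: ls -la /
```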
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.bash
This attribute contains a bash script to run.
Using cmd to run a bash script can be tedious: you need to apply quotes to the executed script and set proper bash flags to fail on error.
The bash attribute is essentially a shortcut for cmd: bash -euo pipefail -c <shell_quoted_attr>.
This form is especially handy for executing complex multi-line bash scripts.
cmd, bash, and python are mutually exclusive.
bash should be pre-installed on the image to make this attribute work.
Example:
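A sketch of a multi-line script passed as a YAML block scalar. The task ID and image are placeholders.

```yaml
tasks:
  - id: count
    image: ubuntu:20.04
    bash: |
      for step in 1 2 3
      do
        echo "Step ${step}"
      done
```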
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.python
This attribute contains a python script to run.
Python is usually considered to be one of the best languages for scientific calculation. If you prefer writing simple inlined commands in python instead of bash, this notation is great for you.
The python attribute is essentially a shortcut for cmd: python3 -uc <shell_quoted_attr>.
cmd, bash, and python are mutually exclusive.
python3 should be pre-installed on the image to make this attribute work.
Example:
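A sketch of an inlined Python script. The task ID is a placeholder.

```yaml
tasks:
  - id: version
    image: python:3.9
    python: |
      import sys
      print("The Python version is", sys.version)
```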
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.entrypoint
You can override the Docker image ENTRYPOINT if needed or specify one. Unlike the Docker ENTRYPOINT instruction that has a shell and exec form, the entrypoint attribute only accepts a single string defining the executable to be run.
Example:
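A sketch that overrides the image entrypoint with a single string. The task ID, image, and command are placeholders.

```yaml
tasks:
  - id: show_home
    image: ubuntu:20.04
    entrypoint: sh -c "echo $HOME"
```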
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.env
Sets environment variables to use in the executed task.
When two or more variables are defined with the same name, apolo-flow uses the most specific environment variable. For example, an environment variable defined in a task will override the workflow default.
Example:
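A sketch of task-level variables. The names and values are placeholders.

```yaml
tasks:
  - id: train
    image: ubuntu:20.04
    env:
      ENV1: val1
      ENV2: val2
```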
This attribute also supports dictionaries as values:
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.http_auth
Controls whether the HTTP port exposed by the task requires Apolo Platform authentication for access.
You may want to disable the authentication to allow everybody to access your task's exposed web resource.
By default, tasks have HTTP protection enabled.
Example:
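A sketch that disables authentication for a task's exposed HTTP resource. The task ID and image are placeholders.

```yaml
tasks:
  - id: web
    image: ubuntu:20.04
    http_auth: false  # anyone can access the exposed resource
```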
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.http_port
The number of the task's HTTP port that will be exposed globally.
By default, the Apolo Platform exposes the task's internal 80 port as an HTTPS-protected resource.
You may want to expose a different local port. Use 0 to disable the feature entirely.
Example:
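A sketch that exposes port 8080 instead of the default. The task ID, image, and port are placeholders.

```yaml
tasks:
  - id: web
    image: ubuntu:20.04
    http_port: 8080
```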
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.life_span
The time period after which a task will be automatically killed.
By default, tasks live 1 day. You may want to change this period by customizing the attribute.
The value could be:
- A float number representing an amount of seconds (3600 for an hour)
- An expression in the following format: 1d6h15m (1 day, 6 hours, 15 minutes)
Use an arbitrary large value (e.g. 365d) for lifespan-disabling emulation. Keep in mind that this can be dangerous, as a forgotten task will consume cluster resources.
Example:
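A sketch using the string format shown above. The task ID and image are placeholders.

```yaml
tasks:
  - id: train
    image: ubuntu:20.04
    life_span: 1d6h15m
```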
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.name
Allows you to specify a task's name. This name becomes a part of the task's internal hostname and exposed HTTP URL, and the task can then be controlled by its name through the low-level apolo tool.
The name is completely optional; the apolo-flow tool doesn't require it to work properly.
Example:
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.pass_config
Set this attribute to true if you want to pass the Apolo config used to execute the apolo-flow run ... command into the spawned task. This can be useful if you use a task image with Apolo CLI installed and want to execute apolo ... commands inside the running task.
By default, the config is not passed.
Example:
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.restart
Optional. Controls the task behavior when the main process exits.
Possible values: never (default), on-failure, and always.
Set this attribute to on-failure if you want your task to be restarted if the main process exits with a non-zero exit code. If you set this attribute to always, the task will be restarted even if the main process exits with 0. In this case, you will need to terminate the task manually, or it will be automatically terminated when its lifespan ends. never means the platform does not restart the task; this value is used by default.
Example:
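A sketch that restarts the task only after a non-zero exit code. The task ID, image, and command are placeholders.

```yaml
tasks:
  - id: worker
    image: ubuntu:20.04
    restart: on-failure
    bash: echo "working"
```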
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.preset
The preset to execute the task with.
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.schedule_timeout
Use this attribute if you want to increase the schedule timeout. This will prevent a task from failing if the Apolo cluster is under high load and requested resources are likely to not be available at the moment.
If the Apolo cluster has no resources to launch a task immediately, this task is pushed into the waiting queue. If the task has not started by the end of the schedule timeout, it fails.
The default system-wide schedule timeout is controlled by the cluster administrator and is usually about 5-10 minutes.
The value of this attribute can be:
- A float number representing an amount of seconds
- A string in the following format: 1d6h15m45s (1 day, 6 hours, 15 minutes, 45 seconds)
Example:
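A sketch that lets a task wait in the queue for up to one day. The task ID and image are placeholders.

```yaml
tasks:
  - id: train
    image: ubuntu:20.04
    schedule_timeout: 1d  # do not fail until one day has passed
```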
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.tags
A list of additional task tags.
Each task is tagged. A task's tags are taken from this attribute and system tags (project:<project-id>, flow:<flow-id>, and task:<task-id>).
Example:
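A sketch with two additional tags. The tag names, task ID, and image are placeholders.

```yaml
tasks:
  - id: train
    image: ubuntu:20.04
    tags:
      - tag-a
      - tag-b
```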
This attribute also supports lists as values:
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.title
A task's title. The title is equal to <task-id> if not overridden.
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.volumes
A list of task volumes. You can specify a plain string for the volume reference or use the ${{ volumes.<volume-id>.ref }} expression.
Example:
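A sketch that mixes a plain volume string with a reference to a volume from the volumes section. The volume ID my_volume and all paths are placeholders.

```yaml
tasks:
  - id: train
    image: ubuntu:20.04
    volumes:
      - storage:path/to:/mnt/path/to  # plain string
      - ${{ volumes.my_volume.ref }}  # reference to a defined volume
```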
This attribute also supports lists as values:
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
tasks.workdir
A working directory to use inside the task.
This attribute takes precedence if specified. Otherwise, the WORKDIR definition from the image is used.
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.
Attributes for action calls
The attributes described in this section are only applicable to action calls. An action is a reusable part that can be integrated into a workflow. Refer to the actions reference to learn more about actions.
tasks.action
A URL that selects an action to run. It supports two schemes:
- workspace: or ws: for action files that are stored locally
- github: or gh: for actions that are bound to a GitHub repository
The ws: scheme expects a valid filesystem path to the action file.
The gh: scheme expects the following format: {owner}/{repo}@{tag}. Here, {owner} is the owner of the GitHub repository, {repo} is the repository's name, and {tag} is the commit tag. Commit tags are used to allow versioning of actions.
Example of the ws: scheme:
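A sketch of a call to a locally stored action. The file path and task ID are placeholders.

```yaml
tasks:
  - id: local_step
    action: ws:path/to/file/some-action.yml
```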
Example of the gh: scheme:
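A sketch of a call to an action hosted on GitHub. The owner, repository, tag, and task ID are placeholders.

```yaml
tasks:
  - id: remote_step
    action: gh:username/repository@v1
```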
Expression contexts: This attribute only allows expressions that don't access contexts.
tasks.args
Mapping of values that will be passed to the actions as arguments. This should correspond to inputs defined in the action file.
Example:
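A sketch that passes one constant and one expression as arguments. The action file and the input names param1 and param2 are hypothetical; they must match the inputs declared by the action you call.

```yaml
tasks:
  - id: call_action
    action: ws:some-action.yml
    args:
      param1: value1               # constant value
      param2: ${{ flow.flow_id }}  # value computed from an expression
```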
Expression contexts: flow context, params context, env context, tags context, volumes context, images context, matrix context, strategy context, needs context.