Jobs
Running jobs using the Apolo CLI
Overview
The Jobs app is a tool that allows users to schedule and execute containerized tasks and processes. It provides a user-friendly interface for managing these workloads, simplifying tasks such as data processing, model training and inference, and other batch jobs. The app is designed to give you flexibility and control over how jobs are run, while offering monitoring features for insight and debugging. For more information about the Jobs app, as well as detailed instructions on how to use it in Apolo Console, visit the main Jobs app page.
Running Jobs Using the CLI
The apolo job run command lets you execute containerized workloads with precise control over their configuration. Let's explore how to create and configure jobs effectively using the CLI (see the Apolo CLI reference for more information on running jobs).
Basic Job Execution
At its most basic, you only need to specify a Docker image to run a job. With no other options, the image runs with default settings. However, you'll often want to customize how your job runs.
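For scripted use, the basic invocation can be assembled as an argument list before it is handed to a shell or to subprocess. The sketch below only builds and prints the command; it does not run anything. The image name is a placeholder, and the `IMAGE [-- CMD...]` layout is an assumption based on the CLI's usage line, so confirm it with `apolo job run --help`.

```python
import shlex


def basic_run(image, *command):
    """Return argv for `apolo job run IMAGE [-- CMD...]` (assumed layout)."""
    argv = ["apolo", "job", "run", image]
    if command:
        # "--" separates CLI options from the command run inside the container.
        argv += ["--", *command]
    return argv


# "ubuntu:22.04" is a placeholder image; any image you can pull would do.
print(shlex.join(basic_run("ubuntu:22.04", "echo", "hello")))
```

Keeping the command as a list avoids quoting mistakes when arguments contain spaces.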
Essential Job Configuration
Let's look at the core parameters you can configure when running a job:
Container Configuration
Environment and Data
You can pass data and configuration to your jobs in several ways:
The volume syntax follows the pattern storage:source:destination:mode, where mode can be rw (read-write) or ro (read-only).
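The volume pattern above is easy to get wrong by hand, so a small helper can format and check it. This is a minimal sketch: the `--volume` and `--env` option names are assumptions based on the CLI reference, and the storage paths and variable names are hypothetical.

```python
VALID_MODES = {"rw", "ro"}


def volume_spec(source, destination, mode="rw"):
    """Format a volume as storage:source:destination:mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be 'rw' or 'ro', got {mode!r}")
    return f"storage:{source}:{destination}:{mode}"


def data_options(volumes, env):
    """Turn (source, destination, mode) tuples and an env dict into CLI options."""
    argv = []
    for source, destination, mode in volumes:
        argv += ["--volume", volume_spec(source, destination, mode)]  # assumed flag
    for key, value in env.items():
        argv += ["--env", f"{key}={value}"]  # assumed flag
    return argv


# Hypothetical storage folder "datasets" mounted read-only at /data.
print(data_options([("datasets", "/data", "ro")], {"EPOCHS": "10"}))
```

Mounting input data as ro guards against a job accidentally overwriting it.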
Resource Allocation
Control the computing resources available to your job:
Job Identity and Organization
Give your job meaningful identifiers for better management:
Runtime Behavior
Configure how your job behaves during execution:
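The three groups above (resources, identity, runtime behavior) can be modeled as one options object. The sketch below assumes the option names `--preset`, `--name`, `--tag`, `--description`, `--life-span`, and `--restart` from the CLI reference; the preset name, job name, and tag are hypothetical, and preset names in particular depend on your cluster.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobOptions:
    preset: Optional[str] = None       # resource preset; names are cluster-specific
    name: Optional[str] = None         # human-readable job name
    tags: List[str] = field(default_factory=list)
    description: Optional[str] = None
    life_span: Optional[str] = None    # e.g. "2h"; format assumed from the CLI help
    restart: Optional[str] = None      # assumed values: never, on-failure, always

    def to_argv(self) -> List[str]:
        """Emit only the options that were actually set."""
        argv: List[str] = []
        if self.preset:
            argv += ["--preset", self.preset]
        if self.name:
            argv += ["--name", self.name]
        for tag in self.tags:
            argv += ["--tag", tag]
        if self.description:
            argv += ["--description", self.description]
        if self.life_span:
            argv += ["--life-span", self.life_span]
        if self.restart:
            argv += ["--restart", self.restart]
        return argv


opts = JobOptions(preset="cpu-small", name="train-demo", tags=["demo"], life_span="2h")
print(shlex.join(["apolo", "job", "run", *opts.to_argv(), "ubuntu:22.04"]))
```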
Network and HTTP Configuration
If your job serves HTTP content or needs port forwarding:
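As a rough illustration of the HTTP side, the helper below collects the options that expose a job's web endpoint. The option names `--http-port`, `--no-http-auth`, and `--browse` are assumptions taken from the CLI reference, and the port number is only an example; verify both against `apolo job run --help`.

```python
def http_options(port, require_auth=True, open_browser=False):
    """Return options that expose the container's HTTP port (assumed flag names)."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    argv = ["--http-port", str(port)]
    if not require_auth:
        # Disabling platform authentication makes the endpoint publicly reachable.
        argv.append("--no-http-auth")
    if open_browser:
        argv.append("--browse")
    return argv


print(http_options(8080, open_browser=True))
```

Leaving authentication on by default is the safer choice for anything that is not meant to be public.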
Interactive Jobs
For jobs requiring interaction:
Project and Organization Context
Specify where your job should run:
You can combine these options to create precisely configured jobs. A single command can, for example, create a named job with custom resource allocation, mounted storage, environment variables, a working directory, a runtime limit, and organizational tags, then execute a specific Python script with custom parameters.
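A combined command of the kind described here might look like the following sketch. Every value is a placeholder: the job name, preset, storage paths, image, and script are hypothetical, and the option names are assumptions based on the CLI reference. The list is only printed, not executed.

```python
import shlex

full_command = [
    "apolo", "job", "run",
    "--name", "train-demo",                      # job identity
    "--preset", "gpu-small",                     # resources (cluster-specific name)
    "--volume", "storage:datasets:/data:ro",     # read-only input data
    "--volume", "storage:results:/results:rw",   # writable output location
    "--env", "EPOCHS=10",                        # configuration passed to the script
    "--workdir", "/app",                         # working directory in the container
    "--life-span", "1d",                         # runtime limit
    "--tag", "training",                         # organizational tag
    "python:3.11",                               # image
    "--", "python", "train.py", "--batch-size", "32",
]

print(shlex.join(full_command))
```

Everything after the `--` separator is passed to the container unchanged, which is why the script's own `--batch-size` option does not clash with the CLI's options.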
Remember that after starting a job, you can monitor its progress using commands like apolo job logs to view output and apolo job status to check its current state.
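If you want to check on a job from a script rather than a terminal, a thin wrapper around apolo job status is one option. This is a sketch under assumptions: the job name is hypothetical, the helper names are illustrative, and the function returns None when the CLI is not installed. Log output is better followed in a terminal, since apolo job logs may keep streaming while the job runs.

```python
import shutil
import subprocess


def status_argv(job, binary="apolo"):
    """Return argv for `apolo job status JOB`."""
    return [binary, "job", "status", job]


def fetch_status(job, binary="apolo", timeout=30):
    """Run the status command and return its text output.

    Returns None when the CLI binary is not on PATH.
    """
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        status_argv(job, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit is reported through the output, not an exception
    )
    return result.stdout


print(status_argv("train-demo"))
```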
Debugging Jobs
When developing and running containerized workloads, you'll often need to troubleshoot issues, monitor performance, or inspect the internal state of your jobs. Apolo provides several powerful tools to help you debug and understand your running jobs effectively.
Investigating Job Status and Logs
The first step in debugging is understanding what's happening with your job. Apolo provides several ways to inspect your job's status and output, including apolo job status and apolo job logs.
When you have many jobs running, you can filter the job list to find the ones you're interested in.
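Filtering can be sketched as a function that turns criteria into options for the job list command. The `apolo job ls` subcommand and its `--all`, `--state`, `--tag`, and `--name` options are assumptions drawn from the CLI reference; the state and tag values are examples only.

```python
def list_jobs_argv(states=(), tags=(), name=None, include_finished=False):
    """Return argv for listing jobs with optional filters (assumed flag names)."""
    argv = ["apolo", "job", "ls"]
    if include_finished:
        argv.append("--all")           # also show jobs that are no longer running
    for state in states:
        argv += ["--state", state]     # repeat the option once per state
    for tag in tags:
        argv += ["--tag", tag]
    if name:
        argv += ["--name", name]
    return argv


print(list_jobs_argv(states=["running"], tags=["training"]))
```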
Interactive Debugging
Sometimes you need to interact directly with a running job. Apolo provides two powerful commands for this purpose:
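One command commonly used for this, going by the CLI reference, is apolo job exec, which runs a command inside a running job. The sketch below assumes the `JOB -- CMD...` layout and the `--no-tty` option; the job name is hypothetical.

```python
import shlex


def exec_argv(job, *command, tty=True):
    """Return argv for running a command inside a running job."""
    if not command:
        raise ValueError("a command to execute is required")
    argv = ["apolo", "job", "exec"]
    if not tty:
        argv.append("--no-tty")        # assumed flag; useful for non-interactive scripts
    return argv + [job, "--", *command]


# An interactive shell, and a one-off listing without a terminal.
print(shlex.join(exec_argv("train-demo", "/bin/bash")))
print(shlex.join(exec_argv("train-demo", "ls", "-l", "/results", tty=False)))
```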
Port Forwarding for Web-Based Debugging
Many debugging tools provide web interfaces. You can access these using Apolo's port forwarding capabilities:
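A port-forwarding call can be sketched as follows. The `apolo job port-forward JOB LOCAL:REMOTE` form is an assumption based on the CLI reference, and the job name and port numbers are placeholders.

```python
def port_pair(local, remote):
    """Format a LOCAL:REMOTE pair after checking both port numbers."""
    for port in (local, remote):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    return f"{local}:{remote}"


def port_forward_argv(job, *pairs):
    """Return argv that forwards one or more local ports to a running job."""
    if not pairs:
        raise ValueError("at least one (local, remote) pair is required")
    return ["apolo", "job", "port-forward", job, *(port_pair(l, r) for l, r in pairs)]


# Local port 8080 mapped to port 80 inside the job.
print(port_forward_argv("train-demo", (8080, 80)))
```

Using a different local port than the remote one helps when the remote port number is already taken on your machine.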
Attaching to Running Jobs
The apolo job attach command creates an interactive connection to a running job, allowing you to observe its output and interact with it in real time. This is particularly useful when you need to monitor progress, investigate issues, or work with interactive applications.
Basic attachment is straightforward: pass the job's name or ID to apolo job attach.
This connects your terminal's input and output streams to the running job, letting you see any output and interact with the job directly. You can detach from the session without stopping the job by pressing Ctrl+P followed by Ctrl+Q.
A powerful feature of attach is its ability to combine terminal access with port forwarding. This is especially valuable when debugging applications that expose network services, like web servers or Jupyter notebooks.
This gives you both console access to your job and the ability to reach its network services through your local ports. For instance, with ports 8888 and 6006 forwarded, you could monitor training output in the console while accessing Jupyter at localhost:8888 and TensorBoard at localhost:6006 in your browser.
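The Jupyter and TensorBoard scenario can be sketched as an attach command with one forwarded port per service. The `--port-forward LOCAL:REMOTE` option on attach is an assumption based on the CLI reference, and the job name is hypothetical; the ports are the ones mentioned in the text.

```python
import shlex

SERVICES = {"Jupyter": 8888, "TensorBoard": 6006}


def attach_argv(job, ports):
    """Return argv that attaches to a job and forwards each port to the same local port."""
    argv = ["apolo", "job", "attach"]
    for port in ports:
        argv += ["--port-forward", f"{port}:{port}"]  # assumed flag
    return argv + [job]


def local_urls(services):
    """Map each service name to the local URL it would be reachable at."""
    return {name: f"http://localhost:{port}" for name, port in services.items()}


print(shlex.join(attach_argv("train-demo", SERVICES.values())))
print(local_urls(SERVICES))
```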
When debugging with attach, you can run additional Apolo commands in separate terminals to get a complete picture of your job's behavior. For example, you might run apolo job top in another terminal to monitor resource usage while interacting with your attached session.
Remember that closing your attach session doesn't terminate the job – it continues running in the background. You can always reattach to check on its progress or continue debugging as needed.
Preserving Job State
After debugging and fixing issues, you might want to save the state of your container for future use or analysis. The apolo job save command creates a new image from your job's current state.
This is particularly useful when:
You've made changes to the container during debugging
You want to capture the exact state where an issue occurred
You need to share a reproducible environment with colleagues
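Returning to apolo job save, a call might be assembled as in the sketch below. The `JOB IMAGE` argument order and the `image:NAME:TAG` reference scheme are assumptions taken from the CLI reference; the job, image name, and tag are hypothetical.

```python
def save_argv(job, image_name, tag):
    """Return argv that saves a job's current state as a new image."""
    if not image_name or not tag:
        raise ValueError("image name and tag must be non-empty")
    # Assumed platform image reference scheme: image:NAME:TAG
    return ["apolo", "job", "save", job, f"image:{image_name}:{tag}"]


# An explicit tag records which debugging session the image came from.
print(save_argv("train-demo", "train-debugged", "v1"))
```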
Monitoring Resource Usage
Use commands such as apolo job top to display GPU, CPU and memory usage in a job.