Apolo Jobs vs Apps
Overview
The Apolo platform provides two distinct approaches for running workloads: Jobs and Apps. While both enable you to execute containerized workloads on the platform, they serve fundamentally different purposes and are designed for different use cases. Understanding when to use each is crucial for effectively leveraging the Apolo platform.
Conceptual Differences
Jobs: Ad-Hoc Containerized Workloads
Jobs are the fundamental unit of compute execution in Apolo. They represent ephemeral, containerized tasks that you launch, monitor, and manage with full control over their configuration and lifecycle.
Key Characteristics:
Transient nature: Jobs have a defined beginning and end
Fine-grained control: Every parameter can be customized per execution
Imperative approach: You explicitly specify what to run and how
Flexible lifecycle: Can be interactive, detached, or batch-oriented
No pre-configuration required: Launch directly with CLI commands, as the minimal example below shows
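A minimal sketch of this directness: with nothing pre-configured, a one-off job needs only a preset, an image, and a command (the preset name here is illustrative):
apolo job run --preset cpu-small python:3.9 -- python -c "print('hello from Apolo')"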
Apps: Pre-Packaged, Reusable Solutions
Apps are higher-level abstractions that package commonly used tools and services into ready-to-deploy solutions. They come in two categories:
Pre-installed Apps: Essential tools that come standard with the Apolo console, providing instant access to key functionalities without setup
Installable Apps: Popular ML/AI tools and services that can be installed as needed
Key Characteristics:
Persistent services: Designed for long-running or frequently used tools
Pre-configured: Come with sensible defaults and guided setup
Declarative approach: You configure and install; the platform handles deployment
Standardized interface: Consistent management across all apps
Reusability: Install once, use repeatedly
Technical Comparison
| Aspect | Jobs | Apps |
| --- | --- | --- |
| Lifecycle | Ephemeral, task-oriented | Persistent, service-oriented |
| Configuration | Fully customizable per execution | Template-based with parameters |
| Management | Direct CLI/Console control | App management interface |
| Purpose | Training, processing, experimentation | Development environments, services |
| State | Stateless by default | Often stateful |
| Deployment | Immediate execution | Install, then access the app via URL |
| Access Method | CLI (apolo job run) or Console widget | CLI (apolo app / apolo app-template) or Console widget |
Use Cases and Examples
When to Use Jobs
Jobs are ideal for computational tasks that have a clear start and end point:
1. Model Training
Training a machine learning model is a perfect job use case:
apolo job run \
--name training-run \
--preset gpu-small \
--volume storage:data:/workspace/data:rw \
--volume storage:code:/workspace/code:rw \
--env BATCH_SIZE=32 \
--env LEARNING_RATE=0.001 \
--workdir /workspace \
--life-span 24h \
--tag experiment \
--description "Training run with custom parameters" \
pytorch/pytorch:latest \
-- python ./code/train.py --epochs 100
Why a Job?
Has a defined duration (training epochs)
Requires specific resource allocation for this run
Needs custom environment variables per experiment
Will terminate when complete
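Once launched, the run can be followed from the CLI using the name assigned above (a sketch; both commands appear later in this guide):
# Stream the training logs
apolo job logs training-run
# Watch live resource consumption
apolo job top training-run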
2. Data Processing Pipelines
Batch processing data is another classic job scenario:
apolo job run \
--name data-preprocessing \
--preset cpu-large \
--volume storage:raw-data:/raw-data:ro \
--volume storage:processed-data:/processed-data:rw \
--volume storage:code:/code:rw \
--life-span 6h \
python:3.9 \
-- python /code/preprocess.py --input /raw-data --output /processed-data
Why a Job?
One-time or periodic execution
Clear input and output
Completes when processing is done
3. Model Inference (Batch)
Running inference on a dataset:
apolo job run \
--name batch-inference \
--preset gpu-small \
--volume storage:models:/models:ro \
--volume storage:code:/code:ro \
--volume storage:test-data:/test-data:ro \
--volume storage:predictions:/predictions:rw \
tensorflow/tensorflow:latest-gpu \
-- python /code/inference.py --model /models/best_model.h5
Why a Job?
Batch processing mode
Temporary execution
No need for persistent service
4. Debugging and Experimentation
Interactive exploration and debugging:
apolo job run \
--name debug-session \
--preset gpu-small \
--volume storage::/code:rw \
--http-port 8888 \
quay.io/jupyter/base-notebook:latest \
-- jupyter lab --allow-root
Why a Job?
Temporary debugging session
Custom configuration for specific investigation
Will be terminated when done
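While the session is running, you can attach a shell to it for deeper inspection and terminate it when finished (job name taken from the example above):
# Open an interactive shell inside the running job
apolo job exec debug-session -- /bin/bash
# Stop the session when done
apolo job kill debug-session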
When to Use Apps
Apps are ideal for tools and services that you use repeatedly or that need to run continuously:
1. Development Environments
Apps like Jupyter Notebook, JupyterLab, VS Code, or PyCharm provide persistent development environments:
Why an App?
Used regularly for multiple projects
Requires consistent configuration
Benefits from standardized setup
May maintain session state
Accessed through web interface
Example Scenario: You're developing multiple ML models over weeks. Instead of launching a new Jupyter job each time with custom configuration, install the Jupyter app once with your preferred settings, and access it whenever needed.
2. Model Serving Services
Apps like vLLM or Text Embeddings Inference provide persistent inference APIs:
Why an App?
Must be continuously available
Serves multiple requests over time
Requires stable endpoint
Scalable
Benefits from managed deployment
Example Scenario: You need to serve a large language model to multiple applications. The vLLM app provides a persistent API endpoint with optimized inference, handling requests continuously rather than as batch jobs.
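As a sketch of the consumer side: vLLM exposes an OpenAI-compatible HTTP API, so once the app is running, any client can query its endpoint. The URL and model name below are placeholders for whatever your installation exposes:
curl https://vllm-myproject.apps.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello!"}]}'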
3. MLOps Infrastructure
Apps like MLflow provide persistent tracking and management services:
Why an App?
Centralizes experiment tracking
Maintains historical data
Provides web interface
Used by multiple team members
Example Scenario: Your team runs many training jobs and needs to track experiments, compare metrics, and manage model versions centrally. The MLflow app provides a persistent service that all jobs can report to.
4. Data Management
Apps like PostgreSQL provide persistent database services:
Why an App?
Data persistence required
Accessed by multiple jobs/services
Benefits from managed configuration
Needs consistent availability
Example Scenario: Multiple training jobs need to log results to a database, and various analysis scripts need to query that data. A PostgreSQL app provides a stable, always-available database service.
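A hypothetical sketch of a job writing to such a database; the host, credentials, and table are placeholders for whatever the installed app exposes:
apolo job run \
  --name log-results \
  --preset cpu-small \
  postgres:16 \
  -- psql "postgresql://user:password@postgres-host:5432/experiments" \
     -c "INSERT INTO results (run_name, accuracy) VALUES ('run-42', 0.93);"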
5. AI Applications
Apps like Stable Diffusion, Fooocus, or Dify provide ready-to-use AI services:
Why an App?
Pre-configured for optimal performance
Provides user interface
Benefits from standardized deployment
Used interactively over time
Example Scenario: You need to generate images regularly for different projects. The Stable Diffusion app provides a persistent web interface with the model already loaded and optimized, rather than launching a new job each time.
Decision Framework
Choose Jobs when:
✅ You need maximum flexibility in configuration
✅ The workload is temporary or one-time
✅ You're experimenting with different parameters
✅ The task has a clear completion point
✅ You need fine-grained control over resources
✅ You're running batch processing
✅ Each execution needs different settings
Choose Apps when:
✅ You need a persistent service
✅ You'll use the tool repeatedly
✅ You want standardized deployment
✅ The tool requires a web interface
✅ Multiple users need shared access
✅ You need managed configuration
✅ The service should be always available
Practical Examples: Same Tool, Different Approaches
Example 1: Jupyter Notebook
As a Job (Ad-hoc exploration):
apolo job run \
--name quick-analysis \
--preset cpu-small \
--volume storage::/data:ro \
--http-port 8888 \
--life-span 4h \
jupyter/scipy-notebook \
-- jupyter lab --allow-root
Use when: You need a quick, temporary notebook session for specific analysis
As an App (Regular development):
Install Jupyter app from Available Apps
Configure persistent storage and preferred settings
Access through Apps interface whenever needed
Use when: You regularly use Jupyter for development across multiple projects
Example 2: Model Inference
As a Job (Batch inference):
apolo job run \
--name batch-predictions \
--preset gpu-small \
--volume storage:models:/models:ro \
--volume storage:data:/data:ro \
--volume storage:outputs:/output:rw \
vllm/vllm-openai:latest \
-- python batch_inference.py
Use when: You need to process a dataset once
As an App (Inference service):
Install vLLM app
Configure model and serving parameters
Provides persistent API endpoint
Use when: Multiple applications need real-time inference via API
Working with Both: A Complete Workflow
A typical ML workflow often uses both Jobs and Apps together:
Apps for Infrastructure:
Install MLflow app for experiment tracking
Install PostgreSQL app for results storage
Install Jupyter app for development
Jobs for Computation:
Run training jobs with different hyperparameters
Each job logs to MLflow app
Each job stores results in PostgreSQL app
Debug issues by launching temporary jobs
Apps for Deployment:
Once model is trained, deploy via vLLM app for serving
Use Service Deployment app for custom inference API
Pre-installed vs Installable Apps
Pre-installed Apps
These are available immediately in every Apolo project; you can find an overview of these applications on the dedicated Pre-installed Apps page.
Installable Apps
These must be configured and installed per project based on your needs. The list of installable applications grows constantly and can be found on the Installable Apps page.
CLI vs Console Management
Jobs Management
CLI (More control):
# Run a job
apolo job run pytorch/pytorch:latest -- python train.py
# List jobs
apolo job ls --status running
# View logs
apolo job logs my-job
# Debug interactively
apolo job exec my-job -- /bin/bash
# Stop a job
apolo job kill my-job
Console (Visual interface):
Use Custom Job widget on dashboard
Select preset, image, command
Optional name and HTTP port
Visual job status monitoring
Apps Management
CLI:
# List available apps
apolo app ls
# Install an app
apolo app install jupyter
# Dump logs from a Jupyter app instance
apolo app logs jupyter-instance
Web Console (Recommended):
Navigate to Apps section
Browse Available Apps
Click Install with guided configuration
Manage through Apps interface
Apps are typically easier to manage through the Web Console, whose guided flow handles their configuration complexity, while Jobs benefit from the CLI's flexibility for scripting and automation.
Best Practices
For Jobs
Use meaningful names: --name training-resnet50-lr001, not --name job1
Tag appropriately: --tag experiment --tag production
Set lifecycle limits: --life-span 24h to prevent runaway costs
Mount storage correctly: Use :ro for inputs, :rw for outputs
Leverage presets: Use standardized resource configurations
Monitor resource usage: Use apolo job top to track consumption
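Put together, a launch that follows these practices might look like this (names and paths are illustrative):
apolo job run \
  --name training-resnet50-lr001 \
  --tag experiment \
  --preset gpu-small \
  --life-span 24h \
  --volume storage:data:/data:ro \
  --volume storage:outputs:/outputs:rw \
  pytorch/pytorch:latest \
  -- python train.py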
For Apps
Install what you need: Don't clutter with unused apps
Configure carefully: Review all parameters during installation
Use appropriate presets: Match resources to app requirements
Manage lifecycle: Stop apps when not in use to save credits
Organize by project: Install apps in relevant projects
Update documentation: Note custom configurations for team use
Advanced Patterns
Hybrid Workflows
Combine Jobs and Apps for sophisticated workflows:
# 1. Use MLflow app for tracking
# (Already installed and running)
# 2. Launch multiple training jobs
for lr in 0.001 0.01 0.1; do
apolo job run \
--name "train-lr-${lr}" \
--env MLFLOW_TRACKING_URI="http://mlflow-app:5000" \
--env LEARNING_RATE="${lr}" \
pytorch/pytorch:latest \
-- python train.py
done
# 3. Use Jupyter app to analyze results
# (Access through Apps interface)
# 4. Deploy best model via vLLM app
# (Configure with best model path)
Conclusion
The choice between Jobs and Apps in Apolo comes down to persistence and reusability:
Jobs are for computational tasks: training, processing, batch inference, experimentation
Apps are for services and tools: development environments, inference APIs, databases, managed services
Think of Jobs as running programs and Apps as installed applications. Both are essential to an effective ML workflow, and understanding when to use each will help you leverage the Apolo platform most effectively.
Most users will find themselves using Apps for their development environment and infrastructure, while running Jobs for their actual computational workloads. This separation provides the right balance of convenience and control for modern ML operations.