Converting docker-compose to Apolo live.yaml: a step-by-step guide
Overview
This document guides you through converting an existing docker-compose.yaml that launches an application or service into an Apolo Flow configuration. The live mode of Apolo Flow (the live.yaml configuration file) is a similar concept: it launches applications consisting of several "jobs" on the Apolo platform. The document takes a sample docker-compose.yaml that launches PrivateGPT (with a model served by Ollama behind Traefik) and shows how to adapt it to an Apolo live.yaml that serves a similar function (in the reference live.yaml, the model is served by vLLM instead).
Concepts:
Apolo Workflows:
A workflow is a configurable automated process made up of one or more job, task, or action calls. You must create a YAML file to define your workflow configuration.
Workflow kinds - there are two kinds of workflows: live and batch. We will discuss live workflow in this document.
Live workflows (see the Live workflow syntax reference) are controlled from the developer's machine. They contain a set of job definitions that spawn jobs in the Apolo cloud.
Here's an example of a typical Apolo flow job:
Executing a Jupyter Notebook server in the cloud on a powerful node with a lot of memory and a high-performance GPU.
Opening a browser with a Jupyter web client connected to this server.
Docker-compose:
Docker Compose is a tool for defining and running multi-container applications. It is the key to unlocking a streamlined and efficient development and deployment experience.
Comparison table between Docker Compose and Apolo Flow Live mode
| Criteria | Docker Compose | Apolo Flow Live Mode |
| --- | --- | --- |
| Driver | Local Docker Engine | Apolo Cloud |
| DSL | Docker Compose (YAML configuration) | Apolo Flow (YAML configuration) |
| Templating | Inheritance | Modules & Mixins |
| Images | Local build & storage | Apolo Cloud (built and stored in the cloud) |
| Storage | Local storage (volumes) | Apolo Cloud (persistent storage in the cloud) |
| Networking | Local network / machine | Apolo Project (internal service discovery) |
| Service Scaling | Manual or external tools | Limited direct scaling (unless using Apolo Apps) |
| Collaboration | Primarily local, manual sharing of configurations | Built-in collaboration features within Apolo |
Docker-Compose Service Architecture
Please note that the two examples use different model-serving stacks (Ollama plus a Traefik proxy in the Compose file vs. vLLM in the live.yaml). We will address this later in the document.
Let's first look at the Docker Compose file and its structure:
This is a Docker Compose configuration for PrivateGPT with multiple deployment options. Here's a breakdown of the architecture and services:
PrivateGPT Services
private-gpt-ollama
Main application service that interfaces with Ollama
Runs on port 8001 with Docker profile configuration
Supports CPU, CUDA, and API modes through profiles
Depends on Ollama service for model inference
private-gpt-llamacpp-cpu
Alternative deployment using llama.cpp backend
CPU-only inference with local model storage
Direct model execution without external dependencies
Uses local profile for standalone operation
Ollama Infrastructure
ollama (Traefik Proxy)
Reverse proxy using Traefik v2.10
Routes requests to appropriate Ollama backends
Health checks ensure service availability
Exposes management interface on port 8080
ollama-cpu
CPU-based Ollama service
Standard deployment for systems without GPU
Model storage in local ./models directory
ollama-cuda
GPU-accelerated Ollama service
Requires NVIDIA GPU with CUDA support
Uses Docker GPU resource reservations
Same model storage as CPU version
Key Features
Profile-Based Deployment
Multiple profiles: ollama-cpu, ollama-cuda, ollama-api, llamacpp-cpu
Flexible deployment options based on hardware capabilities
Easy switching between inference backends
Model Management
Shared model storage in ./models directory
Persistent data storage in ./local_data
Hugging Face integration with token support
Service Discovery
Traefik handles routing and load balancing
Health checks ensure service reliability
Internal networking between services
This setup provides a comprehensive private AI deployment with options for both CPU and GPU inference, making it suitable for various hardware configurations and use cases.
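For orientation, here is a heavily abridged sketch of what such a Compose file looks like. It is reconstructed from the description above (service names, ports, environment variables and mounts only); the actual file in the PrivateGPT repository contains more services, profiles and options.

# abridged docker-compose.yaml sketch (not the complete file)
services:
  private-gpt-ollama:
    build:
      context: .
      dockerfile: Dockerfile.ollama
    ports:
      - "8001:8001"
    environment:
      PGPT_PROFILES: docker
      PGPT_MODE: ollama
      PGPT_EMBED_MODE: ollama
      PGPT_OLLAMA_API_BASE: http://ollama:11434
    volumes:
      - ./local_data:/home/worker/app/local_data
    depends_on:
      - ollama

  ollama:                      # Traefik reverse proxy in front of the Ollama back-ends
    image: traefik:v2.10
    ports:
      - "8080:8080"            # management interface

  ollama-cpu:
    image: ollama/ollama:latest
    profiles: [ollama-cpu]
    volumes:
      - ./models:/root/.ollama

  ollama-cuda:
    image: ollama/ollama:latest
    profiles: [ollama-cuda]
    volumes:
      - ./models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]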
Now let’s look at how live.yaml is structured:
This is the Apolo configuration file for deploying a PrivateGPT system with multiple components. Here's what this setup provides:
Architecture Overview
The configuration defines a distributed system with four main services:
Core Application (pgpt) - PrivateGPT is a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your execution environment at any point.
Runs the main PrivateGPT application on CPU resources
Serves the web interface on port 8080
Uses profiles for app and pgvector integration
Connects to external vLLM and embedding services
Language Model Service (vllm) - vLLM is an open-source library designed to make Large Language Model (LLM) inference faster and more efficient, especially in production settings. It achieves this by using an innovative memory management system called PagedAttention, which optimizes GPU memory for the LLM's KV cache. vLLM also uses techniques like continuous batching to process requests dynamically, improving throughput and reducing latency.
Deploys the DeepSeek-R1-Distill-Qwen-32B model using vLLM
Runs on A100 GPU for high-performance inference
Provides OpenAI-compatible API endpoints
Uses half-precision (float16) for memory efficiency
Embedding Service (tei) - Text Embeddings Inference (TEI) is a comprehensive toolkit designed for efficient deployment and serving of open source text embeddings models. It enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5.
TEI offers multiple features tailored to optimize the deployment process and enhance overall performance.
Handles text embeddings using Nomic's embedding model
Also runs on A100 GPU for fast embedding generation
Supports batch processing up to 100 clients
Database (pgvector) - a PostgreSQL extension that provides powerful functionalities for working with high-dimensional vectors. It introduces a dedicated data type, operators, and functions that enable efficient storage, manipulation, and analysis of vector data directly within the PostgreSQL database.
PostgreSQL with vector extension for similarity search
Stores document embeddings and metadata
Uses persistent storage for data durability
Key Features
Persistent Storage: Multiple volumes for caching, data, and settings
GPU Optimization: Uses A100 GPUs for both LLM and embedding inference
Service Discovery: Jobs communicate via internal hostnames
Security: Uses Hugging Face tokens stored as secrets
Scalability: Configurable context window (34K tokens) and batch processing
This setup is ideal for organizations wanting to run a private, on-premises ChatGPT-like system with document ingestion capabilities while maintaining data privacy and control.
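To make that architecture concrete before we walk through the conversion steps, here is a heavily abridged structural skeleton of such a live.yaml. Job and volume names match the description above; presets, image tags for tei and pgvector, and any command-line arguments are illustrative placeholders and will differ in the actual file.

kind: live
title: private-gpt

volumes:
  cache:
    remote: storage:$[[ flow.project_id ]]/cache
    mount: /root/.cache/huggingface
    local: cache
  data:
    remote: storage:$[[ flow.project_id ]]/data
    mount: /home/worker/app/local_data
  pgdata:
    remote: storage:$[[ flow.project_id ]]/pgdata
    mount: /var/lib/postgresql/data

jobs:
  pgvector:                                   # PostgreSQL with the pgvector extension
    image: pgvector/pgvector:pg16             # illustrative image tag
    detach: true
    volumes:
      - ${{ volumes.pgdata.ref_rw }}
  tei:                                        # text-embeddings-inference for the embedding model
    image: ghcr.io/huggingface/text-embeddings-inference:latest   # illustrative tag
    preset: a100x1                            # illustrative A100 preset name
    detach: true
  vllm:                                       # OpenAI-compatible vLLM server
    image: vllm/vllm-openai:v0.6.6.post1
    preset: a100x1
    detach: true
    env:
      HF_TOKEN: secret:HF_TOKEN
  pgpt:                                       # the PrivateGPT web application
    image: image:$[[ project.id ]]:v1
    http_port: "8080"
    detach: true
    volumes:
      - ${{ volumes.data.ref_rw }}
      - ${{ volumes.cache.ref_rw }}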
Converting docker-compose to live.yaml:
1. Study the docker-compose architecture

Identify services, images and profiles. In the provided Compose file, PrivateGPT defines two services for the application (one using Ollama and another using llama.cpp), a Traefik reverse proxy, and Ollama services for CPU and GPU. For example, the private-gpt-ollama service builds Dockerfile.ollama, exposes port 8001 and sets environment variables such as PGPT_MODE=ollama and PGPT_OLLAMA_API_BASE=http://ollama:11434. Profiles (ollama-cpu, ollama-cuda, ollama-api, etc.) select which Ollama backend to use.

Observe volumes. Compose mounts local folders for data persistence (e.g., ./local_data to /home/worker/app/local_data, and ./models to /root/.ollama for the Ollama services).

Notice dependencies and networking. Compose uses depends_on to start the Ollama proxy before PrivateGPT. Port mappings (ports) expose containers to the host. Traefik routes requests to the appropriate Ollama backend.
Understanding these relationships will help you map each piece into Apolo concepts.
In Apolo's flow configuration, variables like project and flow come from the context that the platform injects when it runs a workflow. They let you reference the current project or workflow without hard-coding identifiers or paths.
What project refers to

project represents the Apolo project that owns your flow. A project is the top-level container where you store jobs, images, secrets and flows. Projects make it easy to organize resources and share them with teammates.

project.id returns the unique identifier of the current project. You use it in image names (image:$[[ project.id ]]:v1) and storage paths (storage:$[[ flow.project_id ]]/data) so that resources are namespaced correctly.

Other attributes include project.name (the human-readable name) and project.owner. These can be used in more advanced flows for templating or logging, but project.id is the most common.
What flow refers to

flow represents the workflow instance being executed. It provides information about the run environment.

flow.workspace is the path to the workspace directory on the build machine. When building images, Apolo mounts your repository at this location. In the PrivateGPT live example, the dockerfile and context fields point to $[[ flow.workspace ]]/Dockerfile.apolo and $[[ flow.workspace ]]/, telling Apolo to build the image from the repository root.

flow.project_id is equivalent to project.id, but scoped to the running flow. It appears in storage definitions such as remote: storage:$[[ flow.project_id ]]/data, ensuring that volumes are created within the current project's namespace.
Where to find more information
The Apolo documentation describes the platform as a “flexible and robust machine learning platform” and directs users to sign up and explore its features. This is a good starting point for understanding projects, flows and other core concepts.
Within the docs, look under the "Flow CLI" and "Flow syntax" sections for a detailed explanation of context variables like project and flow, and for examples showing how to use them in live.yaml. These sections are accessible through the navigation on the Apolo documentation site.
In short, project and flow are context objects that Apolo makes available to your YAML files so you can write generic configurations. project.id and flow.project_id let you create images and volumes inside the current project, while flow.workspace tells Apolo where to find your source code when building images.
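Putting this together, a minimal fragment using these context variables might look like the following (the image name, Dockerfile and paths follow the examples quoted above):

images:
  privategpt:
    ref: image:$[[ project.id ]]:v1                     # image namespaced by the current project
    dockerfile: $[[ flow.workspace ]]/Dockerfile.apolo  # the repository is mounted at flow.workspace
    context: $[[ flow.workspace ]]/

volumes:
  data:
    remote: storage:$[[ flow.project_id ]]/data         # persistent path inside the project's namespace
    mount: /home/worker/app/local_data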
2. Create the Apolo project skeleton
Set the workflow kind and metadata. A live workflow runs on Apolo’s cloud but uses the developer’s machine as the entry point. Start with:
kind: live
title: private-gpt-ollama

defaults:
  life_span: 7d
  env:    # global environment variables extracted from Compose
    PGPT_TAG: "0.6.2"
    HF_TOKEN: secret:HF_TOKEN
The life_span setting controls how long the jobs stay running, and env defines variables available to all jobs.

Define images. Compose either builds images locally or pulls them from Docker Hub. In Apolo you must define images explicitly under an images section. For example:

images:
  private-gpt-ollama:
    ref: image:$[[ project.id ]]:pgpt-ollama-v1
    dockerfile: Dockerfile.ollama
    context: $[[ flow.workspace ]]/
    build_preset: cpu-large
Similarly, define an image for the llama.cpp version if needed, as sketched below. build_preset chooses CPU or GPU resources for the build.
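For example, a second image entry for the llama.cpp variant might look like this. The image name matches the reference used by the pgpt-llamacpp job in Step 4; the Dockerfile name is an assumption and should match whatever your repository actually contains:

images:
  private-gpt-llamacpp:
    ref: image:$[[ project.id ]]:pgpt-llamacpp-v1
    dockerfile: Dockerfile.llamacpp-cpu    # assumed name; adjust to your repo's Dockerfile
    context: $[[ flow.workspace ]]/
    build_preset: cpu-large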
3. Translate volumes
Compose volumes become Apolo volumes backed by persistent storage. For each local directory in Compose, define a volume:
volumes:
  local_data:
    remote: storage:$[[ flow.project_id ]]/local_data
    mount: /home/worker/app/local_data
    local: local_data
  models:
    remote: storage:$[[ flow.project_id ]]/models
    mount: /home/worker/app/models
    local: models
  ollama_models:
    remote: storage:$[[ flow.project_id ]]/ollama_models
    mount: /root/.ollama
    local: models
The remote path stores data in Apolo's persistent storage, mount specifies where the volume is mounted inside the container, and local points to a folder on your development machine if you want to sync content.
In Apolo, a volume definition tells the platform how to mount persistent storage into your job containers. It has three main fields:
remote – the path in Apolo's object-storage service where the data will be stored. You usually prefix it with storage:$[[ flow.project_id ]] so that it lives under the current project's namespace. In the PrivateGPT live.yaml, for example, the cache volume points to storage:$[[ flow.project_id ]]/cache. Using a unique remote path ensures the data persists across job restarts.

mount – the directory inside the container where the volume will be mounted. For the cache volume, Apolo mounts it at /root/.cache/huggingface so that Hugging Face downloads are written to persistent storage instead of the container's temporary filesystem. When multiple jobs reference the same volume, they all see the same files under their mount path.

local – an optional local directory on your machine that is synchronised with the remote storage. This is useful in live workflows because you can edit files locally and have them appear inside the container. In the example, local: cache means Apolo will sync the remote cache bucket with a cache/ folder in your working directory. If you omit local, the volume still persists in the cloud but there's no local sync.
Here’s how these fields come together for the different volumes in the PrivateGPT live workflow:
| Volume | Mount path | Purpose |
| --- | --- | --- |
| cache | /root/.cache/huggingface | Stores model and tokenizer downloads from Hugging Face. The remote path storage:$[[ flow.project_id ]]/cache persists the cache across runs, and local: cache syncs it with a local cache/ folder. |
| data | /home/worker/app/local_data | Holds the application's working data (documents you upload to PrivateGPT). Persisting this directory allows state to survive job restarts. |
| pgdata | /var/lib/postgresql/data | Stores PostgreSQL data files for the pgvector job. Using a remote volume prevents database data loss if the job stops. |
| settings | /home/worker/app/settings | Contains configuration files for PrivateGPT; syncing this directory lets you update settings locally and have them applied in the container. |
| tiktoken_cache | /home/worker/app/tiktoken_cache | Caches tokenization data used by the tokenizer library. This avoids re-downloading or re-computing tokenization information on every run. |
By defining volumes in this way, you achieve the same persistent-storage behaviour as Docker Compose's bind mounts, but with the added ability to sync files from your local machine and store them in Apolo's cloud.
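For instance, the pgdata and settings rows of the table translate into volume definitions along these lines; pgdata omits local because there is no point syncing raw PostgreSQL data files to your laptop, while settings keeps a local sync so you can edit configuration files in place (the local folder name is illustrative):

volumes:
  pgdata:
    remote: storage:$[[ flow.project_id ]]/pgdata
    mount: /var/lib/postgresql/data
  settings:
    remote: storage:$[[ flow.project_id ]]/settings
    mount: /home/worker/app/settings
    local: settings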
4. Convert services into jobs
Apolo doesn’t run containers directly; it launches jobs. Each Compose service should become a job with analogous settings:
Main application jobs
Ollama mode job: Translate the private-gpt-ollama service into a job:

jobs:
  pgpt-ollama:
    image: ${{ images.private-gpt-ollama.ref }}
    name: pgpt-ollama
    preset: cpu-small   # choose an Apolo resource preset
    http_port: "8001"
    detach: true        # keep running after the workflow finishes
    browse: true        # enable remote browsing
    volumes:
      - ${{ volumes.local_data.ref_rw }}
    env:
      PORT: 8001
      PGPT_PROFILES: docker
      PGPT_MODE: ollama
      PGPT_EMBED_MODE: ollama
      PGPT_OLLAMA_API_BASE: http://${{ inspect_job('ollama-proxy').internal_hostname_named }}:11434
      HF_TOKEN: secret:HF_TOKEN
llama.cpp mode job: Convert private-gpt-llamacpp-cpu into another job:

  pgpt-llamacpp:
    image: ${{ images.private-gpt-llamacpp.ref }}
    name: pgpt-llamacpp
    preset: cpu-medium
    http_port: "8001"
    detach: true
    browse: true
    volumes:
      - ${{ volumes.local_data.ref_rw }}
      - ${{ volumes.models.ref_rw }}
    env:
      PORT: 8001
      PGPT_PROFILES: local
      HF_TOKEN: secret:HF_TOKEN
    cmd: >
      sh -c ".venv/bin/python scripts/setup && .venv/bin/python -m private_gpt"
Note that the Compose entrypoint (which runs a shell script) becomes the cmd field in Apolo.
Backend services
Ollama CPU/GPU back-ends: Compose defines ollama-cpu and ollama-cuda using the ollama/ollama:latest image. In Apolo:

  ollama-cpu:
    image: ollama/ollama:latest
    name: ollama-cpu
    preset: cpu-large
    detach: true
    port_forward:
      - "11434:11434"
    volumes:
      - ${{ volumes.ollama_models.ref_rw }}

  ollama-cuda:
    image: ollama/ollama:latest
    name: ollama-cuda
    preset: gpu-k80-small   # select an Apolo GPU preset
    detach: true
    port_forward:
      - "11434:11434"
    volumes:
      - ${{ volumes.ollama_models.ref_rw }}
Reverse proxy replacement: Compose uses Traefik to route to either CPU or GPU Ollama. Apolo has built-in service discovery, so you can replace Traefik with a simple Nginx-based proxy or reference the back-end jobs directly. For instance:

  ollama-proxy:
    image: nginx:alpine
    name: ollama-proxy
    preset: cpu-small
    detach: true
    port_forward:
      - "11434:11434"
    volumes:
      - ${{ upload('.docker/nginx.conf').ref }}:/etc/nginx/nginx.conf:ro
The Nginx configuration would route /v1/* to the CPU or GPU Ollama job. Alternatively, skip the proxy and set PGPT_OLLAMA_API_BASE to the internal hostname of the desired job (see Step 7).
Additional services
The original Apolo example uses vLLM, text-embeddings-inference and pgvector instead of Ollama. If you choose those back-ends, define jobs as in the live.yaml: vllm runs the LLM on an A100 GPU and sets model parameters; tei runs the text embedding service; and pgvector sets up a PostgreSQL database with persistent storage.
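As an illustration, a minimal pgvector job could be defined as follows. The image tag, preset and credentials are placeholders (the actual live.yaml may pin different values or take the credentials from secrets); what matters is the pattern of attaching the persistent pgdata volume defined in Step 3 so the database survives restarts:

  pgvector:
    image: pgvector/pgvector:pg16   # placeholder tag for a PostgreSQL image with the pgvector extension
    name: pgvector
    preset: cpu-medium              # placeholder preset
    detach: true
    volumes:
      - ${{ volumes.pgdata.ref_rw }}
    env:
      POSTGRES_USER: postgres       # placeholder credentials for illustration only
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: private_gpt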
5. Handle dependencies and service discovery
Docker Compose's depends_on ensures one service starts after another. In Apolo, jobs don't block each other; instead, the application should wait for its dependencies internally, or you can wire up environment variables using Apolo's inspect_job helper:
PGPT_OLLAMA_API_BASE: http://${{ inspect_job('ollama-cpu').internal_hostname_named }}:11434
inspect_job('job-name').internal_hostname_named returns the internal hostname of another job so services can communicate without exposing ports.
If your app needs a delay before connecting, implement retry logic in your startup script or set environment variables such as OLLAMA_STARTUP_DELAY (see the sketch below).
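One way to sketch such retry logic is directly in the job's cmd, assuming the application image ships with a POSIX shell and curl. The loop simply polls the Ollama endpoint until it responds, then starts PrivateGPT:

  pgpt-ollama:
    image: ${{ images.private-gpt-ollama.ref }}
    preset: cpu-small
    detach: true
    env:
      PGPT_OLLAMA_API_BASE: http://${{ inspect_job('ollama-cpu').internal_hostname_named }}:11434
    cmd: >
      sh -c "until curl -sf $PGPT_OLLAMA_API_BASE > /dev/null;
      do echo 'waiting for Ollama...'; sleep 5; done;
      .venv/bin/python -m private_gpt"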
6. Manage secrets and environment variables
Replace sensitive variables (e.g., HF_TOKEN in Compose) with Apolo secrets:

HF_TOKEN: secret:HF_TOKEN

Include other environment variables from Compose (e.g., PGPT_MODE, PGPT_EMBED_MODE, PORT) in the job's env section.
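The net effect is that the job's env block mixes plain values copied from Compose with a secret reference that the platform resolves at run time, so the token itself never appears in live.yaml:

    env:
      PORT: 8001
      PGPT_PROFILES: docker
      PGPT_MODE: ollama
      PGPT_EMBED_MODE: ollama
      HF_TOKEN: secret:HF_TOKEN   # injected from the secret created with `apolo secret add HF_TOKEN ...`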
7. Support multiple deployment profiles
Compose uses profiles (ollama-cpu, ollama-cuda, llamacpp-cpu) to choose back-ends. In Apolo you can create separate live configuration files that extend the base workflow and override specific jobs. For example:
live-ollama-cpu.yaml

extends: live.yaml

jobs:
  pgpt-ollama:
    env:
      PGPT_OLLAMA_API_BASE: http://${{ inspect_job('ollama-cpu').internal_hostname_named }}:11434
  ollama-cpu: ${parent.jobs.ollama-cpu}
live-ollama-cuda.yaml

extends: live.yaml

jobs:
  pgpt-ollama:
    env:
      PGPT_OLLAMA_API_BASE: http://${{ inspect_job('ollama-cuda').internal_hostname_named }}:11434
  ollama-cuda: ${parent.jobs.ollama-cuda}
These override the API base URL and include only the desired backend job.
8. Validate and iterate
Test your live workflow by running apolo-flow run <job> from your machine to ensure jobs start successfully and that the PrivateGPT application can communicate with the back-end. Adjust resource presets based on the performance of each job (CPU vs. GPU).
Use Apolo’s built‑in service discovery to simplify networking; avoid exposing ports unless the service must be accessible outside the workflow.
By following this process, you convert the container‑oriented Docker Compose setup into an Apolo live workflow. Each Compose service becomes an Apolo job with corresponding images, volumes and environment variables; Compose profiles map to separate flow files; and networking is handled through Apolo’s internal hostnames and optional proxies. The end result retains the original functionality while taking advantage of Apolo’s cloud‑native features such as persistent storage, GPU presets and collaborative workflows.
Conversion Notes
Apolo's live workflow doesn't need to be a literal mirror of your Compose file; it's meant to express the same application architecture in a job-centric, cloud-native way. In the sample live.yaml we looked at, the model is served by vLLM and embeddings by text-embeddings-inference, so the only jobs defined are pgpt, vllm, tei and pgvector. There is no Ollama backend or Traefik job because:
The model and embedding services are different. The Apolo example switches from Ollama to vLLM (vllm/vllm-openai:v0.6.6.post1) and from Traefik to direct service discovery. If you use this architecture, there is no need to run ollama-cpu or ollama-cuda jobs.

Apolo has built-in service discovery. Jobs communicate by referencing each other's internal hostnames (inspect_job('vllm').internal_hostname_named), so a reverse proxy like Traefik, which Compose uses to route requests to Ollama, is unnecessary.
If you want to keep using Ollama as your language-model backend, you can certainly add ollama-cpu and/or ollama-cuda jobs, as shown in the conversion steps. Each would have its own image (ollama/ollama:latest), resource preset and shared volume for model storage. You'd then set PGPT_OLLAMA_API_BASE in your application job to point to the chosen Ollama job's internal hostname instead of vLLM. In that case you still don't need Traefik, because Apolo's service discovery lets you route requests directly, or you can use a simple Nginx proxy if you prefer.
So, you only add ollama-cpu, ollama-cuda or a proxy job when you explicitly choose to deploy the Ollama backend on Apolo; they're not part of the default live.yaml, which uses vLLM.
Spinning up and shutting down the live workflow
Once your live.yaml file is ready and you've defined any required images and secrets, you can start and stop your PrivateGPT deployment using the Apolo CLI. The APOLO.md in the PrivateGPT repository outlines a typical workflow for the vLLM-based setup; the same pattern applies when using Ollama or other back-ends.
1. Build your images and set secrets
Clone the repository and move into it:
git clone <repo-url>
cd private-gpt
Build the custom image defined in your images section. For the vLLM example the image is called privategpt; use apolo-flow build to build it:

apolo-flow build privategpt
Create required secrets. For example, create a secret for your Hugging Face token:
apolo secret add HF_TOKEN <your-hf-token>
Secrets can then be referenced in your YAML using secret:HF_TOKEN.
2. Start the jobs
Apolo lets you run either the entire live workflow or individual jobs:
Run individual jobs. In APOLO.md the vector store, embedding service, LLM server and web application are started separately with apolo-flow run:

apolo-flow run pgvector   # start PostgreSQL with the pgvector extension
apolo-flow run tei        # start the embedding server
apolo-flow run vllm       # start the language model service
apolo-flow run pgpt       # start the PrivateGPT web server
Each command reads the definition for that job from live.yaml, creates the necessary volumes and schedules the job in Apolo's cloud. You can adapt this pattern for your Ollama-based jobs (pgpt-ollama, ollama-cpu, ollama-cuda, etc.).
During execution you can use apolo job ls to see running jobs and apolo job logs <job-name> to inspect their logs.
3. Shut down the workflow
To stop a running job, use the stop sub‑command:
apolo-flow stop pgpt # stop the PrivateGPT job
apolo-flow stop vllm # stop the vLLM job
apolo-flow stop tei # stop the embedding job
apolo-flow stop pgvector # stop the database
Stopping a job removes the container but does not delete the volumes, so your data in the storage:$[[ flow.project_id ]] buckets remains intact. You can restart the jobs later with apolo-flow run <job> and they will pick up where they left off. To completely clean up, delete the volumes or buckets via the Apolo console or CLI (apolo storage rm <path>).
Alternatively, you can manage jobs from the Apolo console: navigate to the Jobs section, select a job and click Stop to terminate it. Use Start to relaunch a stopped job or Delete to remove it entirely.
By following these commands, you can reliably bring up your PrivateGPT environment on Apolo, perform your work, and then shut it down when you’re finished—all while keeping your data safe in persistent storage.
Resources:
Docker Compose example in the Apolo GitHub repository