Teaching Models To Reason - Training, Fine-Tuning, and Evaluating Models with LLaMA Factory on Apolo
This tutorial will guide you through training, fine-tuning, and evaluating models using LLaMA Factory on the Apolo platform. By the end, you will be able to:
Deploy LLaMA Factory on Apolo
Train and fine-tune models using Web UI or CLI
Evaluate and test your trained models
Serve the fine-tuned model via API for real-world applications
Prerequisites
Before proceeding, ensure you have:
✅ Apolo CLI installed (follow the Apolo CLI Installation Guide)
✅ Docker installed for building images
✅ An Apolo project set up (the current project will be used if none is set)
✅ A HuggingFace token stored as a secret; you can do that by running the command below
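The exact secret name depends on what your `.apolo/live.yaml` expects; assuming it is `HF_TOKEN`, a minimal sketch with the Apolo CLI looks like:

```bash
# Store the Hugging Face token as an Apolo secret.
# The secret name HF_TOKEN is an assumption; use the name referenced in .apolo/live.yaml.
apolo secret add HF_TOKEN <your-huggingface-token>
```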
1. Deploying LLaMA Factory on Apolo
Step 1: Clone the LLaMA Factory Repository
Inside the `apolo` branch you will notice a `.apolo` folder containing a `live.yaml`. You can find more on the structure of this file here.
Key fields to consider when deploying LlamaFactory on Apolo:
`preset`: in this case we use `H100X1`, which requests a single NVIDIA H100 GPU. To list all available presets on your cluster, you can use the command shown after this list.
`data` volume: everything in your local `data` folder is uploaded to the job's `/app/data` directory.
`output` volume: this is where your checkpoints and training artifacts are saved; the job writes to `/app/saves`.
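As a sketch, assuming the Apolo CLI exposes the same `config show` subcommand as recent versions, you can print the current cluster configuration, including its resource presets, with:

```bash
# Show the current cluster configuration, including available resource presets
apolo config show
```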
Step 2: Build the LLaMA Factory Docker Image on Apolo
This command:
Uses the Dockerfile configured for CUDA (located at `docker/docker-cuda/Dockerfile`).
Pushes the built image to the Apolo image registry.
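As a sketch, assuming the image is registered in `.apolo/live.yaml` under the id `llama_factory` (check your copy of the file for the actual id), the build step with apolo-flow looks like:

```bash
# Build the CUDA image defined in .apolo/live.yaml and push it to the Apolo registry.
# "llama_factory" is an assumed image id; use the id from your live.yaml.
apolo-flow build llama_factory
```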
Output Example:
Note: If you need to work with datasets that are not supported by default, you should add support for them by extending `dataset_info.json`. This file allows for field mapping and format type selection. We added open-r1/OpenR1-Math-220k, a reasoning dataset that will be used to fine-tune a Llama 3B model to enhance its reasoning capabilities.
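As an illustration of what such an entry can look like, here is a sketch that registers the dataset under a name like `openr1_math`; the entry name and the column mapping are assumptions, so check the dataset's actual fields before relying on them:

```json
"openr1_math": {
  "hf_hub_url": "open-r1/OpenR1-Math-220k",
  "formatting": "alpaca",
  "columns": {
    "prompt": "problem",
    "response": "solution"
  }
}
```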
Step 3: Run the LLaMA Factory Web UI
This will:
Copy necessary datasets and configurations to Apolo storage.
Start the llama_factory_webui job.
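A sketch of the run step, assuming the job id in `.apolo/live.yaml` matches the name above:

```bash
# Start the LLaMA Factory Web UI job defined in .apolo/live.yaml
apolo-flow run llama_factory_webui
```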
Output Example:
Step 4: Access the Web UI
You can either use the URL above, which is part of the deployment output, or wait for a new tab to open automatically with the LLaMA Factory Web UI, as seen below.
🎉 You are now ready to train and fine-tune models!
2. Training & Fine-Tuning Models with LLaMA Factory
Option 1: Fine-Tuning Using the Web UI
Once inside the Web UI:
Step 1: Select Base Model
Choose a base model (e.g., `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` or `Llama-3.2-3B`).
Specify the model path (Hugging Face model).
Step 2: Choose Training Method
Full (for training all parameters).
LoRA (lightweight, efficient fine-tuning).
Freeze (freeze certain layers; there's a section on freezing layers in the UI).
Step 3: Configure Dataset
Select a built-in dataset or provide a custom one; we'll go with the open-r1/OpenR1-Math-220k dataset, for which we added support earlier.
Click "Preview Dataset" to verify.
Step 4: Set Training Hyperparameters
| Parameter | Value | Notes |
| --- | --- | --- |
| Learning Rate | 5e-5 | |
| Epochs | 3 | |
| Batch Size | 2 | adjust based on GPU |
| Gradient Accumulation | 4 | for small GPUs |
| Scheduler | cosine | |
| Compute Type | bf16 | for modern GPUs |
Step 5: Start Training
Click Start to launch fine-tuning.
Monitor logs and loss curves to track progress.
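If you prefer the command line over the Web UI, roughly the same run can be described in a LLaMA Factory training config and launched with `llamafactory-cli train`. The sketch below assumes a LoRA fine-tune of Llama-3.2-3B on a dataset registered as `openr1_math` in `dataset_info.json`; the dataset name and output path are assumptions, so adjust them to your setup:

```yaml
# train_llama3_openr1.yaml -- a sketch; dataset name and output_dir are assumptions
model_name_or_path: meta-llama/Llama-3.2-3B

stage: sft
do_train: true
finetuning_type: lora
lora_target: all

dataset: openr1_math          # entry name added to dataset_info.json (assumed)
template: llama3
cutoff_len: 1024

output_dir: /app/saves/llama3.2-3b-openr1-lora   # keep it under the mounted output volume

per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 5.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```

Inside the running job, launch it with `llamafactory-cli train train_llama3_openr1.yaml`.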
3. Chat
After the model finishes training, you can move to the Chat tab and follow the steps below:
Select the model.
Select the checkpoint you want to test.
Write your query in the Input section and submit it.
Here you can try the model out. As you can see below, it has already started showing reasoning:
After it finishes, you can expand the Thought section to see all of the reasoning tokens.
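The same checkpoint can also be tried from the command line with `llamafactory-cli chat` and a small inference config; the adapter path below is an assumption based on the training output directory used earlier:

```yaml
# chat_llama3_openr1.yaml -- inference sketch; adapter path is an assumption
model_name_or_path: meta-llama/Llama-3.2-3B
adapter_name_or_path: /app/saves/llama3.2-3b-openr1-lora
template: llama3
finetuning_type: lora
```

Start an interactive session with `llamafactory-cli chat chat_llama3_openr1.yaml`.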
4. Evaluating Your Fine‐Tuned/Trained Model
Once your training (or fine‐tuning) has completed, you can evaluate the new model from within the Web UI:
Navigate to the “Evaluate & Predict” Tab From the top menu in LLaMA Factory (Train / Evaluate & Predict / Chat / Export), click Evaluate & Predict.
Select Your Checkpoint
Model name: Choose the same base model used for fine-tuning (e.g., `Llama-3.2-3B`).
Checkpoint path: Pick the checkpoint directory you just trained (e.g., `train_2025-02-17-12-36-51`).
Configure the Dataset for Evaluation
Data dir: Set to the directory containing your dataset files (e.g., `data`).
Dataset: Select the dataset entry you want to evaluate on (e.g., `open-r1/OpenR1-Math-220k`).
Use the Preview dataset button (optional) to confirm that the dataset is recognized properly (e.g., no `'from'` KeyError or other format issues).
Adjust Evaluation Settings
Cutoff length: The maximum token length in your input sequences (e.g., 1024).
Max samples: How many data samples to evaluate (e.g., 1000).
Batch size: The number of samples processed simultaneously per GPU (e.g., 2 or 4, depending on GPU memory).
Save predictions: Tick this box to store the model outputs on each sample.
Maximum new tokens: How many tokens the model may generate for each example (e.g., 512).
Top-p and Temperature: Sampling hyperparameters; adjust for different sampling behaviors (e.g., `top_p=0.7`, `temperature=0.95`).
Output dir: Where the evaluation logs and predictions will be saved.
Start the Evaluation
Click Start.
Monitor logs in the console below the Start button.
Once finished, LLaMA Factory will show the evaluation progress and any metrics (if the dataset has built‐in metrics).
If Save predictions was checked, look for a file (e.g., `.jsonl` or `.csv`) in the output directory with each sample's output.
Review Results
If your dataset or LLaMA Factory supplies metrics (like accuracy, perplexity, or F1), they will appear in the logs or console.
You can also open the saved predictions file to see the raw completions on each sample.
Tip: If your dataset does not provide built‐in metrics, you will see only raw predictions. You can parse these to do custom scoring if needed.
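For example, assuming predictions were saved as a JSONL file (e.g., `generated_predictions.jsonl`) where each line holds the model output under a `predict` key and the reference under `label` (check the actual field names in your file), a quick look with jq might be:

```bash
# Inspect the first few saved predictions; file name and field names are assumptions,
# adjust "predict"/"label" to match your output file.
head -n 3 generated_predictions.jsonl | jq '{predict, label}'
```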
Final results will be visible after the evaluation finishes. It's good practice to evaluate the model you're starting from and then evaluate the fine-tuned/trained one, so you can tell whether the results improved.
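As with training, the Evaluate & Predict run can be reproduced from the command line; it is essentially a prediction pass, so a sketch of the config (paths and dataset name again assumed) looks like:

```yaml
# predict_llama3_openr1.yaml -- evaluation/prediction sketch; paths and dataset name are assumptions
model_name_or_path: meta-llama/Llama-3.2-3B
adapter_name_or_path: /app/saves/llama3.2-3b-openr1-lora

stage: sft
do_predict: true
finetuning_type: lora
predict_with_generate: true

eval_dataset: openr1_math
template: llama3
cutoff_len: 1024
max_samples: 1000

per_device_eval_batch_size: 2
max_new_tokens: 512
top_p: 0.7
temperature: 0.95

output_dir: /app/saves/llama3.2-3b-openr1-lora/predict
```

Launch it with `llamafactory-cli train predict_llama3_openr1.yaml`; the predictions are written to the output_dir.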
5. Exporting the model
For exporting models, you need to switch to the export tab and follow these steps:
Set the export dir (where you want the model to be saved). You can later download it from this directory; it's recommended to place it under one of the volumes mounted in `.apolo/live.yaml`.
Optionally, provide the HF Hub ID if you want the model to be uploaded to that Hugging Face repo.
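Exporting can likewise be scripted with `llamafactory-cli export`, which merges the LoRA adapter into the base model and writes the result to the export dir; the paths and optional Hub id below are assumptions:

```yaml
# export_llama3_openr1.yaml -- export/merge sketch; paths and Hub id are assumptions
model_name_or_path: meta-llama/Llama-3.2-3B
adapter_name_or_path: /app/saves/llama3.2-3b-openr1-lora
template: llama3
finetuning_type: lora

export_dir: /app/saves/llama3.2-3b-openr1-merged   # keep it under a mounted volume
export_size: 2                                     # shard size in GB
export_legacy_format: false
# export_hub_model_id: your-hf-username/llama3.2-3b-openr1   # uncomment to push to the HF Hub
```

Run it with `llamafactory-cli export export_llama3_openr1.yaml`.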
After exporting the model to Hugging Face, you can also deploy it using the vLLM app.
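If you want a quick API smoke test outside the Apolo vLLM app, the exported (or Hub-pushed) model can also be served with vLLM's OpenAI-compatible server; the model id below is a placeholder:

```bash
# Serve the exported model behind an OpenAI-compatible API (placeholder model id)
vllm serve your-hf-username/llama3.2-3b-openr1 --max-model-len 4096
```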