Training Your First Model

Introduction

In this tutorial, we describe the recommended way to train a simple machine learning model on the Apolo platform. As our ML engineers prefer PyTorch over other ML frameworks, we show the training and evaluation of one of the basic PyTorch examples.

We assume that you have already signed up to the platform, installed the Apolo CLI, and logged in to the platform (see Getting Started).

We base our example on the Classifying Names with a Character-Level RNN tutorial.

Initializing a new flow

To simplify working with the Apolo Platform and to help establish best practices in the ML environment, we provide a flow template. This template consists of the recommended directories and files and is designed to operate smoothly with our base environment.

To use it, install the cookiecutter package and initialize cookiecutter-neuro-project:

pipx install cookiecutter
cookiecutter gh:neuro-inc/cookiecutter-neuro-project --checkout release

You will then need to provide some information about the new project:

project_name [Name of the project]: Apolo Tutorial
project_dir [neuro-tutorial]:
project_id [neuro-tutorial]:
code_directory [modules]: rnn
preserve Neuro Flow template hints [yes]:

Flow configuration structure

After you execute the command mentioned above, you get the following structure:

apolo-tutorial
├── .github/            <- GitHub workflows and a dependabot.yml file
├── .neuro/             <- apolo and apolo-flow CLI configuration files
├── config/             <- configuration files for various integrations
├── data/               <- training and testing datasets (we don't keep it under source control)
├── notebooks/          <- Jupyter notebooks
├── rnn/                <- models' source code
├── results/            <- training artifacts
├── .gitignore          <- default .gitignore file for a Python ML project
├── .neuro.toml         <- autogenerated config file
├── .neuroignore        <- a file telling apolo to ignore the results/ folder
├── HELP.md             <- autogenerated template reference
├── README.md           <- autogenerated informational file
├── Dockerfile          <- description of the base image used for your project
├── apt.txt             <- list of system packages to be installed in the training environment
├── requirements.txt    <- list of Python dependencies to be installed in the training environment
├── setup.cfg           <- linter settings (Python code quality checking)
└── update_actions.py   <- instructions on update actions

When you run a job (for example, via apolo-flow run jupyter), the directories are mounted to the job as follows:

Mount Point           Description                 Storage URI
/project/data/        Training / testing data     storage:neuro-tutorial/data/
/project/rnn/         User's Python code          storage:neuro-tutorial/rnn/
/project/notebooks/   User's Jupyter notebooks    storage:neuro-tutorial/notebooks/
/project/results/     Logs and results            storage:neuro-tutorial/results/
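
These mount points are defined in the volumes section of the generated .neuro/live.yaml. As a rough, abridged sketch of how such volumes are typically declared (the exact keys and values in your generated file may differ, so treat this as illustrative):

kind: live
volumes:
  data:
    remote: storage:$[[ flow.project_id ]]/data
    mount: /project/data
    local: data
  code:
    remote: storage:$[[ flow.project_id ]]/rnn
    mount: /project/rnn
    local: rnn
  notebooks:
    remote: storage:$[[ flow.project_id ]]/notebooks
    mount: /project/notebooks
    local: notebooks
  results:
    remote: storage:$[[ flow.project_id ]]/results
    mount: /project/results
    local: results

Each volume maps a storage: URI (remote) to a path inside the job (mount) and to a local folder (local) that apolo-flow upload synchronizes with the storage.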

Filling the flow

Now we need to fill the newly created flow with content:

  • Change the working directory:

cd apolo-tutorial

  • Copy the model source to your rnn folder:

curl https://raw.githubusercontent.com/pytorch/tutorials/master/intermediate_source/char_rnn_classification_tutorial.py -o rnn/char_rnn_classification_tutorial.py

  • Download the data, extract the ZIP's content, and put it in your data folder:

curl https://download.pytorch.org/tutorial/data.zip -o data/data.zip && unzip data/data.zip && rm data/data.zip
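
After these steps, rnn/ contains the training script and data/names/ contains one file of names per language. You can spot-check the result with:

ls data/names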

Training and evaluating the model

When you start working with a flow on the Apolo platform, the basic workflow looks as follows: you set up the remote environment, upload data and code to your storage, run training, and evaluate the results.

To set up the remote environment, run

apolo-flow build train

This command will run a lightweight job (via apolo run), upload the files containing your dependencies, apt.txt and requirements.txt (via apolo cp), install the dependencies (via apolo exec), perform other preparatory steps, and then create the base image from this job and push it to the platform (via apolo save, which works similarly to docker commit).
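
Note that the tutorial script imports torch and uses matplotlib for its plots. The template's requirements.txt may already cover these; if not, a minimal requirements.txt for this tutorial would be just:

torch
matplotlib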

To upload data and code to your storage, run

apolo-flow upload ALL
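
To make sure the upload worked, you can list the uploaded files with the Apolo CLI (the storage path below assumes the default project_id from the template):

apolo ls storage:neuro-tutorial/data/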

To run the training job, you need to specify the training script in .neuro/live.yaml and then run apolo-flow run train:

  • open .neuro/live.yaml in an editor,

  • find the following lines (make sure you're looking at the train job, not the multitrain job, which has a very similar section):

    bash: |
        cd $[[ volumes.project.mount ]]
        python -u $[[ volumes.code.mount ]]/train.py --data $[[ volumes.data.mount ]]
  • and replace them with the following lines:

    bash: |
        cd $[[ volumes.project.mount ]]
        python -u $[[ volumes.code.mount ]]/char_rnn_classification_tutorial.py
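
For orientation, the surrounding train job definition looks roughly like this. The sketch below is abridged and illustrative (the image reference, preset name, and exact volume list come from your generated file, not from this snippet):

jobs:
  train:
    image: $[[ images.train.ref ]]
    preset: gpu-small
    volumes:
      - $[[ volumes.data.ref_ro ]]
      - $[[ volumes.code.ref_ro ]]
      - $[[ volumes.project.ref_rw ]]
    bash: |
      cd $[[ volumes.project.mount ]]
      python -u $[[ volumes.code.mount ]]/char_rnn_classification_tutorial.py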

Now, you can run

apolo-flow run train

and observe the output. You will see some checks run at the beginning of the script, and then the model is trained and evaluated:

['data/names/German.txt', 'data/names/Polish.txt', 'data/names/Irish.txt', 'data/names/Vietnamese.txt', 
'data/names/French.txt', 'data/names/Japanese.txt', 'data/names/Spanish.txt', 'data/names/Chinese.txt', 
'data/names/Korean.txt', 'data/names/Czech.txt', 'data/names/Arabic.txt', 'data/names/Portuguese.txt', 
'data/names/English.txt', 'data/names/Italian.txt', 'data/names/Russian.txt', 'data/names/Dutch.txt', 
'data/names/Scottish.txt', 'data/names/Greek.txt']
Slusarski
['Abandonato', 'Abatangelo', 'Abatantuono', 'Abate', 'Abategiovanni']
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0.]])
torch.Size([5, 1, 57])
tensor([[-2.8248, -2.9118, -2.8999, -2.9170, -2.8916, -2.9699, -2.8785, -2.9273,
         -2.8397, -2.8539, -2.8764, -2.9278, -2.8638, -2.9310, -2.9546, -2.9008,
         -2.8295, -2.8441]], grad_fn=<LogSoftmaxBackward>)
('German', 0)
category = Vietnamese / line = Vu
category = Chinese / line = Che
category = Scottish / line = Fraser
category = Arabic / line = Abadi
category = Russian / line = Adabash
category = Vietnamese / line = Cao
category = Greek / line = Horiatis
category = Portuguese / line = Pinho
category = Vietnamese / line = To
category = Scottish / line = Mcintosh
5000 5% (0m 19s) 2.7360 Ho / Portuguese ✗ (Vietnamese)
10000 10% (0m 38s) 2.0606 Anderson / Russian ✗ (Scottish)
15000 15% (0m 58s) 3.5110 Marqueringh / Russian ✗ (Dutch)
20000 20% (1m 17s) 3.6223 Talambum / Arabic ✗ (Russian)
25000 25% (1m 35s) 2.9651 Jollenbeck / Dutch ✗ (German)
30000 30% (1m 54s) 0.9014 Finnegan / Irish ✓
35000 35% (2m 13s) 0.8603 Taverna / Italian ✓
40000 40% (2m 32s) 0.1065 Vysokosov / Russian ✓
45000 45% (2m 52s) 3.6136 Blanxart / French ✗ (Spanish)
50000 50% (3m 11s) 0.0969 Bellincioni / Italian ✓
55000 55% (3m 30s) 3.1383 Roosa / Spanish ✗ (Dutch)
60000 60% (3m 49s) 0.6585 O'Kane / Irish ✓
65000 65% (4m 8s) 4.7300 Satorie / French ✗ (Czech)
70000 70% (4m 27s) 0.9765 Mueller / German ✓
75000 75% (4m 46s) 0.7882 Attia / Arabic ✓
80000 80% (5m 5s) 2.1131 Till / Irish ✗ (Czech)
85000 85% (5m 25s) 0.5304 Wei / Chinese ✓
90000 90% (5m 44s) 1.6258 Newman / Polish ✗ (English)
95000 95% (6m 2s) 3.2015 Eberhardt / Irish ✗ (German)
100000 100% (6m 21s) 0.2639 Vamvakidis / Greek ✓

> Dovesky
(-0.77) Czech
(-1.11) Russian
(-2.03) English

> Jackson
(-0.92) English
(-1.65) Czech
(-1.85) Scottish

> Satoshi
(-1.32) Italian
(-1.81) Arabic
(-2.14) Japanese
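
The job streams its output to your terminal. If you detach or lose the connection, the job keeps running on the platform; you can list live jobs and re-attach to the training logs:

apolo-flow ps
apolo-flow logs train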
