Training Your First Model
Introduction
In this tutorial, we describe the recommended way to train a simple machine learning model on the Apolo platform. Since our ML engineers prefer PyTorch over other ML frameworks, we demonstrate the training and evaluation of one of the basic PyTorch examples.
We assume that you have already signed up to the platform, installed the Apolo CLI, and logged in to the platform (see Getting Started).
We base our example on the Classifying Names with a Character-Level RNN tutorial.
Initializing a new flow
To simplify working with the Apolo Platform and to help establish best practices in the ML environment, we provide a flow template. This template consists of the recommended directories and files, and it's designed to operate smoothly with our base environment.
To use it, install the cookiecutter package and initialize cookiecutter-neuro-project:
```bash
pipx install cookiecutter
cookiecutter gh:neuro-inc/cookiecutter-neuro-project --checkout release
```

You will then need to provide some information about the new project:
```text
project_name [Name of the project]: Apolo Tutorial
project_dir [neuro-tutorial]:
project_id [neuro-tutorial]:
code_directory [modules]: rnn
preserve Neuro Flow template hints [yes]:
```

Flow configuration structure
After you execute the command mentioned above, you get the following structure:
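The layout below is a sketch reconstructed from the mount table that follows and the files referenced later in this tutorial; the exact set of files may vary between template releases:

```text
neuro-tutorial/
├── .neuro/            # apolo-flow configuration (live.yaml)
├── data/              # training / testing data
├── notebooks/         # Jupyter notebooks
├── results/           # logs and results
├── rnn/               # user's Python code (the code_directory chosen above)
├── apt.txt            # system packages installed into the base image
└── requirements.txt   # Python dependencies installed into the base image
```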
When you run a job (for example, via apolo-flow run jupyter), the directories are mounted to the job as follows:
| Mount path in job | Purpose | Storage URI |
| --- | --- | --- |
| /project/data/ | Training / testing data | storage:neuro-tutorial/data/ |
| /project/rnn/ | User's Python code | storage:neuro-tutorial/rnn/ |
| /project/notebooks/ | User's Jupyter notebooks | storage:neuro-tutorial/notebooks/ |
| /project/results/ | Logs and results | storage:neuro-tutorial/results/ |
Filling the flow
Now we need to fill the newly created flow with content:
Change working directory:
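Assuming you accepted the default project_dir during initialization:

```bash
cd neuro-tutorial
```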
Copy the model source to your rnn folder:
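One way to do this is to fetch the script from the upstream PyTorch tutorials repository; the file path and name below are assumptions about that repository's layout, so adjust them if needed:

```bash
# Assumed upstream location of the tutorial script; verify before running.
curl -o rnn/char_rnn_classification_tutorial.py \
  https://raw.githubusercontent.com/pytorch/tutorials/main/intermediate_source/char_rnn_classification_tutorial.py
```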
Download the data (the ZIP archive from the upstream PyTorch tutorial), extract its content, and put it in your data folder:
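For example, from the project root, using the dataset archive referenced by the upstream PyTorch tutorial:

```bash
# The archive contains a top-level data/ directory with names/*.txt inside.
curl -O https://download.pytorch.org/tutorial/data.zip
unzip data.zip
rm data.zip
```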
Training and evaluating the model
When you start working with a flow on the Apolo platform, the basic workflow is as follows: you set up the remote environment, upload data and code to your storage, run training, and evaluate the results.
To set up the remote environment, run:
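In the generated template, the base image target is typically named myimage; treat the exact name as an assumption and check the images section of your .neuro/live.yaml:

```bash
# "myimage" is the template's default image name; verify it in .neuro/live.yaml.
apolo-flow build myimage
```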
This command will run a lightweight job (via apolo run), upload the files containing your dependencies apt.txt and requirements.txt (via apolo cp), install the dependencies (using apolo exec), do other preparatory steps, and then create the base image from this job and push it to the platform (via apolo save, which works similarly to docker commit).
To upload data and code to your storage, run:
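The template defines storage volumes for your data, code, and other directories, and apolo-flow can upload them all at once:

```bash
# Uploads all volumes defined in .neuro/live.yaml to platform storage.
apolo-flow upload ALL
```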
To run the training job, you need to specify the training script in .neuro/live.yaml and then run apolo-flow run train:
Open .neuro/live.yaml in an editor and find the following lines (make sure you're looking at the train job, not multitrain, which has a very similar section):
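In the template, the train job's command section looks roughly like this (a sketch, not verbatim; the exact lines differ between template releases, so match against your own file):

```yaml
bash: |
    cd ${{ volumes.project.mount }}
    python -u ${{ volumes.code.mount }}/train.py --data ${{ volumes.data.mount }}
```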
and replace them with the following lines:
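A sketch of the replacement, pointing the job at the tutorial script copied into rnn/ earlier (the script name is the assumption made above):

```yaml
bash: |
    cd ${{ volumes.project.mount }}
    python -u ${{ volumes.code.mount }}/char_rnn_classification_tutorial.py
```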
Now, you can run
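As noted above:

```bash
apolo-flow run train
```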
and observe the output. You will see some checks run at the beginning of the script, after which the model is trained and evaluated.