Files
The Files application is a comprehensive file management system designed to help you organize and manage your network storage within the cluster. This documentation will guide you through its features and functionality.
Your organization receives a storage space within the cluster, structured in a hierarchical manner:
Organization Level: The root folder for all your organization's files
Project Level: Individual project folders within your organization
Custom Folders: User-created folders for further organization
Adding New Folders
You can create new folders to organize your files by clicking the "Add Folder" button. The system will prompt you with a dialog where you can specify the folder name. These folders help maintain a structured file hierarchy within your project space.
File Upload
The system supports file uploads through two methods:
Using the "Upload" button
Drag-and-drop functionality directly into the interface
Navigation and Search
The interface provides several navigation tools:
Search bar: Located at the top of the interface for quick file location
Home button: Returns you to your project's root folder
Grid/List view toggle: Allows you to switch between viewing modes
Folder Up: Navigate to the parent folder using the dedicated button
File and Folder Management
For each file and folder, you can:
View properties including:
Save location (full path)
File extension
Last modified date
File size
Perform actions such as:
Rename items
Delete items
Copy/move items
View detailed properties
In addition to the graphical interface, you can manage your storage through our powerful command-line interface (CLI). The CLI provides advanced capabilities for file operations, particularly useful for automation and bulk operations.
Through the command line, you can perform all essential file operations using the apolo storage command set. Here are some key capabilities:
The CLI supports essential operations such as copying files (cp), creating directories (mkdir), moving files (mv), and removing files (rm). It also provides advanced features like storage usage analysis (df) and tree-style directory visualization (tree). When working with files programmatically or handling batch operations, the CLI offers precise control and automation capabilities.
For example, you can copy files to your storage using:
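A minimal sketch; the local file name and target folder are placeholders:

```bash
# Copy a local file into a folder in your project storage
apolo storage cp ./results.csv storage:datasets/
```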
Or list your storage contents in a tree format:
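For instance, starting from the root of your project storage:

```bash
# Display the storage contents as a tree
apolo storage tree storage:
```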
The CLI is particularly valuable for:
Automating file operations in scripts
Performing bulk file transfers
Integration with development workflows
Remote storage management
Pattern-based file operations using glob patterns
For comprehensive documentation of all CLI commands and their options, please refer to our detailed CLI documentation for storage.
When running computational jobs, you often need to access files from your storage. The Files system integrates seamlessly with our job execution system, allowing you to mount directories from your storage as volumes inside your containers. This gives your jobs direct access to your files, making it easy to process data and save results.
How Storage Mounting Works
When you mount a storage volume, you create a connection between a directory in your storage and a location inside your job's container. Think of it as a window between your storage and your running job: any files in the mounted storage directory become accessible from within your job.
You can mount volumes in two modes:
Read-write mode (rw): Allows your job to both read existing files and write new ones
Read-only mode (ro): Provides access to read files but prevents modifications
Mounting Volumes Using the CLI
To mount a volume when running a job, use the --volume (or -v) option with the apolo job run command (see the Apolo CLI reference for more information on running jobs). The volume specification follows this format:
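In the CLI this is typically written as a single storage: reference:

```
storage:<source-path>:<container-path>:<mode>
```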
Where:
<source-path>: The path in your storage (relative to your project root)
<container-path>: Where the files will appear inside the container
<mode>: Either ro (read-only) or rw (read-write)
For example, to mount your project's data directory in read-only mode:
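A sketch, where the data folder and the image name are placeholders:

```bash
# Mount the project's data/ folder at /var/data inside the container, read-only
apolo job run --volume storage:data:/var/data:ro my-image:latest
```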
You can mount multiple volumes in the same job:
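For instance, combining read-only input with a writable results folder (again, the paths and image are placeholders):

```bash
apolo job run \
  --volume storage:data:/var/data:ro \
  --volume storage:results:/var/results:rw \
  my-image:latest
```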
Workflows provide powerful capabilities for integrating your storage with computational jobs. Let's explore how to effectively use storage volumes in your workflow definitions.
Understanding Volumes in Workflows
Volumes in workflows create a three-way connection between:
Your local development environment
Your storage in the Apolo platform
The running jobs in your workflows
This three-way connection enables seamless data flow between development, storage, and computation. Let's look at how to define and use volumes in your workflow configuration.
Defining Volumes
In your workflow YAML file, you can define volumes in two ways:
Direct Reference - Specify the volume directly where it's needed:
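A sketch of this approach, assuming apolo-flow's live workflow syntax (the job name, image, and paths are placeholders):

```yaml
kind: live

jobs:
  train:
    image: my-image:latest
    volumes:
      - storage:data:/var/data:ro   # volume specified inline, where it's used
```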
Volume Definitions - Define volumes centrally and reference them throughout:
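A sketch of the centralized form, assuming apolo-flow's volumes section (names and paths are placeholders):

```yaml
kind: live

volumes:
  data:
    remote: storage:data     # path in your project storage
    mount: /var/data         # where it appears inside the container
    local: data              # matching folder in your local project
    read_only: true

jobs:
  train:
    image: my-image:latest
    volumes:
      - ${{ volumes.data.ref }}   # reference the definition above
```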
The second approach offers several advantages:
Centralized volume management
Reusability across multiple jobs
Easier synchronization between local and remote storage
Common Volume Patterns
Here are some effective patterns for organizing your workflows with volumes:
Input/Output Separation:
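For example (folder names are illustrative):

```yaml
volumes:
  raw_data:
    remote: storage:data/raw
    mount: /var/data
    read_only: true          # inputs are never modified by jobs
  results:
    remote: storage:results
    mount: /var/results      # outputs are written here
```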
Development Environment:
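For example, keeping code and notebooks in sync between your machine and storage (paths are illustrative):

```yaml
volumes:
  code:
    remote: storage:code
    mount: /app/code
    local: .                 # the local project folder
  notebooks:
    remote: storage:notebooks
    mount: /app/notebooks
    local: notebooks
```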
Volume Synchronization
When you define a volume with a local path, you can synchronize it with your storage using the CLI:
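A sketch, assuming volumes named code and results like the ones defined above:

```bash
# Push the local folder of the "code" volume to storage
apolo-flow upload code

# Pull the "results" volume from storage to the local folder
apolo-flow download results
```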
This is particularly useful for development workflows where you need to:
Push code changes to a development environment
Download computation results
Share data between team members
Best Practices
Use Read-Only Volumes for Input: Protect your source data by mounting input volumes as read-only:
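For example, using the same apolo-flow volume syntax sketched above:

```yaml
volumes:
  dataset:
    remote: storage:data
    mount: /var/data
    read_only: true   # jobs can read the data but never modify it
```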
Organize Volumes by Purpose: Structure your volumes based on their role:
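For instance (names are illustrative):

```yaml
volumes:
  config:                    # small configuration files, read-only
    remote: storage:config
    mount: /etc/app
    read_only: true
  data:                      # input datasets, read-only
    remote: storage:data
    mount: /var/data
    read_only: true
  output:                    # results written by jobs
    remote: storage:output
    mount: /var/output
```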
Use Volume References: When the same volume is used in multiple jobs, define it once and reference it:
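A sketch, assuming the ref_ro and ref_rw expressions from apolo-flow (job names and image are placeholders):

```yaml
jobs:
  preprocess:
    image: my-image:latest
    volumes:
      - ${{ volumes.data.ref_ro }}   # same definition, mounted read-only
  train:
    image: my-image:latest
    volumes:
      - ${{ volumes.data.ref_rw }}   # same definition, mounted read-write
```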
For complete details on volume configuration options and advanced usage, refer to our workflow syntax documentation and Apolo Flow CLI reference.
The apolo-extras CLI provides powerful tools for managing data transfers between your storage systems. This extension to the main Apolo CLI enables seamless movement of data between external storage systems and your Apolo cluster, as well as transfers between different clusters.
For complete details on the Apolo Extras CLI, refer to the CLI reference.
Copying Data with apolo-extras data cp
The apolo-extras data cp command serves as a bridge between external storage systems and your Apolo cluster. It supports multiple major cloud storage providers and protocols:
Amazon Web Services (AWS) S3
Google Cloud Storage (GCS)
Microsoft Azure Blob Storage
HTTP/HTTPS endpoints
Let's explore how to use this command effectively:
Basic Data Copy Operations
To copy data between storage systems, use this basic syntax:
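```bash
apolo-extras data cp <source> <destination>
```

Both the source and the destination can be a supported cloud storage URL, an HTTP(S) endpoint, or a storage: path on your cluster.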
For example, to download data from S3 to your cluster storage:
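Here the bucket name and paths are placeholders:

```bash
apolo-extras data cp s3://my-bucket/datasets/images storage:datasets/images
```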
Or to upload data to Google Cloud Storage:
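Again with placeholder names:

```bash
apolo-extras data cp storage:results gs://my-bucket/results
```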
Working with Archives
The tool provides built-in compression and extraction capabilities. This is particularly useful when dealing with large datasets:
For extraction:
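A sketch using the extract option (-x); the archive name and destination are placeholders:

```bash
# Copy the archive and extract its contents into the destination folder
apolo-extras data cp -x s3://my-bucket/dataset.tar.gz storage:datasets/
```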
For compression:
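And a sketch using the compress option (-c), again with placeholder paths:

```bash
# Pack the source folder into an archive at the destination
apolo-extras data cp -c storage:results s3://my-bucket/results.tar.gz
```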
The system automatically supports various archive formats:
.tar.gz, .tgz (Gzipped tar archives)
.tar.bz2, .tbz (Bzip2 compressed tar archives)
.tar (Uncompressed tar archives)
.gz (Gzip compressed files)
.zip (ZIP archives)
Resource Management
You can control the resources allocated for data transfer operations:
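Assuming your apolo-extras version exposes a preset option for the underlying transfer job (the flag below is an assumption; check apolo-extras data cp --help), this might look like:

```bash
# Run the transfer with a larger resource preset (preset name is a placeholder)
apolo-extras data cp --preset cpu-large s3://my-bucket/big-dataset storage:datasets/
```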
Transferring Between Clusters
The apolo-extras data transfer command is specifically designed for moving data between different Apolo clusters. This is particularly useful for:
Migrating datasets between regions
Sharing data between development and production environments
Creating backups across clusters
Basic usage:
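A sketch; the cluster, project, and path names are placeholders, and the fully qualified storage:// URI form is an assumption:

```bash
# Copy a folder from the current cluster to another cluster's storage
apolo-extras data transfer storage:datasets/images storage://other-cluster/my-org/my-project/datasets/images
```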