GPT OSS
Overview
GPT OSS is a family of open-source large language models developed for broad accessibility and performance. They are optimized for high-quality text generation and support both base and instruction-tuned variants, making them suitable for a wide range of NLP workloads such as chat, reasoning, and content creation.
GPT OSS models are available through the vLLM application and are designed for instant deployment, minimizing the need for manual setup.
For advanced use cases, you can switch to the configurable application variant, which allows customization of parameters such as server-extra-args, Ingress authentication, and more.
Managing application via Apolo CLI
GPT OSS can be installed on Apolo either via the CLI or the Web Console. Below are detailed instructions for installing it with the Apolo CLI.
Install via Apolo CLI
Step 1 — Use the CLI command to get the application configuration file template:
apolo app-template get gpt-inference > gpt-oss.yaml
Step 2 — Customize the application parameters. Below is an example configuration file:
# Application template configuration for: gpt-inference
# Fill in the values below to configure your application.
# To use values from another app, use the following format:
# my_param:
#   type: "app-instance-ref"
#   instance_id: "<app-instance-id>"
#   path: "<path-from-get-values-response>"
# yaml-language-server: $schema=https://api.dev.apolo.us/apis/apps/v2/templates/gpt-inference/v25.7.1/schema
template_name: gpt-inference
template_version: v25.7.1
input:
  # Apolo Secret Configuration.
  hf_token:
    key: ''
  # Enable or disable autoscaling for the LLM.
  autoscaling_enabled: false
  size: gpt-oss-20b
  llm_class: gpt-oss
Explanation of configuration parameters:
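HuggingFace token (hf_token): The key of the Apolo secret that stores your Hugging Face access token.
Autoscaling (autoscaling_enabled): Set to true to enable autoscaling for the LLM, or false to disable it.
Size (size): One of the available model sizes, for example gpt-oss-20b.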
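The parameters above can be sanity-checked before deployment. The following is a minimal sketch, not part of the Apolo tooling: the function name check_gpt_oss_input is hypothetical, and it assumes the input section of gpt-oss.yaml has already been loaded into a Python dictionary. Treating an empty hf_token key as something to review is also an assumption, since the source does not state whether the token is mandatory.

```python
from typing import List


def check_gpt_oss_input(cfg: dict) -> List[str]:
    """Return human-readable notes about the `input` section (hypothetical helper)."""
    notes = []

    # Assumption: an empty secret key is worth flagging for review.
    hf_token = cfg.get("hf_token") or {}
    if not hf_token.get("key"):
        notes.append("hf_token.key is empty: set it to the key of your Apolo secret")

    # The template shows a boolean value for autoscaling_enabled.
    if not isinstance(cfg.get("autoscaling_enabled"), bool):
        notes.append("autoscaling_enabled must be true or false")

    if not cfg.get("size"):
        notes.append("size is empty: choose one of the available model sizes")

    return notes


# Values copied from the example configuration file above.
example_input = {
    "hf_token": {"key": ""},
    "autoscaling_enabled": False,
    "size": "gpt-oss-20b",
    "llm_class": "gpt-oss",
}

for note in check_gpt_oss_input(example_input):
    print(note)
```

Running the check on the unmodified template flags only the empty secret key, which is the one value the example leaves for you to fill in.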
Step 3 — Deploy the application in your Apolo project:
apolo app install -f gpt-oss.yaml
Monitor the application status using:
apolo app list
To uninstall the application, use:
apolo app uninstall <app-id>
If you want to see logs of the application, use:
apolo app logs <app-id>
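If you script these steps, the CLI calls above can be assembled programmatically. The sketch below only builds the argument lists for the four commands shown in this section. The helper name apolo_app_command is hypothetical, and the sketch does not run anything on its own.

```python
import shlex
from typing import List, Optional


def apolo_app_command(action: str, target: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the `apolo app` commands shown above.

    `target` is the config file path for "install" and the app ID for
    "uninstall" and "logs"; "list" takes no target.
    """
    if action == "list":
        return ["apolo", "app", "list"]
    if action not in ("install", "uninstall", "logs"):
        raise ValueError(f"unsupported action: {action}")
    if not target:
        raise ValueError(f"'{action}' requires a target")
    if action == "install":
        return ["apolo", "app", "install", "-f", target]
    return ["apolo", "app", action, target]


# Print the commands instead of executing them.
print(shlex.join(apolo_app_command("install", "gpt-oss.yaml")))
print(shlex.join(apolo_app_command("list")))
```

To actually execute a command, the returned list could be passed to subprocess.run(cmd, check=True) on a machine where the Apolo CLI is installed and logged in.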
For instructions on how to access the application, please refer to the Usage section.
Usage
After installation, you can use GPT OSS for different kinds of workflows:
Go to the Installed Apps tab.
You will see a list of all running apps, including the GPT OSS app you just installed. To view detailed information or uninstall the app, click the Details button.
Once on the Details page, scroll down to the Outputs section. Find the HTTP API output with the public domain address, copy it, and paste it into the script below in place of <APP_HOST>.
import requests

# Replace <APP_HOST> with the public HTTP API address from the Outputs section.
API_URL = "<APP_HOST>/v1/chat/completions"

headers = {
    "Content-Type": "application/json",
}

data = {
    "model": "openai/gpt-oss-20b",  # Must match the model name loaded by vLLM
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that gives concise and clear answers.",
        },
        {
            "role": "user",
            "content": (
                "I'm preparing a presentation for non-technical stakeholders "
                "about the benefits and limitations of using large language models in our customer support workflows. "
                "Can you help me outline the key points I should include, with clear, jargon-free explanations and practical examples?"
            ),
        },
    ],
}

if __name__ == '__main__':
    response = requests.post(API_URL, headers=headers, json=data)
    # Raise an exception for 4xx/5xx responses.
    response.raise_for_status()
    # response.text holds the raw JSON body returned by the server.
    reply = response.text
    status_code = response.status_code
    print("Assistant:", reply)
    print("Status Code:", status_code)
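The script above prints the raw JSON body. As a minimal sketch, the two helpers below build the endpoint URL from the copied host and pull the assistant's text out of a parsed response. Both function names are hypothetical, and the response shape (choices[0].message.content) is an assumption based on the OpenAI-compatible chat completions format. Check the actual output of your deployment before relying on it.

```python
def chat_completions_url(app_host: str) -> str:
    """Join the copied host and the endpoint path without doubling slashes."""
    return app_host.rstrip("/") + "/v1/chat/completions"


def extract_reply(payload: dict) -> str:
    """Return the first assistant message from a parsed response body.

    Assumes an OpenAI-compatible layout: payload["choices"][0]["message"]["content"].
    """
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    # The content may be missing or null; fall back to an empty string.
    return message.get("content") or ""


# Hand-written sample payload for illustration only, not real server output.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Here is an outline."}}
    ]
}

print(chat_completions_url("https://example.invalid/"))
print(extract_reply(sample))
```

In the script above, this would correspond to calling extract_reply(response.json()) instead of reading response.text.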
References:
Apolo Documentation (for the usage of apolo run and resource presets)
Hugging Face Model Hub (for discovering or hosting models)