Deploying Text-to-Speech and Speech-to-Text Models in Apolo
Overview
This guide walks through deploying Speaches, an OpenAI API-compatible server for speech-to-text and text-to-speech, on the Apolo platform. It covers two deployment paths (an `apolo-flow` live flow that also provisions an Ollama-backed chat model, and a single `apolo run` job), trying the service through its built-in Playground UI, and calling its API programmatically.
Step 1: Deploying the Speaches Service via Terminal
First, create a project directory with an `.apolo` folder for the flow definition:

```bash
mkdir -p speaches-demo/.apolo
```

Save the following flow as `speaches-demo/.apolo/live.yaml`:

```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/neuro-inc/neuro-flow/refs/heads/master/src/apolo_flow/flow-schema.json
kind: live
title: Speaches Demo

defaults:
  life_span: 1d

volumes:
  data:
    remote: storage:speaches-demo/data
    mount: /root/.ollama
    local: data
  hf_cache:
    remote: storage:ollama_server/cache
    mount: /root/.cache/huggingface

jobs:
  ollama:  # it's actually Ollama, but we keep the same job name
    image: ollama/ollama:0.6.5
    life_span: 10d
    detach: true
    preset: H100x1
    http_port: 8000
    env:
      OLLAMA_KEEP_ALIVE: -1
      OLLAMA_FLASH_ATTENTION: 1
      OLLAMA_KV_CACHE_TYPE: q4_0
      OLLAMA_HOST: "0.0.0.0:8000"
    volumes:
      - $[[ volumes.data.ref_rw ]]
    entrypoint: /bin/bash
    cmd: >
      -c "/bin/ollama serve &
      sleep 15 &&
      /bin/ollama pull gemma3:4b &&
      wait"

  speaches:
    image: ghcr.io/neuro-inc/speaches:sha-662eef8-cuda-12.4.1
    life_span: 10d
    detach: true
    preset: H100x1
    http_port: 8000
    http_auth: false
    volumes:
      - $[[ volumes.hf_cache.ref_rw ]]
    env:
      CHAT_COMPLETION_BASE_URL: http://${{ inspect_job(params.ollama).internal_hostname_named }}:8000/v1
      CHAT_COMPLETION_API_KEY: ollama
    params:
      ollama: ollama
```

Then start both jobs:

```bash
cd speaches-demo
apolo-flow run ollama
apolo-flow run speaches
```

Note the `CHAT_COMPLETION_BASE_URL` environment variable: `inspect_job(params.ollama).internal_hostname_named` resolves to the Ollama job's internal hostname, so the Speaches chat demo reaches the Ollama job over the cluster network.
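Once both jobs report as running, you can sanity-check the chat wiring from your own machine. Below is a minimal sketch, assuming the public HTTPS URL Apolo assigns to the `speaches` job (the URL shown is a placeholder) and assuming this Speaches build forwards chat completions to the Ollama backend configured via `CHAT_COMPLETION_BASE_URL`; it uses the OpenAI Python SDK because the API is OpenAI-compatible:

```python
from openai import OpenAI

# Placeholder URL: substitute the HTTPS endpoint Apolo prints when the
# "speaches" job starts (also visible in the Apolo console).
client = OpenAI(
    base_url="https://speaches--<your-org>.apps.<cluster>.apolo.us/v1",  # hypothetical
    api_key="ollama",  # matches CHAT_COMPLETION_API_KEY above; not a real secret
)

# Chat requests are handled by the Ollama backend, so the model name is
# the one pulled in the flow definition (gemma3:4b).
reply = client.chat.completions.create(
    model="gemma3:4b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)
```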
Alternatively, you can deploy Speaches as a single standalone job directly from the terminal:

```bash
apolo run \
  --name speaches \
  --preset a100x1 \
  --volume storage:speaches/hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  --http-port 8000 \
  --no-http-auth \
  ghcr.io/neuro-inc/speaches:sha-662eef8-cuda-12.4.1
```
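Whichever way you start the job, a quick way to confirm the service is reachable is to list the models it serves. A minimal sketch with the `openai` Python package; the base URL is a placeholder for the endpoint Apolo prints when the job starts:

```python
from openai import OpenAI

# Placeholder URL: substitute your Speaches job's HTTPS endpoint.
client = OpenAI(
    base_url="https://speaches--<your-org>.apps.<cluster>.apolo.us/v1",  # hypothetical
    api_key="none-required",  # auth is disabled via --no-http-auth / http_auth: false
)

# Speaches exposes the OpenAI-style model registry endpoint (/v1/models).
for model in client.models.list():
    print(model.id)
```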
Step 2: Using the Speaches Playground UI
Speech-to-Text Demo

Text-to-Speech Demo

Chat Demo
Step 3: Using the Speaches API Programmatically
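Because Speaches implements the OpenAI audio API, the standard `openai` Python client works against it for both transcription and synthesis. Below is a minimal sketch; the model and voice IDs are examples only (Speaches serves faster-whisper checkpoints for speech-to-text and supports TTS engines such as Kokoro), so check `GET /v1/models` to see what your deployment actually serves:

```python
from openai import OpenAI

# Placeholder base URL: use your Speaches job's HTTPS endpoint.
client = OpenAI(
    base_url="https://speaches--<your-org>.apps.<cluster>.apolo.us/v1",  # hypothetical
    api_key="none-required",  # the demo runs with HTTP auth disabled
)

# --- Speech-to-text: transcribe a local audio file ---------------------
# "Systran/faster-whisper-small" is an example model ID; Speaches
# downloads models into the mounted Hugging Face cache on first use.
with open("sample.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-whisper-small",
        file=audio_file,
    )
print(transcript.text)

# --- Text-to-speech: synthesize speech to an MP3 file ------------------
# Model and voice are example IDs; list /v1/models for the ones
# available in your deployment.
with client.audio.speech.with_streaming_response.create(
    model="speaches-ai/Kokoro-82M-v1.0-ONNX",
    voice="af_heart",
    input="Hello from Speaches running on Apolo!",
) as response:
    response.stream_to_file("speech.mp3")
```

The same two endpoints back the Playground demos from Step 2, so anything that works in the UI can be reproduced programmatically this way.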
Conclusion
Additional Resources