San Francisco Compute Inference
To use our OpenAI-compatible inference API, please contact us. We serve these models:
- Qwen/QwQ-32B-Preview: Prompt $0.13/million tokens, Completion $0.20/million tokens
- mistralai/Mistral-Nemo-Instruct-2407: Prompt $0.04/million tokens, Completion $0.09/million tokens
- Gryphe/MythoMax-L2-13b: Prompt $0.07/million tokens, Completion $0.07/million tokens
- meta-llama/Llama-3.3-70B-Instruct: Prompt $0.39/million tokens, Completion $0.39/million tokens
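Because the API is OpenAI-compatible, requests follow the standard chat completion shape. Below is a minimal sketch of building such a request body; the exact endpoint path and auth header are assumptions, so confirm them with us before relying on this.

```python
import json

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion payload.

    Sketch only: the field names follow the OpenAI chat API; the
    SF Compute endpoint path and auth scheme are not shown here.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("meta-llama/Llama-3.3-70B-Instruct", "Hello!")
print(json.dumps(payload))
```

POST a body like this to the chat completions endpoint with your API key once you have access.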
To check the current prices, use the models endpoint.
curl -X GET "https://inference.sfcompute.com/models" -H "Accept: application/json"
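Once you have the JSON from the models endpoint, you can compare prices programmatically. The response schema below (a `models` array with `prompt_price` and `completion_price` fields) is a hypothetical example for illustration; the real field names may differ.

```python
import json

# Hypothetical /models response; the actual schema may differ.
sample = json.loads("""
{
  "models": [
    {"id": "Gryphe/MythoMax-L2-13b", "prompt_price": 0.07, "completion_price": 0.07},
    {"id": "Qwen/QwQ-32B-Preview", "prompt_price": 0.13, "completion_price": 0.20}
  ]
}
""")

def cheapest_prompt(models: list) -> str:
    # Pick the model with the lowest per-million prompt-token price.
    return min(models, key=lambda m: m["prompt_price"])["id"]

print(cheapest_prompt(sample["models"]))  # prints Gryphe/MythoMax-L2-13b
```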
To check the current status of a model, use the status endpoint.
curl -X GET "https://inference.sfcompute.com/status" -H "Accept: application/json"
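A client might use the status endpoint to skip unhealthy models before sending traffic. The response shape below (a model-to-status map) is an assumption for illustration only.

```python
import json

# Hypothetical /status response; the real endpoint may use a
# different structure or status values.
sample = json.loads(
    '{"Qwen/QwQ-32B-Preview": "ok", "Gryphe/MythoMax-L2-13b": "degraded"}'
)

def unhealthy(status: dict) -> list:
    # Return the names of models not reporting "ok".
    return [model for model, state in status.items() if state != "ok"]

print(unhealthy(sample))  # prints ['Gryphe/MythoMax-L2-13b']
```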