Supported Models
Elastic model catalog (37 models) and guidance for Dedicated endpoints.
Elastic Inference Catalog
Hypervize Elastic Inference currently exposes 37 models from multiple providers through our global serverless network.
The catalog is defined in types/model-catalog.ts (in the main codebase) and is the source of truth for both the API and the dashboard.
Key Providers & Highlights
- Anthropic — Claude Sonnet 4.6, Opus 4.7, Haiku 4.5 (vision + text)
- Meta — Llama 4 Scout, Llama 3.3 70B, Llama 3.2 11B (vision)
- Mistral — Mistral Large 3, Devstral, Ministral
- DeepSeek — DeepSeek V3.2
- Qwen — Qwen3 Coder, Qwen3 VL, Qwen3 Next, Qwen3 32B
- NVIDIA — Nemotron models
- Google — Gemma 3 family
- AI21, MiniMax, OpenAI OSS, Amazon Nova, Stability AI, TwelveLabs, Writer — Full coverage
How to Reference Models
Recommended: Use the display name (e.g., claude-sonnet-4.6, qwen3-32b).
The proxy automatically maps it to the correct backend model.
You may also pass the raw provider identifier if you have a specific reason.
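As a rough illustration of the mapping described above, the sketch below resolves a display name to a backend identifier and passes anything else through unchanged as a raw provider identifier. The `resolve_model` helper and the table contents are hypothetical: the real catalog lives in types/model-catalog.ts, and the backend identifiers shown here are placeholders rather than actual values.

```python
# Hypothetical display-name table. The backend identifiers are placeholders,
# not real catalog values.
DISPLAY_TO_BACKEND = {
    "claude-sonnet-4.6": "provider-a/claude-sonnet-4.6-backend",
    "qwen3-32b": "provider-b/qwen3-32b-backend",
}


def resolve_model(name: str) -> str:
    """Return the backend identifier for a display name.

    Names that are not in the table are assumed to be raw provider
    identifiers and are returned unchanged.
    """
    return DISPLAY_TO_BACKEND.get(name, name)


print(resolve_model("qwen3-32b"))
print(resolve_model("some-provider/raw-model-id"))
```

Whether the proxy matches names case-sensitively is not stated here, so the sketch uses exact matching.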
Pricing & Context
Every model entry includes:
- Input and output cost per million tokens (or per image for vision models)
- Recommended max_tokens
- Context window size (where known)
These values are surfaced in the dashboard model picker and used for estimated cost logging in the proxy.
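For token-priced models, an estimate of this kind can be derived from the two per-million-token prices. The sketch below is a minimal version of that arithmetic; the function name and the example prices are assumptions for illustration, not values from the catalog, and the proxy's actual logging logic may differ.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate request cost from token counts and per-million-token prices.

    The result is in whatever currency the prices are quoted in.
    """
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output tokens.
cost = estimate_cost(12_000, 800, 3.00, 15.00)
print(f"{cost:.4f}")
```

With these assumed prices, 12,000 input tokens and 800 output tokens cost (12,000 × 3.00 + 800 × 15.00) / 1,000,000 = 0.048.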
Dedicated Endpoints — Any Hugging Face Model
When you deploy a Dedicated endpoint, you are not limited to the Elastic catalog.
You can deploy:
- Any public Hugging Face model (e.g., meta-llama/Meta-Llama-3-8B-Instruct)
- Gated models (by supplying a Hugging Face access token at deploy time)
- Your own fine-tuned or merged models (upload them to Hugging Face; private S3 support is planned)
Hardware is automatically sized based on:
- Model safetensors size (preferred when available via HF API)
- Heuristic fallback on model name tokens (70b, 8x22b, v3, large, etc.)
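To make the name-token fallback concrete, here is a minimal sketch of how numeric size tokens such as 70b or 8x22b could be turned into a rough parameter count. This is not the production sizing logic: the `estimate_params_billions` helper and its regular expression are assumptions, and keyword tokens such as v3 or large would need a separate lookup that is not shown.

```python
import re

# Matches size tokens such as "70b" or "8b", and mixture-of-experts
# forms such as "8x22b". The lookarounds avoid matching inside longer
# tokens (for example the "3" in "llama-3.3").
_SIZE_TOKEN = re.compile(r"(?<![a-z0-9.])(?:(\d+)x)?(\d+(?:\.\d+)?)b(?![a-z0-9])")


def estimate_params_billions(model_name: str):
    """Guess the parameter count (in billions) from the model name.

    For "NxMb" names the result is N * M, which is an upper bound because
    experts usually share some weights. Returns None if no size token is found.
    """
    match = _SIZE_TOKEN.search(model_name.lower())
    if match is None:
        return None
    experts = int(match.group(1)) if match.group(1) else 1
    return experts * float(match.group(2))


for name in ("meta-llama/Meta-Llama-3-8B-Instruct", "example/moe-8x22b", "example/model-v3"):
    print(name, estimate_params_billions(name))
```

A real implementation would then map the estimate to one of the hardware tiers; those thresholds are not documented here, so they are left out.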
Current tiers (subject to change):
- 1× / 4× / 8× A10G (g5)
- 8× A100
- 8× H100
- 8× B200 (Blackwell)
Vision & Multimodal
Several Elastic models support images. Send them using the standard OpenAI vision message format:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Describe this image" },
    { "type": "image_url", "image_url": { "url": "https://..." } }
  ]
}
Dedicated endpoints support vision if the underlying model and our inference runtime configuration support it.
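When requests are assembled in code, the OpenAI-style vision message shown in this section can be produced by a small helper. The sketch below only constructs the message object. The `build_vision_message` name and the example URL are illustrative assumptions, and sending the request (endpoint, authentication, model choice) is deliberately left out.

```python
import json


def build_vision_message(prompt: str, image_url: str) -> dict:
    """Build a user message containing one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder URL for illustration only.
message = build_vision_message("Describe this image", "https://example.com/photo.png")
print(json.dumps(message, indent=2))
```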
Model Updates & Deprecations
- The Elastic catalog is updated periodically as new high-quality models become available through our network.
- When a model is deprecated or retired we will announce a migration window with recommended alternatives.
- Dedicated endpoints are yours — you control the model version completely.
Requesting New Models (Elastic)
If you need a model that is not yet available in the Elastic catalog, open a support request or start an enterprise conversation. We evaluate and add high-demand models on a regular cadence.