Supported Models

Elastic Inference Catalog

Hypervize Elastic Inference currently exposes dozens of models from leading providers through our global serverless network.

The catalog is defined in types/model-catalog.ts (in the main codebase) and is the source of truth for both the API and the dashboard.

Key Providers & Highlights

Anthropic — Claude Sonnet 4.6, Opus 4.7, Haiku 4.5 (vision + text)
Meta — Llama 4 Scout, Llama 3.3 70B, Llama 3.2 11B (vision)
Mistral — Mistral Large 3, Devstral, Ministral
DeepSeek — DeepSeek V3.2
Qwen — Qwen3 Coder, Qwen3 VL, Qwen3 Next, Qwen3 32B
NVIDIA — Nemotron models
Google — Gemma 3 family
AI21, MiniMax, OpenAI OSS, Amazon Nova, Stability AI, TwelveLabs, Writer — Full coverage

How to Reference Models

Recommended: Use the display name (e.g., claude-sonnet-4.6, qwen3-32b).

The proxy automatically maps it to the correct backend model.

You may also pass the raw provider identifier if you have a specific reason.
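
As an illustration of referencing a model by display name, the sketch below builds a request body. It assumes the proxy accepts an OpenAI-style chat completion schema (`model`, `messages`, `max_tokens`); this page only confirms the OpenAI message format for vision. The helper name `build_chat_request` is hypothetical.

```python
import json


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat request body. The schema is an assumption (OpenAI-style)."""
    return {
        # Display name from the catalog; the proxy maps it to the backend model.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


request_body = build_chat_request("qwen3-32b", "Summarize this release note.")
print(json.dumps(request_body, indent=2))
```

To pass a raw provider identifier instead, only the `model` string would change; the rest of the body stays the same under this assumed schema.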

Pricing & Context

Every model entry includes:

Input and output cost per million tokens (or per image for vision models)
Recommended max_tokens
Context window size (where known)

These values are surfaced in the dashboard model picker and used for estimated cost logging in the proxy.
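
A minimal sketch of how per-million-token prices could be turned into an estimated request cost. The field names and the prices used here are placeholders for illustration; they are not taken from the catalog, and the proxy's actual cost logging may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    # Hypothetical field names; prices are per one million tokens.
    input_cost_per_million: float
    output_cost_per_million: float


def estimate_cost(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = (
        input_tokens * pricing.input_cost_per_million
        + output_tokens * pricing.output_cost_per_million
    )
    return total / 1_000_000


# Placeholder prices, not real catalog values.
example_pricing = ModelPricing(input_cost_per_million=3.0, output_cost_per_million=15.0)
cost = estimate_cost(example_pricing, input_tokens=2_000, output_tokens=500)
print(f"Estimated cost: {cost:.4f}")
```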

Dedicated Endpoints — Any Hugging Face Model

When you deploy a Dedicated endpoint, you are not limited to the Elastic catalog.

You can deploy:

Any public Hugging Face model (e.g., meta-llama/Meta-Llama-3-8B-Instruct)
Gated models (by supplying a Hugging Face access token at deploy time)
Your own fine-tuned or merged models (upload them to Hugging Face; private S3 support is planned)

Hardware is automatically sized based on:

Model safetensors size (preferred when available via HF API)
Heuristic fallback on model name tokens (70b, 8x22b, v3, large, etc.)
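
To make the name-token fallback concrete, here is a rough sketch of one way a parameter count could be guessed from a model identifier. It is not the platform's actual sizing logic: the regular expression and the function name `estimate_params_billions` are assumptions, keyword tokens such as `large` or `v3` are not handled, and no mapping to hardware tiers is attempted.

```python
import re
from typing import Optional

# Matches size tokens such as "70b" or "3.5b", and expert forms like "8x22b".
_SIZE_TOKEN = re.compile(r"(?<![a-z0-9.])(?:(\d+)x)?(\d+(?:\.\d+)?)b(?![a-z0-9])")


def estimate_params_billions(model_id: str) -> Optional[float]:
    """Guess a parameter count (in billions) from a model name, or None."""
    match = _SIZE_TOKEN.search(model_id.lower())
    if match is None:
        return None
    # For "8x22b" the product is a rough upper bound, since mixture-of-experts
    # models share some layers between experts.
    experts = int(match.group(1)) if match.group(1) else 1
    return experts * float(match.group(2))


print(estimate_params_billions("meta-llama/Meta-Llama-3-8B-Instruct"))
print(estimate_params_billions("mistralai/Mixtral-8x22B-Instruct-v0.1"))
```

When the name carries no size token the function returns `None`, which is why the safetensors size reported by the Hugging Face API is the preferred signal.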

Current tiers (subject to change):

1× / 4× / 8× A10G (g5)
8× A100
8× H100
8× B200 (Blackwell)

Vision & Multimodal

Several Elastic models support images. Send them using the standard OpenAI vision message format:

JSON

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Describe this image" },
    { "type": "image_url", "image_url": { "url": "https://..." } }
  ]
}

Dedicated endpoints support vision if the underlying model and our inference runtime configuration support it.
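
The message format shown above can also be assembled in code. The sketch below builds the same structure and includes a helper that encodes local image bytes as a base64 data URL. Data URLs are part of the OpenAI vision format, but whether the proxy accepts them is an assumption, and both helper names are hypothetical.

```python
import base64


def build_vision_message(prompt: str, image_url: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


message = build_vision_message("Describe this image", "https://example.com/cat.png")
print(message["content"][1]["image_url"]["url"])
```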

Model Updates & Deprecations

The Elastic catalog is updated periodically as new high-quality models become available through our network.
When a model is deprecated or retired we will announce a migration window with recommended alternatives.
Dedicated endpoints are yours — you control the model version completely.

Requesting New Models (Elastic)

If you need a model that is not yet available in the Elastic catalog, open a support request or start an enterprise conversation. We evaluate and add high-demand models on a regular cadence.
