Quickstart — First Inference Call

This guide gets you from zero to a working streaming inference call as fast as possible.

1. Create an Account & Get an API Key

Go to https://hypervize.tech and sign in (or create an account) using Auth0.
A default inference-scoped API key is generated automatically when you sign up (via a database trigger).
To view it or create additional keys, go to Settings → Keys (or directly to /dashboard/settings/keys).
Give any new key a descriptive name (e.g., “Production – Elastic”) and select the inference scope.

Copy the key immediately. It will look like:

```text
hvz_live_3f8a9c2e1b4d5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c
```

Important: Treat this key like a password. It grants access to inference on your behalf.
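
One way to follow that advice on the command line is to keep the key in an environment variable instead of pasting it into scripts. A minimal shell sketch; the variable name HYPERVIZE_API_KEY and the require_api_key helper are hypothetical choices for this example, not something the platform defines:

```bash
# Hypothetical helper: stop early if the key is missing.
# HYPERVIZE_API_KEY is an assumed variable name used only in this example.
require_api_key() {
  if [ -z "${HYPERVIZE_API_KEY:-}" ]; then
    echo "HYPERVIZE_API_KEY is not set; export it before calling the API" >&2
    return 1
  fi
}
```

After exporting the variable (ideally loaded from a secrets manager rather than typed into shell history), call require_api_key at the top of a script and send -H "Authorization: Bearer $HYPERVIZE_API_KEY" in place of the literal key. Because the header is double-quoted in the cURL example, the shell expands the variable.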

You now have everything needed for the Elastic Inference API.

2. Make Your First Call (cURL)

Replace YOUR_API_KEY with the key you copied in step 1.

```bash
# -N (--no-buffer) makes cURL print each SSE chunk as soon as it arrives
curl -N https://hypervize.tech/api/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [
      {"role": "user", "content": "Explain scale-to-zero inference in one paragraph."}
    ],
    "max_tokens": 256,
    "stream": true
  }'
```

You should see a stream of Server-Sent Events (SSE): each chunk arrives as a line starting with `data: {...}`, and the final line is `data: [DONE]`.
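
To show what a client has to do with that stream, here is a minimal shell sketch of an SSE reader. It assumes, based on the sample output described above, that every payload arrives on a single data: line and that data: [DONE] marks the end; the read_sse name is hypothetical. It prints each JSON payload and ignores blank separator lines and comment lines:

```bash
# Minimal SSE reader (hypothetical helper): prints the payload of each
# "data:" line and stops at the "data: [DONE]" sentinel.
# Assumes each payload fits on one "data:" line, as in the sample output.
read_sse() {
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}            # tolerate CRLF line endings
    case "$line" in
      data:*)
        payload=${line#data:}     # drop the field name
        payload=${payload# }      # SSE allows one optional space after the colon
        if [ "$payload" = "[DONE]" ]; then
          return 0                # end of stream
        fi
        printf '%s\n' "$payload"
        ;;
      *) ;;                       # blank separators and ":" comment lines are ignored
    esac
  done
}
```

Pipe the step 2 command into it, adding -s to hide cURL's progress meter: curl -sN https://hypervize.tech/api/chat/completions ... | read_sse. Extracting the generated text from each payload depends on the chunk schema, which this guide does not specify, so the sketch stops at the raw JSON.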

3. Try a Different Model

The model field accepts the display name from our catalog (recommended) or the raw provider identifier.

Popular starting models:

claude-sonnet-4.6
llama-3-3-70b-instruct
mistral-large-3
deepseek-v3.2
qwen3-32b

See the full list in Supported Models.
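
Because switching models only changes the model field, a small helper can generate the request body for any catalog name. This is a hypothetical shell sketch (build_request is not part of any SDK) that escapes backslashes and double quotes in the prompt; other characters that need JSON escaping, such as newlines, are not handled:

```bash
# Hypothetical helper: print a chat-completions request body for MODEL and PROMPT.
# Only backslashes and double quotes in the prompt are escaped; the model name
# is inserted as-is (catalog names contain no characters that need escaping).
build_request() {
  model=$1
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": 256, "stream": true}\n' \
    "$model" "$prompt"
}
```

The body can be sent with cURL's -d @-, which reads the data from standard input: build_request mistral-large-3 "Explain scale-to-zero inference in one paragraph." | curl -sN https://hypervize.tech/api/chat/completions -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d @-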

4. (Optional) Use the Dashboard Playground

Go to Dashboard → Inference.
Switch to the Elastic tab.
Select a model from the grouped dropdown.
Type a prompt and hit Send.

The playground shows token usage, time to first byte (TTFB), and total latency, which is useful for comparing models while you learn the catalog.

5. Next: Try a Dedicated Endpoint

Once you’re comfortable with Elastic:

Go to Dashboard → Inference.
Switch to the Dedicated tab.
Enter a Hugging Face model ID (e.g., meta-llama/Llama-3.1-8B-Instruct).
Configure min/max instances and deploy.

After provisioning completes (status changes to ONLINE), you can call it at:

```text
https://hypervize.tech/api/d/{endpoint-id}/chat/completions
```

Dedicated endpoints accept the same request format as Elastic.
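
Since the request body is the same, only the URL differs between Elastic and Dedicated. A hypothetical shell helper (endpoint_url is an illustrative name; the endpoint ID comes from your Dedicated tab) that returns the Elastic URL by default and the Dedicated URL when given an ID:

```bash
# Hypothetical helper: print the chat-completions URL to call.
# No argument -> Elastic URL; an endpoint ID -> Dedicated URL pattern from this guide.
endpoint_url() {
  base=https://hypervize.tech/api
  if [ -n "${1:-}" ]; then
    printf '%s/d/%s/chat/completions\n' "$base" "$1"
  else
    printf '%s/chat/completions\n' "$base"
  fi
}
```

For example, curl -N "$(endpoint_url YOUR_ENDPOINT_ID)" with the same headers and -d body as step 2 targets the dedicated endpoint instead of Elastic.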

Troubleshooting First Calls

| Symptom | Likely cause | Fix |
|---|---|---|
| 401 Invalid API key | Key was not copied correctly, or it was revoked | Regenerate the key in the dashboard |
| 403 You must generate an API key first to use inference. | The signed-in user has no active inference-scoped key (one is generated at signup, so it may have been revoked or deactivated) | Create or reactivate an inference-scoped key in Settings → Keys |
| 403 Unauthorized dedicated | The endpoint ID belongs to another account | Use only endpoint IDs you own |
| No streaming / empty response | Client is not handling SSE correctly | Set `"stream": true` in the request body and read the response incrementally (with cURL, pass `-N`) |
| Slow first token | Cold start on a large model | Test with smaller models, or provision a dedicated endpoint |
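
In scripts, the same table can drive error handling. A hypothetical shell sketch (diagnose is an illustrative name) that maps a status code, plus the response body where the two 403 cases need to be told apart, to the fixes above. It matches on the error text shown in the table, which assumes that text appears verbatim in the response body:

```bash
# Hypothetical helper: map a failed call's HTTP status (and body) to a likely fix.
diagnose() {
  status=$1
  body=${2:-}
  case "$status" in
    2??) echo "OK" ;;
    401) echo "Invalid API key: regenerate it in Settings -> Keys" ;;
    403)
      # Two different 403s: tell them apart by the error message text.
      case "$body" in
        *"generate an API key"*) echo "No active inference key: create or reactivate one in Settings -> Keys" ;;
        *) echo "Forbidden: check that you own this dedicated endpoint ID" ;;
      esac
      ;;
    *) echo "Unexpected status $status" ;;
  esac
}
```

With cURL, -o response.json -w '%{http_code}' writes the body to a file and prints only the status code, so a script can capture status=$(curl -s -o response.json -w '%{http_code}' ...) and then run diagnose "$status" "$(cat response.json)".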

What’s Next?

Read the full Authentication & API Keys guide
Deep dive into Elastic Inference API
Learn how to deploy Dedicated Inference Endpoints