Quickstart — First Inference Call
Get an API key and make your first streaming chat completion in under two minutes.
This guide gets you from zero to a working streaming inference call as fast as possible.
1. Create an Account & Get an API Key
- Go to https://hypervize.tech and sign in (or create an account) using Auth0.
- A default inference-scoped API key is automatically generated for you on signup (via database trigger).
- If needed, go to Settings → Keys (or directly to /dashboard/settings/keys) to view it or create additional keys.
- Give any new key a name (e.g., “Production – Elastic”) and choose the inference scope.
- Copy the key immediately. It will look like:
  hvz_live_3f8a9c2e1b4d5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c
Important: Treat this key like a password. It grants access to inference on your behalf.
You now have everything needed for the Elastic Inference API.
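Because the key should be treated like a password, one common approach is to keep it out of source code and read it from the environment at runtime. The following is a minimal sketch, assuming an environment variable named HYPERVIZE_API_KEY (a hypothetical name chosen for illustration, not one defined by the platform):

```python
import os


def auth_headers(env_var: str = "HYPERVIZE_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your API key before calling the API")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

The two headers match the ones sent in the cURL example in the next step.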
2. Make Your First Call (cURL)
Replace YOUR_API_KEY with the key you just generated.
curl https://hypervize.tech/api/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [
      {"role": "user", "content": "Explain scale-to-zero inference in one paragraph."}
    ],
    "max_tokens": 256,
    "stream": true
  }'

You should see Server-Sent Events (SSE) chunks starting with data: {...} and ending with data: [DONE].
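If you consume the stream from code rather than cURL, the SSE framing described above can be handled with a small line parser. This is a sketch under stated assumptions: each event carries a single data: line holding a JSON object, and the stream ends with data: [DONE]. The fields inside each chunk are not documented here, so the sample uses placeholder fields rather than the real schema.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line until `[DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


# Hand-written sample; the chunk fields are placeholders, not the real schema.
sample = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": " world"}',
    "",
    "data: [DONE]",
]
chunks = list(iter_sse_data(sample))
```

In practice you would pass the decoded lines of the HTTP response body to iter_sse_data instead of the hand-written sample.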
3. Try a Different Model
The model field accepts the display name from our catalog (recommended) or the raw provider identifier.
Popular starting models:
- claude-sonnet-4.6
- llama-3-3-70b-instruct
- mistral-large-3
- deepseek-v3.2
- qwen3-32b
See the full list in Supported Models.
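Switching models only changes the model field of the request body. A minimal sketch that builds the same body as the cURL example for any model name (the helper build_chat_request is hypothetical and exists only for this illustration):

```python
import json


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Serialize a streaming chat request body in the shape used by the cURL example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,
    }
    return json.dumps(body)


# Same prompt, two different catalog models.
for name in ("claude-sonnet-4.6", "qwen3-32b"):
    print(build_chat_request(name, "Explain scale-to-zero inference in one paragraph."))
```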
4. (Optional) Use the Dashboard Playground
- Go to Dashboard → Inference.
- Switch to the Elastic tab.
- Select a model from the grouped dropdown.
- Type a prompt and hit Send.
The playground shows token usage, TTFB (time to first byte), and total latency, which is useful while you’re learning the catalog.
5. Next: Try a Dedicated Endpoint
Once you’re comfortable with Elastic:
- Go to Dashboard → Inference.
- Switch to the Dedicated tab.
- Enter a Hugging Face model ID (e.g., meta-llama/Llama-3.1-8B-Instruct).
- Configure min/max instances and deploy.
After provisioning completes (status changes to ONLINE), you can call it at:
https://hypervize.tech/api/d/{endpoint-id}/chat/completions

Dedicated endpoints accept the same request format as Elastic.
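Since only the URL differs between the two modes, client code can choose the target with one small helper. A sketch based on the two URLs shown in this guide; the helper name and the sample ID in the usage note are hypothetical, and the real format of endpoint IDs is not specified here:

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://hypervize.tech/api"


def chat_completions_url(endpoint_id: Optional[str] = None) -> str:
    """Return the Elastic URL, or the Dedicated URL when an endpoint ID is given."""
    if endpoint_id is None:
        return f"{BASE_URL}/chat/completions"
    cleaned = endpoint_id.strip()
    if not cleaned:
        raise ValueError("endpoint_id must not be empty")
    # Percent-encode the ID so it stays a single path segment.
    return f"{BASE_URL}/d/{quote(cleaned, safe='')}/chat/completions"
```

For example, chat_completions_url("ep_123") targets a dedicated endpoint with the made-up ID ep_123, while chat_completions_url() targets Elastic.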
Troubleshooting First Calls
| Symptom | Likely Cause | Fix |
|---|---|---|
| 401 Invalid API key | Key not copied correctly or revoked | Regenerate in dashboard |
| 403 You must generate an API key first to use inference. | Session user has no active inference keys (note: a default key is auto-generated on signup) | Create or reactivate an inference-scoped key in Settings → Keys |
| 403 Unauthorized dedicated | Using someone else’s endpoint ID | Only use IDs you own |
| No streaming / empty response | Client not handling SSE correctly | Use stream: true and read the stream |
| Slow first token | Cold start on a large model | Use smaller models for testing or provision a dedicated endpoint |
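The HTTP rows of the table can also be encoded as a small lookup for client-side logging. This is an illustrative sketch only: how the error message is carried in the response body is not specified in this guide, so the helper takes the status code and message text as plain arguments.

```python
def suggest_fix(status: int, message: str) -> str:
    """Map a status code and error message to the fix listed in the table."""
    text = message.lower()
    if status == 401:
        return "Regenerate the key in the dashboard"
    if status == 403 and "generate an api key" in text:
        return "Create or reactivate an inference-scoped key in Settings → Keys"
    if status == 403 and "dedicated" in text:
        return "Only use endpoint IDs you own"
    return "No match in the troubleshooting table"
```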
What’s Next?
- Read the full Authentication & API Keys guide
- Deep dive into Elastic Inference API
- Learn how to deploy Dedicated Inference Endpoints