Dashboard Guide — Inference
How to use the Hypervize console for keys, elastic testing, and dedicated endpoint management.
The Hypervize dashboard is the primary control plane for inference products during the MVP.
Accessing the Dashboard
After logging in you will land on the main resource overview. The left sidebar contains:
- Fleet Overview — High-level view of all your resources (inference blocks today)
- Inference — The main hub for Elastic playground + Dedicated deployment
- Keys — API key management
Inference Tab
This is the most important screen for launch.
Elastic Playground
- Grouped model selector (by provider, color-coded)
- Chat interface with streaming
- Real-time telemetry (TTFB, prompt/completion tokens, estimated cost)
- Markdown rendering of responses
- Ability to switch between many models quickly
Use this to evaluate quality, latency, and pricing before committing to production traffic.
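To make the "estimated cost" telemetry concrete, here is a minimal sketch of how a per-request estimate could be derived from prompt and completion token counts. The guide does not say how the dashboard computes its figure; the per-million-token pricing model, the class and function names, and the prices below are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    # Hypothetical per-million-token prices in USD; not real Hypervize rates.
    prompt_usd_per_million: float
    completion_usd_per_million: float


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      pricing: TokenPricing) -> float:
    """Estimate the cost of one request from its token counts."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prompt_cost = prompt_tokens * pricing.prompt_usd_per_million / 1_000_000
    completion_cost = completion_tokens * pricing.completion_usd_per_million / 1_000_000
    return prompt_cost + completion_cost


# Illustrative numbers only: 1,200 prompt tokens and 300 completion tokens.
example_pricing = TokenPricing(prompt_usd_per_million=0.50,
                               completion_usd_per_million=1.50)
example_cost = estimate_cost_usd(1_200, 300, example_pricing)
print(f"estimated cost: ${example_cost:.6f}")
```

Running the same prompt against several models in the playground and comparing the resulting estimates alongside TTFB is one way to weigh price against latency.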
Dedicated Deployment Form
The right-hand side (or dedicated sub-tab) lets you:
- Choose source (Hugging Face or private registry — HF is primary in MVP)
- Enter a model ID (e.g. Qwen/Qwen2.5-7B-Instruct)
- (Optional) Paste a Hugging Face token for gated models
- Name the endpoint
- Set min and max instance count (controls auto-scaling range)
- Optionally attach a custom domain
- Toggle logging/telemetry
- Choose Public or Private auth mode
The live pricing estimate updates as you type the model ID, using Hugging Face metadata when it can be fetched.
After clicking Deploy you receive an endpt-... ID immediately. The dedicated capacity is provisioned asynchronously in the background.
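The form fields above can be pictured as a single configuration record. The sketch below is not a Hypervize API: the field names, the allowed values, and the validation rules (for example, allowing a minimum of zero instances for scale-to-zero) are assumptions that only mirror the form's inputs.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical value sets mirroring the choices offered by the form.
ALLOWED_SOURCES = {"huggingface", "private_registry"}
ALLOWED_AUTH_MODES = {"public", "private"}


@dataclass
class DedicatedDeploymentForm:
    source: str                       # where the model comes from
    model_id: str                     # e.g. "Qwen/Qwen2.5-7B-Instruct"
    endpoint_name: str
    min_instances: int                # lower bound of the auto-scaling range
    max_instances: int                # upper bound of the auto-scaling range
    hf_token: Optional[str] = None    # only needed for gated models
    custom_domain: Optional[str] = None
    logging_enabled: bool = True
    auth_mode: str = "private"

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form looks sane."""
        problems = []
        if self.source not in ALLOWED_SOURCES:
            problems.append(f"unknown source: {self.source!r}")
        if not self.model_id.strip():
            problems.append("model_id must not be empty")
        if not self.endpoint_name.strip():
            problems.append("endpoint_name must not be empty")
        if self.min_instances < 0:
            problems.append("min_instances must be zero or more")
        if self.max_instances < max(1, self.min_instances):
            problems.append("max_instances must be at least 1 and not below min_instances")
        if self.auth_mode not in ALLOWED_AUTH_MODES:
            problems.append(f"unknown auth mode: {self.auth_mode!r}")
        return problems


form = DedicatedDeploymentForm(
    source="huggingface",
    model_id="Qwen/Qwen2.5-7B-Instruct",
    endpoint_name="qwen-demo",        # hypothetical endpoint name
    min_instances=0,
    max_instances=2,
)
print(form.validate())
```

Writing the intended settings down in this shape before filling in the form makes it easier to review the scaling range and auth mode with teammates.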
Inference Block Detail Pages
Click any of your dedicated endpoints from the overview or the list on the Inference page.
Each block shows:
- Current status with color coding (ONLINE, PROVISIONING, SLEEPING, WAKING UP, FAILED, etc.). Scale-to-zero endpoints show SLEEPING when hibernated and WAKING UP when a request wakes them.
- Hourly burn rate (base + addons)
- Configuration summary
- Tabs: Overview, Logs, Playground
The embedded playground on the detail page is pre-bound to that specific dedicated endpoint — extremely useful for validation after provisioning.
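If you track endpoint status in your own notes or tooling, the labels shown on the detail page can be modeled as a small enum. This is a sketch under assumptions: only the five statuses named in this guide are included (the dashboard may show others), and the helper names and the "ready" and "transitional" groupings are illustrative, not part of any Hypervize API.

```python
from enum import Enum


class EndpointStatus(Enum):
    # Values match the labels named in this guide; the real list may be longer.
    PROVISIONING = "PROVISIONING"
    ONLINE = "ONLINE"
    SLEEPING = "SLEEPING"
    WAKING_UP = "WAKING UP"
    FAILED = "FAILED"


def parse_status(label: str) -> EndpointStatus:
    """Convert a dashboard label to an enum member.

    Raises ValueError for labels this sketch does not know about.
    """
    return EndpointStatus(label.strip().upper())


def is_ready(status: EndpointStatus) -> bool:
    """Assumption: only ONLINE endpoints are serving right now."""
    return status is EndpointStatus.ONLINE


def is_transitional(status: EndpointStatus) -> bool:
    """Assumption: these states are expected to change without intervention."""
    return status in {EndpointStatus.PROVISIONING, EndpointStatus.WAKING_UP}


for label in ["PROVISIONING", "ONLINE", "SLEEPING", "WAKING UP", "FAILED"]:
    status = parse_status(label)
    print(f"{label}: ready={is_ready(status)} transitional={is_transitional(status)}")
```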
Keys Management
Located under Settings → Keys (also reachable from the sidebar).
- View all active keys (masked)
- Create new keys with name + scope
- Revoke keys instantly
Remember: every account starts with one default inference key.
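Once you create separate keys per environment, application code needs a way to pick the right one without hard-coding it. The sketch below shows one common pattern: reading each key from an environment variable named after the key. The `HYPERVIZE_KEY_` prefix, the helper names, and the masking format are hypothetical; the guide does not specify how the dashboard masks keys or how clients should store them.

```python
import os
from typing import Mapping


def env_var_for(key_name: str) -> str:
    """Map a key name such as 'prod-website' to a hypothetical variable name."""
    return "HYPERVIZE_KEY_" + key_name.upper().replace("-", "_")


def load_key(key_name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Read the key for one environment, failing loudly if it is not set."""
    var = env_var_for(key_name)
    value = environ.get(var)
    if not value:
        raise RuntimeError(f"set {var} to the key created for '{key_name}'")
    return value


def mask(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters, e.g. for log output."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# A plain dict stands in for the process environment; the key value is fake.
fake_environ = {"HYPERVIZE_KEY_PROD_WEBSITE": "abc123456"}
key = load_key("prod-website", fake_environ)
print(env_var_for("prod-website"), mask(key))
```

Keeping one key per environment means revoking a leaked key in the dashboard affects only that environment.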
Resource Overview (Fleet)
Shows aggregate burn rate, number of active inference blocks, and quick links.
This will expand in future releases as more compute products are added, but for the inference MVP it focuses on your endpoints.
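As a worked example of the burn-rate arithmetic, the sketch below treats each block's hourly burn as base plus addons and the fleet figure as the sum over blocks. The class name, field names, and prices are assumptions for illustration, and the guide does not say whether SLEEPING blocks count toward the aggregate, so the sketch simply sums whatever blocks it is given.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class InferenceBlock:
    name: str
    base_usd_per_hour: float
    # One entry per addon; both the addons and their prices are hypothetical.
    addon_usd_per_hour: List[float] = field(default_factory=list)

    @property
    def hourly_burn(self) -> float:
        """Hourly burn rate for one block: base + addons."""
        return self.base_usd_per_hour + sum(self.addon_usd_per_hour)


def aggregate_burn(blocks: Iterable[InferenceBlock]) -> float:
    """Fleet-level burn rate: the sum of each block's hourly burn."""
    return sum(block.hourly_burn for block in blocks)


# Illustrative fleet with made-up prices.
fleet = [
    InferenceBlock("qwen-demo", base_usd_per_hour=2.50, addon_usd_per_hour=[0.25]),
    InferenceBlock("internal-agents", base_usd_per_hour=1.00),
]
total = aggregate_burn(fleet)
print(f"aggregate burn: ${total:.2f}/hour")
```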
Tips for Launch
- Use the playground heavily before sending production traffic.
- Keep the dedicated detail page open after deploying: watch status change from PROVISIONING to ONLINE. For scale-to-zero, you'll also see transitions to SLEEPING and WAKING UP.
- Create environment-specific keys (e.g., “prod-website”, “internal-agents”).
- Check the Logs tab first when debugging a dedicated endpoint.