Errors, Statuses & Troubleshooting
Common error responses, endpoint statuses, and how to debug inference issues.
Error Response Format
All error responses use a consistent shape:
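```json
{
  "error": "Human readable message"
}
```
The HTTP status code identifies the kind of failure (see the table below). The `error` string is a human-readable description.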
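This minimal Python sketch turns an error response into an exception. It assumes only the `{"error": ...}` body shape shown above. `InferenceError` and `parse_error` are hypothetical names used for illustration and are not part of any SDK. If a body is not JSON, for example an error page from an intermediate proxy, the sketch falls back to the raw text.

```python
import json


class InferenceError(Exception):
    """A non-2xx response: keeps the HTTP status and the server's message."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_error(status: int, body: str) -> InferenceError:
    """Build an InferenceError from a body shaped like {"error": "..."}.

    Falls back to the raw body when it is not JSON or has no string "error" field.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return InferenceError(status, body.strip() or "<empty body>")
    if isinstance(payload, dict) and isinstance(payload.get("error"), str):
        return InferenceError(status, payload["error"])
    return InferenceError(status, body.strip())
```

For example, `parse_error(401, '{"error": "Invalid API key"}')` yields an exception with `status == 401` and `message == "Invalid API key"`. Your code can branch on `status` and show `message` to users.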
Common Errors (Elastic & Dedicated)
| Status | Error Message | Cause / Fix |
|---|---|---|
| 401 | Invalid API key | Key missing, malformed, or revoked. Regenerate. |
| 401 | Missing or invalid Authorization header | No Bearer token when the endpoint requires auth. |
| 403 | You must generate an API key first to use inference. | Logged-in session user has zero active keys. Create one. |
| 403 | Unauthorized access to dedicated endpoint | Calling an `endpt-` ID that your account does not own. |
| 403 | Forbidden: Access denied. | Using a key that does not own the dedicated endpoint (only applies to /api/d/{id}/... routes). |
| 500 | Failed to communicate with inference engine | Upstream error from the inference network. Retry with backoff. |
| 500 | Internal Server Error | Unexpected failure in the proxy. Check logs / contact support. |
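This sketch shows the "Retry with backoff" advice from the table above. It makes several assumptions. `send` is a hypothetical zero-argument callable that performs one attempt and returns `(status, body)`. The attempt count and delays are arbitrary defaults, not documented limits. Both 500 rows share a status code, so the sketch retries any 500. It returns 401 and 403 immediately, because resending with the same key cannot fix them.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, response body)

# Statuses treated as transient here. 401/403 are returned immediately:
# resending the same request with the same key cannot fix them.
RETRYABLE_STATUSES = {500}


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-retryable status or attempts run out.

    Exponential backoff with full jitter: after failed attempt n (1-based),
    sleep a random time in [0, min(max_delay, base_delay * 2 ** (n - 1))].
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, cap))
    return send()  # final attempt: return whatever comes back
```

Injecting `sleep` keeps the function testable without real delays. If the final attempt still returns `Internal Server Error`, treat it as a proxy failure: check logs or contact support, as the table advises.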
Dedicated Endpoint Statuses
See the Dedicated Inference page for the full table.
Key ones for debugging:
- `building`: Normal during the first 2–10 minutes. Do not call the endpoint yet.
- `failed`: Look at the Logs tab in the dashboard. Common causes: model too large for the chosen hardware, gated model without a token, container crash.
- `up`: Healthy.
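If you provision endpoints from a script, a polling loop stops you from calling an endpoint that is still `building`. This sketch assumes a hypothetical `get_status` callable that returns the status string shown in the dashboard. This page does not document a status API, so fetching the status is up to you. The 15-minute timeout and 15-second interval are assumed defaults chosen to cover the 2 to 10 minute build window. Any status other than `up` or `failed` counts as "keep waiting".

```python
import time
from typing import Callable


class EndpointFailed(RuntimeError):
    """The endpoint reported `failed`; check the Logs tab in the dashboard."""


def wait_until_up(
    get_status: Callable[[], str],
    timeout_s: float = 15 * 60,
    interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `get_status` until it returns "up" (case-insensitive)."""
    deadline = clock() + timeout_s
    while True:
        status = get_status().strip().lower()
        if status == "up":
            return
        if status == "failed":
            raise EndpointFailed("endpoint status is 'failed'")
        if clock() >= deadline:
            raise TimeoutError(f"endpoint still '{status}' after {timeout_s} s")
        sleep(interval_s)  # e.g. still "building"
```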
Debugging Checklist
Elastic Calls Failing
- Verify the key is active in Settings → Keys.
- Confirm you are using a valid display name from the catalog.
- Try the exact same request in the dashboard playground (isolates client vs server issues).
- Check that your client properly handles SSE (many issues are on the consumer side).
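Both 401 rows in the table come down to the `Authorization` header. This standard-library sketch assumes the Bearer scheme the table describes. The URL and payload are placeholders, so take the real route and body fields from the Code Examples page. `build_request` is a hypothetical helper. It prepares the request without sending it, so you can inspect the headers before debugging further.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare, but do not send, an authenticated JSON POST request."""
    key = api_key.strip()  # drop whitespace or newlines copied along with the key
    if not key:
        raise ValueError("API key is empty: create one under Settings > Keys")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer scheme, as described for the 401 errors in the table.
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

`req.get_header("Authorization")` returns the exact header value the server will see, which helps rule out a malformed key.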
Dedicated Endpoint Not Responding
- Confirm the status is `up` in the dashboard.
- Check the Logs tab for runtime startup errors or out-of-memory conditions.
- If you used a gated model, confirm the HF token was valid at provisioning time.
- Try the in-dashboard playground bound to that endpoint ID — it removes network variables.
Slow First Token / High Latency
- Large models on Elastic can have occasional cold-start latency on first request.
- Dedicated endpoints that have scaled to zero ("SLEEPING") will wake on first request ("WAKING UP"). You'll see a clear message in the playground; allow a short delay before retrying. Our S3 cache makes this much faster than a full cold start.
- Very high `max_tokens` or complex prompts increase time-to-first-token.
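Measuring time-to-first-token over a few consecutive requests helps separate a one-off cold start or wake-up from consistently slow generation. This minimal sketch assumes `chunks` is any iterable that yields streamed pieces as they arrive, such as the output of an SSE parser. `time_to_first_token` is a hypothetical helper. Pass `start` if you record the time just before sending the request, so the measurement includes connection and queueing time.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_token(
    chunks: Iterable[object],
    start: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], float, int]:
    """Consume `chunks`; return (time to first chunk, total time, chunk count).

    Times are in seconds relative to `start` (defaults to "now").
    The first value is None if the stream produced nothing.
    """
    if start is None:
        start = clock()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            first = clock() - start  # measured when the first chunk arrives
        count += 1
    return first, clock() - start, count
```

If the first request in a series is much slower than the ones after it, you are most likely seeing cold-start or wake-up latency rather than slow generation.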
Streaming Client Issues
If you are not seeing tokens:
- Make sure you are reading the response as a stream and parsing lines that start with `data:`.
- Do not buffer the entire response.
- Handle `data: [DONE]\n\n` as the terminator.
- Some frameworks (especially older `fetch` wrappers) have poor SSE support; consider using a dedicated library (`eventsource`, the `openai` SDK, etc.).
See Code Examples for robust client patterns.
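The streaming rules above fit in a small line-based parser. This sketch follows the Server-Sent Events format: one or more `data:` lines per event, a blank line ends the event, and lines starting with `:` are comments. It stops at `[DONE]`. `iter_sse_data` is a hypothetical helper that expects decoded text lines. SSE streams are UTF-8, so you can decode each line of a `urllib.request.urlopen` response with `.decode("utf-8")`.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, stopping at `data: [DONE]`.

    `lines` are decoded text lines, with or without trailing newlines.
    Fields other than `data` (event:, id:, retry:) are ignored in this sketch.
    """
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                payload = "\n".join(data_lines)
                data_lines = []
                if payload == "[DONE]":
                    return  # terminator: stop without yielding it
                yield payload
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips exactly one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # An event without its closing blank line is discarded at end of stream,
    # as the SSE format specifies.
```

Each payload is yielded as soon as its blank line arrives, so nothing is buffered beyond the current event. The sketch yields raw strings. Decode each one with `json.loads`, using the field layout from the Code Examples page.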
Still Stuck?
- Check the dashboard logs for dedicated endpoints first.
- For Elastic, the proxy logs detailed errors server-side (we monitor these).
- Use the enterprise contact form for production-impacting issues.
We treat inference reliability as the highest priority for the MVP.