Inference API Errors

Why this happens

During periods of high demand, the inference engine may become temporarily overloaded. This is a transient condition that typically resolves on its own as traffic subsides.

What you can do
- Retry after a short delay
  - Wait a few seconds before retrying your request
  - Use exponential backoff to avoid adding to the congestion
- Spread out requests
  - If you’re sending many requests, consider spacing them out over time
  - Implement request queuing to smooth traffic spikes
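The retry advice above can be sketched as a small helper. This is a minimal illustration, not an official client: `request_fn` and `OverloadedError` are hypothetical stand-ins for whatever callable and transient-overload exception your own client uses.

```python
import random
import time

class OverloadedError(Exception):
    """Placeholder for a transient 'engine overloaded' error from your client."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Retry request_fn on transient overload, waiting longer after each failure.

    Uses exponential backoff with full jitter: the delay cap doubles each
    attempt, and the actual sleep is a random value below that cap, so many
    clients retrying at once do not hit the service in synchronized waves.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except OverloadedError:
            if attempt == max_retries - 1:
                raise  # Out of retries; surface the error to the caller.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The jitter matters as much as the doubling: without it, every client that failed at the same moment retries at the same moment, recreating the spike.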