Start free, scale to enterprise. No hidden fees. Cancel anytime.
Everything you need to choose the right plan.
| Feature | Starter | Pro | Enterprise |
|---|---|---|---|
| Inference latency optimizer | |||
| Auto-scaling | |||
| Multi-hardware support | |||
| Real-time monitoring | |||
| Custom model deployment | |||
| SLA guarantee | | 99.9% | 99.99% |
| Dedicated support | | Priority | 24/7 |
| On-premise deployment | | | |
**What counts as one inference?**
One inference = one model forward pass. Batched requests count as one inference per input item in the batch.
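A minimal sketch of this counting rule; the `count_inferences` helper below is hypothetical and not part of any Inferex SDK:

```python
def count_inferences(batch_sizes):
    """Hypothetical helper: total billable inferences for a set of batched requests.

    Every input item in a batch counts as one inference, so the total is
    simply the sum of the batch sizes across requests.
    """
    return sum(batch_sizes)

# Three requests with batches of 1, 8, and 32 inputs -> 41 billable inferences.
assert count_inferences([1, 8, 32]) == 41
```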
**Can I change plans later?**
Yes. Plan changes take effect immediately, and downgrades are prorated against your current billing cycle.
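A rough illustration of how a mid-cycle proration could work; the `prorated_charge` helper and the prices are hypothetical, not Inferex's actual rates or billing logic:

```python
def prorated_charge(old_price, new_price, days_used, days_in_cycle):
    """Hypothetical proration: pay the old rate for days already used and the
    new rate for the remainder of the current billing cycle."""
    used_fraction = days_used / days_in_cycle
    return old_price * used_fraction + new_price * (1 - used_fraction)

# Illustrative only: downgrading from a $100 plan to a $20 plan
# halfway through a 30-day cycle would bill $60 for that cycle.
print(prorated_charge(100, 20, 15, 30))  # 60.0
```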
**Does Inferex access my model weights?**
No. Model weights remain in your infrastructure. Inferex injects optimization at the runtime layer only; we never see your model artifacts.
**Which hardware is supported?**
Pro and Enterprise plans support NVIDIA GPUs (A100, H100, L40S), Intel and AMD CPUs, and edge TPU/NPU devices. Starter supports cloud CPU only.
Talk to our team. We'll help you find the right plan for your workload.
Contact Sales