Start free, scale to enterprise. No hidden fees. Cancel anytime.
Everything you need to choose the right plan.
| Feature | Starter | Pro | Enterprise |
|---|---|---|---|
| Inference latency optimizer | |||
| Auto-scaling | |||
| Multi-hardware support | |||
| Real-time monitoring | |||
| Custom model deployment | |||
| SLA guarantee | | 99.9% | 99.99% |
| Dedicated support | | Priority | 24/7 |
| On-premise deployment | | | |
**What counts as one inference?**
One inference = one model forward pass. Batched requests count as one inference per input item in the batch.
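A minimal sketch of this counting rule; the `count_inferences` helper below is hypothetical and not part of any Inferex SDK:

```python
def count_inferences(batch_sizes):
    """Hypothetical helper: total billable inferences for a set of batched requests.

    Every input item in a batch counts as one inference, so the total is
    simply the sum of the batch sizes across requests.
    """
    return sum(batch_sizes)

# Three requests with batches of 1, 8, and 32 inputs -> 41 billable inferences.
assert count_inferences([1, 8, 32]) == 41
```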
**Can I change plans later?**
Yes. Plan changes take effect immediately, and downgrades are prorated against your current billing cycle.
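A rough illustration of how a mid-cycle proration could work; the `prorated_charge` helper and the prices are hypothetical, not Inferex's actual rates or billing logic:

```python
def prorated_charge(old_price, new_price, days_used, days_in_cycle):
    """Hypothetical proration: pay the old rate for days already used and the
    new rate for the remainder of the current billing cycle."""
    used_fraction = days_used / days_in_cycle
    return old_price * used_fraction + new_price * (1 - used_fraction)

# Illustrative only: downgrading from a $100 plan to a $20 plan
# halfway through a 30-day cycle would bill $60 for that cycle.
print(prorated_charge(100, 20, 15, 30))  # 60.0
```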
**Does Inferex access my model weights?**
No. Model weights remain in your infrastructure. Inferex injects optimization at the runtime layer only; we never see your model artifacts.
**Which hardware is supported?**
Pro and Enterprise plans support NVIDIA GPUs (A100, H100, L40S), Intel and AMD CPUs, and edge TPU/NPU devices. Starter supports cloud CPU only.
Talk to our team. We'll help you find the right plan for your workload.
Contact Sales