DeepInfra Joins Hugging Face as Official Inference Provider

HuggingFace

May 6, 2026

Original source: huggingface.co

What DeepInfra's Integration Means

Hugging Face just added DeepInfra to its official Inference Providers roster. That means anyone browsing the Hub can now spin up a model on DeepInfra's infrastructure without leaving the Hugging Face interface. No separate account, no extra API key—just click and run. DeepInfra supports a long list of open models: Llama 3, Mistral, Gemma, Yi, and plenty more. Pricing remains per-token and famously cheap—often a fraction of what AWS or even Hugging Face's own Inference Endpoints charge. For developers who've been stitching together workflows across multiple platforms, this is one less headache.

Why This Integration Happened Now

Hugging Face has been quietly building out its inference ecosystem for a while. It started with the official Inference API, then added Inference Endpoints for dedicated hardware, and later opened the door for third-party providers. Before DeepInfra, the list included Replicate, Together AI, and several others. But DeepInfra carved out a reputation for being the cheapest option for open models, especially for longer sequences. Hugging Face clearly wants to be the front door for both model discovery and deployment. Bringing DeepInfra onto the platform as a third-party provider makes that vision more complete. It also signals that Hugging Face isn't afraid to compete with its own API: the company seems to care more about ecosystem lock-in than short-term inference revenue.

What This Means for Developers and the Market

If you're building a product that calls open models, this changes the calculus. You can now prototype on Hugging Face, switch inference providers with a dropdown, and only commit when you're sure. DeepInfra's pricing is aggressive: for many models, it's 30–50% cheaper than the next cheapest option. That's not a rounding error—it's the difference between a viable startup and burning cash. For Hugging Face, it's a strategic play: the more services you use through their platform, the harder it is to leave. For DeepInfra, it's a distribution win. But there's a tension: Hugging Face is effectively handing customers to a competitor on its own marketplace. That's either very confident or very desperate.
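To make the "30–50% cheaper" claim concrete, here is a minimal sketch of the per-token arithmetic. Every price and token volume in it is a hypothetical placeholder chosen for illustration, not a published rate from DeepInfra, Hugging Face, or anyone else; the names `TokenPrice`, `monthly_cost`, and `savings_fraction` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Per-token pricing in USD per one million tokens (hypothetical values)."""
    input_per_million: float
    output_per_million: float


def monthly_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a given monthly token volume."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


def savings_fraction(baseline_cost: float, candidate_cost: float) -> float:
    """Fraction saved by moving from the baseline to the candidate provider."""
    return (baseline_cost - candidate_cost) / baseline_cost


# Hypothetical prices: the second provider is 40% cheaper per token,
# i.e. inside the 30-50% band the article describes.
next_cheapest = TokenPrice(input_per_million=0.50, output_per_million=1.50)
discounted = TokenPrice(input_per_million=0.30, output_per_million=0.90)

# Hypothetical monthly volume: 2B input tokens, 500M output tokens.
INPUT_TOKENS = 2_000_000_000
OUTPUT_TOKENS = 500_000_000

baseline_cost = monthly_cost(next_cheapest, INPUT_TOKENS, OUTPUT_TOKENS)
discounted_cost = monthly_cost(discounted, INPUT_TOKENS, OUTPUT_TOKENS)
saved = savings_fraction(baseline_cost, discounted_cost)

print(f"baseline:   ${baseline_cost:,.2f}/month")
print(f"discounted: ${discounted_cost:,.2f}/month")
print(f"savings:    {saved:.0%}")
```

Because per-token pricing is linear, a uniform 40% price cut is a 40% cut in the bill at any volume. What changes with scale is the absolute dollar gap, which is where the article's point about startup viability comes from.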

Open Questions and Risks

DeepInfra has had reliability issues in the past—sporadic timeouts, cold starts that take longer than advertised. The official Hugging Face integration may force them to tighten up, but it's not guaranteed. Also unclear: how does this affect model availability? DeepInfra doesn't support every model on the Hub, especially newer or finicky ones. And what about enterprise customers who need SLAs and data residency? DeepInfra's data centers are mostly in the US and Europe, but details are thin. Finally, the revenue split between Hugging Face and DeepInfra is opaque. If DeepInfra is paying Hugging Face a cut, that might eat into their low-cost advantage. Watch for user reports of latency spikes or unexpected billing.
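One common way to hedge against sporadic timeouts from any single backend is to try providers in order and fall back when one fails. The sketch below shows that general pattern only; it is not a Hugging Face or DeepInfra API. The provider callables (`flaky_provider`, `steady_provider`) are stand-ins that simulate behavior, and in real use each would wrap an actual inference client.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and
# returns generated text. These are hypothetical stand-ins, not real clients.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the list timed out or was unreachable."""


def call_with_fallback(providers: Sequence[Provider], prompt: str) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except (TimeoutError, ConnectionError) as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_provider(prompt: str) -> str:
    """Stand-in for a backend that times out."""
    raise TimeoutError("simulated timeout")


def steady_provider(prompt: str) -> str:
    """Stand-in for a backend that responds normally."""
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    [("primary", flaky_provider), ("backup", steady_provider)],
    "hello",
)
print(used, reply)
```

Only timeouts and connection errors trigger the fallback here. Other exceptions, such as a malformed request, propagate immediately, since retrying them on another provider would likely fail the same way.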

Frequently Asked Questions

What exactly is DeepInfra?

DeepInfra is a cloud inference provider that specializes in running open-source LLMs at scale. It's known for aggressive per-token pricing and support for models like Llama, Mistral, and Gemma. The service is designed for developers who need fast, cheap inference without managing GPU infrastructure.

Why did Hugging Face add DeepInfra as an official provider?

Hugging Face wants to be the go-to platform for the entire ML lifecycle, from model discovery to deployment. Adding DeepInfra gives users a low-cost inference option directly on the Hub, reducing friction and encouraging developers to stay within Hugging Face's ecosystem rather than hopping to external services.

How does DeepInfra's pricing compare to Hugging Face's own Inference API?

DeepInfra is generally cheaper per token, especially for longer sequences and larger models. However, Hugging Face's Inference Endpoints, a separate product from the Inference API, offer dedicated hardware and better reliability for production workloads. For prototyping and low-traffic use, DeepInfra is often the better deal.

Do I need a separate DeepInfra account to use this integration?

No. You can select DeepInfra as your inference provider directly from the Hugging Face model page, and billing goes through your Hugging Face account. That's the whole point: one login, one bill, multiple backends.

Which models are supported through DeepInfra on Hugging Face?

DeepInfra supports a broad set of popular open models, including Llama 3, Mistral 7B, Mixtral 8x7B, Gemma, Yi, and many more. However, not every model on the Hub is available—check the DeepInfra documentation for the full list. Support for newer models may lag by a few weeks.