
FriendliAI Launches InferenceSense™ to Monetize Idle GPU Capacity

No GPU fleet runs at full capacity around the clock. InferenceSense™ automatically fills idle cycles with paid AI inference workloads—and shares the revenue with you.

FriendliAI, The Frontier AI Inference Cloud, today launched Friendli InferenceSense™, the industry’s first inference monetization platform purpose-built for GPU cloud operators.

This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20260312112245/en/

InferenceSense tackles a persistent and expensive reality: GPU clusters cost billions to build and operate, yet many sit idle or underutilized for large portions of every day.

The Problem with GPU Utilization

GPU infrastructure demands massive capital outlay—a single H100 rents for ~$2.00/hour; an 8-GPU node, $16–20/hour—yet no fleet achieves 100% utilization. Training jobs are inherently bursty: they complete, and the hardware goes dark until the next run. Even fully committed neoclouds experience idle windows between customer workloads.

Every idle GPU-hour is lost margin.
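The scale of that lost margin is easy to estimate from the rental rates above. A minimal sketch, using the $16–20/hour node range cited in this release; the fleet size and utilization figure are hypothetical, chosen only for illustration:

```python
# Estimate monthly revenue forgone by idle GPU nodes.
# NODE_RATE_PER_HOUR uses the midpoint of the $16-20/hour range cited above;
# the utilization and fleet size below are illustrative assumptions.
NODE_RATE_PER_HOUR = 18.0
HOURS_PER_MONTH = 24 * 30

def idle_revenue_loss(utilization: float, nodes: int) -> float:
    """Revenue forgone per month while nodes sit idle (1 - utilization) of the time."""
    idle_hours = HOURS_PER_MONTH * (1.0 - utilization)
    return idle_hours * NODE_RATE_PER_HOUR * nodes

# A hypothetical 100-node fleet at 70% utilization:
loss = idle_revenue_loss(0.70, 100)
print(f"${loss:,.0f}/month forgone")
```

Even at a healthy 70% utilization, a 100-node fleet leaves roughly $389K on the table every month—the gap InferenceSense is built to close.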

What InferenceSense™ Does

Friendli InferenceSense detects idle GPU capacity in your infrastructure and fills it with monetizable AI inference workloads. When your own workloads need the GPUs back, InferenceSense preempts immediately—your jobs always come first.

Think of it as “AdSense for GPUs”: just as digital publishers use AdSense to automatically monetize available pixel space with high-yield demand, GPU operators can now use InferenceSense to monetize every available GPU cycle.

Integration is frictionless. Operators retain full control—choosing which nodes participate, setting time-of-day schedules, and defining exactly how much spare capacity InferenceSense may use.

Demand is built in. There is no need to source inference customers independently—FriendliAI brings a ready pool of global demand for widely used open-weight models including DeepSeek, Qwen, Kimi, GLM, and MiniMax, and dispatches workloads to partner hardware automatically. Token revenue generated on those GPUs is shared between the operator and FriendliAI, with no upfront fees and no minimum commitments.

Crucially, the operator’s own workloads always take priority. The moment a scheduler reclaims a GPU, InferenceSense gracefully vacates—monetized workloads are designed to be preempted, ensuring production jobs are never delayed.

Architecture

When InferenceSense detects available GPU capacity, it spins up secure, fully isolated containers that serve paid AI inference workloads. Under the hood, FriendliAI’s battle-tested inference engine maximizes token throughput per GPU-hour—squeezing peak economic value from every idle cycle.

The moment your scheduler reclaims a GPU, InferenceSense’s preemption controller gracefully terminates the monetized workload and returns the hardware within seconds—zero downtime, zero disruption, zero config changes.

The Economics: From Idle to Income

The prevailing GPU cloud model charges by the hour. Between customer workloads, revenue drops to zero—but the cost of power, cooling, and depreciation never stops. InferenceSense converts that dead time into an incremental revenue stream.

The mechanics are straightforward: FriendliAI aggregates global, real-time demand for popular open-weight models—DeepSeek, Qwen, Kimi, GLM, and others—and routes paid inference workloads to partner GPUs. Partners earn a share of the token revenue generated during otherwise-empty hours. FriendliAI owns the demand pipeline, model optimization, and serving stack; the partner contributes idle capacity.
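A back-of-the-envelope sketch of that revenue share. The release does not disclose the actual split or token pricing, so every number below is a hypothetical placeholder:

```python
# Hypothetical revenue-share estimate. The actual split and token price are
# not disclosed in this release; the figures below are illustrative only.
def partner_payout(tokens_served: int, price_per_million: float, partner_share: float) -> float:
    """Partner's cut of token revenue earned during otherwise-idle hours."""
    revenue = tokens_served / 1_000_000 * price_per_million
    return revenue * partner_share

# Illustrative numbers: 500M tokens at $0.50 per million tokens, 60/40 split.
print(f"${partner_payout(500_000_000, 0.50, 0.60):.2f}")  # $150.00
```

Whatever the real parameters, the structure is the same: payout scales linearly with tokens served, so every extra token squeezed from an idle hour flows directly to the partner's share.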

Because an optimized serving stack generates more tokens per GPU-hour, monetized inference workloads can yield significantly more revenue per GPU-hour than a traditional flat hourly rental.

There is no upfront cost and no minimum commitment. If a GPU is idle, it earns. The moment your workloads need it back, InferenceSense yields instantly. The bottom line: infrastructure that generates margin even when your own customers aren’t on it.

Why We Built This

“The modern data center isn’t just a massive compute cluster—it is an AI factory, a high-performance production environment built to manufacture intelligence at scale. Yet most GPU operators act like traditional landlords, watching revenue evaporate every time a workload finishes, or a contract ends,” said Byung-Gon Chun, CEO of FriendliAI.

“The industry is building these massive factories, but most GPU clouds are still missing the inference assembly line that actually transforms raw compute into tokens—the true finished goods of this era.

“InferenceSense provides that missing assembly line. Every idle GPU-hour becomes a chance to serve real AI demand and capture token revenue. We own the demand pipeline, the optimization, and the serving—our partners simply plug in and earn. The AI factory build-out only makes sense when it actually makes cents.”

Who It’s For

InferenceSense is designed for any organization operating GPU-dense infrastructure—GPU neoclouds, ML platforms, and research institutions. Any operator whose GPUs are not fully utilized around the clock is a candidate.

Get Started

Friendli InferenceSense™ is now accepting applications from qualified GPU cloud operators. To explore how InferenceSense can unlock new revenue from your existing infrastructure, contact partners@friendli.ai to schedule an executive briefing during NVIDIA GTC.

About FriendliAI

FriendliAI is The Frontier AI Inference Cloud. Built by the researchers who invented the continuous batching technique that is now industry standard, FriendliAI provides AI engineers with a highly optimized engine that constantly evolves to efficiently run state-of-the-art open-weight and custom models at production scale. By maximizing GPU utilization, FriendliAI delivers speeds up to 3x faster than vLLM, and 50% to 90% cost savings relative to closed model APIs. FriendliAI empowers engineers to deploy frontier AI with uncompromising speed, model ownership, and enterprise-grade reliability.

For more information, visit www.friendli.ai.
