TradingKey - The recent rapid progress in generative AI has created a pressing issue: how do we deliver AI model outputs to users in real time, at scale, and around the world? Today's AI systems, such as ChatGPT and Midjourney, are remarkably powerful, but delivering their outputs to billions of users in real time is stretching the limits of current infrastructure.
Years ago, content delivery networks (CDNs) revolutionized web performance by caching images and videos closer to users. Now, we're at the same inflection point for AI – except rather than static content, it's AI inference (running a trained model to get a result) that needs a distribution network.
There is a growing consensus that we need a new “routing” layer in the AI stack, one that can direct AI requests in real time to the nearest available compute node. Stargate is a proposal for a smart, decentralized routing network that maps client requests to the best available AI inference nodes – a “CDN for AI.” In this piece, we discuss how Stargate could reshape AI deployment, why edge inference is the new AI bottleneck, and which companies stand to win during the transition.
Imagine an AI request soaring across a network, seeking out the optimal server and model to run it on – the same way an internet packet seeks out the fastest route. That's Stargate's vision: a decentralized, intelligent model-routing network that directs each AI query to the optimal available inference node, whether that's a GPU in a nearby data center or an underused server on the other side of the world. It's what Akamai did for web content, but for AI model inference.
Analysts have already begun calling this the search for an “Akamai of AI,” and predict that inference-scaling startups will emerge to fill the gap. In the same way that CDNs route and cache content based on location and network conditions, Stargate would route AI workloads based on model availability, latency, and price.
Cloudflare (NET) is already building out its global network to run AI models at the edge. Cloudflare's CTO calls the network edge the “Goldilocks” zone for AI inference – not as resource-poor as user devices, and not as distant as centralized clouds. Even OpenAI, SoftBank, Oracle (ORCL), and NVIDIA (NVDA) are investing in infrastructure that would help realize this vision.
The announced Stargate Project – a $500 billion initiative – will build new AI supercomputing centers and networks in the United States. While much of that is aimed at training capacity, the involvement of Oracle and NVIDIA signals interest in efficient deployment, too.
Source: the-sun.com
Stargate would facilitate real-time routing decisions: it would direct a low-complexity request to a local, low-power model and a high-complexity one to a high-capacity GPU. The network could even split tasks between nodes. Stargate would function as a smart traffic controller, dynamically balancing demand and supply, accelerating global AI computing and making it more efficient.
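To make the idea concrete, here is a minimal sketch of that kind of complexity-based routing policy. The node names, capacity tiers, and the 0.5 complexity threshold are hypothetical illustrations, not a description of any actual Stargate implementation.

```python
from dataclasses import dataclass

@dataclass
class InferenceNode:
    name: str
    tier: str           # "edge-small" (local, low-power) or "datacenter-gpu"
    queue_depth: int    # requests currently waiting on this node

def route_request(complexity: float, nodes: list[InferenceNode]) -> InferenceNode:
    """Send simple requests to nearby low-power nodes, heavy ones to big GPUs."""
    wanted_tier = "edge-small" if complexity < 0.5 else "datacenter-gpu"
    candidates = [n for n in nodes if n.tier == wanted_tier]
    # Among eligible nodes, pick the least loaded one.
    return min(candidates, key=lambda n: n.queue_depth)

nodes = [
    InferenceNode("mumbai-edge-1", "edge-small", queue_depth=3),
    InferenceNode("virginia-gpu-7", "datacenter-gpu", queue_depth=12),
]
print(route_request(complexity=0.2, nodes=nodes).name)  # -> mumbai-edge-1
```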
Cloudflare's Workers AI platform already routes AI requests across its worldwide infrastructure. Stargate envisions the day AI requests are routed the same way web traffic is today – globally, intelligently, and in real time. Decentralized projects like Together AI, Bittensor, and Gensyn go a step further, creating open networks where compute and models are contributed and monetized.
In that sense, Stargate is not a single company's project. It's an industry-wide effort to create a new foundation for real-time AI.
The AI landscape is shifting. Training models was once the holy grail. Inference – the act of actually running models in production – is the real cost driver now. You can train a model like GPT-4 once, but it's run billions of times in end-user applications. Every time you talk to a chatbot, use a recommendation engine, or query an AI assistant, you trigger an inference.
Inference is continuous and costly. SemiAnalysis estimates that inference compute for models like ChatGPT quickly outpaces training compute. OpenAI CEO Sam Altman has acknowledged the astronomical cost of serving AI to users. When even a fraction of a cent is spent per request, the costs can add up to tens of millions of dollars a month.
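A quick back-of-the-envelope calculation shows how that adds up; the per-request cost and request volume below are illustrative assumptions, not reported figures.

```python
# Rough inference-cost math using purely illustrative assumptions.
cost_per_request_usd = 0.002        # a fraction of a cent per request (assumed)
requests_per_day = 1_000_000_000    # a billion requests per day (assumed)

monthly_cost_usd = cost_per_request_usd * requests_per_day * 30
print(f"~${monthly_cost_usd / 1e6:.0f}M per month")  # -> ~$60M per month
```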
Beyond cost, latency is the other critical factor. AI now sits in the critical path of the user experience: no one wants to wait for a chatbot to reply. Yet large models are resource-hungry and tend to run in a handful of centralized cloud regions. If you are in India and the model is in Virginia, your experience suffers.
There are three important limitations:
Hosting models nearer to users (edge inference) solves latency at the cost of additional hardware requirements. Central clouds offer scale but cannot consistently deliver sub-100ms responses worldwide. And GPU availability is constrained – NVIDIA chips are in short supply, and peak demand can overwhelm even large data centers.
Stargate offers an elegant solution: intelligently route requests through a mesh of compute providers. Just as the electricity grid balances supply and demand, Stargate would move AI requests to wherever compute is available and efficient. That could shift AI from scarce and costly to ubiquitous and seamless.
Source: Precedence Research
To see where Stargate fits, we need to look at the new AI infrastructure stack being constructed. At the base is the hardware: NVIDIA and AMD (AMD) chips, Arista Networks (ANET) networking gear, and data center power and cooling systems. On top of that is compute – the physical servers and cloud platforms that AI models live on. Edge compute is gaining traction here, with the likes of CoreWeave and Lambda Labs offering specialized GPU access.
Above that sits the inference layer: platforms like Modal and Together AI that help developers deploy and scale their models. Then comes the routing layer – and that is where Stargate comes in. It is the logic that determines where to send each AI request, connecting all of the available compute to user and application demand.
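One way such a layer could choose a destination is to score each candidate node on latency, price, and current load, then pick the best. The fields and weights below are illustrative assumptions about how a routing layer might work, not a description of any existing product.

```python
# Hypothetical node-scoring function for a routing layer: lower score wins.
def score_node(latency_ms: float, price_per_1k_tokens: float, utilization: float,
               w_latency: float = 0.5, w_price: float = 0.3, w_load: float = 0.2) -> float:
    """Weighted blend of latency, cost per 1k tokens, and current load (0-1)."""
    return (w_latency * latency_ms / 100.0
            + w_price * price_per_1k_tokens
            + w_load * utilization)

candidates = {
    "edge-frankfurt": {"latency_ms": 35,  "price_per_1k_tokens": 0.8, "utilization": 0.7},
    "cloud-us-east":  {"latency_ms": 140, "price_per_1k_tokens": 0.4, "utilization": 0.3},
}
best = min(candidates, key=lambda name: score_node(**candidates[name]))
print(best)  # -> edge-frankfurt under these weights
```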
Finally, at the top are the models themselves – by OpenAI, Anthropic, Meta, Mistral, and others – and the applications that leverage them.
Stargate is the glue that would bind this stack together into a real-time AI internet. Without it, compute is fragmented and developers must manage model deployments by hand. With Stargate, AI requests could be routed globally, intelligently, and efficiently.
This routing layer is immature today. Each company builds its own infrastructure, which tends to create redundancy and inefficiency. An open or de facto routing layer could consolidate this, as CDNs did for web content delivery in the 2000s. The investment opportunity lies in identifying this white space before it gets filled.
Several companies are tackling the Stargate idea from different angles. Cloudflare is likely the front-runner. With AI Gateway and Workers AI, it's already up and running and hosting AI workloads at the edge. Its neutrality – it doesn't own chips or models – makes it a natural potential platform for a wide range of AI services.
CoreWeave is also a key player, along with Nebius (NBIS), a smaller player in the sector. As a specialized GPU cloud operator, CoreWeave would be a prospective supply-side participant in a routing network. Modal, RunPod, and Lambda Labs offer similar infrastructure and could either be integrated into Stargate-style networks or acquired.
Palantir (PLTR) is a strategic wildcard. Through its Artificial Intelligence Platform (AIP), it already orchestrates AI in the enterprise. Though it does not route traffic around the world, it could bring Stargate-like intelligence to help enterprises manage inference across geographies and compliance zones.
Big cloud behemoths – Amazon (AMZN), Microsoft (MSFT), Google (GOOG) – can build their own routing infrastructure. They have the scale and the infrastructure but may be limited by their walled-garden approach to interoperability. Telcos, with 5G infrastructure and edge nodes, could also enter the space thanks to their physical proximity to users.
Decentralized protocols like Gensyn and Bittensor add another layer. These networks propose open, token-incentivized compute sharing – a community-built Stargate of sorts. Speculative though they are, they show that demand for infrastructure alternatives independent of Big Tech is growing.
The competitive landscape is taking form. No one is the clear winner just yet. However, with Cloudflare's growth and partnerships, it's definitely a strong contender to be the Akamai of AI. It'll be fascinating to witness alliances, acquisitions, and standards emerge.
Source: TradingKey
Investors should look to the enablers of Stargate-scale infrastructure. Nvidia is the AI compute kingpin: more inference means more GPUs. AMD, with its MI300X chips, offers an alternative, especially for buyers seeking price or supply diversity. Supermicro, a GPU server manufacturer, is benefiting from the boom in AI data center demand. Arista Networks, with its high-speed networking gear, ties all of these systems together.
Cloudflare is making an intriguing bet on AI routing. It unifies CDN, edge compute, and serverless AI execution on one platform. If it becomes the default Stargate, it could lock in substantial usage-based revenue. Palantir, although more of a software orchestrator, would benefit from more efficient infrastructure that lowers costs and accelerates wider adoption of its AIP.
CoreWeave and Modal are private companies and pure plays on AI inference infrastructure. They offer exposure to the Stargate theme if they get listed or acquired. Wildcards like Groq (custom inference chips) or decentralized protocols like Bittensor and Gensyn can potentially disrupt the ecosystem if their innovations prove successful.
The whole AI inference market could be worth more than $67.8 billion by 2030. If Stargate captures even a small percentage as a routing fee, the revenue opportunity is enormous. Think of it as the Visa of AI inference – charging per transaction, at scale.
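For a sense of scale, here is that routing-fee logic worked through; the 1% take rate is a purely hypothetical assumption applied to the cited market forecast.

```python
# Illustrative routing-fee math; the 1% take rate is an assumption.
inference_market_2030_usd = 67.8e9   # cited market forecast for 2030
routing_take_rate = 0.01             # hypothetical 1% fee on routed spend

annual_routing_revenue = inference_market_2030_usd * routing_take_rate
print(f"~${annual_routing_revenue / 1e6:.0f}M per year")  # -> ~$678M per year
```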
Source: market.us
Stargate is the critical missing piece of AI infrastructure. If AI is to be everywhere, it must also be instant – real-time, at scale, around the world. Step one was training models. Step two is deploying them intelligently.
The Stargate vision – a smart, distributed routing layer for AI inference – addresses this issue head-on. It requires establishing a new standard, much as CDNs set one for web content. The companies that build or integrate this layer will be crucial.
For investors, Stargate is a window into where the next wave of AI value will accrue: not in models or chips, but in the systems that bind them together. Those systems – edge networks, routing engines, decentralized protocols – are the picks and shovels of the AI gold rush.
Cloudflare, Nvidia, Arista, Palantir, CoreWeave, and others stand to benefit. So will the early movers who bet on the still-uncontested middle layer of the stack. With AI queries climbing into the billions per day, the infrastructure to route and serve them is not just valuable – it's necessary.
Stargate could very well be the backbone of the AI economy, powering the future behind the scenes, one query at a time.