
Agentic AI keeps long-lived context instead of discarding state after each query. As explained in this article from The Register, KV (key-value) caches must therefore persist across many steps, so memory residency time grows from milliseconds to hours or days. GPUs stall if that context cannot be accessed quickly enough, which turns memory into the primary scaling constraint.
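To see why residency matters, some back-of-envelope arithmetic helps. The model dimensions below are illustrative assumptions for a 70B-class transformer with grouped-query attention, not any specific vendor's specification:

```python
# Back-of-envelope KV cache sizing for a hypothetical 70B-class model.
# All dimensions are illustrative assumptions, not published model specs.
LAYERS = 80      # transformer layers
KV_HEADS = 8     # grouped-query attention KV heads
HEAD_DIM = 128   # dimension per head
BYTES = 2        # fp16/bf16 bytes per element

def kv_bytes_per_token() -> int:
    # K and V each contribute KV_HEADS * HEAD_DIM values per layer.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES

def session_kv_gib(context_tokens: int) -> float:
    return kv_bytes_per_token() * context_tokens / 2**30

print(f"{kv_bytes_per_token() / 1024:.0f} KiB per token")        # 320 KiB
print(f"{session_kv_gib(128_000):.1f} GiB for a 128K session")   # 39.1 GiB
```

Under these assumptions a single 128K-token agent session ties up roughly 39 GiB of KV cache for as long as the agent stays alive, which is why a handful of concurrent agents can saturate a GPU's HBM.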
Demand will rise because KV caches ideally stay in HBM for speed, but agentic workloads require far more capacity than current HBM can provide. Even though HBM is expensive, it remains the fastest tier and will be saturated first. You can therefore expect strong demand for larger-capacity HBM stacks (HBM4, HBM4E).
DRAM demand will rise because it becomes the overflow tier when HBM is insufficient. Agentic AI increases the total memory footprint per GPU node, so servers need more DRAM, higher-bandwidth DIMMs, and memory-rich CPU hosts.
Demand for this memory rises because CXL enables disaggregated, pooled memory at near-DRAM latency (on the order of a few hundred nanoseconds). Offloading KV cache to CXL memory can reduce GPU memory usage by up to 87%, and multiple agents can share the same context without duplication. Buyers should therefore expect rapid growth in CXL memory modules, memory pooling appliances, and coherent fabrics.
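The mechanics of that offload can be sketched in a few lines. Below, a small fast tier stands in for HBM and a larger dictionary stands in for a CXL memory pool; blocks are keyed by content hash so agents sharing a prefix store it once. Tier sizes, block granularity, and the class itself are illustrative assumptions, not a real inference runtime's API:

```python
# Minimal sketch of a two-tier KV cache: a small fast tier (standing in for
# HBM) overflows into a larger pooled tier (standing in for CXL memory).
# Content-hash keys let multiple agents reference a shared context block
# instead of duplicating it. All sizes here are illustrative.
import hashlib
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, fast_capacity_blocks: int):
        self.fast = OrderedDict()           # LRU: block hash -> kv block
        self.pool = {}                      # overflow tier: hash -> kv block
        self.fast_capacity = fast_capacity_blocks

    def put(self, kv_block: bytes) -> str:
        key = hashlib.sha256(kv_block).hexdigest()
        if key in self.fast or key in self.pool:
            return key                      # dedup: block already resident
        self.fast[key] = kv_block
        self._evict_if_full()
        return key

    def get(self, key: str) -> bytes:
        if key in self.fast:
            self.fast.move_to_end(key)      # refresh LRU position
            return self.fast[key]
        block = self.pool.pop(key)          # promote from pool on access
        self.fast[key] = block
        self._evict_if_full()
        return block

    def _evict_if_full(self):
        if len(self.fast) > self.fast_capacity:
            evicted, block = self.fast.popitem(last=False)
            self.pool[evicted] = block      # demote coldest block to pool

cache = TieredKVCache(fast_capacity_blocks=2)
k1 = cache.put(b"shared system prompt kv")
cache.put(b"agent A step 1 kv")
cache.put(b"agent B step 1 kv")             # coldest block spills to the pool
assert cache.put(b"shared system prompt kv") == k1  # second agent reuses k1
```

A real runtime would move tensor pages rather than byte strings and would batch promotions, but the demand signal is the same: the pool tier absorbs everything the fast tier cannot hold.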
In this context, demand rises because new “G3.5” tiers bridge the gap between HBM and SSD. They are designed specifically for large, streaming KV cache reads and enable scale-out context storage without crippling latency. This means you can expect increased demand for high-performance NVMe, RDMA-attached flash, and specialized inference-context storage appliances.
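A flash spill tier of this kind is conceptually simple: cold KV blocks are appended to an NVMe-backed file and streamed back on demand. The sketch below uses plain buffered file I/O for clarity; the class, file layout, and names are illustrative assumptions, and a production tier would use fixed-size pages, O_DIRECT, and async I/O:

```python
# Illustrative sketch of a "G3.5" flash tier: KV blocks too cold even for
# pooled memory are appended to an NVMe-backed file and read back on demand.
# File layout and API are assumptions for demonstration, not a real product.
import os
import tempfile

class FlashKVSpill:
    def __init__(self, path: str):
        self.f = open(path, "w+b")
        self.index = {}                     # key -> (offset, length)

    def spill(self, key: str, kv_block: bytes):
        self.f.seek(0, os.SEEK_END)         # append-only write path
        self.index[key] = (self.f.tell(), len(kv_block))
        self.f.write(kv_block)
        self.f.flush()

    def load(self, key: str) -> bytes:
        offset, length = self.index[key]
        self.f.seek(offset)                 # random read on the way back
        return self.f.read(length)

path = os.path.join(tempfile.mkdtemp(), "kv.spill")
tier = FlashKVSpill(path)
tier.spill("ctx-0", b"cold kv block")
assert tier.load("ctx-0") == b"cold kv block"
```

The write path is sequential and the read path is large, mostly-sequential streams, which is exactly the access pattern high-performance NVMe and RDMA-attached flash are optimized for.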
Demand here rises because memory tiers only work if GPUs can access them with minimal jitter. Agentic AI increases cross-node memory traffic, which means you can expect to see growth in ultra-low-latency networking hardware and memory-centric fabrics.
Agentic AI will shift the industry from “more compute” to “more memory, closer to compute.” The biggest winners will be:
Hyperscalers are driving and will continue to drive the first and largest wave of demand—especially for HBM and CXL memory fabrics. Enterprises will follow with a more modest but steady increase in DRAM, CXL modules, and NVMe for local inference. Most companies will deploy:
Why? Because many agentic AI use cases involve:
So enterprises will increase memory per server, but not to hyperscaler levels.
HBM is the biggest winner because agentic AI keeps KV caches “alive” for long periods and needs extreme bandwidth. Demand here comes primarily from hyperscalers.
SK Hynix
Samsung
Micron
Agentic AI pushes more memory off the GPU into host DRAM and CXL pools. DRAM demand will rise across the board, especially for servers with 1–2 TB per node.
Samsung, SK Hynix, Micron
All three dominate DRAM. There will be demand increases for:
CXL is the biggest structural shift because agentic AI requires disaggregated, pooled memory. CXL demand is driven by hyperscalers first, and then by enterprise on-prem. The standout non-memory vendor in this segment is likely to be Astera Labs.
Samsung
SK Hynix
Micron
Astera Labs
Marvell
Agentic AI creates a new “G3.5” tier: fast flash used as extended KV cache. Flash demand will increase for both near-compute tiers and bulk storage.
Western Digital
Kioxia
Samsung
Solidigm (SK Hynix subsidiary)
Therefore current and future memory demand will mostly be driven by hyperscalers.
It will affect the following suppliers:
There will be a secondary but growing demand driven by enterprise on-premises requirements.
It will affect supplies from:
The shortage persists, making supply management increasingly complex. Contact us to check the actual availability of the components you are using and to identify reliable alternative sources, reducing risks to your supply chain.
Why Agentic AI Will Change Memory Demand