The bottleneck isn’t the GPU.
That’s the thesis behind XCENA’s $135M Series B. LLM inference during the decode phase, the part where the model generates each token of a response, is constrained primarily by memory bandwidth, not compute throughput. Expensive GPUs sit idle waiting for data. XCENA’s architecture moves the computation inside the memory rather than shuttling data to a processor, according to the company’s announcement.
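The bandwidth argument can be checked with a back-of-envelope roofline estimate. The sketch below is illustrative only. The hardware numbers (1,000 TFLOP/s peak compute, 3 TB/s memory bandwidth) and the 70B-parameter, 16-bit model are round hypothetical values, not figures from XCENA or any vendor. It assumes each decode step reads every weight once and performs roughly two FLOPs per parameter per sequence in the batch, and it ignores KV-cache and activation traffic.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical accelerator; both numbers are illustrative assumptions."""
    peak_flops: float      # peak compute, FLOP/s
    mem_bandwidth: float   # memory bandwidth, bytes/s


def decode_step_times(params: float, bytes_per_param: float,
                      batch_size: int, hw: Accelerator) -> tuple[float, float]:
    """Return (compute_seconds, memory_seconds) for one decode step.

    Simplified model (assumption): weights are read once per step regardless
    of batch size, and each sequence costs about 2 FLOPs per parameter.
    """
    flops = 2.0 * params * batch_size
    bytes_moved = params * bytes_per_param
    return flops / hw.peak_flops, bytes_moved / hw.mem_bandwidth


def is_memory_bound(compute_s: float, memory_s: float) -> bool:
    """A step is memory-bound when moving the data takes longer than the math."""
    return memory_s > compute_s


# Hypothetical hardware and model, chosen only to make the arithmetic easy.
hw = Accelerator(peak_flops=1e15, mem_bandwidth=3e12)
compute_s, memory_s = decode_step_times(
    params=70e9, bytes_per_param=2, batch_size=1, hw=hw
)
print(f"compute: {compute_s * 1e3:.3f} ms, memory: {memory_s * 1e3:.3f} ms")
print("memory-bound" if is_memory_bound(compute_s, memory_s) else "compute-bound")
```

Under these assumptions the memory time at batch size 1 exceeds the compute time by a factor of a few hundred. Raising `batch_size` amortizes the weight reads over more sequences, so the gap narrows as batches grow.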
The round was co-led by Atinum Investment and IMM Investment and included SBI Investment, Mirae Asset Capital, Korea Development Bank, KDB Capital, and 16 additional Korean institutional investors, according to the company’s Business Wire announcement. The $570M post-money valuation reflects $135M raised on approximately $435M pre-money. XCENA’s cumulative funding now stands at $185M.
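The round arithmetic can be reproduced from the three reported figures. The snippet below is a simple consistency check, not a cap-table model: it assumes post-money equals pre-money plus new capital and ignores option pools, convertibles, and secondary sales. The variable names are illustrative, and the prior-funding and stake values are derived here rather than reported by the company.

```python
# Reported figures, in millions of USD (from the company's announcement).
round_size = 135
post_money = 570
cumulative_funding = 185

# Derived values under the simple assumption: post-money = pre-money + new capital.
pre_money = post_money - round_size             # implied pre-money valuation
prior_funding = cumulative_funding - round_size  # implied capital raised before this round
stake_sold = round_size / post_money             # implied fraction sold in this round

print(f"pre-money: ${pre_money}M")
print(f"prior funding: ${prior_funding}M")
print(f"stake sold in round: {stake_sold:.1%}")
```

The derived pre-money figure matches the approximately $435M stated above. The implied prior funding and the implied stake sold to new investors follow from the same subtraction and division.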
The product is the MX1 computational memory controller. According to XCENA, the MX1 integrates thousands of RISC-V CPU cores directly alongside up to 2 TB of DDR5 DRAM, connected via CXL 3.x, the interconnect protocol designed to allow memory and compute resources to communicate across different chips and vendors. XCENA claims the architecture can reduce AI inference server footprint requirements by up to 10x, though this figure hasn’t been independently verified. The company plans to mass-produce MX1 on Samsung’s 4nm process in late 2026, with commercial revenue expected to begin in 2027, according to its announcement.
All figures in this brief originate from the company’s press release. Source URLs are currently unresolved; the Business Wire release (release ID 20260529005112) is the primary source pending URL confirmation.
The memory-bandwidth bottleneck thesis itself isn’t XCENA’s invention. It’s consistent with published AI inference research and has driven a wave of infrastructure investment this year, from SK Hynix’s iHBM announcement to Micron and Samsung’s participation in Anthropic’s Series H as strategic infrastructure partners, covered in a separate infrastructure brief. XCENA’s bet is that the bottleneck gets solved at the controller layer, below HBM and above the CPU, by embedding computation directly in commodity DDR5 memory at CXL scale.
XCENA was founded by former Samsung Electronics and SK Hynix design executives, according to the Wire’s package. This claim also originates from the company’s materials and hasn’t been separately verified. If accurate, it’s meaningful context: the people who built the memory architecture are betting that the memory layer is where the inference optimization value gets captured.
What to Watch
The real story is where in the inference stack the value ultimately concentrates. GPU vendors claim the compute layer. HBM manufacturers are capturing the high-bandwidth memory layer; SK Hynix and Micron both crossed $1T market caps on AI infrastructure demand. XCENA is making a separate bet: that a controller layer between the two is where the real efficiency gains live in LLM decode workloads. A syndicate of Korean institutional investors put $135M behind that thesis.
The 2027 revenue timeline is the first validation gate. If MX1 ships on Samsung 4nm in late 2026 and enterprise customers adopt it at scale, the footprint reduction claim becomes testable. Watch the Samsung 4nm production schedule and any XCENA design-win announcements in Q4 2026.