SoftBank’s Transformer-Based AI-RAN Hits 30% Uplink Gain at Sub-Millisecond Latency

What they measured (and how)

SoftBank ran its new Transformer model for uplink channel interpolation on GPUs in an over-the-air (OTA) 5G system compliant with 3GPP, then compared against (a) a conventional non-AI baseline and (b) its previous CNN approach. Results and processing timing were captured under real-time constraints.

Key data points

Metric	Result
Uplink throughput vs. non-AI baseline	≈ +30%
Uplink throughput vs. SoftBank’s prior CNN	≈ +8%
End-to-end AI processing time	≈ 338 μs average (requirement: <1 ms)
Downlink throughput (simulation, SRS prediction) at 80 km/h	up to ≈ +29% vs. prior MLP (~+13%)
Downlink throughput (simulation, SRS prediction) at 40 km/h	up to ≈ +31%

Why these tasks?

Channel interpolation estimates the full channel from sparse pilots; better estimates lead to higher UL rates, especially under interference.
SRS prediction helps sustain beamforming quality between sounding intervals, which matters as device counts rise and intervals stretch.

What’s new in the model

SoftBank’s architecture leans on three technical choices:

Self-attention to capture wide time–frequency correlations that CNNs miss;
No input normalization (preserving raw amplitudes) to retain physically meaningful information for tasks like channel estimation;
Unified head that swaps output layers to serve multiple PHY/MAC tasks (interpolation/estimation, SRS prediction, demodulation).

CNN receptive field vs. Transformer self-attention in OFDM grid

From concept to field: AI-RAN’s practical step

The team showed real-time OTA execution on GPUs with sub-millisecond latency, a bar often cited as the gating constraint for PHY/MAC AI in commercial RAN. SoftBank also argues that GPU-centric AI-RAN enables post-deployment model upgrades via software, potentially improving capex efficiency as models evolve.

This follows SoftBank’s March 2025 disclosures that validated three AI-for-RAN use cases—uplink channel interpolation (~~+20% UL in lab), SRS prediction (**~~+13% DL** at 80 km/h), and AI-assisted MAC scheduling (~+8% avg.)—developed with NVIDIA (ARC-OTA testbed) and Fujitsu.

Testbed note: NVIDIA’s ARC-OTA is a full-stack, real-time OTA platform built on Aerial CUDA-accelerated L1 (Hopper GPU) and OAI L2 (Grace CPU), designed explicitly for AI-RAN research and over-the-air model trials.

Context: SoftBank’s wider “beyond 5G” push

6G spectrum exploration: July 2025 outdoor trials with Nokia at 7 GHz (centimeter-wave) mark Japan’s first reported operator trial of this band.
Sub-THz mobility: 2024 field tests showed terahertz links supporting connected-car scenarios, broadening THz use beyond FWA/NFC.
Ecosystem posture: SoftBank is a founding member of the AI-RAN Alliance, which has expanded from its 2024 launch to ~75+ members by early 2025.

Outdoor trial node (7 GHz) in Tokyo streetscape]

Business takeaways (signal, not sizzle)

Performance at the right layer: A ~30% UL lift at 338 μs processing time is notable because UL is increasingly stressed by cloud/video creation and AI traffic patterns; gains are harder here than on DL.
Upgrade path: If GPU DU/CU pools can host multiple PHY/MAC AI tasks with a shared backbone, software-only refresh cycles may amortize hardware over more model generations. Vendor roadmaps (Aerial, ARC-OTA availability) support this direction.
Ops economics: SoftBank explicitly links AI-RAN + GPUs to capex efficiency and 5G-Advanced/6G readiness, but operators will still model power costs and site thermal budgets carefully.

Engineering questions to watch

Generalization & robustness: How does the model behave across bands, mobility profiles, and interference regimes outside the training set?
Energy per bit: What’s the J/bit delta vs. classic DSP/FPGA chains once batched efficiently on shared accelerators?
Lifecycle & toolchain: How are datasets curated, validated, and versioned for safe continuous deployment in live RANs?
Interoperability: Integration with O-RAN split options, vendor stacks, and scheduler co-design remains critical.