Research Article

3D Chiplet Systems for AI: Bandwidth-Centric Compute Integration

March 11, 2026 · chiplet, 3d-integration, ai-systems

Why chiplets for AI now

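Large AI workloads are increasingly limited by memory bandwidth and communication rather than by raw compute. Monolithic dies run into the reticle limit, lose yield as die area grows, and become increasingly expensive. 3D chiplet architectures offer a path to scaling bandwidth per watt, because short, dense vertical links between stacked dies move data at lower energy per bit than long off-package wires.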
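The yield argument can be made concrete with the classic Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. This is a minimal sketch: the model choice, the 0.1 defects/cm² density, and the 800 mm² design size are illustrative assumptions, not figures from this article. It also ignores bonding/assembly yield and test cost, which a real chiplet cost model must include.

```python
import math


def poisson_yield(area_mm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defect_density_per_cm2)


def silicon_per_good_system(total_area_mm2: float, n_chiplets: int, d0: float) -> float:
    """Expected fabricated silicon (mm^2) per good system when the design is split
    into n equal chiplets that are tested before assembly (known-good-die)."""
    chiplet_area = total_area_mm2 / n_chiplets
    y = poisson_yield(chiplet_area, d0)
    # On average, each good chiplet consumes chiplet_area / y of wafer area.
    return n_chiplets * chiplet_area / y


if __name__ == "__main__":
    D0 = 0.1       # hypothetical defect density, defects per cm^2
    TOTAL = 800.0  # hypothetical logic area in mm^2, just under the ~858 mm^2 reticle
    for n in (1, 4):
        y = poisson_yield(TOTAL / n, D0)
        s = silicon_per_good_system(TOTAL, n, D0)
        print(f"{n} die(s): per-die yield {y:.3f}, silicon per good system {s:.0f} mm^2")
```

With these assumed numbers, a single 800 mm² die yields about 45% and needs roughly 1,780 mm² of fabricated silicon per good part, while four tested 200 mm² chiplets yield about 82% each and need roughly 977 mm² in total.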

Proposed architecture

Compose a vertical stack with:

top-tier compute chiplets (matrix/tensor engines),
mid-tier network-on-interposer routing chiplets,
bottom-tier stacked SRAM/HBM-like memory chiplets.
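One way to make the three-tier stack concrete for a simulator or placer is a small data model. This is a minimal sketch under an assumption the article does not state: every inter-chiplet transfer moves vertically to the routing tier, crosses it on a planar grid, then moves vertically to its destination. The `Tier`, `Chiplet`, and `hop_count` names and the grid coordinates are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Vertical position in the stack (0 = top)."""
    COMPUTE = 0  # matrix/tensor engines
    ROUTING = 1  # inter-chiplet network routing
    MEMORY = 2   # stacked SRAM/HBM-like memory


@dataclass(frozen=True)
class Chiplet:
    name: str
    tier: Tier
    x: int  # hypothetical planar grid position within a tier
    y: int


def hop_count(src: Chiplet, dst: Chiplet) -> int:
    """Hops from src to dst, assuming all traffic moves vertically to the routing
    tier, crosses it in Manhattan fashion, then moves vertically to dst."""
    if src == dst:
        return 0
    vertical = abs(src.tier.value - Tier.ROUTING.value) + abs(dst.tier.value - Tier.ROUTING.value)
    planar = abs(src.x - dst.x) + abs(src.y - dst.y)
    return vertical + planar


if __name__ == "__main__":
    pe = Chiplet("pe0", Tier.COMPUTE, 0, 0)
    print(hop_count(pe, Chiplet("mem0", Tier.MEMORY, 0, 0)))  # memory directly below: 2
    print(hop_count(pe, Chiplet("pe1", Tier.COMPUTE, 1, 0)))  # adjacent compute: 3
```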

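Use workload-aware placement: map attention-heavy layers onto the compute chiplets closest to high-bandwidth memory, and map MLP-heavy sections onto the densest compute chiplets. The rationale is that attention during autoregressive decoding streams the KV cache and has low arithmetic intensity, whereas the large matrix multiplications in MLP blocks reuse each weight across many tokens when batched.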
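A simple way to decide which layers are memory-bound, and therefore belong near memory, is a roofline test: compare each layer's arithmetic intensity (FLOPs per byte moved) with the hardware's ridge point, peak FLOP/s divided by memory bandwidth. The article does not specify its classifier, so this is a hedged stand-in; `LayerProfile`, `place_layers`, and all layer and hardware numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerProfile:
    name: str
    flops: float        # floating-point operations per invocation
    bytes_moved: float  # bytes transferred to/from memory per invocation

    @property
    def intensity(self) -> float:
        """Arithmetic intensity in FLOP/byte."""
        return self.flops / self.bytes_moved


def place_layers(layers, peak_flops: float, mem_bw: float) -> dict:
    """Roofline-style split: layers below the ridge point (peak_flops / mem_bw)
    are memory-bound and go near memory; the rest go to dense compute."""
    ridge = peak_flops / mem_bw
    return {
        layer.name: "near-memory" if layer.intensity < ridge else "dense-compute"
        for layer in layers
    }


def attainable_flops(layer: LayerProfile, peak_flops: float, mem_bw: float) -> float:
    """Roofline bound: min(compute peak, intensity * memory bandwidth)."""
    return min(peak_flops, layer.intensity * mem_bw)


if __name__ == "__main__":
    # Hypothetical profiles and hardware numbers, for illustration only.
    layers = [
        LayerProfile("attn.0", flops=2e9, bytes_moved=1e9),      # ~2 FLOP/byte
        LayerProfile("mlp.0", flops=6.4e11, bytes_moved=1.6e9),  # ~400 FLOP/byte
    ]
    PEAK, BW = 400e12, 2e12  # FLOP/s and byte/s; ridge point = 200 FLOP/byte
    print(place_layers(layers, PEAK, BW))
```

A real mapper would profile intensity per phase (prefill versus decode), since the same attention layer can land on either side of the ridge point depending on sequence length and batch size.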

Research hypothesis

Topology-aware graph partitioning, combined with thermal-aware placement, can significantly improve effective throughput (work completed per second while staying within power and thermal limits) compared with flat multi-die systems.
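The two ingredients of the hypothesis can be shown together in a toy problem: a topology-aware objective (traffic volume times hop distance on the chiplet grid) and a thermal constraint, crudely approximated here as a per-chiplet power cap. This sketch swaps in exhaustive search for a real graph partitioner (for example a multilevel partitioner or an ILP solver) and only works for a handful of layers; the grid, power values, and traffic volumes are hypothetical.

```python
import itertools

# Hypothetical 2x2 grid of compute chiplets; entry i is chiplet i's (x, y) position.
GRID = [(0, 0), (1, 0), (0, 1), (1, 1)]


def hops(a: int, b: int) -> int:
    """Manhattan hop distance between chiplets a and b on the grid."""
    (ax, ay), (bx, by) = GRID[a], GRID[b]
    return abs(ax - bx) + abs(ay - by)


def comm_cost(assign, traffic) -> float:
    """Topology-aware cost: traffic volume times hop distance, summed over edges."""
    return sum(vol * hops(assign[u], assign[v]) for (u, v), vol in traffic.items())


def within_power_cap(assign, power, cap_w: float) -> bool:
    """Thermal proxy: total power mapped onto each chiplet must not exceed cap_w."""
    load = [0.0] * len(GRID)
    for layer, chip in enumerate(assign):
        load[chip] += power[layer]
    return all(p <= cap_w for p in load)


def best_placement(power, traffic, cap_w: float):
    """Exhaustive search over layer-to-chiplet maps; feasible only for toy sizes.
    Returns (None, inf) if no assignment satisfies the power cap."""
    best, best_cost = None, float("inf")
    for assign in itertools.product(range(len(GRID)), repeat=len(power)):
        if not within_power_cap(assign, power, cap_w):
            continue
        cost = comm_cost(assign, traffic)
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost


if __name__ == "__main__":
    # Four pipeline stages in a chain, 40 W each, 10 traffic units per edge.
    power = [40.0] * 4
    traffic = {(0, 1): 10, (1, 2): 10, (2, 3): 10}
    print(best_placement(power, traffic, cap_w=60.0))
```

In the example, the 60 W cap forbids putting two 40 W stages on one chiplet, so the search must spread the chain across the grid and picks an ordering in which consecutive stages sit on adjacent chiplets. Lowering the per-stage power to 20 W lets three stages share a chiplet and cuts the communication cost; the cap is what makes thermal awareness change the answer.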

Experimental roadmap

Build a cycle-level simulator with inter-chiplet link models.
Introduce thermal and yield constraints into placement optimization.
Compare against monolithic and 2.5D baselines on LLM and diffusion workloads.
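The first roadmap item starts from a link model. This is a minimal sketch, assuming a single point-to-point link with a fixed width (bytes per cycle), a fixed pipeline latency, and no contention or flow control. It steps the link cycle by cycle and checks the result against the closed form latency + ceil(bytes / width). The function names and link parameters are hypothetical; a full simulator would add multi-hop routing, arbitration, and credit-based flow control.

```python
from collections import deque


def simulate_transfer(message_bytes: int, bytes_per_cycle: int, latency_cycles: int) -> int:
    """Cycle-stepped model of one point-to-point link with no contention.

    Each cycle the sender serializes up to bytes_per_cycle onto the link; a chunk
    injected on cycle c reaches the receiver on cycle c + latency_cycles.
    Returns the number of cycles elapsed until the last byte has arrived."""
    in_flight = deque()  # (arrival_cycle, nbytes), in injection order
    sent = received = cycle = 0
    while received < message_bytes:
        if sent < message_bytes:  # serialization: at most one chunk per cycle
            chunk = min(bytes_per_cycle, message_bytes - sent)
            in_flight.append((cycle + latency_cycles, chunk))
            sent += chunk
        while in_flight and in_flight[0][0] <= cycle:  # deliver chunks that have arrived
            received += in_flight.popleft()[1]
        cycle += 1
    return cycle


def analytic_cycles(message_bytes: int, bytes_per_cycle: int, latency_cycles: int) -> int:
    """Closed form for the same model: pipeline latency plus serialization time."""
    if message_bytes == 0:
        return 0
    return latency_cycles + (message_bytes + bytes_per_cycle - 1) // bytes_per_cycle


if __name__ == "__main__":
    # Hypothetical link: 32 bytes/cycle, 5-cycle pipeline latency.
    for size in (64, 100, 4096):
        print(size, simulate_transfer(size, 32, 5), analytic_cycles(size, 32, 5))
```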