Research Blog
Exploring neural-symbolic systems, hardwired LLM inference, 3D chiplet integration, and diffusion-oriented accelerators. Early-stage concepts at the intersection of algorithms, architecture, and silicon.
AI Hardware Weekly Digest: WorldKV for Video World Models, Gated DeltaNet-2 Linear Attention, and HRM-Text Efficient Pretraining
AI Hardware Weekly Digest: Anthropic-Microsoft Maia 200 Deal Talks, LlamaWeb Browser Inference, and InnerQ Hardware-Aware Quantization
AI Hardware Weekly Digest: Runtime-Certified Quantized Attention, BrainChip AKD1500 Mass Production, and Qwen3.7-Max 1M Context
AI Hardware Weekly Digest: Exciton-Polariton All-Optical Computing, Samsung Strike Deal, and Custom AI ASIC Landscape
AI Hardware Weekly Digest: TurboQuant 4.6× KV Cache Compression, RePlaid Continuous Diffusion Scaling, and CSIRO Vetra Edge AI
AI Hardware Weekly Digest: TriAxialKV Mixed-Precision Quantization, KVDrive Multi-Tier Cache, and NVIDIA $106B Earnings
AI Hardware Weekly Digest: SANA-WM Efficient World Model, Samsung HBM for Mobile, and Google I/O 2026
AI Hardware Weekly Digest: CXMT 719% Revenue Surge, Memristor CIM Breakthrough, and Google I/O 2026
AI Hardware Weekly Digest: KV-RM Static-Graph Serving, Quantization Security Risks, and Samsung Strike Threat
AI Hardware Weekly Digest: World Action Models, Multi-Agent Coordination, and Cerebras IPO Pricing
AI Hardware Weekly Digest: FibQuant KV Cache, KV-Fold Recurrence, and Jensen Huang's China Trip
AI Hardware Weekly Digest: int4 KV Cache Beats fp16 on Apple Silicon, Federation of Experts, and Cerebras IPO
AI Hardware Weekly Digest: LaProx KV Cache, SpikingBrain, Cola DLM, and Intel's Neuromorphic Bet
AI Hardware Research Weekly (2026.05.11): EA-WM Event-Aware Generative World Model, RecursiveMAS Recursive Multi-Agent System, and a Survey of Robot World Models
AI Hardware Research Weekly (2026.05.10): GYAN Neuro-Symbolic Language Model, Embody4D 4D World Model, PV-VAE Predictive Video Generation, and ParoQuant Rotation Quantization
AI Hardware Research Weekly (2026.05.07): OpenAI Robotics Hardware Spin-Off and Listing, Broadcom 10GW Custom Accelerators, and Hardware Optimization of Flow Matching ODE Solvers
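AI Hardware Research Weekly (2026.05.06): NEURON Neuro-Symbolic Clinical System, DRAM-Free AI Inference Chip (Fractile), WindowQuant VLM KV Cache Quantization, and Backpropagation-Free Learning for SNNs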
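AI Hardware Research Weekly (2026.05.05): The Five-Layer Paradigm Evolution of Visual Generation, Dual-Blade Edge KV Cache Offloading, and RISC-V as the Open Foundation for AI Hardware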
AI Hardware Research Weekly (2026.05.04): Three-Dimensional KV Cache Optimization (DepthKV/PolyKV/CacheFlow), HBM-PIM Tensor Acceleration, and the World-R1 Geometrically Consistent World Model
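AI Hardware Research Weekly (2026.05.03): A Unified Paradigm for Graph World Models, the EdgeSpike Ultra-Low-Power SNN Framework, and HfO₂ Memristive Synapses That Cut Energy by 70%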
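AI Hardware Research Weekly (2026.05.02): Princeton's 3D Biohybrid Neural Chip, Tencent's HY-Embodied Embodied Model, and a Proof of the Mathematical Impossibility of Recursive Self-Improvement in LLMs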
AI Hardware Research Weekly (2026.05.01): A Survey of 3D Generation for Embodied AI, Spiking-Neuron Logic Circuits, and the Motubrain World Action Model
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
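AI Hardware Research Weekly (2026.04.30): A Taxonomy of Agentic World Models, Scaling Laws for Behavior Cloning, and Signal-Folding Neuromorphic Hardware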
AI Hardware Research Weekly (2026.04.29): LingBot-Map Streaming 3D Reconstruction, DeepSeek V4 Hybrid Attention Architecture, and MOMO Robot Skill Learning
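AI Hardware Research Weekly (2026.04.28): Probabilistic Computing Processors, Signal-Folding Neuromorphic Hardware, and the Low-Power Computer Vision Challenge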
AI Hardware Research Weekly (2026.04.26): KV Cache Optimization for Edge LLM Inference, CPU-GPU Hybrid Attention, and Cross-Datacenter Prefill Serving
Design Conductor: AI Agent Autonomously Designs a 1.5GHz RISC-V CPU in 12 Hours
AI Hardware Research Weekly (2026.04.18-04.25): World Models for Robot Training, Molecular Memristor Neuromorphic Hardware, and the NSF NeuroAI Roadmap
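AI Hardware Research Weekly (2026.04.18-04.24): The Representation Bottleneck in LLM-Generated Hardware, Parallel Acceleration of Probabilistic Ising Machines, and Neural Garbage Collection for the KV Cache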
Hardware-Efficient Neuro-Symbolic Networks with Exp-Minus-Log Operator
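Source: arXiv:2604.13871 | Key contribution: Proposes the DNN-EML architecture, which implements neuro-symbolic networks with a single hardware-realizable Sheffer operator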
The Price Is Not Right: Neuro-Symbolic AI Outperforms VLAs with 100x Lower Energy
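Source: arXiv:2602.19260 | Authors: Timothy Duggan, Pierrick Lorang, Hong Lu, Matthias Scheutz | Institution: Tufts University | Key contribution: Neuro-symbolic methods outperform VLAs on structured long-horizon manipulation tasks while using 100x less energy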
Switch-Centric In-Network Architecture for Accelerating LLM Inference
Neuromorphic Computing for Low-Power Artificial Intelligence
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
EPAC: The Last Dance - Full-Stack Practice of a European RISC-V HPC Accelerator Chip
CUTEv2: A Unified, Configurable Matrix Extension for Diverse CPU Architectures
AI Hardware Weekly Digest - April 14, 2026
Build on Priors: Vision-Language-Guided Neuro-Symbolic Imitation Learning for Data-Efficient Robot Manipulation
EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification
Frontiers of AI Hardware Acceleration: From 3D-Stacked Memory to LLM Decoding Optimization
MicroScopiQ: Accelerating Foundation Models via Outlier-Aware Microscaling Quantization
Source: arXiv:2411.05282 | PDF | Conference: ISCA 2025, Tokyo, Japan | Authors: Akshat Ramachandran, Souvik Kundu, Tushar Krishna | Institutions: Georgia Institute of Technology, Intel La...
TIE Scheduler: Uncertainty-Aware Output Length Prediction for Efficient LLM Inference Scheduling
SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning
Integer-State Dynamics of Quantized Spiking Neural Networks for Efficient Hardware Acceleration
SALS: Latent-Space Sparse Attention for KV Cache Compression
Source link: arXiv:2510.24273 | PDF
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms Survey
GPU-FPGA Heterogeneous Systems for Disaggregated LLM Inference: Memory Processing Pipeline Acceleration
EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards
SPINIC: Programmable Superconducting Neuron with In-Memory Computation for Ultra-Efficient Neuromorphic Computing
RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization
Accelerating Memory Processing in LLM Inference with GPU-FPGA Heterogeneous Systems
AS2: Attention-Based Soft Answer Sets for End-to-End Differentiable Neuro-Soft-Symbolic Reasoning
Beyond GEMM-Centric NPUs: An Efficient Architecture for Diffusion LLM Sampling
Source: arXiv:2601.20706 | PDF | Authors: Binglei Lou, Haoran Wu, Yao Lai, et al. (Imperial College London, University of Cambridge) | Key contribution: Optimizes the NPU architecture for diffusion LLM sampling and proposes d-...
Heterogeneous Computing: The Key to the Future of AI Agent Inference
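Source: arXiv:2601.22001 | PDF | Authors: Aaron Zhao (Imperial College London), Junyi Liu (Microsoft Research) | Key contribution: Argues that system-level heterogeneous computing is key to AI agent inference and identifies the "memory capacity wall" problem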
KernelCraft: A Benchmark for Agentic Low-Level Kernel Generation on Emerging Hardware
Source: arXiv:2603.08721 | PDF | Authors: Jiayi Nie, Haoran Wu, et al. (University of Cambridge, Imperial College London, AMD, University of Edinburgh) | Key contribution: The first to evaluate LL...
EQ-ViT: Algorithm-Hardware Co-Design for Real-Time Vision Transformer Acceleration on Versal ACAP
Source: EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture | Conference: ESWEEK 2024...
Neuromorphic Computing Roadmap: Scaling Brain-Inspired AI to Production
MediaTek Genio Pro: 50+ TOPS Edge AI Chip for Robotics and Embodied Intelligence
Innatera Synfire: Unifying the Neuromorphic Ecosystem for Edge AI
Google TurboQuant: 6x KV Cache Compression with Near-Optimal Distortion Rate
TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate
UCV: Democratizing and Accelerating Hardware Verification via Software-Native Optimization
Paper: Democratizing and Accelerating Hardware Verification with Software-Native Optimization | Conference: ISCA 2026 | Key contribution: UnityChip Verification (UCV), a software-native hardware verification platform
VMXDOTP: RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration
UCV: A Software-Native Hardware Verification Platform - ISCA 2026
UCV: Democratizing and Accelerating Hardware Verification through Software-Native Optimization
Programmable Superconducting Neuron for Ultra-Efficient Neuromorphic Computing
ReNN-RV: Run-time PE Reconfiguration for DNN Inference Acceleration with Custom RISC-V ISA
LeWorldModel: Stable End-to-End JEPA World Models from Pixels
Helios: Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator
ChatNeuroSim: LLM Agent Framework for Automated CIM Accelerator Deployment
ChatNeuroSim: LLM Agent Framework for Automated CIM Accelerator Deployment and Optimization
AA-DiT: Algorithm-Architecture Co-Design for Diffusion Transformer Acceleration
PRISM: Photonic Similarity Engine for KV Cache Block Selection in Long-Context LLM Inference
PdNeuRAM: Forming-Free Multi-Bit ReRAM for Energy-Efficient Neuromorphic Computing
PdNeuRAM: Forming-Free, Multi-bit Pd/HfO₂ ReRAM for Energy-Efficient Neuromorphic Computing
Neuro-Symbolic AI Survey: Task-Directed Advances in the Black-Box Era
Neuro-Symbolic Artificial Intelligence: A Task-Directed Survey in the Black-Box Models Era
LeWorldModel: Stable End-to-End JEPA from Pixels for Embodied AI
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Hummingbird+: Advancing FPGA-based LLM Deployment from Research Prototype to Edge Product
Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA
ME-ViT: Memory-Efficient FPGA Accelerator for Vision Transformers
ME-ViT: A Single-Load Memory-Efficient FPGA Accelerator for Vision Transformers
Daily Research: Neuromorphic Computing & Spiking Neural Networks
🔍 Today’s Research Focus
Daily Research Roundup: LLM Hardware Acceleration & World Models
🔍 Today’s Research Focus
Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading
Speculating Experts Accelerates Inference for Mixture-of-Experts: Accelerating MoE Inference via Expert Prefetching
The Price Is Not Right: Neuro-Symbolic Methods Outperform VLAs on Structured Long-Horizon Manipulation Tasks
VLA-Perf: A Comprehensive Analysis of VLA Inference Performance, NVIDIA's First Systematic Study
DS2SC-Agent: A Multi-Agent Pipeline for Automatically Generating SystemC Models from Datasheets
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
Source link: arXiv | PDF
MINISA: Minimal Instruction Set Architecture for Next-gen Reconfigurable Inference Accelerator
Source link: arXiv | PDF
Large Video Planner: A New Paradigm for General-Purpose Robot Control Based on Video Generation
Large Video Planner: General-Purpose Robot Control via Video Generation
History-Guided Video Diffusion: Ultra-Long Video Generation with History Guidance
Once Models Are Smart Enough, What Should Engineers Do? A Practical Guide to Harness Engineering
Design Conductor: A Breakthrough in AI Autonomously Building a 1.5GHz RISC-V CPU
ZipServ: Accelerating LLM Inference with Hardware-Aware Lossless Compression
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures
HyperOffload: Graph-Driven Hierarchical Memory Management Lets Large Models Break Through GPU Memory Limits
PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
Orion: An LLM Training and Inference System on the Apple Neural Engine (ANE)
Orion: Characterizing and Programming Apple’s Neural Engine for LLM Training and Inference
History-Guided Video Diffusion: Ultra-Long Video Generation with History Guidance
GOMA: Geometrically Optimal Mapping for Spatial Accelerators via Analytical Modeling
GOMA: Geometrically Optimal Mapping via Analytical Modeling for Spatial Accelerators
Synthesis-in-the-Loop Evaluation of LLMs for RTL Generation: Quality, Reliability, and Failure Modes
TOM: A Ternary Read-Only Memory Accelerator Enabling Large Models for Edge Intelligence
SNAP-V: A Configurable Neuromorphic RISC-V SoC for Small Spiking Neural Networks
ROMA: A Read-Only-Memory-Based QLoRA LLM Accelerator for Edge Devices
MedBayes-Lite: A Lightweight Bayesian Uncertainty Quantification Framework for Clinical Transformers
Challenges and Research Directions for LLM Inference Hardware: Memory and Interconnect Are the Core Bottlenecks
LEGOSim: A Unified Parallel Simulation Framework for Multi-Chip Heterogeneous Integration
Taalas: Model-Specialized Hardware - Turning AI Models into Silicon
Taalas: Model-Specialized Hardware - Turning AI Models into Silicon
ROMA: A ROM-Based QLoRA LLM Accelerator for Edge Devices
Neural-Symbolic AI Hardware: Unifying Pattern Learning and Logic
Why this direction matters
Hardwired LLM Accelerators: From Programmable Kernels to Fixed-Flow Inference
Motivation
Diffusion Model Accelerators: Efficient Sampling Beyond Brute-Force Denoising
Problem framing
3D Chiplet Systems for AI: Bandwidth-Centric Compute Integration
Why chiplets for AI now
HSCO-Bench: The First End-to-End Hardware-Software Co-Design Benchmark
HSCO-Bench: An Agent-Driven End-to-End Hardware-Software Co-design Benchmark for Systems-on-Chip
CPPL: A Circuit Prompt Programming Language for LLMs
CPPL: A Circuit Prompt Programming Language
DeepStack: A Design Space Exploration Framework for Distributed 3D-Stacked AI Accelerators
A Scalable Approach to Probabilistic Neuro-Symbolic Robustness Verification
MicroScopiQ: Accelerating Foundation Models via Outlier-Aware Microscaling Quantization