Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels

2025-04-01

Make way for X-Y Serve! Prof. Yin Shouyi and Prof. Hu Yang from the School of Integrated Circuits, in collaboration with Huawei XiaoYi AI Infra@HuaweiAPAC, have unveiled a high-performance large language model serving system. By unifying computations into hardware-friendly kernels, the system achieves significant performance improvements on Ascend NPUs, even outperforming the NVIDIA A800! The technology is also fully adaptable to GPU architectures and ready for the future.