Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed an energy-saving NPU (Neural Processing Unit) technology that shows significant performance gains in laboratory tests. In controlled experiments, their dedicated AI chip ran AI models 60% faster than the graphics cards currently powering most AI systems while consuming 44% less power.

This research, led by Professor Jongse Park of KAIST's School of Computing and conducted in collaboration with HyperAccel, addresses one of the most pressing challenges facing modern AI infrastructure: the enormous energy and hardware demands of large-scale generative AI models. Current systems such as OpenAI's GPT-4 and Google's Gemini 2.5 require both high memory bandwidth and substantial memory capacity, prompting companies like Microsoft and Google to purchase hundreds of thousands of NVIDIA GPUs.
Memory Bottleneck Challenge
The core innovation lies in the team's approach to the memory bottleneck that plagues existing AI infrastructure. Their energy-saving NPU technology lightens the inference process while minimizing accuracy loss, a balance that previous solutions have struggled to strike.
KAIST PhD student Minsu Kim and Dr. Seongmin Hong from HyperAccel, as co-first authors, presented their research findings at the 2025 International Symposium on Computer Architecture (ISCA 2025) held in Tokyo. The research paper titled “Oaken: Fast and Efficient LLM Service via Online-Offline Hybrid KV Cache Quantization” elaborates on their comprehensive approach to this issue.
The technology centers around KV cache quantization, which the researchers note accounts for a significant portion of memory usage in generative AI systems. By optimizing this component, the team was able to achieve the same level of AI infrastructure performance using fewer NPU devices than traditional GPU-based systems.
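The basic idea of KV cache quantization can be illustrated with a minimal sketch (this is a generic group-wise scheme for illustration, not the Oaken implementation; the function names, group size, and bit-width are assumptions): each group of cached key/value entries is stored at 4 bits alongside one scale and offset per group, cutting cache memory roughly 4x versus FP16 storage.

```python
import numpy as np

def quantize_kv_group(kv, group_size=64, bits=4):
    """Group-wise asymmetric quantization of a flat KV-cache tensor.
    Stores one (scale, offset) pair per group of `group_size` values."""
    qmax = 2**bits - 1
    groups = kv.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    # Guard against constant groups (hi == lo) to avoid division by zero.
    scale = np.where(hi > lo, (hi - lo) / qmax, 1.0)
    q = np.clip(np.round((groups - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize_kv_group(q, scale, lo):
    """Recover an approximation of the original values."""
    return (q.astype(np.float32) * scale + lo).reshape(-1)

# Toy cache slice: per-element error is bounded by half a quantization step.
kv = np.random.randn(8192).astype(np.float32)
q, scale, lo = quantize_kv_group(kv)
recovered = dequantize_kv_group(q, scale, lo)
```

Because only 4 bits plus a small per-group overhead are stored per entry, the same NPU memory can hold a far longer context, which is why the researchers target this component.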
Technological Innovations and Architecture
The KAIST team’s energy-saving NPU technology employs a three-pronged quantization algorithm: threshold-based online-offline hybrid quantization, group shift quantization, and fused dense-sparse coding. This approach allows the system to integrate with existing memory interfaces without altering the operational logic in current NPU architectures. The hardware architecture utilizes page-level memory management techniques to efficiently leverage limited memory bandwidth and capacity. Additionally, the team introduced new encoding techniques specifically optimized for quantized KV caches to address the unique demands of their approach.
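The interplay of threshold-based quantization and dense-sparse coding can be approximated by a generic outlier-aware scheme (an illustrative sketch under assumed conventions, not the paper's algorithm; the threshold value and all names are hypothetical): values whose magnitude exceeds a threshold are kept sparse at full precision, while the dense remainder is quantized to a low bit-width.

```python
import numpy as np

def dense_sparse_quantize(x, threshold=3.0, bits=4):
    """Split `x` into a sparse set of full-precision outliers and a
    densely quantized low-bit remainder (illustrative sketch)."""
    outlier_mask = np.abs(x) > threshold
    sparse_idx = np.nonzero(outlier_mask)[0]
    sparse_val = x[sparse_idx]                # outliers stored exactly
    dense = np.where(outlier_mask, 0.0, x)    # outliers removed from range
    qmax = 2**(bits - 1) - 1                  # signed 4-bit: [-8, 7]
    scale = max(np.abs(dense).max() / qmax, 1e-8)
    q = np.clip(np.round(dense / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale, sparse_idx, sparse_val

def dense_sparse_dequantize(q, scale, sparse_idx, sparse_val):
    """Reconstruct: dequantize the dense part, then restore outliers."""
    x = q.astype(np.float32) * scale
    x[sparse_idx] = sparse_val
    return x

# Toy activations with a few large outliers, as is typical of KV caches.
x = np.concatenate([np.random.randn(1024),
                    np.array([9.0, -7.5])]).astype(np.float32)
q, scale, idx, val = dense_sparse_quantize(x)
rec = dense_sparse_dequantize(q, scale, idx, val)
```

The design rationale is that excluding rare outliers from the quantization range keeps the step size small for the bulk of values, which is one way such schemes preserve accuracy at low bit-widths.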
Professor Park explained, “Through our collaboration with HyperAccel, we found solutions in lightweight algorithms for generative AI inference and successfully developed a core NPU technology that addresses memory issues. With this technology, we achieved over 60% performance improvement compared to the latest GPUs while reducing memory requirements and maintaining inference accuracy through quantization techniques.”
Sustainability Impact
As the adoption of generative AI accelerates, the environmental impact of AI infrastructure has become a growing concern. The energy-saving NPU technology developed at KAIST offers a potential pathway to more sustainable AI operations: with a 44% reduction in power consumption compared with current GPU solutions, widespread adoption could significantly cut the carbon footprint of AI cloud services. The actual impact, however, will depend on several factors, including manufacturability, cost-effectiveness, and industry adoption rates. The researchers characterize their solution as a significant advance, but acknowledge that widespread implementation will require ongoing development and industry collaboration.
Industry Background and Future Outlook
As AI companies face increasing pressure to balance performance and sustainability, the timing of this breakthrough in energy-saving NPU technology is particularly relevant. The current GPU-dominated market has led to supply chain constraints and rising costs, making alternative solutions increasingly attractive. Professor Park noted that this technology “demonstrates the potential for achieving high-performance, low-power infrastructure specifically designed for generative AI, and is expected to play a key role not only in AI cloud data centers but also in AI transformation (AX) environments represented by dynamic, executable AI (such as agent AI).”
This research from South Korea marks an important step towards more sustainable AI infrastructure, but its ultimate impact will depend on how effectively it can be scaled and deployed in commercial settings. As the AI industry continues to grapple with energy consumption issues, innovations like KAIST’s energy-saving NPU technology bring hope for a sustainable future in AI computing.
