How Did Huawei Tame a Trillion-Parameter Sparse Model? Key Technical Breakthroughs in MoE Training on Ascend NPUs
In the race among large models, sparse models represented by Mixture of Experts (MoE) are fast becoming the new favorites of the AI field thanks to their outstanding efficiency. Recently, Huawei released a technical report titled “Pangu Ultra …”