If we compare the CPU to a Swiss Army knife, the GPU is a chainsaw. Yet in the face of the AI revolution, what humanity needs is a weapon closer to a neural synapse. The rise of the NPU and TPU is not merely a technical iteration; it is a chip revolution that challenges the genetic foundations of the von Neumann architecture itself. Today, we dissect the underlying logic of this ‘revolutionary change.’

——

1. The GPU’s Dilemma: The Fatal Flaw of a Universal Tool

The GPU was once the savior of the AI era, but it is cursed with being overly general.

• Architectural original sin: CUDA cores execute in lockstep SIMT groups, so every calculation queues behind instruction-fetch and scheduling machinery built for general-purpose work, akin to having 1,000 elementary school students use abacuses to calculate quantum entanglement.

• Energy-consumption black hole: training GPT-4 reportedly consumed tens of thousands of high-end GPUs and as much electricity as a mid-sized city. No law of physics is violated, but no first-principles analysis can call it efficient.

• Death at the edge: a desktop-class GPU in a smartphone? Sorry, the thermal budget alone would throttle it into uselessness.

First-principles inquiry: when the essence of the task is the brute aesthetics of matrix multiplication, why pay for legacy silicon built for graphics rendering?

——

2. The NPU’s Counterattack: Building a ‘Silicon Brain’ with Neuronal Thinking

Engineers on Huawei’s Ascend team drew inspiration from synaptic plasticity:

• Systolic arrays (often mistranslated as ‘pulsed arrays’): mimicking the rhythmic signaling of the cerebral cortex, data pulses through a 3D Cube of multiply-accumulate units so that each operand is reused many times before touching memory again; Huawei claims a 300% improvement in computational efficiency.

• Dynamic precision: in Chinese NLP scenarios, some calculations are deliberately run at reduced numerical precision and allowed to ‘make small mistakes,’ just as the human brain ignores irrelevant details, with a claimed 40% cut in compute demand.
The life-and-death metric is energy efficiency: smartphone NPUs can be up to 100 times more efficient than GPUs on inference. In one comparison, facial recognition drew about 0.2 W on the NPU versus 20 W on a GPU, the equivalent of lifting a truck with a mosquito’s leg.

Case study: Tesla’s FSD computer abandoned off-the-shelf GPUs for in-house neural accelerators (the Dojo chip plays the same game on the training side; the ‘silicon photonic neural network’ version of this story is myth). Tesla reports compressing real-time driving-decision latency from the 100 ms range down toward single-digit milliseconds: a victory of specialization over generality.

——

3. The TPU’s Strike: Google’s Play for Compute Hegemony

When Google’s TPU pods replaced traditional copper switching with optical circuit switches, all of Silicon Valley took notice:

• Tensor armor: systolic matrix-multiply units stack like Lego blocks into ultra-large compute pods; Google’s published figure for TPU v5p is roughly 459 TFLOPS of bf16 per chip (circulating numbers in the thousands refer to lower precisions or to multi-chip boards, and ‘100,000 GPUs in a USB drive’ is pure hyperbole).

• Compute alchemy: sparsity prunes most weights to zero, and dedicated circuits then skip the zeros rather than multiply by them; the claimed reduction in training cost is around 60%.

• Ecosystem strangulation: TensorFlow and JAX compile to the TPU through the XLA stack, and once developers board that train, returning to the CUDA world is costly. This is a dimensionality-reduction attack at the software-stack level.

Striking data: Google has reported BERT training falling from roughly 3 days on GPUs to about 4 hours on TPU clusters, with claimed electricity savings of $270,000. Time is also currency.

——

4. The Ultimate War: Reconstructing Chip DNA with First Principles

The underlying logic of this revolution is breaking the von Neumann memory bottleneck and outrunning the end of Dennard scaling (not, as often misquoted, ‘Shannon’s theorem’):

1. Material revolution: wide-bandgap semiconductors such as gallium nitride and silicon carbide, proven so far mainly in power electronics like Tesla’s inverters rather than in logic, promise faster and cooler switching. Claims that electrons now move at ‘1/300 of the speed of light’ confuse carrier drift velocity with signal propagation.
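Before continuing down the list, a quick back-of-envelope check on that velocity claim, using the textbook saturation drift velocity of electrons in GaN (about 2.5×10^7 cm/s). Treat this as a sanity check on the arithmetic, not a datasheet:

```python
# Sanity-checking the viral "electrons at 1/300 the speed of light" claim
# against the standard literature value for GaN saturation drift velocity.

C = 2.998e8          # speed of light in vacuum, m/s
V_SAT_GAN = 2.5e5    # GaN electron saturation velocity, m/s (2.5e7 cm/s)

claimed = C / 300              # what the claim implies: about 1e6 m/s
ratio = C / V_SAT_GAN          # actual fraction of c for drifting electrons
print(f"claimed: {claimed:.1e} m/s, actual drift: c/{ratio:.0f}")
```

Drifting electrons in GaN sit near c/1200, several times slower than the claim, and signal propagation (near the speed of light in the medium) is a different quantity from drift velocity anyway.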
2. Architectural tyranny: NPUs are beginning to integrate brain-like synaptic caches that place compute next to memory, while TPU pods send photons instead of electrons between chips; the GPU’s transistor hegemony is disintegrating at the edges.

3. Ecosystem reconstruction: when cloud providers field their own accelerators (Google rents out Cloud TPUs; Microsoft’s Azure answer is its in-house Maia chip, not the TPU) and Apple runs Stable Diffusion on the Neural Engine at the edge, developers are forced to rethink: what kind of computing paradigm do we really need?

——

5. Future Predictions: The ‘Cambrian Explosion’ of Chips

• 2026 critical point (a forecast, not a fact): NPUs take 70% of the mobile AI chip market and TPU-class accelerators pass 50% in supercomputing, leaving GPUs to fade the way DVD drives did.

• Emergence of new species: neuromorphic chips such as Intel’s Loihi natively run the spiking neural networks of the human brain, with Intel reporting energy savings of up to three orders of magnitude on suitable workloads (claims of ‘one ten-thousandth’ overshoot even that).

• Ultimate form: quantum-photonic-neural hybrid chips whose compute someday exceeds 10^28 FLOPS. Comparisons to ‘7 billion people thinking for 100 years’ make great headlines, but nobody has a defensible brain-to-FLOPS exchange rate.

——

The war from GPU to NPU/TPU is, at bottom, a revolution in computational philosophy. As humanity begins to simulate cognition in silicon and to replace software algorithms with physical laws, we will finally understand: true intelligence never requires general-purpose computation.

(Follow me for more hardcore technology insights)
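P.S. for the hands-on crowd: the spiking neurons that neuromorphic chips like Loihi implement in hardware are easy to simulate. A minimal leaky integrate-and-fire neuron in plain Python, with illustrative parameters rather than Loihi’s actual ones:

```python
# A minimal leaky integrate-and-fire (LIF) neuron, the basic unit of the
# spiking networks that neuromorphic chips run natively. Parameters are
# illustrative, not any chip's real values.

def lif_run(inputs, leak=0.9, threshold=1.0):
    """Return a spike train (0/1 per step) for a stream of input currents."""
    v = 0.0
    spikes = []
    for current in inputs:
        v = v * leak + current   # leaky integration of the input
        if v >= threshold:       # fire and reset when threshold is crossed
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

# A neuron driven by a constant weak input fires only occasionally:
print(lif_run([0.3] * 10))  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

No input means no spikes and no multiply-accumulates at all, and that event-driven sparsity is where the orders-of-magnitude energy savings come from.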