Author’s Introduction
Tan Honghe is a senior IC engineer. He holds a PhD from Tsinghua University and has worked in digital integrated circuit development for many years: from DSPs and ASIPs to low-power ASIC implementations of specific encryption and decryption algorithms, and from high-performance audio/video codec design to efficient implementations for computer vision and speech recognition, he has gradually moved into the AI field. He is currently a senior IC engineer at Horizon Robotics, deeply involved in implementing AI algorithms on chips.
This article contains 11,500 words, aimed at readers interested in the AI chip industry who want to quickly understand relevant companies and products (not limited to chip engineers), systematically reviewing the development and landscape of global AI chips, hoping everyone can gain something. At the same time, the author intends to pay tribute through this article to all the start-ups in the AI chip field and the chip engineers working together in AI chip start-ups, moving forward together!
With the explosive popularity of AI concepts worldwide, companies producing AI chips are emerging one after another. In order to allow the market and audience to remember their products, each company has put some effort into chip naming, aiming for uniqueness, compatibility with company products, ease of pronunciation, and memorability. Interestingly, many have adopted the naming convention of “xPU”.
This article will review the various AI chips currently named “xPU”, as well as the various “xPU” abbreviations in the chip industry, for the enjoyment of onlookers and for reference for future naming. In addition to the “xPU” naming convention, this article also expands on some “xxP” ways of naming chips or IP as Processors. Additionally, some alternative naming options for xPU have been brainstormed, marked with underscores, and readers are welcome to brainstorm together. Companies looking to make strides in AI chips should quickly grab a letter.
APU
Accelerated Processing Unit. Currently, no AI company has named its processor APU, as AMD has long used the name APU. APU is a processor brand from AMD. AMD integrates traditional CPU and graphics processor GPU on a single chip, eliminating the need for a northbridge on the motherboard, allowing tasks to be flexibly distributed between the CPU and GPU. AMD refers to this heterogeneous structure as an Accelerated Processing Unit, or APU.
Audio Processing Unit. As the name suggests, a dedicated processor for handling audio data. Many chip manufacturers produce audio processors; they can be found in sound cards, among other places.
BPU
Brain Processing Unit. Horizon Robotics has named its AI chip BPU. Horizon is a start-up founded in 2015, headquartered in Beijing, with the goal of being a “global leader in embedded artificial intelligence.” Horizon’s chips will be directly applied to its main products, including: smart driving, smart living, and smart cities. The name of Horizon Robotics can easily be misunderstood as being related to “robots,” but that is not the case. Horizon is not working on the “machine” part; it is working on the “human” part, creating the “brain” of artificial intelligence, hence its processor is named BPU. Compared to other AI chip start-ups domestically and internationally, Horizon’s first-generation BPU is relatively conservative, using TSMC’s 40nm process. BPU has already been registered as a trademark by Horizon, so other companies should not attempt to use the BPU name.
Biological Processing Unit. The slogan "The 21st century is the century of biology" has lured countless aspiring youths into the life sciences. In fact, the statement should be understood this way: in the 21st century, progress in biology will drive the development of other disciplines. For example, research on the human brain's nervous system will advance the AI field, with SNN structures modeled on the brain's neurons. In any case, as time goes on this name will eventually be claimed; it is just unknown when biological processors will see significant development.
Bio-Recognition Processing Unit. Biometric recognition is no longer just theory. Fingerprint recognition has become a standard feature on recent smartphones, iris recognition has moved from movie sci-fi into phones, and voiceprint recognition can now be used for payment... However, apart from fingerprint recognition, which has dedicated ASICs, most biometric recognition is implemented with a sensor plus a general-purpose CPU/DSP. In any case, none of these chips has yet claimed the valuable BPU or BRPU name.
CPU
There is no need to elaborate on CPU; no AI company will name its processor CPU. However, CPU and AI processors are not mutually exclusive.
First, many companies' AI processors still use CPUs for control and scheduling. For example, Wave Computing uses a CPU core from Andes; Mobileye uses several MIPS CPU cores; and some domestic AI chip companies use ARM CPU cores.
Furthermore, in existing mobile application processors (APs), integrating one or two AI accelerator IPs alongside the CPU (e.g., a DSP for vision applications; see the VPU section) is also a trend. For instance, Huawei has recently been promoting its Kirin 970, which integrates an AI accelerator.
Another trend is that companies producing high-performance computing CPUs are also eager to ride the AI wave. For example,
- Adapteva is a company producing multi-core MIMD structure processors. The Epiphany V, taped out in 2016, integrates 1024 cores. Compared to previous versions, specific instructions have been added for deep learning and encryption.
- Kalray is a company producing multi-core parallel processors, providing solutions for data centers and autonomous driving. It recently announced plans for its third-generation MPPA processor, "Coolidge," and raised $26 million. The plan is to adopt a 16nm FinFET process, integrating 80-160 Kalray 64-bit cores and 80-160 coprocessors for machine-vision processing and deep-learning computation.
DPU
D is the first letter of Deep Learning, and naming AI chips starting with Deep Learning is a natural idea.
Deep-Learning Processing Unit. DPU is not a proprietary term of any company. In academia, Deep Learning Processing Unit (or processor) is often mentioned. For instance, a session theme added in ISSCC 2017 was Deep Learning Processor. Companies targeting DPU include:
- Deephi Tech (深鉴). Deephi is a start-up located in Beijing with a strong Tsinghua background. It refers to its FPGA-based neural-network processor as a DPU. So far, Deephi has publicly released two DPU architectures, Aristotle and Descartes, targeting CNNs and DNN/RNN respectively. Although Deephi claims to develop FPGA-based processors, information from public channels and private industry communication indicates that it is indeed also working on chip development.
- Tenstorrent. A start-up located in Toronto, developing high-performance processors designed specifically for deep learning and smart hardware, with technical staff from NVIDIA and AMD.
Deep Learning Unit. Fujitsu recently announced its AI chip, named DLU. The name is not especially creative, but Fujitsu has already marked DLU with a ™ (not that a ™ by itself offers much protection). From the released information, the DLU's ISA has been redesigned; the chip contains numerous small DPUs (Deep-Learning Processing Units) and several large master cores (which control the DPUs and memory access). Each DPU contains 16 DPEs (Deep-Learning Processing Elements), for a total of 128 execution units performing SIMD instructions. Fujitsu expects to launch the DLU within the 2018 fiscal year.
Deep Learning Accelerator. NVIDIA's announcement that its DLA will be open-sourced made significant waves in the industry, and everyone is speculating what an open-source DLA will mean for other AI companies. See the article "From NVIDIA's Open Source Deep Learning Accelerator."
Dataflow Processing Unit. Founded in 2010, Wave Computing calls its deep-learning acceleration processor the Dataflow Processing Unit (DPU), targeting data centers. Wave's DPU integrates 1024 clusters. Each cluster has its own fully custom layout and contains 8 arithmetic units and 16 PEs. The PEs are implemented with clockless asynchronous logic, driven by data flow, which is why it is called a dataflow processor. Using TSMC's 16nm FinFET process, the DPU die area is approximately 400mm^2, with at least 24MB of internal single-port SRAM, power consumption of about 200W, an equivalent frequency of up to 10GHz, and performance of up to 181TOPS.
Data-storage Processing Unit. Shenzhen Dapu Microelectronics develops SSD main control chips. The SSD main control is also a large market, with many domestic companies striving in this direction.
Digital Signal Processor. People in the chip industry are familiar with the DSP, and many companies design them, such as TI, Qualcomm, CEVA, Tensilica, ADI, and Freescale. Compared to CPUs, DSPs improve digital computing performance by increasing instruction-level parallelism through techniques like SIMD, VLIW, and superscalar execution. Facing the new computational patterns of the AI field (e.g., CNNs and DNNs), DSP companies are busy adapting their DSPs and launching chip series that support neural-network computation. The VPU section below introduces DSPs targeted at vision applications. As with CPUs, DSP technology has long been dominated by foreign companies, but there are diligent domestic research institutions working in this direction, such as the Lily DSP from Tsinghua University's Institute of Microelectronics (VLIW architecture, with an independent compiler) and the National University of Defense Technology's YHFT-QDSP and Matrix 2000. There is, however, also the infamous "Hanxin."
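As a loose software analogy for the SIMD idea mentioned above (one operation applied across many data elements at once), NumPy's array operations behave like wide vector instructions. This sketch is illustrative only and says nothing about any particular DSP's ISA:

```python
import numpy as np

# Scalar style: one multiply-accumulate per loop iteration,
# the way a plain, unvectorized CPU loop would run.
def dot_scalar(a, b):
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc

a = np.arange(1024, dtype=np.float32)
b = np.ones(1024, dtype=np.float32)

# SIMD style: one vectorized call processes all 1024 elements,
# which NumPy maps onto the hardware's wide vector units.
scalar_result = dot_scalar(a, b)
simd_result = float(np.dot(a, b))
```

Both compute the same dot product; the difference is that the vectorized form exposes the data-level parallelism a SIMD unit can exploit.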
EPU
Emotion Processing Unit. Emoshape launched its EPU a couple of years ago, claiming it to be the world's first emotion-synthesis engine, allowing robots to have emotions. From official sources, however, the EPU itself is not complex and does not require heavy neural-network computation; it is based on an MCU. Combined with application APIs and cloud-based reinforcement learning algorithms, the EPU can enable machines to understand the emotion of what they read or see. Coupled with natural language generation (NLG) and WaveNet technology, machines can express various emotional states in a personalized manner. For example, a Kindle that reads aloud could express different emotions depending on the content.
FPU
First, the most common expansion of FPU: Floating Point Unit, which needs no further explanation. High-performance CPUs, DSPs, and GPUs all integrate FPUs for floating-point calculation these days.
Force Processing Unit. The force processor helps you become a Jedi. Cool!
GPU
Graphics Processing Unit. Demand for graphics processors originally came mostly from the PC market, for rendering games; with the upgrade of mobile devices, GPUs have gradually developed on the mobile side as well.
- NVIDIA. Speaking of GPUs, the current leader is without doubt NVIDIA. Founded in 1993, the company has been dedicated to designing GPUs of every kind: the GeForce series for personal and gaming users, the Quadro series for professional workstations, and the Tesla series for servers and high-performance computing. As AI has developed, NVIDIA has continuously pushed into AI applications, launching the DRIVE series for autonomous driving and the Volta architecture designed specifically for AI. It is worth mentioning that Volta, released by NVIDIA in May this year, uses TSMC's 12nm process with a die area of 815mm^2, and related R&D costs are claimed to have reached $3 billion. Thanks to its dominance in the AI field, NVIDIA's stock price has surged 300% over the past year. Lastly, don't forget that NVIDIA also has the Tegra series of mobile processors integrating GeForce GPUs.
- AMD. In recent years, NVIDIA's popularity has almost made everyone forget about AMD. AMD is a very old chip company, established in 1969, long before NVIDIA. AMD's most famous GPU brand, Radeon, came from its $5.4 billion acquisition of ATI in 2006 (revealing my age: my first PC's graphics card was an ATI). The first term in this article, APU, is an AMD product, and AMD's newly released MI series of GPUs targets AI.
In the mobile market, three companies dominate the GPU landscape, but this does not prevent new competitors from entering.
- ARM's Mali. Mali is not an ARM-originated GPU brand; it came from ARM's acquisition of Falanx in 2006. Falanx's initial GPU targeted the PC market but could not compete with NVIDIA and ATI, so it turned to mobile. The GPU's initial name was not Mali but Maliak; it was shortened to Mali for memorability. The name comes from Romanian, meaning "small," and has nothing to do with the Super Mario we all know.
- Imagination's PowerVR. Its main customer is Apple, so it focuses chiefly on supporting Apple, with less attention to other customers. Apple, however, suddenly announced that it would drop PowerVR in favor of in-house GPU development, a heavy blow that sent Imagination's stock price down 60%. Imagination is now seeking an outright sale, though the US may not approve one.
- Qualcomm's Adreno. The technology traces back to ATI: after AMD acquired ATI, it sold the Imageon mobile GPU business to Qualcomm. Interestingly, the name Adreno is an anagram of ATI's well-known GPU brand, Radeon.
- VeriSilicon's Vivante. Vivante (图芯) is a chip company founded in 2004, mainly focused on embedded GPUs, and acquired by VeriSilicon in 2015. Vivante's market share is relatively low. A bit of gossip: Vivante's founder is Dai Weijin and VeriSilicon's founder is Dai Weimin, so this acquisition can be summarized as the elder Dai brother acquiring the younger. Oh, and there is also a third sibling, sister Dai Weili, who founded a company with an even more resonant name: Marvell.
- Samsung's... Oh, Samsung does not have its own GPU. As an IDM giant, Samsung has always been uneasy about lacking one, and it has announced plans to develop its own mobile GPU chip, but that will not arrive until 2020.
Let’s briefly supplement two domestic companies developing GPUs:
- Shanghai Zhaoxin. Zhaoxin was spun off from VIA (威盛). In 2016, Zhaoxin released a GPU chip, the ZX-2000, for the mobile market, with a rather plain, direct name. The main technology comes from VIA licensing, and the GPU core technology comes from the acquisition of the American company S3 Graphics.
- Changsha Jingjia Microelectronics. In 2014 it launched the JM5400 GPU chip. The company has a National University of Defense Technology background, and its chips are used mainly in military aircraft and the Shenzhou spacecraft.
Graph Streaming Processor. This abbreviation was proposed by ThinCI (read "think-eye"). ThinCI is a start-up dedicated to deep learning and computer vision chips, founded in 2010 by four former Intel employees, headquartered in Sacramento with research staff in India. ThinCI's vision chip targets autonomous driving and has investment from DENSO, one of the world's top automotive parts suppliers. At the recently concluded Hot Chips conference, ThinCI introduced its GSP (which is why the author moved ThinCI from the VPU section to here), which uses a range of architectural techniques to achieve task-level, thread-level, data-level, and instruction-level parallelism. The GSP uses TSMC's 28nm HPC+ process, with an expected power consumption of 2.5W.
HPU
Holographic Processing Unit. The holographic processor was specially developed by Microsoft for its Hololens application. The first-generation HPU uses 28nm HPC technology, utilizing 24 Tensilica DSPs with customized extensions. HPU supports 5 camera inputs, 1 depth sensor, and 1 motion sensor. Microsoft recently announced some information about HPU2 at CVPR 2017. HPU2 will be equipped with a co-processor that supports DNN, specifically for running various deep learning tasks locally. It is worth noting that HPU is a chip designed for specific applications, and this product development approach can be learned from. It is said that Microsoft evaluated Movidius (see the VPU section) chips but found them unable to meet the performance, power consumption, and latency requirements of algorithms, which is why HPU was developed.
IPU
Intelligence Processing Unit. There are two companies that have named their chips IPU.
- Graphcore. Graphcore's IPU is designed specifically for graph computation. To elaborate, Graphcore believes that graphs are a very natural representation of knowledge models and their corresponding algorithms, making the graph a foundational representation for machine intelligence, applicable to neural networks and Bayesian networks alike, as well as to new models and algorithms that may emerge in the future. Graphcore's IPU has been quite mysterious, with details released only recently: 16nm, a homogeneous many-core (>1000 cores) architecture, support for both training and inference, a large amount of on-chip SRAM, performance claimed to exceed Volta GPUs and TPU2, and product release expected by the end of 2017.
- Mythic. Another start-up, which recently raised $9.3 million. Mythic also uses the term IPU: "Mythic's intelligence processing unit (IPU) adds best-in-class intelligence to any device." Compared to today's mainstream digital-circuit solutions, Mythic claims it can cut power consumption to 1/50, and the source of this confidence is its "processing in memory" architecture. Processing in memory deserves an article of its own; interested readers can google "UCSB Xie Yuan" to start learning about it.
Image Cognition Processor. The image cognition processor ICP, developed by the Canadian company CogniVue, is used for visual processing and image cognition. To digress, CogniVue was initially an IP supplier for Freescale, later acquired by Freescale in 2015 to further strengthen the integration development of ADAS chips; subsequently, Freescale was acquired by NXP for $11.8 billion; and not to be outdone, Qualcomm acquired NXP for nearly $40 billion. Currently, two ICP IPs are integrated into the S32V series of NXP’s ADAS SOC chips.
Image Processing Unit. Some SOC chips refer to the module that processes static images as IPU. However, IPU is not a commonly used abbreviation; the more common abbreviation for processors handling image signals is the ISP below.
Image Signal Processor. This is not a small topic. Simply put, an ISP processes the output signals of cameras and other imaging devices, performing functions like noise reduction, demosaicing, HDR, and color management. It used to be standard in digital cameras and DSLRs; Canon, Nikon, Sony, and almost every digital camera maker you can think of has its own ISP. With the arrival of the smartphone photography era, people's expectations for photos and video have risen, making the ISP essential. Back in the AI field, image data collected by a camera must first be processed by the ISP before visual algorithms (running on CPUs, GPUs, or ASIC accelerators) can analyze, recognize, classify, and track it. Perhaps, as AI technology develops, some ISP operations will be absorbed into end-to-end visual algorithms.
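To make the pipeline concrete, here is a toy sketch of a few ISP stages in Python. The stage order matches the description above, but every parameter value (black level, white-balance gains, gamma) is an illustrative assumption, not taken from any real sensor, and demosaicing and denoising are omitted:

```python
import numpy as np

def toy_isp(raw_rgb, black_level=64, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Toy ISP on an already-demosaiced 10-bit RGB frame:
    black-level subtraction -> white balance -> gamma encoding."""
    x = np.clip(raw_rgb.astype(np.float64) - black_level, 0.0, None)
    x = x / (1023.0 - black_level)              # normalize to [0, 1]
    x = x * np.asarray(wb_gains)                # per-channel white balance
    x = np.clip(x, 0.0, 1.0) ** (1.0 / gamma)   # gamma encode for display
    return (x * 255.0).astype(np.uint8)

frame = np.full((2, 2, 3), 512, dtype=np.uint16)  # flat mid-gray test frame
out = toy_isp(frame)
```

A real ISP runs stages like these (and many more) in fixed-function hardware at full sensor rate, which is exactly the workload that is hard to match on a general-purpose CPU.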
JPU
Please forgive my limited vocabulary; I have no novel ideas…
KPU
Knowledge Processing Unit. Canaan claims it will release its AI chip, the KPU, in 2017. Canaan intends to integrate an artificial neural network and a high-performance processor on a single KPU chip, primarily offering heterogeneous, real-time, offline AI application services. This is another deep-pocketed company expanding into the AI field. As a maker of mining chips (self-styled "blockchain-specific chips") and mining machines, Canaan has raised nearly 300 million RMB, with a valuation of nearly 3.3 billion RMB. It is said that Canaan will soon undergo a shareholding reform and pursue an IPO.
Additionally, the term Knowledge Processing Unit was not first proposed by Canaan; it has already been discussed in papers and books for over ten years. However, Canaan has now registered the KPU trademark.
LPU
Who can give me some inspiration?
MPU
Micro Processing Unit. MPU, CPU, and MCU are three closely related concepts; it is enough just to be aware of them.
Mind Processing Unit. Mind processing unit sounds nice. “Interpreting brain waves,” “mind communication,” an eternal sci-fi topic. If a large amount of human “thought” brainwave data is collected through deep learning, combined with a powerful mind processing unit MPU, will it become a mind-reader? If ethically unacceptable, can we at least understand the “thoughts” of our pets? Further, will it evolve from mind-reader to mind-writer, and after continuous upgrades, can it even become a Skinchanger from Game of Thrones?
Mobile Processing Unit. Mobile processor seems meaningless.
Motion Processing Unit. Motion processor. Analyzing the muscle movements of humans and animals?
By the way, not all xPUs are processors; for example, MPU is an abbreviation for Memory Protection Unit, which is a module equipped with memory area protection functions in ARM cores.
NPU
Neural-Network Processing Unit. Like GPU, NPU has become a generic term rather than any company's proprietary abbreviation. Because neural-network workloads differ from traditional computation in both operation types and computational load, traditional CPUs, DSPs, and even GPUs fall short in compute, performance, or energy efficiency, and NPUs have been designed specifically for NN computation. Below are a few companies that have released products under the NPU name, plus some academic neural-network accelerators.
- Vimicro's Star Intelligent No. 1. Vimicro was the first to release an NPU, the "Star Intelligent No. 1," in 2016. However, it is not a processor developed specifically to accelerate neural networks: it is known in the industry to integrate multiple DSP cores (called "NPU cores") internally and to support CNNs and DNNs through scheduling of SIMD instructions. By that logic, many chips could be called NPUs, while other DSP-based SOCs have been more conservative in naming and marketing.
- Kneron. A start-up located in San Diego, developing deep-learning IP for IoT applications. Kneron's NPU implements a 39-layer CNN at 28nm with a power consumption of 0.3W, for an energy efficiency of 200GFLOPs/W; another figure on its homepage is 600GOPs/W. Kneron is also developing cloud hardware IP on FPGA. Reliable sources indicate that Kneron plans to establish R&D operations in mainland China, in Beijing, Shanghai, and Shenzhen.
- VeriSilicon (芯原)'s VIP8000. VeriSilicon was established in 2001, and this year it released the VIP8000 neural-network processor IP. According to the announcement "VeriSilicon's Vivante VIP8000 Neural Network Processor IP Delivers Over 3 Tera MACs Per Second," this design is built not on its DSP cores but on the GPU core it acquired in 2015. VeriSilicon states that on a 16nm FinFET process the VIP8000 delivers over 3 TMAC/s of compute, with energy efficiency higher than 1.5 GMAC/s/mW.
- DNPU. Deep Neural-Network Processing Unit. DNPU comes from a paper published by KAIST at ISSCC 2017. I consider DNPU a synonym for NPU, as there are no chips currently available in the industry that support only "non-deep" neural networks. For more information on DNPU, refer to "From ISSCC Deep Learning Processor Papers to Face Recognition Products."
- Eyeriss. An MIT project: a neural-network accelerator designed for highly energy-efficient CNN computation.
- Thinker. A reconfigurable multi-modal neural computing chip designed by Tsinghua University's Institute of Microelectronics, balancing computation and bandwidth resources between CNNs and RNNs.
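As a quick sanity check, the throughput and energy-efficiency figures quoted for the VIP8000 above are mutually consistent: dividing the two gives the implied power budget (an upper bound, since the efficiency is quoted as "higher than"):

```python
throughput_mac_per_s = 3.0e12          # quoted: "over 3 Tera MACs per second"
efficiency_mac_per_s_per_mw = 1.5e9    # quoted: "higher than 1.5 GMAC/s/mW"

# Throughput divided by efficiency gives power in mW; convert to watts.
implied_power_w = throughput_mac_per_s / efficiency_mac_per_s_per_mw / 1000.0
print(implied_power_w)  # → 2.0 (watts, at most)
```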
Neural/Neuromorphic Processing Unit. Neural/Neuromorphic processors differ from the neural network processors mentioned above. Moreover, they are generally not referred to as “processors” but are more often called “neuromorphic chips” or “brain-inspired chips.” These AI chips do not compute using CNNs, DNNs, etc., but rather use a structure more similar to the composition of brain neurons, such as SNN (Spiking Neural Network). Here are a few examples, though they are not named in the “xPU” format.
- Qualcomm's Zeroth. A few years ago Qualcomm positioned Zeroth as an NPU that could conveniently implement SNN computation in software. That NPU seems to have since disappeared; now only the similarly named machine-learning engine, the Zeroth SDK, remains.
- IBM's TrueNorth. IBM announced TrueNorth in 2014. It integrates 4096 parallel cores, each containing 256 programmable neurons, for 1 million neurons in total; with 256 synapses per neuron, it has 256 million synapses. TrueNorth uses Samsung's 28nm process, with 5.4 billion transistors in total.
- BrainChip's SNAP (Spiking Neuron Adaptive Processor). It has already found applications in casinos.
- GeneralVision's CM1K and NM500 chips and NeuroMem IP. The CM1K has 1k neurons, each with 256 bytes of storage. It cannot be compared with the mighty TrueNorth, but it already has customer applications. The company also offers the BrainCard, which carries an FPGA and can plug directly into an Arduino or Raspberry Pi.
- Knowm. This start-up is developing AI chips based on memristor technology for "processing in memory." Unlike Mythic mentioned earlier (in the IPU section), however, Knowm is building brain-like chips. Its key technology is a kind of memory called thermodynamic RAM (kT-RAM), developed from AHaH theory (Anti-Hebbian and Hebbian).
- Koniku. A start-up established in 2014, aiming to compute with biological neurons: "Biological neurons on a chip." Its homepage currently shows a countdown, possibly to announce major progress, so stay tuned.
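The spiking-neuron model these chips are built around can be sketched in a few lines. This leaky integrate-and-fire neuron is a textbook simplification (the leak factor and threshold are arbitrary illustrative values), not the model of any particular chip:

```python
def lif_neuron(input_currents, v_thresh=1.0, leak=0.9):
    """Leaky integrate-and-fire: the membrane potential decays each
    step, integrates the input, and emits a spike (then resets to 0)
    whenever it crosses the threshold."""
    v, spikes = 0.0, []
    for i in input_currents:
        v = v * leak + i          # leak, then integrate
        if v >= v_thresh:         # threshold crossed: fire
            spikes.append(1)
            v = 0.0               # reset after the spike
        else:
            spikes.append(0)
    return spikes

train = lif_neuron([0.6, 0.6, 0.6, 0.0, 0.6])
print(train)  # → [0, 1, 0, 0, 1]
```

Information is carried in the timing of the 0/1 spike train rather than in dense multiply-accumulate results, which is why this computation maps poorly onto CNN/DNN accelerators.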
OPU
Optical-Flow Processing Unit. The optical-flow processor. Is a dedicated chip needed to implement optical-flow algorithms? I don't know, but accelerating them with an ASIC IP block should be necessary.
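For a sense of the computation involved, here is a minimal single-window Lucas-Kanade step, the classic least-squares formulation of optical flow. It is a sketch only: real pipelines add image pyramids, per-pixel windows, and robustness terms, which is where the compute load comes from:

```python
import numpy as np

def lucas_kanade(img1, img2):
    """Estimate one translational flow vector (vx, vy) between two
    grayscale frames by solving the 2x2 least-squares system of the
    brightness-constancy equation Ix*vx + Iy*vy + It = 0."""
    Ix = np.gradient(img1, axis=1)          # spatial gradient in x
    Iy = np.gradient(img1, axis=0)          # spatial gradient in y
    It = img2 - img1                        # temporal gradient
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)

# A Gaussian blob shifted 0.5 pixels to the right between frames
y, x = np.mgrid[0:32, 0:32].astype(float)
frame1 = np.exp(-((x - 16.0) ** 2 + (y - 16.0) ** 2) / 50.0)
frame2 = np.exp(-((x - 16.5) ** 2 + (y - 16.0) ** 2) / 50.0)
vx, vy = lucas_kanade(frame1, frame2)
```

Running this dense, per-window, at video rate is exactly the kind of regular arithmetic that an ASIC IP block accelerates well.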
PPU
Physics Processing Unit. The physics processor. To explain it, one must first understand physics computation: simulating how objects obey physical laws in the real world. Specifically, it makes objects in virtual worlds move according to real-world physics, so that object behavior in games is more realistic: cloth simulation, hair simulation, collision detection, fluid dynamics, and so on. Several companies have developed physics engines that run on CPUs across various platforms, but Ageia was probably the only one to use a dedicated chip to accelerate physics computation. Ageia released its PPU chip, the PhysX, in 2006, along with PPU-based physics accelerator cards and SDKs for game developers. In 2008 it was acquired by NVIDIA, and the PhysX accelerator card was gradually phased out; physics acceleration is now handled by NVIDIA GPUs, and the PhysX SDK has been reworked by NVIDIA.
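What a physics engine does at its core can be illustrated with a few lines of integration code. This bouncing-ball sketch uses semi-implicit Euler integration and a crude restitution-based collision response; all constants are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Ball:
    y: float    # height above the ground, in meters
    vy: float   # vertical velocity, in m/s

def step(ball: Ball, dt: float = 0.01, g: float = 9.81,
         restitution: float = 0.8) -> Ball:
    """One semi-implicit Euler step with ground-plane collision."""
    vy = ball.vy - g * dt        # apply gravity to velocity first
    y = ball.y + vy * dt         # then integrate position
    if y < 0.0:                  # hit the ground plane
        y = 0.0
        vy = -vy * restitution   # bounce, losing 20% of speed
    return Ball(y, vy)

# Drop a ball from 1 m and simulate 2 seconds
ball = Ball(y=1.0, vy=0.0)
for _ in range(200):
    ball = step(ball)
```

A game scene runs thousands of such updates plus pairwise collision tests per frame, which is the parallel workload a PPU (or today, a GPU) accelerates.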
QPU
Quantum Processing Unit. The quantum processor. Quantum computing has been a hot research direction in recent years; the author admits to knowing very little about it. One company to watch is D-Wave Systems, founded in 1999, which has roughly doubled the number of qubits on its QPU every two years.
RPU
Resistive Processing Unit. The RPU is a concept proposed by researchers at the IBM Watson Research Center. It is indeed a processing unit rather than a full processor, and it performs storage and computation in the same place. Using RPU arrays, IBM researchers can achieve performance of 80 TOPS/W.
Ray-Tracing Processing Unit. The ray-tracing processor. Ray tracing is a rendering algorithm in computer graphics, and an RPU is an accelerator developed to speed up its heavy computation. Currently, these computations are handled by GPUs.
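The core computation being accelerated is intersection testing. A minimal ray-sphere intersection, the quadratic at the heart of every ray tracer, looks like this (assuming a unit-length ray direction):

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance t to the nearest intersection of the ray
    origin + t*direction with a sphere, or None on a miss.
    `direction` must be a unit vector."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c          # quadratic discriminant (a == 1)
    if disc < 0.0:
        return None                 # ray misses the sphere
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t >= 0.0 else None  # ignore hits behind the origin

# A ray along +z hits a unit sphere centered 5 units away at t = 4
hit = ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0)
```

A renderer evaluates millions of such tests per frame against scene acceleration structures, which is why dedicated ray-tracing hardware is attractive.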
SPU
Streaming Processing Unit. The concept of streaming processors is relatively old, used for processing video data streams, initially appearing in the structure of graphics card chips. One could say that GPUs are a type of streaming processor. There was even a company called “Streaming Processor Inc.” founded in 2004, which closed in 2009 when its founder and chairman was poached by NVIDIA to become chief scientist.
Speech-Recognition Processing Unit. The speech-recognition processor, SPU or SRPU; neither abbreviation has been used by a company yet. Currently, speech recognition and semantic understanding are mainly implemented in the cloud, as with iFlytek. iFlytek recently launched a translator that sends voice back to the cloud for real-time translation; I have not looked into its internal hardware.
- Qiying Tailun (Chipintelli). Founded in Chengdu in November 2015, this company's CI1006 integrates neural-network acceleration hardware for speech recognition, achieving large-vocabulary recognition offline on a single chip.
- MIT project. Earlier this year, media reported a sci-fi-grade speech chip from MIT; it is in fact the chip from a paper MIT published at ISSCC 2017, capable of recognizing a k-word vocabulary offline on a single chip. See "Analyzing MIT's Intelligent Speech Recognition Chip."
- UniSound. UniSound is a company specializing in intelligent speech-recognition technology, founded in June 2012 and headquartered in Beijing. It recently received 300 million RMB of strategic investment, part of which will fund its previously announced AI chip plan, "UniOne." According to official disclosures, UniOne will integrate a DNN processing unit and be compatible with multiple microphone configurations and operating systems. The chip will be supplied to customers as a module, giving them a complete cloud-plus-chip service.
Smart Processing Unit. "Smart processor" sounds rather cute.
Space Processing Unit. The space processor sounds grand. Panoramic imaging, holographic imaging, these are all processing our living space. When facing vast solar systems and galaxies, do we need new, more powerful dedicated processors? Flying to the M31 Andromeda galaxy to fight against the dark side might not be possible with just an x86.
TPU
Tensor Processing Unit. Google's tensor processor. The two catalytic AI events of AlphaGo defeating Lee Sedol in 2016 and defeating Ke Jie in 2017 undoubtedly shocked the chip industry, and brought the TPU into the open. Google officially announced TPU2, also known as Cloud TPU, at its 2017 I/O developer conference. Unlike TPU1, TPU2 can be used for both training and inference. TPU1 employs a systolic-array dataflow structure; for details, see the article "Google TPU Demystified."
UPU
Universe Processing Unit. The universe processor. Which do you prefer, the Universe Processing Unit or the Space Processing Unit?
VPU
Vision Processing Unit. The vision processor VPU is also likely to become a generic term. As the hottest application area in AI today, the development of computer vision indeed brings unprecedented experiences to users. To handle the enormous computational load encountered in computer vision applications, many companies are designing dedicated VPUs.
- Movidius (acquired by Intel). Movidius was founded in 2006 by two Irishmen and is headquartered in San Mateo, Silicon Valley, with a branch in Ireland. It initially focused on converting old movies into 3D films, later developing chips for 3D rendering and applying them to computer vision (which shows that (1) the chip industry is a high-tech, high-barrier, high-value industry, and (2) start-ups must adjust their strategies as they grow). Movidius's Myriad series VPUs are specifically optimized for computer vision and are used in cutting-edge applications such as 3D scanning and modeling, indoor navigation, and 360° panoramic video. For example, in 2014 Google's Project Tango used the Myriad 1 to help build indoor 3D maps, and in 2016 DJI's Phantom 4 and Mavic both adopted Movidius's Myriad 2. The Myriad 2, built on TSMC's 28nm process, integrates 12 SHAVE (Streaming Hybrid Architecture Vector Engine) vector processors. According to Movidius, SHAVE is a hybrid streaming processor combining the strengths of GPU, DSP, and RISC designs; it supports 8/16/32-bit fixed-point and 16/32-bit floating-point arithmetic, with hardware support for sparse data structures. The Myriad 2 also contains two RISC cores and a video hardware accelerator, and is said to process multiple video streams simultaneously. On August 28, Movidius announced its next-generation VPU, the Myriad X. Compared with the Myriad 2, the Myriad X adds a DNN accelerator, the Neural Compute Engine, supporting both 16-bit floating-point and 8-bit fixed-point; claimed DNN inference throughput is 1 TOPS, with theoretical peak compute above 4 TOPS. The Myriad X has four 128-bit VLIW vector processors and supports LPDDR4, 4K hardware encoding, USB 3.1, and PCIe 3.0, on TSMC's 16nm process.
- Inuitive. An Israeli company providing 3D imaging and vision processing solutions for AR/VR, drones, and other applications. Inuitive's next-generation vision processor, the NU4000, uses a 28nm process, adopts CEVA's XM4 DSP, and integrates a deep learning processor (self-developed? or licensed IP?) along with a depth processing engine and other hardware accelerators.
- DeepVision. A start-up headquartered in Palo Alto that designs and develops low-power VPUs for embedded devices, supporting deep learning, CNNs, and traditional vision algorithms, while also providing real-time processing software.
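The 8-bit fixed-point support these VPUs advertise typically relies on linear quantization of floating-point weights and activations. The sketch below shows the generic symmetric scheme, purely as an illustration of why int8 inference loses so little accuracy; it is not any particular vendor's implementation.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float tensor to int8.

    The scale maps the largest magnitude in x onto 127, so the
    round-trip error is bounded by half a quantization step.
    (Generic textbook scheme, not a specific chip's method.)
    """
    scale = max(np.max(np.abs(x)) / 127.0, 1e-12)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

w = np.random.randn(64).astype(np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(dequantize(q, s) - w))
assert err <= s / 2 + 1e-6  # worst case is half a step of rounding error
```

Because the error stays within half a step of the scale, 8-bit storage roughly quarters memory traffic versus fp32 while keeping per-weight error small, which is the bargain these low-power vision chips are built around.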
Visual Processing Unit. Here it refers to visual, not vision. ATI initially called the chips on its graphics cards VPUs, but later fell in line with the GPU terminology.
Video Processing Unit. Video processor. Processes dynamic video rather than static images, such as real-time encoding and decoding.
Vector Processing Unit. Vector processor. Scalar processors, vector processors, and tensor processors are classified based on the type of data processed by the processor. Modern CPUs are no longer purely scalar processors; many CPUs now integrate vector instructions, the most typical being SIMD. Vector processors play an important role in supercomputers and high-performance computing. Developing dedicated chips for AI based on vector processors is also an option for many companies. For example, the previously mentioned Movidius’s Myriad2 contains 12 vector processors.
Vision DSP. Targeted at computer vision applications in AI, various DSP companies have released vision series DSP IP. A brief list is as follows:
- CEVA's XM4 and the newer XM6 DSPs. Besides connecting to CEVA's own hardware accelerator (the CEVA Deep Neural Network HWA), they can also support third-party HWAs. The aforementioned Inuitive uses the XM4. See "Machine Learning Solutions from Processor IP Vendors – CEVA."
- Tensilica (acquired by Cadence for $380 million in 2013): the P5, P6, and latest C5 DSPs. One of their biggest features is instruction customization via the TIE language. Microsoft's HPU uses Tensilica DSPs. See "The Neural Network DSP Core is Finally Complete."
- Synopsys' EV5x and EV6x series DSPs. See "Machine Learning Solutions from Processor IP Vendors – Synopsys."
- Videantis' v-MP4 series. Videantis was founded in 1997 and is headquartered in Hanover, Germany. The v-MP4, while capable of many machine-vision tasks, is still an enhanced traditional DSP design rather than one built specifically for neural networks.
WPU
Wearable Processing Unit. An Indian company, Ineda Systems, made a splash in 2014 by announcing its WPU concept for the IoT market, receiving investment from Qualcomm and Samsung. Ineda's Dhanush WPU comes in four tiers to match the computational needs of wearable devices from entry-level to high-end, aiming to give wearables 30 days of continuous battery life and cut energy consumption tenfold. However, everything seems to have gone quiet since 2015, with no further news; only a note at the bottom of its homepage states that Ineda has registered the WPU trademark. Public information on the WPU is limited to a high-level block diagram and a US patent.
Wisdom Processing Unit. "Wisdom processor" sounds quite grand; feel free to use it, no thanks needed. It does, however, carry a whiff of "Naobaijin" (a Chinese health supplement famous for its kitschy advertising).
XPU
Why not just call it XPU? The X can stand for the unknown, where anything is possible, as in X-Men, The X-Files, and SpaceX.
Just as this article was being finished, I learned that at this year's Hot Chips conference, Baidu publicly disclosed the name of its FPGA accelerator: XPU. Here, however, the X refers to Xilinx. No details are available yet, so let's wait and see.
YPU
Y? I have no ideas and need help from readers.
ZPU
Zylin CPU. The CPU of the Norwegian company Zylin. To have a flexible microprocessor on resource-limited FPGAs, Zylin developed the ZPU. The ZPU is a stack machine: its instructions carry no explicit operands, which keeps code size small, and it is supported by the GCC toolchain, billed as "the world's smallest 32-bit CPU with GCC toolchain." Zylin open-sourced the ZPU on OpenCores in 2008, and one group has even ported the Arduino development environment to it.
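The stack-machine idea is easy to demonstrate: because every instruction takes its arguments implicitly from the top of the stack, the encoding needs no operand fields at all. The interpreter below is a minimal sketch of the concept with made-up mnemonics, in the spirit of the ZPU but in no way its real ISA.

```python
def run(program):
    """Interpret a tiny zero-operand stack-machine program.

    Integers in the program model a push-immediate instruction;
    the string mnemonics (ADD, MUL, DUP) are hypothetical, chosen
    only to illustrate how operand-free encodings work.
    """
    stack = []
    for op in program:
        if isinstance(op, int):          # push immediate onto the stack
            stack.append(op)
        elif op == "ADD":                # pop two, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":                # pop two, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "DUP":                # duplicate the top of stack
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]

# (3 + 4) * 2, expressed entirely with zero-operand instructions:
assert run([3, 4, "ADD", 2, "MUL"]) == 14
```

Since no instruction names a register or address, opcodes can be packed into very few bits, which is exactly why stack machines like the ZPU achieve such small code size on cramped FPGAs.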
Other non-xPU AI chips
Cambricon Technology. Cambricon, a spin-off with roots in the Chinese Academy of Sciences, has not used the xPU naming convention for its processors; media articles call them a deep learning processor (DPU) or a neural network processor (NPU). The DianNao series of chip architectures developed by the Chen brothers won several best-paper awards at major conferences, laying the technical foundation for the company's founding. The Cambricon-X instruction set is one of its major features. Its chip IP has already been widely licensed for integration into mobile phone, security, wearable, and other terminal chips, reportedly securing orders worth 100 million RMB in 2016. In some specialized fields, Cambricon's chips will hold a dominant market share in China. Recent reports indicate Cambricon has raised $100 million.
Intel. Intel's setbacks in the smartphone chip market have prompted decisive action: aggressive investment in several AI application directions. What action? In three words: buy, buy, buy. In data center/cloud computing, Intel spent $16.7 billion to acquire Altera and about $400 million for Nervana; for mobile applications such as drones and security monitoring, it acquired Movidius (amount undisclosed); and in ADAS, it acquired Mobileye for $15.3 billion. Movidius was covered in the VPU section above; here we add details on Nervana and Mobileye (Mobileye builds ADAS on visual technology rather than being merely a vision processor, hence it is not listed under VPU).
Nervana. Founded in 2014 and headquartered in San Diego, Nervana's main business is a full-stack AI software platform, Nervana Cloud. On the hardware side, Nervana Cloud supports backends such as CPUs, GPUs, and even Xeon Phi, and also offers a custom hardware architecture, the Nervana Engine. According to The Next Platform's report "Deep Learning Chip Upstart Takes GPUs to Task," the Nervana Engine uses TSMC's 28nm process and achieves 55 TOPS. Less than 24 hours after the report appeared, Intel announced the acquisition, and all 48 employees joined Intel. Intel has since built the Crest family of chips around the Nervana Engine: the first generation is codenamed "Lake Crest," the second "Knights Crest." Incidentally, Nervana's CEO previously worked at Qualcomm, where he led a neural-computing research project: the Zeroth mentioned earlier.
Mobileye. A company based on computer vision for ADAS, founded in 1999 and headquartered in Jerusalem. Mobileye developed dedicated chips for its ADAS systems, called the EyeQ series. In 2015, Tesla announced it was using Mobileye’s chips (EyeQ3) and solutions. However, in July 2016, Tesla and Mobileye announced they would terminate their cooperation. Subsequently, Mobileye was acquired by Intel for $15.3 billion in 2017 and is now a subsidiary of Intel. Mobileye’s next-generation EyeQ5 will use 7nm FinFET technology, integrating 18 vision processors and adding a hardware safety module to achieve level 5 autonomy in driving.
Bitmain. The custom mining chips designed by Bitmain have outstanding performance, allowing it to make a significant profit; besides selling mining chips, Bitmain also mines itself. In short, Bitmain has extraordinary chip design capabilities and substantial financial resources, and it aims to compete with NVIDIA's high-end GPUs, boldly entering the AI chip arena directly at a 16nm process. The chip has reportedly been in testing for over a month, and its name has leaked: "Sophon," taken from the famous novel The Three-Body Problem, signaling considerable ambition; an official release is expected soon.
Huawei & HiSilicon. The market has long been waiting for Huawei's Kirin 970, which is publicly known to contain an AI accelerator and rumored to use Cambricon's IP; now we just wait for the autumn launch event.
Apple. Apple is developing an AI chip internally referred to as the “Apple Neural Engine.” This news does not surprise anyone; what everyone is curious about is in which iPhone this ANE will be used.
Qualcomm. Besides maintaining its software platform based on Zeroth, Qualcomm has been actively making hardware moves. While acquiring NXP, Qualcomm has reportedly been collaborating with Yann LeCun and Facebook’s AI team to jointly develop a new chip for real-time inference.
There are also start-ups like Leapmind and REM, which I won’t list one by one.
Conclusion
The AI chip field is crowded with contenders, and opportunity comes with challenge: today's competitors may be tomorrow's merger partners. With "xPUs" emerging one after another, the 26 letters will soon be used up. Then again, looked at another way, it hardly matters; a distinctive name of any kind will do. Or one could get ahead of the curve on the "processing in memory" route and grab an "xxxRAM" or "xxxMem" name first.
Finally, I recommend the NN Accelerator page from Tsinghua Professor Wang Yu's laboratory (NICS EFC Lab), which collects data on publicly disclosed neural network accelerators and visualizes it, as shown in the figure.
On the road to building embedded artificial intelligence with “algorithms + chips,”
we urgently need more aspiring youths to join us
to build a world of intelligence together!
Click Read the Original for more details on Horizon’s 2018 campus recruitment!