When it comes to FPGAs, many people might first think of major FPGA manufacturers like Xilinx and Altera (which has been acquired by Intel). However, there are also other very distinctive FPGA manufacturers, such as Achronix, which specializes in hardware acceleration devices based on FPGAs and high-performance embedded FPGA (eFPGA) semiconductor intellectual property (IP).
Thanks to the rapid development of artificial intelligence/machine learning in recent years, new algorithms are constantly emerging, driving the rapid growth of the programmable FPGA market. According to market research firm Semico Research, the market size of FPGAs in AI applications is expected to triple in the next four years, reaching $5.2 billion.
According to the latest IP market analysis report released by market research agency IP Nest, Achronix was the fastest-growing IP provider globally in 2018, with a year-on-year growth rate of 250%, highlighting the rapid growth of Achronix’s business.
To further meet the growing demand for artificial intelligence/machine learning (AI/ML) and high-bandwidth data acceleration applications, Achronix launched a new FPGA product line, the Speedster 7t series, in May 2019.
A New Architecture: The Perfect Combination of ASIC and FPGA
For AI acceleration, an ASIC differs from general-purpose chips such as CPUs and GPUs, and from programmable FPGAs, in that its computing power and efficiency can be tailored at design time to the specific needs of an algorithm. This gives ASICs advantages in size, power consumption, reliability, confidentiality, computing performance, and efficiency. In the specific applications they target, ASICs therefore outperform CPUs, GPUs, and programmable FPGAs in energy efficiency.
However, as mentioned earlier, AI algorithms are still in a phase of rapid updates and iterations, with increasing options for numerical precision. At the same time, as AI application scenarios rapidly evolve, new solutions must address different demands in high performance, flexibility, and time-to-market.
ASICs are designed for specific algorithm acceleration, which makes them less flexible than FPGAs that can quickly adapt to new software algorithms through programming. However, FPGAs are not as advantageous as ASICs in terms of size, energy efficiency, and cost. So, is there a product that can effectively combine the advantages of both FPGAs and ASICs? Achronix’s Speedster 7t series may be such a product.
Achronix claims that the Speedster 7t series is based on a highly optimized new architecture that goes well beyond traditional FPGA solutions, offering ASIC-like performance, FPGA flexibility, and enhanced functionality that simplifies design.
▲Achronix CEO Robert Blake
Robert Blake, President and CEO of Achronix Semiconductor, stated: “Speedster 7t is the most exciting release in Achronix’s history, representing innovation and accumulation built on four generations of hardware and software development, as well as close collaboration with our leading customers. Speedster 7t is a fusion of flexible FPGA technology and ASIC core efficiency, providing a new category of ‘FPGA+’ chips that significantly enhance the limits of high-performance technology.”
A Detailed Explanation of Speedster 7t FPGA Series
According to Achronix, the Speedster 7t FPGA series is designed for high-bandwidth applications and features a new 2D network on chip (2D NoC) and a high-density array of new machine learning processor (MLP) blocks. By combining the programmability of FPGAs with the routing structure and compute engines of ASICs, the Speedster 7t series creates a new class of “FPGA+” technology.
At the same time, the Speedster 7t series also includes high-bandwidth GDDR6 interfaces, 400G Ethernet ports, and PCI Express Gen5 interfaces, all of which are interconnected to provide ASIC-level bandwidth while retaining the full programmability of FPGAs.
Speedster 7t devices are manufactured on TSMC’s 7nm FinFET process. They are designed to accept large amounts of data from multiple high-speed sources, distribute that data to programmable on-chip algorithmic and processing units, and deliver the results with the lowest possible latency.
A New Machine Learning Processor Array
Traditional FPGAs that rely on DSP blocks offer relatively limited AI performance, because the DSP blocks support only a narrow range of numerical precisions efficiently. Building AI/ML functions around them consumes additional LUT logic and memory resources outside the DSP block, and performance is further constrained by FPGA routing.
In contrast, the Speedster 7t FPGA arranges programmable compute units in a massively parallel array of new machine learning processors (MLPs), which Achronix says provides the highest FPGA-based compute density in the industry. MLPs are highly configurable, compute-intensive blocks, each with up to 32 multipliers feeding variable-precision adders/accumulators. They support integer formats from 4 to 24 bits and efficient floating-point modes, including TensorFlow’s 16-bit format (bfloat16) and an enhanced block floating-point format that doubles the number of compute engines per MLP.
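To make the block floating-point idea concrete, here is a minimal Python sketch. It assumes the generic textbook scheme, in which a block of values shares one exponent and each value keeps only a small signed integer mantissa, so that a dot product reduces to integer multiply-accumulates plus one final scaling. The function names `bfp_quantize` and `bfp_dot` and the 8-bit mantissa width are illustrative assumptions; the article does not describe the exact format the MLP implements.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to (shared_exponent, signed integer mantissas).

    Generic block floating point, NOT Achronix's actual format: every value in
    the block is approximated as mantissa * 2**shared_exponent.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so scaling by
    # 2**(e - (mantissa_bits - 1)) keeps the largest value in mantissa range.
    _, e = math.frexp(max_abs)
    shared_exp = e - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = []
    for x in block:
        q = round(x / 2.0 ** shared_exp)
        mantissas.append(max(-limit, min(limit, q)))  # saturate on overflow
    return shared_exp, mantissas


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product of two blocks using integer MACs and one final scaling."""
    exp_a, mant_a = bfp_quantize(a, mantissa_bits)
    exp_b, mant_b = bfp_quantize(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(mant_a, mant_b))  # integer-only inner loop
    return acc * 2.0 ** (exp_a + exp_b)


weights = [1.0, 0.5, -0.25, 0.125]
inputs = [0.5, 0.5, 0.5, 0.5]
print(bfp_quantize(weights))
print(bfp_dot(weights, inputs))
```

Because the inner loop uses only small integer multiplies, hardware built for this kind of format can pack more multipliers into the same area than a full floating-point datapath, which is the general motivation for such formats.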
Additionally, each MLP is tightly coupled with memory blocks, including 72K bits of RAM and 2K bits of registers. This tight coupling of compute and storage allows MLPs to implement more complex AI algorithms without using FPGA routing resources.
Furthermore, the MLPs sit directly next to the embedded memory blocks. This eliminates the delays associated with FPGA routing in traditional designs and ensures that data reaches the MLPs at the maximum rate of 750 MHz.
This combination of high-density computing and high-performance data delivery lets the MLP array provide the highest available FPGA-based compute capability, measured in tera-operations per second (TOPS).
Ultra-High Throughput Memory Bandwidth and Interfaces
High-performance computing and machine learning systems depend on high off-chip memory bandwidth, because external memory is both the source of and the buffer for multiple data streams. Speedster 7t devices are the only FPGAs that support GDDR6, the highest-bandwidth external memory device. Each GDDR6 memory controller supports 512 Gbps of bandwidth, and a Speedster 7t device can have up to 8 GDDR6 controllers, for an aggregate GDDR6 bandwidth of 4 Tbps. This provides memory bandwidth equivalent to HBM-based FPGAs at a much lower cost.
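The aggregate figure follows directly from the per-controller number. The short Python check below uses only the two values quoted above; the constant and function names are illustrative. It shows that "4 Tbps" is a rounded form of 4096 Gbps.

```python
# Figures quoted in the article.
GBPS_PER_GDDR6_CONTROLLER = 512
MAX_GDDR6_CONTROLLERS = 8


def aggregate_gddr6_gbps(controllers=MAX_GDDR6_CONTROLLERS):
    """Aggregate GDDR6 bandwidth in Gbps for a given number of controllers."""
    if not 0 <= controllers <= MAX_GDDR6_CONTROLLERS:
        raise ValueError("controller count out of range")
    return controllers * GBPS_PER_GDDR6_CONTROLLER


total_gbps = aggregate_gddr6_gbps()
# Decimal units: 1 Tbps = 1000 Gbps; 8 bits per byte.
print(f"{total_gbps} Gbps = {total_gbps / 1000:.3f} Tbps = {total_gbps / 8:.0f} GB/s")
```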
“Micron is pleased to partner with Achronix to realize the world’s first FPGA product directly loaded with GDDR6 for high bandwidth memory demands,” said Mal Humphrey, VP of Marketing for Micron’s Computing and Networking Business Unit. “Innovative and scalable solutions like this will drive differentiation in the AI space, where heterogeneous computing options and high-performance memory are essential for accelerating the acquisition of data insights.”
In addition to this ultra-high throughput memory bandwidth, Speedster 7t devices also include the industry’s highest performance interface ports to support extremely high bandwidth data flows. Speedster 7t devices have up to 72 high-performance SerDes lanes that run at 1 to 112 Gbps. There are also hard 400G Ethernet MACs with forward error correction (FEC) that support 4x 100G and 8x 50G configurations, as well as hard PCI Express Gen5 controllers with 8 or 16 lanes per controller.
“Achronix’s new Speedster 7t FPGA series is an excellent example of an innovative chip architecture explosion aimed directly at processing large amounts of data for AI applications,” said Rich Wawrzyniak, Chief Market Analyst for ASICs and SoCs at Semico Research. “By integrating mathematical functions, memory, and programmability into its machine learning processors, combined with cross-chip and 2D NoC structures, it forms an excellent method to eliminate bottlenecks and ensure free data flow throughout the device. In AI/ML applications, memory bandwidth is everything, and Achronix’s Speedster 7t provides impressive performance metrics in this area.”
A New 2D On-Chip Network: Providing Ultra-High Efficiency Data Movement
The amount of data coming from Speedster 7t’s high-speed I/O and memory ports is enormous. The routing of a traditional FPGA, a programmable interconnect designed around bit-level signals, can no longer meet the demand. The Speedster 7t architecture therefore provides a high-bandwidth 2D network on chip (NoC) that spans the FPGA logic array both horizontally and vertically.
This 2D NoC connects to all high-speed data and memory interfaces of the FPGA. It operates like an elevated highway network superimposed on the city street system of the FPGA interconnect, supporting the high-bandwidth communication required between on-chip processing engines. Each row or column in the NoC is implemented as two 256-bit unidirectional, industry-standard AXI channels operating at 2 GHz, providing 512 Gbps of data flow in each direction.
By implementing a dedicated 2D NoC in Speedster, high-speed data movement is greatly simplified, ensuring that data streams can be easily directed to any custom processing engine within the entire FPGA architecture. Most importantly, the NoC eliminates congestion and performance bottlenecks that occur in traditional FPGAs when using programmable routing and logic lookup table resources to move data streams throughout the FPGA. This high-performance network not only increases the total bandwidth capacity of the Speedster 7t FPGA but also improves effective LUT capacity while reducing power consumption.
For example, to carry a 400G Ethernet data stream on a traditional FPGA, the best option is a 1024-bit bus, but that bus would have to run at 724 MHz, which is not achievable in a traditional FPGA. In other words, traditional FPGAs are not fast enough to handle 400G Ethernet bandwidth in this way.
In contrast, the Speedster 7t FPGA can carry the same traffic over its 2D NoC using four 256-bit buses running at 506 MHz.
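As a rough sanity check, the raw throughput of each bus configuration can be computed as width times clock frequency. The Python sketch below uses only the figures quoted in this section; the helper name `bus_gbps` is illustrative. The article does not say how the 724 MHz and 506 MHz figures were derived. Both configurations come out above 400 Gbps of raw capacity, and the margin presumably covers packet framing inefficiency.

```python
def bus_gbps(width_bits, freq_mhz, lanes=1):
    """Raw throughput in Gbps of `lanes` parallel buses (width x clock)."""
    return width_bits * freq_mhz * lanes / 1000.0


# One NoC channel: 256 bits at 2 GHz, as quoted for each direction.
noc_channel = bus_gbps(256, 2000)

# Traditional FPGA option quoted for 400G Ethernet: one 1024-bit bus at 724 MHz.
wide_bus = bus_gbps(1024, 724)

# Speedster 7t option quoted for 400G Ethernet: four 256-bit buses at 506 MHz.
four_buses = bus_gbps(256, 506, lanes=4)
one_of_four = bus_gbps(256, 506)

print(f"NoC channel:        {noc_channel:.1f} Gbps")
print(f"1024-bit @ 724 MHz: {wide_bus:.1f} Gbps")
print(f"4 x 256 @ 506 MHz:  {four_buses:.1f} Gbps ({one_of_four:.1f} Gbps per bus)")
```

Each 256-bit bus at 506 MHz needs only about a quarter of the 512 Gbps that a single NoC channel provides, and 506 MHz is well below the 750 MHz maximum quoted for the MLPs, which is consistent with the claim that the NoC-based approach is feasible where the single wide bus is not.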
Security Features for Safety-Critical Applications
The Speedster 7t FPGA series products can respond to threats from third-party attacks with advanced bitstream security protection features. They have multi-layer defense capabilities to protect the confidentiality and integrity of the bitstream. Keys are encrypted based on physical unclonable function (PUF) anti-tamper technology, and the bitstream is encrypted and verified using the 256-bit AES-GCM encryption algorithm. To prevent side-channel attacks, the bitstream is segmented, with each data segment using a separately derived key, and the decryption hardware employs differential power analysis (DPA) countermeasures. Additionally, a 2048-bit RSA public key authentication protocol is used to activate decryption and authentication hardware. Users can be assured that when they load their secure bitstream, it is the expected configuration, as it has been authenticated through RSA public key, AES-GCM private key, and CRC verification.
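The idea of giving each bitstream segment its own derived key can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Achronix's scheme: the article does not say which derivation function, segment size, or key hierarchy the devices use. Here HMAC-SHA256 over a segment index stands in for the key derivation, the names `derive_segment_key` and `split_segments` are hypothetical, and the AES-GCM encryption step itself is omitted because it is not part of the Python standard library.

```python
import hashlib
import hmac


def split_segments(bitstream: bytes, segment_size: int):
    """Split a bitstream into fixed-size segments (the last one may be shorter)."""
    return [bitstream[i:i + segment_size]
            for i in range(0, len(bitstream), segment_size)]


def derive_segment_key(root_key: bytes, segment_index: int) -> bytes:
    """Derive a 256-bit per-segment key as HMAC-SHA256(root_key, label || index).

    Assumed derivation for illustration only; the real device scheme is not
    described in the article.
    """
    info = b"bitstream-segment" + segment_index.to_bytes(4, "big")
    return hmac.new(root_key, info, hashlib.sha256).digest()


root_key = bytes(range(32))              # placeholder 256-bit root key
bitstream = bytes(range(256)) * 4        # placeholder 1024-byte "bitstream"
segments = split_segments(bitstream, 256)
keys = [derive_segment_key(root_key, i) for i in range(len(segments))]

for i, key in enumerate(keys):
    print(i, key.hex()[:16])
```

Because every segment is processed under a different key, an attacker measuring power consumption gets only a limited amount of data per key, which is the general rationale for this kind of countermeasure against side-channel analysis.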
Four Models of Speedster 7t FPGA Series
The Speedster 7t FPGA series currently consists of four models, with device sizes ranging from 363K to 2.6M 6-input lookup tables (LUTs).
In terms of specific performance figures, Achronix revealed that the largest model in the series, the 7t1500, can process up to 8,600 images per second on ResNet-50 when running at its maximum frequency of 750 MHz with 80% utilization and each MLP block supporting 16×INT8 operations. On the YOLOv2 algorithm, the 7t1500 reaches 1,600 images per second.
According to Achronix CEO Robert Blake, the ACE design tools supporting all Achronix products are now available, supporting Speedcore eFPGA and Speedchip™ FPGA multi-chip packaging (Chiplet). The first batch of Speedster 7t FPGA series devices and development boards for evaluation will be available in the fourth quarter of 2019.
Conclusion:
From the above, it is clear that the Speedster 7t series combines the programmability of FPGAs with the routing structure and compute engines of ASICs, mainly through its 2D network on chip and its high-density array of new machine learning processors. In this respect it resembles the new ACAP architecture that Xilinx launched last year.
It is worth noting that Achronix is currently the only company that provides both standalone FPGA chips and embedded FPGA (eFPGA) semiconductor intellectual property (IP), the latter under the Speedcore™ name. Chip designers can therefore license Speedcore™ eFPGA IP and integrate it into their own designs to build chips that meet their specific needs.
Moreover, Achronix uses the same technology in its Speedcore eFPGA IP as in the Speedster 7t FPGA, which supports seamless conversion from a Speedster 7t FPGA to an ASIC. Chip designers can thus work with Achronix to take a design developed on the latest Speedster 7t technology and convert it into an ASIC. Achronix CEO Robert Blake stated that this conversion is expected to help customers save up to 50% in power consumption and reduce costs by 90%.
Editor: Chip Intelligence – Lang Ke Jian