The Evolution of the FPGA Programming Software Stack
The billion-dollar data center market is divided among Altera, Xilinx, and other FPGA suppliers, and it grew even more complex after Intel acquired Altera in June 2015.

In 2014, the year before the acquisition, Altera booked $1.9 billion in revenue, 16% of it ($304 million) from data center-related computing, networking, and storage. The communication and wireless equipment makers who had relied on FPGAs for over a decade wanted higher energy efficiency, lower costs, and greater scalability, all areas where FPGAs excel. It is also worth noting that, unlike CPUs, FPGAs performing these functions require no operating system or accompanying software. This segment accounted for 44% of Altera's revenue, or $835 million. Another 22%, or $418 million, came from industrial control, military equipment, and automotive manufacturing; those customers face the same dilemma and likewise chose FPGAs for part of their workloads.

In fact, as early as 2014 Intel was eyeing a potential chip market valued at $115 billion. Programmable logic devices (mainly FPGAs) accounted for about 4% of it, ASICs for 18%, and the rest was a mixed bag of ASSPs. Within programmable logic, Intel estimated that Altera held 39% of the $4.8 billion market, Xilinx 49%, and the remaining suppliers 12%.

At the time, Intel's reason for acquiring Altera was not simply that the FPGA business was growing almost as fast as its own Data Center Group (which supplies chips, chipsets, and motherboards to server, storage, and switch manufacturers). Intel also acted because the slowing pace of Moore's Law was turning FPGAs into a growing competitive threat: a data center that adopts acceleration installs not just one FPGA, GPU, or DSP accelerator but many, and each one cuts the number of Xeon CPUs needed.
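The segment figures quoted above can be checked with a few lines of arithmetic (our own sketch, simply multiplying the stated shares by the $1.9 billion total):

```python
# Quick check of the Altera 2014 revenue breakdown quoted above.
# All shares and the $1.9B total come from the text.
altera_revenue_b = 1.9  # Altera's 2014 revenue, in billions of dollars

segments = {
    "data center (computing/networking/storage)": 0.16,
    "communications and wireless": 0.44,
    "industrial/military/automotive": 0.22,
}

for name, share in segments.items():
    # Convert billions to millions for readability.
    print(f"{name}: ${altera_revenue_b * share * 1000:.0f}M")
# 16% -> $304M; 44% -> $836M (the text rounds to $835M); 22% -> $418M
```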
Since Intel could not keep adding more cores and accelerators to its Xeons, it concluded that FPGAs could serve as the accelerators. If FPGAs can generate $500 million in data center revenue, or $1 billion or more within a few years, Intel would rather sacrifice two to three times that in Xeon revenue to its own FPGAs than relinquish the income to someone else.

With Deep Learning, FPGAs Have a Bright Future

According to Intel's forecasts, its FPGA business will grow at a nearly linear rate from now until 2023. We have always been skeptical of that, although the FPGA business has certainly grown over time (it is about 2.5 times larger than 15 years ago). Intel expects FPGA revenue to double between 2014 and 2023, a compound annual growth rate of 7%, putting 2023 revenue slightly under the predicted $8.9 billion. Interestingly, Intel's forecast does not break out the data center share (servers, switching, and networking) of FPGA revenue, which will change significantly.

Let's analyze. If Altera's and Xilinx's market shares remain unchanged, and Altera's networking, computing, and storage business holds its share, that segment would bring Altera about $560 million by 2023. We believe Intel's numbers underestimate the pressure on data centers to deliver more efficient and flexible computing, and that FPGAs' prospects are far better than this forecast suggests. In other words, many FPGA supporters have long awaited the day when FPGAs are legitimized in the data center. Ironically, Intel itself, an expert in FPGA programming, a user of hardware description languages, and a well-known ASIC manufacturer, has become a major force promoting FPGAs as the accelerator of choice.
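As a sanity check on Intel's forecast, the compound-growth arithmetic can be sketched in a few lines (our own back-of-the-envelope calculation, not Intel's model; the 7% rate, $4.8 billion 2014 market, and $304 million Altera data center figure all come from the text above):

```python
# Back-of-the-envelope check of the 2014-2023 forecast arithmetic.
# Figures from the text: 7% compound annual growth, a $4.8B
# programmable logic market in 2014, and Altera's $304M data
# center segment in 2014.

def compound(base, rate, years):
    """Grow `base` at `rate` per year for `years` years."""
    return base * (1 + rate) ** years

years = 2023 - 2014  # nine years of growth

# A 7% CAGR over nine years is roughly a doubling...
print(f"growth factor: {compound(1.0, 0.07, years):.2f}x")  # ~1.84x

# ...which takes the $4.8B market to just under the $8.9B forecast...
print(f"2023 market: ${compound(4.8, 0.07, years):.1f}B")

# ...and Altera's $304M data center segment to about $560M.
print(f"Altera data center, 2023: ${compound(0.304, 0.07, years) * 1000:.0f}M")
```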
Such accelerators can serve as standalone discrete computing components or as hybrid CPU-FPGA devices. This is why, since 2016, virtually all the news about Altera has pointed to massive growth for FPGAs. At least in the short term, then, Intel has little choice but to do work that also benefits other FPGA manufacturers.

The acquisition is not only a milestone in FPGA development but also Intel's acknowledgment of FPGAs' immense potential. As powerful computing accelerators, FPGAs will not only influence major corporate decisions and market trends but also speed up enterprise workloads, power internal search in large-scale data centers, and raise the profile of high-performance computing simulations. Entering 2016, FPGAs added machine learning and deep learning to their list of applications, giving the industry another shot in the arm.

Why Everyone Favors FPGAs

First of all, the software stack for programming FPGAs has evolved; with Altera's help in particular, FPGAs have gained support for the OpenCL development environment. But not everyone is a fan of OpenCL. Nvidia, for its part, created its own CUDA parallel programming environment for its Tesla GPU accelerators. And SRC Computers, which had supplied hybrid CPU-FPGA systems to the defense and intelligence sectors as early as 2002, further commercialized its Carte programming environment by mid-2016; Carte automatically converts C and Fortran programs into the FPGA's hardware description language (HDL).
Another factor driving FPGA adoption is that shrinking chip manufacturing processes keeps getting harder, which makes it harder to improve multi-core CPU performance. CPU performance has made significant leaps, but mostly in aggregate throughput rather than in the performance of individual cores (architectural enhancements, as we know, are challenging). Both FPGA and GPU accelerators, by contrast, have delivered compelling improvements in performance per watt.

According to Microsoft's operational tests, hybrid CPU-FPGA and CPU-GPU systems deliver comparable performance per watt when executing deep learning algorithms. GPUs run hotter at similar performance per watt, but they also bring more raw compute capability.

Improved performance per watt explains why the world's most powerful supercomputers moved to parallel clusters in the late 1990s, and why they are now turning to hybrid machines, including Intel's next powerhouse, the Xeon Phi processor "Knights Landing" (KNL for short). With Altera FPGA coprocessors alongside Knights Landing Xeon Phi processors, Intel can not only defend its high-end advantage but also stay ahead in its competition with the OpenPOWER alliance formed by Nvidia, IBM, and Mellanox.

Intel firmly believes that workloads in the hyperscale, cloud, and HPC markets will grow rapidly. To keep its computing business growing robustly, it could only become a seller of FPGAs itself, or watch others seize that path.

But that is not how Intel frames it publicly. "We don't think this is a defensive battle or anything like that," Intel CEO Brian Krzanich said at a press conference following news of the Altera acquisition. "We believe the Internet of Things and the data center are both enormous. These are also the products our customers want to build."
"Based on our predictions of how trends and markets will develop, 30% of cloud workloads will run on these products. This shows that these workloads can be moved into silicon one way or another. We believe the best approach is to combine Xeon processors, which have the industry's best performance and cost advantages, with FPGAs. That will bring better products and performance to the industry. In IoT, this expands our potential market against ASICs and ASSPs; in the data center, it moves workloads into silicon and drives rapid growth in the cloud."

Krzanich explained: "You can think of an FPGA as a pile of gates that can be programmed at any time. As customers' algorithms become smarter over time and with learning, the FPGA can serve as an accelerator in multiple areas; it can run facial search while encrypting, and it can essentially be reprogrammed within microseconds. That is much cheaper and more flexible than large, single-purpose custom parts."

Intel Sees Greater Opportunities

After the acquisition, Krzanich announced that by 2020 up to one-third of cloud service providers would use hybrid CPU-FPGA server nodes: shocking news. It also hands Altera, which has been targeting data centers since the end of 2014, an FPGA opportunity of roughly $1 billion, about three times the revenue of Nvidia's currently popular Tesla compute engines.

In early 2014, Intel showed a prototype chip with a Xeon and an FPGA in the same package and planned to launch it in 2017. That came shortly after then Data Center Group GM Diane Bryant proposed the concept of a Xeon with FPGA circuits. At the conference announcing the Altera deal, Krzanich did not give a timeline for launching the Xeon-FPGA device, but he did say Intel would create a single-die hybrid Atom-FPGA device aimed at the IoT market.
In the transitional phase, Intel is considering a single-package hybrid of an Atom and an Altera FPGA.

In early 2016, on a Pacific Crest Securities conference call, Jason Waxman, general manager of Intel's Cloud Infrastructure Group, discussed Intel's data center business with research analysts, and FPGAs proved a hot topic. Although he named no customer and gave no device specifications, Waxman confirmed that Intel had already shipped samples of a hybrid Xeon-plus-FPGA compute engine to certain customers.

During the call, Waxman elaborated on what drove Intel's acquisition of Altera and its push into programmable computing devices. Intel clearly hopes to make FPGAs mainstream, even if that encroaches on some of Xeon's data center business. (We believe that because Intel sees this cannibalization as inevitable, it regards making FPGAs part of the Xeon lineup as the best way to control it.)

Waxman said: "I think this acquisition could involve many things, some of which go beyond the scope of the Data Center Group. First, a potential core business is often driven by a manufacturing lead; we can manage that well, and there is good synergy in doing so. Furthermore, the IoT group is also very interested. As far as we know, the growth of certain large-scale workloads (such as machine learning and certain network functions) is attracting more attention."
"We realize we might achieve some breakthroughs in performance, which would be a good opportunity to take FPGAs beyond data center applications into broader, better-suited fields."

Within the Data Center Group, however, FPGAs are simply companions to CPUs, helping solve problems for cloud service providers and other kinds of large-scale users. Intel believes the priority applications with high demand for FPGA acceleration include machine learning, search engine indexing, encryption, and data compression. As Waxman pointed out, these workloads are often highly targeted, with no single unified use case. This is the basis for Krzanich's assertion that one-third of cloud service providers will adopt FPGA acceleration within five years.

Overcoming FPGA Barriers

Although everyone complains about how difficult FPGAs are to program, Intel is not backing down. Without revealing too much of the plan, Waxman proposed some ways to make FPGAs easier to use and understand.

Waxman said: "What we have is unique, something others cannot offer: we understand these workloads and can drive acceleration. We see a shortcut to advancing machine learning, accelerating storage encryption, and accelerating network functions. That comes from our deep understanding of these workloads, and it is why we see such opportunities. But FPGAs still face difficulties because people are writing RTL. We are a company that writes RTL, so we can solve this problem. First we make it work; then we lower the entry barrier; the third step is true economies of scale, which rests on integration and manufacturing strength. To address these barriers, we are providing a range of methods."

X86 + FPGA?

For those speculating that Intel intends to replace Xeons with FPGAs, Waxman called the idea nonsense. For algorithms that demand high throughput and heavy repetition, he said, FPGAs' inherent advantages make them the best choice.
FPGAs are also candidates for data operations and transformations that require extremely low latency.

Given that Altera already integrates ARM processors and FPGAs on a single SoC, it is natural to assume Intel will swap the ARM cores for X86 cores to create similar devices. That does not appear to be happening. Indeed, during Intel's second-quarter 2016 earnings call, Krzanich promised that Intel would strengthen support for current customers of Altera's ARM-FPGA chips.

Waxman clarified further: "Our view is that we will integrate FPGAs into Xeons in some form. We have publicly said the first-generation devices will use a single package, but we will adjust direction as we go and may eventually put both on the same die. Customer feedback will tell us what the right combination is. By the way, I still expect to see non-integrated systems that retain their system-level synergy. We will not combine Xeons and FPGAs in every possible way; instead, we will find the right targets and the right balance in the market."

Programming Issues Take Center Stage

Although Altera's toolset uses the OpenCL programming model to take application code and convert it into RTL (the native language of FPGAs), Intel, interestingly, does not believe FPGAs' future success in the data center hinges on better integration of the OpenCL and RTL tools, or on broader adoption of OpenCL.

"This is not based on OpenCL," Waxman emphasized. While Intel does see OpenCL as a way to widen the range of FPGA applications, the initial cloud deployments of FPGAs will likely be carried out by highly capable companies that do not need Intel to supply OpenCL, he added. And while he could not speak freely about it, Waxman hinted that Intel plans to make FPGAs easier to program.
He said Intel will give programmers RTL libraries so they can invoke routines already deployed on FPGAs, with the gates for those application routines already formed, rather than having to create the routines themselves. This makes sense; it is similar to what Convey (now a division of Micron Technology) did with its FPGA-accelerated systems years ago.

Waxman said: "I think there is a continuum of acceleration. At first you might not know what you are trying to accelerate; you are just experimenting, so at that stage you want something more general-purpose. When you really commit to accelerating something, you want it more efficient, lower power, and smaller, and that is when you focus on FPGAs."

Waxman also cited Microsoft's use of FPGAs in its "Catapult" systems as an example. Those systems take Microsoft's Open Cloud Server and add FPGA mezzanine cards as accelerators. We examined the project in March: running the same image recognition training algorithms used at Google, a 25-watt FPGA device delivered better performance per watt than a server using 235-watt Nvidia Tesla K20 GPU accelerators.

As we said then, we do not doubt the performance figures released by Microsoft and Google. But it is unfair to weigh discrete GPUs or FPGAs against each other on raw performance and thermal profile alone; you have to look at the server node level. Viewed that way, the FPGA-assisted Microsoft server only slightly leads the Google server with Tesla K20s at the system level (these are just our estimates based on image processing performance per watt). Nor should the comparison ignore cost: frankly, unlike the fully equipped Tesla GPUs, Microsoft's Open Cloud Server figures did not account for power delivery or cooling. A real evaluation would use GPU mezzanine cards and weigh heat, performance, and price together.

The focus of Waxman's argument, however, remains this:
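The node-level accounting described above can be sketched as simple arithmetic. Only the device powers (25 W for the FPGA, 235 W for the Tesla K20) come from the text; the throughput and host-power numbers below are hypothetical placeholders chosen only to illustrate how a large device-level perf-per-watt lead can shrink to a slight one once the host server's power is counted:

```python
# Sketch of device-level vs node-level performance-per-watt.
# Only the device powers (25 W FPGA, 235 W Tesla K20) come from the
# article; throughputs and host power are hypothetical placeholders.

def perf_per_watt(images_per_sec, watts):
    return images_per_sec / watts

fpga_throughput = 550.0   # hypothetical images/second on the FPGA
gpu_throughput = 1000.0   # hypothetical images/second on the GPU

# Device-level comparison: the 25 W FPGA looks roughly 5x better.
fpga_device = perf_per_watt(fpga_throughput, 25.0)
gpu_device = perf_per_watt(gpu_throughput, 235.0)

# Node-level comparison: add a hypothetical 200 W host server to each.
host_watts = 200.0
fpga_node = perf_per_watt(fpga_throughput, 25.0 + host_watts)
gpu_node = perf_per_watt(gpu_throughput, 235.0 + host_watts)

print(f"device level: FPGA {fpga_device:.1f} vs GPU {gpu_device:.1f} img/s/W")
print(f"node level:   FPGA {fpga_node:.2f} vs GPU {gpu_node:.2f} img/s/W")
# With these placeholder numbers, a ~5x device-level lead becomes a
# slight node-level lead, which is the article's point about comparing
# at the server-node level.
```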
"At some point, you really want the solution that surprises you and does it at lower power. That is what our FPGA solution excels at."

Cloud Business

Lastly, Intel's cloud business needs to be considered. Cloud customers currently account for 25% of the Data Center Group's revenue, and their purchasing volume grows roughly 25% a year. Starting in 2016, the Data Center Group's overall business is expected to grow about 15% a year in the coming years. Let's do some calculations.

If Intel's plans stay on schedule, Data Center Group revenue will reach $16.6 billion in 2016. Cloud service providers (including cloud builders and hyperscale computing users, to use our language at The Next Platform) account for about $4.1 billion of that, leaving about $12.5 billion for the rest of Intel's data center business. That puts Intel's data center growth, excluding cloud, at around 12%, half the cloud rate. Intel needs to serve the cloud's growth and its apparent demand for FPGAs, even if FPGAs take a little of Xeon's capacity; for Intel, that choice beats letting GPU acceleration keep growing.

Programming may be the primary obstacle to widespread FPGA adoption (unlike other accelerators, such as Nvidia GPUs with CUDA, which enjoy rich development ecosystems). This is driving programmers to extend designs in C or use OpenCL rather than the low-level models that have plagued FPGA development in the past. But even with so many milestones along the way, FPGAs remain out of favor in the mainstream.
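The calculation above can be written out explicitly (our own sketch; the $16.6 billion, 25% cloud share, 25% cloud growth, and 15% overall growth figures come from the text):

```python
# Sketch of the Data Center Group revenue arithmetic from the text.
dcg_revenue = 16.6    # $B, projected 2016 Data Center Group revenue
cloud_share = 0.25    # cloud customers' share of DCG revenue
cloud_growth = 0.25   # cloud purchasing growth per year
total_growth = 0.15   # expected DCG growth per year

cloud_revenue = dcg_revenue * cloud_share      # ~$4.1B (text's figure)
rest_revenue = dcg_revenue - cloud_revenue     # ~$12.5B

# Back out the non-cloud growth implied by the blended 15% rate:
#   total_growth = cloud_share * cloud_growth + (1 - cloud_share) * rest_growth
rest_growth = (total_growth - cloud_share * cloud_growth) / (1 - cloud_share)

print(f"cloud ${cloud_revenue:.2f}B, rest ${rest_revenue:.2f}B")
print(f"implied non-cloud growth: {rest_growth:.1%}")
# ~11.7%, the article's "around 12%, half the cloud rate"
```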
We will explore the methods of, and opportunities for, solving the programming problem.

We have spoken with many suppliers in this relatively small ecosystem (including Altera and Xilinx, the two main suppliers). According to long-time FPGA researcher Russell Tessier, the day when FPGAs flourish in a broader market is still ahead, but new developments point toward broader adoption. Tessier has studied FPGAs for over twenty years at the University of Massachusetts (he also worked at Altera and founded a virtual machine engineering firm later acquired by Mentor Graphics), and he believes FPGAs are finally transitioning from science projects to enterprise applications. The key, in his view, lies in improved design tools, as designers keep refining their high-level designs and tool vendors get better at guiding chip development. He adds that the sheer amount of logic now on these devices lets users implement more functionality, making FPGAs attractive across more fields.

Tessier said: "A significant trend for FPGAs in recent years is that these devices are getting easier to program."

Xilinx now encourages designing in C with its Vivado tools, and Altera has developed an OpenCL environment. The point is that both companies are trying to create environments where users can work in more familiar languages (like C and OpenCL) rather than relying on RTL design experts fluent in Verilog or VHDL. Good progress has been made in recent years; the work is still advancing, but it will help carry FPGAs further into the mainstream.

One factor that truly benefits FPGAs is that, paired with a host chip, they can establish a fast internal interconnect, easing the limits of memory and data movement. That advantage was a main incentive for Intel's acquisition of Altera. Furthermore, if large companies like Intel and IBM actively build out FPGA software ecosystems, the application market will expand rapidly.
The mainstreaming of FPGAs (which today are at least not as prominent as GPUs) may come faster than expected.

Tessier explains: "Tighter integration with standard processor cores is definitely key. The past barriers were languages and tools, and as those barriers shrink, a door opens to new collaboration opportunities for chip suppliers." With these and other mainstreaming trends emerging, FPGAs' ever-changing set of application areas will keep growing. Financial services shops, for example, were among the first to use FPGAs for financial trend and stock-selection analysis, but the use cases are expanding, and today's more powerful devices can take on bigger problems.

Broader Application Areas

FPGAs are also finding new uses in other emerging fields, including DNA sequencing, security, encryption, and some critical machine learning tasks. Naturally, the hope is that FPGAs will grow powerful enough to enter the world's largest cloud and hyperscale data centers. Hemant Dhulla, vice president of Xilinx's data center business, strongly agrees. At the beginning of 2016, he told The Next Platform that "heterogeneous computing is no longer a trend but a reality," by which point Microsoft had publicized its FPGA-based Catapult deployments (already numerous, or soon to be), Intel had acquired Altera, and claims about the wide application of FPGAs in data centers were multiplying.

From machine learning to high-performance computing to data analytics, FPGAs are emerging in increasingly diverse application areas, helped by the growing amount of on-chip memory embedded in FPGAs, something both manufacturers and potential end users have been anticipating.

Dhulla said the market potential is large enough that Xilinx has adjusted its business approach. In recent years, storage and networking have dominated the FPGA customer base.
Over the next five years, however, demand on the compute side will far outgrow storage and networking, which will continue along a steady growth line.

In other popular FPGA areas (including machine learning), FPGAs act more as accelerators "collaborating" with GPUs. For the training side of many machine learning workloads, GPUs are undoubtedly dominant: enormous compute is needed there, much as in HPC, and the power-envelope tradeoff is worthwhile. But those customers buy GPUs by the dozens or hundreds, not the tens of thousands; the truly large volume of accelerators goes to the inference side of machine learning pipelines, and that is where the market lies.

As we have pointed out, Nvidia counters this with two separate GPUs (the M40 for training and the lower-power M4, slotted in to reduce server load, for inference), but Dhulla believes FPGAs in PCIe form can still undercut that power consumption and can be embedded throughout hyperscale data centers. Xilinx's SDAccel programming environment makes this more practical by providing high-level interfaces in C, C++, and OpenCL, but the true path to hyperscale and HPC adoption runs through end-user examples.

Those early users set the stage for the next generation of FPGA applications, and Dhulla points to companies like Edico Genome. Xilinx is also working with customers in other fields, including oil and gas and financial-history computation. Early customers are applying Xilinx FPGAs to machine learning, image recognition, analytics, and security, which can be seen as the first steps in building a compute acceleration business.

Despite FPGAs' poor double-precision performance and overall pricing, their real large-scale opportunity lies in the cloud, because FPGAs can offer advantages that GPUs cannot.
If FPGA vendors can convince end users that their accelerators deliver significant performance improvements (in some cases they do), and can blunt the pricing problem by offering FPGAs in the cloud, with a programming environment whose maturity approaches that of other accelerators' (like CUDA) as OpenCL development advances, that could be FPGAs' new hope.

Of course, this hope rests on deploying FPGAs within ultra-dense cloud server architectures rather than selling single machines, a model that has already appeared in FPGAs' financial services business.

Just as their GPU accelerator "partners" rode deep learning to win users quickly, FPGA devices are probing for real openings to penetrate the market by solving neural network and deep learning problems. A host of new applications means new markets, and as cloud delivery strips away some of the management overhead, it could mean broader adoption. FPGA vendors are pushing hard into key areas of machine learning, neural networks, and search, and FPGAs are becoming increasingly common in contexts like natural language processing, medical imaging, and deep data inspection.

Over the past year, many FPGA applications have come to light, especially in deep learning and neural networks, as well as image recognition and natural language processing. Microsoft, for example, uses FPGAs across 1,632 nodes to double the throughput of its search service, with an innovative high-throughput network supporting the Altera FPGA-driven workload.
China's search engine giant Baidu (also a heavy GPU user for deep learning and neural network tasks) runs storage control on FPGAs, with daily data throughput between 100 TB and 1 PB. These large-scale data center deployments and others are drawing more attention to FPGAs' single-precision floating-point performance. And while some deployments (including Baidu's) use GPUs as compute accelerators and put FPGAs on the storage side, researchers from Altera, Xilinx, Nallatech, and IBM within the OpenPOWER alliance are demonstrating FPGAs' bright prospects for deep learning in the cloud.

It is fair to say we are now in a golden age for FPGAs.

Source | FPGA Research Institute