The Future of Data Center Architecture: x86, RISC-V, or ARM?

Article reprinted from: Techsugar

By: ANN STEFFORA MUTSCHLER

Source: Semiconductor Engineering

Translation | Editorial Department

Customization and heterogeneity are gradually becoming the mainstream development trends in data center architecture, shifting from processors manufactured by a single vendor to processors and accelerators jointly produced by multiple vendors (including system companies’ own design teams).

The shift is driven by the rising cost of powering and cooling server racks, the tighter integration needed to handle AI/ML applications, and a significant increase in the volume of data to be processed. Over the past five years, hyperscale data centers have gradually migrated to heterogeneous architectures. Coupled with the build-out of edge data centers at various levels, the entire data center industry is undergoing tremendous change.

It is precisely because of these changes that Intel, which previously refused to open its architecture to third-party IP, has turned to building a “democratized environment” for its chips. In addition to being willing to include Arm cores in its solutions, under multi-year agreements it has already reached, Intel has now joined RISC-V International as a Premier member.

It is still unclear how this will develop. In the near term, it opens the door to more customized processing elements and accelerators based on the RISC-V architecture, which will bring RISC-V designs into data centers for the first time, although in what capacity remains to be seen. From a longer-term perspective, it lays the groundwork for more customization by major chip suppliers that previously relied on each new turn of Moore’s Law as their competitive weapon.

That approach no longer works, as Apple’s M1 chip shows. Apple replaced the Intel chips in its laptops and desktops with processors based on internally designed Arm cores, tightly integrated with its native software, improving performance and extending battery life by up to five times. Reports indicate that Apple plans to switch its desktops and servers to Arm-based chips in the coming years.

Arm is also entering the enterprise. “Cloud computing plays a critical role in existing application areas such as media consumption, e-commerce, remote learning, standard communication, IT services, and digital transformation, and will become a core driving force for the development of new applications such as machine learning, the metaverse, autonomous driving, and smart IoT in the future,” said Dhaval Parikh, marketing director of Arm’s infrastructure business line.

Parikh pointed out that, to meet the growing demands of both existing and new cloud-based applications, hyperscale data centers and cloud service providers are looking to rebuild their next-generation data centers on purpose-built heterogeneous infrastructure.

As a result, competition in the market for heterogeneous data center architectures is becoming increasingly fierce. RISC-V is unlikely to replace the major processing elements any time soon, but it adds another customization option, and the architecture is expected to penetrate further into the data center in the coming years. Intel’s actions will only accelerate this transition. Intel Foundry Services stated earlier this month that Intel is collaborating with leading partners in the RISC-V ecosystem, including Andes Technology, Esperanto Technologies, SiFive, and Ventana Micro Systems, to ensure that RISC-V runs optimally on chips made by Intel Foundry and to shorten time-to-market.

“Currently, everyone seems to be focused on the two main advantages that RISC-V brings: it is an open-source ISA, and it does not require licensing fees,” said Gajinder Panesar, a researcher at Siemens EDA. “First, the open ISA only applies to the CPU. But this is not about CPUs; it’s about systems. You still need to integrate the core into an SoC, and the SoC needs to go into a data rack or some other device. So developing a CPU core is not the end of the job. Second, not paying licensing fees is very attractive to companies, especially startups, for whom those fees can be quite a large expense. For the big players in this market, however, licensing fees pale in comparison to the actual cost of producing a chip, which covers everything from design, implementation, verification, and validation through to manufacturing, especially for chips on leading-edge process technology. When you are paying around 80 million to 100 million dollars to manufacture a chip, saving 2 million or 3 million dollars on licenses becomes negligible. Developing a chip on an open-source ISA is another story: some of the cost can be offset, and special offers can be obtained from EDA tool vendors, but everything still needs to work smoothly. In addition, the chip has to be fitted into a software stack, with an operating system and security layers on top of it. Security audits also need to be conducted. All these costs add up.”
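To put Panesar’s figures in proportion, the following is a minimal sketch of the arithmetic, using only the rough dollar ranges quoted above (2 to 3 million in license savings against 80 to 100 million in manufacturing cost). The function name is hypothetical and the numbers are the quote’s approximations, not audited costs.

```python
def license_share_percent(license_cost_musd: float, manufacturing_cost_musd: float) -> float:
    """Return the license fee as a percentage of the manufacturing cost.

    Both arguments are in millions of US dollars. The inputs used below are
    the approximate figures from the quote, taken as illustrative assumptions.
    """
    return 100.0 * license_cost_musd / manufacturing_cost_musd


# Smallest share: cheapest license against the most expensive chip.
low = license_share_percent(2, 100)
# Largest share: most expensive license against the cheapest chip.
high = license_share_percent(3, 80)

print(f"License savings are roughly {low:.2f}% to {high:.2f}% of manufacturing cost")
```

On these assumptions the saving is a few percent of the manufacturing bill, which is the sense in which it becomes negligible for large players while still mattering to a startup.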

Collaboration of Various Parts

In the downstream market, processors are gradually shifting towards heterogeneous integration, triggering large-scale competition among companies. The ecosystem that supports heterogeneous integration is still under construction and will remain so for the foreseeable future. Moving from processors shipped in billions of units to custom designs, which may integrate a variety of chiplets and are built in much smaller production runs, remains a huge challenge for design teams.

“If the choice of processor were the only challenge they faced, design teams would keep making headway,” said Frank Schirrmeister, senior group director of solutions and ecosystems at Cadence. “However, from the perspective of RISC-V, designers who try to build custom chips face many challenges: selecting the right IP from the IP catalog, verifying it in hardware or software, choosing the right software, and dealing with all the potential 3D-IC integration issues. Then the IP needs complete and comprehensive verification, and the chip must be placed on a circuit board with enough airflow that it does not burn out the rest of the data center. There are plenty of challenges when designers make these decisions, and the choice of architecture is really just one of them. This is why it is important to consider how to make the design process easier and to ensure that mistakes do not occur in the parts you are responsible for. Frankly, this situation is a challenge for system designers.”

Homogeneous and heterogeneous chip stacks of logic memory (Image source: Cadence)

For EDA vendors, RISC-V is particularly interesting as an entry point for working more deeply with systems companies and large processor companies. “This is a huge opportunity because RISC-V is open-source, but development costs are still high,” said Natalija Colic, a digital design engineer at Vtool. “This is a highly customizable processor, and verification needs to keep pace with that customization. RISC-V has a great opportunity in server clusters, but effort is still needed to make the RISC-V ISA a viable competitor.”

She noted that the discussion around RISC-V is positive in other respects as well. “For example, this trend may force Arm, which has long held a monopoly position, to incorporate RISC-V into some of its traditional products. With Intel, Google, and Arm investing in RISC-V, it will certainly affect the market, not only for clusters that use RISC-V-based accelerators but also for smaller embedded chips, such as those we work on at Vtool.”

Slow Changes in the Data Center Market

Of course, none of this will happen overnight. Historically, data centers have been conservative regarding transformation, and EDA tools take time to develop. However, competition in the data center market is extremely fierce, and the introduction of heterogeneous architecture marks a turning point.

“We are seeing AI processors starting to use RISC-V, with varying degrees of customization, enhancement, and extension,” noted Rupert Baines, chief marketing officer of Codasip, pointing out that RISC-V’s success so far has been limited to AI, accelerators, and specific components from companies like Esperanto and Mythic. “What you are seeing is deeply embedded applications. For years, Nvidia has been using RISC-V for minion cores and controller cores, not for the actual GPU or AI functions, but for everything else. So we see RISC-V steadily being used in data centers in many ways, but not yet for heavy-duty, Xeon-class application processors. That space remains dominated by Intel, with AMD firmly established and Arm making early progress. Nvidia, Ampere, and Marvell are also entering this field, while RISC-V has not yet fully penetrated it. But RISC-V will ultimately go deeper into this field.”

In fact, Baines predicts that mainstream data center application processor cores based on RISC-V ISA could become more common within just three to four years.

At this point, the real value may lie more in integrating the various components than in having a single vendor that does everything. Breaking a design down on paper is simple; reassembling all the parts into a secure, efficient, and reliable device is much harder, which is why large chip manufacturers like Intel and AMD have been scrambling to adopt a chiplet/tile approach to put everything together. Foundries like TSMC have also been researching this method, using hybrid bonding to speed up data movement between chiplets.

Chips built from chiplets (Image source: Cadence)

RFIC packaging collaborative design (Image source: Cadence)

3D-IC packaging (Image source: Cadence)

All of this helps explain the ongoing announcements and repositioning across the entire processor field. Industry insiders report that Arm has recently begun working with startups to offer more flexible licensing terms, saving customers time and effort in design.

“If Arm really fits your project, you should choose Arm because it has been tested and has all the functions your project needs,” said Olivera Stojanovic, a project manager at Vtool. “But if you need a more specific design, then RISC-V might be a better choice. But remember, comprehensive verification of CPU functionality is essential. A verification process needs to be carried out to ensure that a CPU based on the open-source ISA is fully verified.”

Potential Market Trends

While RISC-V has certainly garnered a lot of interest, its success is less about being able to drive large-scale changes in data centers and more about the widespread changes occurring in the market.

“Consumer demand is driving changes in data center architecture toward workload optimization,” stated Schirrmeister from Cadence. “Data center providers need to offer solutions tailored to specific sets of workloads, so how can that best be achieved starting from the underlying processors? This is why it is not inherently about RISC-V. A whole series of decisions then has to be made about interfaces with other devices. Which buses are supported? Can it scale well? Does it meet the requirements?”

In this regard, RISC-V may just be one of many options. “If I were a system architect, I would take a chiplet-based RISC-V core and integrate it, and then I would have to figure out whether there is software support,” he stated. “Moreover, I also need to manage my risk appetite. If problems arise, can I transfer the risk to someone else, or do I take it all on myself? This is an obstacle that needs to be overcome. If you have figured that out, if there is software support, and if you are comfortable with the risks that might come from introducing RISC-V drivers, then, against the backdrop of the 50 decisions you must make, the choice of RISC-V plays an important role.”

Uncertain Future Development

So will an ISA like RISC-V influence data center architecture over time? Codasip’s Baines believes this is a trend for the future.

“One reason is about controllers versus decision-makers,” Baines said. “If you are Google or Facebook, then you are the hardware supplier; for the past 5 or 10 years, every Google data center has been filled with Google servers designed to Google specifications. More and more companies (Google, Facebook, Microsoft) are customizing not only their hardware but also their chips. By customizing, these companies control the stack from top to bottom. So if they want to, they will specify the programming language. Perhaps they will use Swift or Objective-C or Go. These languages may differ from those used by other companies, but that doesn’t matter. They also have their own toolchains, and if they switch to a different ISA, that is also within their control. If these companies see the advantages and benefits, they will enter the market. This goes back to the concept of functional computation and domain-specific computation. If you are vertically integrated and control both software and chips, then investing in functional computation, heterogeneous computation, and domain-specific architectures makes a lot of sense, which means you need to control the architecture. You cannot rely entirely on an independent third-party supplier.”

Google Cloud TPU at Hot Chips 2019 (Image source: Semiconductor Engineering / Susan Rambo)

Meanwhile, some of these companies are constantly reassessing computing architectures. “When we look at today’s system architecture, it should be about systems, not CPUs,” said Panesar from Siemens. “People talk about high-end CPUs and how to meet system requirements. But in reality, the CPU needs to be placed in the context of applications. I am disappointed because there is almost no innovation. If you scrape off the RISC-V label and slap an Arm label on it, you really can’t tell the difference. Aside from whether it is a 32-bit or 64-bit processor, there is no other distinction. An opportunity is being lost here, because much more could be done to change the architecture at a deeper level than what is being done now. Domain-specific architectures and in-memory computing concepts will not become mainstream. There may be some niche markets focusing on them, but future innovation will come from breaking the existing paradigms. For example, a cache-based system is the same architecture I started with many years ago, except with new buzzwords or acronyms. It is more or less the same. I am not too convinced about caches and coherency, because this is the paradigm people have held onto while they look for ad-hoc solutions for new applications.”

For some time, arguments have been made for applying more domain-specific architectures in data centers.

“At this stage, data centers tend to be very general-purpose,” said Panesar. “This burdens those who do not want everything, and it cannot provide consistent, appropriately optimized solutions for applications. In reality, you are not effectively serving other potential customers. We need to step back and ask what the goal is. Our goal is to provide innovations and products that address the problems and challenges we face in the 21st century, rather than just taking existing solutions and improving them. There is an opportunity to adopt an ISA that can be modified and place it into specific or domain-specific systems. This is the source of innovation. It will not come from how good the designed CPU is. It is about systems. To achieve this, there needs to be an opportunity for all CPUs to look different.”
