Source: TechSugar
The rise of artificial intelligence brings a host of brand-new challenges, along with a dazzling array of innovative choices and necessary trade-offs.
As AI is increasingly applied in edge devices, coupled with the growing demand for new functionalities, chip manufacturers are forced to rethink when and where to process data, what kind of processors to use, and how to build enough flexibility within the system to meet the application needs of multiple markets.
Unlike cloud solutions, which can draw on nearly unlimited resources, edge computing is strictly constrained in power, area, and cost. In return, compared to sending all data to the cloud, edge computing is cheaper, faster, and more secure. And with the rise of AI, the edge opens up a wealth of innovation and new opportunities for inference and some limited training.
Thomas Rosteck, President of Infineon’s Connected Secure Systems Division, stated in a recent speech: “AI is moving from the cloud to edge devices. With this intelligent technology, we can save more power because acquiring data, transmitting it to the cloud, computing in the cloud, and then retrieving it consumes a lot of power. This is a problem we can solve with edge AI, and we also contribute to carbon reduction. If data is not transmitted but kept locally, it is safer.”
The edge encompasses a wide range of systems, from mobile devices connected to a single battery to local data centers. Regardless of the architecture, these devices share a common need to process, store, and move increasing amounts of data at rates consistent with applications. They all need to execute these functions consistently and reliably, regardless of processor and memory utilization, as well as physical influences such as noise, heat, or vibration.
Due to their size constraints, devices like smartphones have been dealing with such issues for years, using a variety of processing elements and complex thermal management to avoid burning users. These measures include checkerboarding which transistors are active to reduce dynamic thermal density, placing heat sinks and thermal monitors where they are needed, and mixing different types of processors, including some manufactured at leading-edge nodes.
Smartphone vendors can absorb these costs because they can be spread across millions of devices. However, for many other IoT/edge devices, the business environment is very different, price-sensitive, and sales volumes are much lower. For these products, vendors often rely on off-the-shelf commercial components like MCUs and DSPs, many of which are continuously evolving to meet growing computational demands. In some cases, device manufacturers also combine general-purpose processors with more targeted semi-custom accelerators, which can enhance critical performance and limit the energy required to execute specific computations. As AI models begin to appear everywhere, they are using all these components in new ways and combinations.
John Weil, Vice President and General Manager of Edge AI Processor Business at Synaptics, stated: “We are currently in the second phase of AI, which is why you see things like AI centers. For example, previous security cameras could identify people walking down the street. But now, this task is no longer performed by the camera but is completed in a centralized box without cloud connectivity. In the past, we needed the cloud to achieve this. Now we can do it with very low-cost products. The third phase of AI will include new product definitions, creating new products that change our lives. The market is starting to define new product categories that did not exist before AI.”
Edge AI not only improves processing efficiency but also changes how processors are utilized within and between devices. In many cases it means rethinking the system architecture and how computational resources are shared. Even noise filtering for a set-top box can now take a very different design approach than in the past.
“Taking far-field voice communication as an example, you have a device you are talking to, like Alexa, Google Home, or a set-top box,” said Prakash Madhvapathy, Director of Product Marketing for Cadence Tensilica Audio/Voice DSP. “Some will take the voice input first and filter it to reduce noise before passing it on for processing. That is one method. Others feed in the noisy signal directly, training the device on both noisy and clean signals so that the trained AI can interpret noisy data as well as it does clean data. In that case the noise becomes part of the actual training data, and the AI learns to separate noise from signal and filter the noise out. If, with proper training, it infers the same result it would from clean data, you get a very close answer.”
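The second approach Madhvapathy describes, training on pairs of noisy and clean signals so the model learns to see through the noise, can be sketched in a few lines. The sketch below is a minimal stand-in, not a production voice pipeline: a least-squares linear denoiser trained on synthetic tones, with all signals and parameters invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(n, length=256, noise_std=0.5):
    """Synthetic (noisy, clean) pairs: tones plus additive Gaussian noise."""
    t = np.arange(length)
    freqs = rng.uniform(0.01, 0.1, size=n)          # cycles per sample
    clean = np.sin(2 * np.pi * freqs[:, None] * t)
    return clean + rng.normal(0, noise_std, clean.shape), clean

def fit_denoiser(noisy, clean, taps=17):
    """Learn a linear filter mapping noisy windows to the clean sample.
    The noise is part of the training data, as in the quote."""
    X, y = [], []
    for xn, xc in zip(noisy, clean):
        for i in range(taps, len(xn) + 1):
            X.append(xn[i - taps:i])
            y.append(xc[i - 1])                     # clean value at window end
    w, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return w

def denoise(w, x):
    taps = len(w)
    out = x.copy()
    for i in range(taps, len(x) + 1):
        out[i - 1] = x[i - taps:i] @ w
    return out

noisy, clean = make_batch(20)                       # training pairs
w = fit_denoiser(noisy, clean)
tn, tc = make_batch(5)                              # held-out signals
mse_before = float(np.mean((tn[0] - tc[0]) ** 2))
mse_after = float(np.mean((denoise(w, tn[0])[16:] - tc[0][16:]) ** 2))
```

A real far-field pipeline would use a neural model and recorded room noise, but the training principle is the same: the filter is fit to reproduce the clean target from the noisy input.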
AI is also expected to bridge the gap between EDA tools and the devices created with those tools, drawing the two closer together. Ravi Subramanian, General Manager of System Design at Synopsys, stated: “Generations of von Neumann architectures have driven EDA development, which has greatly benefited us. That learning has matured significantly, and we have refined the tools to their current level. But now we must engage in entirely new learning, which can be achieved through AI. Today we can build models, and that gives us a great opportunity, one that did not exist before AI, to learn how to drive tool development. These are questions we discuss directly with our customers. Beyond that, you must also think about how to apply these technologies.”
Thermal Management Issues
For decades, data centers and smartphones have struggled with thermal and energy-efficiency issues, which grow more challenging as data volumes rise. At the edge, however, processing demands have traditionally been lower in many IoT applications, and heat has not been a significant issue.
Take Bluetooth devices as an example. “We haven’t found this to be a major issue, primarily because these small IoT devices are low power overall,” said Marc Swinnen, Director of Product Marketing at Ansys. “They run on batteries, sometimes on harvested energy, and are therefore designed for low power, which means thermal management is usually not a big issue for these devices. Moreover, the power a chip consumes is proportional to its surface area rather than its volume, so as chip sizes shrink, power consumption shrinks with the area. Cooling is also proportional to surface area. If you halve the chip size, the heat-generating area is halved, but so is the cooling surface. Unlike volume effects, the two tend to scale in parallel.”
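Swinnen's scaling argument, that heat generated and heat shed both track surface area, can be checked with back-of-the-envelope numbers. Both densities below are invented for illustration, not real process figures:

```python
# Heat generated and heat removed both scale with chip area, so the thermal
# margin per unit area stays constant as the chip shrinks. Illustrative values.
POWER_DENSITY_W_PER_CM2 = 0.10    # heat generated per cm^2 of active silicon
COOLING_W_PER_CM2 = 0.15          # heat a passive surface sheds per cm^2

def thermal_margin_w(area_cm2):
    generated = POWER_DENSITY_W_PER_CM2 * area_cm2
    removable = COOLING_W_PER_CM2 * area_cm2
    return removable - generated

margins = {a: thermal_margin_w(a) for a in (1.0, 0.5, 0.25)}
# The absolute margin shrinks with the chip, but margin per unit area does not:
per_area = {a: m / a for a, m in margins.items()}
```

Volume-driven effects, such as battery capacity, would not scale this way, which is the contrast Swinnen is drawing.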
These types of chips will continue to be used in devices, but as AI is integrated into more devices and systems, there will be a need for more powerful processors.
“Some chips have very low power consumption, so they do not actually lead to system overheating,” said Scott Best, Technical Director at Rambus. “However, there are usually some components in the system that cause overheating. Every chip has this localized heating issue. Certain components inside the chip or within the system generate heat and heat everything around them.”
The rapid proliferation of medical devices and monitors has heightened awareness of these issues. Depending on what the device measures or detects, there can be various processing demands and thermal issues. The difference is that today there are more solutions available to address these issues than in the past.
“Even if battery capacity and other technologies still use current techniques, when monitoring human health—whether implanted in a kidney or other organs, or monitoring from the surface—you do not need continuous monitoring,” said Madhvapathy of Cadence. “If you can monitor once an hour or every two hours, then the device’s battery life will be longer.”
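Madhvapathy's point about intermittent monitoring is easy to quantify with a duty-cycle calculation. All the numbers below are illustrative assumptions, not the specs of any real implant or wearable:

```python
# Back-of-the-envelope battery life for a duty-cycled health monitor.
BATTERY_MWH = 1000.0      # assumed usable energy budget
ACTIVE_MW = 15.0          # assumed power while sampling and processing
SLEEP_MW = 0.01           # assumed deep-sleep power
ACTIVE_S = 2.0            # assumed seconds of activity per measurement

def battery_life_hours(interval_s):
    """Average power at a given measurement interval, then hours of life."""
    duty = min(ACTIVE_S / interval_s, 1.0)
    avg_mw = duty * ACTIVE_MW + (1.0 - duty) * SLEEP_MW
    return BATTERY_MWH / avg_mw

continuous = battery_life_hours(ACTIVE_S)   # always on: roughly 67 hours
hourly = battery_life_hours(3600.0)         # once an hour: years of life
two_hourly = battery_life_hours(7200.0)     # longer still
```

With these assumptions, moving from continuous operation to hourly measurements stretches the same battery from days to years, which is the effect Madhvapathy describes.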
Of course, this largely depends on the application. But the overall trend is ubiquitous data processing, making thermal management not only a device health issue but also a user safety and comfort issue.
John Parry, Industry Director of Electronics and Semiconductors at Siemens EDA, stated: “Whether editing your photos on a phone or checking heart rate, sleep quality, or blood oxygen levels on a watch, IoT devices play a significant role in our daily lives. They consume battery power and generate heat. Eliminating this heat is important but also a real challenge. For wearable devices, the primary thermal flow path is conduction to the skin. The skin surface temperature must be kept below 45°C. Otherwise, there is a risk of low-temperature burns. The challenge with phones is that it is not guaranteed that the phone case will conduct heat. Users often use phone cases to prevent accidental damage, which hinders heat dissipation.”
Solutions Vary
Almost every design problem has a corresponding solution, and if not, clever engineers will develop one. However, the greater challenge is understanding how one or more chips will be used and having sufficient flexibility in the design to be able to adjust according to requirements.
“We have an MCU and a coprocessor for running the AI part of the chip,” Rosteck of Infineon stated. “The chip also includes a GPU, so we can handle these tasks very flexibly. Moreover, language models not only need to run on the core but also require accelerators. The second step is helping our customers get the models they need. We acquired Imagimob last year, and we now have a toolchain that lets models be designed by domain experts working with AI experts to reach the depth you want. Finally, it is converted into code that can be executed on our MCU.”
Solutions vary widely, particularly in thermal management. For example, heat can be dissipated through the surface of the device or chip, but that becomes harder when the chip is small and sits in an advanced package, with the substrate thinned to shorten the distance signals must travel vertically.
“Designers need to maximize the effectiveness of the available surface area, so an effective strategy for optimizing cooling is thermal spreading,” Parry stated. “This means using ultra-thin heat spreaders to distribute heat away from the main heating elements within the device to reduce hotspots on the device surface. A uniformly heated surface maximizes the effectiveness of the available surface area while minimizing hotspots.”
Industry insiders say that heat spreaders are not a new idea, but they have not been successful in the past because the target devices were mobile, and movement significantly reduced their effectiveness.
Many other strategies can also be implemented to ensure that overheating does not affect device performance or pose risks to users. This is especially important for use cases that still have strict thermal limits.
Steve Roddy, Chief Marketing Officer at Quadric, stated: “One feasible approach is to sum the worst-case continuous power ratings of the various components and SoC subsystem IP blocks, and verify that, with everything running at full speed, the total for each major system does not exceed the thermal rating of the entire device. This can be done by checking individual component manufacturers’ specifications or IP suppliers’ ratings and applying some common-sense rules (which systems might be ‘on’ simultaneously), without extensive design analysis or simulation. While this method is quick and requires little engineering time, its downside is that it may overstate the actual active power scenarios, potentially sacrificing peak performance or cutting functionality to meet thermal design goals.”
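Roddy's first, spreadsheet-style method amounts to summing datasheet ratings and applying concurrency rules. A minimal sketch, with entirely hypothetical subsystem ratings and thermal budget:

```python
# Worst-case power roll-up from (hypothetical) datasheet ratings, in mW.
SUBSYSTEM_MAX_MW = {
    "cpu_cluster": 800,
    "npu": 1000,
    "isp": 400,
    "ddr_io": 300,
    "radio": 250,
}
THERMAL_BUDGET_MW = 2600  # assumed sustained dissipation limit for the device

def worst_case_mw(active):
    """Sum the continuous ratings of the subsystems assumed to be on."""
    return sum(SUBSYSTEM_MAX_MW[name] for name in active)

naive = worst_case_mw(SUBSYSTEM_MAX_MW)  # everything at once: over budget
# Common-sense rule from the quote: assume, say, the radio idles while the
# vision pipeline (npu + isp) runs flat out.
realistic = worst_case_mw(["cpu_cluster", "npu", "isp", "ddr_io"])
```

As Roddy notes, the price of this quick check is pessimism: the naive sum may force derating performance that no real workload would ever trigger simultaneously.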
A second, newer approach relies on creating full-chip and full-system digital twins that include models of known or expected power consumption in each mode.
Roddy stated: “EDA companies have been promoting their latest tools to facilitate this left-shift approach, which allows designers to model virtual systems and run actual embedded software code before chip or circuit board design. If actual code is run on an accurate system model, it can model actual behavior at a finer level to determine answers to questions such as ‘What is the actual off-chip I/O traffic on this DDR interface, and therefore what is the actual dissipation? Does system A really turn on simultaneously with system B, or can I time-multiplex them with adjusted software for lower peak power consumption?'”
This approach helps predict system power consumption and achieve maximum performance within design constraints.
While controlling heat can first be achieved by controlling voltage, Swinnen of Ansys stated that more complex strategies can be adopted, especially through complex clock gating forms. However, this method requires special care, can be very time-consuming, and needs to be considered early in the design process.
“Not just large modules, but any small modules that can be turned off as well,” he stated. “They have very complex clock structures to achieve this. At the design stage, typical clock-gating tools, like Synopsys’s Power Compiler, place a gate in front of a handful of flip-flops, and each group of flip-flops that can be turned off gets its own gate. But in fact all of that gating can be unified into a single gate closer to the root of the clock tree. Instead of turning off this group of five, that group of five, and another six, you need only one gate to shut them all off. The catch is that you must be careful in the design, because the enable signal for that clock gate (to turn it ‘on’ and ‘off’) must sit higher and higher in the tree, tightening the timing more and more.”
This also has its drawbacks, as the delay between the clock gate’s enable and the clock edge itself grows, limiting how far designers can push clock gating and forcing trade-offs between timing and power efficiency. Swinnen pointed out that while such schemes can achieve maximum power savings, the clocks must be configured manually at the gating or RTL level. That is a headache for designers, and the larger number of node combinations to test complicates power analysis. “You can save power, but you need someone to work hard tuning the design and making sure the timing is correct. With an automated system I can just drop in ten clock gates. That also works, but it won’t be as efficient.”
Roddy of Quadric noted that some modern tools let designers calculate power characteristics from the actual code rather than just total cycle counts. “Traditional convolutional neural networks use a lot of 5 x 5 and 3 x 3 convolutions, so gates spend more of the time switching, compared with more modern transformers that lean heavily on activation, normalization, and shape transformations,” he stated.
Some seemingly obvious solutions also have drawbacks. Swinnen cited hearing aids as an example, where an overheating chip can cause serious harm. In such cases, he said, safeguards need to be built in, such as embedding thermal sensors that reduce the clock speed when readings go out of range. “This reduces device performance, so you cannot reach nominal performance, but the temperature stays within bounds. It is a somewhat brute-force method. You are not really solving the problem, just treating the symptom.”
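The safeguard Swinnen describes, a thermal sensor that drops the clock when the chip runs hot, is essentially a small control loop. The sketch below runs it against a toy thermal model; every constant is invented for illustration, not taken from any real hearing aid:

```python
# Thermal throttling: cut the clock when a sensor crosses the limit, then
# creep back toward nominal as it cools. Toy plant, illustrative constants.
NOMINAL_MHZ, FLOOR_MHZ = 200.0, 20.0
LIMIT_C, AMBIENT_C = 42.0, 25.0

def next_clock(temp_c, clock_mhz, step=0.8):
    if temp_c >= LIMIT_C:
        return max(clock_mhz * step, FLOOR_MHZ)   # throttle, with a floor
    return min(clock_mhz / step, NOMINAL_MHZ)     # recover toward nominal

temp, clock = 30.0, NOMINAL_MHZ
temps, clocks = [], []
for _ in range(300):
    # Toy plant: heating proportional to clock, passive cooling toward ambient.
    temp += 0.004 * clock - 0.02 * (temp - AMBIENT_C)
    clock = next_clock(temp, clock)
    temps.append(temp)
    clocks.append(clock)
```

Left unthrottled, this particular plant would settle near 65 °C; with the loop it limit-cycles just above the 42 °C threshold, at the cost of running well below the nominal clock much of the time, which is exactly Swinnen’s “treating the symptom” trade-off.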
Swinnen pointed out that cooling for IoT devices may become more critical in the coming years, as companies have announced intentions to move more AI computation from data centers to the edge, which may force designers to weigh it more heavily than they do now. “Power has always been a soft fault point, not a hard fault point,” he stated. “If you miss performance or DRC, if your shapes are too close for the foundry, those are hard errors. You cannot tape out with them; the tapeout is held and the design reworked until they are fixed. But power is more often treated as a soft requirement.”
Fundamental Choices
Behind all these efforts lies the question of which approaches are the most efficient and cost-effective: optimizing how data is processed and managed while minimizing how much it has to move (an increasingly important consideration as AI and data volumes grow), regardless of where that processing and management take place.
“This is really about understanding the data movement requirements of applications,” said Steven Woo, Fellow and Distinguished Inventor at Rambus. “The question is how to do this correctly and which applications it applies to. There is no single application that benefits; there are many applications, and the ways to optimize them differ. Everyone understands this is a big issue.”
The key to all of this is understanding which methods work best under which circumstances. “We spend a lot of time helping customers figure out how to avoid over-design,” Weil of Synaptics stated. “When we say ‘AI-native,’ you do not need all the extra performance you were only using in research mode. Sure, you can use an NVIDIA Jetson. It is great, it has strong performance, and it costs two to three times more than ordinary products. But when you want to build a million devices, you won’t use it. We spend a lot of time helping customers explore more optimized solutions.”
