Source: This article is translated from Zach, with thanks.
In late April, SambaNova Systems, one of the best-funded AI chip startups, made a sharp break with its original goals. Like many other AI chip startups, SambaNova initially aimed to provide a unified architecture for both training and inference. Starting this year, however, they abandoned their training ambitions, laid off 15% of their staff, and shifted all of their focus to AI inference. And they are not the first company to make this transition.
In 2017, Groq was still boasting about their training performance, but by 2022 they had shifted entirely to inference benchmarks. The Cerebras CS-1 was designed primarily for training workloads, but the CS-2 and later systems shifted their focus to inference. SambaNova seemed to be the last of the first-generation AI chip startups still seriously focused on training, but that has finally changed. So why are all of these startups moving from training to inference? Fortunately, as a former SambaNova employee (referring to the author of this article, Zach, who says he worked at SambaNova Systems from 2019 to 2021), I have some insider insight.
SambaNova placed great importance on training models on their hardware. They published articles on how to train on their hardware, boasted about their training performance, and discussed training issues in their official documentation. Many analysts and external observers, myself included, believed that SambaNova had a unique advantage over competitors like Groq (one of the first startups to pivot to inference): it could meet the demands of both the training and inference markets with a single chip.
SambaNova also invested a significant amount of time and effort into achieving efficient training. During my time at the company from 2019 to 2021, I spent a lot of time implementing kernels for the NAdam optimizer, a momentum-based optimizer commonly used for training large neural networks. We designed and optimized hardware and software features specifically for training, and both internal and external information indicated that support for training was a key component of our value proposition.
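For readers who have not seen it, the sketch below shows the textbook NAdam update rule in plain NumPy. It is only a reference formulation of the algorithm (the momentum-decay schedule used by some implementations is omitted), not SambaNova's actual kernel code, which targets their dataflow architecture rather than a CPU.

```python
import numpy as np

def nadam_step(theta, grad, m, v, t, lr=2e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified NAdam (Nesterov-accelerated Adam) update step."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected momentum
    v_hat = v / (1 - beta2 ** t)              # bias-corrected variance
    # Nesterov look-ahead: blend corrected momentum with the current gradient
    m_bar = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    theta = theta - lr * m_bar / (np.sqrt(v_hat) + eps)
    return theta, m, v
```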
Now, SambaNova has suddenly abandoned much of this work and shifted its focus to inference. I believe they did this for three main reasons: inference is an easier problem to solve; inference may be a larger market than training; and Nvidia is utterly dominant in AI training chips.
Inference is an Easier and Larger Market
Many analysts believe the market for AI inference could be ten times the size of the AI training market. Intuitively, this makes sense. Typically, you train a model once and then run inference on it many times. Each individual inference costs far less than the entire training run, but if you run inference on the same model enough times, it becomes the dominant cost of serving that model. If the future of AI is a few large models, each serving an enormous volume of inference, the inference market will dwarf the training market. If, however, many organizations end up training their own custom models, this future may not materialize.
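As a back-of-the-envelope illustration, with made-up numbers rather than figures from the article, the break-even point where cumulative inference spending overtakes a one-time training run arrives quickly for a heavily used model:

```python
# Illustrative numbers only -- neither figure comes from the article.
training_cost_usd = 50_000_000      # one-time training run
cost_per_query_usd = 0.002          # serving cost per inference request

break_even_queries = training_cost_usd / cost_per_query_usd
print(f"Inference spend passes training spend after {break_even_queries:,.0f} queries")
# 25,000,000,000 queries -- well within reach for a widely deployed model,
# after which serving, not training, dominates the model's lifetime cost.
```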
Even if inference does not ultimately become a larger market than training, there are technical reasons why it is an easier problem for AI chip startups to tackle. When training a model, you need to run a large amount of training data through it, collect gradient information as the model runs, and use those gradients to update the model's weights. This is how the model learns. It is also extremely memory-intensive, because you need to cache all of those gradients along with other values, such as the model's activations.
Efficient training therefore requires a complex memory hierarchy spanning on-chip SRAM, in-package HBM, and off-chip DDR. HBM is hard for AI startups to obtain and hard to integrate into high-performance systems, so many startup chips (such as those from Groq and d-Matrix) lack the HBM or DDR capacity and bandwidth needed to train large models efficiently. Inference does not have this problem. During inference, gradients never need to be stored, and activations can be discarded as soon as they have been consumed. This greatly reduces the memory footprint of inference as a workload and simplifies the memory hierarchy an inference-only chip needs.
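To make the gap concrete, here is a rough tally of persistent state for a hypothetical 70B-parameter model, using the common rule of thumb of about 16 bytes of training state per parameter (FP16 weights and gradients plus FP32 master weights and Adam-style moments) and ignoring activations and KV cache entirely. The numbers are illustrative assumptions, not measurements of any vendor's hardware.

```python
def state_memory_gb(num_params: float) -> dict:
    """Back-of-the-envelope persistent-state memory for one model replica.

    Training assumes mixed precision with an Adam-family optimizer:
    FP16 weights + FP16 grads + FP32 master weights + FP32 first/second
    moments (~16 bytes/param). Inference assumes FP16 weights only.
    Activations and KV cache are excluded; all numbers are illustrative.
    """
    GB = 1024 ** 3
    training_bytes = num_params * (2 + 2 + 4 + 4 + 4)
    inference_bytes = num_params * 2
    return {"training_GB": training_bytes / GB, "inference_GB": inference_bytes / GB}

print(state_memory_gb(70e9))
# ~1,043 GB of training state vs ~130 GB of weights -- roughly 8x more memory
# before a single activation buffer is counted.
```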
Another challenge is inter-chip networking. All gradients generated during training need to be synchronized across every chip used in the training process. This means you need a large, complex, all-to-all network to train efficiently. On the other hand, inference is a feedforward operation, where each chip only communicates with the next chip in the inference pipeline. Many AI chips from startups have limited networking capabilities, making them less suitable for the full connectivity required for training but sufficient to handle inference workloads. Conversely, Nvidia has excelled at addressing the memory and networking challenges required for AI training.
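A similarly rough tally of per-step traffic shows why the network requirements diverge so sharply. The sketch below assumes a ring all-reduce over FP16 gradients for training and a pipeline stage that forwards only its output activations for inference; the model size, chip count, and tensor shapes are hypothetical.

```python
def comm_per_step_gb(num_params=70e9, num_chips=64, grad_bytes=2,
                     batch=8, seq_len=4096, hidden=8192, act_bytes=2):
    """Rough per-chip communication volume for one step (GB).

    Training: a ring all-reduce sends ~2*(N-1)/N of the gradient bytes per chip.
    Inference: a pipeline stage forwards only its output activations to the
    next chip. All parameters are illustrative assumptions.
    """
    GB = 1024 ** 3
    train = 2 * (num_chips - 1) / num_chips * num_params * grad_bytes / GB
    infer = batch * seq_len * hidden * act_bytes / GB
    return {"training_GB_per_chip": round(train, 1),
            "inference_GB_per_chip": round(infer, 2)}

print(comm_per_step_gb())
# ~257 GB of gradient traffic per chip per step vs ~0.5 GB of activations --
# hence the all-to-all fabric for training and simple point-to-point links
# for inference.
```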
Nvidia Excels at Training
Since the release of AlexNet in 2012, Nvidia has been the preferred hardware for both inference and training. The versatility granted to GPUs by CUDA allows them to perform all the operations required for both training and inference. Over the past decade, Nvidia has not only focused on building hyper-optimized chips for machine learning workloads but has also been optimizing its entire memory and network stack to support large-scale training and inference.
With a large amount of HBM on each chip, Nvidia hardware can easily and efficiently hold all of the gradient updates generated at each training step. With scale-up technologies like NVLink and scale-out technologies like InfiniBand, Nvidia hardware can handle the all-to-all communication required to update all of the weights of a large neural network after every training step. In contrast, inference-focused competitors like Groq and d-Matrix lack the memory and networking capabilities needed to challenge Nvidia in training.
But SambaNova's chips do have HBM, and they have point-to-point networking at both the server and rack level. Why can't they take on training the way Nvidia can?
It turns out that HBM and networking are not the only things Nvidia brings to training. Nvidia has invested enormous effort in low-precision training, and top AI labs have invested just as much effort in tuning their algorithms and hyperparameters to the specific quirks of Nvidia's low-precision hardware. Moving training from Nvidia chips to SambaNova chips means modifying extremely sensitive training code to run on entirely new hardware, which introduces a whole new set of risks. For models on the scale of GPT-4, the costs and risks of doing so are enormous.
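For a sense of what "tuned to Nvidia's low-precision hardware" means in practice, here is a minimal mixed-precision training loop using PyTorch's AMP API: autocast runs matmuls in reduced precision, and a gradient scaler keeps small FP16 gradients from underflowing. The model and data are placeholders; real training code layers many more such details on top, and that accumulated tuning is exactly what makes porting to new hardware risky.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()           # placeholder model
optimizer = torch.optim.NAdam(model.parameters(), lr=2e-3)
scaler = torch.cuda.amp.GradScaler()                  # dynamic loss scaling for FP16

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda")          # placeholder batch
    with torch.cuda.amp.autocast():                    # matmuls run in reduced precision
        loss = model(x).pow(2).mean()                  # placeholder loss
    scaler.scale(loss).backward()                      # scale loss so FP16 grads don't underflow
    scaler.step(optimizer)                             # unscales grads, skips step on inf/NaN
    scaler.update()                                    # adjusts the scale factor over time
    optimizer.zero_grad(set_to_none=True)
```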
SambaNova's shift to inference shows that even an AI chip startup that can match Nvidia's memory and networking capabilities is not automatically able to compete with the giant in training. A startup that wants to challenge Nvidia in training needs to deliver remarkable training performance to overcome the market's inertia. So far, no one has.
Original Link
https://www.zach.be/p/why-is-sambanova-giving-up-on-ai
END
*Disclaimer: This article is original by the author. The content reflects the author’s personal views, and Semiconductor Industry Observation reproduces it only to convey a different perspective, not representing Semiconductor Industry Observation’s endorsement or support of this view. If there are any objections, please contact Semiconductor Industry Observation.

