By Lanxi
According to Chuck Martin, a U.S. digital marketing expert, from the moment the second and third screens (personal computers and smartphones) pulled the first family member away from the living room, the living-room entertainment culture that thrived on television was destined to become the first traditional stronghold dismantled by the internet.
As technology has advanced, however, the living room's defenders have grown more varied, from highly anticipated smart home systems to key players such as smart TVs and smart speakers. The merging of internet forces with traditional habits has created a new living-room entertainment economy.
Among these, smart speakers are the most noteworthy. The reason is progress in voice interaction technology, which lets smart speakers offer not only music, audiobooks, and information queries but also everyday services. Once built into home scenarios, their voice interaction capabilities make them the control center of the smart home in the Internet of Things era.
As early as 2016, Mary Meeker, known as the ‘Queen of the Internet’, described voice as a new-generation ‘computing interface platform’ in her Internet Trends report, viewing it as the next entry point for human-computer interaction.
On June 26, Chinese AI interaction technology company Rokid launched three new products at its Rokid Jungle 2018 conference: the portable smart speaker Rokid Me, the AI chip KAMINO18, and the AR glasses Rokid Glass, once again drawing industry attention.
Notably, Li Haibo, Vice President of Ximalaya, appeared in support of Rokid and announced that the recently launched Xiaoya Mini smart speaker had set a series of impressive sales records on its launch day, with Rokid as the technology provider behind it.
Of the three products, the independently developed AI chip KAMINO18 drew the most attention. About the size of a one-yuan coin, it differs from today's mainstream general-purpose chips by integrating multiple core components (ARM, NPU, DSP, DDR, and DAC) into a single, highly integrated chip module. It also incorporates Rokid's phased array technology, CTC model, custom wake words, offline voice commands, and low-power wake-up voice algorithms, so that products equipped with KAMINO18 consume 30% to 50% less power while operating.
Furthermore, this chip, which incorporates Rokid’s latest algorithms, will significantly enhance the performance of smart speakers in complex scenarios and support more offline functionalities.
In simple terms, KAMINO18 has three major advantages over mainstream chips on the market: high integration, low power consumption, and low cost, while maintaining equivalent performance.
More importantly, the emergence of KAMINO18 marks a significant advancement in Rokid’s AI chip technology in the field of voice interaction, enabling the customization of voice solutions for enterprise users with different needs.
This is likely to become a key driver as the domestic smart speaker industry moves into a phase of quality-driven development.
In fact, as an important entry point for home entertainment in the new era of artificial intelligence, the smart speaker market has been booming in recent years, but it has always faced many pain points, such as the difficulty of technology integration, long development cycles, high costs, and a lack of in-depth customized TTS solutions.
The root cause is that the underlying technology and its integration into applications, the infrastructure on which products are built, have lagged behind. For example, if we break voice interaction down into precise recognition at the front end and intelligent learning at the back end, most companies today excel at the former but are weak at the latter, because the two involve vastly different amounts of work.
The current mainstream vendors in the industry, including Sobot and iFlytek, all offer technology built on general-purpose chips rather than a complete voice technology solution. Smart speaker companies therefore have to develop or integrate the remaining pieces themselves. Xiao Ai, for example, uses voice technology from Voice Tech for its front end, while Tmall Genie uses Sobot's front end and ASR.
This directly leads to the pain points of long development times and high costs for smart speaker products. For instance, the development of Xiao Ai took nearly nine months, Tmall Genie took nearly a year to test, and Tencent Tingting spent two years refining its product.
Before officially launching KAMINO18, Rokid had already provided a complete customized voice solution for Ximalaya's Xiaoya Mini, tuning it precisely on the basis of user habit and preference data so that content matches product functionality. The Xiaoya Mini, equipped with Rokid's front-end CTC algorithm, has an industry-leading wake-up rate in both quiet and AEC noise environments.
In addition, with Rokid's customized solution, Ximalaya no longer has to bear the cost of researching, integrating, and applying core technology and algorithms, which significantly reduces both development cost and time. The most direct result is that the product price dropped from 999 yuan to 299 yuan.
Behind this is Rokid, one of the few companies in the industry able to provide a complete end-to-end voice solution covering the front end, ASR, NLP, and TTS, and a leader in many of these technical areas. Where competitors offer general-purpose solutions, Rokid can customize its solution for different product needs, including children's story machines, smart TVs, and smart home devices.
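To make the four stages named above more concrete, here is a minimal sketch of how a front end, ASR, NLP, and TTS might be chained into a single pipeline. Everything in it is a hypothetical placeholder chosen for illustration (the wake word, the function names, and the toy logic that uses plain text in place of audio); it does not reflect Rokid's actual software or APIs.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_WORD = "hello speaker"  # hypothetical wake word, not a real product setting


@dataclass
class Intent:
    name: str  # what the user wants, e.g. "play_audio"
    slot: str  # the detail attached to the request


def front_end(raw_audio: str) -> Optional[str]:
    """Stand-in for wake-word detection; text simulates the audio signal."""
    text = raw_audio.strip().lower()
    if not text.startswith(WAKE_WORD):
        return None  # device stays asleep
    return text[len(WAKE_WORD):].strip(" ,")


def asr(utterance: str) -> str:
    """Stand-in for speech recognition; here it only normalises whitespace."""
    return " ".join(utterance.split())


def nlp(transcript: str) -> Intent:
    """Stand-in for language understanding using a single toy rule."""
    if transcript.startswith("play "):
        return Intent("play_audio", transcript[len("play "):])
    return Intent("unknown", transcript)


def tts(intent: Intent) -> str:
    """Stand-in for speech synthesis; returns the sentence to be spoken."""
    if intent.name == "play_audio":
        return f"Playing {intent.slot}."
    return "Sorry, I did not understand that."


def handle(raw_audio: str) -> Optional[str]:
    """Run all four stages in order; None means the wake word was not heard."""
    utterance = front_end(raw_audio)
    if utterance is None:
        return None
    return tts(nlp(asr(utterance)))
```

In this toy version, a vendor that supplies only the front end would hand over nothing beyond `front_end`, leaving the speaker maker to build or source the other three stages, which is the integration burden described earlier in the article.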
Clearly, Rokid is less a chip manufacturer than a technology provider: it uses chips to supply sufficient computing power, match chips and algorithms more closely, and reduce costs, but its core business is delivering solutions.
In fact, before launching the KAMINO18 AI chip product, Rokid was known in the industry for several self-developed smart speaker products, such as the smart home robot Rokid Alien, the smart speaker Rokid Pebble, and the smart home voice remote control Rokid Mini.
The transition from product manufacturer to chip developer, or from industry competitor to technology enabler, gives Rokid a new story to tell.
During the American gold rush, some miners struck it rich overnight, but many others lost everything. It was those who turned to selling water to the miners who made a fortune and later found their way into business school textbooks.
In addition to launching several successful smart speaker products, Rokid is also embarking on the path of empowering the industry with technology, becoming the ‘water seller’ in the gold rush of the smart speaker industry.
More than eighty years ago, Lewis Mumford warned society to be wary of becoming subservient to precision machines.
“When worshippers can perceive God through printed copies of the Bible, they are weakened in their desire to experience the priest’s preaching in church. When the telephone rings at any time, unbound by its owner, the continuity of work and life becomes expensive.”
But as Einstein said, “I never think about the future, because it comes soon enough.”
The arrival of artificial intelligence and the Internet of Things as the next era is almost a certainty. For Rokid, this is not just an AI technology business but a new industrial division of labor, one that maximizes the benefit to all parties involved.