In 2007, Steve Jobs introduced a phone called the “iPhone”. During his presentation, images on the 3.5-inch screen zoomed in and out with the movement of his fingers, creating a tremendous visual impact. This phone, which Microsoft CEO Steve Ballmer mocked as “a device without a keyboard”, changed the trajectory of the entire world. Smartphones set off the mobile internet revolution, and touch, as an interaction method closer to human instinct, first opened the door to the intelligent world. Different groups of people interact with smart devices in different ways: through touch, physical buttons, or typed commands on a computer. However, one method is not only closely tied to human instinct but also transcends profession, identity, and age: voice.
After the smartphone, innovators have been searching for the next market opportunity that could match it. Under the wave of intelligent hardware, from wearables to a plethora of other smart devices, many products have streaked across history like meteors and left no trace. The only one that has truly reached the mainstream market is the smart speaker. Once mocked as a niche “tech toy”, it has shown potential second only to the smartphone and has moved from niche enthusiasts to the general public. Through continuous product iterations by leading brands such as Baidu, Alibaba, and Xiaomi, smart speakers have evolved from screenless to screen-enabled devices, and their voice interaction has become increasingly refined, improving the user experience. At the same time, competition among these few giants has greatly enriched the smart speaker content ecosystem and brought quality services into countless households. Human instinct, maturing technology, product diversification, and the giants’ full investment in market segmentation have together pushed smart speakers to the center stage of the market. Can smart speakers replicate the miracle of smartphones?
Oligopoly Market
At the end of 2014, after Amazon’s own Fire Phone had failed to challenge Apple’s iPhone, Bezos launched a speaker called “Echo”. Echo could play music and answer simple household questions. As with the iPhone’s market debut, many believed this “gimmick” would be short-lived. Unexpectedly, five years later, smart speakers have become the second-largest category of smart interactive hardware after smartphones. Data shows that in 2019, global smart speaker shipments reached 140 million units, led by Echo at 37 million units.
In the domestic market, Baidu’s smart speaker ranked first with annual shipments of 19 million units, making it competitive on a global scale.
Baidu initially developed the smart speaker internally as a benchmark product. Jing Kun, then Baidu’s Vice President and General Manager of the Smart Life Group, explained that the tech industry urgently needed a new scenario or device to carry AI capabilities and resources. The smart speaker was thus seen as a new species born for AI. The Chinese smart speaker market took off in 2018, when the three domestic giants, Baidu, Alibaba, and Xiaomi, quickly established a foothold with value-for-money strategies. Xiaomi offered a promotional price of 99 yuan during its fan festival, and Alibaba’s Fangtang speaker likewise competed on value to win users. During that year’s Double Eleven shopping festival, Baidu cut the price of its smart speaker from 99 yuan to 69 yuan. Baidu’s real killer move, however, was technological innovation: in 2018 it first proposed the concept of a screen-enabled speaker and launched the Xiaodu Smart Video Speaker at 599 yuan, personally endorsed by Li Yanhong. Unlike traditional speakers, the screen-enabled Xiaodu combined voice and touch interaction, enriching the visual experience, expanding what a smart speaker could be, and ushering in a new era of multimodal interaction.
Affordable prices, innovative products, and new modes of voice interaction, combined with fierce competition among the three giants, propelled smart speakers into an oligopoly era. This mirrors the global market structure: research firm eMarketer reported that 69.7% of US users chose Echo and nearly 30% chose Google, a similar oligopoly. According to the “2020 Q1 Industry Development Summary of China’s Smart Speakers” report, the combined market share of the top three brands in China rose further in Q1 2020, to 93.7%. Baidu’s performance stood out: reports from Canalys, Strategy Analytics, and IDC all showed Baidu leading domestic shipments in every quarter of 2019, and the latest Strategy Analytics data puts Baidu at the top again in Q1 2020, its fifth consecutive quarter in first place.
Three Major Drivers
The “Battle of the Hundred Speakers”, once much hyped by the media, barely stirred the market and gave way to three strong players, BAM (Baidu, Alibaba, and Xiaomi). The resulting clear market structure has given smart speakers greater recognition and standardization, along with the opportunity to reach the mainstream market through technological innovation. Apple once defeated traditional phone brands such as Nokia, Motorola, and Sony Ericsson with touch interaction. Voice, like touch, is one of humanity’s most instinctive ways of interacting; once this market ignites, its impact will be no smaller than that of the iPhone replacing traditional phones. As Jing Kun put it, “Every wave of technological change is driven by a transformation in human-computer interaction. Users have moved from getting information with a mouse and keyboard on computers to touch screens, and now to voice conversations with Xiaodu Assistant. Interaction has become more natural, and more and more users are engaging with AI voice interaction, driving a new era of change.”
The core driver is technological innovation. Intelligent voice, the fundamental technology of smart speakers, determines the quality of the user experience. Baidu is itself an AI company, and speech recognition and understanding are its core competencies. A Chinese Academy of Sciences report titled “Analysis of Intelligent Technologies in Smart Speakers and Their Maturity Assessment” found that in understanding and interpreting user commands, the Xiaodu series was the only product with a comprehension rate above 90%, leading the domestic market in both the screenless and screen-enabled categories. In meeting user needs and delivering a good experience, Xiaodu also ranked first in satisfaction assessments, with satisfaction for its screen-enabled speakers being relatively higher.
From a technological perspective, the gap among the smart speaker giants is likely to widen, and Baidu, with its stronger technical capabilities, is in a more advantageous position. Another driver is richer product experiences. In Jing Kun’s words, “Smart speakers are no longer just speakers.” Public data shows that users interact with Xiaodu more than 30 times a day on average, with an average viewing time of 158 minutes for long videos, 69 minutes for short videos, and more than 30 minutes for children’s learning, while video calls have a penetration rate above 30%. These figures indicate user stickiness and activity significantly higher than those of other brands.
Of course, in today’s internet era, showing users the value smart speakers bring to daily life, expanding the user base, and letting users experience their benefits through more diverse channels are also key to going mainstream. Observant readers may have noticed that Xiaodu has again secured the naming rights for the fourth season of “The Life We Long For” and has partnered with several major TV stations on popular variety shows. In an era of mass entertainment, this “screen dominance” effect should not be underestimated. Guided by the theory of the attention economy, Xiaodu has taken the lead in opening the second round of the smart speaker battle. Shaped by these three elements (technological innovation, product experience, and user education), where will the mainstream smart speaker market go next?
Commercialization and Mainstreaming
A mainstream market inevitably brings mainstream competitive specifications and standards. Once the giants control the core technologies and market rules of smart speakers, smaller players will already have been squeezed out. Future market differentiation will therefore still favor the strong, leaving no opportunities for startups. The details of competition among the giants will become more critical, demanding strong technical foundations and continuous product iteration and innovation. Objectively speaking, marketing matters most in a market’s early stages; as the market matures, competition becomes a contest of technology and product experience, and some companies will inevitably fall behind. For today’s mainstream players, the depth of smart speaker commercialization determines whether they can reach the finish line of this marathon, and it also determines how far smart speakers can break through their current boundaries.
Jing Kun noted that “Xiaodu at Home can reach groups the internet has not deeply penetrated, such as the elderly and children.” This reflects Xiaodu’s successful commercialization. Over the past two years, Xiaodu has built a closed commercial loop through paid skills, developer subsidies, and other measures, accumulating tens of thousands of paid products such as “Uncle Kai’s Storytelling”, “Meituan Takeout”, “Wukong Literacy”, “Electronic Pets”, and “Pocket Stories”. During this year’s pandemic, Xiaodu partnered with “Uncle Kai’s Storytelling”, “Wukong Literacy”, and “Yifang Education” to move into online education, and in late March it quickly brought in entertainment resources from Kuaishou, Douyin, Bilibili, Youku, Quanmin K Ge, Ximalaya, and Lizhi Live. The pace of commercialization has been remarkable, and it rests on the continuous expansion of Xiaodu’s content ecosystem and profit model. In fact, the smart speaker is merely Xiaodu’s external form, a way to occupy the entry point; Baidu’s vision for Xiaodu is a conversational AI operating system. From the subsidy wars to its successful commercialization, Xiaodu has also demonstrated the enormous market potential behind this small product. Having gone from underestimated to the most mainstream intelligent product on the market, smart speakers owe today’s commercial success to the technological iterations and ever-richer content of several major companies. With the smartphone market saturated, smart speakers are well placed to take over, opening up greater room for imagining intelligent experiences.