The Perfect Form of Smart Speakers: A Technical Exploration

Ten years ago, the movie “Iron Man” was released, featuring the intelligent butler Jarvis, who helps the protagonist Tony Stark handle all kinds of tasks and calculations. With seamless conversation, near-omnipotent skills, a personality of its own, and independent thought, Jarvis became many people's idea of the ideal AI assistant.
Indeed, sci-fi has never lacked imagination. Remember the early sci-fi anime “Chobits”? It depicted an even more idealized AI world, in which AI takes humanoid form to serve people.
Of course, these are merely fantasies of sci-fi works, and there remains a significant gap between them and reality. However, some functions previously deemed sci-fi, such as voice assistants, have already appeared in products like smartphones and smart speakers.
Image source: TechHive
Although smart speakers have not yet become the blockbuster that smartphones are, many companies are betting on them as the new “container” for AI. Today's smart speaker landscape is shaped not only by Silicon Valley tech giants but also by a number of Chinese startups investing heavily.
Yet even as smart speakers become more common, many users say the current generation still doesn't make them want to buy one.
But where exactly is the problem?

This may be attributed to a sense of “disparity”.
Setting the AI of sci-fi against the AI of the real world, as above, brings me to my point: this disparity. In reality, most users experience a stark contrast before and after buying a smart speaker:
Before buying: These features seem so convenient and useful!
After buying: It seems to be just okay…
If we break down where this disparity in the smart speaker experience comes from, several contributing factors stand out.
Features Are Not Essential
Today's smart speakers offer largely similar features. They center on playing audio content, extended with capabilities such as chatting, controlling smart home devices, checking the weather, telling jokes, checking traffic, and setting alarms.
XX, set an alarm for 7 AM.
XX, play a song by Jay Chou.
XX, do I need to bring an umbrella today?
…
In practice, none of these features is essential for users. Moreover, many smartphones already have voice assistants that deliver similar features and experiences.
Image source: Tata CLiQ
Doesn't Understand What You Mean, Gives Irrelevant Answers
Unlike the polished scenarios in promotional videos, in actual use, speech recognition accuracy and sentence understanding remain obstacles to a good user experience.
First, speech recognition: many users probably lose interest in their smart speaker after repeatedly failing to wake it, or after it fails to recognize their commands accurately.
Second, sentence understanding: many smart speakers sometimes fail to grasp what you mean. When a question gets complex, or its phrasing doesn't match the speaker's preset sentence patterns, the speaker easily misreads the command as a song search or a web search query.
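To see why off-script phrasing goes wrong, here is a minimal Python sketch of template-based intent matching. It assumes a speaker that recognizes only a handful of preset sentence patterns. The templates, intent names, and the `match_intent` function are hypothetical, and real products use far richer language understanding. Still, the sketch reproduces the failure described above: anything that matches no template gets treated as a search keyword.

```python
import re
from typing import Dict, Tuple

# Hypothetical preset sentence templates, each tied to an intent name.
TEMPLATES = [
    (re.compile(r"set an alarm for (?P<time>.+)"), "set_alarm"),
    (re.compile(r"play a song by (?P<artist>.+)"), "play_music"),
    (re.compile(r"do i need to bring an umbrella (?P<day>today|tomorrow)"), "weather"),
]


def match_intent(utterance: str) -> Tuple[str, Dict[str, str]]:
    """Return (intent, slots) for the first template matching the whole sentence."""
    # Normalize case and drop trailing punctuation before matching.
    text = utterance.lower().strip().rstrip("?.!")
    for pattern, intent in TEMPLATES:
        match = pattern.fullmatch(text)
        if match:
            return intent, match.groupdict()
    # Nothing matched the preset patterns: fall back to treating the whole
    # sentence as a search keyword, which is where irrelevant answers come from.
    return "search", {"query": text}


print(match_intent("Set an alarm for 7 AM."))        # → ('set_alarm', {'time': '7 am'})
print(match_intent("Will it rain while I am out?"))  # → ('search', {'query': 'will it rain while i am out'})
```

Both sentences ask something reasonable. Only the one that happens to fit a preset pattern is understood, and the other quietly turns into a search.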
Lacks “Fluent” Conversational Ability
Misunderstood intent and irrelevant answers, combined with slow responses and the need to say a wake word before every command, keep interrupting the exchange and make it hard to hold a smooth conversation with a smart speaker.
In addition, many smart speakers do not yet carry context from one turn to the next. Simply put, they cannot remember what you said a sentence ago.
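Remembering the previous sentence can be as simple as keeping the last resolved request around. The Python sketch below is a hypothetical design, not any vendor's implementation. It assumes an upstream language-understanding step that returns an intent name plus slots, and that returns no intent for a bare follow-up like “What about tomorrow?”. The `DialogueContext` class fills in the missing intent from the previous turn.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DialogueContext:
    """Hypothetical per-conversation memory of the last resolved request."""

    last_intent: Optional[str] = None
    last_slots: Dict[str, str] = field(default_factory=dict)

    def resolve(
        self, intent: Optional[str], slots: Dict[str, str]
    ) -> Tuple[Optional[str], Dict[str, str]]:
        # A follow-up such as "What about tomorrow?" is assumed to arrive with
        # slots but no intent: inherit the previous intent and merge the slots,
        # letting the new values override the old ones.
        if intent is None and self.last_intent is not None:
            intent = self.last_intent
            slots = {**self.last_slots, **slots}
        # Remember whatever was resolved so the next turn can build on it.
        if intent is not None:
            self.last_intent, self.last_slots = intent, slots
        return intent, slots


ctx = DialogueContext()
print(ctx.resolve("weather", {"city": "Beijing", "day": "today"}))
# → ('weather', {'city': 'Beijing', 'day': 'today'})
print(ctx.resolve(None, {"day": "tomorrow"}))
# → ('weather', {'city': 'Beijing', 'day': 'tomorrow'})
```

A speaker without this state sees the follow-up as an incomplete sentence with nothing to attach it to.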
The AI Still Needs Users to Train It
In reality, current smart speakers are not truly “intelligent.” They must keep accumulating user data and analyzing usage patterns to gradually become smarter, so that they understand you better when you need them.
This process may take a long time, which users who have already bought the product may not be willing to accept. It can leave them with the impression that smart speakers are technically immature and impractical.
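As a rough illustration of “analyzing usage patterns,” the Python sketch below counts which request a user makes at each hour of the day. It starts suggesting a request only after it has been seen a minimum number of times. The `HabitModel` class and its `min_count` threshold are hypothetical, and plain counting stands in for whatever learning a real product might use. The threshold also hints at why the process is slow: the model has nothing to offer until enough history builds up.

```python
from collections import Counter, defaultdict
from typing import Optional


class HabitModel:
    """Hypothetical habit tracker: counts requests per hour of the day."""

    def __init__(self, min_count: int = 3) -> None:
        # hour (0-23) -> Counter of intent names seen in that hour
        self.by_hour = defaultdict(Counter)
        # How many times a request must recur before it is suggested
        self.min_count = min_count

    def record(self, hour: int, intent: str) -> None:
        """Log one request the user made at the given hour."""
        self.by_hour[hour][intent] += 1

    def suggest(self, hour: int) -> Optional[str]:
        """Return the most frequent intent for this hour, if seen often enough."""
        counts = self.by_hour.get(hour)
        if not counts:
            return None
        intent, n = counts.most_common(1)[0]
        return intent if n >= self.min_count else None


model = HabitModel()
for _ in range(3):
    model.record(7, "weather")
print(model.suggest(7))   # → weather
print(model.suggest(22))  # → None
```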

What do users hope to gain from smart speakers?
In the article “Key Definitions Explaining the Importance of Artificial Intelligence,” Forbes notes that the definition of AI has gradually shifted away from its traditional literal sense toward three forms that AI attempts to achieve:
  • Building systems similar to human thought processes (Strong AI)

  • Systems that execute commands without understanding human thought processes (Weak AI)

  • Systems that evolve and develop based on human thought processes as a template

By these definitions, current smart speakers fall into the Weak AI category. They lack independent thought, cannot infer user intentions, and only execute preset commands.
It is unrealistic to expect speakers like these to understand you the way Jarvis or a humanoid computer would. But coming back to actual needs, what do users want from smart speakers? Summarizing the issues above, demand falls into two main areas: the speaker must be sufficiently intelligent, and it must offer a wide range of skills and services.
Must Be Sufficiently Intelligent
This breaks down further into two parts. First, the speaker should “know you and understand you.” Second, it should anticipate your needs as far as possible.
“Know you and understand you” means that the speaker should recognize your commands and comprehend your intended meaning. In simple terms, it refers to the accuracy of voice recognition and the ability to understand sentences.
When I tried the Raven H smart speaker earlier, its voice recognition accuracy impressed me. Even while playing music at 80% volume, from about 3 meters away, it responded and accurately recognized voice commands.
As for sentence understanding, most smart speakers still operate within the confines of preset sentence structures.
However, “knowing you and understanding you” can be fundamentally improved as dedicated voice chips mature and voice interaction ecosystems offer more support.
Anticipating your needs means learning user habits to predict what users may need. Currently, most smart speakers lack this capability. So, is there a solution? Raven has implemented a “non-intelligent” approach to the problem.
On the Raven H, Raven offers a Flow feature. Once the user has set it up manually, it automatically reports the day's weather, traffic, driving restrictions, and your schedule every day after the alarm goes off.
Although this still falls short of truly anticipating your needs, it serves as a compromise while AI technology matures.
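A Flow-style routine is essentially a user-configured list of steps triggered by the alarm. The Python sketch below is not Raven's implementation or API. It assumes hypothetical step names (`weather`, `traffic`, `restrictions`, `schedule`) and stub provider functions. What it shows is why no intelligence is needed: the user decides the steps in advance, and the speaker simply runs them in order.

```python
from typing import Callable, Dict, List

# Hypothetical provider: fetches one piece of information and returns
# a sentence to speak (a real speaker would call a weather, traffic,
# or calendar service here).
Provider = Callable[[], str]


def build_flow(steps: List[str], providers: Dict[str, Provider]) -> Callable[[], List[str]]:
    """Return a routine that runs the user's chosen steps in order."""
    unknown = [s for s in steps if s not in providers]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")

    def run() -> List[str]:
        return [providers[s]() for s in steps]

    return run


def on_alarm(flow: Callable[[], List[str]], speak: Callable[[str], None]) -> None:
    """Called when the alarm goes off: speak each report in turn."""
    for sentence in flow():
        speak(sentence)


# Stub providers standing in for real services.
providers = {
    "weather": lambda: "weather report",
    "traffic": lambda: "traffic report",
    "restrictions": lambda: "driving restriction report",
    "schedule": lambda: "schedule report",
}
flow = build_flow(["weather", "traffic", "restrictions", "schedule"], providers)
on_alarm(flow, print)  # prints the four reports in the order the user chose
```

Validating the steps when the routine is built, rather than when the alarm fires, means a misconfigured Flow is caught during setup instead of failing silently in the morning.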
Skills and Services Must Be as Abundant as Possible
After using various smart speakers, what impressed me most was not the basic functions like playing music, checking the weather, or telling jokes, but the extra capabilities their skills and services bring.
For example, as I recall, the Tmall Genie X1 can order takeout and top up phone credit, while the Xiaomi AI speaker and the Raven H can find your phone…
Not every user will find each of these skills and services necessary or practical. But when the basic features differ little from one speaker to the next, these small skills and services can give a smart speaker a distinct edge over its competitors.
In fact, the Amazon Echo, the category's reference product, initially had mediocre sound quality and limited intelligence. As Amazon kept adding skills, however, it eventually became the benchmark.
One advantage Echo gained through skills and services was extending its reach to more smart home devices and interacting with them.
On smart home integration, we previously experienced the convenience of Apple's model home. Apple had not yet launched the HomePod smart speaker at the time, so the many HomeKit devices there were controlled through an Apple TV or an iPad acting as the home hub.
However, control through the Apple TV was somewhat cumbersome. And because an iPad is often carried around, using it as the hub means remote control of the home is easily lost. In this scenario, a speaker is the best solution.
After all, what users need here are simple logical operations, such as switching devices on and off or adjusting the temperature. These can be done by voice without a screen, so all users need is a networked speaker with a smart assistant.
Therefore, expanding the skills and services of smart speakers as much as possible is also a key point.
In summary, as products based on Weak AI, smart speakers essentially serve as an “auxiliary” tool in daily life. That does not mean they cannot become everyday necessities. With the continued development of AI, more capable chips, breakthroughs in core technologies like voice recognition, and the spread of smart homes, it is foreseeable that smart speakers and similar products will gradually mature, become more reliable, and serve our lives better.
Who knows? Perhaps smart speakers will one day become an omnipresent Jarvis, or even a humanoid computer.
Image source: SailorBomber – DeviantArt
