Smart speakers are devices that combine computer networking, the Internet, and audio/video technologies to deliver information and everyday services, with capabilities such as voice recognition, voice interaction, natural language understanding, and speech synthesis. Security incidents involving smart speakers have become frequent: in May 2018, Amazon's Alexa was reported to have recorded a private conversation and sent the recording to a person in the user's contact list without authorization. At the DEF CON security conference in 2018, security researchers demonstrated attacks on Amazon's Echo smart speaker that allowed them to eavesdrop on users and control the content the speaker played.
1. Analysis of Security Risks of Smart Speakers
A review of the security posture of major Chinese smart speaker manufacturers shows that the risks facing smart speakers fall into six categories: network security risks, data security risks, algorithm security risks, information security risks, social security risks, and national security risks.
(1) Network Security Risks
The “cloud + edge” model has security weaknesses that can lead to system-level security problems. Most smart speakers on the market communicate through a “cloud platform plus two ends” architecture, the two ends being the mobile app and the smart device itself. Because strict security management and mutual authentication among the cloud platform, the mobile end, and the device end are often lacking, vulnerabilities or backdoors may exist. If exploited, they can compromise the availability of smart speaker products and services, and even turn the devices into “zombies” that attackers control remotely to launch DDoS (Distributed Denial of Service) attacks. A Tencent Security Lab report indicates that hackers typically attack smart speakers through weak passwords and remote command execution vulnerabilities, using worms and automated mass attacks to take over large numbers of devices and build sizable botnets.
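As a defensive illustration of the weak-password vector described above, a device-setup flow could refuse factory defaults and trivially guessable credentials before the device ever goes online. This is a minimal sketch; the default list, minimum length, and character-class rule are invented for illustration, not taken from any real product:

```python
# Illustrative setup-time credential check; the default list and the
# thresholds below are assumptions, not values from a real device.
COMMON_DEFAULTS = {"admin", "password", "123456", "12345678", "root"}

def credential_ok(password: str) -> bool:
    """Reject factory defaults and trivially guessable passwords."""
    if len(password) < 10:
        return False                       # too short to resist brute force
    if password.lower() in COMMON_DEFAULTS:
        return False                       # known factory/default credential
    if password.isdigit() or password.isalpha():
        return False                       # only one character class used
    return True
```

A real deployment would pair a check like this with per-device credentials provisioned at the factory, so that no two devices share a secret.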
(2) Data Security Risks
Manufacturers collect user information through technical means, increasing the risk of personal privacy breaches. A smart speaker must listen continuously for its wake word to function; once the wake word is triggered, the speaker records the user's voice and uploads the captured audio for analysis so the command can be executed. The intelligence of smart speakers depends on vast amounts of data, including large amounts of personal information, to provide personalized and customized services. Against this backdrop, and driven by profit, collecting user privacy through technical means has become common among manufacturers. For example, an Amazon device was reported to have sent a recording of a private conversation between family members to a person in their contact list without the user's authorization.
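The always-listening behavior described above can be sketched as a simple state machine: audio before the wake word is discarded on-device, and only audio captured after the wake word is queued for upload. The wake token, string-based frames, and buffer size below are toy assumptions, not any vendor's implementation:

```python
from dataclasses import dataclass, field
from typing import List

WAKE_WORD = "wake"           # hypothetical wake token
MAX_COMMAND_FRAMES = 3       # frames captured after the wake word

@dataclass
class SpeakerLoop:
    """Toy model of the always-listening loop of a smart speaker."""
    recording: bool = False
    captured: List[str] = field(default_factory=list)
    uploaded: List[List[str]] = field(default_factory=list)

    def feed(self, frame: str) -> None:
        if not self.recording:
            if frame == WAKE_WORD:         # keyword spotted: start recording
                self.recording = True
            return                         # pre-wake audio is dropped
        self.captured.append(frame)
        if len(self.captured) >= MAX_COMMAND_FRAMES:
            self.uploaded.append(self.captured)  # "send to cloud" for ASR
            self.captured = []
            self.recording = False

loop = SpeakerLoop()
for f in ["chatter", "wake", "play", "some", "music", "chatter"]:
    loop.feed(f)
```

The privacy risk discussed in this section arises precisely at the `uploaded` step: anything misrecognized as the wake word is recorded and sent off-device.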
Hacker attacks come from multiple directions, and a single breach can leak user data. Because both the cloud and the edge store personal information and other private content, they have become prime targets for attackers. In particular, with mature reverse-engineering techniques for Android and iOS applications and a wide variety of attack methods against cloud platforms, breaching any single point can yield detailed user information, preferences, and other sensitive data for illegal uses such as telecom fraud and phone harassment. After a successful intrusion, for instance, attackers can easily obtain victims' sensitive information, including home addresses and phone numbers, and misuse it for illegal activities.
(3) Algorithm Security Risks
Adversarial-example attacks can cause speech recognition to misjudge inputs, creating security risks. As smart voice technology spreads, smart speakers, like any software, can be exploited for illegal purposes. For example, Berkeley researchers Nicholas Carlini and David Wagner demonstrated a new attack against speech-recognition AI: by adding a slight, nearly imperceptible perturbation to an audio waveform, they could make the recognition system output any transcription the attacker chose, enabling identity spoofing and the defeat of voice-based authentication.
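The core idea of such attacks can be shown on a toy stand-in for a voice-command classifier: nudge the input along the gradient direction that favors the attacker's target class until the prediction flips, while keeping the perturbation small. The linear "model", its weights, and the input below are all invented for illustration; real attacks like Carlini and Wagner's optimize against deep speech-recognition networks:

```python
import numpy as np

# Toy stand-in for a voice-command classifier: a fixed linear model over
# a 4-sample "audio" vector. Weights and input are invented.
W = np.array([[ 1.0, -0.5,  0.3,  0.2],    # class 0: benign command
              [-0.8,  0.9, -0.4,  0.1]])   # class 1: attacker's target

def predict(x):
    return int(np.argmax(W @ x))

x = np.array([0.9, 0.1, 0.5, 0.3])         # benign input, classified as 0

# For a linear model, the score margin s1 - s0 equals (W[1] - W[0]) @ x,
# so stepping x along (W[1] - W[0]) steadily raises the target score.
direction = W[1] - W[0]
delta = 0.2 * direction / np.linalg.norm(direction)

x_adv = x.copy()
steps = 0
while predict(x_adv) != 1 and steps < 50:
    x_adv = x_adv + delta                  # add a small perturbation
    steps += 1
```

In the audio setting the same logic applies to waveform samples, and the perturbation can be kept below the threshold of human hearing while still flipping the transcription.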
(4) Information Security Risks
Smart speakers accelerate the spread of harmful information, making content regulation harder. As the volume of online content keeps growing, the content sources feeding smart speakers may become a new channel for criminals to spread harmful information such as pornography and terrorist material, making such content harder to block. Beyond smart speaker manufacturers' inability to identify or handle illegal content involving pornography, explosives, or politically sensitive material, the weak self-regulation of online distribution platforms has become a major cause of the growing difficulty of content regulation. Recent reports about audio platforms such as Lizhi, Qingting, and Ximalaya, for example, found audio content containing sexual innuendo or explicit sexual provocation, with many listeners being minors; play counts for such content ranged from tens of thousands to hundreds of thousands, seriously endangering adolescents' physical and mental health.
(5) Social Security Risks
The overly anthropomorphized design of smart speakers brings new ethical and moral risks. Smart speakers with screens combine visual and auditory elements into an “anthropomorphized” presence, giving the devices more human-like characteristics. Children in such an environment may perceive the device as a living individual that has thoughts, can feel pain, and can be their friend. Prolonged interaction with smart products can entrench this belief and foster an unhealthy attachment to them. Responsibilities that should rest with parents and others may shift to smart devices, reducing children's opportunities for real social interaction and leading them to interact with “anthropomorphized” machines instead of people, which raises ethical and moral risks.
(6) National Security Risks
Smart speakers can become tools for information theft, indirectly threatening national security. Smart speakers can capture a variety of biometric features, including voiceprints. If policymakers and their staff live or work around smart speakers, the devices can profile individuals by aggregating collected information, potentially anticipating decisions before the decision-makers announce them, and thereby indirectly pose serious threats to national security. In July 2018, for instance, while then UK Defence Secretary Gavin Williamson was addressing Parliament on the Syria issue, Siri on his phone activated without authorization, interrupted his speech, and offered its own response.
2. Suggestions for Promoting the Safe Development of Smart Speakers in China
(1) Timely Implementation of Regulatory Policies to Protect User Data and Privacy
On information security, the formulation and implementation of regulatory policies covering the content sources of smart devices should be accelerated, further clarifying stakeholders' responsibilities in law and policy. At the same time, potential risks such as data theft by smart speaker manufacturers should be reviewed to ensure that smart speakers remain safe and controllable. Smart speaker service providers should supervise themselves, improve their governance of harmful content sources, and strengthen their technical capabilities for identifying and handling such content. Manufacturers should also draw more users into this process and raise security awareness, for instance by publicizing cybersecurity laws so that users understand the governance of harmful online information and are encouraged to participate actively as an important force in governance.
On data security, three things should be done. First, companies should be transparent with users about data protection, collect user data only with explicit permission and in specified situations, and never use or transfer user data without authorization. Second, personal information should be classified by content, with each class given a level of protection matched to its value and security risk, and the classification should be disclosed to users. Third, protections for user data should be strengthened, security mechanisms improved, and penetration tests run regularly on both the cloud and edge sides to reduce data security risks.
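The classification idea in the second point can be sketched as a mapping from collected fields to sensitivity tiers, each tier carrying its own handling rules. The field names, tiers, and policy values below are illustrative assumptions, not a real compliance scheme:

```python
# Hypothetical sensitivity tiers for fields a smart speaker might collect.
SENSITIVITY = {
    "voiceprint":   "high",    # biometric data: strongest protection
    "home_address": "high",
    "phone_number": "medium",
    "play_history": "low",
}

# Illustrative handling rules per tier (values are assumptions).
POLICY = {
    "high":   {"encrypt_at_rest": True,  "retention_days": 30,  "share": False},
    "medium": {"encrypt_at_rest": True,  "retention_days": 180, "share": False},
    "low":    {"encrypt_at_rest": False, "retention_days": 365, "share": True},
}

def handling_rules(field_name: str) -> dict:
    """Look up handling rules; unknown fields default to the strictest tier."""
    tier = SENSITIVITY.get(field_name, "high")
    return POLICY[tier]
```

Defaulting unknown fields to the strictest tier means newly collected data is over-protected rather than under-protected until it is explicitly classified.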
(2) Improve Technical Standards and Legal Constraints to Reduce Ethical and Moral Risks
On the social security front, relevant government departments could lead the establishment of an artificial-intelligence device management committee composed of technical experts, social-science researchers, and government administrators to review and assess R&D projects involving AI devices, strictly gating the transition from technology to product. Developers could be required to encode legal norms and moral requirements into the machines themselves, so that all generated language complies with human behavioral standards. Smart speaker manufacturers could also account for users' social and psychological conditions in device design, removing ethical blind spots built into smart devices. At the level of program design, measures should be taken to avoid ethical deviations and to limit excessive anthropomorphization, thereby reducing the ethical and moral risks these devices pose.
(3) Enhance Emergency Response Capabilities and Improve the Robustness of Voice Algorithms
On network security, capabilities can be strengthened in two ways. First, secure development and design should be built into the cloud and mobile ends, with security testing introduced and penetration tests and risk assessments conducted regularly. Second, comprehensive security monitoring and operations should be established for the cloud and mobile ends, forming complete protective measures so that both sides have effective protection and threat-perception capabilities.
On algorithm security, the ability to respond to and handle unreasonable inputs should be strengthened, and the robustness of speech-recognition models improved, minimizing the risks inherent in the models to ensure their reliability. Reliable and accurate emergency response plans should also be established for voice algorithm models.
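One concrete form of "handling unreasonable inputs" is a sanity check that rejects obviously implausible audio before it reaches the recognition model. The thresholds below (minimum length, clipping ratio) are illustrative assumptions, not values from any production system:

```python
import numpy as np

def is_plausible_audio(samples: np.ndarray,
                       sample_rate: int = 16_000,
                       max_clip_ratio: float = 0.01) -> bool:
    """Reject obviously unreasonable inputs before ASR sees them.

    Thresholds are illustrative; a real system would tune them.
    """
    if samples.ndim != 1 or samples.size < sample_rate // 10:
        return False                       # too short to be a real command
    if not np.isfinite(samples).all():
        return False                       # NaN/Inf would corrupt the model
    clipped = np.mean(np.abs(samples) >= 0.999)
    if clipped > max_clip_ratio:
        return False                       # heavy clipping: distorted capture
    return True
```

Checks like this do not stop carefully crafted adversarial audio, which is designed to look plausible, but they filter malformed or corrupted inputs and give the model a better-defined operating envelope.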
Author Bio
Gong Wenquan is an engineer at the Cloud Computing and Big Data Research Institute of the China Academy of Information and Communications Technology. His research focuses on network security, information content security, and related technical research and standards formulation.
Contact: [email protected]
Reviewed by | Chen Li, Shan Shan
Edited by | Ling Xiao