Original by Machine Heart
Author: Lu Xinfeng
Editor: Joni
In August of this year, Professor Zhang Kehua’s research group at the Chinese University of Hong Kong published a paper on arXiv, showcasing their research on the privacy of smart homes. The authors attempted to use an LSTM model to predict active devices in smart homes. This prediction could allow Internet Service Providers (ISPs) to infer what types of devices users are using at home, potentially leading to different marketing strategies for users with different devices.

- Paper link: https://arxiv.org/pdf/1909.00104.pdf
Prior to this, many researchers had conducted related studies, but most of their research was based on clean laboratory environments, making it difficult to transfer to complex real-world scenarios. By analyzing IoT devices in the real world and public datasets, the authors found that the traffic of IoT devices differs from desktop and mobile traffic in the following ways:
- Devices of the same category exhibit similar traffic patterns (the image below shows the traffic changes when two voice assistants recognize voice commands).
- Devices send “heartbeat” transmissions to maintain connectivity between the network and the device, and different devices have different “heartbeat” patterns.
- The proportion of transmission protocols used varies between devices (the image below shows the protocol usage of IoT devices versus non-IoT devices).

The authors believe these characteristics make it possible to identify individual IoT devices even in complex scenarios and behind mechanisms such as NAPT and VPN. Since existing datasets did not meet the authors’ requirements, the research team built their own data collection system.
Data Collection
The system includes 10 IoT devices and 4 non-IoT devices, as shown in the image below.
The authors planned to collect traffic in three environments: a single-device environment, a multi-device noisy environment (using NAPT), and a VPN environment.
First, a brief introduction to NAPT and VPN. NAPT (Network Address Port Translation) is a form of Network Address Translation that, unlike basic NAT, also translates ports. Basic NAT only maps local IP addresses to public IP addresses, so the number of hosts in a local area network that can communicate with the public network is limited by the number of public IP addresses available. NAPT overcomes this limitation: because it translates ports as well as IP addresses, multiple hosts in a local area network can communicate with the public network through a single public IP address as long as there are no port conflicts.
A VPN is typically used to interconnect different networks into a single larger network. It is based on IP tunneling, allowing hosts in different subnets to communicate with each other and to transmit information securely through authentication and encryption.
During traffic generation, the authors used two triggering methods: manual triggering and automatic triggering.
Manual triggering simulates human-machine interaction in a real environment, while automatic triggering reduces the burden on the experimenter. In automatic mode, the authors used Monkey Runner to trigger IoT devices that require interaction through an app; voice assistants and other IoT devices were triggered by repeatedly playing commands. Manual triggering was used only in multi-device scenarios, where the authors triggered the experimental devices by randomly entering and exiting the room. This method is more random than automatic triggering, which helps the model generalize.
The entire traffic collection process lasted 49.4 hours and produced 4.05 GB of data, including 7,223,282 valid communication packets.
Data Preprocessing
Before the experimental evaluation, the authors preprocessed the data, converting the raw captures into numerical vectors that the model can handle. Preprocessing consists of two parts: feature extraction and packet labeling. During feature extraction, five features are extracted from each packet: destination port (dport), protocol, direction, frame length, and time interval. These are combined into a one-dimensional vector, as shown in the image below.
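As a rough illustration of the feature extraction step, the sketch below turns each packet into the five-element vector described above. The `Packet` fields, the `PROTOCOL_IDS` table, and the 0/1 direction encoding are assumptions made for illustration; the article does not say how the authors encode each feature.

```python
from dataclasses import dataclass

# Hypothetical protocol codes; the paper's actual encoding is not given here.
PROTOCOL_IDS = {"TCP": 0, "UDP": 1, "ICMP": 2}

@dataclass
class Packet:
    timestamp: float   # capture time in seconds
    dport: int         # destination port
    protocol: str      # e.g. "TCP"
    outbound: bool     # True if sent from the device to the network
    frame_len: int     # frame length in bytes

def to_vector(pkt, prev_timestamp):
    """Return [dport, protocol, direction, frame length, time interval]."""
    interval = pkt.timestamp - prev_timestamp
    return [pkt.dport, PROTOCOL_IDS[pkt.protocol], int(pkt.outbound),
            pkt.frame_len, interval]

def extract_features(packets):
    """Convert a time-ordered packet list into feature vectors."""
    vectors = []
    # The first packet has no predecessor, so its interval is 0.
    prev = packets[0].timestamp if packets else 0.0
    for pkt in packets:
        vectors.append(to_vector(pkt, prev))
        prev = pkt.timestamp
    return vectors

packets = [
    Packet(0.00, 443, "TCP", True, 120),
    Packet(0.25, 443, "TCP", False, 1500),
    Packet(1.25, 53, "UDP", True, 80),
]
vectors = extract_features(packets)
```

Each entry of `vectors` is one packet in numerical form, ready to be grouped or encoded further.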
Labeling packets is challenging in the VPN environment. To label them accurately, the authors relied on the following patterns they observed:
- After VPN processing, the size of the data packets increases.
- Data packets of different sizes become the same size after VPN encryption.
- VPN causes transmission delays in data packets, typically shorter than 0.02 seconds.
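One way these patterns could be used for labeling is to match each VPN packet to an already-labeled packet captured before encryption. The sketch below is a hypothetical matching rule built only from the size and delay patterns above; it is not the authors' procedure, and the tuple layouts and function name are assumptions.

```python
MAX_VPN_DELAY = 0.02  # seconds, taken from the delay pattern listed above

def label_vpn_packets(plain_packets, vpn_packets, max_delay=MAX_VPN_DELAY):
    """Assign device labels to VPN packets by matching pre-VPN packets.

    plain_packets: list of (timestamp, size, device_label), sorted by time
    vpn_packets:   list of (timestamp, size), sorted by time
    Returns one label per VPN packet (None when no match is found).
    """
    labels = []
    used = set()  # indices of pre-VPN packets that are already matched
    for vpn_time, vpn_size in vpn_packets:
        match = None
        for idx, (t, size, _device) in enumerate(plain_packets):
            if idx in used:
                continue
            delay = vpn_time - t
            # The VPN packet must follow the original closely and be larger.
            if 0 <= delay <= max_delay and vpn_size > size:
                match = idx
                break
        if match is None:
            labels.append(None)
        else:
            used.add(match)
            labels.append(plain_packets[match][2])
    return labels

plain = [(0.000, 100, "camera"), (0.500, 60, "plug"), (0.505, 300, "speaker")]
vpn = [(0.010, 160), (0.512, 160), (0.517, 400), (2.000, 160)]
labels = label_vpn_packets(plain, vpn)
```

In this toy trace the first three VPN packets inherit the labels of the packets that preceded them, and the last one stays unlabeled because no pre-VPN packet falls inside the delay bound.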
Model Selection
The authors chose three models: Random Forest (the baseline), LSTM, and BLSTM (Bidirectional LSTM). Since Random Forest cannot directly learn discrete values, the authors one-hot encoded the port feature. For the LSTM model, the authors additionally grouped multiple consecutive packet vectors into traffic windows, as shown in the image below.
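The two input transformations mentioned here can be sketched as follows. This is a minimal illustration: the `known_ports` list, the non-overlapping default, and the choice to drop a trailing partial window are assumptions, since the article does not say how the authors build their windows.

```python
def one_hot_port(port, known_ports):
    """One-hot encode a port against a fixed list of known ports."""
    return [1 if port == p else 0 for p in known_ports]

def make_windows(vectors, window_size, step=None):
    """Group consecutive packet vectors into fixed-size traffic windows.

    With step equal to window_size (the default) windows do not overlap.
    A trailing partial window is dropped.
    """
    step = step or window_size
    return [vectors[i:i + window_size]
            for i in range(0, len(vectors) - window_size + 1, step)]

known_ports = [53, 80, 443]
encoded = one_hot_port(443, known_ports)

vectors = [[n] for n in range(7)]  # stand-ins for per-packet feature vectors
windows = make_windows(vectors, window_size=3)
overlapping = make_windows(vectors, window_size=3, step=2)
```

Passing a smaller `step` produces overlapping windows, which yields more training samples from the same capture at the cost of correlated inputs.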
The LSTM model used by the authors is shown in the image below. It consists of multiple basic modules, each containing an Embedding layer, an LSTM layer, a fully connected layer, and a Softmax layer.
Since the LSTM model can only look at the “past” of a packet sequence when learning contextual information, the authors also used a BLSTM model. BLSTM is an extension of LSTM that also uses information from the “future” by adding a second LSTM layer that runs from the end of the sequence to its beginning. The BLSTM model used by the authors is shown in the image below.
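To make the “past” versus “future” distinction concrete, here is a small pure-Python sketch of a bidirectional LSTM pass. It is not the authors' architecture (there is no Embedding, fully connected, or Softmax layer), the weights are random and untrained, and all function names are made up for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(input_size, hidden_size, seed):
    """Random weights for the input, forget, candidate and output gates."""
    rng = random.Random(seed)
    width = input_size + hidden_size + 1  # +1 for the bias term
    return {gate: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in ("i", "f", "g", "o")}

def lstm_step(x, h, c, params):
    """One LSTM step: returns the new hidden state and cell state."""
    z = x + h + [1.0]  # input, previous hidden state, constant for the bias

    def pre(gate):
        return [sum(w * v for w, v in zip(row, z)) for row in params[gate]]

    i = [sigmoid(a) for a in pre("i")]
    f = [sigmoid(a) for a in pre("f")]
    g = [math.tanh(a) for a in pre("g")]
    o = [sigmoid(a) for a in pre("o")]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(seq, params, hidden_size):
    """Run an LSTM over seq and return the hidden state at every step."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x in seq:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

def run_blstm(seq, fwd_params, bwd_params, hidden_size):
    """Concatenate a forward pass with a pass over the reversed sequence."""
    forward = run_lstm(seq, fwd_params, hidden_size)
    backward = run_lstm(seq[::-1], bwd_params, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]

H, D = 3, 2
fwd, bwd = make_params(D, H, seed=1), make_params(D, H, seed=2)
seq = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
out = run_blstm(seq, fwd, bwd, H)
```

The first `H` values of each output come from the forward pass and depend only on earlier packets; the last `H` values come from the backward pass, so even the output for the first packet reflects packets that arrive later.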
Model Evaluation
Datasets
There are two datasets, Dataset-Ind and Dataset-Noise, each in two versions: a NAPT version and a VPN version. Dataset-Ind contains traffic data from 10 individual IoT devices, organized into a total of 32,760 traffic windows. Dataset-Noise also consists of traffic windows, but unlike Dataset-Ind, each of its windows contains packets from multiple devices. Dataset-Noise contains 114,989 traffic windows.
Evaluation Metrics
Overall accuracy and category accuracy
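A minimal sketch of the two metrics, assuming that category accuracy means the fraction of windows of each true device class that are classified correctly (the article does not give the exact definition). The labels below are invented for illustration and are not results from the paper.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def category_accuracy(y_true, y_pred):
    """Per-class accuracy: correct predictions divided by samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Made-up labels, one per traffic window.
y_true = ["camera", "camera", "plug", "plug", "plug", "speaker"]
y_pred = ["camera", "plug", "plug", "plug", "camera", "speaker"]

overall = overall_accuracy(y_true, y_pred)
per_class = category_accuracy(y_true, y_pred)
```

Reporting both is useful because a high overall accuracy can hide a device class that is almost never recognized.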
Evaluation Results
The evaluation results on Dataset-Ind are shown in the table below. The table shows that the accuracy of the LSTM model is generally higher than that of the Random Forest model.
Subsequently, the authors studied the impact of traffic window size on accuracy using Dataset-Ind. The results showed that larger traffic windows lead to higher accuracy, so the default window size in the subsequent experiments was set to 100.
The evaluation results on Dataset-Noise are shown in the image below. The accuracy of the Random Forest model decreased significantly on this dataset, with an overall accuracy of 84.5% in the NAPT environment and 67.6% in the VPN environment. In contrast, the LSTM model performed better in the NAPT environment but worse in the VPN environment.

The authors analyzed why accuracy dropped for both the Random Forest and LSTM models. They concluded that the Random Forest model's accuracy decreased because multiple IoT and non-IoT devices communicate over the same ports, leading to classification failures, while the LSTM model's accuracy drop was attributed to sparse traffic: in extreme cases in the VPN environment, the packets generated by the smart plugs (shown in the image as orvibo and tplink) could be diluted to less than 3% of a traffic window, which made it impossible to identify these two smart plugs. (PS: Based on this principle, we can also protect our privacy by running a small program that generates “noise” while we surf the internet: https://github.com/1tayH/noisy)
Conclusion
Based on the experimental results, the authors believe that even under encryption and traffic blending, the network communication of IoT devices poses serious privacy risks. More research in this area is needed to better understand and mitigate privacy issues in smart home networks.
Author Introduction: Lu Xinfeng, a master’s student at Jilin University, mainly researching object detection.