Design of Smart Seeing Glasses Based on Machine Vision

Abstract

This paper proposes a design for a smart seeing glasses system based on machine vision. The hardware platform is built around the Samsung S5PV210 processor (ARM Cortex-A8 architecture), runs Linux, and comprises six core functional modules: binocular image acquisition, GPS positioning, voice broadcasting, GSM messaging, voice calling, and wireless transmission. Deep learning algorithms running on a remote cloud server recognize target scenes, and the results are delivered to blind individuals as accurate real-time voice guidance. System test results indicate that the smart seeing glasses can not only guide blind individuals correctly in the test environment but also recognize certain targets, assisting blind individuals in simple object classification. The system additionally provides auxiliary functions such as GPS positioning, voice calling, and GSM messaging.

Citation Format: He Tengpeng, Zhang Rongfen, Liu Chao, et al. Design of smart seeing glasses based on machine vision[J]. Application of Electronic Technique, 2017, 43(4): 58-61.

0 Introduction

According to World Health Organization statistics, there are approximately 78 million blind individuals worldwide, 90% of whom live in developing countries. China accounts for 18% of the world’s total, with as many as 14 million blind individuals. Visual impairments and eye diseases bring many inconveniences to the lives of this vulnerable group. Moreover, as China’s population continues to age, the number of visually impaired individuals is rising sharply, so ensuring that blind and visually impaired individuals can travel safely and effectively is particularly important. This paper therefore designs a smart seeing glasses system based on machine vision ^[1] to help blind individuals avoid obstacles on the road safely and effectively during their travels. Compared with the inefficient guide canes available on the market and with expensive guide dogs, the machine vision-based smart seeing glasses system is more competitive.

1 Overall Design of the Smart Seeing Glasses Control System

The smart seeing glasses control system described in this paper consists of two main parts: the front-end embedded acquisition and transmission system and the remote cloud platform server. The embedded acquisition and transmission system is built around the Samsung S5PV210 processor (ARM Cortex-A8 architecture) running a Linux kernel, and its hardware platform comprises core functional modules for binocular image acquisition, GPS positioning, voice broadcasting, GSM messaging, voice calling, and wireless transmission. It is mainly responsible for information acquisition, transmission, and intelligent guidance. The cloud platform server serves as the remote data processing center for the smart seeing glasses. An Alibaba Cloud server is used, integrating deep learning, binocular ranging ^[2], and other related algorithms to perform image recognition, distance detection, and orientation judgment for the target scenes in front of the glasses. In addition, a GPS satellite data matching platform is built on the server; together with the physical smart seeing glasses, it locates the wearer effectively in real time. The overall design block diagram of the system is shown in Figure 1.

2 Hardware Design of the Smart Seeing Glasses Control System

2.1 Design of the Binocular Acquisition Module

The binocular acquisition module uses two identical CMOS high-definition cameras to capture scene information in front of the smart seeing glasses, helping blind individuals obtain information about targets and corresponding scenes ahead.
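With two identical cameras mounted in parallel, the distance to a target can be recovered by similar triangles from the horizontal shift (disparity) of the target between the left and right images. The following is a minimal sketch of that textbook relation, Z = f·B/d, and is not the specific ranging method of reference [2]. It assumes rectified images; the function names and the focal length, baseline, and pixel coordinates in the example are illustrative values, not parameters from the paper.

```python
import math


def depth_from_disparity(focal_px: float, baseline_m: float,
                         x_left: float, x_right: float) -> float:
    """Distance Z in metres to a point seen at column x_left in the left
    image and x_right in the right image of a rectified stereo pair."""
    disparity = x_left - x_right  # in pixels; larger means closer
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


def bearing_deg(focal_px: float, cx: float, x_left: float) -> float:
    """Horizontal angle of the target relative to the left camera's optical
    axis, in degrees; positive values mean the target is to the right."""
    return math.degrees(math.atan2(x_left - cx, focal_px))


if __name__ == "__main__":
    # Illustrative numbers only: 700 px focal length, 6 cm baseline.
    z = depth_from_disparity(700.0, 0.06, x_left=350.0, x_right=329.0)
    angle = bearing_deg(700.0, cx=320.0, x_left=350.0)
    print(f"distance = {z:.2f} m, bearing = {angle:.1f} deg")
```

Because distance is inversely proportional to disparity, ranging error grows quickly for far targets, which is one reason a wider baseline or higher-resolution cameras help at longer range.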

2.2 Design of the GPS Positioning Module

The Global Positioning System (GPS) provides low-cost, high-precision three-dimensional positioning for users worldwide, enabling all-weather, real-time positioning anywhere on the globe. The system uses the NEO-6M module from the Swiss company u-blox as the core unit of the GPS module, which obtains the geographical coordinates (latitude and longitude) of the blind individual’s location in real time.
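The NEO-6M reports positions as NMEA 0183 text sentences, in which latitude and longitude are encoded as degrees and decimal minutes. The paper does not show its parsing code, so the following is only a minimal sketch of how a GGA sentence could be validated and converted to decimal degrees. The function names are hypothetical, and the sample sentence is the widely quoted NMEA example rather than data from the paper; its checksum is computed in code instead of being hard-coded.

```python
from functools import reduce


def nmea_checksum(body: str) -> int:
    """XOR of all characters between '$' and '*'."""
    return reduce(lambda acc, ch: acc ^ ord(ch), body, 0)


def _to_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA (d)ddmm.mmmm to signed decimal degrees."""
    dot = value.index(".")
    degrees = int(value[:dot - 2])      # everything before the two minute digits
    minutes = float(value[dot - 2:])    # mm.mmmm
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence: str):
    """Return (latitude, longitude) in decimal degrees, or None if no fix."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, _, checksum = sentence[1:].partition("*")
    if int(checksum[:2], 16) != nmea_checksum(body):
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    if fields[6] in ("", "0"):          # fix quality 0 means no position fix
        return None
    return _to_degrees(fields[2], fields[3]), _to_degrees(fields[4], fields[5])


if __name__ == "__main__":
    body = "GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,"
    sentence = f"${body}*{nmea_checksum(body):02X}"
    print(parse_gga(sentence))
```

Rejecting sentences with a bad checksum or no fix keeps corrupted or stale coordinates from being sent on for location matching.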

2.3 Design of the Wireless Communication Transmission Module

The wireless communication transmission module mainly consists of a 3G chip and its peripheral circuits, and uses 3G wireless technology for bidirectional communication between the smart seeing glasses and the remote cloud platform server. In one direction, it sends the images captured by the binocular camera module and the geographical coordinates obtained by the GPS module to the remote cloud server over the 3G network; in the other, it receives the image recognition and geographical location matching results from the server and passes them to the smart seeing glasses for voice broadcasting, informing the blind individual of the results in real time. In addition, using the GSM messaging function of the 3G module, the system promptly sends the results obtained from the cloud server to the blind individual’s family by text message. In special circumstances, blind individuals can also use the 3G phone function to speak directly with relatives. Figure 2 shows the application circuit diagram of the wireless communication transmission module.
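The paper does not specify how the images and coordinates are packaged for the uplink. Purely as an illustration, one common approach is a fixed binary header followed by the two image payloads, so that the server knows where each image ends. Everything in the sketch below is hypothetical: the `SSG1` magic value, the header layout, and the assumption that the images are already compressed to bytes (for example JPEG).

```python
import struct

# Hypothetical header: magic, latitude, longitude, left length, right length.
# "!" selects network byte order with no padding (4 + 8 + 8 + 4 + 4 = 28 bytes).
HEADER = struct.Struct("!4sddII")
MAGIC = b"SSG1"


def pack_upload(lat: float, lon: float, left_img: bytes, right_img: bytes) -> bytes:
    """Build one uplink packet carrying a GPS fix and a stereo image pair."""
    header = HEADER.pack(MAGIC, lat, lon, len(left_img), len(right_img))
    return header + left_img + right_img


def unpack_upload(packet: bytes):
    """Inverse of pack_upload; raises ValueError on malformed packets."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    magic, lat, lon, n_left, n_right = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("unexpected magic value")
    start = HEADER.size
    if len(packet) != start + n_left + n_right:
        raise ValueError("payload length does not match header")
    left_img = packet[start:start + n_left]
    right_img = packet[start + n_left:]
    return lat, lon, left_img, right_img


if __name__ == "__main__":
    pkt = pack_upload(26.42, 106.67, b"left-bytes", b"right-bytes")
    print(len(pkt), unpack_upload(pkt))
```

Carrying explicit lengths in the header lets the receiver detect truncated transfers, which matters on a mobile link where connections can drop mid-packet.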

2.4 Design of the Voice Broadcasting Module

The voice broadcasting function of the smart seeing glasses control system plays the image recognition results, the distance and orientation of the targets in front of the glasses, and the geographical location through the voice module, promptly informing blind individuals of their surroundings. The system uses the SYN6288 Chinese speech synthesis chip as the main component of the voice broadcasting module to perform text-to-speech conversion; its peripheral circuit is shown in Figure 3.
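The SYN6288 receives text over a serial link as a framed command. The sketch below builds such a frame following the layout as it is commonly described for this chip: a 0xFD header, a two-byte length, a synthesis command byte, a parameter byte carrying the background-music setting and text encoding, the text itself, and an XOR checksum. The paper does not give its driver code, so the function name, the 200-byte text limit, and the parameter layout should be treated as assumptions to verify against the chip's datasheet.

```python
from functools import reduce

FRAME_HEAD = 0xFD        # start-of-frame marker
CMD_SYNTHESIZE = 0x01    # "synthesize and play" command
ENCODING_GB2312 = 0x00   # low 3 bits of the parameter byte
MAX_TEXT_BYTES = 200     # assumed per-frame text limit


def build_tts_frame(text: str, background_music: int = 0) -> bytes:
    """Build one speech-synthesis frame for the given text (GB2312 encoded)."""
    if not 0 <= background_music <= 15:
        raise ValueError("background_music must be in 0..15")
    payload = text.encode("gb2312")
    if len(payload) > MAX_TEXT_BYTES:
        raise ValueError("text too long for a single frame")
    param = (background_music << 3) | ENCODING_GB2312
    length = len(payload) + 3  # command + parameter + text + checksum
    frame = bytes([FRAME_HEAD, length >> 8, length & 0xFF,
                   CMD_SYNTHESIZE, param]) + payload
    checksum = reduce(lambda acc, byte: acc ^ byte, frame, 0)
    return frame + bytes([checksum])


if __name__ == "__main__":
    # "Obstacle ahead" as an example prompt.
    print(build_tts_frame("前方有障碍物").hex(" "))
```

The resulting bytes would then be written to the UART connected to the chip; the serial port setup is omitted because it depends on the board configuration.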

3 Software Design of the Smart Seeing Glasses Control System

The software design of this system is divided into two parts: the programs on the remote cloud platform server and the programs on the front end of the smart seeing glasses. On the cloud server, the algorithms for image recognition, distance measurement, and orientation detection are implemented mainly in high-level languages such as C/C++, enabling the server to perform these three functions. The parsing of GPS latitude and longitude is also implemented in software on the cloud server. The front-end software of the smart seeing glasses mainly includes subprograms for image acquisition from the binocular camera, acquisition of GPS geographical coordinates, data transmission and reception through the wireless communication transmission module, scheduling of the voice broadcasting module, and configuration of the key interrupts. The main program flowchart is shown in Figure 4.

4 Natural Scene Recognition Based on Deep Belief Networks

Deep Belief Networks (DBN) ^[3] are one of the most widely used algorithm models in deep learning, commonly applied in handwritten character recognition and natural scene recognition. In the smart seeing glasses control system, deep belief networks are mainly used for object recognition in common natural scenes, which also reflects the application of deep learning in the field of machine vision. Figure 5 shows a typical network structure model of a deep belief network.

As shown in Figure 5, a deep belief network is a deep network built by stacking multiple Restricted Boltzmann Machines (RBMs) ^[4]. The DBN is trained layer by layer: each RBM is trained individually, and its parameters are adjusted separately ^[5]. Once one layer has been trained, its output is used as the input to the next RBM, and this continues until all RBMs have been trained; this stage is known as pre-training. After pre-training, the backpropagation algorithm is used to fine-tune the parameters of the whole network against the sample labels.
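The layer-by-layer pre-training described above can be sketched in a few lines of NumPy. The paper does not state how each RBM is trained; the sketch assumes one-step contrastive divergence (CD-1), the usual choice, with inputs scaled to [0, 1] and full-batch updates for brevity. The class and function names are illustrative, and supervised fine-tuning by backpropagation is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Restricted Boltzmann Machine with sigmoid units."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr):
        """One CD-1 update on a batch; returns the reconstruction error."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(v0.dtype)
        v1 = self.visible_probs(h0_sample)   # one Gibbs step back to visible
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))


def pretrain_dbn(data, hidden_sizes, epochs, lr, seed=0):
    """Greedy pre-training: train each RBM, then feed its hidden
    activations to the next RBM as input."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(layer_input, lr)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms


if __name__ == "__main__":
    toy = np.random.default_rng(1).random((32, 48))  # toy data, not CIFAR-10
    stack = pretrain_dbn(toy, hidden_sizes=[16, 8], epochs=20, lr=0.1)
    print([r.W.shape for r in stack])
```

The learned weights of each RBM would then initialize the corresponding layer of a feed-forward network before fine-tuning.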

For image training in the smart seeing glasses system, an improved CIFAR-10 natural scene dataset is used as the source of training and test samples. The original CIFAR-10 dataset contains 60,000 32×32 color images divided into 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. To make the smart seeing glasses more practical for blind individuals, this paper adds common targets from daily life, such as tables, chairs, people, trash cans, and trees, to the image training of the actual system, thereby improving the original CIFAR-10 dataset. The improved dataset is then trained and recognized using the deep belief network model shown in Figure 6.

In this training model, each image in the improved CIFAR-10 natural scene dataset is a 32×32 color image, so the input layer has 3,072 nodes (3,072=32×32×3). The two hidden layers have 1,000 and 200 nodes, respectively, and the output layer, a multi-class Softmax classifier ^[6], has 10 units, giving a final model structure of 3072-1000-200-10. During the actual image training phase of the smart seeing glasses system, each of the two RBM layers is trained for 200 iterations with a learning rate of 0.1. After pre-training, the learned weights are used to initialize the neural network, whose parameters are then fine-tuned; the Sigmoid function ^[7] is used as the activation function. Because the samples are numerous and the data complex, the hidden layers need a relatively large number of nodes to learn good features, and the large amount of information in the images calls for many iterations. The entire training process takes 10 hours on average, which is significantly shorter than the training time of convolutional neural networks ^[8] and autoencoder models ^[9] in deep learning, while still achieving a relatively ideal recognition rate. This is the main reason for selecting deep belief networks as the recognition training model in this system.
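To make the 3072-1000-200-10 structure concrete, the following sketch counts its trainable parameters and runs a forward pass with Sigmoid hidden layers and a Softmax output. It is an illustration only: the weights here are random rather than pre-trained, and the helper names are hypothetical.

```python
import numpy as np

LAYER_SIZES = [3072, 1000, 200, 10]  # 32*32*3 inputs, two hidden layers, 10 classes


def count_parameters(sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x, weights, biases):
    """Sigmoid hidden layers followed by a Softmax output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return softmax(a @ weights[-1] + biases[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.normal(0.0, 0.01, size=(m, n))
               for m, n in zip(LAYER_SIZES, LAYER_SIZES[1:])]
    biases = [np.zeros(n) for n in LAYER_SIZES[1:]]
    images = rng.random((4, 3072))          # stand-in for flattened 32x32x3 images
    probs = forward(images, weights, biases)
    print(count_parameters(LAYER_SIZES))    # 3275210
    print(probs.shape, probs.sum(axis=1))
```

Almost all of the roughly 3.3 million parameters sit in the first layer (3,072×1,000 weights), which is consistent with the paper's remark that the hidden layers need many nodes to capture the image features.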

5 System Test Results and Analysis

The deep belief network training model described above was first verified on 10,000 test images randomly selected from the improved CIFAR-10 dataset. Table 1 shows the recognition rate for each sample category and the average recognition rate.
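Per-category and average recognition rates of the kind reported in Table 1 can be computed from the true and predicted labels as in the sketch below. The labels in the example are toy values, not results from the paper, and the average is assumed to be the unweighted mean of the per-category rates.

```python
from collections import Counter


def per_class_recognition_rate(true_labels, predicted_labels):
    """Fraction of correctly recognized samples for each true category."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def average_rate(rates):
    """Unweighted mean over categories."""
    return sum(rates.values()) / len(rates)


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["cat", "cat", "dog", "dog", "ship", "ship"]
    preds = ["cat", "dog", "dog", "dog", "ship", "cat"]
    rates = per_class_recognition_rate(truth, preds)
    print(rates, average_rate(rates))
```

When every category has the same number of test images, this unweighted mean equals the overall fraction of correctly recognized images.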

As shown in Table 1, the deep belief network model of the smart seeing glasses system achieves an average recognition rate of 82.9% over the 10 categories of test samples in the improved CIFAR-10 dataset, nearly 10% higher than that of the support vector machine ^[10] recognition model, laying a foundation for further overall system testing of the smart seeing glasses.

Finally, taking into account the daily needs of blind individuals and the other functions of the smart seeing glasses, the control system was debugged as a whole in real scenarios. The frame rate of the binocular camera was set to 3 frames/s, and voice guidance was issued once every two seconds. The test results from the remote server are shown in Figures 7 and 8. Figure 7 shows that the GPS function obtains the wearer’s latitude and longitude accurately and in real time and transmits them to the remote server via the wireless communication transmission module for accurate geographical location matching. Figure 8 shows that the smart seeing glasses not only identify the category of targets ahead accurately but also measure the distance to the target objects, correctly indicate the orientation of obstacles, and provide real-time voice guidance that helps blind individuals avoid obstacles effectively and travel safely.

6 Conclusion

This paper builds a smart seeing glasses system based on machine vision with the S5PV210 as the main controller. The system runs a Linux kernel and comprises six core functional modules: binocular image acquisition, GPS positioning, voice broadcasting, GSM messaging, voice calling, and wireless transmission; both the hardware circuit design and the software design have been completed. System testing shows that the smart seeing glasses can provide real-time voice navigation for blind individuals travelling independently and that, in special circumstances, blind individuals can press a trigger button on the glasses to seek help from friends and family through the GPS, GSM messaging, and voice calling functions. In addition, because the smart seeing glasses have image recognition capabilities, they can assist blind individuals in simple object classification, enhancing their ability to care for themselves to a certain extent, which is particularly important for a country like China with a large blind population.

References

[1] Milan Sonka, Vaclav Hlavac, Roger Boyle, et al. Image Processing, Analysis, and Machine Vision[M]. Beijing: Tsinghua University Press, 2016.

[2] Yue Ronggang, Wang Shaoping, Li Kai, et al. A New Binocular Ranging Method Based on Similarity Principles[J]. Optoelectronic Engineering, 2008, 35(4): 64-68.

[3] Chen Cuiping. Text Classification Algorithm Based on Deep Belief Networks[J]. Computer Systems Applications, 2015, 24(2): 121-126.

[4] Zhang Chunxia, Ji Nannan, Wang Guanhui. Introduction to Restricted Boltzmann Machines[J]. Journal of Engineering Mathematics, 2013(2): 159-173.

[5] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving Neural Networks by Preventing Co-adaptation of Feature Detectors[J]. Computer Science, 2012, 3(4): 212-223.

[6] Wang Shuang, Ma Wenping, Xie Huiming, et al. A Polarization SAR Image Classification Method Based on Stack Encoding and Softmax[P]. CN104156728A, 2014.

[7] Zhang Xuewei, Wang Yan. Prediction of Plate Shape Based on Sigmoid Function Parameter Adjustment in Double Hidden Layer BP Neural Network[J]. Chemical Automation and Instrumentation, 2010, 37(4): 42-44.

[8] Chen Xianchang. Research on Deep Learning Algorithms and Applications Based on Convolutional Neural Networks[D]. Hangzhou: Zhejiang Gongshang University, 2013.

[9] Wu Haiyan. Research on Semi-supervised Representation Learning and Classification Learning Based on Autoencoders[D]. Chongqing: Chongqing University, 2015.

[10] Cui Pengyu. Research on Classifier Training Based on Support Vector Machines[J]. Digital Technology and Applications, 2016(6): 58-58.

Author Information:

He Tengpeng, Zhang Rongfen, Liu Chao, Fang Lenan, Liu Yuhong

(School of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025)
