Measurement of Bamboo Diameter Based on Lightweight Cascade Network

Wang Shaoping¹, Yan Yonglin¹, Hao Xiaofeng², Li Xianjun², Wang Xiaogang¹

(1. Central South University of Forestry and Technology, College of Mechanical and Electrical Engineering, Changsha, Hunan 410004; 2. Central South University of Forestry and Technology, College of Materials Science and Engineering, Changsha, Hunan 410004)

DOI: 10.12326/j.2096-9694.2022132

Abstract Existing diameter-measurement methods for automatic bamboo splitting machines lack generality and cannot measure wall thickness at the same time. To address this, a measurement algorithm is built on a cascade network composed of YOLOv4-Tiny and MobileNet-SegNet: binocular vision provides the distance to the bamboo tube end face, and the minimum enclosing circle of the segmented end face yields the outer diameter and thickness. In the measurement system, a USB binocular camera and a Movidius neural compute stick are connected to a Raspberry Pi 4B; the main program is written in Python, and the models are deployed with OpenVINO in asynchronous mode. Measurement experiments were conducted at different distances and angles and on bamboo tubes of different sizes. The results show that the optimal measurement distance of the algorithm is 30-40 cm, where the average relative error is 1.43% for outer diameter and 8.76% for thickness, at a detection speed of 7.1 FPS. This study provides a basis for designing the diameter-measurement and tool-change system of automatic bamboo splitting machines.

Keywords Bamboo Splitting Machine; Bamboo Diameter Measurement; YOLOv4-Tiny; MobileNet-SegNet; Asynchronous Inference

Bamboo is a naturally grown material. Its industrial use mostly involves cross-cutting raw bamboo of varying diameters into tubes and then splitting the tubes longitudinally into strips of a given width. During splitting, a measurement mechanism inside the bamboo splitting machine must measure the outer diameter of each tube; based on this diameter, the machine rotates or moves the knife disc to change tools so that the strip width stays consistent. After splitting, strips of different thicknesses are stacked separately and processed into standard strips for producing bamboo engineered materials. At present, the bamboo splitting process in China has a relatively low level of automation, and diameter measurement, tool change, and thickness sorting mostly rely on manual experience^[1-2]. Toward automatic bamboo diameter measurement, Chang Feihu et al.^[3] used electronic rulers to detect the diameter of bamboo segments; Liang Guojian et al.^[4] measured the outer diameter of bamboo tubes with a grating to control tool change; Zhu Feng et al.^[5] applied Gabor wavelets to texture images and combined them with clustering and modal-extremum feature extraction to segment images of bamboo end faces; and Wang Haifeng et al.^[6] measured bamboo dimensions using camera models, calibration techniques, and pinhole imaging theory.

These studies mostly target specific workpieces and operating conditions, so their methods generalize poorly; they also focus mainly on outer diameter, which makes simultaneous thickness assessment difficult. Improving the generality of the measurement method and measuring outer diameter and thickness synchronously are therefore key to further automating the bamboo splitting process. This study proposes a visual measurement algorithm based on a cascade network composed of YOLOv4-Tiny and MobileNet-SegNet that measures the outer diameter and thickness of bamboo tubes simultaneously, providing a basis for designing the diameter-measurement mechanism of automatic bamboo splitting machines.

1 Algorithm Principles and Calculation Steps

Measuring the outer diameter of a bamboo tube requires both the size of the tube's end face in the image and the distance from the end face to the camera. In principle, a monocular, binocular, or depth camera could be used. However, the feeding mechanism of the bamboo splitting machine is chain-driven, so the position of the bamboo varies considerably, and the mounting positions of the diameter-measurement mechanism differ greatly between machines. Without a fixed, known object distance, monocular vision cannot recover the end-face distance reliably, while consumer-grade depth cameras offer low depth accuracy. This study therefore adopts a binocular vision scheme to improve the stability and generality of the algorithm.

The main steps for measuring the outer diameter and thickness of a bamboo tube are as follows:
1) The same object detection network outputs the detection boxes of the bamboo tube end faces in the left and right images.
2) The detection boxes in the left and right images are stereo-matched, and the distance from the end face to the camera is calculated.
3) The end face inside each detection box is semantically segmented.
4) The minimum enclosing circle of the segmented end face is obtained and combined with the end-face distance to compute the outer diameter and thickness of the bamboo tube.

Instance segmentation networks could perform detection and segmentation in a single step, but mainstream instance segmentation networks (such as Mask R-CNN and YOLACT) are large and have many parameters, which makes them difficult to deploy on embedded devices such as single-board computers. Lightweight cascade networks have a clear speed advantage, so the algorithm is built on a cascade of the YOLOv4-Tiny object detection network and the MobileNet-SegNet semantic segmentation network, as shown in Figure 1.

Figure 1 Overall process of the algorithm for measuring the outer diameter and thickness of bamboo tubes

1.1 Object Detection

YOLOv4-Tiny is a lightweight object detection network that greatly simplifies YOLOv4 while retaining high speed, adequate detection accuracy, and multi-scale detection capability. The complete model is only 23.6 MB, and it is well supported by the Model Optimizer and Inference Engine of the OpenVINO toolkit, which makes it easy to deploy on embedded devices and suitable for this study^[7-12]. Its main features are as follows.

1) Mosaic data augmentation. During YOLOv4-Tiny training, images are not fed in one at a time; instead, 4 images from the dataset are randomly scaled, cropped, and arranged, then stitched into a single input image. This enriches the dataset and increases the number of small targets, which improves training results and training speed while reducing hardware resource consumption.

2) Feature Pyramid Network (FPN) and multi-feature-layer prediction. YOLOv4-Tiny uses an FPN structure to fuse its effective feature layers. After feature extraction by the CSPdarknet53_tiny backbone, effective feature layers at two scales are obtained. The small feature layer is convolved and upsampled, concatenated with the large feature layer, and convolved again to complete feature fusion. The fused large and small feature layers are then fed into two YOLO Heads for prediction. The FPN structure enriches the information in the feature maps and strengthens the network's feature extraction and multi-scale detection capabilities.

3) Improved bounding box loss function. YOLOv4-Tiny uses CIoU_loss as its bounding box loss. Building on IoU_loss, CIoU_loss adds a penalty on the distance between the centers of the predicted and target boxes (normalized by the diagonal of the smallest box enclosing both) and a penalty on aspect-ratio inconsistency. Because the overlap area, center distance, and aspect ratio of the two boxes all enter the loss, the predicted box regresses faster, improving learning efficiency and prediction accuracy.
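The three CIoU terms named above can be made concrete with a small function. The following is a minimal sketch of the commonly used CIoU definition for two boxes given as (center x, center y, width, height); the function name and box format are assumptions for illustration, and this is not the Darknet code used to train the model in this study.

```python
import math


def ciou_loss(pred, target):
    """CIoU loss between two axis-aligned boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target

    # Corner coordinates of both boxes
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Overlap term: plain IoU
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter)

    # Center-distance term: squared center distance divided by the squared
    # diagonal of the smallest box that encloses both boxes
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    c2 = (max(p_x2, t_x2) - min(p_x1, t_x1)) ** 2 + (max(p_y2, t_y2) - min(p_y1, t_y1)) ** 2

    # Aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # identical boxes
print(ciou_loss((0, 0, 2, 2), (10, 0, 2, 2)))  # disjoint boxes of the same shape
```

Unlike plain IoU_loss, which stays at 1 for any pair of disjoint boxes, the center-distance term still produces a gradient that pulls a non-overlapping prediction toward the target.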

1.2 Stereo Matching and Distance Calculation

YOLOv4-Tiny processes the left and right camera images and outputs the positions of the detection boxes, which are stored in lists and traversed. Each predicted box in the left image is compared with every predicted box in the right image by computing their intersection over union (IoU); if the IoU exceeds a preset threshold, the two boxes are considered a match. Based on testing, the threshold is set to 0.25. After a successful match, the program calculates the disparity from the X-axis coordinates of the center points of the matched boxes. The principle of binocular ranging is shown in Figure 2^[13-14].
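The matching rule above can be sketched as follows, assuming boxes in (x1, y1, x2, y2) pixel form, each expressed in its own camera's rectified image. The IoU > 0.25 test and the center-X disparity follow the text; picking the highest-IoU partner and using each right box only once are assumptions added here, and `box_iou` and `match_boxes` are hypothetical helper names.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in pixel coordinates."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(left_boxes, right_boxes, threshold=0.25):
    """Match left/right detections by IoU and return (left, right, disparity) triples.

    Disparity is the difference of the box-center X coordinates, u_l - u_r.
    """
    matches, used = [], set()
    for lb in left_boxes:
        # Keep the unused right box with the highest IoU above the threshold
        best_j, best_iou = None, threshold
        for j, rb in enumerate(right_boxes):
            iou = box_iou(lb, rb)
            if j not in used and iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            used.add(best_j)  # each right box is matched at most once
            rb = right_boxes[best_j]
            u_l = (lb[0] + lb[2]) / 2
            u_r = (rb[0] + rb[2]) / 2
            matches.append((lb, rb, u_l - u_r))
    return matches


left = [(100, 50, 200, 150)]
right = [(70, 50, 170, 150), (400, 50, 500, 150)]
print(match_boxes(left, right))
```

A low threshold is tolerable because, after rectification, the two boxes of the same end face differ mainly by a horizontal shift equal to the disparity, and that shift by itself lowers their IoU.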

Figure 2 Principle of binocular vision ranging

The distance d from the measured point to the plane where the binocular camera is located should satisfy Equation 1.

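$$
d = \frac{b\,f}{D}, \qquad D = u_l - u_r \tag{1}
$$

Where: D is the disparity, i.e., the difference between the X-axis coordinates u_l and u_r of corresponding points (here, the centers of matched detection boxes) in the left and right images; b is the baseline length and f is the focal length, both fixed parameters of the camera. The measured distance d therefore depends only on the disparity D.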
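As a worked example of Equation 1, the sketch below computes d from a disparity. It assumes f has been converted to pixels (the calibrated focal length, not the 2.1 mm physical value), so that d comes out in the unit of b. The 60 mm baseline is that of the camera used in this study; the 700 px focal length and 120 px disparity are hypothetical numbers.

```python
def distance_from_disparity(disparity_px, baseline_mm, focal_px):
    """Equation 1: d = b * f / D.

    With the focal length f and the disparity D both in pixels,
    the distance d comes out in the same unit as the baseline b.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return baseline_mm * focal_px / disparity_px


# 60 mm baseline as in this study; 700 px focal length is hypothetical
print(distance_from_disparity(120, 60, 700))  # 350.0 (mm)
```

Because d is inversely proportional to D, halving the disparity doubles the distance, and a one-pixel disparity error matters more the farther the end face is from the camera.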

1.3 Semantic Segmentation

The SegNet network can classify each pixel of the input image with high accuracy, but the complete model is large and has many parameters^[15], which hinders deployment on embedded devices.

MobileNet is a lightweight convolutional neural network that replaces standard convolutions with depthwise separable convolutions: a depthwise (per-channel grouped) convolution followed by a pointwise (1×1) convolution, which reduces the parameters and computation required for each convolution. It thereby reduces model size and improves speed while maintaining accuracy^[16-17].
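To show where the savings come from, here is a minimal parameter count, ignoring biases, for one standard convolution versus its depthwise separable replacement. The layer sizes (3×3 kernel, 256 to 512 channels) are an illustrative choice, not taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a k x k depthwise convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out


k, c_in, c_out = 3, 256, 512
std = standard_conv_params(k, c_in, c_out)         # 1,179,648
sep = depthwise_separable_params(k, c_in, c_out)   # 133,376
print(std, sep, round(sep / std, 3))               # ratio = 1/c_out + 1/k**2
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels the separable version needs roughly one ninth of the weights; the multiply-accumulate count shrinks by the same factor, since both versions scale with the output feature-map area.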

Considering compatibility with OpenVINO, this study uses MobileNetv1 as the feature extraction network of MobileNet-SegNet. With the input image specified as 128×128×3 (height × width × channels), the first 14 layers of MobileNetv1 are used, producing an 8×8×512 feature map after feature extraction. Because MobileNetv1 downsamples with strided convolutions rather than pooling layers, SegNet's pooling indices cannot be used. Instead, following U-Net, the output of each downsampling stage is concatenated with the upsampling layer of the same size and then convolved^[18], which allows the network to recover shallow-layer information during upsampling. The model has 10,208,610 parameters; the structure of MobileNet-SegNet is shown in Figure 3.
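The 8×8×512 feature map follows from four stride-2 stages applied to a 128×128 input (128 / 2⁴ = 8). The sketch below traces output shapes through the standard MobileNetV1 layer table, assuming "same" padding. It counts the first standard convolution and each depthwise separable block as one block each; since the paper does not state how its "14 layers" are counted, the sketch only shows that every block from the 7th through the 12th yields 8×8×512.

```python
import math

# (block type, output channels, stride) for the first blocks of the standard
# MobileNetV1 layer table, up to the last 512-channel block
MOBILENET_V1_BLOCKS = [
    ("conv", 32, 2),
    ("dw_sep", 64, 1),
    ("dw_sep", 128, 2),
    ("dw_sep", 128, 1),
    ("dw_sep", 256, 2),
    ("dw_sep", 256, 1),
    ("dw_sep", 512, 2),
] + [("dw_sep", 512, 1)] * 5


def trace_shapes(height, width, blocks):
    """Return the (type, H, W, C) output of every block, assuming 'same' padding."""
    shapes = []
    for kind, channels, stride in blocks:
        height, width = math.ceil(height / stride), math.ceil(width / stride)
        shapes.append((kind, height, width, channels))
    return shapes


for i, (kind, h, w, c) in enumerate(trace_shapes(128, 128, MOBILENET_V1_BLOCKS), 1):
    print(f"block {i:2d} {kind:6s} -> {h}x{w}x{c}")
```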

Figure 3 MobileNet-SegNet structure

1.4 Size Calculation

The output of MobileNet-SegNet is post-processed into a binary image that separates the bamboo tube end face from the background. The minimum enclosing circle of the end face is then used to measure its size in pixel coordinates, which reduces the measurement error caused by changes in measurement angle^[19-20]: when the end face is tilted, its image is approximately an ellipse whose major axis is not foreshortened, and the diameter of the enclosing circle follows that major axis.
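The minimum enclosing circle step can be sketched with few dependencies. In OpenCV this is normally a single call, cv2.minEnclosingCircle, applied to a contour; the version below instead runs a Welzl-style randomized incremental algorithm over all foreground pixels of a binary mask, using NumPy only to locate those pixels. The helper names are hypothetical, and because the paper does not detail how thickness is derived from the mask, only the outer circle is shown.

```python
import math
import random

import numpy as np


def _circle_from_two(a, b):
    """Circle with segment ab as its diameter."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)


def _circle_from_three(a, b, c):
    """Circumcircle of triangle abc, or None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))


def _contains(circle, p, eps=1e-7):
    return math.dist(circle[:2], p) <= circle[2] + eps


def min_enclosing_circle(points, seed=0):
    """Randomized incremental minimum enclosing circle (cx, cy, r), expected O(n)."""
    pts = [(float(x), float(y)) for x, y in points]
    if not pts:
        raise ValueError("need at least one point")
    random.Random(seed).shuffle(pts)
    circle = (pts[0][0], pts[0][1], 0.0)
    for i, p in enumerate(pts):
        if _contains(circle, p):
            continue
        circle = (p[0], p[1], 0.0)  # p must lie on the boundary
        for j in range(i):
            q = pts[j]
            if _contains(circle, q):
                continue
            circle = _circle_from_two(p, q)  # p and q on the boundary
            for k in range(j):
                r = pts[k]
                if not _contains(circle, r):
                    circle = _circle_from_three(p, q, r) or circle
    return circle


def mask_enclosing_circle(mask):
    """Minimum enclosing circle of the foreground pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    return min_enclosing_circle(zip(xs.tolist(), ys.tolist()))


# Synthetic end face: a ring with outer radius 30 px and inner radius 20 px
yy, xx = np.ogrid[:101, :101]
d2 = (xx - 50) ** 2 + (yy - 50) ** 2
ring = (d2 <= 30 ** 2) & (d2 > 20 ** 2)
cx, cy, r = mask_enclosing_circle(ring)
print(round(cx, 3), round(cy, 3), round(2 * r, 3))  # center and pixel diameter S
```

In practice, passing only the outer contour points instead of every foreground pixel gives the same circle with far fewer points.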

Lens imaging is the mainstream imaging model. When the object distance u is much greater than the focal length f, the lens can be approximated by the pinhole model: the projected size of the object is obtained from its minimum enclosing circle, and its distance from the binocular disparity. In the pinhole model, the true size equals the pixel size multiplied by d/f (with f in pixels); substituting d from Equation 1 gives the true size of the measured object:

$$
W = \frac{S\,d}{f} = \frac{S\,b}{u_l - u_r} \tag{2}
$$

Where: W is the true outer diameter of the bamboo tube end face; S is the maximum outer diameter in pixel coordinates, i.e., the diameter of the minimum enclosing circle; b is the baseline length of the binocular camera, which can be calculated from the translation vector T between the left and right imaging planes obtained by camera calibration; and u_l and u_r are the X-axis coordinates of the center points of the detection boxes of the same bamboo tube end face in the left and right images, respectively.
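The size formula (Equation 2) can be checked numerically. This sketch assumes that the enclosing-circle diameter S and the box centers u_l and u_r are all in pixels of the rectified images and that b is in millimetres; every input value except the 60 mm baseline is hypothetical. Because the focal length cancels, only S, b, and the disparity are needed.

```python
def true_size_mm(size_px, baseline_mm, u_left, u_right):
    """Equation 2: W = S * b / (u_l - u_r).

    The focal length cancels out: W = S * d / f and d = b * f / (u_l - u_r).
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("u_left must be greater than u_right")
    return size_px * baseline_mm / disparity


# Hypothetical inputs: 220 px enclosing-circle diameter, matched box centers at
# x = 700 px (left) and x = 580 px (right), 60 mm baseline as in this study
print(true_size_mm(220, 60, 700, 580))  # 110.0 (mm)
```

Since f drops out, a calibration error in the focal length does not bias W directly, whereas errors in the baseline b or in the box centers propagate proportionally.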

According to the theory in reference^[21], a binocular vision system is more accurate when its structure is symmetrical and the baseline length is within 0.8-2.2 times the working distance. However, the installation space for the diameter-measurement mechanism of a bamboo splitting machine is limited, so this study uses a long-baseline (60 mm) USB binocular camera with a 2.1 mm focal length wide-angle lens as the imaging device.

2 Data Preparation and Model Training

2.1 Network Training and Test Set Preparation

Data collection: 63 bamboo tubes with outer diameters of 80-140 mm were obtained by cross-cutting 3-5-year-old Mao bamboo (Dendrocalamus latiflorus). The tubes were placed on the bamboo splitting machine and photographed with a USB binocular camera at a resolution of 2560×720 under different angles, distances, orientations, and lighting conditions, yielding 1580 images. The images were divided 8:2 into a training set (1264 images) and a test set (316 images). For the test set, the outer diameter, the end face thickness, and the distance from the end face to the plane of the binocular camera were also measured at the time of capture.

The bamboo tube end faces were labeled with Labelme, as shown in Figure 4. After training, the network should separate the end face region (red) from the background (black) and thereby extract the complete shape of the end face. Because the samples belong to a single category, overfitting is likely, so real-time data augmentation with random changes in size, hue, and other properties is applied during MobileNet-SegNet training, and Mosaic data augmentation is enabled for YOLOv4-Tiny. For the loss functions, YOLOv4-Tiny uses CIoU_loss, while MobileNet-SegNet uses a combination of categorical_crossentropy and IoU_loss^[22], with L2 regularization applied to the weights of the standard and depthwise separable convolutions in its feature extraction network. Both networks use an annealing learning-rate decay method with the Adam optimizer; the initial learning rates are 0.003 (MobileNet-SegNet) and 0.00261 (YOLOv4-Tiny).
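The loss and learning-rate schedule described above could look roughly like the NumPy sketch below. It assumes the IoU_loss is a soft (probability-based) Jaccard loss added with equal weight to categorical cross-entropy, and it models the annealing decay as cosine annealing; the text specifies neither the weighting nor the exact schedule, so both are assumptions. Only the initial learning rate of 0.003 comes from the paper.

```python
import math

import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel cross-entropy; y_true is one-hot and y_pred holds softmax
    probabilities, both shaped (H, W, C)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))


def soft_iou_loss(y_true, y_pred, smooth=1.0):
    """1 minus the mean soft IoU over classes, computed from probabilities."""
    inter = np.sum(y_true * y_pred, axis=(0, 1))
    union = np.sum(y_true + y_pred, axis=(0, 1)) - inter
    return float(1.0 - np.mean((inter + smooth) / (union + smooth)))


def combined_loss(y_true, y_pred):
    # Equal weighting of the two terms is an assumption
    return categorical_crossentropy(y_true, y_pred) + soft_iou_loss(y_true, y_pred)


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate decays from lr_max to lr_min along half a cosine period."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))


# Toy 2 x 2 mask with two classes (background, end face)
y_true = np.array([[[1, 0], [0, 1]], [[0, 1], [0, 1]]], dtype=float)
y_pred = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.3, 0.7], [0.1, 0.9]]])
print(combined_loss(y_true, y_pred))
print([round(cosine_annealing_lr(s, 100, 0.003), 6) for s in (0, 50, 100)])
```

The cross-entropy term gives stable per-pixel gradients, while the IoU term directly rewards overlap of the whole end-face region, which helps when one class occupies much of the image.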

Figure 4 Image results of labeling bamboo cross-section

Computer configuration for network training: AMD Ryzen 7 PRO 4750U CPU, 16 GB memory, Windows 10 Professional operating system, deep learning frameworks Keras 2.3.1 and Darknet, Python 3.6.5. Because the models must be deployed on a Raspberry Pi 4B, YOLOv4-Tiny models were trained with input sizes of 416×416, 320×320, and 256×256, and the most suitable size was selected by comparing the accuracy and speed of each model. MobileNet-SegNet uses an input size of 128×128.

2.2 Training Results

Neural network training consists of two steps: forward propagation and backpropagation. Forward propagation computes each layer's output, starting from the input layer and passing results on until the output layer produces the predicted values; backpropagation compares the predicted values with the ground truth, computes the loss, and updates the network's parameters layer by layer from the output back to the input. Training repeats forward and backward propagation over the dataset until the loss converges. To save computational resources and speed up training, samples are usually processed in batches, and the number of samples processed before each parameter update is the batch size. The Epoch is another important training hyperparameter: one Epoch is one complete pass of all samples in the dataset through forward and backward propagation and consists of one or more batches. In this study, YOLOv4-Tiny training is counted in batches (i.e., iterations) with a batch size of 64, while MobileNet-SegNet training is counted in Epochs.
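The relation between batch size, iterations, and Epochs can be illustrated with the numbers in this paper. The sketch assumes YOLOv4-Tiny was trained on the full 1264-image training set (with Mosaic, each training sample is a composite, so this is only an approximate count); under that assumption, the 2000 batch iterations at which mAP reaches 100% correspond to about 101 passes over the data.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of parameter updates (batches) needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def epochs_seen(iterations, batch_size, num_samples):
    """Equivalent number of passes over the dataset after `iterations` batches."""
    return iterations * batch_size / num_samples


train_images = 1264  # training-set size from Section 2.1
batch_size = 64      # YOLOv4-Tiny batch size used in this study
print(iterations_per_epoch(train_images, batch_size))         # 20
print(round(epochs_seen(2000, batch_size, train_images), 1))  # 101.3
```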

After multiple iterations, the training data and model files are saved in the results directory, and the training process is visualized using the matplotlib library, as shown in Figure 5; the model files are converted into the IR format supported by OpenVINO, and tested for accuracy and speed using images from the test set on the Raspberry Pi.

Figure 5 YOLOv4-Tiny network training

Figure 5 shows the training of the YOLOv4-Tiny model with a 320×320 input. After 1000 batch iterations, the network's mean average precision (mAP) at an IoU threshold of 0.50 reached 77.3% and kept rising, reaching 100% at 2000 iterations, while the loss fell to 0.507 and finally stabilized at 0.352. As Table 1 shows, the model with a 320×320 input gains markedly in speed while its accuracy still meets the detection requirement, so the YOLOv4-Tiny model with a 320×320 input is chosen for deployment.

Figure 6 MobileNet-SegNet network training

Table 1 YOLOv4-Tiny model performance with different input sizes

As Figure 6 shows, MobileNet-SegNet converges after about 90 Epochs, after which its accuracy stays at about 98.5% and its loss at about 0.035. Training of both models is stable, and no overfitting was observed.

To evaluate MobileNet-SegNet, it is compared with SegNet using the mean intersection over union (mIoU) and the Dice coefficient as accuracy metrics, calculated with Equations 3 and 4.

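$$
\mathrm{mIoU} = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FP_i + FN_i} \tag{3}
$$

$$
\mathrm{Dice} = \frac{2TP}{2TP + FP + FN} \tag{4}
$$

Where: TP (true positives) is the number of pixels of a class correctly predicted as that class; FP (false positives) is the number of pixels of other classes wrongly predicted as that class; FN (false negatives) is the number of pixels of that class wrongly predicted as another class; the subscript i denotes the i-th class; and k is the number of classes.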
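The mIoU and Dice metrics (Equations 3 and 4) can be computed directly from integer label maps. In this minimal sketch, 0 is background and 1 is the end face; a class absent from both maps is counted as IoU 1, which is a convention added here rather than something the paper specifies. The function names are hypothetical.

```python
import numpy as np


def counts(pred, gt, cls):
    """TP, FP, FN for one class in integer label maps."""
    p, g = pred == cls, gt == cls
    return int(np.sum(p & g)), int(np.sum(p & ~g)), int(np.sum(~p & g))


def mean_iou(pred, gt, num_classes):
    """Equation 3: average of TP / (TP + FP + FN) over the k classes."""
    ious = []
    for cls in range(num_classes):
        tp, fp, fn = counts(pred, gt, cls)
        # Class absent from both maps: counted as a perfect score (convention)
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 1.0)
    return float(np.mean(ious))


def dice(pred, gt, cls=1):
    """Equation 4: 2TP / (2TP + FP + FN) for the given class."""
    tp, fp, fn = counts(pred, gt, cls)
    return 2 * tp / (2 * tp + fp + fn)


gt = np.zeros((4, 4), dtype=int)
gt[:, :2] = 1      # end face occupies the left two columns
pred = np.zeros((4, 4), dtype=int)
pred[:, :3] = 1    # prediction spills one column too far
print(round(mean_iou(pred, gt, 2), 4), dice(pred, gt))  # 0.5833 0.8
```

For a single foreground class, Dice = 2·IoU / (1 + IoU), so the two metrics rank models identically but Dice gives the higher number.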

Table 2 compares the performance of MobileNet-SegNet and SegNet on the validation set: MobileNet-SegNet reduces inference time by 21% relative to SegNet, while its accuracy remains essentially unchanged. Training results with and without data augmentation are compared in Figure 7.

Table 2 Performance comparison between MobileNet-SegNet and SegNet

Figure 7 Comparison of data augmentation training

3 Algorithm Deployment on the Raspberry Pi Embedded Experimental Platform

To verify the effectiveness of the algorithm, the USB binocular camera is connected to the Raspberry Pi 4B, and the Movidius neural compute stick is plugged into a USB 2.0 port. The main program is written in Python, and the models are deployed with OpenVINO. During the experiment, a bamboo tube is placed in front of the binocular camera and the program is run to complete the measurement following the predefined experimental procedure. Images from the experiment are shown in Figure 8.

Figure 8 Experimental images

3.1 OpenVINO Asynchronous Inference

The models are deployed on the Raspberry Pi with OpenVINO, which supports synchronous and asynchronous inference. Unlike synchronous mode, asynchronous mode does not block the calling thread and lets multiple inference requests run independently^[23]; the inference workflow is shown in Figure 9.

Figure 9 OpenVINO asynchronous inference

As Figure 9 shows, video frames are dispatched cyclically to requests 1, 2, and 3, and each request spans two cycles from input to output, which increases the time available for a single request's inference. However, too many requests delay the results relative to the captured images, so the number of requests should satisfy the following relationship:

(5)

Testing shows that, when detecting a single target on the Raspberry Pi 4B, one inference cycle of the main program takes 0.1292 s. Based on this and the data in Tables 1 and 2, and to keep latency low, the number of requests is set to 4 for YOLOv4-Tiny (2 each for the left and right images) and 2 for MobileNet-SegNet.
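The cyclic request scheme of Figure 9 can be imitated with a thread pool. This sketch does not use the OpenVINO API: `fake_infer` is a hypothetical stand-in that only sleeps, and each slot plays the role of one infer request. Frame i is submitted to request i mod N and its result is collected N frames later, so N inferences overlap while the output order is preserved. The request counts used in this study (4 for YOLOv4-Tiny, 2 for MobileNet-SegNet) would be passed as `num_requests`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame_id, latency):
    """Hypothetical stand-in for one network inference (it only sleeps)."""
    time.sleep(latency)
    return f"result-{frame_id}"


def run_sync(frames, latency):
    # Each call blocks until its result is ready
    return [fake_infer(f, latency) for f in frames]


def run_async(frames, num_requests, latency):
    """Round-robin frames over `num_requests` in-flight requests."""
    results = []
    slots = [None] * num_requests  # pending future per request slot
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        for i, frame in enumerate(frames):
            slot = i % num_requests
            if slots[slot] is not None:
                # Collect the frame submitted num_requests steps earlier
                results.append(slots[slot].result())
            slots[slot] = pool.submit(fake_infer, frame, latency)
        # Drain the requests still in flight, oldest first
        for k in range(num_requests):
            pending = slots[(len(frames) + k) % num_requests]
            if pending is not None:
                results.append(pending.result())
    return results


frames = list(range(12))
t0 = time.perf_counter()
sync_out = run_sync(frames, 0.02)
t_sync = time.perf_counter() - t0
t0 = time.perf_counter()
async_out = run_async(frames, 4, 0.02)
t_async = time.perf_counter() - t0
print(sync_out == async_out, round(t_sync, 2), round(t_async, 2))
```

A larger `num_requests` raises throughput only until all requests are busy; beyond that it mainly lengthens the delay between capturing a frame and receiving its result, which is the trade-off the request count has to balance.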

3.2 Camera Calibration and Stereo Rectification

Owing to manufacturing, assembly, aging, and similar factors, the USB binocular camera used in the experiment does not fully satisfy the similar-triangle relationship of pinhole imaging, so its images are distorted, with the distortion increasing from the center toward the edges. In addition, the optical axes of the left and right cameras may not be perfectly parallel, which affects disparity calculation and stereo matching. The binocular camera therefore has to be calibrated before the experiment to remove distortion and to perform stereo rectification, ensuring measurement accuracy.

The overall mathematical model of image distortion is as follows:

$$
\begin{cases}
x' = x\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2p_1 xy + p_2\left(r^2 + 2x^2\right) \\
y' = y\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2y^2\right) + 2p_2 xy
\end{cases}, \qquad r^2 = x^2 + y^2 \tag{6}
$$

Where: x and y are the X- and Y-axis coordinates of the undistorted image point on the image plane, measured from the principal point; x' and y' are the corresponding distorted coordinates; and r is the distance from the point to the principal point. To remove distortion, the tangential distortion parameters (p1, p2) and the radial distortion parameters (k1, k2, k3) must be obtained; for ordinary cameras, k1 and k2 suffice to represent radial distortion. For stereo rectification, this study uses the Bouguet algorithm, which uses the rotation matrix R and translation vector T between the left and right imaging planes to rotate each imaging plane by half of the relative rotation so that the two become coplanar, and then applies a rotation about the optical axis that makes the epipolar lines parallel to the baseline, achieving row alignment.
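For reference, the radial-plus-tangential distortion model (k1, k2, k3, p1, p2) and its numerical inverse can be written in a few lines. The coefficients below are hypothetical, not the calibration results in Table 3. The model has no closed-form inverse, so undistortion is done here by fixed-point iteration; OpenCV's cv2.undistortPoints performs a comparable iterative correction, and cv2.stereoRectify computes the rectification rotations from R and T.

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Radial (k1, k2, k3) plus tangential (p1, p2) distortion of a point (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d


def undistort(x_d, y_d, k1, k2, k3, p1, p2, iterations=20):
    """Invert the distortion model by fixed-point iteration."""
    x, y = x_d, y_d  # start from the distorted point
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x, y = (x_d - dx) / radial, (y_d - dy) / radial
    return x, y


# Hypothetical coefficients for a wide-angle lens (not the Table 3 values)
coeffs = dict(k1=-0.3, k2=0.1, k3=0.0, p1=0.001, p2=-0.0005)
xd, yd = distort(0.2, -0.1, **coeffs)
print(undistort(xd, yd, **coeffs))  # close to (0.2, -0.1)
```

A negative k1, as assumed here, gives the barrel distortion typical of wide-angle lenses; the iteration converges quickly as long as the distortion is mild over the image area used.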

The full calibration therefore requires the intrinsic matrices K of both cameras, k1, k2, p1, p2, and R and T. This study obtains these parameters with the Stereo Camera Calibrator app in MATLAB, using Zhang Zhengyou's checkerboard calibration method^[24]. The calibration results are listed in Table 3, and the calibration effect is shown in Figure 10.

Table 3 Camera calibration results

Figure 10 Image calibration effect

4 Algorithm Verification and Result Analysis

4.1 System Performance Testing

YOLACT, a relatively fast single-stage instance segmentation network, was converted into OpenVINO's IR format and run on the Raspberry Pi. Its average single inference time over 30 video loops was 1.7599 s, whereas YOLOv4-Tiny and MobileNet-SegNet took only 0.0965 s and 0.1307 s, respectively. Mainstream instance segmentation networks therefore have no speed advantage on embedded devices such as single-board computers and are hard to apply in detection scenarios that require real-time performance.

To evaluate the acceleration provided by OpenVINO asynchronous inference, timers are placed around the network inference calls in the main program. The average single inference time of the two networks and the average frames per second (FPS) of the algorithm are listed in Table 4.

Table 4 Comparison of synchronous and asynchronous inference

Asynchronous inference provides a marked speedup: the average network inference time measured in the main program drops by 98.1%, and the detection speed reaches 7.1 FPS.

4.2 Measurement Conditions Analysis

4.2.1 Measurement Distance

Two bamboo tubes with outer diameters of 115 mm and 100 mm were measured at different distances. Based on the FPS of the algorithm, the measured value for each tube at each distance was averaged over 7 video loops (about 1 s of measurement time); the results are listed in Table 5.

Table 5 Outer diameter measurement data at different distances without deflection angles

At measurement distances of 30-40 cm, the average outer diameter deviation is smallest, so this range is the optimal detection distance; it is also convenient for installing an embedded detection device in a bamboo splitting machine. Across all tested distances, the average measurement error of the method is 5.86 mm.

4.2.2 Measurement Angle

A bamboo tube with an outer diameter of 103 mm was kept at the optimal measurement distance and measured at different angles, and the measured value at each angle was averaged over 7 video loops. This test verifies the measurement stability of the system under large deflection angles caused by installation errors or aging of the frame. The results are listed in Table 6. Different placement angles introduce some error, mainly because the end face of the bamboo tube is then no longer perpendicular to the camera's optical axis. In actual production, the optical axis of the measurement mechanism's camera usually stays parallel to the axis of the bamboo tube, and the deflection angle is generally smaller than the angles tested here, so it has no significant impact on measurement accuracy.

Table 6 Outer diameter measurement data at a fixed distance of 30-40 cm with different angles

4.3 Verification Experiment for Bamboo Tube Outer Diameter and Thickness Measurement

Seven bamboo tubes of increasing outer diameter were selected and measured at distances within 30-40 cm. The system measured outer diameter and thickness, and the results for each tube were averaged over 7 video loops, as listed in Table 7. At the optimal measurement distance, the average outer diameter deviation is only 1.47 mm, and the average relative error is 1.43% for outer diameter and 8.76% for thickness.

Table 7 Measurement of bamboo tubes with different outer diameters

5 Conclusion

1) At distances of 30-40 cm and angles of 0-45°, the algorithm measures accurately and reliably, meeting the needs of automated diameter measurement and providing a basis for subsequent tool-change control.

2) The use of OpenVINO asynchronous inference and lightweight cascade networks ensures that the algorithm meets speed requirements (7.1 FPS) during detection, making it suitable for use in low-cost embedded devices.

3) The algorithm can obtain the distance and size of the end face using binocular vision and the minimum enclosing circle scheme, with no strict requirements on the camera installation position and angle, reducing measurement errors caused by installation errors, aging, and deformation of the frame.

4) The algorithm can simultaneously measure bamboo diameter and thickness, meeting the needs of the diameter measurement tool change system while evaluating the thickness of bamboo tubes, providing a basis for the design of subsequent bamboo strip sorting mechanism control systems.

Author Profile: Wang Shaoping, Male, Master’s Student, College of Mechanical and Electrical Engineering, Central South University of Forestry and Technology.

Corresponding Author: Yan Yonglin, Male, Professor, College of Mechanical and Electrical Engineering, Central South University of Forestry and Technology.

Funding Information: Key R&D Project of Hunan Province 2020 “Research and Demonstration of Key Technologies for Manufacturing Large-Scale Bamboo Engineered Materials” (2020NK2021).
