Editor's note: This work was completed by a team from Hunan University (team members: Peng Yiping, Tao Ziming, Yuan Yucong, Wang Shixun, and Tan Chang). The full text is about 3,088 words, with a reading time of roughly 8 minutes, and aims to provide a learning reference for developers. The article is reproduced from the 3D Vision Developer Community and is shared for academic purposes only.
1. Background Introduction
3D reconstruction has broad application prospects in fields such as intelligent manufacturing, AR, robotics, and indoor navigation, and the spread of consumer-grade RGB-D cameras has further expanded its application scenarios. The Astra Pro depth camera developed by Orbbec is relatively low-cost yet highly accurate, and can image objects in 3D easily and quickly. Research on and application of 3D reconstruction technologies will greatly promote the development of computer vision and related fields, and will further influence industrial production and everyday life.
Application Scenarios
Outdoor navigation has become commonplace, but large multi-story indoor venues such as train stations, shopping malls, and supermarkets weaken satellite positioning signals, making accurate positioning and navigation difficult, and their internal routes are often complex, so the demand for indoor navigation is strong. Indoor positioning based on Bluetooth beacons or WiFi signals requires base stations to be deployed and maintained indoors, and the building structure also attenuates those signals. Reconstructing multi-story indoor environments in 3D and building indoor maps enables vision-based indoor positioning and navigation, which offers high precision and easy maintenance.
In addition, fields such as mixed reality (MR) and interior decoration require 3D models as a foundation. Indoor scene models obtained through 3D reconstruction can greatly improve the realism of such applications and provide a better user experience.
Project Overview
The computing platform and RGB-D camera used in this project are the Zora P1 development board and the Astra Pro depth camera, both developed by Orbbec. Pose estimation is performed with the ORB-SLAM2 framework, which is then extended into a complete system for dense reconstruction of large multi-story indoor environments, yielding indoor 3D models. The work focuses on the following aspects:
Globally consistent model. Errors in camera pose estimation accumulate as scanning progresses; if this accumulated error is not eliminated, the reconstructed 3D model becomes distorted. To obtain a globally consistent 3D model, this project uses a bag-of-words (BoW) model for loop detection, and employs pose graph optimization and global bundle adjustment (BA) to correct pose drift and map point coordinate errors.
Model updating. 3D reconstruction takes a sequence of images as input, so the algorithm must not only select key frames but also integrate each newly detected key frame into the 3D model according to its pose. We use a TSDF (truncated signed distance function) map to fuse and update the 3D model on a per-voxel basis.
High-quality surface reconstruction. The RGB-D camera acquires depth maps, from which point clouds are generated using the camera's intrinsic parameters (a minimal back-projection sketch is given below). However, 3D reconstruction ultimately requires continuous surfaces rather than discrete 3D points, so we use the Marching Cubes algorithm to turn the discrete spatial points into a triangle mesh.
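For concreteness, the following minimal sketch back-projects a depth image into a point cloud using the pinhole camera model. The intrinsics and depth scale in the signature are placeholders rather than the Astra Pro's actual calibration, and the code is an illustrative sketch, not the project's implementation.

```cpp
#include <cstdint>
#include <vector>

struct Point3f { float x, y, z; };

// Back-project a depth image into a point cloud with the pinhole model.
// fx, fy, cx, cy and depthScale are illustrative placeholders; real values
// come from the camera's calibration.
std::vector<Point3f> depthToCloud(const std::vector<uint16_t>& depth,
                                  int width, int height,
                                  float fx, float fy, float cx, float cy,
                                  float depthScale /* e.g. 1000.0f for mm */) {
    std::vector<Point3f> cloud;
    cloud.reserve(static_cast<size_t>(width) * height);
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            uint16_t d = depth[static_cast<size_t>(v) * width + u];
            if (d == 0) continue;            // 0 means "no measurement"
            float z = d / depthScale;        // metric depth
            float x = (u - cx) * z / fx;     // pinhole back-projection
            float y = (v - cy) * z / fy;
            cloud.push_back({x, y, z});
        }
    }
    return cloud;
}
```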
2. System Composition
The hardware system of this project consists of the Zora P1 development board, Astra Pro depth camera, 11.6-inch LCD screen, and a lithium battery and inverter for power supply, as shown in Figure 1.
Figure 1 System Hardware Composition

The Zora P1 development board provides two USB 3.0 and USB 2.0 interfaces as well as MIPI-CSI and Ethernet interfaces, making it easy to connect cameras and other peripherals. The board ships with the Armbian operating system, which is rebuilt specifically for ARM development boards, so developers familiar with Ubuntu can get started quickly.

The Astra Pro depth camera uses monocular structured light for 3D imaging and is suited to indoor scenes at short and medium range, with the advantages of high precision and low power consumption. A wide range of functions can be built on this camera, such as face recognition, gesture recognition, human tracking, 3D measurement, environmental perception, and 3D map reconstruction, covering scenarios such as living-room entertainment, security monitoring, measurement, 3D scanning, body-sensing interaction, and commercial display.
3. Algorithm Overview
This project implements 3D reconstruction of large multi-story indoor environments. Using the open-source ORB-SLAM2 algorithm as the basic framework, key frame information including camera poses and timestamps is obtained via ORB feature descriptors; poses, RGB images, and depth maps are then matched according to this key frame information, and the 3D point cloud is reconstructed through TSDF fusion. To reproduce a realistic, textured, and spatially continuous 3D environment, the algorithm applies statistical filtering to improve point cloud quality and uses the Marching Cubes algorithm to build a triangle mesh without losing environmental detail. The specific design of the algorithm is as follows:

S1: The algorithm extends the ORB-SLAM2 framework into a complete system for dense reconstruction of large multi-story indoor environments. The overall pipeline is shown in Figure 2.

Figure 2 Algorithm Flow

S2: Within the ORB-SLAM2 framework, tracking, local mapping, and loop closing are realized through ORB feature descriptors. Key frames are selected according to fixed rules and are inserted and culled while the map is built, ultimately yielding the key frame information required for dense 3D reconstruction.

S3: Based on the selected key frames, poses, RGB images, depth maps, and timestamps are matched. In traditional point cloud fusion, point cloud data is first computed from the depth maps and RGB images and then fused according to the pose information; this approach suffers from high computational cost, a density explosion caused by accumulating point clouds, and cumulative pose errors. To avoid these issues, the algorithm applies TSDF fusion to the key frame information, pre-allocating a voxelized 3D volume so that point cloud density stays controllable and unnecessary repeated computation is avoided.

S4: The reconstructed point cloud of the environment is passed through a statistical filter to improve its quality. Although the ORB descriptors in ORB-SLAM2 reduce the influence of noise and outliers on pose estimation, the data collected by the sensor inevitably contains noise and outliers, and these "bad points" carry over into the reconstruction. To address this, the algorithm performs a statistical analysis of the neighborhood of every point in the cloud, builds a histogram of the neighborhood statistics, and, based on the difference between outliers and normal points, truncates the histogram to filter the cloud. A sketch of this kind of statistical filtering appears below.

S5: The Marching Cubes algorithm is used to build a triangle mesh from the reconstructed point cloud. Because point cloud data is a discrete representation of 3D space, it does not directly convey the texture of the environment; Marching Cubes reproduces a realistic, textured multi-story indoor environment while retaining texture detail, completing the dense reconstruction system.
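One common way to implement the statistical filtering described in S4 is PCL's StatisticalOutlierRemoval, which computes the mean distance from each point to its k nearest neighbors and discards points whose mean distance deviates too far from the global average. The snippet below is a minimal sketch under that assumption, not the project's actual code; the file names and the parameters (50 neighbors, 1.0 sigma) are placeholders.

```cpp
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/filters/statistical_outlier_removal.h>

int main() {
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZRGB>);

    // "scene.pcd" is a placeholder for the reconstructed point cloud file.
    if (pcl::io::loadPCDFile<pcl::PointXYZRGB>("scene.pcd", *cloud) < 0)
        return 1;

    // For each point, inspect its 50 nearest neighbors and reject points whose
    // mean neighbor distance deviates from the global mean by more than 1 sigma.
    pcl::StatisticalOutlierRemoval<pcl::PointXYZRGB> sor;
    sor.setInputCloud(cloud);
    sor.setMeanK(50);
    sor.setStddevMulThresh(1.0);
    sor.filter(*filtered);

    pcl::io::savePCDFileBinary("scene_filtered.pcd", *filtered);
    return 0;
}
```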
4. Technical Advantage Analysis
Visual Frontend Based on ORB Feature Points
ORB features consist of improved FAST key points and BRIEF descriptors and possess both rotation and scale invariance. Compared with features such as SIFT and SURF, ORB is computationally cheap, taking less than 33 ms per frame (30 FPS) on a CPU, so it meets real-time requirements.

FAST key points are found by comparing the grayscale values of the pixels on a circle around a candidate point with the grayscale value of the center, after which non-maximum suppression avoids an excessive concentration of corners within a small area. Because these steps are simple, FAST detection satisfies the real-time requirements of the visual frontend. In addition, building an image pyramid and detecting FAST key points at every pyramid level provides scale invariance, while the intensity centroid method assigns a dominant orientation to each key point, providing rotation invariance.

BRIEF descriptors encode the pixels around a key point as a binary vector, which makes them highly distinctive and well suited to feature matching and to appearance-based place recognition for loop closure detection. A minimal ORB extraction and matching example is sketched below.
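The snippet below is a minimal OpenCV sketch of ORB extraction and brute-force Hamming matching between two frames. It is not the ORB-SLAM2 extractor (which adds its own quadtree-based keypoint distribution); the image file names and the feature count are placeholders.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    // "frame1.png" / "frame2.png" are placeholder file names.
    cv::Mat img1 = cv::imread("frame1.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("frame2.png", cv::IMREAD_GRAYSCALE);
    if (img1.empty() || img2.empty()) return 1;

    // ORB = oriented FAST key points + rotated BRIEF descriptors,
    // detected over an image pyramid for scale invariance.
    cv::Ptr<cv::ORB> orb = cv::ORB::create(1000 /* max features */);
    std::vector<cv::KeyPoint> kps1, kps2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kps1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kps2, desc2);

    // Binary descriptors are compared with the Hamming distance;
    // cross-check keeps only mutually best matches.
    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);
    return matches.empty() ? 1 : 0;
}
```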
Point Cloud Fusion Based on TSDF
To address the drawbacks of the traditional depth-map-to-point-cloud fusion pipeline, namely point cloud density explosion, heavy computation, and cumulative errors, this project matches poses, RGB images, and depth maps using the selected key frame information and fuses the 3D point cloud through a truncated signed distance function (TSDF).

A TSDF map is a voxel-based reconstruction built from a grid of 3D voxels, as shown in Figure 3. TSDF maps can run on the GPU, where every voxel is updated in parallel, so updates are fast enough for real-time reconstruction. Each voxel stores a value between -1 and 1: negative values indicate that the voxel lies inside an object, positive values that it lies outside, and larger absolute values correspond to greater distances from the object's surface. After point clouds have been fused into the TSDF map, extracting the surface amounts to extracting the zero isosurface. A sketch of the per-voxel update is given below.

Figure 3 TSDF and Voxels
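The per-voxel update behind TSDF fusion can be sketched as follows: each voxel center is projected into the current depth frame, the signed distance along the viewing direction is truncated, and the stored value is updated as a running weighted average. The grid layout, truncation distance, and the assumption that the camera sits at the origin are simplifications for illustration; the real system would first transform each voxel by the pose estimated by ORB-SLAM2.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One TSDF voxel: truncated signed distance in [-1, 1] plus a fusion weight.
struct Voxel { float tsdf = 1.0f; float weight = 0.0f; };

struct Intrinsics { float fx, fy, cx, cy; };

// Fuse one depth frame (camera at the origin, looking along +z, for brevity)
// into a regular voxel grid of dim x dim x dim cells.
void integrateFrame(std::vector<Voxel>& grid, int dim, float voxelSize,
                    const std::vector<float>& depth, int width, int height,
                    const Intrinsics& K, float truncation) {
    for (int z = 0; z < dim; ++z)
        for (int y = 0; y < dim; ++y)
            for (int x = 0; x < dim; ++x) {
                // Voxel center in camera coordinates (grid centered in x/y).
                float px = (x - dim / 2) * voxelSize;
                float py = (y - dim / 2) * voxelSize;
                float pz = z * voxelSize;
                if (pz <= 0.0f) continue;

                // Project the voxel center into the depth image.
                int u = static_cast<int>(std::lround(K.fx * px / pz + K.cx));
                int v = static_cast<int>(std::lround(K.fy * py / pz + K.cy));
                if (u < 0 || u >= width || v < 0 || v >= height) continue;

                float d = depth[static_cast<size_t>(v) * width + u];
                if (d <= 0.0f) continue;                 // invalid measurement

                // Signed distance: positive in front of the surface, negative behind.
                float sdf = d - pz;
                if (sdf < -truncation) continue;         // far behind the surface
                float tsdf = std::min(1.0f, sdf / truncation);

                // Running weighted average of the truncated distance.
                Voxel& vox = grid[(static_cast<size_t>(z) * dim + y) * dim + x];
                vox.tsdf = (vox.tsdf * vox.weight + tsdf) / (vox.weight + 1.0f);
                vox.weight += 1.0f;
            }
}
```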
Surface Reconstruction Based on the Marching Cubes Algorithm
Point cloud data is a discrete representation of 3D space. The Marching Cubes algorithm extracts an isosurface from the TSDF map and turns it into a triangle mesh, reproducing a realistic, textured multi-story indoor environment.

The basic idea of Marching Cubes is to process every voxel of the scalar field, identify the voxels that intersect the isosurface, and compute the intersection points between the isosurface and the cube edges by interpolation. Based on how the isosurface lies relative to the eight cube vertices, the intersection points are connected in a specific pattern so that the resulting triangles approximate the isosurface within that cube.

Each voxel has 8 vertices, and each vertex can be in one of two states, above or below the isosurface, giving 256 possible configurations. Taking rotational and reflective symmetry into account, these reduce to 15 basic patterns, as shown in Figure 4. A sketch of how the case index and edge crossings are computed is given below.

Figure 4 Fifteen Basic Patterns of Marching Cubes
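The core bookkeeping of Marching Cubes can be sketched as follows: the eight corner TSDF values of a voxel are packed into an 8-bit case index, and the zero crossing on each intersected edge is located by linear interpolation. The full 256-entry edge and triangle lookup tables are omitted here; a real implementation would take them from a standard Marching Cubes table.

```cpp
#include <array>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Build the 8-bit Marching Cubes case index: bit i is set when corner i lies
// on the "inside" (negative TSDF) side of the isosurface.
uint8_t cubeIndex(const std::array<float, 8>& corner, float isoValue = 0.0f) {
    uint8_t index = 0;
    for (int i = 0; i < 8; ++i)
        if (corner[i] < isoValue) index |= static_cast<uint8_t>(1u << i);
    return index;
}

// Linearly interpolate the isosurface crossing on the edge between two corners
// whose values lie on opposite sides of the isovalue (so v1 != v2).
Vec3 interpolateCrossing(const Vec3& p1, const Vec3& p2,
                         float v1, float v2, float isoValue = 0.0f) {
    float t = (isoValue - v1) / (v2 - v1);
    return { p1.x + t * (p2.x - p1.x),
             p1.y + t * (p2.y - p1.y),
             p1.z + t * (p2.z - p1.z) };
}

// Given the case index, a standard edge table tells which of the 12 cube edges
// are crossed, and a triangle table lists how the crossings are joined into
// triangles; those lookup tables are omitted from this sketch.
```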
5. Test Results
Using the hardware system shown in Figure 1, the open-source ORB-SLAM2 algorithm serves as the basic framework, providing a visual frontend based on ORB features and a backend that includes pose graph optimization and global BA. The matched information consisting of timestamps, poses, RGB images, and depth maps (Figure 2) is then extracted, TSDF fusion is performed, and a triangle mesh is reconstructed with the Marching Cubes algorithm, forming a complete dense reconstruction system for large multi-story indoor environments. The reconstruction results are shown in Figure 5.

Figure 5 3D Reconstruction Results

6. Demo Presentation
— Copyright Statement —
All original content of this public account belongs to Computer Vision Life; non-original text, images, and audio-visual materials collected, organized, and authorized for reproduction from public channels belong to the original authors. If there is any infringement, please contact us, and we will delete it in a timely manner.