Interactive Platform for Deep Reinforcement Learning and Wind Tunnel Testing
Xinhui Dong, Zhuoran Wang, Pengfei Lin, Qiulei Wang, Gang Hu
Harbin Institute of Technology (Shenzhen), School of Civil and Environmental Engineering, Intelligent Wind Engineering Laboratory
Citation format: Xinhui Dong, Zhuoran Wang, Pengfei Lin, Qiulei Wang, Gang Hu; An interactive platform of deep reinforcement learning and wind tunnel testing. Physics of Fluids 1 November 2024; 36 (11): 115197. https://doi.org/10.1063/5.0238959
Flow control has become an important research direction in fluid mechanics, as it can regulate the forces on and vibrations of structures such as buildings, wings, and vehicles, with great potential in fields such as vehicle energy efficiency and aerodynamic optimization. Flow control techniques are divided into passive and active control, and many researchers have shifted their focus to active control because of the limitations of passive methods. Existing studies have explored actuation methods such as rotating cylinders, synthetic jets, and plasma actuators, which effectively control the aerodynamic forces on bluff bodies and open up the possibility of fine control in complex wind fields. Meanwhile, deep reinforcement learning (DRL) has emerged as a new direction for active flow control research, thanks to its advantages in handling high-dimensional, multi-modal, and complex spatiotemporally varying problems.
Nevertheless, current active flow control research based on DRL still primarily relies on CFD simulations, mainly because there are challenges related to software, hardware, and their interactions in experimental environments, as shown in Figure 1.
Figure 1 Three major challenges in experimental research on active flow control based on deep reinforcement learning
1. In terms of software, DRL algorithms demand careful hyperparameter tuning, and their low sample efficiency can lead to prolonged experiments, increased equipment wear, and rising costs.
2. In terms of hardware, the assembly and integration of devices are complex, involving power and information transmission, as well as connections between controllers and sensors, which are time-consuming and labor-intensive.
3. In terms of hardware and software interaction, data transmission relies on complex communication protocols whose standards vary across devices; the DRL system must implement Python interfaces to coordinate read and write operations across multiple protocols, which increases the technical difficulty.
CFD simulations have propelled the development of active flow control, but experimental validation remains crucial for confirming the effectiveness of these methods. However, there is currently no open-source platform that integrates software, hardware, and algorithms to simplify the interaction between wind tunnel testing and DRL algorithms.
To address these challenges, this study has developed an open-source DRL wind tunnel testing interaction platform called “DRLinWT”. This platform unifies common communication protocols through a general adapter and integrates DRL libraries such as Stable Baselines3 and Tianshou, as shown in Figure 2.
Figure 2 Architecture of DRLinWT implemented in Python
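Conceptually, the general adapter hides each protocol's details behind a common pair of write and read operations, so the DRL side never touches protocol-specific code. The sketch below illustrates this idea for serial and UDP devices only; it is a minimal illustration assuming the pyserial package, with placeholder ports and addresses, and is not the actual DRLinWT implementation (Modbus RTU and TCP/IP adapters would follow the same pattern).

```python
# Minimal sketch of protocol-specific adapters behind a common write/read interface.
# Illustrative only: ports, addresses, and frame sizes are placeholders.
import socket
import serial  # pyserial


class SerialAdapter:
    """Serial (RS-232/RS-485) device wrapped behind write (W) and read (R) calls."""

    def __init__(self, port="COM3", baudrate=9600):
        self.dev = serial.Serial(port, baudrate, timeout=1.0)

    def W(self, payload: bytes) -> None:
        self.dev.write(payload)          # send a command frame to the device

    def R(self, size: int = 64) -> bytes:
        return self.dev.read(size)       # read the device's response frame


class UDPAdapter:
    """UDP device (e.g., a pressure scanner) behind the same W/R interface."""

    def __init__(self, host="192.168.0.10", port=5001):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.settimeout(1.0)

    def W(self, payload: bytes) -> None:
        self.sock.sendto(payload, self.addr)

    def R(self, size: int = 1024) -> bytes:
        data, _ = self.sock.recvfrom(size)
        return data
```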
2 DRLinWT: Interactive Platform for Deep Reinforcement Learning and Wind Tunnel Testing
The wind tunnel is an important tool in fluid mechanics research, used to simulate airflow conditions around fixed objects and is widely applied in aerodynamic studies. Common wind tunnel tests in the field of flow control include force measurement, pressure measurement, and flow field visualization. These tests typically require support from various devices, which are mainly categorized into sensors, controllers, and actuators. Sensors are used for data acquisition, such as force sensors measuring force and torque, and pressure scanning valves recording pressure data.
Controllers are the core components in wind tunnel tests, used to manage and regulate the testing process. Typical controllers include PCs, Arduino microcontrollers, and programmable logic controllers (PLCs). These controllers interact with sensors and actuators by executing preset algorithms to automate and dynamically adjust the testing process. Actuators are responsible for performing specific actions based on control signals, such as motors adjusting the angle of the wind tunnel model to simulate different angles of attack, flow controllers regulating airflow for blowing and suction control, and servo drives and piezoelectric actuators used for precise position and vibration control.
Deep reinforcement learning (DRL) is a method that integrates deep learning and reinforcement learning, effectively solving complex decision-making and control problems. DRL learns policies that maximize cumulative rewards through interaction with the environment.
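In practice, this interaction takes the form of the standard agent-environment loop of the Gym interface: the agent observes a state, issues an action, and receives a reward at every step. The snippet below is a minimal sketch of that loop using a generic Gymnasium environment and random actions in place of a trained policy.

```python
# Minimal sketch of the DRL interaction loop (Gymnasium-style API, random actions).
import gymnasium as gym

env = gym.make("Pendulum-v1")   # stand-in environment; a wind tunnel environment replaces this later
obs, info = env.reset()
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()                       # a trained policy would act here
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward                                    # cumulative reward the agent maximizes
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```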
To tackle the complexity and cost of hardware and software interaction in experiments, this study developed the open-source DRL wind tunnel testing interaction platform “DRLinWT”. The platform integrates mainstream DRL libraries (such as Stable Baselines3 and Tianshou) and supports standardized Gym environments. For device control, DRLinWT is compatible, through a general adapter, with various communication protocols including serial communication, Modbus RTU, TCP/IP, and UDP, achieving closed-loop interaction between sensor states and agent commands, as shown in Figure 3.
Figure 3 Flowchart of DRLinWT, aimed at achieving information exchange between DRL and the wind tunnel
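In Gym terms, the closed loop of Figure 3 can be pictured as a custom environment whose step() writes the agent's command to the actuator through the adapter and reads back the surface pressures as the next state. The class below is only an illustrative outline: the actuator and sensor method names, the observation and action shapes, and the reward terms (drag, lift fluctuation, and an energy penalty) are assumptions, not the platform's actual environment.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class WindTunnelEnv(gym.Env):
    """Illustrative closed-loop environment: agent action -> actuator, sensor data -> state."""

    def __init__(self, actuator, sensor, n_taps=50):
        super().__init__()
        self.actuator, self.sensor = actuator, sensor   # adapter-backed device objects (assumed API)
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_taps,), dtype=np.float32)

    def _observe(self):
        # assumed sensor API: returns one pressure sample per tap
        return np.asarray(self.sensor.read_pressures(), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self._observe(), {}

    def step(self, action):
        # assumed actuator API: set the blowing flow rate from the normalized action
        self.actuator.set_flow_rate(float(action[0]))
        obs = self._observe()
        cd, cl_std = self._aero_metrics(obs)            # placeholder reduction of the pressures
        reward = -cd - cl_std - 0.1 * float(action[0])  # placeholder: drag, lift fluctuation, energy penalty
        return obs, reward, False, False, {}

    def _aero_metrics(self, obs):
        # placeholder stand-in for the pressure-integration step (see the control_step sketch later)
        return float(np.mean(obs)), float(np.std(obs))
```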
To validate the feasibility of DRLinWT, this study conducted a case study on flow control around a square cylinder based on DRLinWT. The experiment was carried out in the wind tunnel of the AIWE laboratory at Harbin Institute of Technology (Shenzhen), aiming to optimize the aerodynamic performance of the model while accounting for energy consumption. The wind tunnel test section measures 500 mm × 500 mm × 800 mm, with a wind speed range of 1 to 30 m/s and a turbulence intensity of less than 5%. The test model is a square cylinder with a length of 600 mm and a side length of 50 mm, with 50 pressure taps arranged in the central region to measure the wind pressure distribution and aerodynamic coefficients, as shown in Figure 4.
Figure 4 Square cylinder model with suction holes and pressure measurement points
This study designed three inflow conditions of increasing complexity: a periodic wind field (wind speed 3 to 10 m/s, each wind speed held for 120 seconds), a random wind field (wind speed 3 to 10 m/s, each wind speed held for 120 seconds), and a rapidly changing random wind field (wind speed 3 to 10 m/s, each wind speed held for 30 seconds). By gradually increasing the complexity of the test conditions, the control capability and robustness of DRLinWT in the experimental flow field were comprehensively evaluated, verifying its effectiveness in real environments, as shown in Figure 5.
(a) Inflow for Case 1: Periodic flow field lasting 120 seconds
(b) Inflow for Case 2: Random flow field lasting 120 seconds
(c) Inflow for Case 3: Random flow field lasting 30 seconds
Figure 5 Three cases: Increasing complexity of inflow fields
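For reference, the three inflow schedules can be collected into a small configuration structure; the dictionary below simply restates the wind speed range and hold time given above, with hypothetical key names.

```python
# Inflow schedules for the three cases as stated above (wind speed range and hold time).
inflow_cases = {
    "case1": {"pattern": "periodic", "speed_range_mps": (3, 10), "hold_s": 120},
    "case2": {"pattern": "random",   "speed_range_mps": (3, 10), "hold_s": 120},
    "case3": {"pattern": "random",   "speed_range_mps": (3, 10), "hold_s": 30},
}
```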
In this application of the DRLinWT platform, a flow controller and a pressure scanning valve serve as the controller and sensor, respectively. The controller executes the action commands of the DRL algorithm by adjusting the blowing flow rate, while the sensor measures the wind pressure on the model’s surface for computing the drag coefficient (Cd) and lift coefficient (Cl), as shown in Figure 6. The two devices communicate via the Modbus RTU and UDP protocols, using the Adapter.W and Adapter.R functions of DRLinWT for command transmission and data acquisition.
Figure 6 Wind tunnel test arrangement and laboratory equipment
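A single control step thus amounts to writing a flow-rate setpoint over Modbus RTU, reading a pressure frame over UDP, and reducing the pressures to Cd and Cl. The sketch below illustrates that sequence; the Adapter.W/Adapter.R call signatures, the returned data format, and the tap geometry arrays are illustrative assumptions rather than the documented DRLinWT API.

```python
import numpy as np

def control_step(modbus_adapter, udp_adapter, flow_rate_lpm, q_inf, tap_normals, tap_lengths, D=0.05):
    """Illustrative step: send the actuator command, read pressures, reduce them to Cd and Cl.

    q_inf       : free-stream dynamic pressure 0.5*rho*U^2 [Pa]
    tap_normals : (n_taps, 2) outward unit normals at each pressure tap
    tap_lengths : (n_taps,) tributary length of each tap per unit span [m]
    D           : reference length (square-cylinder side length, 0.05 m here)
    """
    modbus_adapter.W(flow_rate_lpm)                        # assumed: write the flow-rate setpoint
    pressures = np.asarray(udp_adapter.R(), dtype=float)   # assumed: decoded pressures [Pa], one per tap

    cp = pressures / q_inf                                 # pressure coefficients at the taps
    # Pressure force per unit span: F = -sum(Cp * n * ds); normalize by the reference length D
    cd = -np.sum(cp * tap_normals[:, 0] * tap_lengths) / D
    cl = -np.sum(cp * tap_normals[:, 1] * tap_lengths) / D
    return cd, cl
```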
(1) Deep Reinforcement Learning Training
In Case 1, the learning rate was set to 0.001. The reward curve (see Figure 7(a)) fluctuated significantly in the early stages and stabilized after 400 steps within a range of 0.8 to 1.2, exhibiting periodic fluctuations. This periodicity arises from the converged control strategy: the system adapted to the periodic wind speed changes and formed a correspondingly periodic response, so the control effect and reward values also varied periodically. In addition, the average drag coefficient (mean Cd) and the lift coefficient standard deviation (std Cl) stabilized at lower values after 400 steps, as shown in Figures 7(b) and (c).
Figure 7 Reward values, Cd, and Cl variation curves during training in Case 1
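Training of this kind can be launched with a few lines of a standard DRL library. The sketch below shows a minimal Stable Baselines3 setup with the Case 1 learning rate of 0.001; the choice of SAC and the step budget are illustrative assumptions, since the summary above does not state the algorithm used.

```python
# Minimal training sketch with Stable Baselines3 (algorithm choice and step budget are illustrative).
from stable_baselines3 import SAC

env = WindTunnelEnv(actuator, sensor)        # the closed-loop environment sketched earlier
model = SAC("MlpPolicy", env, learning_rate=1e-3, verbose=1)   # learning rate 0.001 as in Case 1
model.learn(total_timesteps=1_000)           # placeholder budget
model.save("case1_model")                    # hypothetical file name
```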
The wind field in Case 1 is characterized by periodic wind speed changes, with predictable environmental changes. However, in real scenarios, the variation of the wind field is often more complex and unpredictable. To simulate this complexity, Case 2 broke the original periodic wind field, introducing randomness to make it more realistic and challenging.
Case 2 consisted of two experiments. In the first experiment, the hyperparameter configuration was the same as in Case 1; in the second experiment, the learning rate was adjusted to 0.0003 at step 520. As shown in Figure 8(a), the reward curve under the mixed learning rate was significantly higher than that under the constant learning rate, indicating that the adjusted learning rate improved performance. This is because the training difficulty of Case 2 was higher than that of Case 1, and reducing the learning rate at the right time could achieve more refined training results. Figures 8(b) and (c) show that the Cd and Cl curves ultimately stabilized at lower levels, with differences primarily attributed to variations in energy consumption. The mixed learning rate could improve the efficiency of the control strategy while reducing control-related energy consumption.
Figure 8 Reward values, Cd, and Cl variation curves during training in Case 2
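One simple way to realize such a mid-training learning rate drop is to pass Stable Baselines3 a schedule function that switches from 0.001 to 0.0003 once step 520 is reached, as sketched below; the algorithm choice and total step budget are placeholders.

```python
# Sketch of a piecewise learning-rate schedule: 0.001 before step 520, 0.0003 afterwards.
# Stable Baselines3 schedules receive progress_remaining, which falls from 1.0 to 0.0 over training.
from stable_baselines3 import SAC

TOTAL_STEPS = 1_000   # placeholder budget; only the switch point (step 520) comes from the text

def mixed_lr(progress_remaining: float) -> float:
    step = (1.0 - progress_remaining) * TOTAL_STEPS
    return 1e-3 if step < 520 else 3e-4

model = SAC("MlpPolicy", env, learning_rate=mixed_lr, verbose=1)
model.learn(total_timesteps=TOTAL_STEPS)
```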
Due to the randomness and unpredictability of wind speed, even in the later stages of training, the reward curve still experienced significant fluctuations. These fluctuations stem from the severe changes in the wind field and the errors in wind speed during action implementation. However, this unpredictability enhances the model’s adaptability and practical application capabilities to some extent.
Building on Case 2, Case 3 tested three learning rates (0.001, 0.0003, and 0.0001) applied at step 520. Figure 9 shows that the reward, Cd, and Cl curves under the three learning rates ultimately exhibited similar convergence trends but did not reach a stable state. This is because the rapid and severe changes in the wind field increased the task difficulty, limiting the effect of the learning rate adjustments.
Figure 9 Reward values, Cd, and Cl variation curves during training in Case 3
(2) Model Evaluation and Comparison
To compare the strengths and weaknesses of the models from the three cases, the optimal model of each case was tested under the three wind fields, with the results presented as box plots in Figure 10.
(a) Box plot of rewards, average Cd, and standard deviation Cl for the three models evaluated in the flow field of Case 1
(b) Box plot of rewards, average Cd, and standard deviation Cl for the three models evaluated in the flow field of Case 2
(c) Box plot of rewards, average Cd, and standard deviation Cl for the three models evaluated in the flow field of Case 3
Figure 10 Box plots of rewards, average Cd, and standard deviation Cl for the best models of the three cases during evaluation
From the figure, it can be seen that the average reward values and medians of the three models in the three wind fields are relatively consistent, but there are significant differences in stability. The model from Case 2 exhibits the highest stability across all wind fields; the model from Case 1 is stable in its training wind field but performs poorly in other wind fields; the model from Case 3 demonstrates low stability in all wind fields.
In terms of drag and lift coefficients, there are significant performance differences among the three models, further illustrating their adaptability and performance differences under different wind field conditions. In Figure 10(a), the model from Case 2 performs best in terms of average and median values, followed by Case 1, and Case 3 performs the worst. In terms of stability, the ranking is Case 1 > Case 2 > Case 3. In Figure 10(b), the average and median rankings are Case 2 < Case 1 ≤ Case 3, with stability ranking as Case 2 > Case 1 ≥ Case 3. In Figure 10(c), the ranking of average drag coefficients is consistent with the previous two wind fields, namely Case 2 < Case 1 ≤ Case 3; however, the performance of lift coefficient standard deviation is best for Case 2, followed by Case 3, and Case 1 performs the worst.
In summary, the model from Case 2 is the most robust among the three. Except for slightly lower reward values and average drag coefficients in the wind field of Case 1 compared to the model from Case 1, the model from Case 2 performs best in almost all scenarios, particularly significantly outperforming the other two models in the other two wind fields.
(3) Effectiveness Verification
Subsequently, the model from Case 2 was verified. The testing method is as follows: first, the wind field is turned on, and the blowing pipe is turned off, recording the average drag coefficient and lift coefficient standard deviation under uncontrolled conditions. At step 240, the blowing pipe is turned on, and control is performed by the DRL model from Case 2, with verification results shown in Figure 11. The model from Case 2 can effectively reduce the drag and lift coefficients of the two-dimensional square cylinder under the three wind fields, with the average drag coefficient reduced by approximately 16% and the lift coefficient standard deviation reduced by approximately 88%. This result not only reflects the control capability of DRL but also verifies the effectiveness of DRLinWT in facilitating the interaction between DRL and wind tunnel testing.
Figure 11 Verification of the most robust model (from Case 2) in three flow field situations, with control starting from step 240
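The verification procedure can be sketched as a single evaluation episode in which a zero action represents the blowing being off and the trained policy takes over at step 240. The snippet below assumes the environment and algorithm from the earlier sketches; the model file name and episode length are placeholders.

```python
# Sketch of the verification run: uncontrolled baseline first, DRL control switched on at step 240.
import numpy as np
from stable_baselines3 import SAC

model = SAC.load("case2_model")              # hypothetical file name for the most robust (Case 2) model
obs, _ = env.reset()
for step in range(480):                      # placeholder episode length
    if step < 240:
        action = np.zeros(env.action_space.shape, dtype=np.float32)  # assumed: zero action = blowing off
    else:
        action, _ = model.predict(obs, deterministic=True)           # DRL control from step 240 onward
    obs, reward, terminated, truncated, _ = env.step(action)
```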
(4) Spectrum and Pressure Coefficient Analysis
Furthermore, building on the verification of the Case 2 model, the lift data under controlled and uncontrolled conditions at a randomly selected wind speed of 5.8 m/s were subjected to frequency spectrum and pressure coefficient analysis. In the spectral analysis, frequency was converted to the Strouhal number; the results in Figure 12 show that the main vortex-shedding peak disappeared after control, demonstrating that the control strategy effectively suppressed the vortex energy and reduced the aerodynamic fluctuations acting on the square cylinder. As shown in Figure 13, the average pressure coefficient on the windward side remained almost unchanged after control, while the average pressure coefficients on the other three sides decreased, with the negative pressure on the leeward side in particular increasing, consistent with the reduction in the average drag coefficient. Meanwhile, the standard deviations of the pressure coefficients on the two side faces decreased significantly, consistent with the large decrease in the lift coefficient standard deviation.
Figure 12 Power spectral density curves under controlled and uncontrolled conditions at a wind speed of 5.8 m/s
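The spectral comparison reduces to estimating the power spectral density of the lift signal and rescaling frequency to the Strouhal number St = f·D/U, with D = 0.05 m (the side length) and U = 5.8 m/s here. A minimal sketch using scipy follows; the sampling rate is an assumed placeholder.

```python
# Sketch of the spectral analysis: Welch PSD of the lift signal, frequency rescaled to Strouhal number.
import numpy as np
from scipy.signal import welch

D, U = 0.05, 5.8   # side length [m] and wind speed [m/s] from the text
FS = 330.0         # sampling rate [Hz]; assumed placeholder

def lift_spectrum(cl_series):
    f, psd = welch(np.asarray(cl_series), fs=FS, nperseg=1024)  # Welch PSD estimate
    st = f * D / U                                              # Strouhal number St = f*D/U
    return st, psd

# Example usage (cl_controlled / cl_uncontrolled are recorded lift-coefficient time series):
# st_on, psd_on = lift_spectrum(cl_controlled)
# st_off, psd_off = lift_spectrum(cl_uncontrolled)
```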
(a) Average pressure coefficient
(b) Pressure coefficient standard deviation
Figure 13 Average and standard deviation of pressure coefficients on the surface of the square cylinder under controlled and uncontrolled conditions at a wind speed of 5.8 m/s
The action box plots of the three models, shown in Figure 14, reveal markedly different action-value distributions across the flow fields. The action values of the Case 1 model are widely distributed, ranging from 13 to 27 L/min; those of the Case 2 model are concentrated, with flow rates between 18 and 26 L/min; the Case 3 model tends toward moderate flow rates, around 16 to 23 L/min.
Figure 14 Action box plots of the three cases during evaluation
The likely explanation is that the Case 1 model performs well in predictable environments, responding effectively to wind speed changes and achieving high rewards, but struggles to adapt to rapid wind speed changes in unpredictable environments, which degrades its performance. The Case 3 model, trained in a complex wind field, behaves more conservatively, with flow rates concentrated in a moderate range. The Case 2 model performs best, showing stronger adaptability and stability; it balances its flow rate choices and adapts to sudden wind speed changes, exhibiting higher stability and reliability.