Amid ongoing innovation in financial digitization, Fudian Bank’s information systems have gradually moved from a monolithic, centralized architecture to a distributed one, and from the original IOE (IBM, Oracle, EMC) architecture to a fully domestic technology stack. In 2024, Fudian Bank took the lead in launching the cloud migration of its next-generation core system, built on a distributed service mesh and a full-stack domestic technology transformation, with plans to integrate the original core system with financial distributed cloud platform technology. The next-generation core system is expected to officially go live by mid-2025.
On March 21, 2025, the Information Technology Department and Technology Center of Fudian Bank successfully completed the first chaos attack and defense drill of the “Next-Generation Core System” project. The shift to domestic technology, the migration of mainframe systems, and distributed service mesh governance have all increased the architectural complexity of the next-generation core system. Introducing chaos engineering to verify the new system’s high availability, disaster recovery, operations, and emergency response capabilities is one of the most effective ways to ensure that it continues to provide stable business support after going live.
First, quality management personnel identified nearly 200 potential fault scenarios that could occur during daily operations, covering the IaaS, PaaS, and DaaS layers as well as core business services. They analyzed each scenario, wrote a fault injection script for it, and defined the expected result.
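A fault scenario catalog of this kind can be sketched as a small data structure. The example below is only an illustration in Python: the field names, scenario IDs, descriptions, and script names are all hypothetical and are not taken from the bank's actual catalog or drill platform.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """Layers named in the article; the enum itself is an assumption."""
    IAAS = "IaaS"
    PAAS = "PaaS"
    DAAS = "DaaS"
    BUSINESS = "core business service"


@dataclass(frozen=True)
class FaultScenario:
    scenario_id: str       # hypothetical identifier
    layer: Layer           # where the fault is injected
    description: str       # what goes wrong
    injection_script: str  # hypothetical script name
    expected_result: str   # what a healthy system should do in response


# Placeholder entries for illustration only.
CATALOG = [
    FaultScenario("F001", Layer.IAAS, "Host loses network connectivity",
                  "inject_network_loss.sh", "Traffic fails over to a healthy host"),
    FaultScenario("F002", Layer.PAAS, "Service instance is killed",
                  "kill_instance.sh", "Instance restarts and requests are retried"),
    FaultScenario("F003", Layer.DAAS, "Database primary node becomes unavailable",
                  "stop_db_primary.sh", "A replica is promoted and an alert fires"),
    FaultScenario("F004", Layer.BUSINESS, "Downstream service responds slowly",
                  "add_latency.sh", "Caller times out and degrades gracefully"),
]


def count_by_layer(catalog):
    """Count how many scenarios target each layer."""
    return Counter(scenario.layer for scenario in catalog)


coverage = count_by_layer(CATALOG)
for layer in Layer:
    print(f"{layer.value}: {coverage[layer]} scenario(s)")
```

Recording the expected result alongside each script makes it possible to judge afterwards whether the system and the responders behaved as intended.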
Then, the operations and maintenance personnel were divided into a blue team and a red team. The blue team injected the identified faults into the next-generation core system through the chaos engineering drill platform, while the red team monitored business stability using the business monitoring platform (RMS) and the database monitoring platform (OCP). When anomalies occurred, the red team promptly investigated and resolved them according to the operations manual and emergency procedures. A total of 13 faults were randomly injected during the drill: 8 were handled well, 3 moderately, and 2 poorly. After the drill, both teams conducted a comprehensive review of the process and proposed improvement measures to guide future work.
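The random selection and the tally of outcomes can be illustrated with a short Python sketch. It assumes a catalog of 200 placeholder scenario IDs (the article says "nearly 200"), and the grade labels are hypothetical names for the three outcomes reported above. It does not reflect how the drill platform actually selects or scores faults.

```python
import random
from collections import Counter


def pick_faults(catalog, count, seed=None):
    """Randomly choose `count` distinct scenarios from the catalog."""
    rng = random.Random(seed)
    return rng.sample(catalog, count)


def summarize(grades):
    """Return {grade: (count, percentage of all graded faults)}."""
    tally = Counter(grades)
    total = sum(tally.values())
    return {grade: (n, round(100 * n / total, 1)) for grade, n in tally.items()}


# Placeholder scenario IDs; the real catalog is not public.
catalog = [f"scenario-{i:03d}" for i in range(1, 201)]
drill = pick_faults(catalog, 13, seed=2025)

# Outcome counts as reported in the article: 8 well, 3 moderately, 2 poorly.
grades = ["good"] * 8 + ["moderate"] * 3 + ["poor"] * 2
summary = summarize(grades)

for grade, (n, pct) in summary.items():
    print(f"{grade}: {n} of {len(grades)} ({pct}%)")
```

With the reported counts, roughly 62% of the injected faults were handled well, 23% moderately, and 15% poorly.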
The drill is just the starting point. Chaos attack and defense drills put the system’s actual monitoring, alerting, and fault handling capabilities to the test, exposing shortcomings in production readiness so that gaps can be closed and business continuity support can be continuously strengthened. As a next step, chaos engineering-based stability testing will be gradually rolled out across the bank’s relevant information systems. Backed by a rich chaos expert database and a regular drill mechanism, this will further strengthen root cause analysis, emergency response, and business continuity assurance.
Source: Information Technology Department, Technology Center
Reviewed by: Party Committee Propaganda Department