K8S Lecture 24: Introduction to Chaos Engineering – Core Features of Chaos Mesh and Fault Injection

1. Introduction: Why is Chaos Engineering Needed? In the cloud-native era, system complexity is growing exponentially, and traditional testing methods can no longer cover all failure scenarios. Chaos Engineering is a proactive experimental approach that injects faults to help us discover system weaknesses before real failures occur in production environments. As a CNCF incubated project, … Read more

Practical Implementation of Network Isolation Fault Injection in Database Clusters

1. Introduction In database clusters, network isolation is a common fault scenario. For example, when a node cannot communicate with the other nodes because of a network interruption, the cluster may trigger a master-slave switch, interrupt data synchronization, or end up in a split-brain state. By actively injecting network isolation faults, we can verify the cluster’s high availability, … Read more

Fault Injection Testing in Go: A Practical Approach to Chaos Engineering for System Resilience

Server down! Database unreachable! Network timeout! Do these words make your scalp tingle? In a production environment, systems can face all kinds of bizarre failures at any time. But how do we know whether the system can withstand these “critical hits”? Wait until something goes wrong and then regret it? … Read more

Chaos Engineering Tools: Implementing Pod-Level Fault Injection with Go

It is late at night, you have been working overtime and are just about to leave, when suddenly the production alarms go off! Checking logs, monitoring, troubleshooting… After two hours of hassle, it turns out to be a cascading failure caused by a timeout in a dependent service. Sigh! Does this situation sound familiar? In a microservices … Read more

Injecting Faults and Debugging with ChaosBlade-Operator in K8S

Introduction: A New Tool for Chaos Engineering ChaosBlade, an open-source chaos engineering toolchain from Alibaba, abstracts chaos experiments into Kubernetes CRD resources through the ChaosBlade-Operator project. Building on the extensibility of K8S, CRD resources can manage software and hardware resources and interact with other resources, achieving declarative chaos experiment management that makes fault … Read more

Innovative Development: Fudian Bank Completes Its First Chaos Attack and Defense Drill for Business Systems and Pre-Production Fault Injection for the Next-Generation Core System

In the context of continuous innovation and development in financial digitization, Fudian Bank’s information system has gradually transitioned from the original monolithic centralized architecture to a distributed architecture, and from the original IOE architecture to a fully domestically produced architecture. In 2024, Fudian Bank will take the lead in launching the cloud migration of the … Read more

Chaos Engineering Practice: Fault Injection and Monitoring System with Chaos Mesh

1. Let’s clarify what Chaos Engineering is. A few days ago, my colleague Wang was mumbling in the break room: “Our system claims to be highly available, but who knows if it can really hold up when something goes wrong?” This hits the nail on the head—Chaos Engineering is essentially the study of proactively finding … Read more

Exploring Cloud System Stability Through Chaos Engineering

By Zhang Jing, Zhang Li, Lu Fei, and Shi Peixuan (Information Technology Department, China International Capital Corporation). The rapid development of financial technology has led China International Capital Corporation (CICC) to adapt accordingly, gradually moving its business systems to the cloud. However, moving to the cloud has increased the complexity of IT infrastructure and business … Read more

Strengthening System Resilience in the Cloud-Native Era

IT system construction has evolved through standalone, centralized, and distributed architectures, and the complexity of system operation and maintenance drills and fault simulation testing has continuously increased. In complex distributed systems, both infrastructure and application platforms can experience unpredictable failures. Without knowing the root cause of a failure, we cannot prevent its occurrence. A more … Read more

Breakthrough in Traditional Reliability Testing: Best Practices in Chaos Engineering

Against the backdrop of the rapid and stable development of the digital economy, cloud computing has become the cornerstone of enterprises’ digital transformation. The application layer pursues more comprehensive, convenient, and faster services, which in turn drives the underlying technology systems to grow ever larger and harder to maintain. The occurrence … Read more