Exploring the Principles of Fault Injection

Introduction

As the number of channel APIs increases, the user base is also growing rapidly. Because channel APIs carry inherent business complexity and depend on numerous underlying services, any latent issue can have a significant impact. Relying solely on unit tests, integration tests, and performance tests to verify service stability is no longer sufficient. Last year we therefore completed fault injection drills for the channel API and the traffic monetization platform, using the platform's chaos engineering capabilities and focusing on scenarios such as MySQL latency, MQ latency, request latency, and exceptions. The drills let us identify and resolve potential issues in advance. They also gave us insight into the principles of fault injection, which we can use to build customized drill scenarios that fit the characteristics of our business.

1. Overview of ChaosBlade

ChaosBlade is an umbrella project: it exposes every experiment scenario through a single command-line tool, which calls the specific implementation for each scenario. Scenarios are grouped by domain into individual projects, which accommodates the implementation differences across platforms and languages. This standardizes how scenarios are implemented within a domain and makes it easier to extend scenarios both horizontally and vertically. Because every project follows the chaos experiment model, the ChaosBlade CLI can invoke them in a uniform way. The projects currently included are as follows:

[Figure: projects included in ChaosBlade]

2. chaosblade-exec-jvm

2.1 System Design

chaosblade-exec-jvm injects faults by transforming classes at runtime. It attaches to the target process as a Java agent and uses jvm-sandbox as the underlying implementation. A pluggable design extends support to different Java applications. chaosblade-exec-jvm is therefore only a Java agent module, not a standalone executable; it depends on jvm-sandbox.
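As a rough illustration of the attach mechanism, the sketch below uses only the JDK's java.lang.instrument API, not jvm-sandbox. The class name DelayAgentSketch and the target class name are hypothetical, and where a real agent would rewrite bytecode, this one only records the match.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent: records which loaded classes would be candidates for injection.
class DelayAgentSketch {

    // Transformer that matches one internal class name and leaves the bytecode untouched.
    static class RecordingTransformer implements ClassFileTransformer {
        private final String targetInternalName;
        final List<String> matched = new CopyOnWriteArrayList<>();

        RecordingTransformer(String targetInternalName) {
            this.targetInternalName = targetInternalName;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (targetInternalName.equals(className)) {
                matched.add(className);
            }
            // Returning null tells the JVM that the class bytes are unchanged.
            return null;
        }
    }

    // Entry point the JVM calls when the agent is attached to a running process.
    // Classes loaded before the attach would additionally need retransformation.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer("javax/servlet/http/HttpServlet"));
    }
}
```

An agent jar built from such a class also needs an Agent-Class entry in its manifest before a running JVM can load it.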

2.2 Project Architecture

[Figure: project architecture]

2.3 Module Management

[Figure: module management]

2.4 Implementation Principles

Take a delay injected into the servlet API endpoint /test as an example:

[Figure: fault injection flow for the servlet delay example]

2.5 Experimental Steps

[Figure: experimental steps]

2.5.1 Agent Mounting

After the agent-mounting command is issued, the agent is mounted in the target JVM process. This triggers the SandboxModule onLoad() event, which initializes the PluginLifecycleListener to manage the plugin lifecycle, and the SandboxModule onActive() event, which loads some plugins and the corresponding ModelSpec.

    public void onLoad() throws Throwable {
        LOGGER.info("load chaosblade module");
        ManagerFactory.getListenerManager().setPluginLifecycleListener(this);
        dispatchService.load();
        ManagerFactory.load();
    }

    // ChaosBlade module activation implementation
    public void onActive() throws Throwable {
        LOGGER.info("active chaosblade module");
        loadPlugins();
    }

Plugins are loaded in two ways:

• the SandboxModule onActive() event
• the CreateHandler of the blade create command

The SandboxModule onActive() event also registers the ModelSpec. When a plugin is loaded, SandboxEnhancerFactory.createAfterEventListener(plugin) or SandboxEnhancerFactory.createBeforeEventListener(plugin) creates an event listener for the events of interest, such as BeforeAdvice and AfterAdvice. The specific implementation is as follows:

    // Load plugin
    public void add(PluginBean plugin) {
        PointCut pointCut = plugin.getPointCut();
        if (pointCut == null) {
            return;
        }
        String enhancerName = plugin.getEnhancer().getClass().getSimpleName();
        // Create the filter used for PointCut matching
        Filter filter = SandboxEnhancerFactory.createFilter(enhancerName, pointCut);
        if (plugin.isAfterEvent()) {
            // Event listener
            int watcherId = moduleEventWatcher.watch(filter, SandboxEnhancerFactory.createAfterEventListener(plugin),
                Type.BEFORE, Type.RETURN);
            watchIds.put(PluginUtil.getIdentifierForAfterEvent(plugin), watcherId);
        } else {
            int watcherId = moduleEventWatcher.watch(
                filter, SandboxEnhancerFactory.createBeforeEventListener(plugin), Event.Type.BEFORE);
            watchIds.put(PluginUtil.getIdentifier(plugin), watcherId);
        }
    }

PointCut matching: when the SandboxModule onActive() event loads a plugin, SandboxEnhancerFactory creates a filter, and that filter selects classes and methods through the PointCut's ClassMatcher and MethodMatcher.
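The two-level matching can be pictured with a small sketch. PointCutSketch is a hypothetical, simplified stand-in and not the ChaosBlade PointCut interface; the servlet class and method names in it are assumptions chosen for illustration.

```java
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for a PointCut: one predicate per class and per method.
class PointCutSketch {
    private final Predicate<String> classMatcher;
    private final Predicate<String> methodMatcher;

    PointCutSketch(Predicate<String> classMatcher, Predicate<String> methodMatcher) {
        this.classMatcher = classMatcher;
        this.methodMatcher = methodMatcher;
    }

    // A join point is enhanced only when both matchers accept it.
    boolean matches(String className, String methodName) {
        return classMatcher.test(className) && methodMatcher.test(methodName);
    }

    // Example pointcut (assumed names): the servlet entry method.
    static PointCutSketch servlet() {
        return new PointCutSketch(
            name -> name.equals("javax.servlet.http.HttpServlet"),
            name -> name.equals("service"));
    }
}
```

Keeping the class check first means the method matcher only runs for classes that are already candidates.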

Once a plugin has been loaded, calls in the target application that match the filter trigger the EventListener, which in turn triggers the Enhancer. However, chaosblade-exec-jvm tracks experiment state internally through StatusManager, so no fault is triggered until an experiment has been registered.

For example, the BeforeEventListener calls the beforeAdvice method of BeforeEnhancer, which returns early when ManagerFactory.getStatusManager().expExists(targetName) is false. The specific implementation is as follows:

    // com.alibaba.chaosblade.exec.common.aop.BeforeEnhancer
    public void beforeAdvice(String targetName, ClassLoader classLoader, String className,
                             Object object, Method method, Object[] methodArguments) throws Exception {
        // StatusManager
        if (!ManagerFactory.getStatusManager().expExists(targetName)) {
            return;
        }
        EnhancerModel model = doBeforeAdvice(classLoader, className, object, method, methodArguments);
        if (model == null) {
            return;
        }
        model.setTarget(targetName).setMethod(method).setObject(object).setMethodArguments(methodArguments);
        Injector.inject(model);
    }
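The effect of the status check can be seen in isolation with a minimal sketch. StatusGateSketch and its fields are hypothetical; a counter stands in for the call to Injector.inject(model).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gate: advice runs on every intercepted call, but injects only for registered targets.
class StatusGateSketch {
    private final Set<String> activeTargets = ConcurrentHashMap.newKeySet();
    final AtomicInteger injections = new AtomicInteger();

    void registerExp(String target) {
        activeTargets.add(target);
    }

    void removeExp(String target) {
        activeTargets.remove(target);
    }

    boolean expExists(String target) {
        return activeTargets.contains(target);
    }

    // Called on every intercepted method; cheap when no experiment is registered.
    void beforeAdvice(String target) {
        if (!expExists(target)) {
            return; // plugin loaded, but no experiment: do nothing
        }
        injections.incrementAndGet(); // stands in for Injector.inject(model)
    }
}
```

With this shape, a loaded plugin costs the application one set lookup per intercepted call until an experiment is created.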

2.5.2 Creating Chaos Experiments

    ./blade create servlet --requestpath=/topic delay --time=3000

After this command is issued, it triggers the SandboxModule method annotated with @Http("/create"), which dispatches the event to com.alibaba.chaosblade.exec.service.handler.CreateHandler for processing. After verifying the required uid, target, action, and model parameters, CreateHandler calls handleInjection, which registers the experiment with the status manager. If the ModelSpec is a PreCreateInjectionModelHandler, it first performs its pre-processing. If the ActionSpec is a DirectlyInjectionAction, the fault capability is injected directly, as with JVM OOM; otherwise, the plugin is loaded.

    private Response handleInjection(String suid, Model model, ModelSpec modelSpec) {
        // Register
        RegisterResult result = this.statusManager.registerExp(suid, model);
        if (result.isSuccess()) {
            // handle injection
            try {
                applyPreInjectionModelHandler(suid, modelSpec, model);
            } catch (ExperimentException ex) {
                this.statusManager.removeExp(suid);
                return Response.ofFailure(Response.Code.SERVER_ERROR, ex.getMessage());
            }
            return Response.ofSuccess(model.toString());
        }
        return Response.ofFailure(Response.Code.DUPLICATE_INJECTION, "the experiment exists");
    }

Upon successful registration, the uid is returned. If the fault capability is injected directly at this stage, or if the custom Enhancer's advice returns null, the fault is not triggered through the Injector class.
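The register-then-roll-back pattern in handleInjection can be sketched on its own. ExperimentRegistrySketch, its PreInjectionHandler interface, and the string result codes are hypothetical, and detecting duplicates by uid is an assumption of this sketch, not a statement about how registerExp decides.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: registers an experiment, then undoes the registration if setup fails.
class ExperimentRegistrySketch {
    private final Map<String, String> experiments = new ConcurrentHashMap<>();

    // Hypothetical pre-injection hook; may throw to signal a failed setup.
    interface PreInjectionHandler {
        void apply(String uid) throws Exception;
    }

    String create(String uid, String model, PreInjectionHandler handler) {
        // Assumption: a second registration under the same uid counts as a duplicate.
        if (experiments.putIfAbsent(uid, model) != null) {
            return "DUPLICATE_INJECTION";
        }
        try {
            handler.apply(uid);
        } catch (Exception ex) {
            // Roll back so a failed experiment does not stay registered.
            experiments.remove(uid);
            return "SERVER_ERROR";
        }
        return "OK";
    }

    boolean exists(String uid) {
        return experiments.containsKey(uid);
    }
}
```

Registering before the pre-injection step and removing on failure keeps the registry consistent with what is actually active.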

2.5.3 Fault Capability Injection

Fault capabilities are injected either through Injector or directly through a DirectlyInjectionAction. Direct injection, such as JVM OOM, skips the Injector call phase.

Parameter matching starts in the custom Enhancer. ServletEnhancer, for example, wraps the request values that must be compared with the command-line flags in a MatcherModel and returns it inside an EnhancerModel. For --requestpath=/index, the value compared is requestPath, which equals requestURI minus contextPath. The comparison itself happens in the Injector.inject(model) stage.

    public class ServletEnhancer extends BeforeEnhancer {

        private static final Logger LOGGER = LoggerFactory.getLogger(ServletEnhancer.class);

        @Override
        public EnhancerModel doBeforeAdvice(ClassLoader classLoader, String className, Object object,
                                            Method method, Object[] methodArguments, String targetName)
            throws Exception {
            // Get some parameters of the original method
            Object request = methodArguments[0];
            String queryString = ReflectUtil.invokeMethod(request, "getQueryString", new Object[] {}, false);
            String contextPath = ReflectUtil.invokeMethod(request, "getContextPath", new Object[] {}, false);
            String requestURI = ReflectUtil.invokeMethod(request, "getRequestURI", new Object[] {}, false);
            String requestMethod = ReflectUtil.invokeMethod(request, "getMethod", new Object[] {}, false);
            String requestPath = StringUtils.isBlank(contextPath) ? requestURI : requestURI.replaceFirst(contextPath, "");
            // Wrap the values to be matched against the command-line flags
            MatcherModel matcherModel = new MatcherModel();
            matcherModel.add(ServletConstant.QUERY_STRING_KEY, queryString);
            matcherModel.add(ServletConstant.METHOD_KEY, requestMethod);
            matcherModel.add(ServletConstant.REQUEST_PATH_KEY, requestPath);
            return new EnhancerModel(classLoader, matcherModel);
        }
    }
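To make the matching concrete, here is a minimal sketch of the two steps: deriving requestPath and comparing flags with observed values. RequestPathMatchSketch is hypothetical, its compare method is a simplified assumption about what the real comparison does, and it strips the context path with a prefix check where ServletEnhancer uses replaceFirst.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: derives the request path and checks it against experiment flags.
class RequestPathMatchSketch {

    // requestPath is requestURI with the leading contextPath removed.
    static String requestPath(String contextPath, String requestURI) {
        if (contextPath == null || contextPath.isEmpty() || !requestURI.startsWith(contextPath)) {
            return requestURI;
        }
        return requestURI.substring(contextPath.length());
    }

    // Every flag given on the command line must equal the value observed on the request.
    static boolean compare(Map<String, String> flags, Map<String, String> observed) {
        for (Map.Entry<String, String> flag : flags.entrySet()) {
            if (!flag.getValue().equals(observed.get(flag.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> flags = new HashMap<>();
        flags.put("requestpath", "/index");
        Map<String, String> observed = new HashMap<>();
        observed.put("requestpath", requestPath("/shop", "/shop/index"));
        observed.put("method", "GET");
        System.out.println(compare(flags, observed));
    }
}
```

Values that were observed but not named by a flag, such as the HTTP method here, do not affect the result.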

Parameter matching and capability injection (the Injector call): the inject phase first retrieves the experiments registered with StatusManager for the target. compare(model, enhancerModel) then checks the parameters of each experiment, and an experiment that does not match is skipped. For a matching experiment, limitAndIncrease(statusMetric) applies --effect-count and --effect-percent to limit the number and the percentage of affected calls.

    public static void inject(EnhancerModel enhancerModel) throws InterruptProcessException {
        String target = enhancerModel.getTarget();
        List<StatusMetric> statusMetrics = ManagerFactory.getStatusManager().getExpByTarget(target);
        for (StatusMetric statusMetric : statusMetrics) {
            Model model = statusMetric.getModel();
            if (!compare(model, enhancerModel)) {
                continue;
            }
            try {
                boolean pass = limitAndIncrease(statusMetric);
                if (!pass) {
                    LOGGER.info("Limited by: {}", JSON.toJSONString(model));
                    break;
                }
                LOGGER.info("Match rule: {}", JSON.toJSONString(model));
                enhancerModel.merge(model);
                ModelSpec modelSpec = ManagerFactory.getModelSpecManager().getModelSpec(target);
                ActionSpec actionSpec = modelSpec.getActionSpec(model.getActionName());
                actionSpec.getActionExecutor().run(enhancerModel);
            } catch (InterruptProcessException e) {
                throw e;
            } catch (UnsupportedReturnTypeException e) {
                LOGGER.warn("unsupported return type for return experiment", e);
                statusMetric.decrease();
            } catch (Throwable e) {
                LOGGER.warn("inject exception", e);
                statusMetric.decrease();
            }
            break;
        }
    }
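One way to picture a limiter like limitAndIncrease is the sketch below. EffectLimiterSketch is hypothetical, and its semantics are assumptions: the percentage is treated as a per-call probability, the count as a cap on total injections, and a count of zero or less as unlimited.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical limiter combining a percentage gate with a cap on total injections.
class EffectLimiterSketch {
    private final long effectCount;   // maximum number of injections; <= 0 means unlimited (assumption)
    private final int effectPercent;  // 0..100, share of matched calls that are affected
    private final Random random;
    private final AtomicLong hits = new AtomicLong();

    EffectLimiterSketch(long effectCount, int effectPercent, Random random) {
        this.effectCount = effectCount;
        this.effectPercent = effectPercent;
        this.random = random;
    }

    // Returns true when this call should receive the fault, and counts it.
    boolean limitAndIncrease() {
        // nextInt(100) is in 0..99, so 100 percent always passes and 0 percent never does.
        if (random.nextInt(100) >= effectPercent) {
            return false;
        }
        long n = hits.incrementAndGet();
        if (effectCount > 0 && n > effectCount) {
            hits.decrementAndGet(); // over the cap: undo the increment
            return false;
        }
        return true;
    }

    long hits() {
        return hits.get();
    }
}
```

Passing the Random in makes the percentage gate reproducible in tests. The decrement on rejection plays the same role as statusMetric.decrease() does when injection fails in the code above.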

Fault triggering is initiated by Injector or directly by a DirectlyInjectionAction, and it ultimately calls a custom ActionExecutor, such as DefaultDelayExecutor, to generate the fault. At this point, the fault capability has been activated.

    public void run(EnhancerModel enhancerModel) throws Exception {
        String time = enhancerModel.getActionFlag(timeFlagSpec.getName());
        Integer sleepTimeInMillis = Integer.valueOf(time);
        int offset = 0;
        String offsetTime = enhancerModel.getActionFlag(timeOffsetFlagSpec.getName());
        if (!StringUtil.isBlank(offsetTime)) {
            offset = Integer.valueOf(offsetTime);
        }
        TimeoutExecutor timeoutExecutor = enhancerModel.getTimeoutExecutor();
        if (timeoutExecutor != null) {
            long timeoutInMillis = timeoutExecutor.getTimeoutInMillis();
            if (timeoutInMillis > 0 && timeoutInMillis < sleepTimeInMillis) {
                sleep(timeoutInMillis, 0);
                timeoutExecutor.run(enhancerModel);
                return;
            }
        }
        sleep(sleepTimeInMillis, offset);
    }

    public void sleep(long timeInMillis, int offsetInMillis) {
        Random random = new Random();
        int offset = 0;
        if (offsetInMillis > 0) {
            offset = random.nextInt(offsetInMillis);
        }
        if (offset % 2 == 0) {
            timeInMillis = timeInMillis + offset;
        } else {
            timeInMillis = timeInMillis - offset;
        }
        if (timeInMillis <= 0) {
            timeInMillis = offsetInMillis;
        }
        try {
            // Trigger delay
            TimeUnit.MILLISECONDS.sleep(timeInMillis);
        } catch (InterruptedException e) {
            LOGGER.error("running delay action interrupted", e);
        }
    }
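The jitter arithmetic inside sleep() is easier to check when pulled out into a pure function. The sketch below mirrors that arithmetic; DelayJitterSketch and effectiveDelay are hypothetical names, and the Random is passed in so the result can be tested without sleeping.

```java
import java.util.Random;

// Hypothetical extraction of the jitter calculation from the delay executor.
class DelayJitterSketch {

    // Draw an offset in [0, offsetInMillis), add it when even, subtract it when odd,
    // and fall back to offsetInMillis if the result is not positive.
    static long effectiveDelay(long timeInMillis, int offsetInMillis, Random random) {
        int offset = offsetInMillis > 0 ? random.nextInt(offsetInMillis) : 0;
        long result = (offset % 2 == 0) ? timeInMillis + offset : timeInMillis - offset;
        return result <= 0 ? offsetInMillis : result;
    }

    public static void main(String[] args) {
        // With no offset the delay is exactly the configured time.
        System.out.println(effectiveDelay(3000, 0, new Random()));
    }
}
```

For a 3000 ms delay with a 1000 ms offset, the drawn offset is at most 999, so the effective delay stays between 2001 and 3998 ms.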

2.5.4 Destruction

    ./blade destroy 52a27bafc252beee

After this command is issued, it triggers the SandboxModule method annotated with @Http("/destroy"), which dispatches the event to com.alibaba.chaosblade.exec.service.handler.DestroyHandler for processing. DestroyHandler unregisters the status of this fault. If the plugin's ModelSpec is a PreDestroyInjectionModelHandler and the ActionSpec is a DirectlyInjectionAction, fault capability injection is stopped; if the ActionSpec is not a DirectlyInjectionAction, the plugin is unloaded.

    public Response handle(Request request) {
        String uid = request.getParam("suid");
        String target = request.getParam("target");
        String action = request.getParam("action");
        if (StringUtil.isBlank(uid)) {
            if (StringUtil.isBlank(target) || StringUtil.isBlank(action)) {
                return Response.ofFailure(Code.ILLEGAL_PARAMETER, "less necessary parameters, such as uid, target and"
                    + " action");
            }
            // Unregister status
            return destroy(target, action);
        }
        return destroy(uid);
    }

2.5.5 Unloading the Agent

    ./blade revoke 98e792c9a9a5dfea

After this command is issued, it triggers the SandboxModule onUnload() event, and the plugins are unloaded.

    public void onUnload() throws Throwable {
        LOGGER.info("unload chaosblade module");
        dispatchService.unload();
        ManagerFactory.unload();
        watchIds.clear();
        LOGGER.info("unload chaosblade module successfully");
    }

Conclusion

The above is the overall process of chaosblade-exec-jvm. It also supports custom plugins under the chaosblade-exec-plugin module, so drill scenarios can be tailored to your own projects. By simulating the fault situations a service may encounter, weaknesses can be identified early and fixed.

Author Introduction

Hippo, a backend development expert at Xinyi Technology.
