Recently, while reading the TiDB source code, I noticed that it uses failpoint for fault injection, which I found very interesting. Failpoint implements fault injection through code generation: it parses the source code into an AST and replaces marker nodes in it. I will analyze that process here, both to understand failpoint and to learn how to parse an AST and generate code from it.
So, this article mainly explores the detailed usage of failpoint and its implementation principles.
Introduction
Failpoint is a tool for injecting errors during testing, and it is the Golang implementation of FreeBSD Failpoints. To make a system more stable we usually test it under many scenarios, but some of them are very difficult to simulate. Examples include random delays or an unavailable service in a microservice architecture, or, in game development, unstable player networks, frame drops, and excessive latency.
Failpoint was created to make these cases easy to test. It greatly simplifies the testing process by letting us simulate various errors in different scenarios and so track down bugs.
Failpoint has several main advantages:
- Failpoint-related code should have no additional overhead;
- It should not affect normal functional logic and must not intrude into functional code;
- Failpoint code must be easy to read, easy to write, and able to introduce compiler checks;
- The final generated code must be readable;
- In the generated code, the line numbers of functional logic code must not change (to facilitate debugging).
Usage
First, we need to build the tool from source:
git clone https://github.com/pingcap/failpoint.git
cd failpoint
make
ls bin/failpoint-ctl
This compiles the failpoint-ctl binary, which is used for code transformation.
Then, we can use failpoint in the code to inject faults:
package main
import (
	"fmt"
	"github.com/pingcap/failpoint"
)
func test() {
	failpoint.Inject("testValue", func(v failpoint.Value) {
		fmt.Println(v)
	})
}
func main() {
	for i := 0; i < 100; i++ {
		test()
	}
}
If we step into the Inject function, we see:
func Inject(fpname string, fpbody interface{}) {}
When failpoint is not enabled, Inject is just an empty function, so it does not affect the performance of our business logic. When the service code is compiled, the call is inlined and optimized away; this is how failpoint achieves zero-cost fault injection.
Next, we will convert the above test function into usable fault injection code:
$ failpoint/bin/failpoint-ctl enable .
Calling the compiled failpoint-ctl this way rewrites the code in the current directory into:
package main
import (
	"fmt"
	"github.com/pingcap/failpoint"
)
func test() {
	if v, _err_ := failpoint.Eval(_curpkg_("testValue")); _err_ == nil {
		fmt.Println(v)
	}
}
func main() {
	for i := 0; i < 100; i++ {
		test()
	}
}
Next, we run the code with a fault injected:
$ GO_FAILPOINTS='main/testValue=2*return("abc")' go run main.go binding__failpoint_binding__.go
abc
abc
In the above case, 2* means that the failpoint fires only twice, and the argument of return("abc") becomes the value of the variable v received by the injected function.
Additionally, we can also set the activation probability:
$ GO_FAILPOINTS='main/testValue=5%return("abc")' go run main.go binding__failpoint_binding__.go
abc
abc
abc
abc
In the above case, 5% means that abc is returned with only a 5% probability on each call.
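To make the two modifier forms concrete, here is a minimal sketch of how a term such as 2*return("abc") or 5%return("abc") could be split into its parts. The plan type and the parsePlan function are hypothetical names invented for illustration; this is not failpoint's own parser, which handles more syntax than shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// plan is a hypothetical, simplified view of one failpoint term.
type plan struct {
	Count   int     // how many times the term may fire; 0 means unlimited
	Percent float64 // chance of firing on each evaluation, in percent
	Action  string  // for example return, sleep, panic
	Arg     string  // raw text between the parentheses, if any
}

// parsePlan splits a term such as `2*return("abc")` or `5%return("abc")`.
func parsePlan(term string) (plan, error) {
	p := plan{Percent: 100}
	rest := term
	// An optional leading number followed by '*' or '%' is the modifier.
	// A '*' or '%' that is not preceded by a number belongs to the argument.
	if i := strings.IndexAny(term, "*%"); i > 0 {
		num := term[:i]
		if n, err := strconv.Atoi(num); err == nil && term[i] == '*' {
			p.Count, rest = n, term[i+1:]
		} else if f, err := strconv.ParseFloat(num, 64); err == nil && term[i] == '%' {
			p.Percent, rest = f, term[i+1:]
		}
	}
	// The remainder is either "action" or "action(arg)".
	p.Action = rest
	if open := strings.IndexByte(rest, '('); open >= 0 {
		if !strings.HasSuffix(rest, ")") {
			return plan{}, fmt.Errorf("missing ')' in %q", term)
		}
		p.Action, p.Arg = rest[:open], rest[open+1:len(rest)-1]
	}
	// Reject anything that is not a plain lowercase action name.
	if p.Action == "" || strings.Trim(p.Action, "abcdefghijklmnopqrstuvwxyz") != "" {
		return plan{}, fmt.Errorf("bad action in %q", term)
	}
	return p, nil
}

func main() {
	for _, t := range []string{`2*return("abc")`, `5%return("abc")`, `panic`} {
		p, err := parsePlan(t)
		fmt.Printf("%s => %+v, err=%v\n", t, p, err)
	}
}
```

The count and the probability are kept as separate fields so that the evaluation step can check each one independently.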
Besides the simple examples above, we can also use it to generate more complex scenarios:
package main
import (
	"fmt"
	"github.com/pingcap/failpoint"
	"math/rand"
)
func main() {
	failpoint.Label("outer")
	for i := 0; i < 100; i++ {
		failpoint.Label("inner")
		for j := 1; j < 1000; j++ {
			switch rand.Intn(j) + i {
			case j / 5:
				failpoint.Break()
			case j / 7:
				failpoint.Continue("outer")
			case j / 9:
				failpoint.Fallthrough()
			case j / 10:
				failpoint.Goto("outer")
			default:
				failpoint.Inject("failpoint-name", func(val failpoint.Value) {
					fmt.Println("unit-test", val.(int))
					if val == j/11 {
						failpoint.Break("inner")
					} else {
						failpoint.Goto("outer")
					}
				})
			}
		}
	}
}
In this example, we used failpoint.Break, failpoint.Goto, failpoint.Continue, failpoint.Fallthrough, and failpoint.Label to implement code jumps. The final generated code is:
func main() {
outer:
	for i := 0; i < 100; i++ {
	inner:
		for j := 1; j < 1000; j++ {
			switch rand.Intn(j) + i {
			case j / 5:
				break
			case j / 7:
				continue outer
			case j / 9:
				fallthrough
			case j / 10:
				goto outer
			default:
				if val, _err_ := failpoint.Eval(_curpkg_("failpoint-name")); _err_ == nil {
					fmt.Println("unit-test", val.(int))
					if val == j/11 {
						break inner
					} else {
						goto outer
					}
				}
			}
		}
	}
}
We can see that our failpoint code has all been transformed into Go language jump keywords.
After testing, we can finally restore the code using disable:
$ failpoint/bin/failpoint-ctl disable .
Other usage methods can be found in the official documentation:
https://github.com/pingcap/failpoint
Implementation Principles
Code Injection
Example Explanation
When using failpoint, we will use a series of Marker functions it provides to construct our fault points:
func Inject(fpname string, fpblock func(val Value)) {}
func InjectContext(fpname string, ctx context.Context, fpblock func(val Value)) {}
func Break(label ...string) {}
func Goto(label string) {}
func Continue(label ...string) {}
func Fallthrough() {}
func Return(results ...interface{}) {}
func Label(label string) {}
Then failpoint-ctl builds the AST of the code, replaces the marker statements in it, and generates the final injected code. Take the following code as an example:
package main
import (
	"fmt"
	"github.com/pingcap/failpoint"
)
func test() {
	failpoint.Inject("testPanic", func(val failpoint.Value) {
		fmt.Println(val)
	})
}
func main() {
	for i := 0; i < 100; i++ {
		test()
	}
}
After conversion:
package main
import (
	"fmt"
	"github.com/pingcap/failpoint"
)
func test() {
	if val, _err_ := failpoint.Eval(_curpkg_("testPanic")); _err_ == nil {
		fmt.Println(val)
	}
}
func main() {
	for i := 0; i < 100; i++ {
		test()
	}
}
The failpoint-ctl conversion not only replaces the code but also generates a binding__failpoint_binding__.go file, which contains a _curpkg_ function that prefixes a failpoint name with the current package path:
package main
import "reflect"
type __failpointBindingType struct{ pkgpath string }
var __failpointBindingCache = &__failpointBindingType{}
func init() {
	__failpointBindingCache.pkgpath = reflect.TypeOf(__failpointBindingType{}).PkgPath()
}
func _curpkg_(name string) string {
	return __failpointBindingCache.pkgpath + "/" + name
}
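The trick in the binding file is that reflection on a type declared in a package reveals that package's path. A minimal standalone sketch of the same idea follows; the marker type and the curpkg function are names chosen for this illustration only.

```go
package main

import (
	"fmt"
	"reflect"
)

// marker is an empty local type; reflecting on it reveals the path of the
// package in which it is declared.
type marker struct{}

// curpkg prefixes a failpoint name with the path of the enclosing package.
func curpkg(name string) string {
	return reflect.TypeOf(marker{}).PkgPath() + "/" + name
}

func main() {
	// For a type declared in package main the prefix is "main", which matches
	// the main/testValue key used in GO_FAILPOINTS earlier in this article.
	fmt.Println(curpkg("testValue"))
}
```

Because the prefix is computed at run time from the package itself, the same failpoint name can be reused in different packages without colliding.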
Getting the Code AST Tree
When we call failpoint-ctl for code transformation, it rewrites the code through Rewriter. Rewriter is a helper struct that traverses the AST of the code, detects calls to Marker functions, and rewrites them.
type Rewriter struct {
	rewriteDir    string         // Rewrite path
	currentPath   string         // File path
	currentFile   *ast.File      // File AST tree
	currsetFset   *token.FileSet // FileSet
	failpointName string         // Import renaming of failpoint
	rewritten     bool           // Whether rewriting is complete
	output        io.Writer      // Redirect output
}
When failpoint-ctl executes, it calls the RewriteFile method for code rewriting:
func (r *Rewriter) RewriteFile(path string) (err error) {
	defer func() {
		if e := recover(); e != nil {
			err = fmt.Errorf("%s %v\n%s", r.currentPath, e, debug.Stack())
		}
	}()
	fset := token.NewFileSet()
	// Get the AST tree of the go file
	file, err := parser.ParseFile(fset, path, nil, parser.ParseComments)
	if err != nil {
		return err
	}
	if len(file.Decls) < 1 {
		return nil
	}
	// File path
	r.currentPath = path
	// File AST tree
	r.currentFile = file
	// File FileSet
	r.currsetFset = fset
	// Mark whether rewriting is complete
	r.rewritten = false
	// Get the failpoint import package
	var failpointImport *ast.ImportSpec
	for _, imp := range file.Imports {
		if strings.Trim(imp.Path.Value, "`\"") == packagePath {
			failpointImport = imp
			break
		}
	}
	if failpointImport == nil {
		panic("import path should be check before rewrite")
	}
	if failpointImport.Name != nil {
		r.failpointName = failpointImport.Name.Name
	} else {
		r.failpointName = packageName
	}
	// Traverse the top-level declarations in the file: such as type, function, import, global constants, etc.
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		// Traverse function declaration nodes and replace failpoint related functions
		if err := r.rewriteFuncDecl(fn); err != nil {
			return err
		}
	}
	if !r.rewritten {
		return nil
	}
	if r.output != nil {
		return format.Node(r.output, fset, file)
	}
	// Generate binding__failpoint_binding__ code
	found, err := isBindingFileExists(path)
	if err != nil {
		return err
	}
	// If binding__failpoint_binding__.go file does not exist, regenerate one
	if !found {
		err := writeBindingFile(path, file.Name.Name)
		if err != nil {
			return err
		}
	}
	// Rename the original file, such as renaming main.go to main.go__failpoint_stash__
	// To be used for restoration
	targetPath := path + failpointStashFileSuffix
	if err := os.Rename(path, targetPath); err != nil {
		return err
	}
	newFile, err := os.OpenFile(path, os.O_TRUNC|os.O_CREATE|os.O_WRONLY, os.ModePerm)
	if err != nil {
		return err
	}
	defer newFile.Close()
	// Regenerate code file from constructed ast tree
	return format.Node(newFile, fset, file)
}
This method first calls parser.ParseFile from the Go standard library to obtain the AST of the file. An AST represents the syntactic structure of the source code as a tree, where each node represents a construct in the source. The method then iterates over the top-level declarations and descends into each function declaration, which amounts to a depth-first traversal from the root of the tree downwards.
After traversal, it checks whether the binding__failpoint_binding__.go file exists (generating it if not) and backs up the source file before calling format.Node to write out the rewritten file.
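The parse, walk, and print cycle described above can be tried in isolation with the standard library. The following is a minimal sketch, not failpoint's code: the sample source, the demo.go file name, and the helpers funcNames and roundTrip are all made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// src is a small sample program used only for this demonstration.
const src = `package main

import "fmt"

func test() { fmt.Println("hi") }

func main() { test() }
`

// funcNames parses the source and returns the names of its top-level
// functions, skipping imports, types, and other declarations.
func funcNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

// roundTrip parses the source and prints the AST back as formatted Go code.
func roundTrip(source string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, parser.ParseComments)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	names, err := funcNames(src)
	fmt.Println(names, err)
	out, err := roundTrip(src)
	fmt.Println(out, err)
}
```

Anything changed in the tree between parsing and format.Node ends up in the printed source, which is the mechanism RewriteFile relies on.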
Traversing the AST and Replacing Marker Nodes
func (r *Rewriter) rewriteStmts(stmts []ast.Stmt) error {
	// Traverse function body nodes
	for i, block := range stmts {
		switch v := block.(type) {
		case *ast.DeclStmt:
			...
		// Includes separate expression statements
		case *ast.ExprStmt:
			call, ok := v.X.(*ast.CallExpr)
			if !ok {
				break
			}
			switch expr := call.Fun.(type) {
			// Function definition
			case *ast.FuncLit:
				// Recursively traverse function
				err := r.rewriteFuncLit(expr)
				if err != nil {
					return err
				}
			// Select structure, similar to a.b structure
			case *ast.SelectorExpr:
				// Get the package name of the function call
				packageName, ok := expr.X.(*ast.Ident)
				// Check if the package name equals the failpoint package name
				if !ok || packageName.Name != r.failpointName {
					break
				}
				// Get the Rewriter for the function through Marker name
				exprRewriter, found := exprRewriters[expr.Sel.Name]
				if !found {
					break
				}
				// Rewrite the function
				rewritten, stmt, err := exprRewriter(r, call)
				if err != nil {
					return err
				}
				if !rewritten {
					continue
				}
				// Get the newly generated if node
				if ifStmt, ok := stmt.(*ast.IfStmt); ok {
					err := r.rewriteIfStmt(ifStmt)
					if err != nil {
						return err
					}
				}
				// Replace the node with the newly generated if node
				stmts[i] = stmt
				r.rewritten = true
			}
		case *ast.AssignStmt:
			...
		case *ast.GoStmt:
			...
		case *ast.DeferStmt:
			...
		case *ast.ReturnStmt:
			...
		default:
			fmt.Printf("unsupported statement: %T in %s\n", v, r.pos(v.Pos()))
		}
	}
	return nil
}
This walks the statements of every function in order. When it finds a call to a failpoint Marker function, it looks up the corresponding rewriter in exprRewriters by the Marker name:
var exprRewriters = map[string]exprRewriter{
	"Inject":        (*Rewriter).rewriteInject,
	"InjectContext": (*Rewriter).rewriteInjectContext,
	"Break":         (*Rewriter).rewriteBreak,
	"Continue":      (*Rewriter).rewriteContinue,
	"Label":         (*Rewriter).rewriteLabel,
	"Goto":          (*Rewriter).rewriteGoto,
	"Fallthrough":   (*Rewriter).rewriteFallthrough,
	"Return":        (*Rewriter).rewriteReturn,
}
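The detection step, spotting calls of the form failpoint.X(...) where X is a known marker, can be sketched on its own. Note that this sketch swaps in the generic ast.Inspect walker instead of the hand-written statement switch used by the Rewriter, and that findMarkers, the known set, and the sample source are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is parsed only; imports are not resolved, so the failpoint package
// does not need to be installed for this demonstration.
const sample = `package main

import "github.com/pingcap/failpoint"

func test() {
	failpoint.Label("outer")
	for i := 0; i < 3; i++ {
		failpoint.Inject("demo", func(v failpoint.Value) {
			failpoint.Continue("outer")
		})
	}
}
`

// known is the set of marker names to look for, taken from the table above.
var known = map[string]bool{
	"Inject": true, "InjectContext": true, "Break": true, "Continue": true,
	"Label": true, "Goto": true, "Fallthrough": true, "Return": true,
}

// findMarkers returns the marker names called through the identifier pkgName,
// in the order in which a depth-first walk encounters them.
func findMarkers(source, pkgName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		// A marker call looks like pkg.Name(...), i.e. a selector expression.
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == pkgName && known[sel.Sel.Name] {
			found = append(found, sel.Sel.Name)
		}
		return true
	})
	return found, nil
}

func main() {
	markers, err := findMarkers(sample, "failpoint")
	fmt.Println(markers, err)
}
```

Passing the package identifier as a parameter mirrors the role of failpointName in the Rewriter: if the import is renamed, the marker calls are written with the new name.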
Rewriter Rewriting
In our example, we use failpoint.Inject, so we will explain using rewriteInject.
This method ultimately transforms:
failpoint.Inject("testPanic", func(val failpoint.Value) {
	fmt.Println(val)
})
Into:
if val, _err_ := failpoint.Eval(_curpkg_("testPanic")); _err_ == nil {
	fmt.Println(val)
}
Now let’s see how to construct the AST tree:
func (r *Rewriter) rewriteInject(call *ast.CallExpr) (bool, ast.Stmt, error) {
	// Check if the function call failpoint.Inject is valid
	if len(call.Args) != 2 {
		return false, nil, fmt.Errorf("failpoint.Inject: expect 2 arguments but got %v in %s", len(call.Args), r.pos(call.Pos()))
	}
	// Get the first argument "testPanic"
	fpname, ok := call.Args[0].(ast.Expr)
	if !ok {
		return false, nil, fmt.Errorf("failpoint.Inject: first argument expect a valid expression in %s", r.pos(call.Pos()))
	}
	// Get the second argument func(val failpoint.Value){}
	ident, ok := call.Args[1].(*ast.Ident)
	// Check if the second argument is nil
	isNilFunc := ok && ident.Name == "nil"
	// Check if the second argument is a function literal, as the second argument may also be nil
	// failpoint.Inject("failpoint-name", func(){...})
	// failpoint.Inject("failpoint-name", func(val failpoint.Value){...})
	fpbody, isFuncLit := call.Args[1].(*ast.FuncLit)
	if !isNilFunc && !isFuncLit {
		return false, nil, fmt.Errorf("failpoint.Inject: second argument expect closure in %s", r.pos(call.Pos()))
	}
	// The second argument is a function
	if isFuncLit {
		if len(fpbody.Type.Params.List) > 1 {
			return false, nil, fmt.Errorf("failpoint.Inject: closure signature illegal in %s", r.pos(call.Pos()))
		}
		if len(fpbody.Type.Params.List) == 1 && len(fpbody.Type.Params.List[0].Names) > 1 {
			return false, nil, fmt.Errorf("failpoint.Inject: closure signature illegal in %s", r.pos(call.Pos()))
		}
	}
	// Construct the replacement function: _curpkg_("testPanic")
	fpnameExtendCall := &ast.CallExpr{
		Fun:  ast.NewIdent(extendPkgName),
		Args: []ast.Expr{fpname},
	}
	// Construct the function failpoint.Eval
	checkCall := &ast.CallExpr{
		Fun: &ast.SelectorExpr{
			X:   &ast.Ident{NamePos: call.Pos(), Name: r.failpointName},
			Sel: ast.NewIdent(evalFunction),
		},
		Args: []ast.Expr{fpnameExtendCall},
	}
	if isNilFunc || len(fpbody.Body.List) < 1 {
		return true, &ast.ExprStmt{X: checkCall}, nil
	}
	// Construct if code block
	ifBody := &ast.BlockStmt{
		Lbrace: call.Pos(),
		List:   fpbody.Body.List,
		Rbrace: call.End(),
	}
	// Check if the closure function in failpoint contains parameters
	// func(val failpoint.Value) {...}
	// func() {...}
	var argName *ast.Ident
	if len(fpbody.Type.Params.List) > 0 {
		arg := fpbody.Type.Params.List[0]
		selector, ok := arg.Type.(*ast.SelectorExpr)
		if !ok || selector.Sel.Name != "Value" || selector.X.(*ast.Ident).Name != r.failpointName {
			return false, nil, fmt.Errorf("failpoint.Inject: invalid signature in %s", r.pos(call.Pos()))
		}
		argName = arg.Names[0]
	} else {
		argName = ast.NewIdent("_")
	}
	// Construct the return value of failpoint.Eval
	err := ast.NewIdent("_err_")
	init := &ast.AssignStmt{
		Lhs: []ast.Expr{argName, err},
		Rhs: []ast.Expr{checkCall},
		Tok: token.DEFINE,
	}
	// Construct the if statement condition, which is _err_ == nil
	cond := &ast.BinaryExpr{
		X:  err,
		Op: token.EQL,
		Y:  ast.NewIdent("nil"),
	}
	// Construct the complete if code block
	stmt := &ast.IfStmt{
		If:   call.Pos(),
		Init: init,
		Cond: cond,
		Body: ifBody,
	}
	return true, stmt, nil
}
The above comments should be detailed enough to follow along with the code.
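To see the node construction in action without the rest of the Rewriter, the sketch below builds the same kind of if statement from scratch and prints it with format.Node. It is a simplified illustration: buildInjectIf is a hypothetical helper, the names _curpkg_ and Eval are hard-coded here rather than taken from the extendPkgName and evalFunction constants, and the body is a fixed fmt.Println call instead of the statements of a user-supplied closure.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/token"
	"strconv"
)

// buildInjectIf builds the statement
//
//	if <arg>, _err_ := <pkg>.Eval(_curpkg_("<name>")); _err_ == nil { fmt.Println(<arg>) }
//
// as an AST and returns it printed as Go source.
func buildInjectIf(pkg, name, arg string) (string, error) {
	// _curpkg_("<name>")
	nameCall := &ast.CallExpr{
		Fun:  ast.NewIdent("_curpkg_"),
		Args: []ast.Expr{&ast.BasicLit{Kind: token.STRING, Value: strconv.Quote(name)}},
	}
	// <pkg>.Eval(_curpkg_("<name>"))
	evalCall := &ast.CallExpr{
		Fun:  &ast.SelectorExpr{X: ast.NewIdent(pkg), Sel: ast.NewIdent("Eval")},
		Args: []ast.Expr{nameCall},
	}
	// fmt.Println(<arg>), standing in for the body of the closure
	body := &ast.ExprStmt{X: &ast.CallExpr{
		Fun:  &ast.SelectorExpr{X: ast.NewIdent("fmt"), Sel: ast.NewIdent("Println")},
		Args: []ast.Expr{ast.NewIdent(arg)},
	}}
	errIdent := ast.NewIdent("_err_")
	stmt := &ast.IfStmt{
		// <arg>, _err_ := ...
		Init: &ast.AssignStmt{
			Lhs: []ast.Expr{ast.NewIdent(arg), errIdent},
			Tok: token.DEFINE,
			Rhs: []ast.Expr{evalCall},
		},
		// _err_ == nil
		Cond: &ast.BinaryExpr{X: errIdent, Op: token.EQL, Y: ast.NewIdent("nil")},
		Body: &ast.BlockStmt{List: []ast.Stmt{body}},
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, token.NewFileSet(), stmt); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	src, err := buildInjectIf("failpoint", "testPanic", "val")
	fmt.Println(src, err)
}
```

Unlike this sketch, rewriteInject reuses the source positions of the original call for the new nodes, which is what keeps the line numbers of the surrounding code unchanged.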
Failpoint Execution
Constructing Fault Plans
For example, if we want this fault to have a 5% chance of being triggered, we can do this:
$ GO_FAILPOINTS='main/testValue=5%return("abc")' go run main.go binding__failpoint_binding__.go
The content of the GO_FAILPOINTS variable is read during initialization, and a plan is registered for each entry. During execution, fault control follows the registered plans.
func init() {
	failpoints.reg = make(map[string]*Failpoint)
	// Get the GO_FAILPOINTS variable
	if s := os.Getenv("GO_FAILPOINTS"); len(s) > 0 {
		// Split multiple values using ;
		for _, fp := range strings.Split(s, ";") {
			fpTerms := strings.Split(fp, "=")
			if len(fpTerms) != 2 {
				fmt.Printf("bad failpoint %q\n", fp)
				os.Exit(1)
			}
			// Register injection plan
			err := Enable(fpTerms[0], fpTerms[1])
			if err != nil {
				fmt.Printf("bad failpoint %s\n", err)
				os.Exit(1)
			}
		}
	}
	if s := os.Getenv("GO_FAILPOINTS_HTTP"); len(s) > 0 {
		if err := serve(s); err != nil {
			fmt.Println(err)
			os.Exit(1)
		}
	}
}
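The splitting rule in this init function has a consequence worth noting: because each entry is split on every "=", a value that itself contains "=" yields more than two parts and is rejected. The sketch below reproduces only that splitting rule in a standalone form; splitFailpoints is a hypothetical helper, and it returns an error where the original prints a message and exits.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFailpoints applies the same rule as the init function above: entries
// are separated by ';' and each entry must split into exactly two parts on '='.
func splitFailpoints(env string) (map[string]string, error) {
	plans := make(map[string]string)
	if env == "" {
		return plans, nil
	}
	for _, fp := range strings.Split(env, ";") {
		parts := strings.Split(fp, "=")
		if len(parts) != 2 {
			return nil, fmt.Errorf("bad failpoint %q", fp)
		}
		plans[parts[0]] = parts[1]
	}
	return plans, nil
}

func main() {
	// Two entries, each with a single '='.
	fmt.Println(splitFailpoints(`main/testValue=2*return("abc");main/other=panic`))
	// The '=' inside the argument produces three parts, so the entry is rejected.
	fmt.Println(splitFailpoints(`main/testValue=return("k=v")`))
}
```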
The Enable function will eventually call the Failpoints structure’s Enable method. Let’s take a look at the Failpoints structure:
type Failpoints struct {
	mu  sync.RWMutex          // Concurrency control
	reg map[string]*Failpoint // Fault plan table
}
type Failpoint struct {
	mu       sync.RWMutex // Concurrency control
	t        *terms
	waitChan chan struct{} // Used for pausing
}
The entry main/testValue=5%return("abc") is stored as a key-value pair in the reg map: the key is main/testValue, and the value is parsed into a Failpoint structure.
The fault control plan in the Failpoint structure is mainly stored in the term structure:
type term struct {
	desc   string      // Plan description, here is 5%return("abc")
	mods   mod         // Plan type, whether it is fault probability control or fault count control, here is 5%
	act    actFunc     // Fault behavior, here is return
	val    interface{} // Injected fault value, here is abc
	parent *terms
	fp     *Failpoint
}
We used return to execute the fault, but there are also six other options:
- off: Take no action (does not trigger failpoint code)
- return: Trigger failpoint with specified argument
- sleep: Sleep the specified number of milliseconds
- panic: Panic
- break: Execute gdb and break into debugger
- print: Print failpoint path for inject variable
- pause: Pause will pause until the failpoint is disabled
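The overall hierarchy is as follows: Failpoints holds a reg map from failpoint path to Failpoint; each Failpoint holds a terms value; and terms holds a chain of term entries, each of which points back to its parent terms and its Failpoint.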
Next, let’s look at Enable:
func (fp *Failpoint) Enable(inTerms string) error {
	t, err := newTerms(inTerms, fp)
	if err != nil {
		return err
	}
	fp.mu.Lock()
	fp.t = t
	fp.waitChan = make(chan struct{})
	fp.mu.Unlock()
	return nil
}
Enable mainly calls newTerms to build the terms structure:
func newTerms(desc string, fp *Failpoint) (*terms, error) {
	// Parse the incoming strategy
	chain, err := parse(desc, fp)
	if err != nil {
		return nil, err
	}
	t := &terms{chain: chain, desc: desc}
	for _, c := range chain {
		c.parent = t
	}
	return t, nil
}
It parses the incoming strategy through parse and constructs terms to return.
Fault Execution
When we run the fault code, failpoint.Eval is executed, and whether the fault function runs depends on whether Eval returns an error.
The Eval function will call the Eval method of Failpoints:
func (fps *Failpoints) Eval(failpath string) (Value, error) {
	fps.mu.RLock()
	// Get the registered Failpoint
	fp, found := fps.reg[failpath]
	fps.mu.RUnlock()
	if !found {
		return nil, errors.Wrapf(ErrNotExist, "error on %s", failpath)
	}
	// Execute plan judgment
	val, err := fp.Eval()
	if err != nil {
		return nil, errors.Wrapf(err, "error on %s", failpath)
	}
	return val, nil
}
The reg map used in this Eval method holds the plans registered in the init function mentioned above. The method looks up the Failpoint there and calls its Eval method.

That Eval method calls the eval method of terms, which traverses the chain []*term field. For each plan in the chain it calls the allow method to check whether the plan may fire; if so, it calls the do method to execute the corresponding behavior.
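The allow-then-act loop can be sketched with a much smaller model. Everything below is a simplified assumption made for illustration: the term and terms types carry only a remaining count, a percentage, and a value, allow takes the random roll as a parameter, and returning the value stands in for the do step. The real types shown earlier have more fields and support more actions.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

// term is a simplified plan: fire at most `remaining` times (negative means
// unlimited), each time with probability `percent` out of 100.
type term struct {
	remaining int
	percent   float64
	value     interface{}
}

// allow reports whether the term may fire for the given roll in [0, 100),
// consuming one count when it does.
func (t *term) allow(roll float64) bool {
	if t.remaining == 0 || roll >= t.percent {
		return false
	}
	if t.remaining > 0 {
		t.remaining--
	}
	return true
}

// terms is a chain of plans evaluated in order.
type terms struct {
	mu    sync.Mutex
	chain []*term
}

var errNotAllowed = errors.New("no term allowed")

// eval walks the chain and returns the value of the first term that is allowed.
func (ts *terms) eval() (interface{}, error) {
	ts.mu.Lock()
	defer ts.mu.Unlock()
	for _, t := range ts.chain {
		if t.allow(rand.Float64() * 100) {
			return t.value, nil
		}
	}
	return nil, errNotAllowed
}

func main() {
	// Roughly what 2*return("abc") means: fire twice, then stop.
	ts := &terms{chain: []*term{{remaining: 2, percent: 100, value: "abc"}}}
	for i := 0; i < 100; i++ {
		if v, err := ts.eval(); err == nil {
			fmt.Println(v)
		}
	}
}
```

Run as written, the loop prints abc twice and then nothing, the same behavior as the 2*return("abc") example in the Usage section.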
Conclusion
In this article, we first learned how to use failpoint in our code and then how it achieves fault injection through code rewriting: traversing and modifying the Go AST, then generating code from it. This approach of adding functionality through code generation is also a useful idea to keep in mind when writing our own code in the future.
Reference
https://github.com/pingcap/failpoint
https://pingcap.com/zh/blog/golang-failpoint
https://www.modb.pro/db/79460