1. Background
In recent years, AI-driven intelligent testing ("AI + Test") has gradually become a fundamental capability for major internet companies and testing service providers at home and abroad. This intelligence covers automatic generation of test code, large-scale analysis of test results, automated exploratory testing, defect localization and repair, and more. Notable companies, products, and services include Test.ai, Applitools, Totoro, Eggplant, and Appdiff.
Among these, the ability to automatically generate tests has always been a hot topic in the industry. In 2019, ByteDance’s Quality Lab conducted in-depth exploration in automatic test generation and developed the Fastbot stability testing service for Android. The core technologies of Fastbot mainly include:
- Intelligent traversal: uses Model-Based Testing (MBT) and provides various algorithm strategies to achieve high Activity coverage and strong problem-discovery capability;
- Multi-machine collaboration: supports hundreds of devices traversing collaboratively over long periods, working together on the same target;
- Personalized expert system: business teams can apply various personalized configurations, such as limiting tests to specified Activities or shielding certain scenarios;
- Model reuse: utilizes historical testing experience data to learn and improve the current testing strategy;
- Complex case generation: learns by imitating manual cases and mixes complex case combinations into the traversal;
- Precise targeting: automatically generates targeted tests for changed scenarios based on changes in the code call chain.
Meanwhile, cross-platform industry research shows that iOS has consistently held a high market share, especially among high-end consumers, who generally use iPhones for better performance and have higher expectations for application stability. However, due to the lack of iOS stability testing tools on the market, stability and regression testing of iOS products still relies mostly on manual verification, which keeps testing efficiency and output relatively low. At the same time, as products diversify, grow more complex, and product lines expand rapidly, the labor cost of quality assurance is enormous. To alleviate this situation, there was an urgent need for an iOS stability testing service that could be deployed during the company's product-line testing phase with ultra-low access cost and unattended operation. To extend Fastbot's intelligent capabilities to other platforms, ByteDance's Quality Lab began developing the iOS stability testing service at the beginning of 2020. First, two questions are worth considering:
- Can the Android traversal algorithm be applied universally across platforms?
- Is there a universal cross-platform page recognition method based on machine vision?
The answers to these two questions are affirmative.
Next, this article will focus on introducing the design ideas, technological evolution, and applications of ByteDance’s self-developed intelligent testing system Fastbot in cross-platform scenarios.
2. Test Generation
2.1 Introduction to Automated Test Generation
Traditional automation methods, such as record & replay, rely on testers to write test scripts; as testing requirements change, testers must spend time maintaining and adjusting those scripts. Automated Test Generation (ATG), also known as Automated Input Generation (AIG), instead abstracts the common services that testing activities depend on and automatically generates the operations the tests require, which significantly reduces the workload of writing and maintaining test scripts.
| Method | Labor Demand | Script Workload | Reusability | Execution Efficiency | Generality |
|---|---|---|---|---|---|
| Record & Replay | High | High | Medium | Low | Tied to the Apk, low |
| Native Monkey | Low | Low | Low | High | Built into Android, high |
| Test Generation | Low | Low | Low | High | Independent of the Apk, high |
Currently, typical ATG technologies include:
- Program analysis;
- Model-Based Testing;
- Combinatorial Testing;
- Search-Based Testing (e.g., Facebook's Sapienz);
- Adaptive Random Testing.

Figure 1 Introduction to ATG Technology
The core question these techniques address is how to generate the test logic. Taking MBT as an example: in GUI (client) testing, a given page can be defined as a state (State). From that page's GUI control tree we can extract meaningful operations, for instance reaching State3 from State1 through Event1, or reaching State1 from State2 through Event2. The test generation problem is thus transformed into a directed-graph traversal problem (a minimal sketch of such a graph model follows the list below). Random testing tools such as Monkey, by contrast, often worry developers because their logs lack a higher-level representation:
- Test sequences generated by Monkey are difficult to document as use cases;
- Bugs are hard to reproduce because detailed reproduction steps are missing.
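As referenced above, here is a minimal, hypothetical sketch of the MBT formulation: pages become states, UI events become edges, and test generation becomes a directed-graph traversal problem. The class and method names are illustrative and are not Fastbot's actual implementation.

```python
from collections import defaultdict

class GuiModel:
    """Directed graph: states (abstracted pages) connected by actions (UI events)."""

    def __init__(self):
        # edges[state][action] -> resulting state
        self.edges = defaultdict(dict)

    def record_transition(self, state, action, next_state):
        """Add an observed transition, e.g. State1 --Event1--> State3."""
        self.edges[state][action] = next_state

    def unexplored_actions(self, state, available_actions):
        """Actions on the current page that the model has never tried."""
        return [a for a in available_actions if a not in self.edges[state]]

# Usage: build the toy model from the text and pick the next action to try.
model = GuiModel()
model.record_transition("State1", "Event1", "State3")
model.record_transition("State2", "Event2", "State1")
print(model.unexplored_actions("State1", ["Event1", "Event3"]))  # ['Event3']
```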
2.2 Automated Testing Tools
The ATG technology for apps mainly includes two categories.
The first is white-box automated testing tools based on code layers. This method usually requires obtaining the app’s source code in advance, analyzing it to generate a control flow graph, and then generating test cases based on this. Although white-box testing methods are more precise, they have many limitations, as they cannot effectively test apps for which the source code cannot be obtained. Additionally, to achieve high code coverage, an excessive number of test cases may be generated.
The second is black-box testing based on the GUI information within the app. This type of testing does not require obtaining the app’s source code; we only need to listen to the UI information of the phone’s pages during the testing process and perform action injections to achieve continuous interactive testing.
Other popular black-box automated testing tools include:
- Facebook's Sapienz, which uses genetic algorithms and search-based methods to generate test cases;
- Dynodroid, developed at Georgia Tech, which treats the app as a series of executable actions and sequentially generates test sequences;
- EHBDroid (gray-box), which bypasses the UI layer and directly triggers events through event-handler callbacks using static and dynamic analysis;
- Stoat, which constructs a State-Action probabilistic graph model and optimizes it with MCMC sampling to maximize coverage of the app;
- APE, which proposes dynamically adjusted page-state abstraction, choosing a suitable abstraction granularity for different apps;
- TimeMachine, which runs on an emulator and optimizes testing and precise replay by saving and loading emulator states at key points;
- Q-Testing, which pre-trains a machine learning model for page abstraction and explores using curiosity-driven reinforcement learning;
- ComboDroid, which abstracts and extends manual cases, identifies the connectivity of states, and generates richer test cases;
- and the previously mentioned random testing tool Monkey that ships with Android.
Additionally, tools developed by Peking University, such as Droidbot and Humanoid, also use model-based GUI testing, with Humanoid imitating user behavior and Droidbot abstracting pages and actions as graph models, traversing the graph using traditional DFS and BFS algorithms to achieve high coverage.
However, in our testing process, we found that traditional graph traversal algorithms perform poorly in model-based GUI testing due to:
- The graph contains numerous loops, so DFS-based algorithms easily get stuck in local loops, covering only a limited number of pages and unable to exit;
- Most apps under test are dynamic and update in real time; some pages (Feed pages, search pages, etc.) are very hard to return to once left. A simple back operation does not guarantee returning to the previous page, and actions such as pull-to-refresh have no corresponding back operation.
Furthermore, the aforementioned methods store app models on the client side. Due to memory and performance limitations of mobile devices, the model size is severely constrained, making long-term testing impossible. Moreover, since many A/B experiments utilize data such as device model, OS version, etc., the number of states that can be traversed on each device varies.
On Android, Fastbot utilizes a richer variety of device models, leveraging a device farm to collaboratively construct the app model to guide future testing tasks. At the same time, we optimized traditional graph search algorithms, switching to heuristic search to achieve higher test coverage in a shorter time.
3. Design Principles of Fastbot
3.1 Workflow of Fastbot-Android
As mentioned above, to address the limitations imposed by the memory size and computational capabilities of mobile devices in model-based GUI testing, we deploy the parts that consume large amounts of memory and computational resources to the cloud, retaining only the UI information monitoring and action injection functions on the client side. Figure 2 shows the working method of separating the client and server.

Figure 2 Fastbot Workflow Diagram
In terms of the specific workflow, we run a lightweight client driver on each device. It is mainly responsible for monitoring page GUI information and sending it to the server, and for receiving actions from the server and injecting the corresponding events on the device. Correspondingly, an agent runs on the server side for each device: it receives the device's page information, encapsulates it into state nodes, makes an action decision for the current state according to the assigned algorithm while interacting with the task model, and sends the decided action back to the client driver.
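To illustrate this client/server split, below is a minimal, hypothetical sketch of a per-device server agent loop. The message fields, framing, and function names are assumptions for illustration, not Fastbot's actual wire protocol.

```python
import json
import socket

def handle_device(conn: socket.socket, decide_action):
    """Per-device server agent: receive GUI info, decide an action, send it back."""
    reader = conn.makefile("r", encoding="utf-8")
    writer = conn.makefile("w", encoding="utf-8")
    for line in reader:                          # one JSON message per line (assumed framing)
        msg = json.loads(line)                   # e.g. {"type": "gui", "guitree": {...}}
        if msg.get("type") != "gui":
            continue
        state = abstract_state(msg["guitree"])   # encapsulate the page as a state node
        action = decide_action(state)            # per-task algorithm strategy decides
        writer.write(json.dumps({"type": "action", "action": action}) + "\n")
        writer.flush()

def abstract_state(guitree: dict) -> str:
    """Placeholder page abstraction: a stable digest of the control-tree structure."""
    return str(hash(json.dumps(guitree, sort_keys=True)))
```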
3.2 Algorithm Principles of Fastbot
3.2.1 Exploration and Exploitation Based on State
In terms of algorithms, we abstract the page's GUI information into states in the model and the executed operations into actions, connecting states as nodes of a graph and actions as its edges, thereby forming a directed cyclic graph model. The traversal decision idea is inspired by the Monte Carlo tree search used in AlphaGo. On top of this, we also use other reinforcement learning methods: we designed an N-step Q-Learning algorithm with a reward function based on the degree of page change, compute the Q value of each action available on a page, and select the optimal action according to those Q values.
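The following is a minimal sketch of tabular N-step Q-Learning with a page-change-based reward, in the spirit of the description above. The reward definition and hyperparameters are assumptions for illustration; Fastbot's actual formulas are not reproduced here.

```python
from collections import defaultdict

ALPHA, GAMMA, N_STEP = 0.1, 0.9, 3    # assumed hyperparameters

Q = defaultdict(float)                # Q[(state, action)] -> estimated value

def page_change_reward(prev_controls: set, new_controls: set) -> float:
    """Assumed reward: fraction of controls on the new page not seen on the previous one."""
    if not new_controls:
        return 0.0
    return len(new_controls - prev_controls) / len(new_controls)

def nstep_q_update(trajectory):
    """trajectory: list of (state, action, reward) tuples, oldest first."""
    for t in range(len(trajectory)):
        s, a, _ = trajectory[t]
        window = trajectory[t:t + N_STEP]
        # Discounted sum of up to N rewards...
        g = sum((GAMMA ** i) * r for i, (_, _, r) in enumerate(window))
        # ...plus the bootstrapped value of the state reached after the window.
        if t + N_STEP < len(trajectory):
            s_boot, _, _ = trajectory[t + N_STEP]
            g += (GAMMA ** N_STEP) * max(
                (v for (s2, _), v in Q.items() if s2 == s_boot), default=0.0)
        Q[(s, a)] += ALPHA * (g - Q[(s, a)])

# Usage: after each episode, build [(state, action, reward), ...] with
# reward = page_change_reward(prev_controls, new_controls) and call nstep_q_update.
```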
This entire process is akin to a robot exploring a map. Our goal is to cover all paths on the map, while prioritizing paths with higher value under limited time. Here, value is a broad concept that we can define according to our goals. If our goal is to travel from A to B, we can learn one or several fixed paths. It is important to understand another concept: when our exploring robot reaches a new intersection with N forks, if we have not explored these forks, we cannot know the value of the subsequent paths, and thus cannot make the right decision. Therefore, we need to balance exploration and exploitation. When we have explored a path sufficiently, we will also propagate the information observed along the path back to guide the robot in recording the value across the entire chain. Only when exploration is sufficient does exploitation become valuable (here, exploitation refers to performing the optimal action). Additionally, if we explore the entire map infinitely, the Q values among actions will stabilize to a certain ratio, allowing for more accurate decision-making due to sufficient information about the map; the same applies to traversal.
Simply put, during traversal we choose the action with the maximum value under the current state, i.e., the action expected to bring the largest value increase. For example, in the figure below, StateA has three possible actions, but Action2 brings the maximum value, so when the agent enters StateA it chooses Action2. (Note that at the start of reinforcement learning training this value is unknown and is generally initialized to 0. The agent then keeps trying actions, interacting with the environment, collecting rewards, and updating the value according to our value-calculation formula. After many rounds of training, the value converges to a stable number, which tells us what return can be expected from choosing a given action in a given state.)

Figure 4 Reinforcement Learning Event Decision
It should be emphasized that the value here is not only the immediate reward from the environment for transitioning from the current state to the next. In actual training we consider both immediate and long-term rewards, so the value is computed by a formula that depends on whether a single-step or N-step return is used; moreover, this value is estimated from samples and needs many rounds of iteration, and training ends when the loss converges.

Figure 5 Reverse Update Value
Another issue is that at StateA, initially, the values of Action1, Action2, and Action3 are all 0, as we do not know what these actions will yield. If Action1 is randomly chosen first, transitioning from StateA to StateB, we get Value=2, and the system records that choosing Action1 at StateA corresponds to Value=2. If the agent returns to StateA again, it will choose Action1 since it has the maximum value, while Action2 and Action3 still have values of 0. The agent has not yet attempted Action2 and Action3 to determine their values.
Therefore, in reinforcement learning traversal, we initially encourage exploration rather than always choosing the action with the highest value. There is a degree of randomness in action selection (we use a UCB decision-making mechanism based on access frequency and accumulated value) to cover more actions and try various possibilities. After many rounds of training, when various actions under various states have been sufficiently tried, we will significantly reduce the exploration ratio, allowing traversal to lean more towards exploitation, selecting the action with the highest returned value.
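Below is a minimal sketch of a UCB-style selection rule driven by visit counts and accumulated value, as mentioned above. The exploration constant and bookkeeping are assumptions for illustration, not Fastbot's exact decision mechanism.

```python
import math

class UcbSelector:
    """Pick the action maximizing mean value plus an exploration bonus (UCB1-style)."""

    def __init__(self, c: float = 1.4):       # assumed exploration constant
        self.c = c
        self.visits = {}                       # (state, action) -> visit count
        self.total_value = {}                  # (state, action) -> accumulated value

    def select(self, state, actions):
        n_state = sum(self.visits.get((state, a), 0) for a in actions) + 1

        def ucb(a):
            n = self.visits.get((state, a), 0)
            if n == 0:
                return float("inf")            # untried actions are explored first
            mean = self.total_value[(state, a)] / n
            return mean + self.c * math.sqrt(math.log(n_state) / n)

        return max(actions, key=ucb)

    def update(self, state, action, value):
        key = (state, action)
        self.visits[key] = self.visits.get(key, 0) + 1
        self.total_value[key] = self.total_value.get(key, 0.0) + value
```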
3.2.2 The Problem of Sparse Rewards
Another challenge in the algorithm is that rewards during traversal are often sparse. Initially, our reward function was computed from statistics on covered Activities and the number of covered controls, but in the later stages of traversal these metrics tend to plateau, so rewards stop growing significantly and learning based on them becomes less effective. After some experimentation, we chose curiosity-driven reinforcement learning to address the sparse-reward problem, using natural language processing to abstract features from page information and adding a curiosity (Curiosity) reward to the original reward function, as follows:


Figure 6 Curiosity Driven RL Process Diagram
This allows us to assign different reward values to any state-action pairs, rather than providing fixed rewards based on artificially set sub-goals.
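As an illustration of the idea, here is a minimal sketch of a prediction-error ("curiosity") bonus mixed into an extrinsic reward, in the spirit of curiosity-driven RL. The linear forward model, feature vectors, and mixing weight are assumptions, and for brevity the sketch omits conditioning on the action; the actual Fastbot reward formula is not reproduced here.

```python
import numpy as np

class CuriosityReward:
    """Intrinsic reward = error of a learned forward model predicting next-state features."""

    def __init__(self, dim: int, lr: float = 0.01, beta: float = 0.3):
        self.W = np.zeros((dim, dim))   # linear forward model: phi(s) -> phi(s')
        self.lr = lr
        self.beta = beta                # assumed mixing weight between rewards

    def reward(self, phi_s: np.ndarray, phi_next: np.ndarray, extrinsic: float) -> float:
        pred = self.W @ phi_s                       # predicted next-state features
        err = phi_next - pred
        curiosity = float(err @ err)                # surprise = squared prediction error
        self.W += self.lr * np.outer(err, phi_s)    # online update of the forward model
        return (1 - self.beta) * extrinsic + self.beta * curiosity
```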
We also conducted several ablation experiments and obtained the following conclusions: not adding Curiosity to reward calculation (blue line c0) and calculating reward solely based on Curiosity (green line c99) both performed slightly worse than the mixed use of the original reward combined with Curiosity reward (orange line c30). The data shows that introducing Curiosity-driven learning has a positive effect on testing coverage, especially in the early stages.

Figure 7 Curiosity Driven RL Ablation Experiment
Curiosity-driven learning keeps the reward signal meaningful as training progresses: because the intrinsic reward depends on what the model has already learned, it effectively adds a time-varying component to the reward.
However, this technique is not perfect. A known issue is that the agent may be distracted by random or noisy elements in the environment, which disturb its curiosity. This situation is referred to as "white noise" or the "noisy TV" problem, and is also described as "procrastination."
To illustrate this situation, imagine an agent learning to explore a maze by observing the pixels it sees.

Animation 1 Exploring the Maze
Predicting the next state triggers the agent’s curiosity to explore the maze. It tends to seek unexplored areas because it can make good predictions in areas it has explored (or it cannot make good predictions in unexplored areas).
Now, suppose we place a “TV” on the wall of the maze that rapidly plays random animations. Due to the random source of images, the agent cannot accurately predict what image will appear next. The prediction model will generate high loss, providing the agent with high “intrinsic” rewards. As a result, the agent tends to stop and watch the “TV” instead of continuing to explore the maze.

Animation 2 Stuck in Front of the TV
In the environment, when the agent faces the “TV” or a source of random noise, the curiosity triggered by predicting the next state ultimately leads to “procrastination.”
The same applies to traversal: whether there is a "TV" depends entirely on whether the definition of the "pixel" (that is, the state abstraction) is reasonable. Imagine a page playing a short video or a constantly rotating advertisement; will the agent stand still, convinced it has found a place full of curiosity?
3.2.3 Reusing Testing Experience
Considering that traversal time is not fixed and varies between apps, training may be insufficient when the traversal time is short. Therefore, after each traversal we persistently save the trained model so that it can be loaded and training can continue before the next test, continuously improving the "map". We also store the traversal data as (GUITree1, Action, GUITree2) pairs in a persistent database to improve the natural language model and the curiosity model.
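A minimal sketch of this persistence idea: serialize the learned model after a run, reload it before the next one, and append (GUITree1, Action, GUITree2) transition tuples to a store for offline training. The paths, formats, and helper names are assumptions for illustration.

```python
import json
import pickle
from pathlib import Path

MODEL_PATH = Path("fastbot_model.pkl")          # assumed location
TRANSITIONS_PATH = Path("transitions.jsonl")    # assumed location

def save_model(model) -> None:
    """Persist the trained model at the end of a traversal run."""
    MODEL_PATH.write_bytes(pickle.dumps(model))

def load_model(default_factory):
    """Reload the previous run's model if present, otherwise start fresh."""
    if MODEL_PATH.exists():
        return pickle.loads(MODEL_PATH.read_bytes())
    return default_factory()

def log_transition(guitree_before: dict, action: dict, guitree_after: dict) -> None:
    """Append one (GUITree1, Action, GUITree2) tuple for later offline training."""
    with TRANSITIONS_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"before": guitree_before,
                            "action": action,
                            "after": guitree_after}) + "\n")
```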
From actual test data, model reuse has a positive effect on test coverage. As shown in the figure below, a and b are two different types of apps developed by ByteDance. After multiple rounds of cumulative testing, coverage for tests of the same duration increased by 17.9% and 33.3%, respectively.
Figure 8 Model Reuse
To validate the tool's effectiveness, we compared Fastbot (Re) with several other state-of-the-art testing tools, including APE (A), which dynamically adjusts page-state abstraction, and Stoat (St), which optimizes a probabilistic graph model through sampling. The experiments covered 40 representative apps, each tested on a single device for 1 hour. Fastbot (reusing the accumulated experience of 3 prior test rounds) outperformed the other tools. Figure 9 below compares the tools across repeated single-device runs and shows that Fastbot achieves higher code coverage on large apps than the other state-of-the-art tools, suggesting that Fastbot is particularly advantageous for large apps.

Figure 9 Evaluation Data of Fastbot(Re), APE(A), Stoat(St)
3.3 Foundation of Cross-Platform Universality
Regarding the universality of the algorithms across platforms, we fully considered this when designing the overall architecture: client capabilities and algorithm decision-making are decoupled, and the backend algorithm decisions are provided as a service so that a single set of algorithms can support all platforms.
The benefits of this decoupling are evident. For the system-level capabilities of another platform such as iOS, we only need to focus on the differences from the Android side, such as how to obtain GUI page information and how to inject the various events. Likewise, for the message communication between client and server, we only need to predefine the platform differences to achieve cross-platform compatibility, for example by standardizing the communication protocol for reporting structured GUI page information, event types, operation targets, and so on.

Figure 3 Fastbot Cross-Platform Architecture Diagram
4. Applications of Fastbot in Cross-Platform
4.1 iOS Automation Testing Tools and Frameworks
Due to the strongly closed nature of the iOS platform, most automated testing and intelligent testing research, whether academic or industrial, has been implemented on Android first, leaving intelligent testing solutions on the iOS platform in a relative vacuum.
In the process of implementing intelligent GUI testing on iOS, one key point is that certain process operations must be performed on the tested app, such as starting, killing, restarting, and switching between foreground and background. Another is obtaining the current GUI page information (the GUITree control tree) and abstracting its state, so as to know the app's current running status and obtain an abstract feature representation of the current page. These foundational capabilities are typically provided by the platform's automated testing framework; on Android, for example, UIAutomator (or UIAutomator2, AccessibilityService) captures GUI page information as input for state abstraction.
| Framework | Company (Organization) | Key Technologies | Advantages | Disadvantages |
|---|---|---|---|---|
| UIAutomation | Apple (native) | Based on the Accessibility layer; drives UI Automation via the UI Automation library over TCP for automated testing | Official native compatibility guaranteed; no instrumentation required | Deprecated after Xcode 8.x; only supports debugging on a single device, since Instruments limits one Mac to one iOS device |
| XCTest/XCUITest | Apple (native) | Introduced in Xcode 7.x; a UI testing framework based on Accessibility that fully replaces UIAutomation and removes the single-device limitation | Official native compatibility guaranteed; stronger than UIAutomation, supports regex-based UI element lookup and UI assertions; supports unit, interface, and UI testing; no instrumentation required | Relies on xcodebuild to run; some foundational capabilities are missing, such as obtaining the current foreground process and efficient event injection |
| KIF | | Based on the XCTest framework; references some private interfaces | Supports Xcode 11.6 (iOS 11-13); supports unit testing and UI testing | Uses private interfaces, so backward compatibility cannot be guaranteed; runs relatively slowly |
| WDA (WebDriverAgent) | Facebook/Appium | Based on the XCTest framework; references more private interfaces than KIF | Not limited to a single Instruments instance; its open private interfaces cover most testing scenarios; good stability, and most well-known testing frameworks build extensions on top of WDA | Uses private interfaces, so backward compatibility cannot be guaranteed; querying and matching controls is relatively slow; Facebook no longer maintains WDA, Appium has taken over |
| Appium | Open source community | A cross-platform UI testing framework based on the WebDriver JSON protocol; operates on iOS through WDA | No instrumentation required; supports image recognition | Overly heavyweight, environment is difficult to set up; slow execution (around 10 seconds) |
| Airtest | NetEase | Locates UI elements via image recognition; also provides the poco instrumentation library to obtain the GUITree control tree | Supports image recognition; well suited to automation in game scenarios | High cost of adapting to new Xcode versions |
| EarlGrey | | Based on the XCTest framework; black-box testing via XCUITest or white-box testing via XCUnitTest | Automatically waits for the UI, network requests, and queued callbacks to settle before executing tests | Requires source-level instrumentation |
| tidevice | Alibaba | A cross-platform open-source automation tool that can start WebDriverAgent (WDA) without relying on Xcode | Can run iOS automation scripts on Windows | Same as WDA |
Table 1 iOS UI Automation Frameworks
Table 1 lists several common iOS UI automation frameworks currently on the market. Overall, they fall into three categories:
1) App source-code instrumentation: an instrumented SDK obtains the host page's control tree and injects executable operations inside the process. Instrumentation executes quickly, but it has drawbacks: a poorly written SDK may adversely affect the host app, for example by degrading its stability, and in-process injection cannot operate system-level pop-ups.
2) WDA private interfaces: this approach requires no instrumentation and is currently the mainstream iOS UI automation solution. However, private interfaces often cause compatibility issues, and the performance of obtaining the control tree through them can at times be a concern.
3) Image recognition combined with WDA private interfaces: the automation capability here depends entirely on image processing; it otherwise shares the advantages and disadvantages of the second category.
Table 2 lists several relatively good iOS Monkey testing tools currently available in the market. In summary, they are mainly based on XCTest and WDA, but a common issue is that updates and maintenance are not timely, and some have even ceased maintenance for a long time. The primary challenge faced by these tools is the enormous development cost of compatibility with new versions of iOS, especially concerning the compatibility of WDA (WebDriverAgent, which provides certain cross-process app scheduling and control tree acquisition capabilities) private interfaces, often having to wait for Facebook to resolve WDA compatibility before starting development. Unfortunately, Facebook has now abandoned subsequent compatibility for WDA and has shifted to developing IDB (iOS Development Bridge, similar to adb tools in Android, but with stability issues on real devices and cannot fully replace WDA), while WDA is now maintained by Appium in a community form.
| Tool | Key Technologies | Advantages | Disadvantages |
|---|---|---|---|
| ui-auto-monkey | The earliest iOS Monkey testing tool, driven by JavaScript and based on UIAutomation | | After iOS and Xcode upgrades removed the UIAutomation framework, it only applies to versions before Xcode 7.x; the project is now abandoned |
| SwiftMonkey | Based on the XCUITest framework, written in Swift; a purely coordinate-based random-clicking Monkey | Fast (millisecond level), lightweight, good compatibility | Must be instrumented into the app's source code; does not support parsing the control tree |
| FastMonkey | Based on XCTestWD (a secondary development of WDA) and SwiftMonkey; optimizes WDA private interfaces and XCUITest; supports parsing the control tree; event-probability-driven Monkey | Fast (millisecond level), lightweight; no instrumentation required; optional control-tree parsing (second level); custom event configuration | Only supports Xcode 8.x, 9.x, and 10.1; other versions are not yet compatible (forks of it support Xcode 10.x and 11.x) |
| OCMonkey | Written in Objective-C; integrates WDA private interfaces; Monkey driven by configurable control-type weights | No instrumentation required; supports control-tree parsing | Control-tree parsing is slow (hundreds of milliseconds to seconds); does not support Xcode 10.x and above; no longer maintained |
| Macaca/iOSMonkey | A secondary encapsulation of Macaca, written in Node.js; integrates WDA private interfaces; provides an externally driven Instruments server-client automation framework similar to Appium | Cross-platform support; supports control-tree parsing | Overly heavyweight; relatively complex environment setup; slow event-driven response (10+ seconds); no longer maintained |
| sjk_swiftmonkey | A secondary development of SwiftMonkey; compatible with modified WDA private interfaces; supports parsing the control tree; event-probability-driven Monkey | Lightweight; no instrumentation required; supports control-tree parsing; supports Xcode 11.x | Control-tree parsing is slow (second level); does not support Xcode 12.x |
Table 2 iOS Monkey Tools
In addition to compatibility, the ability to obtain GUI page information on iOS also deserves attention. Many of the Monkey tools in Table 2 integrate GUITree control-tree parsing, which greatly improves operational efficiency compared with pure coordinate clicks: repeatedly clicking a few coordinates may land in the same control area, whereas with control parsing one can also build behavior trees or control-shielding configuration mechanisms that enrich the tool's capabilities. The speed of control parsing is an equally important metric; as a stress-testing tool, we certainly do not want a single click to take 10 seconds, but rather a tool that can parse controls at a speed close to coordinate-based event generation. The richness of control-tree recognition and the time it takes are clearly a trade-off. After all, no racer wants a tire change that takes so long they get overtaken by second place!
4.2 Fastbot-iOS Cross-Platform Solution
In summary, considering the pros and cons, the architecture of Fastbot-iOS adopts lightweight yet necessary WDA private interfaces, optional instrumentation SDK (to provide additional plugin capabilities), and a technology solution based on pure image recognition.
The specific workflow is shown in Figure 10, highlighting the differences between Fastbot-iOS and Android.
First, we developed a Fastbot Native library based on machine vision that analyzes page information purely from images: it converts a captured screenshot into structured GUITree XML. Using OpenCV and machine vision algorithms, it identifies the layout structure of GUI pages and control information, and performs structured cropping of pop-up pages. The Fastbot Native library is implemented in C++ and designed to be cross-platform, so it can easily be ported to Android and Mac PCs.
Secondly, to optimize performance and compatibility, we made modifications to WDA, retaining only the minimal range of WDA private interfaces. The benefit of this design is to provide high availability for the tool and quickly adapt to the latest version of iOS, even without any modifications. For instance, on the day Apple launched iOS15, Fastbot-iOS was seamlessly compatible and could run directly.
Finally, we also provide extensible capabilities through optional plugins, such as integrating the ShootsSDK plugin (Shoots is a general UI automation framework developed internally by ByteDance for writing UI automation test cases, similar to the Airtest poco SDK available in the market). This plugin obtains the GUITree control tree of the app through internal reflection and is typically introduced only when there are special page parsing requirements in Webview, Lynx, games, or business applications. In general cases, the page parsing issue can be resolved through the Fastbot Native library. Additionally, this extensibility mechanism supports custom plugins developed internally by businesses, as long as they align with the communication protocol of Fastbot-iOS.

Figure 10 Fastbot-iOS Action Timing Diagram
4.2.1 Enhanced Testability
In addition to the Shoots plugin, Fastbot-iOS has developed the AAFastbotTweak testability plugin, which similarly integrates into the app to provide enhanced extensibility for testing. The capabilities include, but are not limited to:
- Scene limiting: restricts the host app to a specific scene, allowing access to any sub-page within that scene; if the app leaves that scene, Fastbot-iOS immediately re-enters it. A blacklist/whitelist mechanism also limits which pages must not, or may only, be navigated to.
- Transition shielding: periodically shields all third-party jumps, such as to QQ, WeChat, Taobao, etc.
- Upgrade shielding: blocks automatic updates of the host app.
- Automatic login: automatically logs in with accounts of a specified type from an account pool.
- Data mocking: mocks preset A/B testing values and keys.
- Forced kill: forcefully kills the app upon receiving execution messages, such as WatchDog kills.
- Schema transition: automatically jumps to specific scene pages based on a preset Schema List.
All these functions are pluggable, can be activated on demand, and are highly customizable.
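For illustration only, a hypothetical configuration for such a testability plugin might look like the sketch below. The keys and values are invented to show the shape of scene limiting, shielding, and schema lists; they are not Fastbot's actual configuration format.

```python
# Hypothetical AAFastbotTweak-style configuration, expressed as a Python dict.
FASTBOT_TWEAK_CONFIG = {
    "scene_limit": {
        "entry_schema": "app://profile",      # scene to stay inside (invented value)
        "whitelist_pages": ["ProfilePage", "SettingsPage"],
        "blacklist_pages": ["LogoutPage"],
    },
    "shield_third_party_jumps": ["weixin://", "mqq://", "taobao://"],
    "shield_app_upgrade": True,
    "auto_login": {"account_pool": "stability_test", "account_type": "normal_user"},
    "ab_mock": {"feed_style": "new_ui"},      # preset A/B keys and values
    "schema_list": ["app://feed", "app://search?q=test"],
}
```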
4.2.2 Fault Injection
In addition to the above two SDKs, Fastbot-iOS has also developed a fault injection SDK integrated into the app. This plugin simulates various complex extreme situations on online devices, injecting instantaneous or sustained faults during Fastbot traversal testing to verify the app’s stability under extreme conditions. Its capabilities include but are not limited to:
- Simulating high CPU load: increases CPU load through high-frequency computation, simulating most threads running at full capacity while a single thread fluctuates.
- Simulating CPU throttling: adjusts the "maximum CPU frequency" of a jailbroken iPhone in low-battery mode, simulating low-frequency, high-temperature conditions.
- Simulating low available memory: allocates non-releasable memory so that the app runs under low available memory.
- Simulating disk anomalies: generates oversized files filled with random numbers and copies them, randomly inserting characters into the copies, creating low or no available disk situations.
- Simulating high I/O: keeps the device in a write-and-erase state for an extended period, using minimal disk and memory for the I/O simulation.
- Simulating high thread or thread-pool concurrency: adds counting locks before thread execution to create highly concurrent access to critical resources.
These functionalities can be integrated on demand and support various fault combinations.
4.2.3 WDA Optimization
The modifications to WDA are based on performance and compatibility considerations, focusing solely on minimizing usage. Ultimately, we retained only the following three interfaces, replacing all other private interfaces with native XCUITest interfaces:
- Foreground process handle related private interface: `(NSArray *)activeApplications;`
- Application initialization related private interface: `(id)initPrivateWithPath:(id)arg1 bundleID:(id)arg2;`
- Device event generation private interface: `(void)synthesizeEvent:(XCSynthesizedEventRecord *)arg1 completion:(void (^)(NSError *))arg2;`
This decoupled Fastbot-iOS is lighter, and for future iOS compatibility iterations, we only need to focus on these private interfaces.
Moreover, the iOS Monkey tools listed in Table 2, such as OCMonkey, generally call the XCUITest or WDA automation frameworks to parse GUI page information, and this approach raises a stability issue. Fastbot-iOS drives the tested app as a third-party process, and when XCUITest or WDA dumps the GUITree it recursively parses page elements; on complex pages such recursion consumes significant resources and raises the probability of connection interruptions or timeouts. Furthermore, running for over 10 hours on low-spec iPhones can significantly overheat the battery, and prolonged exposure risks battery swelling. Therefore, when a business has not introduced the Shoots plugin (the default case: we want the tool to work entirely without instrumentation, since introducing a plugin imposes modification costs on the app and is unsuitable for testing release packages; whether to integrate is itself a trade-off), we completely abandon conventional page parsing and instead use cross-platform image structural coding. In that mode only the XCUITest screenshot interface is needed, and because screenshots are taken at the system level, pages both inside and outside the app can naturally be parsed.
5. Fastbot’s Intelligent Image Processing Extends Cross-Platform Capabilities
5.1 Application of Image Algorithms in Testing
Intelligent image processing refers to a class of computer-based adaptive image processing and analysis techniques for various application scenarios. It is an independent theoretical and technical field, but it is also a crucial technology in machine vision.
The origins of machine vision can be traced back to the 1960s, when American scholar L.R. Roberts conducted image processing research on polyhedral blocks. In the 1970s, the Massachusetts Institute of Technology (MIT) introduced the “Machine Vision” course in its artificial intelligence lab. By the 1980s, a global wave of machine vision research began, resulting in several application systems based on machine vision. After the 1990s, with the rapid development of computer and semiconductor technologies, machine vision theory and applications advanced further.
Entering the 21st century, the speed of development in machine vision technology has accelerated, and it is now widely applied across various fields such as intelligent manufacturing, intelligent transportation, healthcare, and security monitoring. Currently, with the rise of artificial intelligence, machine vision technology is undergoing continuous breakthroughs, moving towards maturity.
According to our research, more and more companies and academic organizations in the testing field are introducing image processing and machine vision, with increasingly rich use cases and many excellent tools emerging. Table 3 lists several representative testing tools with image capabilities; discovering some of their highlights felt like finding a guiding light in the vast dark night.
| Tool | Time | Company (Organization) | Image Technology | Application Field |
|---|---|---|---|---|
| Sikuli | Open-sourced 2009 | MIT | On-screen image-based control recognition, using OpenCV template matching and SIFT feature matching | 1. UI automation; 2. Image matching |
| Applitools | July 2017 | Applitools | Visual testing with adaptive algorithms: humans set checkpoints on baseline images at each step, and image algorithms assert the diff at each checkpoint to find potential UI errors | 1. Functional testing; 2. Regression testing |
| AirTest | Released March 2018 | NetEase | Automation testing framework based on image recognition, originating from Sikuli | 1. Game UI recognition; 2. Cross-platform app UI recognition |
| Test.ai | August 2018 | Test.ai | Automatically identifies screens and elements in applications and drives them to execute test cases | 1. UI traversal testing; 2. Object detection |
| Appium 1.9 | August 2018 | Appium | Added image-based control recognition | 1. UI automation |
| AppiumPro | November 2018 | Cloud Grey | Uses Test.ai as a plugin, employing deep object detection for control recognition | 1. Object detection |
Table 3 Applications of Image Algorithms in Testing
5.2 Image UI Recognition
Under the premise of low energy consumption, low time cost, and high performance requirements for Fastbot, we prioritize the most basic image processing technologies to recognize GUI interface information, capable of constructing page information in milliseconds. Basic image processing includes:
- Basic segmentation:
  - Preprocessing: includes cropping, gray-histogram equalization, and binarization. Cropping mainly removes the vertical and horizontal scroll bars at the page edges, which can cause bad cases during row scanning, so the rightmost column is cropped. Gray-histogram equalization mainly handles images that are overall dark, such as night mode, enhancing the contrast between background and UI. Binarization sets pixels below a threshold to 0 and pixels above it to 1.
  - Row and column scanning: the binary image is scanned from top to bottom or left to right. If all pixel values in a line are 1 (light), it is considered a non-UI area; otherwise it is a UI area. Alternating row and column scans iteratively segments the image effectively, as shown in Figure 11 (a minimal code sketch of this procedure follows Figure 13).

Figure 11 Row and Column Scanning
- Text block aggregation: adjacent UI regions classified as text are aggregated into one entity; text lines are aggregated first, followed by columns, as illustrated in Figure 12.

Figure 12 Text Block Aggregation
- Night mode: if the number of segmented areas is too small, night mode is assumed; we first perform gray-histogram equalization and then adjust the binarization threshold before segmenting again, as shown in Figure 13.
Figure 13 Night Mode
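As referenced above, here is a minimal sketch of the binarize-then-scan idea using OpenCV: rows (or columns) whose pixels are all background are treated as separators between UI regions. The thresholds are assumptions; Fastbot's production implementation is in C++ and considerably more elaborate.

```python
import cv2
import numpy as np

def segment_rows(gray: np.ndarray, thresh: int = 200):
    """Split a grayscale screenshot into horizontal bands that contain UI content."""
    # Equalize and binarize: background pixels become 1 (light), UI pixels become 0.
    eq = cv2.equalizeHist(gray)
    binary = (eq > thresh).astype(np.uint8)
    is_background_row = binary.min(axis=1) == 1    # a row of all 1s has no UI
    bands, start = [], None
    for y, bg in enumerate(is_background_row):
        if not bg and start is None:
            start = y                              # entering a UI band
        elif bg and start is not None:
            bands.append((start, y))               # leaving a UI band
            start = None
    if start is not None:
        bands.append((start, len(is_background_row)))
    return bands                                   # list of (y_start, y_end) regions

# Usage: each band can then be scanned column-wise in the same way to get rectangles.
# img = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
# print(segment_rows(img))
```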
Simultaneously, when performance requirements are more lenient, we introduce deep machine learning-related technologies to improve the accuracy of page parsing:
- Classification: detected controls are classified into categories such as buttons, search boxes, images, text, short text, etc.
- OCR: optical character recognition, useful for retrieving custom events.
- Object detection: a YOLOv3 object detection model directly locates pre-labeled controls.

Figure 14 Object Detection
5.3 Image UI Anomaly Detection
In addition to recognizing UI interface information, we have also developed rich capabilities for detecting image UI anomalies. These capabilities include but are not limited to:
- Black/white screen: black-screen and white-screen anomalies, generally caused by image path errors, missing application permissions, network disconnections, etc., so that images fail to load and nothing is rendered on the interface (a minimal detection sketch follows Figure 15).
- Image overlap: multiple images overlapping each other, usually caused by performance lag during asynchronous rendering and loading.
- Purple block anomaly: common in game scenarios, usually caused by damaged or missing texture or model images.
- White block anomaly: usually occurs in game scenarios, caused by damaged or missing UI images.
- Black frame anomaly: a black area wider than a threshold around the image, usually caused by insufficient adaptation to device models and layouts.
- Overexposure anomaly: common in game scenarios, usually caused by errors in game-engine rendering.
- Control obstruction: as shown in Figure 15 (first from left), one control overlaps and completely blocks another, often caused by incorrect aspect ratios or text-size settings.
- Text overlap: as shown in Figure 15 (second from left), text from two text boxes overlaps, usually due to incorrect text-size settings. Text overlap differs from control obstruction in that two pieces of text mix together, whereas control obstruction means one control completely blocks another.
- Image loss: as shown in Figure 15 (third from left), incorrect image paths, missing permissions, or network disconnections cause image loading errors and incomplete display of images on the interface.
- Null value: as shown in Figure 15 (fourth from left), parameter-setting errors or database read errors lead to incorrect text being displayed on the interface.
- Screen distortion: as shown in Figure 15 (first from right), distorted frames in games or videos, usually due to hardware defects or errors when using GPU/CPU acceleration instructions.

Figure 15 UI Anomaly Detection Examples
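As a simple example of one of the detectors listed above, here is a minimal sketch of black/white-screen detection based on pixel statistics: if almost the entire screenshot is near-black or near-white, the frame is flagged. The thresholds are assumptions; the other anomaly types require more involved models.

```python
from typing import Optional

import cv2
import numpy as np

def detect_blank_screen(path: str,
                        dark_thresh: int = 20,
                        light_thresh: int = 235,
                        ratio: float = 0.99) -> Optional[str]:
    """Return 'black_screen' or 'white_screen' if the screenshot is almost uniform."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    total = gray.size
    if np.count_nonzero(gray <= dark_thresh) / total >= ratio:
        return "black_screen"
    if np.count_nonzero(gray >= light_thresh) / total >= ratio:
        return "white_screen"
    return None

# Usage:
# print(detect_blank_screen("screenshot.png"))
```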
6. Applications of Fastbot in Game Testing
In recent years, reinforcement learning has been able to learn to play games like Go, StarCraft, and Dota, even surpassing human professional players. These technological breakthroughs have not only revolutionized game AI design but also provided possibilities for intelligent game testing. In response to the real needs presented by game businesses, Fastbot has explored and attempted many innovations in the direction of game testing combined with current artificial intelligence technologies.
- Multilingual testing: as multiple games from Morning Star Light Year go global, multilingual testing manpower is insufficient. We use Fastbot to traverse the game UI while capturing game screens, then use OCR text recognition to compare the text recognized in each text area and find missed translations, translation errors, and text overflow.
Figure 16 RO: The World of Fantasy Game – Translation Error in Thai with English Present
- AI for automatic task completion: for scenes that require completing certain storyline tasks before specific areas become accessible, we developed a Fastbot-a3c Agent algorithm that combines game state graphs, behavior-tree rules, prior knowledge, and imitation learning to automatically complete in-game tasks, supporting long-duration stability testing, compatibility testing, and multilingual detection.
Animation 3 RO: The World of Fantasy Game – AI Automatically Completing Tasks
7. Summary
Currently, Fastbot has been widely applied in the stability and compatibility testing of ByteDance client products. The number of tasks initiated daily exceeds 10,000, and over 50,000 crashes are identified on average monthly. With the capabilities of Fastbot, we can fix most crashes before release, ensuring a better user experience for online users. Additionally, Fastbot plays a crucial foundational service role in the entire DevOps process.
We have also open-sourced:
- Fastbot-iOS: https://github.com/bytedance/Fastbot_iOS
- Fastbot-Android: https://github.com/bytedance/Fastbot_Android
We hope to engage in deep cooperation and communication with industry peers. We believe that the increasing deployment of intelligent testing tools will accelerate the transformation of quality engineering and promote the domestic quality engineering technology level to the forefront of the global quality engineering industry.
At the conclusion of this article, we sincerely thank the teams from Product Research Quality Engineering, Product Research iOS Client Platform Architecture, Data Visual Technology, Product Research Game AI, and Game Quality Efficiency for their strong support.
- Fastbot QQ Group: 1164712203
- Fastbot WeChat Group:

8. Join Us
ByteDance Quality Lab is an innovative team dedicated to theoretical research and technical pre-research in software engineering for the internet industry. Our mission is to become the world’s top intelligent tools team. We are committed to applying cutting-edge AI technology to the field of quality and engineering efficiency, providing intelligent testing tools for the industry, such as Fastbot, ByQI, SmartEye, SmartUnit, and more. On the road to becoming the world’s top intelligent tools team, we hope to bring more intelligent means to the quality field.
Here, you can create powerful testing robots using machine vision and reinforcement learning, verifying your algorithms on thousands of devices; you can also practice various textbook testing theories to help businesses improve testing efficiency, with combination testing, program analysis, precise testing, automatic generation of unit tests, and automatic defect repair all waiting for you to explore; you can also engage in exchanges and collaborations with top institutions at home and abroad, exploring more possibilities in the field of software engineering with scholars from around the world. We welcome all interested individuals to join us. Resume submission email: [email protected]; email subject: Name – Years of Experience – Quality Lab – Fastbot.
9. Related Materials
- Sapienz: Intelligent automated software testing at scale, https://engineering.fb.com/2018/05/02/developer-tools/sapienz-intelligent-automated-software-testing-at-scale/
- Dynodroid: an input generation system for Android apps, https://dl.acm.org/doi/10.1145/2491411.2491450
- EHBDroid: Beyond GUI testing for Android applications, https://ieeexplore.ieee.org/document/8115615
- Stoat: Guided, Stochastic Model-Based GUI Testing of Android Apps, https://tingsu.github.io/files/stoat.html
- APE: Practical GUI Testing of Android Applications via Model Abstraction and Refinement, https://helloqirun.github.io/papers/icse19_tianxiao.pdf
- TimeMachine: Time-travel Testing of Android Apps, https://www.comp.nus.edu.sg/~dongz/res/time-travel-testing-21-01-2020.pdf
- Q-Testing: Reinforcement Learning Based Curiosity-Driven Testing of Android Applications, https://minxuepan.github.io/Pubs/Q-testing.pdf
- ComboDroid: generating high-quality test inputs for Android apps via use case combinations, https://dl.acm.org/doi/10.1145/3377811.3380382
- Google Monkey, https://developer.android.com/studio/test/monkey
- Droidbot: a lightweight UI-Guided test input generator for Android, https://ieeexplore.ieee.org/document/7965248
- Humanoid: A Deep Learning-Based Approach to Automated Black-box Android App Testing, https://ieeexplore.ieee.org/document/8952324
- Wuji: Automatic Online Combat Game Testing Using Evolutionary Deep Reinforcement Learning, https://yanzzzzz.github.io/files/PID6139619.pdf
- The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI, https://arxiv.org/abs/1702.05663
- Automated Video Game Testing Using Synthetic and Human-Like Agents, https://ieeexplore.ieee.org/document/8869824
- Counter-Strike Deathmatch with Large-Scale Behavioural Cloning, https://arxiv.org/pdf/2104.04258.pdf

Developed based on the following tools:

- zalando/SwiftMonkey
- b1ueshad0w/OCMonkey
- zhangzhao4444/Fastmonkey
- facebook/WebDriverAgent
- AirtestProject/Airtest
- tianxiaogu/ape
- zhangzhao4444/Maxim
- tingsu/Stoat
- skull591/ComboDroid-Artifact
- yzygitzh/Humanoid
- anlalalu/Q-testing