APM Overview
APM stands for Application Performance Management & Monitoring.
Performance issues are one of the main reasons for app user churn. If users encounter issues such as page lag, slow response times, severe heating, and high data and battery consumption while using our app, they are likely to uninstall it. This poses a significant challenge we face in our current work, especially on lower-end devices.
Commercial APM platforms include renowned services like NewRelic, as well as domestic platforms like Tingyun, OneAPM, Alibaba’s Baichuan – Mali APM SDK, and Baidu’s paid APM products.
How APM Works:
- First, data is collected on the client side (Android, iOS, Web, etc.).
- Then, the collected data is organized and reported to the server (covering formats such as JSON or XML, upload strategies, etc.).
- After receiving the data, the server models, stores, and analyzes it, then visualizes it (e.g., with Spark + Flink) for users to access.
The tasks required on the mobile end include:
- A unified core shared by both platforms (technology selection: NDK / C++)
- Data collection (which metrics to collect, how to refine them, etc.)
- Data storage (write to a file? mmap? file I/O streams? a small mmap sketch follows below)
- Data reporting (reporting strategy and reporting method)
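For the storage bullet, here is a minimal Java sketch of the mmap option (an illustration under assumed names, not any framework's actual implementation): records are appended into a memory-mapped file, so no write() syscall is paid per record and the data survives most process crashes.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Append collected metrics into a memory-mapped file; the kernel flushes the pages
// to disk, so records survive most crashes without a write() syscall per record.
public final class MmapRecorder {
    private final MappedByteBuffer buffer;

    public MmapRecorder(String path, int capacityBytes) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.setLength(capacityBytes);
            buffer = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, capacityBytes);
        }
    }

    public void append(String record) {
        byte[] bytes = (record + '\n').getBytes(StandardCharsets.UTF_8);
        if (buffer.remaining() >= bytes.length) {
            buffer.put(bytes); // a real implementation would also persist the write position
        }
    }
}
```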
So how should we proceed? We should learn to study open-source solutions: let's first see how the major companies handle it, and then build our own APM collection framework.
Current Core Open Source APM Framework Products
- Tencent: https://github.com/Tencent/matrix
- 360: https://github.com/Qihoo360/ArgusAPM
- Didi: https://github.com/didi/booster
In these projects you will find custom Gradle plugins, ASM, hooks into the packaging process, the Android packaging pipeline itself, and so on. Consider why the overall workflows are so similar while the implementation details differ, for example in how page frame rate, data traffic, power consumption, or GC logs are collected.
Introduction to ArgusAPM Performance Monitoring Platform & SDK Open Source – Bu Yuntao.pdf
Let’s take a quick look at how to monitor IO disk performance using Java Hook and Native Hook in Matrix.
The hook point for the Java Hook is the system class CloseGuard, and the hook is installed via a dynamic proxy.
https://github.com/Tencent/matrix/blob/b83c481938b21c0080540d0c2babb04caa5e72c9/matrix/matrix-android/matrix-io-canary/src/main/java/com/tencent/matrix/iocanary/detect/CloseGuardHooker.java#L74
private boolean tryHook() {
    try {
        Class<?> closeGuardCls = Class.forName("dalvik.system.CloseGuard");
        Class<?> closeGuardReporterCls = Class.forName("dalvik.system.CloseGuard$Reporter");
        Method methodGetReporter = closeGuardCls.getDeclaredMethod("getReporter");
        Method methodSetReporter = closeGuardCls.getDeclaredMethod("setReporter", closeGuardReporterCls);
        Method methodSetEnabled = closeGuardCls.getDeclaredMethod("setEnabled", boolean.class);
        sOriginalReporter = methodGetReporter.invoke(null);
        methodSetEnabled.invoke(null, true);
        // open matrix close guard also
        MatrixCloseGuard.setEnabled(true);
        ClassLoader classLoader = closeGuardReporterCls.getClassLoader();
        if (classLoader == null) {
            return false;
        }
        methodSetReporter.invoke(null, Proxy.newProxyInstance(classLoader,
                new Class<?>[]{closeGuardReporterCls},
                new IOCloseLeakDetector(issueListener, sOriginalReporter)));
        return true;
    } catch (Throwable e) {
        MatrixLog.e(TAG, "tryHook exp=%s", e);
    }
    return false;
}
What is CloseGuard used for, and why does Tencent's team hook it? We will discuss this in detail in later sections; the best way to answer the question is to read the source code. (In short: it is a built-in system instrumentation point for detecting system resources that were not closed properly, i.e. resource leaks.)
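For background, CloseGuard is the mechanism behind StrictMode's leaked-closable detection; the public API to enable that detection in an app looks roughly like this:

```java
import android.os.StrictMode;

// detectLeakedClosableObjects() turns CloseGuard on under the hood, so Closeables
// that get finalized without being closed are logged as VM policy violations.
StrictMode.setVmPolicy(new StrictMode.VmPolicy.Builder()
        .detectLeakedClosableObjects()
        .penaltyLog()
        .build());
```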
About Native Hook:
The Native Hook uses PLT (GOT) hooking to intercept the IO-related functions open, read, write, and close in the system shared libraries. After proxying these system calls, Matrix layers its own logic on top of them to detect the different kinds of IO issues.
https://github.com/Tencent/matrix/blob/b83c481938b21c0080540d0c2babb04caa5e72c9/matrix/matrix-android/matrix-io-canary/src/main/cpp/io_canary_jni.cc#L290
JNIEXPORT jboolean JNICALL
Java_com_tencent_matrix_iocanary_core_IOCanaryJniBridge_doHook(JNIEnv *env, jclass type) {
__android_log_print(ANDROID_LOG_INFO, kTag, "doHook");
for (int i = 0; i < TARGET_MODULE_COUNT; ++i) {
const char* so_name = TARGET_MODULES[i];
__android_log_print(ANDROID_LOG_INFO, kTag, "try to hook function in %s.", so_name);
// Open so file and map it into memory in ELF format
loaded_soinfo* soinfo = elfhook_open(so_name);
if (!soinfo) {
__android_log_print(ANDROID_LOG_WARN, kTag, "Failure to open %s, try next.", so_name);
continue;
}
// Replace open function
elfhook_replace(soinfo, "open", (void*)ProxyOpen, (void**)&original_open);
elfhook_replace(soinfo, "open64", (void*)ProxyOpen64, (void**)&original_open64);
bool is_libjavacore = (strstr(so_name, "libjavacore.so") != nullptr);
if (is_libjavacore) {
if (!elfhook_replace(soinfo, "read", (void*)ProxyRead, (void**)&original_read)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook read failed, try __read_chk");
// http://refspecs.linux-foundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/libc---read-chk-1.html Similar to read()
if (!elfhook_replace(soinfo, "__read_chk", (void*)ProxyRead, (void**)&original_read)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook failed: __read_chk");
elfhook_close(soinfo);
return false;
}
}
if (!elfhook_replace(soinfo, "write", (void*)ProxyWrite, (void**)&original_write)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook write failed, try __write_chk");
if (!elfhook_replace(soinfo, "__write_chk", (void*)ProxyWrite, (void**)&original_write)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook failed: __write_chk");
elfhook_close(soinfo);
return false;
}
}
}
// Hook OS
elfhook_replace(soinfo, "close", (void*)ProxyClose, (void**)&original_close);
elfhook_close(soinfo);
}
return true;
}
The core code for hook replacement (essentially pointer replacement).
Looking at the proxy functions (ProxyOpen, ProxyRead, ProxyWrite, ProxyClose), it's clear that Tencent's team did not take worker threads into account (the proxies effectively only track main-thread I/O), which leaves room for optimization. For the other parts, please refer to the source code.
About Transform API:
When compiling an Android project, if we want to get hold of the class files produced during compilation and process them before they are turned into dex, we can write a Transform that receives these inputs (the compiled class files) and adds to or modifies them.
How to use it?
https://github.com/Tencent/matrix/blob/master/matrix/matrix-android/matrix-gradle-plugin/src/main/java/com/tencent/matrix/trace/transform/MatrixTraceTransform.java
- Write a custom Transform.
- Register it through a Plugin, or register it directly in the Gradle build file.
// MyCustomPlugin.groovy
public class MyCustomPlugin implements Plugin<Project> {
    @Override
    public void apply(Project project) {
        project.getExtensions().findByType(BaseExtension.class)
                .registerTransform(new MyCustomTransform());
    }
}
project.extensions.findByType(BaseExtension.class).registerTransform(new MyCustomTransform()); // Directly write in build.gradle
MatrixTraceTransform utilizes compile-time bytecode instrumentation technology to optimize the detection methods for FPS, lag, and startup on mobile devices. During the packaging process, we hook the Dex generation Task to add method instrumentation logic. Our hook point is after Proguard, where the Class has already been obfuscated, so we need to consider class obfuscation issues.
MatrixTraceTransform mainly revolves around its transform method:
@Override
public void transform(TransformInvocation transformInvocation) throws TransformException, InterruptedException, IOException {
long start = System.currentTimeMillis();
// Is it incremental compilation?
final boolean isIncremental = transformInvocation.isIncremental() && this.isIncremental();
// The result of the transform, redirect the output to this directory
final File rootOutput = new File(project.matrix.output, "classes/${getName()}/");
if (!rootOutput.exists()) {
rootOutput.mkdirs();
}
final TraceBuildConfig traceConfig = initConfig();
Log.i("Matrix." + getName(), "[transform] isIncremental:%s rootOutput:%s", isIncremental, rootOutput.getAbsolutePath());
// Get the Class obfuscation mapping information and store it in mappingCollector
final MappingCollector mappingCollector = new MappingCollector();
File mappingFile = new File(traceConfig.getMappingPath());
if (mappingFile.exists() && mappingFile.isFile()) {
MappingReader mappingReader = new MappingReader(mappingFile);
mappingReader.read(mappingCollector);
}
Map<File, File> jarInputMap = new HashMap<>();
Map<File, File> scrInputMap = new HashMap<>();
transformInvocation.inputs.each { TransformInput input ->
input.directoryInputs.each { DirectoryInput dirInput ->
// Collect and redirect class in the directory
collectAndIdentifyDir(scrInputMap, dirInput, rootOutput, isIncremental);
}
input.jarInputs.each { JarInput jarInput ->
if (jarInput.getStatus() != Status.REMOVED) {
// Collect and redirect class in the jar package
collectAndIdentifyJar(jarInputMap, scrInputMap, jarInput, rootOutput, isIncremental);
}
}
}
// Collect the method information that needs instrumentation, encapsulating each instrumentation information as a TraceMethod object
MethodCollector methodCollector = new MethodCollector(traceConfig, mappingCollector);
HashMap<String, TraceMethod> collectedMethodMap = methodCollector.collect(scrInputMap.keySet().toList(), jarInputMap.keySet().toList());
// Execute instrumentation logic, adding MethodBeat's i/o logic at the entrance and exit of the methods that need instrumentation
MethodTracer methodTracer = new MethodTracer(traceConfig, collectedMethodMap, methodCollector.getCollectedClassExtendMap());
methodTracer.trace(scrInputMap, jarInputMap);
// Execute the original transform logic; the default transformClassesWithDexBuilderForDebug task will convert Class to Dex
origTransform.transform(transformInvocation);
Log.i("Matrix." + getName(), "[transform] cost time: %dms", System.currentTimeMillis() - start);
}
Having come this far, we should summarize what the core technology of APM actually is.
One of the classic big-company interview questions: what is the core technology of APM? Have you ever built your own APM?
The core principle of APM can be summarized in one sentence: find the right hook points in the system and collect data there.
If you grasp this principle, you can also implement codeless ("trackless") event tracking on Android; the essence is the same, only the hook points differ.
APM Monitoring Dimensions and Metrics
The app's basic performance indicators fall into 8 categories: network performance, crashes, startup loading, memory, images, page rendering, IM and VoIP (business-specific metrics, depending on your app), and user behavior monitoring. The basic dimensions include app, system platform, app version, and time.
Network Performance
Monitor the success rate of network requests, average latency, request volume, and upload/download rates, as well as the full request chain and per-stage timings. Think about how to integrate this with OkHttp.
Network Monitoring Business Background
When it is unclear which URLs are slow, building a full-link network monitoring system is worth serious thought. Here we use AspectJ together with OkHttp's own EventListener to collect and report network information.
Steps to Implement Network Monitoring
Step 1: Convert the data sources we need to monitor into Bundle objects for easier transmission
public interface BundleMapping {
/**
* Convert data structure to Bundle
*/
Bundle asBundle();
}
Step 2: Determine the fields we need to monitor
| Field | Meaning | Remarks |
|---|---|---|
| total | Total request time | Call end time minus call start time |
| pathname | Request URL | / |
| dns | DNS lookup time | DNS end time minus DNS start time |
| protocol | Request protocol | / |
| tcp | TCP connection time | Connection end time minus connection start time |
| no_dns_tcp_tls | Whether DNS, TCP, and TLS were all skipped (connection reuse) | / |
| tls | TLS handshake time | Secure-connect end time minus secure-connect start time |
| no_response | Whether there is no response body (304 or an empty body) | / |
| ttfb | Time to first byte | Response end time minus call start time |
| download | Download time | Response end time minus response start time |
| pure_network | Pure network time | Response end time minus call start time |
| transfer_size | Transfer size (bytes) | / |
| failed | Request failure message | / |
public class NetWorkData implements BundleMapping {
// Request address
String url;
// Request protocol
String protocol;
// Request start time
long callStartTime;
// Response start time
long responseStartTime;
// Response end time
long responseEndTime;
// DNS query start time
long dnsStartTime;
// DNS query end time
long dnsEndTime;
// Connection start time
long connectStartTime;
// Connection end time
long connectEndTime;
// Whether to reuse DNS, TCP, TLS simultaneously
boolean isDnsTcpTls;
// SSL connection start time
long secureConnectStartTime;
// SSL connection end time
long secureConnectEndTime;
// Whether there is no response content (304 or returned body is empty)
boolean isNoResponse;
// Call end time
long callEndTime;
// Request failure information
String failMessage;
// Transfer size
long byteCount;
@Override
public Bundle asBundle() {
Bundle bundle = new Bundle();
// Total network time
bundle.putLong("total", callEndTime - callStartTime);
bundle.putString("pathname", url);
// DNS
bundle.putLong("dns", dnsEndTime - dnsStartTime);
bundle.putString("protocol", protocol);
// TCP
bundle.putLong("tcp", connectEndTime - connectStartTime);
bundle.putBoolean("no_dns_tcp_tls", isDnsTcpTls);
// TLS
bundle.putLong("tls", secureConnectEndTime - secureConnectStartTime);
bundle.putBoolean("no_response", isNoResponse);
// Time to first byte
bundle.putLong("ttfb", responseEndTime - callStartTime);
// Download
bundle.putLong("download", responseEndTime - responseStartTime);
// Pure network
bundle.putLong("pure_network", responseEndTime - callStartTime);
// Transfer size
bundle.putLong("transfer_size", byteCount);
// Connection failure
bundle.putString("failed", failMessage);
return bundle;
}
}
Step 3: Add tags for reporting based on different domains
/**
* Get reporting Tag
*/
String getDataTag() {
String tag;
if (url == null) {
tag = "biz";
} else {
if (url.contains("microkibaco_report")) {
tag = "data";
} else if (url.contains("blog")) {
tag = "blog";
} else {
tag = "github";
}
}
return "network_api_" + tag;
}
Step 5: Add AspectJ plugin
classpath 'com.hujiang.aspectjx:gradle-android-plugin-aspectjx:2.0.6'
implementation 'org.aspectj:aspectjrt:1.8.9'
implementation "com.squareup.okhttp3:okhttp:3.12.1"
Step 6: Add event listener to OkHttp
@Keep
public class MkOkHttpEventListener extends EventListener {
private NetWorkData mNetWorkData;
public static final Factory FACTORY = new Factory() {
@Override
public EventListener create(Call call) {
return new MkOkHttpEventListener();
}
};
/**
* Each request will build
*/
private MkOkHttpEventListener() {
mNetWorkData = new NetWorkData();
}
@Override
public void callStart(Call call) {
mNetWorkData.url = call.request().url().toString();
mNetWorkData.callStartTime = SystemClock.elapsedRealtime();
}
@Override
public void connectStart(Call call, InetSocketAddress inetSocketAddress, Proxy proxy) {
mNetWorkData.connectStartTime = SystemClock.elapsedRealtime();
}
@Override
public void connectEnd(Call call, InetSocketAddress inetSocketAddress, Proxy proxy,
Protocol protocol) {
mNetWorkData.connectEndTime = SystemClock.elapsedRealtime();
}
@Override
public void connectFailed(Call call, InetSocketAddress inetSocketAddress, Proxy proxy,
Protocol protocol, IOException ioe) {
}
@Override
public void dnsStart(Call call, String domainName) {
mNetWorkData.dnsStartTime = SystemClock.elapsedRealtime();
}
@Override
public void dnsEnd(Call call, String domainName, List<InetAddress> inetAddressList) {
mNetWorkData.dnsEndTime = SystemClock.elapsedRealtime();
}
@Override
public void secureConnectStart(Call call) {
mNetWorkData.secureConnectStartTime = SystemClock.elapsedRealtime();
}
@Override
public void secureConnectEnd(Call call, Handshake handshake) {
mNetWorkData.secureConnectEndTime = SystemClock.elapsedRealtime();
}
@Override
public void responseHeadersStart(Call call) {
mNetWorkData.responseStartTime = SystemClock.elapsedRealtime();
}
@Override
public void responseHeadersEnd(Call call, Response response) {
if (response.code() == 304) {
mNetWorkData.isNoResponse = true;
}
mNetWorkData.protocol = response.protocol().toString();
mNetWorkData.responseEndTime = SystemClock.elapsedRealtime();
}
@Override
public void responseBodyEnd(Call call, long byteCount) {
// Response is empty
if (byteCount == 0) {
mNetWorkData.isNoResponse = true;
}
mNetWorkData.byteCount = byteCount;
}
@Override
public void callEnd(Call call) {
mNetWorkData.callEndTime = SystemClock.elapsedRealtime();
// Report
APM.getReportStrategy().report(mNetWorkData.getDataTag(), mNetWorkData.asBundle());
}
@Override
public void callFailed(Call call, IOException ex) {
mNetWorkData.failMessage = ex.getMessage();
// Report
APM.getReportStrategy().report(mNetWorkData.getDataTag(), mNetWorkData.asBundle());
}
}
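To wire the listener in, the factory just needs to be attached when the OkHttpClient is built (a minimal usage sketch; the client construction here is assumed rather than taken from the original project):

```java
import okhttp3.OkHttpClient;

// Every Call created by this client is now measured by its own
// MkOkHttpEventListener instance and reported in callEnd()/callFailed().
OkHttpClient client = new OkHttpClient.Builder()
        .eventListenerFactory(MkOkHttpEventListener.FACTORY)
        .build();
```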
Reporting through Firebase is subject to timeliness constraints, so we aggregate all logs into JSON and submit them in a single batch.
Crashes
Crash data collection and analysis are similar to the functionality of the Bugly platform.
Startup Loading
We have made significant efforts to optimize the app's startup, including multithreading and a Spark-style directed acyclic graph (DAG) to schedule startup tasks according to their business dependencies. We monitor the app's cold start duration, the first launch duration after installation, and the startup loading duration of each Android bundle (Atlas framework).
Memory
The four monitoring targets are: memory peak, average memory, memory jitter, and memory leaks.
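As a rough illustration of how peak and average memory can be derived on the client (a sketch under assumed names and a made-up sampling interval, not Matrix code):

```java
import android.os.Handler;
import android.os.HandlerThread;

// Periodically samples Java-heap usage so that peak, average, and jitter
// can be computed from the collected samples.
public class MemorySampler {
    private static final long INTERVAL_MS = 5_000L; // made-up interval
    private final HandlerThread thread = new HandlerThread("memory-sampler");
    private long peakBytes;
    private long totalBytes;
    private long sampleCount;

    public void start() {
        thread.start();
        final Handler handler = new Handler(thread.getLooper());
        handler.post(new Runnable() {
            @Override public void run() {
                Runtime rt = Runtime.getRuntime();
                long used = rt.totalMemory() - rt.freeMemory(); // current Java heap usage
                peakBytes = Math.max(peakBytes, used);
                totalBytes += used;
                sampleCount++;
                handler.postDelayed(this, INTERVAL_MS);
            }
        });
    }

    public long peakBytes() { return peakBytes; }
    public long averageBytes() { return sampleCount == 0 ? 0 : totalBytes / sampleCount; }
}
```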
IM and VoIP Business Metrics
These two belong to the monitoring of business technical indicators, such as the monitoring of message arrival rates for various IM messages and the success rates, average time, and request volumes for VoIP calls. This requires sorting according to the business of your own app.
User Behavior Monitoring
This is used to analyze user behavior in the app: essentially all events are monitored and sent to the server. This used to be done with manually placed tracking points ("buried points"); it is now standardized as part of APM's responsibilities as well, covering things like the user's navigation path, similar to the concepts of PV and UV in the PC era.
Images
Monitoring of resource usage, such as detecting redundant Bitmaps, using the haha library to parse the heap dump and index the relevant values.
Page Rendering
Monitoring of UI smoothness: FPS, slow functions, jank/stuttering, and the various issues that slow down page rendering.
WeChat Matrix Framework Analysis
Overall Analysis
Matrix.Builder builder = new Matrix.Builder(application); // build matrix
builder.patchListener(new TestPluginListener(this)); // add general pluginListener
DynamicConfigImplDemo dynamicConfig = new DynamicConfigImplDemo(); // dynamic config
// init plugin
IOCanaryPlugin ioCanaryPlugin = new IOCanaryPlugin(new IOConfig.Builder()
.dynamicConfig(dynamicConfig)
.build());
//add to matrix
builder.plugin(ioCanaryPlugin);
//init matrix
Matrix.init(builder.build());
// start plugin
ioCanaryPlugin.start();
The organized structure is as follows:
Core functions:
- Resource Canary
  - Activity leaks
  - Bitmap redundancy
- Trace Canary
  - Interface smoothness
  - Startup time
  - Page switching time
  - Slow functions
  - Stuttering
- SQLite Lint: automated detection of SQLite statement quality based on official best practices
- IO Canary: detects file IO issues
  - File IO monitoring
  - Closeable leak monitoring
Overall architecture analysis:
matrix-android-lib
- Plugin is the abstraction of a single monitoring capability.
- Issue represents a monitored event.
- Report is an implementation of the observer pattern, used to notify observers when an issue is discovered.
- Matrix implements the singleton pattern and exposes the external interfaces.
- matrix-config.xml contains the relevant monitoring configuration items.
- IDynamicConfig lets users define their own monitoring configuration items; its instance is held by the individual plugins.
Core plugin interface
- IPlugin is the abstraction of a single monitoring capability.
- PluginListener is open to users: it observes a monitoring module's lifecycle (initialization, start, stop) and issue discovery. Users can implement their own listener and inject it into Matrix.
- Plugin:
  1. The default implementation of IPlugin, which drives the lifecycle callbacks of the associated PluginListener.
  2. Implements IssuePublisher.OnIssueDetectListener's onDetectIssue method:
     - It is called when a concrete monitoring event occurs, e.g. Matrix.with().getPluginByClass(xxx).onDetectIssue(issue); note that this is the first notification point.
     - Internally it fills in the Issue's environment fields, such as which Plugin triggered the Issue.
     - It ultimately triggers PluginListener#onReportIssue.
public interface IPlugin {
/**
* Used to identify the current monitoring, equivalent to a name index (can also be indexed directly by classname)
*/
String getTag();
/**
* Called during the construction of the Matrix object
*/
void init(Application application, PluginListener pluginListener);
/**
* Ability to perceive activity foreground and background transitions
*/
void onForeground(boolean isForeground);
void start();
void stop();
void destroy();
}
public interface PluginListener {
void onInit(Plugin plugin);
void onStart(Plugin plugin);
void onStop(Plugin plugin);
void onDestroy(Plugin plugin);
void onReportIssue(Issue issue);
}
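A minimal user-side listener might simply log everything and forward each Issue to the app's reporting channel (a sketch; the package names below are taken from the Matrix repository and may differ between versions):

```java
import android.util.Log;
import com.tencent.matrix.plugin.Plugin;
import com.tencent.matrix.plugin.PluginListener;
import com.tencent.matrix.report.Issue;

// Logs lifecycle events and forwards every Issue to whatever reporting
// channel the app uses (here: just logcat).
public class LoggingPluginListener implements PluginListener {
    private static final String TAG = "Matrix.LoggingPluginListener";

    @Override public void onInit(Plugin plugin) { Log.d(TAG, "onInit: " + plugin.getTag()); }
    @Override public void onStart(Plugin plugin) { Log.d(TAG, "onStart: " + plugin.getTag()); }
    @Override public void onStop(Plugin plugin) { Log.d(TAG, "onStop: " + plugin.getTag()); }
    @Override public void onDestroy(Plugin plugin) { Log.d(TAG, "onDestroy: " + plugin.getTag()); }

    @Override
    public void onReportIssue(Issue issue) {
        // This is where an upload to an APM backend would happen.
        Log.i(TAG, "issue reported: " + issue);
    }
}
```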
Matrix External Interface
public class Matrix {
private static final String TAG = "Matrix.Matrix";
/********************************** Singleton Implementation **********************/
private static volatile Matrix sInstance;
public static Matrix init(Matrix matrix) {
if (matrix == null) {
throw new RuntimeException("Matrix init, Matrix should not be null.");
}
synchronized (Matrix.class) {
if (sInstance == null) {
sInstance = matrix;
} else {
MatrixLog.e(TAG, "Matrix instance is already set. this invoking will be ignored");
}
}
return sInstance;
}
public static boolean isInstalled() {
return sInstance != null;
}
public static Matrix with() {
if (sInstance == null) {
throw new RuntimeException("you must init Matrix sdk first");
}
return sInstance;
}
/**************************** Constructor **********************/
private final Application application;
private final HashSet<Plugin> plugins;
private final PluginListener pluginListener;
private Matrix(Application app, PluginListener listener, HashSet<Plugin> plugins) {
this.application = app;
this.pluginListener = listener;
this.plugins = plugins;
for (Plugin plugin : plugins) {
plugin.init(application, pluginListener);
pluginListener.onInit(plugin);
}
}
/**************************** Control Ability **********************/
public void startAllPlugins() {
for (Plugin plugin : plugins) {
plugin.start();
}
}
public void stopAllPlugins() {
for (Plugin plugin : plugins) {
plugin.stop();
}
}
public void destroyAllPlugins() {
for (Plugin plugin : plugins) {
plugin.destroy();
}
}
/**************************** Get | Set **********************/
public Plugin getPluginByTag(String tag) {
for (Plugin plugin : plugins) {
if (plugin.getTag().equals(tag)) {
return plugin;
}
}
return null;
}
public <T extends Plugin> T getPluginByClass(Class<T> pluginClass) {
String className = pluginClass.getName();
for (Plugin plugin : plugins) {
if (plugin.getClass().getName().equals(className)) {
return (T) plugin;
}
}
return null;
}
/**************************** Other **********************/
public static void setLogIml(MatrixLog.MatrixLogImp imp) {
MatrixLog.setMatrixLogImp(imp);
}
}
Utils Auxiliary Functions
- MatrixLog provides logging capabilities.
- MatrixHandlerThread is a shared worker thread, often used to execute tasks asynchronously.
- DeviceUtil retrieves relevant device information.
- MatrixUtil determines whether code runs on the main thread, among other utilities.
IssuePublisher Monitored Event Observer
- Issue represents a monitored event. Its main fields are:
  - type: distinguishes different kinds of reports that share the same tag
  - tag: the tag this report belongs to
  - stack: the stack trace associated with this report
  - process: the name of the process this report came from
  - time: the time at which the issue occurred
- IssuePublisher follows the observer pattern:
  - It holds a publishing listener (its implementation is usually the Plugin described above).
  - It holds a map of already-published keys, to avoid publishing the same event twice within a single run.
  - A concrete monitoring detector usually extends this class; when it detects an event it calls publishIssue(Issue) -> IssuePublisher.OnIssueDetectListener's onDetectIssue -> which ultimately triggers PluginListener#onReportIssue.
Core Source Code Analysis of Matrix’s IO Monitoring Module
IO Canary: The core function is to detect file IO issues, including: file IO monitoring and Closeable Leak monitoring. To understand IO monitoring and read the open-source code, the most important foundation is to master Native and Java-level Hook.
The Java-level hook is mainly based on reflection, which everyone is familiar with. As for native-level hooks, the three mainstream techniques on Android are PLT (Procedure Linkage Table) / GOT hooking, inline hooking, and ptrace.
Matrix uses PLT technology to implement SO file API Hook.
ELF: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
https://refspecs.linuxbase.org/elf/elf.pdf
Three forms of ELF files:
- Relocatable object files: the familiar .o files; .a static libraries also count, since they are collections of .o files.
- Executable object files: executable files under Linux.
- Shared object files: dynamic link libraries, i.e. .so files.
Recall that a C program must be compiled and linked before it can run. Correspondingly, from the perspective of how they participate in linking and execution, ELF files are described by two views: the linking view and the execution view.
The ELF header is located at the very beginning of the file, describing the organization of the entire file. The Program Header Table tells the system how to create the process image; it must exist when executing programs but is not required in relocatable files. Each program header describes a segment, including the segment’s size and addresses in the file and memory. The segments in the execution view are usually composed of many sections, such as text segments and data segments.
About the role and concept of relocation
Relocation is the process of linking symbol references to symbol definitions, which is also one of the main tasks of the Android linker. When a function is called in a program, the related call instruction must transfer control flow to the correct target address at execution time. Therefore, the so file must contain some relocation-related information, which the linker uses to complete the relocation work.
https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-54839/index.html
https://android.googlesource.com/platform/bionic/+/master/linker/linker.cpp
The structure of the symbol table entry is elf32_sym:
typedef struct elf32_sym {
Elf32_Word st_name; /* Name - index into string table */
Elf32_Addr st_value; /* Offset address */
Elf32_Word st_size; /* Symbol length (for example, function length) */
unsigned char st_info; /* Type and binding type */
unsigned char st_other; /* Undefined */
Elf32_Half st_shndx; /* Section header index, indicating which section it is located in */
} Elf32_Sym;
Core relocation code:
http://androidxref.com/8.0.0_r4/xref/bionic/linker/linker.cpp#2513 (For specific relocation type definitions and calculation methods, refer to section 4.6.1.2 of the ELF specification document)
Basic principles of Android PLT Hook
When a dynamically linked ELF file is executed, Linux uses a strategy called lazy binding as a performance optimization. The first time a dynamically linked function is called, the call goes through its PLT entry, which jumps to the corresponding GOT slot; at that point the GOT slot still points back into the PLT stub, which ends up invoking _dl_runtime_resolve() to resolve the real address, patch the GOT, and finally execute the target function. PLT Hook therefore rewrites the GOT entry directly, so that calls to that shared-library function jump to the user-defined hook code instead.
IO monitoring process:
JNIEXPORT jboolean JNICALL
Java_com_tencent_matrix_iocanary_core_IOCanaryJniBridge_doHook(JNIEnv *env, jclass type) {
__android_log_print(ANDROID_LOG_INFO, kTag, "doHook");
for (int i = 0; i < TARGET_MODULE_COUNT; ++i) {
const char* so_name = TARGET_MODULES[i];
__android_log_print(ANDROID_LOG_INFO, kTag, "try to hook function in %s.", so_name);
loaded_soinfo* soinfo = elfhook_open(so_name);
if (!soinfo) {
__android_log_print(ANDROID_LOG_WARN, kTag, "Failure to open %s, try next.", so_name);
continue;
}
// Hook OS
elfhook_replace(soinfo, "open", (void*)ProxyOpen, (void**)&original_open);
elfhook_replace(soinfo, "open64", (void*)ProxyOpen64, (void**)&original_open64);
bool is_libjavacore = (strstr(so_name, "libjavacore.so") != nullptr);
if (is_libjavacore) {
if (!elfhook_replace(soinfo, "read", (void*)ProxyRead, (void**)&original_read)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook read failed, try __read_chk");
if (!elfhook_replace(soinfo, "__read_chk", (void*)ProxyRead, (void**)&original_read)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook failed: __read_chk");
elfhook_close(soinfo);
return false;
}
}
// Hook OS
if (!elfhook_replace(soinfo, "write", (void*)ProxyWrite, (void**)&original_write)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook write failed, try __write_chk");
if (!elfhook_replace(soinfo, "__write_chk", (void*)ProxyWrite, (void**)&original_write)) {
__android_log_print(ANDROID_LOG_WARN, kTag, "doHook hook failed: __write_chk");
elfhook_close(soinfo);
return false;
}
}
}
// Hook OS
elfhook_replace(soinfo, "close", (void*)ProxyClose, (void**)&original_close);
elfhook_close(soinfo);
}
return true;
}
The core code for hook replacement (essentially pointer replacement).
Looking at the proxy functions (ProxyOpen, ProxyRead, ProxyWrite, ProxyClose), it is again clear that Tencent's team did not take worker threads into account (the proxies effectively only track main-thread I/O); this could be optimized. For the remaining details, please refer to the source code.
Core Source Code Analysis of Matrix’s Memory Leak Monitoring Module
https://github.com/Tencent/matrix/wiki/Matrix-Android-ResourceCanary
Design Goals:
- Automatically and accurately detect Activity leaks, triggering a Hprof dump only after a leak has been found, instead of blindly dumping based on preset memory-usage thresholds.
- Automatically obtain the reference chain of the leaked Activity and of redundant Bitmap objects.
- Keep the Hprof analysis logic flexible and extensible, and allow the Hprof file to be extracted for manual analysis when necessary.
To support both on-device monitoring and server-side analysis, Matrix's ResourceCanary splits the monitoring step and the analysis step into two independent tools, which also brings two benefits:
- The Hprof file is kept on the server, leaving the option of manual analysis.
- If the Hprof-dump trigger is skipped, the monitoring step can even be enabled in production to catch Activity leaks that are hard to reproduce during testing.
The client side is responsible for detecting the leaks and trimming the Hprof file (the flowchart in the Matrix wiki illustrates the process).
ResourcePlugin
ResourcePlugin is the entry point of this module; it registers the Android lifecycle listeners and configures parameters and callback interfaces.
ActivityRefWatcher is responsible for tasks such as popping up the Dump memory dialog, dumping memory data, reading memory data to trim Hprof files, generating information about the leaked Activity (process number, Activity name, time, etc.), and notifying the main thread to complete the backup of memory information and close the dialog.
Let’s take a look at the core memory leak monitoring code:
//ActivityRefWatcher
private final Application.ActivityLifecycleCallbacks mRemovedActivityMonitor = new ActivityLifeCycleCallbacksAdapter() {
private int mAppStatusCounter = 0;
private int mUIConfigChangeCounter = 0;
@Override
public void onActivityCreated(Activity activity, Bundle savedInstanceState) {
mCurrentCreatedActivityCount.incrementAndGet();
}
@Override
public void onActivityStarted(Activity activity) {
if (mAppStatusCounter <= 0) {
MatrixLog.i(TAG, "we are in foreground, start watcher task.");
mDetectExecutor.executeInBackground(mScanDestroyedActivitiesTask);
}
if (mUIConfigChangeCounter < 0) {
++mUIConfigChangeCounter;
} else {
++mAppStatusCounter;
}
}
@Override
public void onActivityStopped(Activity activity) {
if (activity.isChangingConfigurations()) {
--mUIConfigChangeCounter;
} else {
--mAppStatusCounter;
if (mAppStatusCounter <= 0) {
MatrixLog.i(TAG, "we are in background, stop watcher task.");
mDetectExecutor.clearTasks();
}
}
}
@Override
public void onActivityDestroyed(Activity activity) {
// When the activity is destroyed, start...
pushDestroyedActivityInfo(activity);
synchronized (mDestroyedActivityInfos) {
mDestroyedActivityInfos.notifyAll();
}
}
};
private void pushDestroyedActivityInfo(Activity activity) {
final String activityName = activity.getClass().getName();
// This Activity is confirmed to have a leak and has been reported
if (isPublished(activityName)) {
MatrixLog.d(TAG, "activity leak with name %s had published, just ignore", activityName);
return;
}
final UUID uuid = UUID.randomUUID();
final StringBuilder keyBuilder = new StringBuilder();
// Generate a unique identifier for the Activity instance
keyBuilder.append(ACTIVITY_REFKEY_PREFIX).append(activityName)
.append('_').append(Long.toHexString(uuid.getMostSignificantBits())).append(Long.toHexString(uuid.getLeastSignificantBits()));
final String key = keyBuilder.toString();
// Construct a data structure representing a destroyed Activity
final DestroyedActivityInfo destroyedActivityInfo
= new DestroyedActivityInfo(key, activity, activityName, mCurrentCreatedActivityCount.get());
// Put it into ConcurrentLinkedQueue for subsequent checks
mDestroyedActivityInfos.add(destroyedActivityInfo);
}
The core code for monitoring memory leaks:
private final RetryableTask mScanDestroyedActivitiesTask = new RetryableTask() {
@Override
public Status execute() {
// If the destroyed activity list is empty, just wait to save power.
while (mDestroyedActivityInfos.isEmpty()) {
synchronized (mDestroyedActivityInfos) {
try {
mDestroyedActivityInfos.wait();
} catch (Throwable ignored) {
// Ignored.
}
}
}
// Fake leaks will be generated when debugger is attached.
// Debugging mode, detection may fail, directly return
if (Debug.isDebuggerConnected() && !mResourcePlugin.getConfig().getDetectDebugger()) {
MatrixLog.w(TAG, "debugger is connected, to avoid fake result, detection was delayed.");
return Status.RETRY;
}
// Create a weak reference to an object
final WeakReference<Object> sentinelRef = new WeakReference<>(new Object());
triggerGc();
// If the system did not execute GC, directly return
if (sentinelRef.get() != null) {
// System ignored our gc request, we will retry later.
MatrixLog.d(TAG, "system ignore our gc request, wait for next detection.");
return Status.RETRY;
}
final Iterator<DestroyedActivityInfo> infoIt = mDestroyedActivityInfos.iterator();
while (infoIt.hasNext()) {
final DestroyedActivityInfo destroyedActivityInfo = infoIt.next();
// This instance of Activity has been marked as leaked, skip this instance
if (isPublished(destroyedActivityInfo.mActivityName)) {
MatrixLog.v(TAG, "activity with key [%s] was already published.", destroyedActivityInfo.mActivityName);
infoIt.remove();
continue;
}
// If we cannot obtain the Activity instance through weak reference, it means it has been reclaimed, skip this instance
if (destroyedActivityInfo.mActivityRef.get() == null) {
// The activity was recycled by a gc triggered outside.
MatrixLog.v(TAG, "activity with key [%s] was already recycled.", destroyedActivityInfo.mKey);
infoIt.remove();
continue;
}
// This Activity instance has been detected as leaked, increment the count
++destroyedActivityInfo.mDetectedCount;
// The difference in the number of Activity instances between the currently displayed Activity instance and the leaked Activity instance
long createdActivityCountFromDestroy = mCurrentCreatedActivityCount.get() - destroyedActivityInfo.mLastCreatedActivityCount;
// If the number of detections for the leaked Activity instance does not reach the threshold, or if the leaked Activity is very close to the currently displayed Activity, it can be considered a fault tolerance measure (there are such scenarios in practical applications), skip this instance
if (destroyedActivityInfo.mDetectedCount < mMaxRedetectTimes
|| (createdActivityCountFromDestroy < CREATED_ACTIVITY_COUNT_THRESHOLD && !mResourcePlugin.getConfig().getDetectDebugger())) {
// Although the sentinel tells us the activity should have been recycled,
// the system may still ignore it, so try again until we reach max retry times.
MatrixLog.i(TAG, "activity with key [%s] should be recycled but actually still
+ "exists in %s times detection with %s created activities during destroy, wait for next detection to confirm.",
destroyedActivityInfo.mKey, destroyedActivityInfo.mDetectedCount, createdActivityCountFromDestroy);
continue;
}
MatrixLog.i(TAG, "activity with key [%s] was suspected to be a leaked instance.", destroyedActivityInfo.mKey);
if (mHeapDumper != null) {
final File hprofFile = mHeapDumper.dumpHeap();
if (hprofFile != null) {
markPublished(destroyedActivityInfo.mActivityName);
final HeapDump heapDump = new HeapDump(hprofFile, destroyedActivityInfo.mKey, destroyedActivityInfo.mActivityName);
mHeapDumpHandler.process(heapDump);
infoIt.remove();
} else {
MatrixLog.i(TAG, "heap dump for further analyzing activity with key [%s] was failed, just ignore.",
destroyedActivityInfo.mKey);
infoIt.remove();
}
} else {
// Lightweight mode, just report leaked activity name.
MatrixLog.i(TAG, "lightweight mode, just report leaked activity name.");
markPublished(destroyedActivityInfo.mActivityName);
if (mResourcePlugin != null) {
final JSONObject resultJson = new JSONObject();
try {
resultJson.put(SharePluginInfo.ISSUE_ACTIVITY_NAME, destroyedActivityInfo.mActivityName);
} catch (JSONException e) {
MatrixLog.printErrStackTrace(TAG, e, "unexpected exception.");
}
mResourcePlugin.onDetectIssue(new Issue(resultJson));
}
}
}
return Status.RETRY;
}
};
Core points summary:
- In UIThreadMonitor there are two arrays of length three, queueStatus and queueCost, which record the status and the time spent in each frame's input, animation, and traversal (drawing) phases. queueStatus has three values: DO_QUEUE_DEFAULT, DO_QUEUE_BEGIN, and DO_QUEUE_END.
- UIThreadMonitor implements the Runnable interface so that it can be registered into Choreographer as the CALLBACK_INPUT callback for input events.
Having understood the principles of detecting stuttering, how about calculating FPS?
The timing information for each frame is delivered to the callbacks registered in HashSet<LooperObserver> observers. Let's see where observers get added; the class to look at is FrameTracer, which contains the FPS calculation code.
FPSCollector is an inner class of FrameTracer that extends IDoFrameListener; the interesting logic is in its doFrameAsync() method.
- Code 1: looks up (or creates) the FrameCollectItem for the current activity name and stores it in a HashMap.
- Code 2: calls FrameCollectItem#collect() to accumulate FPS-related information.
- Code 3: once the total drawing time for this activity exceeds timeSliceMs (10 s by default), calls FrameCollectItem#report() to report the statistics and removes the activity name and its FrameCollectItem from the HashMap.
private class FPSCollector extends IDoFrameListener {
private Handler frameHandler = new Handler(MatrixHandlerThread.getDefaultHandlerThread().getLooper());
private HashMap<String, FrameCollectItem> map = new HashMap<>();
@Override
public Handler getHandler() {
return frameHandler;
}
@Override
public void doFrameAsync(String focusedActivityName, long frameCost, int droppedFrames) {
super.doFrameAsync(focusedActivityName, frameCost, droppedFrames);
if (Utils.isEmpty(focusedActivityName)) {
return;
}
FrameCollectItem item = map.get(focusedActivityName); // Code 1
if (null == item) {
item = new FrameCollectItem(focusedActivityName);
map.put(focusedActivityName, item);
}
item.collect(droppedFrames); // Code 2
if (item.sumFrameCost >= timeSliceMs) { // report // Code 3
map.remove(focusedActivityName);
item.report();
}
}
}
FrameCollectItem: calculates FPS.
- Calculates the FPS value as float fps = Math.min(60.f, 1000.f * sumFrame / sumFrameCost) (see the worked example below)
- Tracks the total number of dropped frames
- Tracks the total time consumed (sumFrameCost)
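For example, if sumFrame = 540 frames were rendered and sumFrameCost = 10 000 ms, then fps = min(60, 1000 × 540 / 10 000) = 54.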
FPS calculation reference materials:
https://github.com/friendlyrobotnyc/TinyDancer read the code carefully and apply it to your own project.
About the hook part:
ActivityThreadHacker.java
It uses reflection for the hook; the code is concise and its purpose is clear: replace the mCallback of the Handler (mH) inside ActivityThread, so that the original messages can be intercepted and the ones we care about can be handled selectively. To follow it, you need to understand the Activity startup flow and how AMS interacts with the ActivityThread in the app process.
Obtains the timing of Activity startup.
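A stripped-down reflection sketch of that idea (mH and mCallback are Android internals whose names may change across versions, and hidden-API restrictions apply on newer Android; this is an illustration, not the ActivityThreadHacker code):

```java
import android.os.Handler;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Replace the Handler.Callback of ActivityThread's main handler (mH) so that
// activity-launch messages can be observed for timing.
public final class MainHandlerHook {
    public static void install(final Handler.Callback interceptor) throws Exception {
        Class<?> atClass = Class.forName("android.app.ActivityThread");
        Method current = atClass.getDeclaredMethod("currentActivityThread");
        current.setAccessible(true);
        Object activityThread = current.invoke(null);

        Field mHField = atClass.getDeclaredField("mH");
        mHField.setAccessible(true);
        Handler mH = (Handler) mHField.get(activityThread);

        Field callbackField = Handler.class.getDeclaredField("mCallback");
        callbackField.setAccessible(true);
        callbackField.set(mH, interceptor); // interceptor should return false to keep default handling
    }
}
```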
AppMethodBeat records the execution time of each method through hooking.
/**
* Hook method when it's called in.
*
* @param methodId
*/
public static void i(int methodId) {
if (status <= STATUS_STOPPED) {
return;
}
if (methodId >= METHOD_ID_MAX) {
return;
}
if (status == STATUS_DEFAULT) {
synchronized (statusLock) {
if (status == STATUS_DEFAULT) {
realExecute();
status = STATUS_READY;
}
}
}
if (Thread.currentThread().getId() == sMainThread.getId()) {
if (assertIn) {
android.util.Log.e(TAG, "ERROR!!! AppMethodBeat.i Recursive calls!!!");
return;
}
assertIn = true;
if (sIndex < Constants.BUFFER_SIZE) {
mergeData(methodId, sIndex, true);
} else {
sIndex = -1;
}
++sIndex;
assertIn = false;
}
}
/**
* Hook method when it's called out.
*
* @param methodId
*/
public static void o(int methodId) {
if (status <= STATUS_STOPPED) {
return;
}
if (methodId >= METHOD_ID_MAX) {
return;
}
if (Thread.currentThread().getId() == sMainThread.getId()) {
if (sIndex < Constants.BUFFER_SIZE) {
mergeData(methodId, sIndex, false);
} else {
sIndex = -1;
}
++sIndex;
}
}
/**
* Merge trace info as a long data
*
* @param methodId
* @param index
* @param isIn
*/
private static void mergeData(int methodId, int index, boolean isIn) {
if (methodId == AppMethodBeat.METHOD_ID_DISPATCH) {
// Record the time difference between the above two methods
sCurrentDiffTime = SystemClock.uptimeMillis() - sDiffTime;
}
long trueId = 0L;
if (isIn) {
trueId |= 1L << 63;
}
trueId |= (long) methodId << 43;
trueId |= sCurrentDiffTime & 0x7FFFFFFFFFFL;
sBuffer[index] = trueId;
checkPileup(index);
sLastIndex = index;
}
Having read the source code up to this point, we should think about where these method entry/exit calls come from, i.e. how they are inserted into each method.
https://github.com/Tencent/matrix/blob/b54b09ae06cc225c1cc9aedc8be39f3db4a2a340/matrix/matrix-android/matrix-gradle-plugin/src/main/java/com/tencent/matrix/trace/MethodTracer.java
ASM is used to insert a call to the i() method of com/tencent/matrix/trace/core/AppMethodBeat at the entry of every method, and a call to its o() method at the end of every method.
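Conceptually, an instrumented method ends up looking like this after the transform (the method id 1048 is a made-up value; the real change happens at the bytecode level):

```java
// Before instrumentation
public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
}

// After instrumentation
public void onCreate(Bundle savedInstanceState) {
    AppMethodBeat.i(1048);              // injected at method entry
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    AppMethodBeat.o(1048);              // injected before every return
}
```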
ASM
Application scenarios:
- Codeless ("trackless") event tracking
- Hooking
- APM monitoring
- Any requirement to modify code dynamically at compile time
What is ASM
ASM can directly generate binary class files and also enhance the functionality of existing classes. Java classes are stored in strictly formatted .class files, which contain sufficient metadata to parse all elements in the class: class names, methods, attributes, and Java bytecode (instructions).
http://blog.jamesdbloom.com/JavaCodeToByteCode_PartOne.html
https://blog.csdn.net/weelyy/article/details/78969412
https://asm.ow2.io/
Class file structure:
- Magic: the first four bytes of a Java class file are its magic number. Every valid class file starts with 0xCAFEBABE, which lets the Java virtual machine easily distinguish class files from non-class files.
- Version: stores the version information of the class file. As Java evolves, the class file format keeps changing, and the version tells the virtual machine how to read and process the file. (A small header-reading sketch follows this list.)
- Constant Pool: stores literal strings, class names, method names, interface names, final variables, and references to external classes. The virtual machine maintains a constant pool for every loaded class, holding all the symbolic references to types, fields, and methods used by that type, so it plays a central role in Java's dynamic linking. On average the constant pool accounts for roughly 60% of a class file's size.
- Access_flags: indicates whether the file defines a class or an interface (a class file can contain only one) and specifies its access flags, such as public, final, abstract, etc. (the flag constants are prefixed with ACC_).
- This Class: a pointer to the constant holding the fully qualified name of this class.
- Super Class: a pointer to the constant holding the fully qualified name of the parent class.
- Interfaces: an array of pointers to the string constants naming all interfaces implemented by this class or its parent. The previous three items, especially the first two, are what we usually modify when deriving a new class from an existing one with ASM: change the class name to the subclass name, change the superclass to the original class name, and add newly implemented interfaces if needed.
- Fields: detailed descriptions of the fields declared by the class or interface. Note that this list only contains the fields of the class or interface itself, not fields inherited from superclasses or superinterfaces.
- Methods: detailed descriptions of the methods declared by the class or interface, such as names, parameters, and return types. Again, only the class's or interface's own methods are listed, not inherited ones. AOP-style programming with ASM usually means adjusting the instructions inside these methods.
- Class attributes: basic information about the attributes defined by the class or interface in this file.
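As a quick sanity check on the layout above, here is a tiny reader for the first eight bytes of a .class file (the file path comes from the command line; this is just an illustration):

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Reads the first 8 bytes of a .class file: u4 magic, u2 minor_version, u2 major_version.
public final class ClassFileHeader {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            int magic = in.readInt();           // expected 0xCAFEBABE
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort(); // e.g. 52 = Java 8
            System.out.printf("magic=0x%08X minor=%d major=%d%n", magic, minor, major);
        }
    }
}
```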
Core Classes
- ClassReader: parses the bytecode of a compiled class file (the event producer).
- ClassWriter: rebuilds the compiled class, e.g. modifying the class name, fields, and methods, or even generating a brand-new class file (the event consumer; a subclass of ClassVisitor).
- ClassVisitor: visits the class's member information: annotations on the class, constructors, fields, methods, and static blocks.
- AdviceAdapter: extends MethodVisitor and visits method information to perform concrete bytecode operations (see the sketch after this list).
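Putting these classes together, here is a hedged sketch of how AdviceAdapter can inject the AppMethodBeat.i()/o() calls described earlier (this is not the actual MatrixTraceTransform code; the constant method id is a placeholder and the ASM API level ASM7 is an assumption):

```java
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.commons.AdviceAdapter;

// Rewrites a class so every method calls AppMethodBeat.i()/o() on entry/exit.
// The constant method id (1) is a placeholder; Matrix assigns real ids during collection.
public final class TraceInstrumenter {
    public static byte[] instrument(byte[] classBytes) {
        ClassReader cr = new ClassReader(classBytes);
        ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_MAXS);
        cr.accept(new ClassVisitor(Opcodes.ASM7, cw) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
                return new AdviceAdapter(Opcodes.ASM7, mv, access, name, desc) {
                    @Override
                    protected void onMethodEnter() {
                        push(1); // placeholder method id
                        visitMethodInsn(Opcodes.INVOKESTATIC,
                                "com/tencent/matrix/trace/core/AppMethodBeat", "i", "(I)V", false);
                    }

                    @Override
                    protected void onMethodExit(int opcode) {
                        push(1); // placeholder method id
                        visitMethodInsn(Opcodes.INVOKESTATIC,
                                "com/tencent/matrix/trace/core/AppMethodBeat", "o", "(I)V", false);
                    }
                };
            }
        }, ClassReader.EXPAND_FRAMES);
        return cw.toByteArray();
    }
}
```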
Function Instrumentation
What is function instrumentation?
Instrumentation: Inserting or modifying code at certain locations in the target program code to obtain certain program states during the program’s runtime for analysis. In simple terms, it involves inserting code into the code. Therefore, function instrumentation refers to inserting or modifying code within functions. During the Android compilation process, custom bytecode is inserted into the bytecode, so it can also be referred to as bytecode instrumentation.
Purpose
Function instrumentation can help us achieve many surgical-like code designs, such as seamless statistical reporting, lightweight AOP, etc. Applied in Android, it can be used for behavior statistics, method time consumption statistics, and other functionalities.
About Gradle Transform API
http://google.github.io/android-gradle-dsl/javadoc/2.1/com/android/build/api/transform/Transform.html
Generally, we use a Transform in the following two scenarios:
- We need to perform custom processing on the compiled class files.
- We need to read the class files produced during compilation for some other purpose, without modifying them.
public abstract class Transform {
public abstract String getName(); // Name of the custom Transform
public abstract Set<ContentType> getInputTypes(); // Input types processed by Transform (CLASSES, RESOURCES)
public abstract Set<? super Scope> getScopes();
public abstract boolean isIncremental(); // Whether incremental compilation is supported
}
Scope code:
enum Scope implements ScopeType {
/** Only the project content */
PROJECT(0x01), // Only the current project's code
/** Only the project's local dependencies (local jars) */
PROJECT_LOCAL_DEPS(0x02), // Local jar of the project
/** Only the sub-projects. */
SUB_PROJECTS(0x04), // Only sub-projects
/** Only the sub-projects' local dependencies (local jars). */
SUB_PROJECTS_LOCAL_DEPS(0x08),
/** Only the external libraries */
EXTERNAL_LIBRARIES(0x10),
/** Code that is being tested by the current variant, including dependencies */
TESTED_CODE(0x20),
/** Local or remote dependencies that are provided-only */
PROVIDED_ONLY(0x40);
}
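Filling in those abstract methods, a minimal concrete Transform could look like this (a sketch against the AGP 3.x-era Transform API; the class name matches the MyCustomTransform registered earlier):

```java
import com.android.build.api.transform.QualifiedContent;
import com.android.build.api.transform.Transform;
import java.util.Collections;
import java.util.Set;

// Declares what the transform consumes: the current project's class files, non-incremental.
public class MyCustomTransform extends Transform {
    @Override
    public String getName() {
        return "MyCustomTransform";
    }

    @Override
    public Set<QualifiedContent.ContentType> getInputTypes() {
        return Collections.<QualifiedContent.ContentType>singleton(QualifiedContent.DefaultContentType.CLASSES);
    }

    @Override
    public Set<? super QualifiedContent.Scope> getScopes() {
        return Collections.<QualifiedContent.Scope>singleton(QualifiedContent.Scope.PROJECT);
    }

    @Override
    public boolean isIncremental() {
        return false;
    }
}
```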
Test code:
public void transform(TransformInvocation invocation) {
    for (TransformInput input : invocation.getInputs()) {
        input.getJarInputs().parallelStream().forEach(jarInput -> {
            File src = jarInput.getFile();
            try (JarFile jarFile = new JarFile(src)) {
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    JarEntry entry = entries.nextElement();
                    // Process each entry (e.g. feed .class entries to ASM)
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
About Me
Endless bugs, endless showing off. WeChat public account: Small Wooden Box Growth Camp, focused on mobile development and covering audio and video, APM, information security, and more; arguably the geekiest public account out there. Welcome to follow!