Analysis of the Android FART Unpacking Process

1. Introduction

On Android, the Java code that developers write is ultimately compiled into bytecode that runs on the Android virtual machine. Ever since Android became popular, decompilation tools such as apktool and jadx have appeared one after another and grown increasingly powerful. Against these tools, bytecode compiled from Java is exposed, like a person walking naked through a crowd with nothing hidden from view. Every technique breeds its countermeasure: decompilers led to anti-decompilation tools, generally known as app hardening (also called packing, or adding a “shell”). Hardening an app is like dressing that naked person; the “clothes” protect the app to some extent and make it harder to decompile. Hardening in turn gave rise to anti-hardening techniques, namely the unpacking technology that this article analyzes.

Android has gone through many releases, with changes both visible and internal. Early versions used the Dalvik virtual machine. Android 4.4 added the ART virtual machine but did not enable it by default, and from Android 5.0 onwards ART replaced Dalvik as the default. Because Dalvik and ART work differently, unpacking works somewhat differently on each. This article analyzes an unpacking scheme for ART: FART. Its core idea is to unpack through active invocation. Project address: https://github.com/hanbinglengyue/FART. FART is implemented by modifying a small number of Android source files; the modified source is compiled into a system image and flashed onto a phone, which then serves as an unpacking device.

2. Process Analysis

The entry point of FART is the performLaunchActivity function in frameworks\base\core\java\android\app\ActivityThread.java. This function runs whenever one of the app’s Activities is launched, and FART adds a call to fartthread inside it.

private Activity performLaunchActivity(ActivityClientRecord r, Intent customIntent) {
    Log.e("ActivityThread","go into performLaunchActivity");
    ActivityInfo aInfo = r.activityInfo;
    if (r.packageInfo == null) {
        r.packageInfo = getPackageInfo(aInfo.applicationInfo, r.compatInfo,
                Context.CONTEXT_INCLUDE_CODE);
    }
    ......
    // Start fart thread
    fartthread();
    ......
}

The fartthread function starts a thread that sleeps for one minute before calling the fart function.

public static void fartthread() {
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                Log.e("ActivityThread", "start sleep, wait for fartthread start......");
                Thread.sleep(1 * 60 * 1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            Log.e("ActivityThread", "sleep over and start fartthread");
            fart();
            Log.e("ActivityThread", "fart run over");
        }
    }).start();
}

The fart function obtains the app’s ClassLoader and uses reflection to reach its internals. It reads the pathList field of dalvik.system.BaseDexClassLoader, then the dexElements field of dalvik.system.DexPathList, which is an array of dalvik.system.DexPathList$Element objects holding the path and other information for each dex. It then traverses dexElements, takes the DexFile object from each Element, reads that DexFile’s mCookie field, and calls String[] getClassNameList(Object cookie) in the DexFile class with the mCookie to get every class name in the dex file. Finally, it passes each class name to the loadClassAndInvoke function.

public static void fart() {
    ClassLoader appClassloader = getClassloader();
    List<Object> dexFilesArray = new ArrayList<Object>();
    Field pathList_Field = (Field) getClassField(appClassloader, "dalvik.system.BaseDexClassLoader", "pathList");
    Object pathList_object = getFieldOjbect("dalvik.system.BaseDexClassLoader", appClassloader, "pathList");
    Object[] ElementsArray = (Object[]) getFieldOjbect("dalvik.system.DexPathList", pathList_object, "dexElements");
    Field dexFile_fileField = null;
    try {
        dexFile_fileField = (Field) getClassField(appClassloader, "dalvik.system.DexPathList$Element", "dexFile");
    } catch (Exception e) {
        e.printStackTrace();
    }
    Class DexFileClazz = null;
    try {
        DexFileClazz = appClassloader.loadClass("dalvik.system.DexFile");
    } catch (Exception e) {
        e.printStackTrace();
    }
    Method getClassNameList_method = null;
    Method defineClass_method = null;
    Method dumpDexFile_method = null;
    Method dumpMethodCode_method = null;
    for (Method field : DexFileClazz.getDeclaredMethods()) {
        if (field.getName().equals("getClassNameList")) {
            getClassNameList_method = field;
            getClassNameList_method.setAccessible(true);
        }
        if (field.getName().equals("defineClassNative")) {
            defineClass_method = field;
            defineClass_method.setAccessible(true);
        }
        if (field.getName().equals("dumpMethodCode")) {
            dumpMethodCode_method = field;
            dumpMethodCode_method.setAccessible(true);
        }
    }
    Field mCookiefield = getClassField(appClassloader, "dalvik.system.DexFile", "mCookie");
    for (int j = 0; j < ElementsArray.length; j++) {
        Object element = ElementsArray[j];
        Object dexfile = null;
        try {
            dexfile = (Object) dexFile_fileField.get(element);
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (dexfile == null) {
            continue;
        }
        if (dexfile != null) {
            dexFilesArray.add(dexfile);
            Object mcookie = getClassFieldObject(appClassloader, "dalvik.system.DexFile", dexfile, "mCookie");
            if (mcookie == null) {
                continue;
            }
            String[] classnames = null;
            try {
                classnames = (String[]) getClassNameList_method.invoke(dexfile, mcookie);
            } catch (Exception e) {
                e.printStackTrace();
                continue;
            } catch (Error e) {
                e.printStackTrace();
                continue;
            }
            if (classnames != null) {
                for (String eachclassname : classnames) {
                    loadClassAndInvoke(appClassloader, eachclassname, dumpMethodCode_method);
                }
            }
        }
    }
    return;
}

The loadClassAndInvoke function takes the class name, the ClassLoader object, and the Method object for dumpMethodCode. As the code above shows, dumpMethodCode belongs to DexFile; the stock DexFile class has no such function, so it is one that FART adds. What dumpMethodCode does is covered below; first, loadClassAndInvoke itself. Its job is simple: it loads the class by name, retrieves all of the class’s declared constructors and methods, and calls dumpMethodCode on each Constructor or Method object.

public static void loadClassAndInvoke(ClassLoader appClassloader, String eachclassname, Method dumpMethodCode_method) {
    Log.i("ActivityThread", "go into loadClassAndInvoke->" + "classname:" + eachclassname);
    Class resultclass = null;
    try {
        resultclass = appClassloader.loadClass(eachclassname);
    } catch (Exception e) {
        e.printStackTrace();
        return;
    } catch (Error e) {
        e.printStackTrace();
        return;
    }
    if (resultclass != null) {
        // Pass every declared constructor to dumpMethodCode
        try {
            Constructor<?> cons[] = resultclass.getDeclaredConstructors();
            for (Constructor<?> constructor : cons) {
                if (dumpMethodCode_method != null) {
                    try {
                        dumpMethodCode_method.invoke(null, constructor);
                    } catch (Exception e) {
                        e.printStackTrace();
                        continue;
                    } catch (Error e) {
                        e.printStackTrace();
                        continue;
                    }
                } else {
                    Log.e("ActivityThread", "dumpMethodCode_method is null ");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } catch (Error e) {
            e.printStackTrace();
        }
        // Pass every declared method to dumpMethodCode
        try {
            Method[] methods = resultclass.getDeclaredMethods();
            if (methods != null) {
                for (Method m : methods) {
                    if (dumpMethodCode_method != null) {
                        try {
                            dumpMethodCode_method.invoke(null, m);
                        } catch (Exception e) {
                            e.printStackTrace();
                            continue;
                        } catch (Error e) {
                            e.printStackTrace();
                            continue;
                        }
                    } else {
                        Log.e("ActivityThread", "dumpMethodCode_method is null ");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } catch (Error e) {
            e.printStackTrace();
        }
    }
}

As mentioned earlier, the dumpMethodCode function is in the DexFile class, and the complete path of DexFile is: libcore\dalvik\src\main\java\dalvik\system\DexFile.java, and it is defined as follows:

private static native void dumpMethodCode(Object m);

As can be seen, it is a native method, and its actual code is in: art\runtime\native\dalvik_system_DexFile.cc, and the code is:

static void DexFile_dumpMethodCode(JNIEnv* env, jclass, jobject method) {
    ScopedFastNativeObjectAccess soa(env);
    if (method != nullptr) {
        ArtMethod* artmethod = ArtMethod::FromReflectedMethod(soa, method);
        myfartInvoke(artmethod);
    }
    return;
}

In the DexFile_dumpMethodCode function, the method parameter is the java.lang.reflect.Method or Constructor object passed from loadClassAndInvoke. It is handed to FromReflectedMethod to obtain the corresponding ArtMethod pointer, which is then passed to the myfartInvoke function.

The actual code for myfartInvoke is in the art/runtime/art_method.cc file.

extern "C" void myfartInvoke(ArtMethod * artmethod) SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
    JValue *result = nullptr;
    Thread *self = nullptr;
    uint32_t temp = 6;
    uint32_t *args = &temp;
    uint32_t args_size = 6;
    artmethod->Invoke(self, args, args_size, result, "fart");
}

In the myfartInvoke function, the notable point is that self is set to a null pointer and passed to the Invoke function of ArtMethod.

The Invoke function is also in the art/runtime/art_method.cc file. At the beginning of the Invoke function, it checks the self parameter; if it is null, it indicates that the Invoke function is called by FART. Otherwise, it is called by the system itself. When self is null, it calls the dumpArtMethod function and immediately returns.
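The dispatch trick described above can be illustrated outside ART with a small mock. This is only a sketch: MockThread, MockMethod, and the Dumped() log are hypothetical stand-ins, not ART types. It shows how a null self pointer turns Invoke into a dump-only call that never runs the method body.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for art::Thread and art::ArtMethod; not real ART types.
struct MockThread {};

struct MockMethod {
    std::string name;
    bool executed = false;  // set only when the normal invoke path runs

    // Records which methods took the dump-only path.
    static std::vector<std::string>& Dumped() {
        static std::vector<std::string> log;
        return log;
    }

    // Mirrors the shape of the patched ArtMethod::Invoke: a null `self`
    // marks a FART-initiated call, which dumps and returns immediately.
    void Invoke(MockThread* self) {
        if (self == nullptr) {
            Dumped().push_back(name);  // stands in for dumpArtMethod(this)
            return;
        }
        executed = true;               // stands in for the real invocation
    }
};
```

Because a genuine call from the runtime always supplies a valid Thread pointer, the null value is a safe sentinel: no extra parameter or new entry point is needed to tell the two kinds of call apart.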

void ArtMethod::Invoke(Thread * self, uint32_t * args,
                       uint32_t args_size, JValue * result,
                       const char *shorty) {
    if (self == nullptr) {
        dumpArtMethod(this);
        return;
    }
    ......
}

The dumpArtMethod function contains the code that actually dumps the dex.

extern "C" void dumpArtMethod(ArtMethod * artmethod) SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
    char *dexfilepath = (char *) malloc(sizeof(char) * 2000);
    if (dexfilepath == nullptr) {
        LOG(INFO) << "ArtMethod::dumpArtMethodinvoked,methodname:"
                  << PrettyMethod(artmethod).c_str() << "malloc 2000 byte failed";
        return;
    }
    int fcmdline = -1;
    char szCmdline[64] = { 0 };
    char szProcName[256] = { 0 };
    int procid = getpid();
    sprintf(szCmdline, "/proc/%d/cmdline", procid);
    fcmdline = open(szCmdline, O_RDONLY, 0644);
    if (fcmdline > 0) {
        read(fcmdline, szProcName, 256);
        close(fcmdline);
    }
    if (szProcName[0]) {
        const DexFile *dex_file = artmethod->GetDexFile();
        const char *methodname = PrettyMethod(artmethod).c_str();
        const uint8_t *begin_ = dex_file->Begin();
        size_t size_ = dex_file->Size();
        memset(dexfilepath, 0, 2000);
        int size_int_ = (int) size_;
        memset(dexfilepath, 0, 2000);
        sprintf(dexfilepath, "%s", "/sdcard/fart");
        mkdir(dexfilepath, 0777);
        memset(dexfilepath, 0, 2000);
        sprintf(dexfilepath, "/sdcard/fart/%s", szProcName);
        mkdir(dexfilepath, 0777);
        memset(dexfilepath, 0, 2000);
        sprintf(dexfilepath, "/sdcard/fart/%s/%d_dexfile.dex", szProcName, size_int_);
        int dexfilefp = open(dexfilepath, O_RDONLY, 0666);
        if (dexfilefp > 0) {
            close(dexfilefp);
            dexfilefp = 0;
        } else {
            dexfilefp = open(dexfilepath, O_CREAT | O_RDWR, 0666);
            if (dexfilefp > 0) {
                write(dexfilefp, (void *) begin_, size_);
                fsync(dexfilefp);
                close(dexfilefp);
            }
        }
        // The second half starts
        const DexFile::CodeItem * code_item = artmethod->GetCodeItem(); // (1)
        if (LIKELY(code_item != nullptr)) {
            int code_item_len = 0;
            uint8_t *item = (uint8_t *) code_item;
            if (code_item->tries_size_ > 0) { // (2)
                const uint8_t *handler_data = (const uint8_t *) (DexFile::GetTryItems(*code_item, code_item->tries_size_));
                uint8_t *tail = codeitem_end(&handler_data);
                code_item_len = (int)(tail - item);
            } else {
                code_item_len = 16 + code_item->insns_size_in_code_units_ * 2;
            }
            memset(dexfilepath, 0, 2000);
            int size_int = (int) dex_file->Size();    // Length of data
            uint32_t method_idx = artmethod->get_method_idx();
            sprintf(dexfilepath, "/sdcard/fart/%s/%d_%ld.bin", szProcName, size_int, gettidv1());
            int fp2 = open(dexfilepath, O_CREAT | O_APPEND | O_RDWR, 0666);
            if (fp2 > 0) {
                lseek(fp2, 0, SEEK_END);
                memset(dexfilepath, 0, 2000);
                int offset = (int) (item - begin_);
                sprintf(dexfilepath,
                        "{name:%s,method_idx:%d,offset:%d,code_item_len:%d,ins:",
                        methodname, method_idx, offset, code_item_len);
                int contentlength = 0;
                while (dexfilepath[contentlength] != 0)
                    contentlength++;
                write(fp2, (void *) dexfilepath, contentlength);
                long outlen = 0;
                char *base64result = base64_encode((char *) item, (long) code_item_len, &outlen);
                write(fp2, base64result, outlen);
                write(fp2, "};", 2);
                fsync(fp2);
                close(fp2);
                if (base64result != nullptr) {
                    free(base64result);
                    base64result = nullptr;
                }
            }
        }
    }
    if (dexfilepath != nullptr) {
        free(dexfilepath);
        dexfilepath = nullptr;
    }
}

The dumpArtMethod function first reads the current process’s name from the virtual file /proc/<pid>/cmdline and creates a directory named after it under /sdcard/fart. For this reason, the app must be granted permission to write to external storage before unpacking. It then obtains the DexFile pointer through ArtMethod’s GetDexFile function; this points to the dex that contains the ArtMethod. Next, it retrieves the start address and size of the dex in memory using DexFile’s Begin and Size functions, and uses write to save the in-memory dex to a file named <size>_dexfile.dex, where <size> is the dex size in bytes. If that file already exists, the dump is skipped.
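As a rough illustration of the naming scheme just described, the sketch below reads the process name and builds the output path. It is not FART’s code: FART uses open/read/sprintf with fixed-size buffers, whereas this version uses std::ifstream and std::string, and the helper names ReadProcessName and DexDumpPath are made up for the example. It reads /proc/self/cmdline, which refers to the same file as /proc/<pid>/cmdline for the calling process.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Reads the first NUL-terminated entry of /proc/self/cmdline.
// Returns an empty string if the file cannot be read.
std::string ReadProcessName() {
    std::ifstream in("/proc/self/cmdline", std::ios::binary);
    std::string name;
    std::getline(in, name, '\0');  // stop at the first NUL separator
    return name;
}

// Builds "<root>/<process>/<dex size>_dexfile.dex".
std::string DexDumpPath(const std::string& root, const std::string& process,
                        std::size_t dex_size) {
    return root + "/" + process + "/" + std::to_string(dex_size) + "_dexfile.dex";
}
```

Keying the file name on the dex size is what lets several dex files from one process land in the same directory without overwriting each other, as long as their sizes differ.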

However, the function is not finished yet. The second half of dumpArtMethod dumps the CodeItem of the function. Some may wonder: the first half already dumped the dex, so why dump the function’s CodeItem as well? For some shells, the first half of dumpArtMethod is already enough to dump the whole dex. With method-extraction shells, however, the dumped dex still has function bodies filled with nops, that is, empty function bodies. FART therefore also dumps each function’s CodeItem so that users can repair these empty functions in the dumped dex.
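The repair step mentioned above amounts to copying each recovered CodeItem back to its original position in the dumped dex. The following is a minimal sketch of that idea only, assuming the CodeItem bytes and their offset from the start of the dex have already been recovered from the dump; PatchCodeItem is a hypothetical helper and not part of FART.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a recovered code_item into a dex image at the given offset.
// Returns false, leaving the image untouched, if the copy would not fit.
bool PatchCodeItem(std::vector<uint8_t>& dex, std::size_t offset,
                   const std::vector<uint8_t>& code_item) {
    if (offset > dex.size() || code_item.size() > dex.size() - offset) {
        return false;
    }
    if (!code_item.empty()) {
        std::memcpy(dex.data() + offset, code_item.data(), code_item.size());
    }
    return true;
}
```

The bounds check is written as a subtraction on the remaining size so that a large offset cannot overflow the comparison.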

Let’s look at the second half of dumpArtMethod, which relies on the structure of the dex file; if you are not familiar with it, please refer to the dex format documentation listed in the references. At comment (1), the code retrieves the CodeItem from the ArtMethod. At comment (2), it calculates the size of the CodeItem depending on tries_size_, the number of try_items. There are two cases:

(1) If the tries size is not 0, it indicates that this CodeItem has try_items, so it calculates the ending address of the CodeItem.

const uint8_t *handler_data = (const uint8_t *) (DexFile::GetTryItems(*code_item, code_item->tries_size_));
uint8_t *tail = codeitem_end(&handler_data);
code_item_len = (int)(tail - item);

How does the codeitem_end function calculate the ending address of the CodeItem?

The second argument to GetTryItems is the tries size, used here as an index: it skips past all of the try_items and yields the address of the encoded_catch_handler_list, which is then passed to the codeitem_end function.

uint8_t *codeitem_end(const uint8_t ** pData) {
    uint32_t num_of_list = DecodeUnsignedLeb128(pData);
    for (; num_of_list > 0; num_of_list--) {
        int32_t num_of_handlers = DecodeSignedLeb128(pData);
        int num = num_of_handlers;
        if (num_of_handlers <= 0) {
            num = -num_of_handlers;
        }
        for (; num > 0; num--) {
            DecodeUnsignedLeb128(pData);
            DecodeUnsignedLeb128(pData);
        }
        if (num_of_handlers <= 0) {
            DecodeUnsignedLeb128(pData);
        }
    }
    return (uint8_t *) (*pData);
}

The codeitem_end function starts by reading how many encoded_catch_handler structures the encoded_catch_handler_list contains, then iterates through them. For each encoded_catch_handler it reads the signed size field, whose absolute value is the number of encoded_type_addr_pair structures, and skips those pairs; if the size is zero or negative, a catch_all_addr follows and is skipped as well. This walks past the entire encoded_catch_handler_list. Finally, the function returns *pData, which is the ending address of the CodeItem.
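The walk described above can be reproduced in a standalone form, which makes it easy to test against hand-built bytes. This is a sketch, not ART’s code: ReadUleb128, ReadSleb128, and SkipCatchHandlerList are illustrative names standing in for DecodeUnsignedLeb128, DecodeSignedLeb128, and codeitem_end, and no bounds checking is done on the input.

```cpp
#include <cassert>
#include <cstdint>

// Decodes one unsigned LEB128 value (at most 5 bytes) and advances *p.
uint32_t ReadUleb128(const uint8_t** p) {
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        result |= static_cast<uint32_t>(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) != 0 && shift < 35);
    return result;
}

// Decodes one signed LEB128 value (at most 5 bytes) and advances *p.
int32_t ReadSleb128(const uint8_t** p) {
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        result |= static_cast<uint32_t>(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) != 0 && shift < 35);
    // Sign-extend when bit 6 of the final byte is set.
    if (shift < 32 && (byte & 0x40) != 0) {
        result |= ~static_cast<uint32_t>(0) << shift;
    }
    return static_cast<int32_t>(result);
}

// Walks an encoded_catch_handler_list and returns the address just past it.
const uint8_t* SkipCatchHandlerList(const uint8_t* p) {
    uint32_t handler_count = ReadUleb128(&p);
    for (uint32_t i = 0; i < handler_count; ++i) {
        int32_t size = ReadSleb128(&p);   // <= 0 means a catch-all follows
        int32_t pairs = size < 0 ? -size : size;
        for (int32_t j = 0; j < pairs; ++j) {
            ReadUleb128(&p);              // type_idx
            ReadUleb128(&p);              // addr
        }
        if (size <= 0) {
            ReadUleb128(&p);              // catch_all_addr
        }
    }
    return p;
}
```

Because every field is variable-length, the end of the list cannot be computed from counts alone; each value has to be decoded in order, which is why the function walks the structure instead of multiplying sizes.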

After calculating the ending address of the CodeItem, the true size of the CodeItem is obtained by subtracting the starting address of the CodeItem from its ending address.

(2) If the tries size is 0, there are no try_items, and the size of the CodeItem can be calculated directly as the 16-byte fixed header plus the instructions, where each code unit is 2 bytes:

code_item_len = 16 + code_item->insns_size_in_code_units_ * 2;
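To make the constant 16 concrete, the sketch below lays out the fixed header of a code_item as the dex format defines it and derives the two sizes involved. The struct and function names are illustrative, not ART’s. HandlerListOffset shows what sits between the instructions and the handler list when try blocks are present: an optional 2-byte padding that keeps the try_items 4-byte aligned, followed by 8 bytes per try_item.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed header of a dex code_item: four uint16 fields and two uint32 fields.
struct CodeItemHeader {
    uint16_t registers_size;
    uint16_t ins_size;
    uint16_t outs_size;
    uint16_t tries_size;
    uint32_t debug_info_off;
    uint32_t insns_size;  // number of 16-bit code units
};
static_assert(sizeof(CodeItemHeader) == 16, "code_item header is 16 bytes");

// Total length of a code_item that has no try blocks.
std::size_t LengthWithoutTries(const CodeItemHeader& h) {
    assert(h.tries_size == 0);
    return sizeof(CodeItemHeader) + static_cast<std::size_t>(h.insns_size) * 2;
}

// Offset, from the start of the code_item, of the encoded_catch_handler_list
// for a code_item that has try blocks.
std::size_t HandlerListOffset(const CodeItemHeader& h) {
    assert(h.tries_size > 0);
    std::size_t off =
        sizeof(CodeItemHeader) + static_cast<std::size_t>(h.insns_size) * 2;
    if (h.insns_size % 2 != 0) {
        off += 2;  // padding so the try_items start on a 4-byte boundary
    }
    return off + static_cast<std::size_t>(h.tries_size) * 8;  // 8 bytes per try_item
}
```

For example, a method with three code units and no try blocks occupies 16 + 3 * 2 = 22 bytes.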

Once the size of the CodeItem is known, several values are formatted into the dexfilepath buffer, which is reused here as a scratch buffer for the record header:

sprintf(dexfilepath,
        "{name:%s,method_idx:%d,offset:%d,code_item_len:%d,ins:",
        methodname,
        method_idx,
        offset,
        code_item_len);

methodname is the name of the function.

method_idx comes from a function added by FART, uint32_t get_method_idx() { return dex_method_index_; }, which returns dex_method_index_, the index of the function in method_ids.

offset is the offset of the function’s CodeItem relative to the start of the dex file.

code_item_len is the length of the CodeItem.

Once the data is assembled, it is written to a file with a suffix of .bin:

write(fp2, (void *) dexfilepath, contentlength);
long outlen = 0;
char *base64result = base64_encode((char *) item, (long) code_item_len, &outlen);
write(fp2, base64result, outlen);
write(fp2, "};", 2);
fsync(fp2);
close(fp2);
if (base64result != nullptr) {
    free(base64result);
    base64result = nullptr;
}

The record header held in dexfilepath is plain text and can be written directly. The bytecode in the CodeItem is binary, however, and would not be readable if written as-is into this text file, so FART base64-encodes it before writing.
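The record layout can be reproduced with a few lines of standalone code. This is a sketch under assumptions: it uses the standard base64 alphabet with '=' padding, which may or may not match the base64_encode helper in FART exactly, and Base64Encode and FormatRecord are names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard base64 (RFC 4648 alphabet, '=' padding).
std::string Base64Encode(const uint8_t* data, std::size_t len) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((len + 2) / 3 * 4);
    for (std::size_t i = 0; i < len; i += 3) {
        // Pack up to three input bytes into one 24-bit group.
        uint32_t group = static_cast<uint32_t>(data[i]) << 16;
        if (i + 1 < len) group |= static_cast<uint32_t>(data[i + 1]) << 8;
        if (i + 2 < len) group |= data[i + 2];
        out.push_back(kAlphabet[(group >> 18) & 0x3f]);
        out.push_back(kAlphabet[(group >> 12) & 0x3f]);
        out.push_back(i + 1 < len ? kAlphabet[(group >> 6) & 0x3f] : '=');
        out.push_back(i + 2 < len ? kAlphabet[group & 0x3f] : '=');
    }
    return out;
}

// Formats one record: a text header, the base64 body, and the "};" terminator.
std::string FormatRecord(const std::string& name, uint32_t method_idx,
                         std::size_t offset, const uint8_t* code_item,
                         std::size_t len) {
    return "{name:" + name +
           ",method_idx:" + std::to_string(method_idx) +
           ",offset:" + std::to_string(offset) +
           ",code_item_len:" + std::to_string(len) +
           ",ins:" + Base64Encode(code_item, len) + "};";
}
```

Since the base64 alphabet contains neither '}' nor ';', the "};" terminator cannot appear inside the encoded body, which keeps the appended records easy to split apart later.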

At this point the analysis may seem complete: active invocation, the whole-dex dump, and the dump of each function’s CodeItem have all been covered. However, part of FART’s logic has not been analyzed yet. If you have used FART to unpack an app, you will have noticed that it also dumps dex files whose names end in _dexfile_execute.dex. How are these generated?

This part of the code is also in the art/runtime/art_method.cc file.

extern "C" void dumpDexFileByExecute(ArtMethod * artmethod)
    SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
    char *dexfilepath = (char *) malloc(sizeof(char) * 2000);
    if (dexfilepath == nullptr) {
        LOG(INFO) << "ArtMethod::dumpDexFileByExecute,methodname:"
                  << PrettyMethod(artmethod).c_str() << "malloc 2000 byte failed";
        return;
    }
    int fcmdline = -1;
    char szCmdline[64] = { 0 };
    char szProcName[256] = { 0 };
    int procid = getpid();
    sprintf(szCmdline, "/proc/%d/cmdline", procid);
    fcmdline = open(szCmdline, O_RDONLY, 0644);
    if (fcmdline > 0) {
        read(fcmdline, szProcName, 256);
        close(fcmdline);
    }
    if (szProcName[0]) {
        const DexFile *dex_file = artmethod->GetDexFile();
        const uint8_t *begin_ = dex_file->Begin();    // Start of data.
        size_t size_ = dex_file->Size();    // Length of data.
        memset(dexfilepath, 0, 2000);
        int size_int_ = (int) size_;
        memset(dexfilepath, 0, 2000);
        sprintf(dexfilepath, "%s", "/sdcard/fart");
        mkdir(dexfilepath, 0777);
        memset(dexfilepath, 0, 2000);
        sprintf(dexfilepath, "/sdcard/fart/%s", szProcName);
        mkdir(dexfilepath, 0777);
        memset(dexfilepath, 0, 2000);
        sprintf(dexfilepath, "/sdcard/fart/%s/%d_dexfile_execute.dex", szProcName, size_int_);
        int dexfilefp = open(dexfilepath, O_RDONLY, 0666);
        if (dexfilefp > 0) {
            close(dexfilefp);
            dexfilefp = 0;
        } else {
            dexfilefp = open(dexfilepath, O_CREAT | O_RDWR, 0666);
            if (dexfilefp > 0) {
                write(dexfilefp, (void *) begin_, size_);
                fsync(dexfilefp);
                close(dexfilefp);
            }
        }
    }
    if (dexfilepath != nullptr) {
        free(dexfilepath);
        dexfilepath = nullptr;
    }
}

As can be seen, dumpDexFileByExecute closely mirrors the first half of dumpArtMethod: it performs a whole-dex dump and skips the write if the target file already exists. So where is dumpDexFileByExecute called?
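Both dump functions use the same "write only if the file is not there yet" pattern: try to open the file read-only, and create it only when that fails. A compact alternative, shown here as a sketch and not as FART’s code, is to let the kernel do the check with O_CREAT | O_EXCL. WriteOnce is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Writes `size` bytes to `path` only if the file does not exist yet.
// Returns true if this call created the file and wrote all of the data.
bool WriteOnce(const char* path, const uint8_t* data, std::size_t size) {
    // O_EXCL makes open() fail when the file already exists.
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0666);
    if (fd < 0) {
        return false;
    }
    std::size_t done = 0;
    while (done < size) {
        ssize_t n = write(fd, data + done, size - done);
        if (n <= 0) {
            break;  // stop on error; the caller sees a false result
        }
        done += static_cast<std::size_t>(n);
    }
    fsync(fd);
    close(fd);
    return done == size;
}
```

With O_EXCL the existence check and the creation happen in one step, so two threads that reach the dump at the same time cannot both create the file. The sketch also tests fd < 0 instead of fd > 0, since 0 is a valid descriptor.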

A search shows that in the art/runtime/interpreter/interpreter.cc file, FART declares the dumpDexFileByExecute function inside the art namespace.

namespace art {
extern "C" void dumpDexFileByExecute(ArtMethod* artmethod);
namespace interpreter {
    ......
}
}

The same file also contains a call to dumpDexFileByExecute:

static inline JValue Execute(Thread* self, const DexFile::CodeItem* code_item,
                             ShadowFrame& shadow_frame, JValue result_register) {
    if (strstr(PrettyMethod(shadow_frame.GetMethod()).c_str(), "<clinit>") != nullptr) {
        dumpDexFileByExecute(shadow_frame.GetMethod());
    }
    ......
}

The Execute function decides whether to call dumpDexFileByExecute by checking whether the method’s name contains <clinit>, that is, whether the method being interpreted is a class’s static initializer. If it is, Execute calls dumpDexFileByExecute and passes it the ArtMethod pointer.
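The name test can be isolated into a tiny predicate, sketched below with a made-up function name. It works on the pretty-printed method string, which in ART has a form such as void com.example.Foo.<clinit>().

```cpp
#include <cassert>
#include <cstring>
#include <string>

// True when the pretty-printed method name contains "<clinit>", the name
// the dex format gives to a class's static initializer.
bool LooksLikeClassInitializer(const std::string& pretty_method) {
    return std::strstr(pretty_method.c_str(), "<clinit>") != nullptr;
}
```

A substring match is safe here because '<' and '>' are not legal in ordinary Java identifiers, so only the special names <clinit> and <init> can contain them.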

The dumpDexFileByExecute function performs an overall dump of the dex, which can be considered a complement to dumpArtMethod; sometimes, if dumpArtMethod does not yield the desired dex, using dumpDexFileByExecute might provide a pleasant surprise.

3. Conclusion

Many thanks to the author of FART for open-sourcing it; it offers a sound approach to defeating app shells in the ART environment. In theory, a FART unpacking device can unpack most shells, but there are exceptions that readers will need to investigate on their own.

4. References

https://bbs.pediy.com/thread-252630.htm

https://source.android.google.cn/devices/tech/dalvik/dex-format
