Ultra Lightweight! AI Model Compiler MegCC Open Source Reduces Inference Engine Size

Currently, there are quite a few mobile deep learning inference frameworks in the community (such as NCNN and MNN), which make it much easier for community users to deploy deep learning on mobile devices. However, these inference frameworks share a common problem: as they keep iterating and optimizing performance, their runtime libraries grow steadily in size. In particular, fusing different operators produces a large number of long-tail operators, which bloats the App or SDK.

To address this issue, the MegEngine team has open-sourced MegCC, which innovatively uses a model pre-compilation scheme: it generates only the code necessary for model inference and removes everything unrelated to it, significantly reducing the size of the inference engine. Its main approach is to move all the steps a traditional framework performs at runtime, such as computation graph optimization, kernel selection, and memory allocation, into the compilation process, thereby minimizing the runtime binary size and further optimizing performance based on the model information.

GitHub open-source address: https://github.com/MegEngine/MegCC

Features of the Scheme

  • The size of the inference engine will no longer increase with the iteration of the framework.
  • Operator fusion can generate corresponding code at compile time based on model information.
  • The entire computation graph information can be obtained during model compilation for further extreme performance optimization.
  • Community experience in code generation can be drawn on to help generate code for MegCC.

Unlike traditional inference frameworks, MegCC is a true deep learning model compiler. Its core features are an extremely lightweight runtime binary, high performance, easy portability, extremely low memory usage, and fast startup. Users can perform computation graph optimization and memory planning on MLIR, with the final code generated from pre-written code templates. Currently, MegCC supports Arm64, Armv7, x86, RISC-V, and microcontroller platforms.

[Figure: MegCC architecture]

Usage Method and Effects

To complete model deployment using MegCC, only the following 3 steps are required:
  1. Model Compilation: Compile the MegEngine model to generate the corresponding kernels and an optimized model.
  2. Runtime Compilation: Compile the runtime together with the kernels generated in the previous step into a static library.
  3. Integration into the Application: Call the interface of the static library built in the previous step to run inference (see the sketch after the documentation link below).
For detailed instructions, see: https://github.com/MegEngine/MegCC/blob/main/doc/how-to-use-chinese.md
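
For steps 1 and 2, the MegCC release provides a model compiler tool (mgb-to-tinynn) and a runtime build script; their exact invocations are described in the how-to document linked above. For step 3, the following is a minimal sketch of what the integration code can look like. It assumes the runtime static library exposes a MegEngine Lite-style C API; the header path, the function names, and the model file name yolox_megcc.tiny are assumptions for illustration and may differ in an actual build.

```c
/* Minimal integration sketch for step 3.
 * Assumptions: the MegCC runtime static library exposes a MegEngine
 * Lite-style C API; the header path and model file name are illustrative. */
#include <stdio.h>
#include "lite-c/network_c.h"

int main(void) {
    LiteNetwork net;
    /* Create a network with the default config and I/O description. */
    if (LITE_make_network(&net, *default_config(), *default_network_io()) != 0) {
        fprintf(stderr, "failed to create network\n");
        return 1;
    }
    /* Load the optimized model produced in step 1 (file name is hypothetical). */
    if (LITE_load_model_from_path(net, "yolox_megcc.tiny") != 0) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    /* Fill the input data here, e.g. via LITE_get_io_tensor(). */

    LITE_forward(net); /* run inference using the pre-generated kernels */
    LITE_wait(net);    /* block until execution finishes */

    LITE_destroy_network(net);
    return 0;
}
```

When cross-compiling for the target platform, this file is linked against the static library produced in step 2.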
Taking the YOLOX model as an example, the result of running the generated inference program is shown below:

[Figure: YOLOX running with the inference program generated by MegCC]

As seen in the image, the inference program generated by MegCC maintains good inference performance (the model speed test result is 670 ms) while its size can be as small as 95 KB.

Future Plans

Currently, MegCC only supports MegEngine models as input. Models in other formats can first be converted to ONNX and then converted to the MegEngine model format with mgeconvert.
It is expected that within the next 2 months, MegCC will support more model formats for compilation. Additionally, the following advanced features are planned:
  • Support ONNX models as input
  • More kernel fusion
  • Support more backend devices
If you have any questions while using MegCC, feel free to raise an issue to let us know, and you are also welcome to submit PRs to help make MegCC better.

Exciting Preview

Chen Qiyou, the person in charge of MegEngine edge-side inference, will take part in the Deep Learning Framework Forum of the AI Basic Software Architecture Summit, held during DataFunSummit 2022 on November 19, 2022, and give a keynote speech titled “Achieving Ultra Lightweight High-Performance Inference on Edge with Model Compilation via MegCC”. The talk will analyze the current state of edge inference, especially the size of inference engines, explain in detail MegCC's innovative use of a model pre-compilation scheme, and showcase the appeal of MegCC as a next-generation AI model compiler.

Speech Outline:

1. Overview of the current state of edge inference, mainly focusing on the size of inference engines

2. Introduction of MegCC’s compiler scheme

3. How characteristics such as “ultra-lightweight, high performance, and strong scalability” are realized

4. Summary of MegCC’s current status, advantages, and future plans

Don’t miss out on this exciting event, and we look forward to seeing you at the online live broadcast~

Click “Read the original text” at the end of the article to sign up now
