Machine Heart reports
Stable Diffusion can now run on Raspberry Pi!
Key features of OnnxStream:

- The inference engine is decoupled from the WeightsProvider
- The WeightsProvider can be DiskNoCache, DiskPrefetch, or custom (a rough sketch follows this list)
- Attention slicing
- Dynamic quantization (8-bit unsigned, asymmetric, percentile-based; see the quantization sketch after this list)
- Static quantization (W8A8, unsigned, asymmetric, percentile-based)
- Easy calibration of quantized models
- FP16 support (with or without FP16 arithmetic)
- 24 ONNX operators implemented (the most commonly used ones)
- Operations are executed sequentially, but all operators are multithreaded
- A single implementation file plus a header file
- XNNPACK calls are wrapped in the XnnPack class (to allow future replacement)
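Decoupling the engine from the WeightsProvider means the whole model never has to be resident in RAM: the engine asks a provider object for each tensor's weights on demand, and the provider decides whether to stream them from disk, prefetch them, or fetch them from somewhere else entirely. The snippet below is only a minimal sketch of that idea; the interface, the method name get_weights, and the DiskNoCacheProvider class are assumptions made for illustration, not OnnxStream's actual API.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical interface: the engine requests each tensor's weights from a
// provider instead of loading the whole model into memory up front.
// Names are illustrative only, not OnnxStream's real API.
struct WeightsProvider {
    virtual ~WeightsProvider() = default;
    // Fill `dst` with `count` floats belonging to the named tensor.
    virtual void get_weights(const std::string& tensor_name,
                             float* dst, size_t count) = 0;
};

// A "DiskNoCache"-style provider: stream weights straight from disk on every
// request, so peak RAM stays close to the size of the largest single tensor.
struct DiskNoCacheProvider : WeightsProvider {
    std::string base_dir;
    explicit DiskNoCacheProvider(std::string dir) : base_dir(std::move(dir)) {}

    void get_weights(const std::string& tensor_name,
                     float* dst, size_t count) override {
        std::string path = base_dir + "/" + tensor_name + ".bin";
        if (FILE* f = std::fopen(path.c_str(), "rb")) {
            std::fread(dst, sizeof(float), count, f);
            std::fclose(f);
        }
    }
};
```

A DiskPrefetch-style provider would follow the same interface but read the next tensors on a background thread, trading a little RAM for lower latency.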
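Both quantization modes rely on unsigned 8-bit, asymmetric quantization with percentile-based range calibration. As a generic illustration of that technique (not OnnxStream's code), the helper below clips the observed value range at example percentiles (0.1% and 99.9%, assumed values here) before deriving the scale and zero point.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative percentile-based asymmetric uint8 quantization.
// The 0.1% / 99.9% clipping percentiles are example values, not necessarily
// the ones OnnxStream uses.
struct QuantParams { float scale; uint8_t zero_point; };

// `values` is assumed non-empty (e.g. calibration activations or weights).
QuantParams calibrate(std::vector<float> values,
                      float lo_pct = 0.001f, float hi_pct = 0.999f) {
    std::sort(values.begin(), values.end());
    float lo = values[static_cast<size_t>(lo_pct * (values.size() - 1))];
    float hi = values[static_cast<size_t>(hi_pct * (values.size() - 1))];
    float scale = (hi - lo) / 255.0f;
    if (scale == 0.0f) scale = 1.0f;  // degenerate range guard
    // Asymmetric: the zero point shifts the range so that `lo` maps to 0.
    long zp = std::lround(-lo / scale);
    return { scale, static_cast<uint8_t>(std::clamp(zp, 0L, 255L)) };
}

uint8_t quantize(float x, const QuantParams& q) {
    long v = std::lround(x / q.scale) + q.zero_point;
    return static_cast<uint8_t>(std::clamp(v, 0L, 255L));
}

float dequantize(uint8_t x, const QuantParams& q) {
    return (static_cast<int>(x) - q.zero_point) * q.scale;
}
```

Clipping at percentiles rather than the absolute min/max makes the 8-bit range less sensitive to a few outlier values, which is what "percentiles" refers to in the feature list above.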
Notes from the comparison with OnnxRuntime:

- In OnnxRuntime, the first run is a warm-up inference: its InferenceSession is created before the first run and reused for all subsequent runs. OnnxStream has no warm-up inference because its design is purely eager, although subsequent runs can still benefit from the operating system caching the weight files (see the OnnxRuntime sketch after these notes).
- OnnxStream currently does not support inputs with a batch size other than 1, unlike OnnxRuntime, which can significantly speed up the whole diffusion process by running the UNET model with batch size = 2.
- In the test, changing OnnxRuntime's SessionOptions (such as EnableCpuMemArena and ExecutionMode) had no significant impact on the results.
- In terms of memory consumption and inference time, OnnxRuntime performs very similarly to NCNN (another framework).
- Test environment: Windows Server 2019, 16 GB RAM, 8750H CPU (AVX2), 970 EVO Plus SSD, 8 virtual cores on VMware.
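The warm-up difference comes down to where the setup cost is paid. The minimal OnnxRuntime C++ sketch below (with a placeholder model path and the actual Run arguments elided) shows the pattern assumed in such a benchmark: the expensive step is constructing the Ort::Session once, after which every inference reuses it, and this is also where SessionOptions such as EnableCpuMemArena and the execution mode are set.

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "bench");

    // Session options mentioned in the notes above; toggling these had no
    // significant effect on the reported results.
    Ort::SessionOptions opts;
    opts.EnableCpuMemArena();                 // or DisableCpuMemArena()
    opts.SetExecutionMode(ORT_SEQUENTIAL);    // or ORT_PARALLEL

    // Creating the session loads and prepares the model: this is the
    // "warm-up" cost paid once, before the first run. "model.onnx" is a
    // placeholder path.
    Ort::Session session(env, ORT_TSTR("model.onnx"), opts);

    for (int i = 0; i < 3; ++i) {
        // session.Run(...) would go here with the model's actual input and
        // output names and tensors; run 0 is the warm-up, and later runs
        // reuse the already-initialized session.
    }
    return 0;
}
```

OnnxStream's eager design has no equivalent of this one-time session construction, so its first and subsequent runs differ only by whatever weight-file caching the operating system provides.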