Java Edge AI Inference: Deploying TensorFlow Lite on Raspberry Pi

To be honest, when I first encountered edge AI, I completely went in the wrong direction. I thought that simply shrinking the model would allow it to run, but I ended up hitting a lot of pitfalls. At first, I was dazzled by those flashy Python demos, but as an old Java programmer, I still wanted to solve the problem using Java, as it is my comfort zone.

Last year, we worked on a factory equipment anomaly detection project where the client required the AI model to run directly on-site on a Raspberry Pi, without allowing data to be uploaded to the cloud (due to privacy policy restrictions). This forced me to study the TensorFlow Lite Java API. To be honest, the official documentation reads like a foreign language, and many examples are in Python and C++, with very few Java examples available.

Why Choose TensorFlow Lite Over Other Frameworks?

I personally think TensorFlow Lite is currently the most reliable choice for edge devices, especially for us Java developers. DL4J is a pure Java deep learning framework, but its performance on resource-constrained devices is noticeably worse; and PyTorch Mobile's Java support was not very mature at the time (it may be better now, but I'm not sure).

The TFLite model files (.tflite) are small in size, consume less memory at runtime, and are optimized for ARM processors, making them suitable for devices like the Raspberry Pi. However, the catch is that you must first train the model on your computer, convert it to the tflite format, and then deploy it to the Raspberry Pi.

Environment Configuration

First, you need to ensure that the Raspberry Pi has the appropriate version of the JDK. We used the Raspberry Pi 4B with 4GB of RAM, installed a 64-bit system (this is very important!) and OpenJDK 11. Why not use Java 17? To be honest, there were some legacy libraries in the project that were not compatible with 17, so we compromised.

Next, you need to add the TensorFlow Java dependency. If you are using Maven:

<dependency>
    <groupId>org.tensorflow</groupId>
    <artifactId>tensorflow-lite</artifactId>
    <version>2.9.0</version>
    <!-- TODO: Upgrade to the latest version, this one is a bit outdated -->
</dependency>

But there is a big pitfall here! The default dependency does not include the native libraries for the ARM architecture; you need to specifically find the version that includes arm64-v8a. I spent a whole day figuring out that the official Java library is actually incomplete, and in the end, I compiled the native library myself to get it working. However, it seems that there is now an AAR package that can be used directly, but I don’t remember the details clearly.

Loading the Model Pitfalls

Next is the most critical part: loading the tflite model and executing inference. The basic code looks like this:

import org.tensorflow.lite.Interpreter;
import java.io.IOException;

// Assume the model has already been loaded into a buffer (see loadModelFile() below)
try (Interpreter interpreter = new Interpreter(loadModelFile())) {
    // inputSize/outputSize must match the model's input and output tensor shapes
    float[][] input = new float[1][inputSize];
    // Fill the input array...
    float[][] output = new float[1][outputSize];
    interpreter.run(input, output);
    // Process the output results...
} catch (IOException e) {
    // Exception handling...
}
// Note: try-with-resources closes the Interpreter and releases its native memory;
// without it you must remember to call interpreter.close() yourself

It sounds pretty simple, but don’t be fooled by appearances. There are several pitfalls that I initially overlooked:

First, the model loading path issue. If you are using a Java application packaged as a jar, don’t assume you can read it directly with File; the resource path will cause issues. We ended up using this solution:

private MappedByteBuffer loadModelFile() throws IOException {
    File file = new File("models/my_model.tflite"); // External path is more controllable
    // try-with-resources closes the stream; the mapped buffer stays valid afterwards
    try (FileInputStream inputStream = new FileInputStream(file);
         FileChannel fileChannel = inputStream.getChannel()) {
        return fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
    }
}
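
If you do want to keep the model inside the jar, something like the following sketch also works in principle (not the approach we used; loadModelFromClasspath and the resource name are just illustrative): read the classpath resource into a direct ByteBuffer, which the Interpreter constructor accepts as well.

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: load the .tflite model from the classpath into a direct ByteBuffer
private ByteBuffer loadModelFromClasspath(String resourceName) throws IOException {
    try (InputStream in = getClass().getResourceAsStream(resourceName)) {
        if (in == null) {
            throw new IOException("Model resource not found: " + resourceName);
        }
        byte[] bytes = in.readAllBytes();
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length)
                .order(ByteOrder.nativeOrder());
        buffer.put(bytes);
        buffer.rewind();
        return buffer;
    }
}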

The second pitfall is data preprocessing. If your model was trained in Python (which is 99% of the time), the input format is probably not what you expect. For example, our image classification model requires the input to be normalized to floats in [-1, 1], not the integers in [0, 255] that I initially assumed. This detail is barely mentioned in the official documentation, and it cost me several days of debugging.
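
To make that concrete, here is roughly what the conversion looks like in Java. This is only a sketch: it assumes pixels is an int[] of ARGB values (e.g. from BufferedImage.getRGB) and that the model was trained with the common (x - 127.5) / 127.5 scaling; check how your own model was actually trained.

// Convert ARGB pixels to a flat float array normalized to [-1, 1], channel order RGB
float[] input = new float[pixels.length * 3];
int i = 0;
for (int p : pixels) {
    int r = (p >> 16) & 0xFF;
    int g = (p >> 8) & 0xFF;
    int b = p & 0xFF;
    input[i++] = (r - 127.5f) / 127.5f;
    input[i++] = (g - 127.5f) / 127.5f;
    input[i++] = (b - 127.5f) / 127.5f;
}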

Performance Issues and Optimization

On devices like the Raspberry Pi, performance is a significant issue. Initially, our model inference took over 3 seconds, which completely failed to meet real-time requirements. Later, we made several optimizations:

  • Enabling the GPU delegate (but be careful, not all ops are supported on the GPU)
  • Model quantization (converting floats to integers sped it up 3-4 times)
  • Batch processing instead of running inference one image at a time

The key is to choose the right number of threads. Initially, we thought with a 4-core Raspberry Pi, we should set 4 threads, but it turned out to be slower because resource contention caused overhead. In the end, setting it to 2 threads yielded the best results.
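
For reference, the thread count is set through Interpreter.Options; a minimal sketch of that configuration looks like this:

Interpreter.Options options = new Interpreter.Options();
options.setNumThreads(2); // 4 threads was actually slower for us due to contention
try (Interpreter interpreter = new Interpreter(loadModelFile(), options)) {
    // run inference as shown earlier...
} catch (IOException e) {
    // Exception handling...
}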

Additionally, we found that some image preprocessing operations implemented in Java were particularly slow, so we switched to using JNI to call the C++ implementation of OpenCV, which improved the speed by more than 5 times. To be honest, for some compute-intensive tasks, Java is not the best choice.
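
The C++ side is out of scope here, but the Java-facing part of such a JNI bridge is just a native method declaration. A sketch of the idea (the library and method names below are made up for illustration, not our actual code):

public final class NativePreprocessor {
    static {
        // Loads libimgpreproc.so, a hand-rolled JNI wrapper around the OpenCV C++ routines
        System.loadLibrary("imgpreproc");
    }

    // Resize + normalize on the native side; returns the flat float tensor for the model
    public static native float[] preprocess(byte[] rgbBytes, int width, int height);
}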

Actual Deployment and Maintenance

As for deployment: we ended up packaging everything into a fat jar and managing the process with a systemd service, so it starts automatically after a system reboot.
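
The unit file is nothing fancy; something along these lines (the paths, user, memory limit, and service name are placeholders, adjust them for your setup):

[Unit]
Description=Edge AI inference service
After=network.target

[Service]
User=pi
ExecStart=/usr/bin/java -Xmx512m -jar /opt/edge-ai/app.jar
Restart=on-failure

[Install]
WantedBy=multi-user.target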

Monitoring is also very important. The Raspberry Pi has limited resources, and once there is a memory leak or sustained high CPU load, it can easily crash. We wrote a simple JMX monitoring tool, combined with Prometheus and Grafana for remote monitoring.
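
The JMX side of such a tool can be as simple as polling the standard MemoryMXBean. A rough sketch (not our actual tool; the class name and sampling interval are illustrative, and the Prometheus export is omitted):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapMonitor {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memoryBean.getHeapMemoryUsage();
            System.out.printf("heap used=%d MB, max=%d MB%n",
                    heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}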

After using it for a while, we found that temperature was a significant issue! The Raspberry Pi can easily overheat under high load, so we had to add a small fan and implement a simple load control in the software to reduce the inference frequency when the temperature exceeds 70°C.
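
The temperature check itself is straightforward. A sketch of the idea, assuming Raspberry Pi OS, where /sys/class/thermal/thermal_zone0/temp reports millidegrees Celsius (backoffMillis is a placeholder for however long you want to throttle):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Read the SoC temperature in degrees Celsius from sysfs
private static double readCpuTemperature() throws IOException {
    String raw = Files.readString(Path.of("/sys/class/thermal/thermal_zone0/temp")).trim();
    return Integer.parseInt(raw) / 1000.0;
}

// Inside the inference loop: back off when the SoC runs hot
if (readCpuTemperature() > 70.0) {
    Thread.sleep(backoffMillis); // reduce inference frequency until it cools down
}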

In Summary

To be honest, using Java for AI inference on the Raspberry Pi is not a particularly mainstream choice; most people still use Python. However, for teams that already have a Java tech stack, this is indeed a viable solution.

Officially, the recommended route is Python, but after years in the JVM ecosystem I personally find Java more convenient. Moreover, Java's type safety and engineering tooling are genuinely appealing in large projects, and more comfortable than chasing Python's runtime TypeErrors.

Finally, if you really want to use this solution in a production environment, it’s best to consider backup and remote update mechanisms. At that time, to update the model, we specifically wrote a small service that could remotely push new models to the device, saving the need to send someone to the site each time.

If you have better Java edge AI solutions, feel free to let me know; I am also learning. After all, this field is developing too quickly, and there might be better frameworks available tomorrow.

If you think this is good, please give me a thumbs up!
