Harnessing Raspberry Pi for AI Models: Running Phi-2, Mistral, and LLaVA


Have you ever thought about running your own large language model (LLM) or vision-language model (VLM) on your own device? The idea is appealing, but the prospect of setting everything up from scratch, managing the environment, downloading the right model weights, and wondering whether your device can even handle the models may make you hesitate.
Let's take it a step further. Imagine running your own LLM or VLM on a credit card-sized device like a Raspberry Pi. Impossible? Not at all: after all, I am writing this post, so it is clearly possible.
Possible, yes. But why do it?
Right now, running LLMs on edge devices may seem far-fetched. However, this niche use case should mature over time, and we will certainly see some cool edge solutions built on fully local generative AI running on edge devices.
It is also about exploring the limits of what is possible. If it can be done at this extreme end of the computational scale, then it can be done at any level between a Raspberry Pi and large, powerful server GPUs.
Traditionally, edge AI has been closely linked to computer vision. Exploring the deployment of LLMs and VLMs at the edge adds an exciting dimension to this emerging field.
Most importantly, I just want to do something fun with my recently purchased Raspberry Pi 5.
So, how do we achieve all this on Raspberry Pi? Using Ollama!
What is Ollama?
Ollama: https://ollama.com/
Ollama has become one of the best solutions for running LLMs locally on a personal computer without the hassle of setting things up from scratch. With just a few commands, everything is ready to go. In my experience, it is completely self-sufficient and works perfectly across multiple devices and models. It even exposes a REST API for model inference, so you can run it on your Raspberry Pi and call it from your other applications and devices (if you wish).
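As a quick taste of that REST API, here is a minimal sketch of querying a model with curl once Ollama is running (it assumes the phi model has already been pulled, which we do later in this article):

# Ask phi a question via Ollama's REST API; stream is disabled to get one JSON response
curl http://localhost:11434/api/generate -d '{
  "model": "phi",
  "prompt": "Why is the sky blue?",
  "stream": false
}'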
There is also the Ollama Web UI, a beautiful interface with an excellent user experience that runs seamlessly on top of Ollama, perfect for those who are uncomfortable with command-line interfaces. Essentially, it is a local ChatGPT interface.
Ollama Web UI: https://github.com/open-webui/open-webui
These two pieces of open-source software together provide what I believe is currently the best locally hosted LLM experience.
Ollama and Ollama Web UI also support VLMs like LLaVA, opening up more doors for edge generative AI use cases.

Technical Requirements

You only need the following devices:
  • Raspberry Pi 5 (or a slower Raspberry Pi 4) — choose the 8GB RAM version to accommodate 7B models.

  • SD card — at least 16GB; the larger the capacity, the more models you can store. Preloaded with a suitable operating system such as Raspberry Pi OS (Bookworm) or Ubuntu.

  • Internet connection

As I mentioned earlier, running Ollama on a Raspberry Pi is already at the extreme end of the hardware spectrum. Theoretically, any device more powerful than a Raspberry Pi (as long as it runs a Linux distribution and has similar memory capacity) should be able to run Ollama and the models discussed in this article.

1. Installing Ollama

To install Ollama on a Raspberry Pi, we will avoid using Docker to save resources.
Run in the terminal
curl -fsSL https://ollama.com/install.sh | sh
After running the command above, you should see output similar to the screenshot below.
[Screenshot: Ollama installation output]
As the output says, go to 0.0.0.0:11434 to verify that Ollama is running. Since we are using a Raspberry Pi, it is normal to see "WARNING: No NVIDIA GPU detected. Ollama will run in CPU-only mode." However, if you see this warning on a device that should have an NVIDIA GPU, there might be an issue.
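You can also check from the terminal; the server responds to a plain HTTP request on its port:

# Should print "Ollama is running" when the server is up
curl http://localhost:11434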
If you encounter any issues or need updates, please refer to the Ollama GitHub repository.
Ollama GitHub repository: https://github.com/ollama/ollama/tree/main

2. Running LLMs from the Command Line

Check the official Ollama model library for a list of models that can be run with Ollama. On an 8GB Raspberry Pi, models larger than 7B will not fit. Let's use Phi-2, a 2.7B LLM released by Microsoft under the MIT license.
Ollama model library: https://ollama.com/library
We will use the default Phi-2 model, but you can freely use other tags found here (https://ollama.com/library/phi/tags) as well. Check the model page for Phi-2 (https://ollama.com/library/phi) to learn how to interact with it.
Run in the terminal
ollama run phi
Once you see output similar to the one below, you have successfully run an LLM on your Raspberry Pi! It’s that simple.
[Screenshot: output after running ollama run phi]
Here is an interaction with Phi-2 2.7B. Obviously, you won’t get the same output, but you should get the idea.
[Screenshot: an interaction with Phi-2 2.7B]
You can try other models like Mistral, Llama-2, etc., just make sure you have enough space on the SD card to store the model weights.
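A few standard Ollama commands are handy for managing the weights on that SD card:

# Download a model's weights without starting a chat session
ollama pull mistral
# List the models you have downloaded and their sizes
ollama list
# Remove a model to free up space
ollama rm mistral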
Of course, the larger the model, the slower the output speed. On Phi-2 2.7B, I can get about 4 tokens per second. However, using Mistral 7B, the generation speed drops to about 2 tokens per second. One token roughly corresponds to one word.
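If you want to measure the speed on your own device rather than take my word for it, Ollama can report timing statistics. A minimal example:

# --verbose prints statistics, including the eval rate in tokens per second, after each reply
ollama run phi --verbose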
Here is an interaction with Mistral 7B:
[Screenshot: an interaction with Mistral 7B]
We now have LLMs running on the Raspberry Pi, but we are not done yet, because the terminal is not for everyone. Let's get the Ollama Web UI running as well!

3. Installing and Running Ollama Web UI

We will follow the instructions on the official Ollama Web UI GitHub repository (https://github.com/open-webui/open-webui) to install it without Docker. It recommends Node.js version 20.10 or later, so we will follow that recommendation. It also requires Python 3.11 or later, which Raspberry Pi OS (Bookworm) has preinstalled for us.
First, we need to install Node.js. Run in the terminal
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - && sudo apt-get install -y nodejs
For future readers, you can change 20.x to a more appropriate version if needed.
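Before continuing, it is worth confirming that both runtimes meet the recommended versions:

# Should print v20.x or later
node -v
# Should print 3.11 or later (preinstalled on Raspberry Pi OS Bookworm)
python3 --version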
Then run the code block below.
git clone https://github.com/ollama-webui/ollama-webui.git
cd ollama-webui/

# Copying required .env file
cp -RPp example.env .env

# Building Frontend Using Node
npm i
npm run build

# Serving Frontend with the Backend
cd ./backend
pip install -r requirements.txt --break-system-packages
sh start.sh
This is a slight modification of what is provided on GitHub. Note that, for brevity and convenience, we did not follow best practices such as using a virtual environment, and we used the --break-system-packages flag. If you encounter errors such as uvicorn not being found, restart the terminal session.
If all goes well, you should be able to access the Ollama Web UI at http://0.0.0.0:8080 on your Raspberry Pi, or at http://<your-pi-ip-address>:8080 from another device on the same network.
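If you are unsure of your Raspberry Pi's IP address, you can print it from the terminal:

# Prints the Pi's IP address(es) on the local network
hostname -I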
After creating an account and logging in, you should see an image similar to the one below.
[Screenshot: Ollama Web UI after logging in]
If you previously downloaded some model weights, you should see them in the dropdown menu below. If not, you can go to settings to download models.
[Screenshot: downloaded models appear in this dropdown]
If you want to download new models, go to Settings > Models to pull them.
[Screenshot: pulling models under Settings > Models]
The entire interface is very clean and intuitive, so I won’t elaborate too much. It is indeed an outstanding open-source project.
Here is an example of an interaction with Mistral 7B through the Ollama Web UI:
[Screenshot: chatting with Mistral 7B in the Ollama Web UI]

4. Running Vision-Language Models (VLMs) through Ollama Web UI

As I mentioned at the beginning of this article, we can also run VLMs. Let's run LLaVA, a popular open-source VLM that is also supported by Ollama. To do this, pull "llava" through the interface to download its weights.
Unfortunately, unlike text-only LLMs, this setup takes quite a long time on the Raspberry Pi to interpret images; the example below took about six minutes to process. Most of that time is likely because the image-processing side has not yet been properly optimized, which will surely change in the future. Token generation speed is about 2 tokens per second.

[Screenshot: LLaVA describing an uploaded image in the Ollama Web UI]
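If you would rather call LLaVA from the terminal or from another application, the same REST API accepts base64-encoded images. Here is a minimal sketch; photo.jpg is a stand-in for any image file on the Pi, and you can also fetch the weights from the command line with ollama pull llava:

# Encode the image as a single-line base64 string (GNU coreutils base64)
IMG_B64=$(base64 -w 0 photo.jpg)
# Send the prompt and image to LLaVA; expect a long wait on the Pi
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Describe this image.\",
  \"stream\": false,
  \"images\": [\"$IMG_B64\"]
}"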

Conclusion

At this point, we have basically achieved the goals of this article. To recap, we have successfully run LLMs and VLMs such as Phi-2, Mistral, and LLaVA on Raspberry Pi using Ollama and Ollama Web UI.
I can easily imagine several use cases for hosting local LLMs on Raspberry Pi (or other small edge devices), especially if we use models like Phi-2, where a speed of 4 tokens per second seems acceptable for streaming in some use cases.
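For those streaming use cases, note that the REST API streams by default: omit the "stream": false field and each token arrives as its own JSON line, so an application can display text as it is generated:

# Without "stream": false, tokens stream back one JSON object per line
curl http://localhost:11434/api/generate -d '{
  "model": "phi",
  "prompt": "Write a haiku about edge computing."
}'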
The field of "small" LLMs and VLMs (a somewhat contradictory label, given the "large" in the name) is an active research area, and quite a few models have been released recently. I hope this emerging trend continues, and that more efficient and compact models keep being released! It is definitely a space worth watching in the coming months.
Disclaimer: The author has no affiliation with Ollama or Ollama Web UI. All opinions are the author’s personal views and do not represent any organization.

If you would like to ask about Raspberry Pi standard products or Raspberry Pi industrial products, feel free to contact us!

1. Scan the QR code below to add Engineer Yang for a consultation.

[QR code: Engineer Yang's contact]

2. Send us your contact information via private message, and we will get in touch with you as soon as possible.
3. Visit the website of Shanghai Jingheng, an official Raspberry Pi distributor: https://www.edatec.cn/cn

We will update regularly!

Follow Raspberry Pi developers!

Learn more about Raspberry Pi related content!
