Do you want your ESP32 to understand voice commands and hold natural conversations? The ESP32_AI_LLM project was built for exactly this purpose. It combines the ESP32 chip with iFLYTEK’s Spark model, the Doubao model (streaming invocation), and the Tongyi Qianwen model to provide voice dialogue chat. It supports online voice wake-up, continuous conversation, and music playback, and it can drive an external display that shows the conversation content in real time.
Project Features
• Multi-Model Support: You can choose from iFLYTEK’s Spark model, Doubao model, and Tongyi Qianwen model to meet different needs.
• Voice Interaction: Supports online voice wake-up, recognizes user voice commands, and engages in natural, smooth conversations.
• Continuous Conversation: Supports continuous conversation functionality, making dialogues more natural and human-like.
• Music Playback: Supports music playback functionality, allowing you to enjoy music anytime, anywhere.
• Screen Display: Connects to an external display to show conversation content in real-time, making it easier for users to follow the conversation process.
Function Overview
1. Voice Wake-Up Function
After the device starts and connects to the internet, it will enter standby mode, activating recording and connecting to iFLYTEK’s STT service for wake word recognition. You simply need to say the set wake word to wake up the ESP32 and start the conversation.
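As a rough illustration of the wake word check, the sketch below assumes the STT service returns each transcript as plain text and that a case-insensitive substring match is enough. The function name `containsWakeWord` and the sample wake word are hypothetical; the project's actual matching logic may differ.

```cpp
#include <string>

// Lowercase ASCII letters only, so the comparison ignores case.
// Other bytes (for example UTF-8 encoded Chinese text) pass through unchanged.
static std::string toLowerAscii(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}

// Returns true when the transcript contains the configured wake word.
// An empty wake word never matches, so a missing setting cannot wake the device.
bool containsWakeWord(const std::string& transcript, const std::string& wakeWord) {
    if (wakeWord.empty()) {
        return false;
    }
    return toLowerAscii(transcript).find(toLowerAscii(wakeWord)) != std::string::npos;
}
```

In standby mode, a check like this would run on every transcript that comes back from the STT service, and a match would switch the device from standby into the dialogue flow.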
2. Voice Dialogue Function
By using voice wake-up or pressing the onboard boot button, you can start a conversation. The ESP32 will send your voice input to iFLYTEK’s STT service for voice recognition, then send the recognized result to the selected large model, receive the model’s reply, and finally convert the reply content into speech for playback.
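The dialogue flow described above (speech to text, text to model, reply to speech) can be sketched as a three-stage pipeline. This is a simplified model under stated assumptions, not the project's code: `DialogueBackends` and `runDialogueTurn` are hypothetical names, and the three callables stand in for the iFLYTEK STT request, the large model request, and the speech playback.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical bundle of the three services one dialogue turn depends on.
struct DialogueBackends {
    std::function<std::string(const std::vector<int16_t>&)> speechToText;  // audio -> text
    std::function<std::string(const std::string&)> askModel;               // text -> reply
    std::function<void(const std::string&)> speak;                         // reply -> audio out
};

// Runs one turn: recorded audio -> recognized text -> model reply -> playback.
// Returns the reply, or an empty string when nothing was recognized.
std::string runDialogueTurn(const DialogueBackends& backends,
                            const std::vector<int16_t>& audio) {
    const std::string question = backends.speechToText(audio);
    if (question.empty()) {
        return "";  // nothing recognized, so skip the model call
    }
    const std::string reply = backends.askModel(question);
    if (!reply.empty()) {
        backends.speak(reply);
    }
    return reply;
}
```

Keeping the three stages behind callables makes it straightforward to swap the model backend without touching the rest of the turn logic.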
3. Convenient Network Configuration Function
The ESP32 supports automatic network connection. If it fails to connect, the ESP32 will start AP mode, creating a temporary network hotspot, making it easy for you to configure the network via your phone or computer.
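The connect-then-fall-back behaviour can be sketched as follows. The names `NetActions` and `connectOrStartAp` and the retry count are hypothetical. On the device, the two callables would presumably wrap Arduino-ESP32 calls such as `WiFi.begin()` with a `WiFi.status()` poll, and `WiFi.softAP()`; whether the project structures it this way is an assumption.

```cpp
#include <functional>

enum class NetMode { Station, AccessPoint };

// Hypothetical hooks for the two network operations.
struct NetActions {
    std::function<bool()> tryConnect;        // returns true once the saved network is joined
    std::function<void()> startAccessPoint;  // opens the temporary configuration hotspot
};

// Tries the saved network up to maxAttempts times, then falls back to AP mode.
NetMode connectOrStartAp(const NetActions& net, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (net.tryConnect()) {
            return NetMode::Station;
        }
    }
    net.startAccessPoint();
    return NetMode::AccessPoint;
}
```

When the function returns `NetMode::AccessPoint`, the device would serve its configuration page over the hotspot until new credentials are saved.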
4. Music Playback Function
The project utilizes NetEase Cloud’s music server to play your favorite music. You can add and delete music information through the configuration web interface.
5. Volume Adjustment and Light Switch Function
Through voice commands, you can adjust the volume and display, as well as turn the LED lights on and off.
6. Music Pause and Resume Commands
You can use voice commands to pause and resume music playback.
7. Large Model Switching Commands
You can switch between Doubao, Spark, and Tongyi Qianwen large models using voice commands.
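The voice commands in items 5 to 7 can be modelled as a keyword lookup on the recognized text. The sketch below is illustrative only: the enum, the function name `parseCommand`, and the English trigger phrases are all hypothetical, and matching is case-sensitive for simplicity. The project's real phrases are not listed in this article.

```cpp
#include <string>

enum class VoiceCommand {
    None, VolumeUp, VolumeDown, LightOn, LightOff,
    PauseMusic, ResumeMusic, UseDoubao, UseSpark, UseTongyi
};

// Maps recognized text to a command by checking for a trigger phrase.
// Returns VoiceCommand::None so the text can be forwarded to the model instead.
VoiceCommand parseCommand(const std::string& text) {
    struct Rule {
        const char* phrase;    // hypothetical trigger phrase
        VoiceCommand command;  // command issued when the phrase appears
    };
    static const Rule rules[] = {
        {"volume up", VoiceCommand::VolumeUp},
        {"volume down", VoiceCommand::VolumeDown},
        {"turn on the light", VoiceCommand::LightOn},
        {"turn off the light", VoiceCommand::LightOff},
        {"pause the music", VoiceCommand::PauseMusic},
        {"resume the music", VoiceCommand::ResumeMusic},
        {"switch to doubao", VoiceCommand::UseDoubao},
        {"switch to spark", VoiceCommand::UseSpark},
        {"switch to tongyi", VoiceCommand::UseTongyi},
    };
    for (const Rule& rule : rules) {
        if (text.find(rule.phrase) != std::string::npos) {
            return rule.command;
        }
    }
    return VoiceCommand::None;
}
```

Checking for commands before calling the model keeps local actions such as volume changes fast, since they need no round trip to the large model service.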
Project Deployment Tutorial
1. Install VSCode and the PlatformIO plugin.
2. Enable the required iFLYTEK services (optional: enable the Doubao large model service).
3. Clone the project to your local machine, open the entire folder in VSCode, and wait for the dependency libraries to download.
4. Find the User_Setup.h file in the .pio\libdeps\upesy_wroom\TFT_eSPI path and delete it, then move the User_Setup.h file from the project root directory into that folder.
5. Fill in the required iFLYTEK account parameters in main.cpp (optional: fill in the Doubao large model parameters).
6. Install the ESP32 driver.
7. Compile and upload.
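For step 5, a small guard can catch placeholder credentials before the firmware tries to connect. This sketch assumes the iFLYTEK account parameters are an APPID, an APIKey, and an APISecret, and that unfilled values are empty or start with `YOUR_`. The struct and function names are hypothetical and are not taken from the project's main.cpp.

```cpp
#include <string>

// Hypothetical container for the iFLYTEK account parameters.
struct IflytekCredentials {
    std::string appId;
    std::string apiKey;
    std::string apiSecret;
};

// Returns true only when every field has been replaced with a real value.
// A field counts as unfilled if it is empty or still starts with "YOUR_".
bool credentialsFilledIn(const IflytekCredentials& c) {
    auto filled = [](const std::string& value) {
        return !value.empty() && value.rfind("YOUR_", 0) != 0;
    };
    return filled(c.appId) && filled(c.apiKey) && filled(c.apiSecret);
}
```

Calling such a check at startup and printing a clear message on failure is usually easier to diagnose than an authentication error from the STT service.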
Conclusion
The ESP32_AI_LLM project gives the ESP32 conversational abilities: it can understand your voice commands, hold natural conversations, play music, and adjust the volume. The project is easy to deploy, feature-rich, and suitable for application scenarios such as smart homes and robot control.
Project Address: https://github.com/Explorerlowi/ESP32_AI_LLM