Mastering Voice Recognition on Raspberry Pi: A Simple Guide

Author: Digi-Key’s North American Editors

Voice assistants have quickly become an important product feature, thanks to popular smart voice-based products like the Amazon Echo and Google Home. While voice service providers offer APIs that spare developers from becoming experts in voice recognition and parsing, the work of combining audio hardware with voice processing software remains a significant hurdle.

In addition, projects that lack deep experience in each of the required disciplines, including acoustic design, audio engineering, and cloud-based services, can face serious delays.

To address these issues, vendors now offer complete voice assistant development kits that significantly simplify the problem. This article introduces two such kits, one from XMOS and another from Seeed Technology, which enable rapid development of customized products based on Amazon Alexa Voice Service (AVS) and Google Assistant, respectively. Each board connects to the Raspberry Pi Foundation's Raspberry Pi 3 (RPi 3).

The article shows how to get each kit up and running and how developers can use each one to quickly add voice assistant capability to custom designs.

Rapidly Building an AVS Prototype

With its Alexa-based smart speakers, Amazon brought smart voice assistant capabilities into the home, where previously they had largely been limited to smartphones. For developers, the release of the AVS API opened the door to using the same voice assistant functionality in custom system designs, but it still demanded deep expertise in audio hardware and software. Now, with the launch of the XMOS xCORE VocalFusion 4-Mic Kit for Amazon Alexa Voice Service (AVS), the last piece of the puzzle for implementing voice assistant functionality is in place.

The XMOS kit includes an XVF3000 processor board, a 100 mm linear array of four Infineon IM69D130 MEMS microphones, an xTAG debugger, an installation kit, and cables. Developers need to provide active speakers, a USB power supply, and a USB keyboard, mouse, monitor, and internet connection for the RPi 3. After using the installation kit to connect the XMOS board and microphone array to the RPi 3, developers can quickly evaluate the Amazon Alexa voice assistant (Figure 1).

Figure 1: Developers start working with the XMOS xCORE VocalFusion kit by inserting the provided microphone array board (far left) and XMOS processor board (middle) into the Raspberry Pi 3 board (right). (Image source: XMOS)

After connecting the RPi 3 to a USB keyboard, mouse, monitor, and internet service, the next step is to install the Raspbian operating system from a micro SD card, open a terminal on the RPi 3, and clone the XMOS VocalFusion repository. After installing the operating system and the repository, simply run the auto_install.sh script located in the cloned vocalfusion-avs-setup directory.

The installation script will configure the Raspberry Pi audio system and its connection to the xCORE VocalFusion kit, and install and configure the AVS Device SDK on the Raspberry Pi. This installation process may take about two hours to complete.

Once the installation is complete, developers need to perform a simple process to load their Amazon developer credentials and then start testing a variety of voice commands and built-in features. At this point, the XMOS kit will be able to demonstrate the full capabilities of Alexa, such as timers, alarms, and calendars, as well as third-party features built using the Alexa Skills Kit.

Inside the AVS Design Kit

While the setup steps are simple, the functionality of the hardware and software components in the XMOS kit is quite complex, and the kit provides developers with a comprehensive reference design for implementing custom designs. At the core of the XMOS kit is the XMOS XVF3000-TQ128 device, which offers high processing power (Figure 2).

Figure 2: The XMOS XVF3000-TQ128 device integrates two xCORE Tiles, each containing eight cores for high-performance audio processing. (Image source: XMOS)

This device is built for parallel processing tasks and contains two xCORE Tiles, each with eight 32-bit xCORE cores with integrated I/O, 256 KB of SRAM, and 8 KB of one-time programmable (OTP) on-chip memory. The xTIME scheduler manages the cores and triggers core operations from hardware events originating at the I/O pins. Each core can independently execute computation, signal processing, or control tasks, drawing on the 2 MB of flash memory integrated into the xCORE VocalFusion kit, which holds the code and data used to set up and run the kit.

In addition to the XVF3000-TQ128 device, the XMOS processor board requires only a few additional components (Figure 3). Beyond basic buffers and socket connections, the board includes a Cirrus Logic CS43L21 digital-to-analog converter (DAC) for generating output audio for external speakers. Finally, the baseboard also brings out the XVF3000-TQ128 device's I2C port, as well as an audio-optimized I2S digital audio interface.

Figure 3: The XMOS kit’s baseboard includes the XVF3000-TQ128 device, DAC, buffers, and sockets for connecting to the Raspberry Pi 3 board and external speakers. (Image source: XMOS)

The overall functionality of the kit is divided into two parts: audio processing on the XMOS board and higher-level voice processing services on the RPi 3 (Figure 4). The RPi 3's Broadcom quad-core processor runs software that analyzes the audio stream, performs wake word recognition, and handles interactions with Amazon AVS.

Figure 4: The XMOS VocalFusion kit separates the Alexa functionality on the baseboard and the Raspberry Pi 3 board, with the former used for audio signal processing and the latter for voice recognition and higher-level Alexa services. (Image source: XMOS)

The software installation process configures these subsystems and loads the required software packages, including Sensory’s speaker-independent wake word engine and AVS client software.

AVS provides a range of interfaces related to advanced functionalities like voice recognition, audio playback, and volume control. Operation is built around messages from AVS (directives) and messages from the client (events). For example, in response to certain conditions, AVS may send a directive to the client indicating that the client should play audio, set an alarm, or turn on a light. Conversely, events from the client notify AVS that something has occurred, such as a new voice request from the user.
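
As a rough illustration of this message flow (the field values below are placeholders and are not taken from the SDK), the following Python dictionaries sketch the general JSON shape of a SpeechSynthesizer.Speak directive arriving from AVS and a SpeechRecognizer.Recognize event sent by the client; the exact schema for each interface is defined in the AVS interface reference.

# Illustrative only: approximate JSON shapes of an AVS directive and a client event.
# Consult the AVS interface reference for the exact fields of each interface.

# Directive sent from AVS to the client: "speak this response to the user".
directive = {
    "directive": {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "Speak",
            "messageId": "example-message-id",
            "dialogRequestId": "example-dialog-id",
        },
        "payload": {
            "format": "AUDIO_MPEG",
            "url": "cid:example-audio-attachment",
        },
    }
}

# Event sent from the client to AVS: "the user has started a new voice request".
event = {
    "event": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "Recognize",
            "messageId": "example-message-id-2",
            "dialogRequestId": "example-dialog-id-2",
        },
        "payload": {
            "profile": "CLOSE_TALK",
            "format": "AUDIO_L16_RATE_16000_CHANNELS_1",
        },
    }
}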

Developers can use the AVS Device Software Development Kit (SDK) API and C++ software library to extend the functionality of their XMOS kit or custom XMOS designs. The AVS Device SDK abstracts low-level operations, such as audio input processing, communication, and AVS directive management, into a series of separate C++ classes and objects that developers can use or extend for custom applications (Figure 5).

Figure 5: The Amazon AVS Device SDK organizes the extensive functionality of AVS into separate functional areas, each with its own interface and library. (Image source: AWS)

The complete sample application included in the AVS Device SDK demonstrates key design patterns, including creating device clients and wake word interaction managers (Listing 1). In addition to the full set of sample service routines, this application shows how the main program only needs to instantiate the sample application object sampleApplication and start it with a simple command: sampleApplication->run().

/*
 * Creating the DefaultClient - this component serves as an out-of-box default object that instantiates and "glues"
 * together all the modules.
 */
std::shared_ptr<alexaClientSDK::defaultClient::DefaultClient> client =
    alexaClientSDK::defaultClient::DefaultClient::create(
        m_speakMediaPlayer,
        m_audioMediaPlayer,
        m_alertsMediaPlayer,
        speakSpeaker,
        audioSpeaker,
        alertsSpeaker,
        audioFactory,
        authDelegate,
        alertStorage,
        settingsStorage,
        {userInterfaceManager},
        {connectionObserver, userInterfaceManager});

// If wake word is enabled, then creating the interaction manager with a wake word audio provider.
auto interactionManager = std::make_shared<alexaClientSDK::sampleApp::InteractionManager>(
    client,
    micWrapper,
    userInterfaceManager,
    holdToTalkAudioProvider,
    tapToTalkAudioProvider,
    wakeWordAudioProvider);

client->addAlexaDialogStateObserver(interactionManager);

// Creating the input observer.
m_userInputManager = alexaClientSDK::sampleApp::UserInputManager::create(interactionManager);

void SampleApplication::run() {
    m_userInputManager->run();
}

Listing 1: Developers can use the AVS Device SDK C++ sample application to extend the device AVS client; it demonstrates key design patterns for creating AVS clients, wake word interaction managers, and user input managers. (Listing source: AWS)

Rapid Prototyping with Google Assistant

The XMOS kit accelerates the development of Amazon Alexa prototypes, while Seeed Technology's Google AIY Voice Kit helps developers build prototypes using Google Assistant. Like the XMOS AVS kit, the Seeed Google AIY Voice Kit works with the Raspberry Pi 3 board and provides the components needed to build a prototype (Figure 6).

Figure 6: Developers can use the Raspberry Pi 3 with Seeed Technology’s Google AIY Voice Kit (which provides the components needed to build prototypes) to quickly create Google Assistant applications. (Image source: Google)

In addition to the Seeed Voice HAT expansion board (1), microphone board (2), and speaker (4) shown in Figure 6, the kit also includes a cardboard shell (8) and internal frame (9), as well as basic components including standoffs (3), cables (6 and 7), and a pushbutton (5).

Developers first connect the RPi 3, speaker wires, and microphone cables to the Voice HAT, then assemble the kit. Unlike the AVS kit, the Google kit provides a simple shell and internal frame to secure the circuit board components and speakers (Figure 7).

Figure 7: The Seeed Google AIY Voice Kit includes an internal cardboard frame, which developers fold into a carrier for the circuit board components. (Image source: Seeed Technology)

The frame is then installed inside the shell, which holds the button and microphone board, completing the assembly (Figure 8).

Figure 8: In addition to securing the internal frame and speaker, the Seeed Google AIY Voice Kit’s shell also includes buttons and microphones (which appear as two holes at the top of the shell). (Image source: Seeed Technology)

After downloading the voice kit image and loading it onto the SD card, developers simply insert the SD card into the RPi 3 and power on the board to boot the kit. After a brief initialization process confirms that each component is functioning properly, developers need to activate the services on the Google Cloud side. To do this, they set up a sandbox working area, enable the Google Assistant API, and create and download authentication credentials.

Finally, developers need to open a terminal console on the RPi 3 and execute the Python script assistant_library_demo.py to launch Google Assistant on the kit. At this point, they can freely explore the full functionality of Google Assistant.

Customizing Google Assistant Development

Using the Seeed Google AIY Voice Kit for custom development allows developers to fully leverage the flexibility of the Raspberry Pi. The Seeed Voice HAT exposes multiple GPIOs already configured for typical IO functions on the RPi 3 (Figure 9).

Figure 9: Developers can quickly expand the hardware capabilities of the Seeed Google AIY Voice Kit using the I/O ports exposed on the Seeed Voice HAT expansion board. (Image source: Raspberry Pi)
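
As a simple illustration of this kind of hardware expansion, the following minimal Python sketch uses the standard gpiozero library to blink an external LED wired to a spare Voice HAT pin. The BCM pin number (17) is an assumption made for the example; check the Voice HAT pinout for the pin actually routed to the connector you use.

# Minimal sketch: blink an external LED wired to a spare Voice HAT GPIO.
# BCM pin 17 is an assumption for illustration; verify the actual pin
# against the Voice HAT pinout before wiring anything.
from time import sleep

from gpiozero import LED

external_led = LED(17)

for _ in range(5):
    external_led.on()
    sleep(0.5)
    external_led.off()
    sleep(0.5)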

On the software side, developers can easily expand the baseline functionality of the kit using Google’s Voice Kit API software. In addition to supporting software and utilities, the software package also contains sample application software that demonstrates various ways to implement voice services using the Google Cloud Speech API and Google Assistant SDK.

Cloud voice services are fundamentally different from smart assistant approaches, providing voice recognition functionality while leaving the implementation of specific voice-activated operations to the programmer. For designs that only require voice input functionality, this service offers a straightforward solution. Developers simply pass audio to the cloud voice service, which converts the speech to text and returns the recognized text, as demonstrated in the sample Python script included in the Voice Kit API (Listing 2).

import aiy.audio
import aiy.cloudspeech
import aiy.voicehat

def main():
    # Create the Cloud Speech recognizer and hint the phrases it should expect.
    recognizer = aiy.cloudspeech.get_recognizer()
    recognizer.expect_phrase('turn off the light')
    recognizer.expect_phrase('turn on the light')
    recognizer.expect_phrase('blink')
    button = aiy.voicehat.get_button()
    led = aiy.voicehat.get_led()
    aiy.audio.get_recorder().start()
    while True:
        print('Press the button and speak')
        button.wait_for_press()
        print('Listening...')
        text = recognizer.recognize()
        if not text:
            print('Sorry, I did not hear you.')
        else:
            print('You said "', text, '"')
            # Map the recognized text to LED actions.
            if 'turn on the light' in text:
                led.set_state(aiy.voicehat.LED.ON)
            elif 'turn off the light' in text:
                led.set_state(aiy.voicehat.LED.OFF)
            elif 'blink' in text:
                led.set_state(aiy.voicehat.LED.BLINK)
            elif 'goodbye' in text:
                break

if __name__ == '__main__':
    main()

Listing 2: The sample program in the Google Voice Kit API software shows how to use the Google Cloud Speech service to convert speech to text, leaving the implementation of any voice-directed operations to the programmer. (Listing source: Google)

For developers needing broader functionalities of Google Assistant, the Google Assistant SDK offers two implementation options: Google Assistant Library and Google Assistant Service.

The Python-based Google Assistant Library provides a quick way to implement Google Assistant in prototypes, such as in the Seeed Voice Kit. Using this approach, prototypes can instantly leverage basic Google Assistant services, including audio capture, dialogue management, and timers.

In contrast to the Cloud Speech method, the Google Assistant Library manages dialogues by treating each conversation as a series of events related to dialogue and speaking states. Once voice recognition is complete, the instantiated assistant object provides event objects that include the recognition results. As demonstrated in another Google sample script, developers use a familiar event-handling design pattern and a series of if/else statements to handle the expected events (Listing 3).

import subprocess
import sys

import aiy.assistant.auth_helpers
import aiy.audio
import aiy.voicehat
from google.assistant.library import Assistant
from google.assistant.library.event import EventType

def power_off_pi():
    aiy.audio.say('Good bye!')
    subprocess.call('sudo shutdown now', shell=True)

def reboot_pi():
    aiy.audio.say('See you in a bit!')
    subprocess.call('sudo reboot', shell=True)

def say_ip():
    ip_address = subprocess.check_output("hostname -I | cut -d' ' -f1", shell=True)
    aiy.audio.say('My IP address is %s' % ip_address.decode('utf-8'))

def process_event(assistant, event):
    status_ui = aiy.voicehat.get_status_ui()
    if event.type == EventType.ON_START_FINISHED:
        status_ui.status('ready')
        if sys.stdout.isatty():
            print('Say "OK, Google" then speak, or press Ctrl+C to quit...')
    elif event.type == EventType.ON_CONVERSATION_TURN_STARTED:
        status_ui.status('listening')
    elif event.type == EventType.ON_RECOGNIZING_SPEECH_FINISHED and event.args:
        print('You said:', event.args['text'])
        text = event.args['text'].lower()
        if text == 'power off':
            assistant.stop_conversation()
            power_off_pi()
        elif text == 'reboot':
            assistant.stop_conversation()
            reboot_pi()
        elif text == 'ip address':
            assistant.stop_conversation()
            say_ip()
    elif event.type == EventType.ON_END_OF_UTTERANCE:
        status_ui.status('thinking')
    elif event.type == EventType.ON_CONVERSATION_TURN_FINISHED:
        status_ui.status('ready')

def main():
    credentials = aiy.assistant.auth_helpers.get_assistant_credentials()
    with Assistant(credentials) as assistant:
        for event in assistant.start():
            process_event(assistant, event)

if __name__ == '__main__':
    main()

Listing 3: The main loop in an application using the Google Assistant Library starts an assistant object and generates a series of events that are handled by the developer's code. (Listing source: Google)

For higher customization needs, developers can turn to the full set of interfaces provided by the Google Assistant Service (formerly known as the Google Assistant gRPC API). Google Assistant Service is based on Google RPC (gRPC), allowing developers to transmit audio queries to the cloud, process the recognized speech text, and handle the corresponding responses. To achieve custom functionality, developers can use various programming languages (including C++, Node.js, and Java) to access the Google Assistant Service API.
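
Python bindings are also available through the google-assistant-grpc package. The condensed sketch below is patterned after Google's published samples for the v1alpha2 API: it sends a single text query over an authorized gRPC channel and prints any display text returned. The credentials path, device_id, and device_model_id are placeholders that must come from the developer's own project and device registration, and the exact message fields should be verified against the current SDK.

# Condensed, illustrative sketch of a text query through the Google Assistant
# Service gRPC API (v1alpha2). Placeholder values must be replaced with your
# own credentials file and registered device identifiers.
import json

import google.auth.transport.grpc
import google.auth.transport.requests
import google.oauth2.credentials
from google.assistant.embedded.v1alpha2 import (
    embedded_assistant_pb2,
    embedded_assistant_pb2_grpc,
)

# Load the OAuth2 credentials saved during device registration.
with open('/home/pi/credentials.json', 'r') as f:
    credentials = google.oauth2.credentials.Credentials(token=None, **json.load(f))
http_request = google.auth.transport.requests.Request()
credentials.refresh(http_request)

# Open an authorized gRPC channel to the Assistant service.
channel = google.auth.transport.grpc.secure_authorized_channel(
    credentials, http_request, 'embeddedassistant.googleapis.com')
assistant = embedded_assistant_pb2_grpc.EmbeddedAssistantStub(channel)

# Build a single request carrying a text query instead of streamed audio.
config = embedded_assistant_pb2.AssistConfig(
    audio_out_config=embedded_assistant_pb2.AudioOutConfig(
        encoding='LINEAR16', sample_rate_hertz=16000, volume_percentage=0),
    dialog_state_in=embedded_assistant_pb2.DialogStateIn(
        language_code='en-US', conversation_state=b''),
    device_config=embedded_assistant_pb2.DeviceConfig(
        device_id='my-device-id', device_model_id='my-device-model-id'),
    text_query='what time is it')

# Stream the request and print any text the Assistant returns.
for resp in assistant.Assist(
        iter([embedded_assistant_pb2.AssistRequest(config=config)]), 60):
    if resp.dialog_state_out.supplemental_display_text:
        print(resp.dialog_state_out.supplemental_display_text)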

When using the Google Assistant SDK in their designs, designers can use Google's device matching feature to implement hardware-specific functionality. As part of device setup, developers provide information about the custom device, including its capabilities and characteristics, known as traits. For user voice requests involving the custom device, the service identifies the valid traits of the device and generates appropriate responses for it (Figure 10). Developers only need to include the corresponding code for the device traits in the device's event handler (e.g., def power_off_pi() in Listing 3).

Figure 10: Google Assistant SDK uses automatic speech recognition (ASR) and natural language processing (NLP) services to match user requests with specific devices and issue responses consistent with the custom device and its recognized traits. (Image source: Google)
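
Following the same pattern as Listing 3, a hypothetical handler could map an additional recognized phrase to the LED on the Voice HAT. The phrases and helper function below are illustrative additions, not part of Google's sample; in a real design the corresponding traits would be declared during device registration.

# Illustrative extension of Listing 3: handle extra spoken commands by
# driving the Voice HAT LED. Call this from process_event() when new
# recognized text arrives.
import aiy.voicehat

def handle_light_command(assistant, text):
    led = aiy.voicehat.get_led()
    if text == 'turn on the light':
        assistant.stop_conversation()  # take over handling from the Assistant
        led.set_state(aiy.voicehat.LED.ON)
    elif text == 'turn off the light':
        assistant.stop_conversation()
        led.set_state(aiy.voicehat.LED.OFF)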

Conclusion

In the past, smart voice assistants were largely unattainable for mainstream developers. With these two off-the-shelf kits, developers can quickly implement Amazon Alexa and Google Assistant in custom designs. Each kit allows developers to quickly call up the respective smart assistant in basic prototypes or expand their designs with custom hardware and software.

Original text: https://www.digikey.com.cn/zh/articles/techzone/2018/feb/rapid-prototyping-smart-voice-assistant-raspberry-pi
