Complete Guide to Developing a Speech Recognition System with Python

When Python Meets Audio Processing: Complete Tutorial for Developing a Speech Recognition System!

Hello everyone, I am Xiang Qian.

Today we will talk about how to develop a simple speech recognition system using Python.

This topic sounds impressive, but don’t worry: I will guide you step by step through this interesting project in a simple, understandable way.

Whether you are a Python beginner or a veteran, I believe you can learn something new from this.

Let’s start this colorful Python journey!

1. Preparation

We need to install some necessary libraries. Open your terminal or command prompt and enter the following command:

```bash
pip install SpeechRecognition pyaudio
```

Here we installed two main libraries:

SpeechRecognition: A high-level library for speech recognition

pyaudio: A library for recording audio

Once installed, we can start coding!
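
One thing that trips people up: the pip package name and the import name are not always the same (SpeechRecognition is imported as speech_recognition). Here is a minimal sketch for checking that both libraries can be found before going further. The helper name missing_modules is made up for this example; it only uses Python's standard importlib module.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip package names: SpeechRecognition -> speech_recognition
    missing = missing_modules(["speech_recognition", "pyaudio"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If something is reported as not installed, run the pip command above again and check that you are using the same Python environment.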

2. Implementing Recording Functionality

Let’s implement the recording functionality. This is the first step in speech recognition; after all, we need sound to recognize!

```python
import pyaudio
import wave


def record_audio(filename, duration=5, sample_rate=44100, chunk=1024, channels=2):
    audio = pyaudio.PyAudio()

    # Open the audio stream
    stream = audio.open(format=pyaudio.paInt16, channels=channels,
                        rate=sample_rate, input=True, frames_per_buffer=chunk)

    print("Recording...")
    frames = []

    # Record: read the stream one chunk at a time
    for i in range(0, int(sample_rate / chunk * duration)):
        data = stream.read(chunk)
        frames.append(data)

    print("Recording finished!")

    # Stop the audio stream
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # Save the audio file
    wf = wave.open(filename, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(audio.get_sample_size(pyaudio.paInt16))
    wf.setframerate(sample_rate)
    wf.writeframes(b''.join(frames))
    wf.close()


# Using the function
record_audio("test.wav", duration=5)
```

This code defines a record_audio function that can record audio for a specified duration and save it as a WAV file. We set some default parameters, such as a recording duration of 5 seconds and a sample rate of 44100Hz.
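
To confirm that a saved WAV file has the parameters you expect, you can read its header back with the standard wave module. The sketch below is only an illustration: describe_wav and write_test_tone are hypothetical helper names, and the test tone exists just so the check can run without a microphone.

```python
import math
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Read a WAV header and return its basic parameters."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def write_test_tone(path, duration=1.0, sample_rate=16000, frequency=440.0):
    """Write a mono 16-bit sine tone so the check can run without a microphone."""
    n_frames = int(duration * sample_rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * frequency * i / sample_rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples, like pyaudio.paInt16
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "tone_check.wav")
    write_test_tone(path)
    print(describe_wav(path))
```

Calling describe_wav("test.wav") on a file produced by record_audio should report the channel count and sample rate that were passed in.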

Tip: The sample rate sets the highest frequency the recording can capture, and therefore its quality. 44100Hz is CD quality, which is more than speech recognition needs; 16000Hz is usually sufficient.
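
To get a feel for what the sample rate costs, here is a small worked calculation. It assumes uncompressed 16-bit PCM (2 bytes per sample, as with pyaudio.paInt16); the helper names pcm_size_bytes and chunks_read are made up for this sketch.

```python
def pcm_size_bytes(duration, sample_rate, channels, sample_width=2):
    """Nominal size of uncompressed PCM audio in bytes (WAV header not included)."""
    return int(duration * sample_rate * channels * sample_width)


def chunks_read(duration, sample_rate, chunk=1024):
    """Number of chunks the recording loop reads: int(sample_rate / chunk * duration)."""
    return int(sample_rate / chunk * duration)


if __name__ == "__main__":
    # Defaults from record_audio: 5 s, 44100 Hz, stereo, 16-bit samples
    print(pcm_size_bytes(5, 44100, 2))  # 882000
    # Speech-oriented settings: 5 s, 16000 Hz, mono, 16-bit samples
    print(pcm_size_bytes(5, 16000, 1))  # 160000
    # The loop reads whole chunks only, so slightly less than 5 s is captured
    n = chunks_read(5, 44100)
    print(n, n * 1024 / 44100)
```

So five seconds at 16000Hz mono takes less than a fifth of the space of the default settings, which also means less data to upload for recognition.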

3. Implementing Speech Recognition

With the recording done, the next step is recognition! We will use the SpeechRecognition library to accomplish this task.

```python
import speech_recognition as sr


def transcribe_audio(filename):
    recognizer = sr.Recognizer()

    # Load the whole audio file into memory
    with sr.AudioFile(filename) as source:
        audio = recognizer.record(source)

    try:
        text = recognizer.recognize_google(audio, language="zh-CN")
        print(f"Recognition result: {text}")
        return text
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print(f"Could not get results from Google Speech Recognition service; {e}")
    return None


# Using the function
transcribe_audio("test.wav")
```

This transcribe_audio function reads the audio file we just recorded and uses Google’s speech recognition service to convert it to text. Note that we set the language to Chinese (“zh-CN”). If recognition fails, the function prints the error and returns None.

Note: Using Google’s speech recognition service requires an internet connection. If you are in mainland China, you may need to use a VPN or consider other speech recognition APIs.
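
If you do want to try more than one recognition service, one possible structure is a simple fallback chain: try each backend in order and keep the first result. This is only a sketch under my own assumptions. The function first_successful and the stub backends are made up for illustration, and each backend is assumed to be a callable that takes the audio and either returns text or raises an exception.

```python
def first_successful(backends, audio, errors=(Exception,)):
    """Try each (name, recognize) pair in order.

    Returns (name, text) from the first backend that succeeds,
    or (None, None) if every backend raises one of `errors`.
    """
    for name, recognize in backends:
        try:
            return name, recognize(audio)
        except errors as exc:
            print(f"{name} failed: {exc}")
    return None, None


if __name__ == "__main__":
    # Stand-in backends for illustration only; real ones would wrap library calls.
    def offline_stub(audio):
        raise RuntimeError("service unreachable")

    def echo_stub(audio):
        return f"transcript of {audio!r}"

    print(first_successful([("primary", offline_stub), ("backup", echo_stub)], "test.wav"))
```

With SpeechRecognition, the backends could be methods such as recognizer.recognize_google and, if the optional pocketsphinx package is installed, recognizer.recognize_sphinx for offline use, with errors set to (sr.UnknownValueError, sr.RequestError).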

4. Integrating Functions

Let’s combine the recording and recognition functions to create a simple voice assistant:

```python
def voice_assistant():
    while True:
        print("\nPlease say something (say 'exit' to end the program)")
        record_audio("temp.wav", duration=5)
        text = transcribe_audio("temp.wav")

        if text and "exit" in text:
            print("Goodbye!")
            break
        elif text:
            print(f"You said: {text}")
            # More logic can be added here, such as answering questions or executing commands
        else:
            print("Sorry, I didn't catch that. Please say it again.")


# Run the voice assistant
voice_assistant()
```

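This simple voice assistant repeatedly records five seconds of audio and recognizes what you say until you say “exit”. Note that the keyword check only works if the recognizer returns the word exactly as written: with the language set to “zh-CN”, you may need to change either the language code or the exit keyword to match the language you speak.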
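
As for the "more logic" mentioned in the code comment, one simple approach is a keyword-to-action table. The sketch below is just one way to do it: handle_command, EXAMPLE_COMMANDS, and the keywords and replies are all placeholders I made up, and the matching is the same plain substring check used for "exit".

```python
from datetime import datetime


def handle_command(text, commands):
    """Return the reply of the first command whose keyword appears in `text`,
    or None when no keyword matches."""
    lowered = text.lower()
    for keyword, action in commands.items():
        if keyword in lowered:
            return action()
    return None


# Hypothetical example commands; keywords and replies are placeholders.
EXAMPLE_COMMANDS = {
    "hello": lambda: "Hello! How can I help?",
    "time": lambda: datetime.now().strftime("It is %H:%M."),
}


if __name__ == "__main__":
    for phrase in ["Hello there", "What time is it", "Play some music"]:
        print(phrase, "->", handle_command(phrase, EXAMPLE_COMMANDS))
```

Inside the voice assistant, you could call handle_command(text, EXAMPLE_COMMANDS) in the branch that prints what you said and print the reply when it is not None. Keep in mind that substring matching is crude: "sometimes" would also trigger the "time" command.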

Summary

Today we learned how to create a basic speech recognition system using Python.

We used pyaudio for recording and SpeechRecognition for speech recognition.

This is just the tip of the iceberg in speech processing; you can try adding more features, such as:

1. Using different speech recognition APIs
2. Adding noise cancellation features
3. Implementing real-time speech recognition
4. Combining with natural language processing to understand and respond to user commands

Friends, today’s Python learning journey ends here!

Remember to practice by writing the code yourself, and feel free to ask Xiang Qian in the comments if you have any questions.

Wishing everyone a pleasant learning experience, may your Python skills soar!

