As more applications need to talk to their users, text-to-speech technology is becoming increasingly important. Many online services can convert text to speech, but they typically require an internet connection, API keys, and often a paid subscription. The Python library pyttsx3 offers a fully offline alternative: it converts text to speech using the speech engine that is already available on your machine.
A Lifesaver in Special Situations
A few years ago, while participating in the development of an educational software project for visually impaired individuals, we encountered a tricky problem. The software needed to be deployed in remote schools that might not have a stable internet connection, yet it had to provide high-quality voice feedback. Initially, we tried several cloud-based TTS (text-to-speech) services, but they performed poorly in unstable network environments.
As the project deadline approached, I discovered the pyttsx3 library. This niche but powerful library became a turning point for the project—it required no internet connection, had minimal dependencies, was easy to set up, and could run on various operating systems. Most surprisingly, it supported multiple voices and languages, which was perfect for our multilingual educational content.
After a weekend of integration work, our application gained reliable offline speech synthesis capabilities, ultimately receiving positive feedback from our target user group during testing. Since then, whenever I need to quickly implement speech synthesis without relying on cloud services, pyttsx3 has become my go-to tool.
Installation and Configuration
pyttsx3 is a thin wrapper around the speech engine that each operating system provides:
- Uses SAPI5 on Windows
- Uses NSSpeechSynthesizer on macOS
- Uses espeak on Linux
On Windows and macOS the engine ships with the system, so installing the pyttsx3 library is usually enough. On Linux, espeak has to be installed separately (see the installation steps below).
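The platform-to-engine mapping above can be sketched as a small helper. The driver names `sapi5`, `nsss`, and `espeak` are the ones pyttsx3 accepts for `init(driverName=...)`; the helper names `default_driver_name` and `init_engine` are made up for this example. In normal use you can simply call `pyttsx3.init()` and let the library choose.

```python
import sys


def default_driver_name(platform_name=None):
    """Return the pyttsx3 driver name that matches a sys.platform value."""
    platform_name = platform_name or sys.platform
    if platform_name.startswith("win"):
        return "sapi5"   # Windows Speech API 5
    if platform_name == "darwin":
        return "nsss"    # NSSpeechSynthesizer on macOS
    return "espeak"      # everything else, including Linux


def init_engine():
    """Create an engine with the driver chosen explicitly (illustrative only)."""
    import pyttsx3  # imported here so the helper above works without pyttsx3
    return pyttsx3.init(driverName=default_driver_name())
```

Naming the driver explicitly is mostly useful for debugging, for example to confirm which backend a failing machine is trying to load.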
Installation Steps
pip install pyttsx3
Looks simple, right? But on some systems, you may need additional dependencies:
Windows Users: Generally, no additional installation is required, as Windows comes with SAPI5.
macOS Users: You may need to install the following dependency:
pip install pyobjc
Linux Users: You need to install espeak, for example, on Debian-based systems:
sudo apt-get install espeak
Common Installation Issues and Solutions
- 1. ImportError: No module named win32com.client
Solution:
pip install pywin32
- 2. No module named 'drivers' error on Linux
Solution:
pip install pyttsx3==2.71
Some newer versions have compatibility issues on Linux; pinning this specific version can resolve them.
- 3. NSSpellCheckerSpellingService error on macOS
Solution: Ensure you have the latest version of pyobjc installed:
pip install --upgrade pyobjc
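When diagnosing problems like these, it can help to see exactly which exception initialization raises. The following is a minimal sketch, not part of pyttsx3: `try_init` is a hypothetical helper that wraps `pyttsx3.init()` and returns the error as text instead of crashing. The optional `init` argument exists only so the helper can be tried out without a speech engine.

```python
def try_init(init=None):
    """Return (engine, None) on success or (None, error_message) on failure."""
    if init is None:
        try:
            import pyttsx3
        except ImportError as exc:
            return None, f"pyttsx3 is not installed: {exc}"
        init = pyttsx3.init
    try:
        return init(), None
    except Exception as exc:  # driver errors differ from platform to platform
        return None, f"{type(exc).__name__}: {exc}"
```

A typical call would be `engine, error = try_init()`, followed by printing `error` if `engine` is `None`. The message usually points at the missing dependency (pywin32, pyobjc, or espeak).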
Basic Configuration Example
Here is a simple configuration example to ensure the speech synthesis engine works properly:
import pyttsx3
# Initialize the engine
engine = pyttsx3.init()
# Test speech
engine.say("pyttsx3 test successful, the speech synthesis engine is working properly.")
engine.runAndWait()
Basic Usage and Core Features
The design philosophy of pyttsx3 is straightforward—you can complete text-to-speech conversion in just a few lines of code. Let’s explore its basic features through a series of examples.
Basic Speech Synthesis
import pyttsx3
# Initialize the engine
engine = pyttsx3.init()
# Single sentence synthesis
engine.say("Hello, this is speech generated by pyttsx3.")
engine.runAndWait()
# Multiple sentence synthesis
engine.say("This is the first sentence.")
engine.say("This is the second sentence.")
engine.runAndWait()
The runAndWait() method is crucial: it blocks the program until all queued speech has finished playing. Without this call, say() only queues the text, so your program will continue executing but you won't hear any sound.
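To see the blocking behavior for yourself, you can time the call. This is a small sketch; `speak_and_time` is a hypothetical helper name, and it only relies on the `say()` and `runAndWait()` methods shown above.

```python
import time


def speak_and_time(engine, text):
    """Queue text, block until it has been spoken, and return elapsed seconds."""
    engine.say(text)              # only queues the utterance
    start = time.perf_counter()
    engine.runAndWait()           # blocks until the queue is empty
    return time.perf_counter() - start
```

With a real engine, `speak_and_time(pyttsx3.init(), "Hello")` should return roughly the time it takes to speak the text, which shows that the calling thread waits during playback.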
Adjusting Voice Properties
You can easily adjust the speech rate, the volume, and the voice:
import pyttsx3
engine = pyttsx3.init()
# Get current speech rate
rate = engine.getProperty('rate')
print(f"Current speech rate: {rate}")
# Set new speech rate (default is usually 200)
engine.setProperty('rate', 150) # Decrease speech rate
# Get volume
volume = engine.getProperty('volume')
print(f"Current volume: {volume}")
# Set volume (0.0 to 1.0)
engine.setProperty('volume', 0.8) # 80% volume
# Get available voices
voices = engine.getProperty('voices')
for idx, voice in enumerate(voices):
    print(f"Voice #{idx}:")
    print(f" - ID: {voice.id}")
    print(f" - Name: {voice.name}")
    print(f" - Languages: {voice.languages}")
    print(f" - Gender: {voice.gender}")
    print(f" - Age: {voice.age}")
# Select the second voice (usually female on Windows)
if len(voices) > 1:
    engine.setProperty('voice', voices[1].id)
engine.say("This is an example using modified voice properties.")
engine.runAndWait()
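Picking a voice by index is fragile because the order of installed voices differs between machines. A possible alternative, sketched below, is to search by a keyword in the voice name or ID. `find_voice` and `use_voice` are hypothetical helpers; they rely only on the `name` and `id` attributes and the `voices` / `voice` properties used above.

```python
def find_voice(voices, keyword):
    """Return the first voice whose name or id contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    for voice in voices:
        name = (getattr(voice, "name", "") or "").lower()
        voice_id = (getattr(voice, "id", "") or "").lower()
        if keyword in name or keyword in voice_id:
            return voice
    return None


def use_voice(engine, keyword):
    """Switch the engine to a matching voice; return True if one was found."""
    voice = find_voice(engine.getProperty("voices"), keyword)
    if voice is None:
        return False
    engine.setProperty("voice", voice.id)
    return True
```

For example, `use_voice(engine, "zira")` would select a voice whose name contains "Zira" if one is installed, and leave the current voice unchanged otherwise.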
Saving Speech to File
In addition to directly playing speech, pyttsx3 also allows you to save speech as an audio file:
import pyttsx3
engine = pyttsx3.init()
# Set the file to save
engine.save_to_file("This text will be saved as an audio file.", "output.mp3")
engine.runAndWait()
Note: The file format depends on what the underlying engine supports, not on the extension you choose. On Windows, the output is usually WAV data even if you specify the .mp3 extension, so a .wav filename is the safer choice there.
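One way to avoid mislabeled files is to normalize the extension before saving. The sketch below assumes the driver writes WAV data, as described above for Windows; other platforms may produce a different format. `as_wav_path` and `save_speech` are hypothetical helper names.

```python
from pathlib import Path


def as_wav_path(path):
    """Return path with its extension replaced by .wav (added if missing)."""
    return str(Path(path).with_suffix(".wav"))


def save_speech(engine, text, path):
    """Save text as speech and return the path that was actually written."""
    wav_path = as_wav_path(path)
    engine.save_to_file(text, wav_path)
    engine.runAndWait()           # the file is written while the queue runs
    return wav_path
```

Calling `save_speech(engine, "Hello", "output.mp3")` would therefore write to `output.wav`.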
Advanced Techniques and Optimization Methods
Once you have mastered the basics, we can explore some advanced techniques to make pyttsx3 even more powerful in practical applications.
Creating Speech Callbacks
pyttsx3 supports callback functions, allowing you to track the start and end of speech:
import pyttsx3
def onStart(name):
    print(f"Started speaking: {name}")
def onWord(name, location, length):
    print(f"Word: {name}, Location: {location}, Length: {length}")
def onEnd(name, completed):
    print(f"Finished speaking: {name}, Completion status: {completed}")
engine = pyttsx3.init()
# Connect callbacks
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord) # Note: Not all engines support this callback
engine.connect('finished-utterance', onEnd)
engine.say("This is an example with callbacks.")
engine.runAndWait()
These callbacks are very useful, especially when you need to display speech progress in a UI or perform actions after specific sentences finish.
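As an illustration of the UI use case, the sketch below turns the `started-word` callback into a percentage. `SpeechProgress` and `speak_with_progress` are hypothetical names. The sketch assumes `location` is a character offset into the text and, as noted in the example above, that the engine actually fires `started-word` (not all engines do). `engine.connect()` returns a token that can be passed to `engine.disconnect()` afterwards.

```python
class SpeechProgress:
    """Track how much of a text has been spoken, as a percentage."""

    def __init__(self, text):
        self.text = text
        self.percent = 0.0

    def on_word(self, name, location, length):
        # Handler for 'started-word': estimate progress from the character offset
        if self.text:
            spoken = min(location + length, len(self.text))
            self.percent = 100.0 * spoken / len(self.text)

    def on_end(self, name, completed):
        # Handler for 'finished-utterance'
        if completed:
            self.percent = 100.0


def speak_with_progress(engine, text):
    """Speak text while tracking progress; return the final percentage."""
    progress = SpeechProgress(text)
    tokens = [
        engine.connect("started-word", progress.on_word),
        engine.connect("finished-utterance", progress.on_end),
    ]
    engine.say(text)
    engine.runAndWait()
    for token in tokens:
        engine.disconnect(token)  # avoid stacking handlers on a reused engine
    return progress.percent
```

A UI could poll `progress.percent` or update a progress bar directly inside `on_word`.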
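Running Speech in a Separate Thread
Because runAndWait() blocks, a common pattern is to run speech synthesis in its own thread so the main thread can keep working. Thread behavior can differ between platform drivers, so test this on your target system: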
import pyttsx3
import threading
import time
def text_to_speech_thread(text):
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
    print("Speech thread completed")
# Create a thread to handle speech synthesis
speech_thread = threading.Thread(target=text_to_speech_thread, args=("This speech is generated in a separate thread.",))
speech_thread.start()
# Main thread continues to execute other tasks
for i in range(5):
    print(f"Main thread working... {i}")
    time.sleep(0.5)
speech_thread.join()
print("All threads completed")
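If the application speaks more than once, starting a new thread for every message can become awkward. One possible variation, sketched below, keeps a single worker thread that owns the engine and reads texts from a queue. `speech_worker` and `start_speech_worker` are hypothetical names, and the `engine_factory` parameter is an assumption added so the sketch can be exercised without a real speech engine.

```python
import queue
import threading


def speech_worker(jobs, engine_factory):
    """Speak queued texts one by one until a None sentinel arrives."""
    engine = engine_factory()     # create the engine inside the worker thread
    while True:
        text = jobs.get()
        if text is None:          # sentinel: stop the worker
            break
        engine.say(text)
        engine.runAndWait()


def start_speech_worker(engine_factory=None):
    """Start the worker thread and return (jobs_queue, thread)."""
    if engine_factory is None:
        import pyttsx3            # deferred so the sketch imports without it
        engine_factory = pyttsx3.init
    jobs = queue.Queue()
    thread = threading.Thread(
        target=speech_worker, args=(jobs, engine_factory), daemon=True
    )
    thread.start()
    return jobs, thread
```

Usage would look like `jobs, worker = start_speech_worker()`, then `jobs.put("Hello")` from any thread, and finally `jobs.put(None)` followed by `worker.join()` to shut down.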
Optimizing Long Text Processing
When dealing with long texts, passing the entire article directly to the say() method may cause delays or performance issues. A better approach is to split the text into smaller paragraphs:
import pyttsx3
import re
def speak_long_text(text, engine):
    # Split text into paragraphs
    paragraphs = re.split(r'\n\s*\n', text)
    for para in paragraphs:
        if para.strip(): # Skip empty paragraphs
            # Further split long paragraphs into sentences
            sentences = re.split(r'(?<=[.!?])\s+', para)
            for sentence in sentences:
                if sentence.strip():
                    engine.say(sentence)
                    print(f"Reading: {sentence}")
                    engine.runAndWait()
# Example long text
long_text = """
Artificial intelligence is changing our world. From self-driving cars to smart voice assistants, AI technology is everywhere.
In the coming years, we will see more innovative applications. Medical diagnosis, financial analysis, and personalized education will all benefit from the development of AI.
However, we also need to consider the ethical issues and social impacts brought by AI.
"""
engine = pyttsx3.init()
speak_long_text(long_text, engine)
The advantages of this method are:
- 1. Users can hear the first sentence sooner without waiting for the entire text to be processed.
- 2. There are natural pauses between sentences.
- 3. If needed, you can cleanly stop between sentences.
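The third point, stopping between sentences, can be sketched with a `threading.Event` that another thread (for example a "Stop" button handler) sets. `split_sentences` and `speak_until_stopped` are hypothetical helper names; the sentence pattern is the same regular expression used in the example above.

```python
import re
import threading


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def speak_until_stopped(engine, text, stop_event):
    """Speak sentence by sentence; return how many sentences were spoken."""
    spoken = 0
    for sentence in split_sentences(text):
        if stop_event.is_set():   # checked only between sentences
            break
        engine.say(sentence)
        engine.runAndWait()
        spoken += 1
    return spoken
```

A caller would create `stop_event = threading.Event()`, pass it in, and call `stop_event.set()` from elsewhere. The sentence currently being spoken still finishes, which is what makes the stop "clean".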
Performance Optimization Tips
- 1. Preload the Engine: If your application will use TTS functionality multiple times, initialize the engine once and reuse it instead of creating a new instance every time speech is needed.
import pyttsx3
# Global engine instance
tts_engine = None
def get_tts_engine():
    global tts_engine
    if tts_engine is None:
        tts_engine = pyttsx3.init()
        # Set default properties
        tts_engine.setProperty('rate', 150)
        tts_engine.setProperty('volume', 0.8)
    return tts_engine
# When using
engine = get_tts_engine()
engine.say("Using the preloaded engine")
engine.runAndWait()
- 2. Batch Processing: If you have multiple short texts to convert to speech, add them to the queue in bulk instead of executing them one by one:
import pyttsx3
messages = ["First message", "Second message", "Third message"]
engine = pyttsx3.init()
for msg in messages:
    engine.say(msg)
# Execute all speech at once
engine.runAndWait()
Application Scenarios and Possibilities
The offline characteristics and simple interface of pyttsx3 make it an ideal choice for various application scenarios. Below are some of the most promising use cases:
Assistive Technology Applications
Screen readers or assistive applications developed for visually impaired individuals are ideal use cases for pyttsx3. These applications require reliable voice output without relying on internet connections:
import pyttsx3
class SimpleScreenReader:
    def __init__(self):
        self.engine = pyttsx3.init()
        # Set a slightly slower speech rate for clarity
        self.engine.setProperty('rate', 150)
    def read_text(self, text):
        self.engine.say(text)
        self.engine.runAndWait()
    def read_file(self, filename):
        try:
            with open(filename, 'r', encoding='utf-8') as file:
                content = file.read()
            print(f"Reading file: {filename}")
            self.read_text(content)
        except Exception as e:
            error_msg = f"Unable to read file: {str(e)}"
            print(error_msg)
            self.read_text(error_msg)
# Usage example
reader = SimpleScreenReader()
reader.read_text("Welcome to the simple screen reader.")
# reader.read_file("sample.txt")
Educational Tools
Language learning applications and educational software can use pyttsx3 to provide pronunciation examples:
import pyttsx3
import random
import time
class LanguageTutor:
    def __init__(self):
        self.engine = pyttsx3.init()
        # Get all available voices
        voices = self.engine.getProperty('voices')
        # Set voice (choose a suitable voice)
        if len(voices) > 0:
            self.engine.setProperty('voice', voices[0].id)
    def practice_vocabulary(self, words_dict):
        """Use speech to assist vocabulary practice"""
        items = list(words_dict.items())
        if not items: # Nothing to practice; also avoids dividing by zero below
            print("No words to practice.")
            return
        random.shuffle(items)
        score = 0
        total = len(items)
        for word, meaning in items:
            print(f"\nWord: {word}")
            self.engine.say(word) # Read the word
            self.engine.runAndWait()
            time.sleep(0.5) # Give user time to think
            user_answer = input("Please enter the meaning of this word: ")
            if user_answer.lower() == meaning.lower():
                print("✓ Correct!")
                score += 1
            else:
                print(f"✗ Incorrect. The correct answer is: {meaning}")
            # Read the word again to reinforce memory
            self.engine.say(word)
            self.engine.runAndWait()
            time.sleep(1) # Short pause before the next word
        print(f"\nPractice completed! Score: {score}/{total} ({score/total*100:.1f}%)")
        result_speech = f"Practice completed. Your score is {score} out of {total}."
        self.engine.say(result_speech)
        self.engine.runAndWait()
# Usage example
if __name__ == "__main__":
    tutor = LanguageTutor()
    vocabulary = {
        "apple": "苹果",
        "banana": "香蕉",
        "computer": "电脑",
        "database": "数据库",
        "engineer": "工程师"
    }
    print("==== Vocabulary Practice ====")
    print("After hearing the word, enter its Chinese meaning")
    tutor.practice_vocabulary(vocabulary)
Automation and IoT Applications
When high-quality speech is not required, pyttsx3 is very suitable for adding voice notifications to automation scripts or IoT devices:
import pyttsx3
import psutil
import time
class SystemMonitor:
    def __init__(self, threshold=80):
        self.engine = pyttsx3.init()
        self.threshold = threshold # CPU usage threshold
    def monitor_cpu(self, duration=60, interval=5):
        """Monitor CPU usage and provide voice alerts"""
        end_time = time.time() + duration
        print(f"Starting to monitor CPU usage, threshold: {self.threshold}%")
        self.speak(f"Starting to monitor system CPU usage, warning threshold set to {self.threshold}%")
        while time.time() < end_time:
            cpu_percent = psutil.cpu_percent(interval=interval)
            print(f"Current CPU usage: {cpu_percent}%")
            if cpu_percent > self.threshold:
                warning = f"Warning! CPU usage has reached {cpu_percent}%, exceeding the threshold!"
                print("\033[91m" + warning + "\033[0m") # Red output
                self.speak(warning)
            time.sleep(1) # Short pause to avoid too frequent checks
        self.speak("CPU monitoring completed")
    def speak(self, text):
        """Convert text to speech"""
        self.engine.say(text)
        self.engine.runAndWait()
# Usage example
if __name__ == "__main__":
    monitor = SystemMonitor(threshold=70)
    monitor.monitor_cpu(duration=30, interval=2) # Monitor for 30 seconds, check every 2 seconds
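With a monitor like this, a sustained load would trigger a spoken warning on every check, which quickly becomes noise. One possible refinement, sketched here with a hypothetical `AlertThrottle` class, is to allow an alert only after a cooldown period has passed. The `clock` parameter is an assumption added so the class can be tested without waiting.

```python
import time


class AlertThrottle:
    """Allow an alert only if `cooldown` seconds have passed since the last one."""

    def __init__(self, cooldown=30.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self._last = None         # time of the last allowed alert

    def ready(self):
        """Return True (and start a new cooldown) if an alert may be spoken now."""
        now = self.clock()
        if self._last is None or now - self._last >= self.cooldown:
            self._last = now
            return True
        return False
```

Inside a monitoring loop, the spoken warning would then be guarded with something like `if cpu_percent > threshold and throttle.ready():`, while the printed log line can stay unconditional.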
Content Creation Tools
Create audio versions of blogs, articles, or other text content:
import pyttsx3
import os
import time
class ContentAudioConverter:
    def __init__(self, output_dir="audio_content"):
        self.engine = pyttsx3.init()
        # Set a slower speech rate for clarity
        self.engine.setProperty('rate', 150)
        # Ensure output directory exists
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)
    def text_to_audio(self, text, filename=None):
        """Convert text to audio file"""
        if not filename:
            # Use timestamp as default filename
            filename = f"audio_{int(time.time())}.wav"
        output_path = os.path.join(self.output_dir, filename)
        print(f"Generating audio: {output_path}")
        # Save as audio file
        self.engine.save_to_file(text, output_path)
        self.engine.runAndWait()
        print(f"Audio generation completed: {output_path}")
        return output_path
# Usage example
if __name__ == "__main__":
    converter = ContentAudioConverter()
    article = """
Python is a widely used interpreted, high-level programming language.
Its design philosophy emphasizes code readability and simplicity of syntax.
Python supports multiple programming paradigms, including object-oriented, imperative, functional, and procedural programming.
"""
    audio_file = converter.text_to_audio(article, "python_intro.wav")
    print(f"You can find the generated audio here: {audio_file}")
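When converting many articles, timestamp filenames are hard to tell apart. A small, optional addition is to derive the filename from the article title. `audio_filename` below is a hypothetical helper, and the `.wav` default is an assumption that matches the example above.

```python
import re


def audio_filename(title, extension=".wav"):
    """Build a filesystem-friendly filename from an article title."""
    # Lowercase, then collapse every run of other characters into one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return (slug or "audio") + extension
```

The result can be passed as the `filename` argument of a converter method such as `text_to_audio`. Note that this simple pattern keeps only ASCII letters and digits, so a title written entirely in another script falls back to `audio.wav`.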
Pros and Cons Analysis and Usage Recommendations
After years of use and exploration, I have a comprehensive understanding of pyttsx3. Below are the main pros and cons of this library along with usage recommendations.
Advantages
- 1. Completely Offline Operation: No internet connection is required, suitable for offline environments.
- 2. No API Key Required: No need to register an account or obtain an API key.
- 3. Cross-Platform Compatibility: Supports Windows, macOS, and Linux.
- 4. Simple and Easy to Use: The API is straightforward, and basic functionality can be achieved in just a few lines of code.
- 5. No Usage Limits: There are no quotas or usage limits.
- 6. Responsive: Local processing allows for fast speech synthesis.
Disadvantages
- 1. Limited Voice Quality: Compared to commercial TTS services, the naturalness of the voice is lower.
- 2. Limited Languages and Voices: Supported languages and voices depend on the operating system.
- 3. Relatively Simple Functionality: Lacks advanced features such as emotional expression and pitch control.
- 4. Dependent on Operating System: Voice effects may vary significantly across different systems.
- 5. Installation Complexity on Some Platforms: Especially on Linux, additional configuration may be required.
- 6. Inadequate Chinese Support: On some systems, Chinese pronunciation may not be natural enough.
Usage Recommendations
- 1. Choose Appropriate Scenarios:
  - Suitable for: Offline applications, rapid prototyping, personal projects, assistive tools, system notifications
  - Not suitable for: Commercial products requiring high-quality speech, professional reading applications, multilingual internationalization applications
- 2. Tune the Output:
  - Adjust speech rate and volume for optimal results (default speech rate is usually too fast)
  - Split long texts into smaller segments for processing
  - Choose appropriate voices for different types of content
  - Consider adding appropriate pauses between sentences
- 3. Consider Alternatives When Needed:
  - If higher quality speech is needed and internet requirements are acceptable, consider Google Text-to-Speech, Amazon Polly, or other commercial TTS services
  - For more advanced offline needs, consider using open-source neural network TTS systems like Mozilla TTS
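The suggestion about pauses between sentences can be sketched as follows. `speak_with_pauses` is a hypothetical helper, the 0.3 second default is an arbitrary starting point rather than a recommended value, and the `sleep` parameter is an assumption added so the function can be tested without actually waiting.

```python
import time


def speak_with_pauses(engine, sentences, pause=0.3, sleep=time.sleep):
    """Speak each sentence separately, waiting `pause` seconds in between."""
    for index, sentence in enumerate(sentences):
        if index > 0:
            sleep(pause)          # silent gap before every sentence but the first
        engine.say(sentence)
        engine.runAndWait()
```

Because each sentence is spoken with its own `runAndWait()` call, the pause length can be tuned independently of the speech rate.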
Conclusion and Outlook
pyttsx3 is a hidden gem in the Python ecosystem. While it may not be as feature-rich or natural-sounding as some commercial TTS services, its simplicity, offline capability, and cross-platform characteristics make it an ideal choice for many application scenarios.
As artificial intelligence and speech synthesis technology advance, pyttsx3 may eventually integrate more advanced offline TTS engines, bringing more natural voices and additional features. In the meantime, its core advantages (simplicity, offline capability, and no usage restrictions) will continue to make it valuable in specific scenarios.
Whether you want to develop assistive tools for visually impaired users, create language learning applications, or simply add voice feedback functionality to your scripts, pyttsx3 is a choice worth considering. It may not be the most advanced, but it is undoubtedly one of the easiest to use and most practical Python TTS libraries.
Have you used pyttsx3 or other text-to-speech libraries? Have you encountered any interesting application scenarios or technical challenges? Feel free to share your experiences and thoughts in the comments! If you have any questions about the use cases or technologies mentioned in the article, please also leave a comment for discussion. Let’s explore more possibilities of Python speech synthesis together!