Introduction
What is a Thread
- 1) A thread is the smallest unit of scheduling in an operating system.
- 2) A thread is what actually executes a process's code: it is a sequence of instructions running inside the process, while the process is the owner of the resources.
- 3) Multiple threads within the same process share the same memory space, allowing for direct data access (data sharing).
- 4) Because the data is shared, concurrent updates must be protected with thread locks to keep the data consistent.
GIL and Thread Locks
- GIL (Global Interpreter Lock)
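1) In CPython, the Global Interpreter Lock allows only one thread to execute Python bytecode at a time.
2) It protects the interpreter's internal state from being modified by several threads simultaneously; it does not by itself make multi-step operations in your own code atomic.
3) Essentially:
a. Each thread must acquire the GIL before it can execute Python code, so only one thread runs Python code at any given moment.
b. As a result, the threads of one process do not execute Python code on several CPU cores at the same time. The GIL is released during blocking I/O, so multithreading still helps I/O-bound programs.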
- Thread Locks (Mutex Locks)
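1) The GIL only guarantees that one thread runs at a time. The interpreter can switch threads between bytecode instructions, so a thread may lose the GIL before it has finished a multi-step operation on shared data, and another thread may then operate on the same data.
2) A thread lock is a mutex that you associate with the shared data. While one thread holds the lock, every other thread that tries to acquire the same lock blocks until it is released. The lock is cooperative: it protects the data only if all threads acquire it before reading or writing.
3) Why do we still need thread locks with the GIL: because the CPU is time-shared. A statement such as number += 1 is a read, an add, and a write, and a thread switch in the middle can cause one thread's update to be lost.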
- Deadlock
Deadlock is a phenomenon where two or more processes or threads are unable to proceed because they are each waiting for the other to release resources. Without external intervention, they cannot continue execution.
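The point made above under Thread Locks, that the GIL alone does not protect a multi-step update, can be illustrated with a small sketch. The function names (run, step, increment) and the use of time.sleep(0) to widen the race window are assumptions made for illustration; how many updates are actually lost without the lock depends on scheduling.

```python
import threading
import time

def run(workers, use_lock):
    """Increment a shared counter from several threads and return the final value."""
    state = {"count": 0}
    lock = threading.Lock()

    def step():
        current = state["count"]      # read
        time.sleep(0)                 # yield, so another thread may run here
        state["count"] = current + 1  # write back a possibly stale value

    def increment():
        for _ in range(50):
            if use_lock:
                with lock:            # the whole read-modify-write is one critical section
                    step()
            else:
                step()

    threads = [threading.Thread(target=increment) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

print("without lock:", run(4, use_lock=False))  # may be less than 200
print("with lock:   ", run(4, use_lock=True))   # 200
```

With the lock, every increment is applied, so four workers doing 50 increments each always reach 200. Without it, some increments can be overwritten.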
Thread Locks
- Using Thread Locks
# Step 1: lock = threading.Lock() # Define a lock
# Step 2: lock.acquire() # Lock before data operation to prevent data from being accessed by another thread
# Step 3: lock.release() # Release the lock after data operation
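The three steps above are often written with a with statement, which acquires the lock on entry and releases it on exit even if the protected code raises. The sketch below shows both forms side by side; the names balance, deposit, and deposit_explicit are hypothetical and only serve the illustration.

```python
import threading

lock = threading.Lock()   # Step 1: define a lock
balance = 0

def deposit(amount):
    global balance
    with lock:             # Steps 2 and 3: acquire on entry, release on exit
        balance += amount

def deposit_explicit(amount):
    global balance
    lock.acquire()         # Step 2
    try:
        balance += amount
    finally:
        lock.release()     # Step 3: runs even if the update raises

deposit(5)
deposit_explicit(7)
print(balance)
```

The with form is usually preferable because it is impossible to forget the release.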
Example Code
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
"""@Version: v1.0
@File: threading_join.py
@Time: 2025-11-05 09:47:52
@License: (C)Copyright 2020-2030
@Desc:"""
import time
import threading

lock = threading.Lock()
number = 0  # Shared counter updated by every worker

def worker():
    global number
    print(f"number value ---> {number}")
    time.sleep(1)
    lock.acquire()  # Step 2: lock before touching the shared data
    number += 1
    lock.release()  # Step 3: release once the update is done

if __name__ == '__main__':
    threads = [threading.Thread(target=worker) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"final number ---> {number}")
Deadlock
- Example of Deadlock
1) Five threads are started, and each executes run(), which calls func1() and then func2().
2) Thread-1 runs func1() first: it acquires lock A, then lock B, and releases both. The other threads are waiting for lock A in the meantime.
3) Thread-1 then enters func2(), acquires lock B, and sleeps for 2 seconds (time.sleep(2)) while still holding lock B.
4) Meanwhile Thread-2 starts func1() and acquires lock A, then blocks trying to acquire lock B, which Thread-1 still holds.
5) When Thread-1 wakes up it needs lock A to proceed, but lock A is already held by Thread-2. Each thread is waiting for a lock the other one owns, so neither can continue: a deadlock. The remaining threads stay blocked on lock A as well.
Example Code
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import time
from threading import Thread, Lock

mutexA = Lock()
mutexB = Lock()

class TestThread(Thread):
    def run(self):
        self.func1()
        self.func2()

    def func1(self):
        mutexA.acquire()
        print("\033[41m%s acquired lock A\033[0m" % self.name)
        mutexB.acquire()
        print("\033[42m%s acquired lock B\033[0m" % self.name)
        mutexB.release()
        mutexA.release()

    def func2(self):
        mutexB.acquire()
        print("\033[43m%s acquired lock B\033[0m" % self.name)
        time.sleep(2)
        mutexA.acquire()  # Blocks forever if another thread holds A and waits for B
        print("\033[44m%s acquired lock A\033[0m" % self.name)
        mutexA.release()
        mutexB.release()

if __name__ == '__main__':
    for i in range(5):
        t = TestThread()
        t.start()

"""Execution Result (the program then hangs):
Thread-1 acquired lock A
Thread-1 acquired lock B
Thread-1 acquired lock B
Thread-2 acquired lock A
"""
- Recursive Locks
1) lock = threading.RLock() creates a recursive (reentrant) lock. In the example below it removes the deadlock because mutexA and mutexB refer to the same RLock, so there is only one lock to wait for.
2) The purpose of a recursive lock is to allow multiple requests for the same resource within the same thread without causing deadlock.
3) An RLock internally maintains a Lock and a counter variable. The counter records the number of acquire() calls made by the owning thread, which allows the resource to be acquired multiple times.
4) Other threads can obtain the resource only after the owning thread has called release() once for every acquire().
Example Code
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import time
from threading import Thread, RLock

mutexA = mutexB = RLock()  # Both names refer to the same reentrant lock

class TestThread(Thread):
    def run(self):
        self.func1()
        self.func2()

    def func1(self):
        mutexA.acquire()
        print("\033[41m%s acquired lock A\033[0m" % self.name)
        mutexB.acquire()
        print("\033[42m%s acquired lock B\033[0m" % self.name)
        mutexB.release()
        mutexA.release()

    def func2(self):
        mutexB.acquire()
        print("\033[43m%s acquired lock B\033[0m" % self.name)
        time.sleep(2)
        mutexA.acquire()  # Same thread, same RLock: the counter goes up, no blocking
        print("\033[44m%s acquired lock A\033[0m" % self.name)
        mutexA.release()
        mutexB.release()

if __name__ == '__main__':
    for i in range(5):
        t = TestThread()
        t.start()

"""Execution Result:
Thread-1 acquired lock A
Thread-1 acquired lock B
Thread-1 acquired lock B
Thread-1 acquired lock A
Thread-2 acquired lock A
"""
Basics of Python Multithreading
Creating Threads: The threading Module
In Python, the threading module can be used to create and manage threads. The main steps are as follows:
1. Import the threading module
2. Define a subclass that inherits from threading.Thread and override the run() method to implement the thread's execution logic.
3. Create an instance of the subclass and call the start() method to start the thread.
Example Code
import threading

class MyThread(threading.Thread):
    def run(self):
        print("Thread execution logic")

# Create thread instance and start
t = MyThread()
t.start()
Join and Daemon
- The join function
1) t.join() blocks the calling thread (here the main thread) until thread t has finished, so the code after the join loop runs only once every worker thread is done.
2) Code Example
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import time
import threading

start_time = time.time()

# Thread execution function
def hi(num):
    print(f"Running on number: {num}")
    time.sleep(30)

tasks = []  # Store the thread objects in this list
for i in range(50):
    t = threading.Thread(target=hi, args=(i,))
    t.start()  # Start thread, program will not block
    tasks.append(t)

print(threading.active_count())  # Current number of active threads
for t in tasks:
    t.join()  # Block the main thread until this thread finishes
print(threading.current_thread())
print("==================== All threads have finished! =================")
print(threading.active_count())
print(time.time() - start_time)
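- Daemon Property
1) A daemon thread does not keep the program alive: once the main thread and all other non-daemon threads have exited, the interpreter shuts down and any daemon threads still running are stopped abruptly. The daemon flag must be set before start() is called.
2) Code Example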
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import time
import threading

start_time = time.time()

# Thread execution function
def hi(num):
    print(f"Running on number: {num}")
    time.sleep(30)

tasks = []  # Store the thread objects in this list
for i in range(50):
    t = threading.Thread(target=hi, args=(i,))
    t.daemon = True  # Set the current thread as a daemon, must be set before start()
    t.start()  # Start thread, program will not block
    tasks.append(t)

print(threading.active_count())  # Current number of active threads
# No join() here: joining would make the main thread wait the full 30 seconds.
# The main thread simply finishes, and the daemon threads are stopped with it.
print(threading.current_thread())
print("==================== Main thread is exiting! =================")
print(time.time() - start_time)
Thread Lifecycle
Threads have the following states:
1) Initial State (New): The thread object has been created but has not yet started.
2) Ready State (Runnable): The thread has started and is waiting for a CPU time slice.
3) Running State: The thread has obtained a CPU time slice and is executing.
4) Blocked State: The thread has given up the CPU time slice for some reason and cannot run temporarily.
5) Terminated State: The thread has finished execution.
Threads transition between these states until they ultimately enter the terminated state.
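Python's threading module does not expose these states by name; what a program can observe directly is Thread.is_alive(), which is true from start() until the thread finishes. The sketch below, with the hypothetical names work, started, and release, samples that flag at three points of the lifecycle.

```python
import threading

def work(started, release):
    started.set()     # signal that the thread is now running
    release.wait()    # blocked state: wait until the main thread lets us finish

started = threading.Event()
release = threading.Event()
t = threading.Thread(target=work, args=(started, release))

before_start = t.is_alive()    # False: created but not started
t.start()
started.wait()
while_running = t.is_alive()   # True: started and not yet finished
release.set()
t.join()
after_join = t.is_alive()      # False: terminated

print(before_start, while_running, after_join)
```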
Thread Synchronization and Communication
Since threads share process resources, synchronization mechanisms are needed to coordinate thread access and avoid data races and inconsistencies. The threading module provides the following synchronization tools:
- Lock: A mutex lock used to protect critical section resources.
- RLock: A reentrant lock that allows the same thread to acquire the lock multiple times.
- Condition: A condition variable used for notification and waiting between threads.
- Semaphore: A semaphore that controls the number of threads accessing shared resources.
- Event: An event object used for event notification between threads.
Thread Pools and Asynchronous Programming
ThreadPoolExecutor
ThreadPoolExecutor is the thread pool implementation in Python, located in the concurrent.futures module, which conveniently manages multiple threads executing concurrent tasks. Its main features include:
1) Provides the submit() method to submit tasks for execution in the thread pool.
2) Allows control over the size of the thread pool to avoid resource waste from creating too many threads.
3) Supports asynchronous retrieval of task execution results.
Example Code
from concurrent.futures import ThreadPoolExecutor

def task(n):
    return n * n

# Create thread pool
with ThreadPoolExecutor(max_workers=3) as executor:
    future = executor.submit(task, 5)
    result = future.result()
    print(result)
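When several tasks are submitted, their results can be collected as they finish or in input order. The following sketch uses as_completed() and executor.map() from concurrent.futures; the function name square and the variable names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately; the work runs in the pool
    futures = {executor.submit(square, n): n for n in range(5)}
    results = {}
    for future in as_completed(futures):   # yields each Future as it finishes
        results[futures[future]] = future.result()

    # map() is a shorter form when results are wanted in input order
    ordered = list(executor.map(square, range(5)))

print(results)
print(ordered)
```

as_completed() suits cases where each result should be handled as soon as it is available, while map() keeps the code short when order matters more than latency.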
Asynchronous I/O and Coroutines
Asynchronous I/O is a non-blocking I/O model: instead of waiting for an I/O operation to complete, the program switches to other tasks and resumes the original one once its result is ready, which improves concurrency. Coroutines in Python are not threads; they are lightweight functions scheduled cooperatively within a single thread, and each one voluntarily yields control at an await (typically on an I/O operation) so that other tasks can execute.
- Introduction to the asyncio Module
asyncio is a module in the Python standard library for writing asynchronous I/O, based on the concepts of event loops and coroutines, providing an efficient solution for asynchronous programming. Its main components include:
1) Event Loop: Responsible for scheduling the execution of coroutine tasks.
2) Coroutines: Asynchronous tasks defined using the async and await keywords.
3) Future Objects: Represent the results of asynchronous operations, used to obtain task execution status and results.
4) Asynchronous I/O Operations: Non-blocking I/O operations implemented with the asynchronous API provided by asyncio.
Example Code
import asyncio

async def main():
    print("Hello")
    await asyncio.sleep(1)
    print("World")

asyncio.run(main())
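The benefit of coroutines shows up when several of them wait at the same time. The sketch below runs two coroutines with asyncio.gather(); the coroutine name fetch and the delays are made up for illustration, and asyncio.sleep() stands in for a real I/O wait.

```python
import asyncio
import time

async def fetch(name, delay):
    await asyncio.sleep(delay)   # yields to the event loop while "waiting on I/O"
    return name

async def main():
    start = time.perf_counter()
    # Both coroutines are scheduled together, so their waits overlap
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Because the two waits overlap, the total time is expected to be close to the longest single delay rather than the sum of both.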
Thread Synchronization Techniques
Locks and RLocks
Locks (Simple Locks): threading.Lock is a mutex lock used to protect shared resources, ensuring that only one thread can access it at a time. When a thread acquires the lock, other threads must wait for it to be released.
import threading

lock = threading.Lock()

def thread_function():
    with lock:
        print("Thread is executing")
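RLocks (Reentrant Locks): threading.RLock allows the thread that already holds the lock to acquire it again without blocking, while other threads still have to wait. This is useful when a function that holds the lock calls another function that acquires the same lock. Every acquire() must be matched by a release().
rlock = threading.RLock()

def outer():
    with rlock:  # First acquisition
        inner()

def inner():
    with rlock:  # Same thread acquires again without blocking
        pass  # do something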
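To contrast the two lock types, the sketch below tries to acquire each lock a second time from the thread that already holds it. It uses acquire(blocking=False) so that the plain Lock reports failure instead of hanging; the variable names are illustrative.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

plain.acquire()
second_plain = plain.acquire(blocking=False)          # False: a Lock is not reentrant
plain.release()

reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)  # True: the owning thread may re-enter
reentrant.release()
reentrant.release()                                   # one release per acquire

print(second_plain, second_reentrant)
```

Without blocking=False, the second acquire on the plain Lock would block forever, which is the single-thread form of deadlock that RLock avoids.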
Semaphores
Semaphores: threading.Semaphore is used to control the number of threads accessing a resource simultaneously. It maintains a counter; when the counter is greater than 0, threads can acquire it, and the counter decreases by one. When the counter is 0, threads must wait.
semaphore = threading.Semaphore(3)

def thread_function():
    semaphore.acquire()
    try:
        pass  # do something
    finally:
        semaphore.release()
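A quick way to see the counter at work is to start more threads than the semaphore admits and record how many are inside at once. In this sketch the names limited_task, active, and peak are hypothetical, and time.sleep() stands in for real work.

```python
import threading
import time

semaphore = threading.Semaphore(3)
state_lock = threading.Lock()
active = 0
peak = 0

def limited_task():
    global active, peak
    with semaphore:                  # at most 3 threads are inside at once
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)             # stand-in for real work
        with state_lock:
            active -= 1

threads = [threading.Thread(target=limited_task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)
```

Ten threads are started, but the recorded peak never exceeds the semaphore's initial value of 3.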
Conditions and Events
Conditions (Condition Variables): threading.Condition lets threads wait until another thread signals that some shared state has changed. It is always associated with a lock: wait() releases that lock while the thread is blocked and reacquires it before returning, and notify() or notify_all() must be called with the lock held.
lock = threading.Lock()
cond = threading.Condition(lock)
ready = False  # The shared state that waiting threads check

def thread():
    with cond:
        # Re-check the state after every wake-up
        while not ready:
            cond.wait()
        # do something

def thread2():
    global ready
    with cond:
        ready = True  # set condition
        cond.notify_all()
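A common use of a condition variable is a small producer and consumer pair. The sketch below is one possible arrangement, with hypothetical names (items, producer, consumer, received); the consumer waits in a loop on the predicate "items is not empty" and the producer notifies after each append.

```python
import threading
from collections import deque

items = deque()
cond = threading.Condition()
received = []

def consumer(count):
    for _ in range(count):
        with cond:
            while not items:         # re-check the predicate after every wake-up
                cond.wait()          # releases the lock while blocked
            received.append(items.popleft())

def producer(values):
    for value in values:
        with cond:
            items.append(value)
            cond.notify()            # wake one waiting consumer

c = threading.Thread(target=consumer, args=(3,))
p = threading.Thread(target=producer, args=([1, 2, 3],))
c.start()
p.start()
p.join()
c.join()
print(received)
```

Waiting in a while loop rather than a single if guards against wake-ups that arrive when the predicate is not (or no longer) true.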
Events: threading.Event is also used for communication between threads, but it is simply a flag that can be set or cleared. When set, all waiting threads are awakened.
event = threading.Event()

def thread1():
    event.wait()  # Wait for the event
    # do something

event.set()  # Set the event, waking up waiting threads
Event Flags
1) event.set(): Set the flag.
2) event.clear(): Clear the flag.
3) event.wait(): Wait for the flag to be set.
4) event.is_set(): Check if the flag is set.
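The four methods above can be exercised in a few lines. This is a minimal sketch with illustrative names (waiter, states, results); it also uses the optional timeout argument of wait(), which makes wait() return False if the flag was not set in time.

```python
import threading

event = threading.Event()
states = [event.is_set()]               # False: the flag starts cleared
timed_out = event.wait(timeout=0.01)    # False: gave up waiting, flag still cleared

def waiter(results):
    results.append(event.wait())        # blocks until set(); then returns True

results = []
t = threading.Thread(target=waiter, args=(results,))
t.start()
event.set()                             # wake the waiting thread
t.join()
states.append(event.is_set())           # True
event.clear()
states.append(event.is_set())           # False again

print(timed_out, results, states)
```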
Traffic Light Example Code
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import time
from threading import Event, Thread

event = Event()  # Flag set means green light, flag cleared means red light

# Traffic light loop
def lighter():
    count = 0
    event.set()  # Start with the green light
    while True:
        if 5 <= count <= 10:  # Change to red light
            event.clear()
            print("Red light is on ......")
        elif count > 10:
            event.set()  # Set the flag again, change to green light
            count = 0
        else:
            print("Green light is on ......")
        time.sleep(1)
        count += 1

# Car loop
def car(name):
    while True:
        if event.is_set():
            print(f"[{name}] is running")
            time.sleep(1)
        else:
            print(f"[{name}] sees red light, waiting ......")
            event.wait()
            print(f"[{name}] green light is on, start going ......")

if __name__ == '__main__':
    light = Thread(target=lighter)
    light.start()
    car_thread = Thread(target=car, args=("Tesla",))  # Avoid rebinding the name car
    car_thread.start()
Queues and Priority Queues
Queues: The queue module provides several thread-safe queue implementations, such as Queue and PriorityQueue. Queue is a FIFO (First In First Out) queue, while PriorityQueue returns entries in priority order, lowest value first.
import queue

q = queue.Queue()
q.put("A")
q.put("B")
q.get()  # Returns "A" (first in, first out)
q.put("C", block=False)  # If the queue is full, do not block; raise queue.Full immediately

# Using PriorityQueue
pq = queue.PriorityQueue()
pq.put((3, "C"))
pq.put((1, "A"))
pq.get()  # Returns (1, "A"), the entry with the lowest priority number
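Queues are most often used to hand work from one thread to another. The sketch below is a minimal producer and consumer pair that stops on a sentinel value, followed by a PriorityQueue drained in priority order; the names SENTINEL, consumed, and by_priority are assumptions for illustration.

```python
import queue
import threading

q = queue.Queue(maxsize=2)   # put() blocks while the queue already holds 2 items
SENTINEL = None              # marks the end of the stream
consumed = []

def producer():
    for item in ["A", "B", "C"]:
        q.put(item)
    q.put(SENTINEL)

def consumer():
    while True:
        item = q.get()       # blocks until an item is available
        if item is SENTINEL:
            break
        consumed.append(item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pq = queue.PriorityQueue()
for entry in [(3, "C"), (1, "A"), (2, "B")]:
    pq.put(entry)
by_priority = [pq.get() for _ in range(3)]

print(consumed)
print(by_priority)
```

The bounded queue also provides back-pressure: a fast producer is paused by put() until the consumer catches up.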
Inter-thread Communication and Data Sharing
Shared Memory
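Threads in one process already share memory, so ordinary objects are visible to all of them. Explicit shared memory objects are needed when the workers are separate processes: the multiprocessing module's Value and Array create objects in shared memory, each guarded by its own lock by default.
from multiprocessing import Value, Array

def worker(counter, array):
    with counter.get_lock():  # += is a read-modify-write, so hold the lock
        counter.value += 1
    with array.get_lock():
        array[0] += 1

counter = Value("i", 0)  # "i" indicates an integer
array = Array("i", 3)  # An integer array of length 3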
Pickle and Queue Module
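1) The pickle module serializes Python objects into byte streams. Threads can pass object references to each other directly, so pickling is only needed when the data must cross a process boundary or be stored; in the example below it has the effect of handing the receiver an independent copy of the object.
2) The queue module provides a thread-safe queue implementation for inter-thread communication.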
import pickle
from queue import Queue
q = Queue()
obj = {'a': 1, 'b': 2}
q.put(pickle.dumps(obj))
received_obj = pickle.loads(q.get())
threading.local
threading.local creates an object whose attributes are stored separately for each thread. This is useful when every thread needs its own copy of some state: data that is never shared between threads cannot cause race conditions.
import threading

local_data = threading.local()

def worker():
    local_data.x = 123
    print(f"Thread {threading.current_thread().name}: {local_data.x}")

if __name__ == "__main__":
    t1 = threading.Thread(target=worker)
    t2 = threading.Thread(target=worker)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
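To make the isolation visible, the following sketch has each thread store a different value in the same threading.local object and then checks what the main thread sees. The thread names and the seen dictionary are illustrative only.

```python
import threading

local_data = threading.local()
seen = {}

def worker(value):
    local_data.x = value     # visible only inside this thread
    seen[threading.current_thread().name] = local_data.x

threads = [
    threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_has_x = hasattr(local_data, "x")   # False: the main thread never set x
print(seen, main_has_x)
```

Each worker reads back exactly the value it wrote, and the main thread, which never assigned x, does not see the attribute at all.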
Thread Safety and Best Practices in Concurrent Programming
Avoiding Global Variables
- Global variables are prone to race conditions and thread safety issues in a multithreaded environment.
- Prefer using local variables or encapsulating shared data within objects. If global variables must be used, they should be protected with locks.
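One way to follow this advice is to wrap the shared state and its lock in a single object, so callers cannot touch the data without going through the lock. The class name SafeCounter below is hypothetical and the sketch is deliberately minimal.

```python
import threading

class SafeCounter:
    """Hypothetical example: a counter whose state is guarded by its own lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

counter = SafeCounter()

def worker():
    for _ in range(1000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)   # 4000
```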
Avoiding Deadlocks
- Deadlocks are a common problem in multithreaded programming. The main causes of deadlocks include:
1) Circular waiting for resources.
2) Improper resource occupation and requests.
3) Improper resource allocation strategies.
- Measures to prevent deadlocks include:
1) Reasonable design of resource allocation strategies.
2) Using ordered locking.
3) Implementing timeout mechanisms.
4) Using threading.RLock to support reentrancy.
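Two of these measures, ordered locking and timeouts, can be sketched as small helpers. The helper names with_both and try_both are hypothetical, and ordering locks by id() is just one possible way to fix a global order.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def with_both(first, second, action):
    """Ordered locking: always acquire in one global order, whatever the argument order."""
    ordered = sorted((first, second), key=id)
    with ordered[0], ordered[1]:
        return action()

def try_both(first, second, action, timeout=0.1):
    """Timeout: give up instead of waiting forever; returns None if a lock is unavailable."""
    if not first.acquire(timeout=timeout):
        return None
    try:
        if not second.acquire(timeout=timeout):
            return None
        try:
            return action()
        finally:
            second.release()
    finally:
        first.release()

# Two threads ask for the locks in opposite orders, yet cannot deadlock
results = []
t1 = threading.Thread(target=lambda: results.append(with_both(lock_a, lock_b, lambda: "t1")))
t2 = threading.Thread(target=lambda: results.append(with_both(lock_b, lock_a, lambda: "t2")))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(results))
```

With ordered locking no circular wait can form, while the timeout variant lets a caller back off and retry when it cannot get everything it needs.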
Considerations for Using Thread Pools
- Thread pools can help manage the creation and destruction of threads, improving performance. However, care must be taken:
1) Set the size of the thread pool sensibly: too small limits concurrency, too large wastes resources.
2) Arrange task submissions so that a large number of tasks does not pile up in a short time.
3) Set reasonable task timeouts so that unresponsive tasks do not block the thread pool.
4) Monitor the health of the thread pool and handle exceptions promptly.
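Timeouts and exception handling with a pool can be sketched with Future.result(). In the example below the task names slow and broken are made up; note that a timeout on result() only stops the waiting, it does not cancel a task that is already running.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow():
    time.sleep(0.3)
    return "done"

def broken():
    raise ValueError("bad input")

outcomes = {}
with ThreadPoolExecutor(max_workers=2) as executor:
    slow_future = executor.submit(slow)
    broken_future = executor.submit(broken)

    try:
        slow_future.result(timeout=0.05)     # stop waiting after 50 ms
    except TimeoutError:
        outcomes["slow"] = "timed out"       # the task itself keeps running

    try:
        broken_future.result()               # re-raises the task's exception here
    except ValueError as exc:
        outcomes["broken"] = str(exc)

    outcomes["slow_final"] = slow_future.result()   # wait for it to finish after all

print(outcomes)
```

An exception raised inside a task stays silent until result() or exception() is called on its Future, so checking every Future is part of monitoring the pool.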
Thread-Safe Concurrent Data Structures
In multithreaded programming, using thread-safe data structures and synchronization primitives helps prevent race conditions and data inconsistencies in read and write operations.
- collections.deque: A double-ended queue whose appends and pops at either end are thread-safe, so it can be used for queue operations in a multithreaded environment.
- queue.Queue: A lock-based queue that can be used for producer-consumer models in a multithreaded environment.
- threading.Semaphore: A counting semaphore that can be used to control access to limited resources.
- threading.Lock: A basic mutex lock that can be used to control access to shared resources.
- threading.RLock: A reentrant mutex lock that can be used to control access to shared resources.
concurrent.futures Module
- concurrent.futures is a high-level concurrency library that provides a simple way to use multithreading and multiprocessing.
- ThreadPoolExecutor: A thread pool-based executor that can be used to execute tasks in multithreading.
- ProcessPoolExecutor: A process pool-based executor that can be used to execute tasks in multiprocessing.
- Future: An object representing the eventual result of a submitted task. It can be queried for status, waited on, and asked for the task's result or exception, for both thread and process executors.
Advanced Applications of threading.local
- threading.local: A thread-local storage object that can be used to store thread-specific data in multithreading.
- Advanced Applications: Can be used to implement thread-isolated database connection pools in multithreading.
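import threading

# create_connection() is not defined in this article; it stands for whatever
# function your database driver provides to open a connection.
class ThreadLocalDBConnection:
    _local = threading.local()  # Attributes set on it are private to each thread

    def __init__(self, db_name):
        self.db_name = db_name

    def __enter__(self):
        # Each thread keeps its own {db_name: connection} mapping
        conns = getattr(self._local, "conns", None)
        if conns is None:
            conns = self._local.conns = {}
        if self.db_name not in conns:
            conns[self.db_name] = create_connection(self.db_name)
        return conns[self.db_name]

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Close this thread's connection and forget it so it is never reused
        self._local.conns.pop(self.db_name).close()

# Usage
# with ThreadLocalDBConnection('db1') as conn:
#     Use conn in the current thread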
Modern Python Concurrency Frameworks: asyncio and AIOHTTP
The Future of Asynchronous Programming
- The asyncio library was added to the standard library in Python 3.4, and Python 3.5 introduced the async and await keywords. Together they mark the beginning of Python's native support for asynchronous/coroutine programming, an efficient way to handle I/O-intensive tasks, especially in network programming.
- The future development trends of asynchronous programming include:
1) Broader applications: As server-side and client-side programming continues to evolve, asynchronous programming will become increasingly important, especially in web development, network services, and game development.
2) Better performance: Asynchronous programming can significantly reduce blocking and improve the concurrent processing capabilities of programs.
3) Asynchronous/parallel mixing: Modern programming may increasingly adopt a combination of asynchronous I/O and parallel computing to fully utilize multi-core processors and network resources.
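One simple form of such mixing is to call blocking code from a coroutine by handing it to an executor. The sketch below uses loop.run_in_executor() with a ThreadPoolExecutor; the function blocking_io is a hypothetical stand-in for a blocking call, and for CPU-bound work a ProcessPoolExecutor could be passed instead.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(n):
    time.sleep(0.1)      # stand-in for a blocking call, e.g. a synchronous driver
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each call runs in a pool thread; the event loop stays free meanwhile
        tasks = [loop.run_in_executor(pool, blocking_io, n) for n in range(4)]
        return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)   # [0, 2, 4, 6]
```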
AIOHTTP Library Introduction
- AIOHTTP (Asynchronous I/O HTTP Client/Server) is a high-performance Python HTTP client and server library based on asyncio.
- Its design goal is to provide an easy-to-use API while maintaining high performance and scalability, particularly suitable for building asynchronous web services and APIs.
- AIOHTTP supports HTTP/1.1, connection pooling, streaming of request and response bodies, WebSocket on both the client and the server side, and other features. It does not implement HTTP/2, and response caching and automatic retries are not built in.
- Using AIOHTTP, developers can write cleaner, more efficient network code, reducing blocking and improving concurrent processing capabilities.
Here is a simple AIOHTTP example for sending a GET request:
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'https://example.com')
        print(html)

asyncio.run(main())  # Creates the event loop, runs main(), and closes the loop