Speed Up Python Execution: 8 Effective Techniques

Speed Up Python Execution: 8 Effective Techniques
Source: Data Analysis 1480

This article is about 4400 words long and is suggested to be read in 8 minutes.
This content shares methods to speed up execution purely using <strong>Python</strong> programming.

Python is a scripting language that has some limitations in efficiency and performance compared to compiled languages like C/C++. However, in many cases, the efficiency of Python is not as exaggerated as one might think. This article organizes some techniques for speeding up the execution of Python code.

0. Code Optimization Principles

This article will introduce several techniques for speeding up Python code execution. Before diving into the details of code optimization, it’s important to understand some basic principles of code optimization.
First Basic Principle: Don’t Optimize Too Early
Many people start coding with the goal of performance optimization, “Making a correct program faster is much easier than making a fast program correct.” Therefore, the prerequisite for optimization is that the code works correctly. Optimizing too early may overlook the overall performance metrics, so do not lose sight of the big picture before obtaining global results.
Second Basic Principle: Weigh the Cost of Optimization

Optimization comes at a cost; it is nearly impossible to solve all performance issues. The usual trade-off is between time and space or space and time. Additionally, the development cost must also be considered.

Third Principle: Don’t Optimize Unimportant Parts

If every part of the code is optimized, these changes will make the code difficult to read and understand. If your code runs slowly, first identify where it is slow, which is usually in the inner loops, and focus on optimizing the slow parts. A slight loss of time in other areas does not matter.

1. Avoid Global Variables

# Not Recommended. Code Time: 26.8 seconds
import math
size = 10000
for x in range(size):
    for y in range(size):
        z = math.sqrt(x) + math.sqrt(y)

Many programmers start by writing simple scripts in Python and tend to write them as global variables, as in the code above. However, due to the different implementations of global and local variables, code defined in the global scope runs significantly slower than code defined in functions. By placing script statements into functions, a speed improvement of typically 15% – 30% can be achieved.

# Recommended. Code Time: 20.6 seconds
import math
def main():  # Define in a function to reduce the use of global variables
    size = 10000
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)
main()

2. Avoid

2.1 Avoid Module and Function Attribute Access

# Not Recommended. Code Time: 14.5 seconds
import math
def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(math.sqrt(i))
    return result
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
main()
Each time the . (attribute access operator) is used, specific methods like __getattribute__() and __getattr__() are triggered, which involve dictionary operations and thus incur extra time overhead. Using the from import statement can eliminate attribute access.
# First Optimization. Code Time: 10.9 seconds
from math import sqrt
def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(sqrt(i))  # Avoid using math.sqrt
    return result
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
main()
In section 1, we mentioned that local variable lookup is faster than global variable lookup, so for frequently accessed variables like sqrt, assigning it to a local variable can speed up execution.
# Second Optimization. Code Time: 9.9 seconds
import math
def computeSqrt(size: int):
    result = []
    sqrt = math.sqrt  # Assign to a local variable
    for i in range(size):
        result.append(sqrt(i))  # Avoid using math.sqrt
    return result
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
main()
Besides math.sqrt, the computeSqrt function also has . present, which is the call to the list append method. By assigning this method to a local variable, we can completely eliminate the use of . within the for loop in computeSqrt.
# Recommended. Code Time: 7.9 seconds
import math
def computeSqrt(size: int):
    result = []
    append = result.append
    sqrt = math.sqrt  # Assign to a local variable
    for i in range(size):
        append(sqrt(i))  # Avoid using result.append and math.sqrt
    return result
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
main()

2.2 Avoid Class Attribute Access

# Not Recommended. Code Time: 10.4 seconds
import math
from typing import List
class DemoClass:
    def __init__(self, value: int):
        self._value = value
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        for _ in range(size):
            append(sqrt(self._value))
        return result

def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)
main()

The principle of avoiding . also applies to class attributes; accessing self._value is slower than accessing a local variable. By assigning frequently accessed class attributes to local variables, we can improve code execution speed.
# Recommended. Code Time: 8.0 seconds
import math
from typing import List
class DemoClass:
    def __init__(self, value: int):
        self._value = value
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        value = self._value
        for _ in range(size):
            append(sqrt(value))  # Avoid using self._value
        return result
def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        demo_instance.computeSqrt(size)
main()

3. Avoid Unnecessary Abstraction

# Not Recommended. Code Time: 0.55 seconds
class DemoClass:
    def __init__(self, value: int):
        self.value = value
    @property
    def value(self) -> int:
        return self._value
    @value.setter
    def value(self, x: int):
        self._value = x

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i
main()
Any time you use an additional layer of processing (like decorators, property access, descriptors) to wrap code, it will slow down the code. In most cases, it is necessary to reconsider whether the use of property accessors is necessary; using getter/setter functions to access properties is often a coding style left over from C/C++ programmers. If it is really unnecessary, just use simple attributes.
# Recommended. Code Time: 0.33 seconds
class DemoClass:
    def __init__(self, value: int):
        self.value = value  # Avoid unnecessary property accessors

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i
main()

4. Avoid Data Copying

4.1 Avoid Meaningless Data Copying

# Not Recommended. Code Time: 6.5 seconds
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        value_list = [x for x in value]
        square_list = [x * x for x in value_list]
main()
In the code above, value_list is completely unnecessary, creating unnecessary data structures or copies.
# Recommended. Code Time: 4.8 seconds
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        square_list = [x * x for x in value]  # Avoid meaningless copying
main()
Another situation is being overly paranoid about Python’s data sharing mechanism, not understanding or trusting Python’s memory model well, and misusing functions like copy.deepcopy(). Usually, in these codes, copying operations can be eliminated.

4.2 Avoid Using Intermediate Variables When Swapping Values

# Not Recommended. Code Time: 0.07 seconds
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        temp = a
        a = b
        b = temp
main()
The code above creates a temporary variable temp when swapping values; without using intermediate variables, the code is cleaner and runs faster.
# Recommended. Code Time: 0.06 seconds
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        a, b = b, a  # Avoid using intermediate variables
main()

4.3 Use join for String Concatenation Instead of +

# Not Recommended. Code Time: 2.6 seconds
import string
from typing import List
def concatString(string_list: List[str]) -> str:
    result = ''
    for str_i in string_list:
        result += str_i
    return result
def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)
main()

When using a + b to concatenate strings, since strings in Python are immutable objects, it allocates a block of memory and copies a and b into that newly allocated memory space. Therefore, if concatenating n strings, n-1 intermediate results will be generated, and each time an intermediate result is created, memory must be allocated and copied, severely impacting runtime efficiency. Using join() for string concatenation first calculates the total memory needed, then allocates the required memory all at once and copies each string element into that memory.

# Recommended. Code Time: 0.3 seconds
import string
from typing import List
def concatString(string_list: List[str]) -> str:
    return ''.join(string_list)  # Use join instead of +
def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)
main()

5. Utilize Short-Circuiting of If Conditions

# Not Recommended. Code Time: 0.05 seconds
from typing import List
def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i in abbreviations:
            result += str_i
    return result
def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)
main()
The short-circuiting feature of if conditions means that for statements like if a and b, when a is False, b will not be computed; for if a or b, when a is True, b will not be computed. Therefore, to save runtime, for or statements, the variable with a higher likelihood of being True should be placed before or, and for and should be placed afterwards.
# Recommended. Code Time: 0.03 seconds
from typing import List
def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i[-1] == '.' and str_i in abbreviations:  # Utilize short-circuiting of if conditions
            result += str_i
    return result
def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)
main()

6. Loop Optimization

6.1 Use for Loop Instead of while Loop

# Not Recommended. Code Time: 6.7 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    i = 0
    while i < size:
        sum_ += i
        i += 1
    return sum_
def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)
main()

The for loop in Python is significantly faster than the while loop.

# Recommended. Code Time: 4.3 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):  # Use for loop instead of while loop
        sum_ += i
    return sum_
def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)
main()

6.2 Use Implicit for Loop Instead of Explicit for Loop

For the example above, we can further replace the explicit for loop with an implicit for loop.
# Recommended. Code Time: 1.7 seconds
def computeSum(size: int) -> int:
    return sum(range(size))  # Use implicit for loop instead of explicit for loop

def main():
    size = 10000
    for _ in range(size):
        sum = computeSum(size)
main()

6.3 Reduce Calculations in Inner for Loops

# Not Recommended. Code Time: 12.8 seconds
import math
def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        for y in range(size):
            z = sqrt(x) + sqrt(y)
main() 
In the code above, sqrt(x) is located in the inner for loop, which recalculates it every iteration, increasing time overhead.
# Recommended. Code Time: 7.0 seconds
import math
def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        sqrt_x = sqrt(x)  # Reduce calculations in inner for loop
        for y in range(size):
            z = sqrt_x + sqrt(y)
main() 

7. Use numba.jit

We will continue with the example introduced above and use numba.jit on top of it. numba can JIT compile Python functions into machine code for execution, greatly improving code execution speed. For more information on numba, see the homepage below:
http://numba.pydata.org/numba.pydata.org
# Recommended. Code Time: 0.62 seconds
import numba
@numba.jit
def computeSum(size: float) -> int:
    sum = 0
    for i in range(size):
        sum += i
    return sum
def main():
    size = 10000
    for _ in range(size):
        sum = computeSum(size)
main()

8. Choose the Right Data Structures

Built-in data structures in Python such as str, tuple, list, set, and dict are implemented in C at the bottom level, making them very fast. Implementing new data structures oneself is almost impossible to achieve the speed of built-in ones.
The list is similar to C++’s std::vector, which is a dynamic array. It pre-allocates a certain amount of memory space, and when the pre-allocated memory space is used up, it requests a larger block of memory and then copies all existing elements over, destroying the previous memory space, and then inserting new elements. Deleting elements operates similarly; when the used memory space is less than half of the pre-allocated memory space, it requests a smaller block of memory, does a copy of elements, then destroys the original larger memory space. Therefore, if there are frequent add and delete operations, and the number of elements added or deleted is large, the efficiency of the list is not high. In this case, consider using collections.deque. collections.deque is a double-ended queue that has the characteristics of both stacks and queues, allowing O(1) complexity insertions and deletions at both ends.
Searching operations on lists are also very time-consuming. When frequent searches for certain elements in a list or frequent ordered accesses to these elements are needed, bisect can be used to maintain the list object in order and perform binary searches to improve search efficiency.
Another common requirement is to find minimum or maximum values; in this case, the heapq module can be used to turn a list into a heap, making the time complexity for getting the minimum value O(1).
The following webpage provides the time complexity of various operations for commonly used Python data structures:
TimeComplexity – Python Wiki wiki.python.org

References

  • https://zhuanlan.zhihu.com/p/143052860

  • David Beazley & Brian K. Jones. Python Cookbook, Third edition. O’Reilly Media, ISBN: 9781449340377, 2013.

  • Ying Zhang & Yonghao Lai. Writing High-Quality Code: 91 Suggestions to Improve Python Programs. Machinery Industry Press, ISBN: 9787111467045, 2014.

Editor: Wang Jing

Speed Up Python Execution: 8 Effective Techniques

Leave a Comment