How Does Python’s Garbage Collection Mechanism (GC) Work? An Analysis of Reference Counting and Generational Collection Principles

One day, memory usage on a production server blew up. I stared at the monitoring chart, convinced that the code was fine.

After a long investigation, the culprit turned out to be a memory leak caused by circular references. Objects that reference each other never reach a reference count of zero, so reference counting alone cannot free them, and the cyclic garbage collector had not reclaimed them in time.

This made me deeply realize how important it is to understand Python’s garbage collection mechanism.

01

Garbage collection in CPython relies on two mechanisms. Reference counting is the primary one, and a generational cyclic collector supplements it by handling what reference counting cannot.

Reference counting is quite simple. Each object has a counter that records how many references point to it, whether from variables, attributes, or container slots. When the count reaches 0, the object is reclaimed immediately.

import sys

# Create a list object
my_list = [1, 2, 3]
print(sys.getrefcount(my_list))  # Typically 2: the name my_list plus the temporary argument reference

# Create another reference
another_ref = my_list
print(sys.getrefcount(my_list))  # Typically 3

# Delete the reference
del another_ref
print(sys.getrefcount(my_list))  # Typically 2 again
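
To see the "reclaimed immediately" part in action, here is a minimal sketch that uses weakref.finalize to register a callback that runs when an object is destroyed. The Payload class and the events list are made up for illustration, and the timing shown is specific to CPython's reference counting.

```python
import weakref

class Payload:
    """Hypothetical placeholder class; plain lists cannot be weakly referenced."""

events = []

obj = Payload()
# Register a callback that runs when obj is destroyed.
weakref.finalize(obj, events.append, "finalized")

alias = obj
del obj                      # one reference (alias) still remains
events_after_first_del = list(events)

del alias                    # last reference gone; CPython frees the object here
events_after_second_del = list(events)

print(events_after_first_del)
print(events_after_second_del)
```

The callback only fires after the second del, when the last reference disappears.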

It seems perfect, right?

Unfortunately, reality is always cruel.

02

Circular references are the Achilles’ heel of reference counting. When two objects reference each other, even if there are no external variables pointing to them, their reference count will not reach 0.

class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

# Create circular references
parent = Node("parent")
child = Node("child")

parent.children.append(child)  # parent references child
child.parent = parent          # child references parent

# Even if external references are deleted, their reference counts are not 0
del parent
del child
# These two objects form an island and cannot be reclaimed by the reference counting mechanism
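
The island can be observed directly. The sketch below is my own illustration, not code from the project: it watches the parent through a weak reference, and it switches the cyclic collector off for the duration of the demo so that an automatic collection cannot interfere with the result.

```python
import gc
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

gc.disable()                      # keep the cyclic collector out of the way for the demo
try:
    parent = Node("parent")
    child = Node("child")
    parent.children.append(child)
    child.parent = parent

    probe = weakref.ref(parent)   # a weak reference does not keep parent alive
    del parent
    del child

    alive_after_del = probe() is not None
    gc.collect()                  # run the cycle detector explicitly
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```

After the two del statements the objects are still alive, and they only disappear once the cycle detector has run.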

I encountered this situation in a project. The tree-structured data had parent nodes referencing child nodes, and child nodes referencing parent nodes. Over time, the memory exploded.

How does Python solve this problem?

03

This is where the cyclic garbage collector comes in, and it is organized by generation. CPython has traditionally divided tracked container objects into three generations, with newly created objects starting in generation 0.

The basic idea is that most objects die young, so an object that has already survived several collections is less likely to become garbage soon. Older generations are therefore scanned less often.

import gc

# View garbage collection statistics
print(gc.get_stats())

# Manually trigger garbage collection
collected = gc.collect()
print(f"Collected {collected} objects")

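# gc.get_count() returns collection counters, not the number of objects per generation
count0, count1, count2 = gc.get_count()
print(f"Gen 0 counter (allocations minus deallocations since the last collection): {count0}")
print(f"Gen 1 counter (gen 0 collections since the last gen 1 collection): {count1}")
print(f"Gen 2 counter (gen 1 collections since the last gen 2 collection): {count2}")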

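The core of the collector is cycle detection rather than classic three-color marking. For each tracked container in the generation being collected, CPython copies the reference count and then subtracts every reference that comes from another object in the same set. Objects whose adjusted count stays above zero are referenced from outside, so they are kept along with everything reachable from them. Whatever remains is referenced only from within the set, which makes it unreachable garbage.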
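
As a rough illustration of how a cycle detector can find unreachable groups, here is a toy model of the approach described in CPython's developer documentation: copy each tracked object's reference count, subtract the references that come from other tracked objects, treat anything still above zero as externally reachable, and rescue whatever those objects point to. The function name, the object names, and the counts are invented for this sketch; the real collector works on C structures, not dictionaries.

```python
def find_unreachable(refcounts, edges):
    """Toy model of cycle detection.

    refcounts: name -> total reference count of that object.
    edges: name -> list of names it references (all inside the tracked set).
    """
    # Step 1: copy the counts so the real ones stay untouched.
    gc_refs = dict(refcounts)
    # Step 2: subtract every reference that comes from inside the tracked set.
    for src, targets in edges.items():
        for dst in targets:
            gc_refs[dst] -= 1
    # Step 3: objects with a positive remainder are referenced from outside.
    reachable = {name for name, count in gc_refs.items() if count > 0}
    # Step 4: everything reachable from those roots is alive as well.
    stack = list(reachable)
    while stack:
        for dst in edges.get(stack.pop(), []):
            if dst not in reachable:
                reachable.add(dst)
                stack.append(dst)
    return set(refcounts) - reachable

# a <-> b is an isolated cycle; c is held by an outside variable and points to d.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
unreachable = find_unreachable(refcounts, edges)
print(sorted(unreachable))  # ['a', 'b']
```

Note that d ends up with an adjusted count of zero but is still rescued in step 4, because it is reachable from c.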

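In the traditional three-generation design, a collection of generation 0 is triggered when the number of allocations minus deallocations since the last collection exceeds the generation 0 threshold. Objects that survive are promoted to generation 1.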
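
The counter that drives this trigger can be watched with gc.get_count(). The snippet below is a small sketch under the assumption of a standard CPython build; the Item class is a made-up stand-in for any tracked object, and automatic collection is paused so the counter is not reset halfway through.

```python
import gc

class Item:
    """Hypothetical stand-in for any GC-tracked object."""

gc.disable()   # stop automatic collections so the counter is not reset mid-demo
try:
    before = gc.get_count()[0]
    items = [Item() for _ in range(1000)]   # 1000 new tracked objects
    after = gc.get_count()[0]
finally:
    gc.enable()

print(f"generation 0 counter: {before} -> {after}")
print(f"generation 0 threshold: {gc.get_threshold()[0]}")
```

With the collector enabled, a collection would normally run as soon as the counter passes the threshold.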

04

In actual development, I have summarized several best practices.

First, try to avoid circular references. If they must be used, consider using weak references.

import weakref

class Parent:
    def __init__(self):
        self.children = []

class Child:
    def __init__(self, parent):
        # Use weak references to avoid circular references
        self.parent = weakref.ref(parent)
        parent.children.append(self)
    
    def get_parent(self):
        # Weak reference may have already been reclaimed
        parent = self.parent()
        if parent is None:
            print("Parent object has been reclaimed")
        return parent
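
A quick way to check that the weak reference really breaks the cycle is to delete the parent with the cyclic collector switched off. This is a sketch of my own that repeats a trimmed version of the two classes so it runs on its own; if reference counting alone can free the parent, no cycle was left.

```python
import gc
import weakref

class Parent:
    def __init__(self):
        self.children = []

class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)   # weak: does not keep parent alive
        parent.children.append(self)

    def get_parent(self):
        return self.parent()                # None once the parent is gone

gc.disable()   # show that reference counting alone is enough here
try:
    parent = Parent()
    child = Child(parent)
    had_parent = child.get_parent() is parent
    del parent                              # only the weak reference is left
    parent_after_del = child.get_parent()
finally:
    gc.enable()

print(had_parent, parent_after_del)
```

The child outlives the parent here, which is why get_parent has to handle the None case.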

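Second, drop references to large objects as soon as you are done with them. The del statement removes a name rather than an object, but in CPython the object is freed immediately if that name was the last reference to it. The freed memory goes back to Python's allocator and is not necessarily returned to the operating system right away.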

def process_large_data():
    # Process a large amount of data
    large_list = list(range(1000000))
    
    result = sum(large_list)
    # Drop the reference as soon as the data is no longer needed;
    # this matters most when more work follows before the function returns
    del large_list
    
    return result
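
The effect of del can be measured with the standard tracemalloc module. This is a minimal sketch with a smaller list than the one above so that it runs quickly; the exact byte counts depend on the Python version and platform, so none are shown.

```python
import tracemalloc

tracemalloc.start()
large_list = list(range(100_000))
current_before, _ = tracemalloc.get_traced_memory()

del large_list   # last reference: the list and its integers are freed right away
current_after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"traced before del: {current_before} bytes")
print(f"traced after del:  {current_after} bytes")
```

tracemalloc reports what Python's allocator has handed out, so a drop here does not guarantee that the process size seen by the operating system shrinks by the same amount.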

There is also a trick.

05

When optimizing performance, you can appropriately adjust the garbage collection thresholds.

import gc

# View current thresholds
print(gc.get_threshold())  # (700, 10, 10) on many CPython versions; newer releases may differ

# Adjust thresholds to reduce collection frequency
gc.set_threshold(800, 15, 15)

# Or temporarily disable garbage collection in critical code segments
gc.disable()
# Execute critical code...
gc.enable()

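But be careful. While the collector is disabled, reference counting still works, but objects caught in reference cycles are never reclaimed, so memory can keep growing until you enable it again or call gc.collect().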

I did this in an image processing project. When processing a large batch of images, temporarily disabling garbage collection improved performance by 30%. After processing, I manually triggered garbage collection once.
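
That pattern can be wrapped in a context manager so the collector is always restored, even if the critical code raises. The helper name gc_paused is my own invention, not part of the gc module, and the list comprehension is only a stand-in for the real workload.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Hypothetical helper: pause the cyclic collector, restore it afterwards."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()   # clean up any cycles created while paused

with gc_paused():
    enabled_inside = gc.isenabled()
    batch = [[i] for i in range(1000)]   # stand-in for the real workload
enabled_after = gc.isenabled()

print(enabled_inside, enabled_after)
```

Remembering the previous state means the helper does not accidentally turn the collector on for callers that had already disabled it.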

Of course, the collector in current Python versions is well tuned out of the box. In most cases manual intervention is unnecessary unless you have measured a specific performance bottleneck or an actual memory leak.

Understanding the garbage collection mechanism is more about writing better code. Knowing when objects will be reclaimed and when leaks may occur gives you a solid foundation.

The next time you encounter a memory issue, at least you know where to start troubleshooting.
