In-Depth Analysis: How Python’s OrderedDict Maintains Order

In the world of Python, the “orderliness” of dictionaries has been a hot topic among developers. However, since Python 3.7 officially incorporated “dictionaries maintain insertion order” into the language specification in 2018, the related debates have gradually subsided.

Nevertheless, during the era when dictionaries could not guarantee order, developers would often think of collections.OrderedDict when there was a need for ordered storage. Even now that built-in dictionaries support order, as of Python 3.14, OrderedDict remains in the standard library’s collections module for three key reasons:

  • Backward Compatibility: Ensures that old code can run without modification, avoiding failures in historical projects due to version updates.
  • Different Equality Determination Rules: When two OrderedDict instances are compared, the order of keys is part of the equality judgment; built-in dictionaries only check whether the key-value pairs match, regardless of order. (Comparing an OrderedDict with a plain dict is also order-insensitive.)
  • Support for Additional Features: Provides unique methods like move_to_end, which can quickly move a specified key to the end of the dictionary, for example:

>>> from collections import OrderedDict
>>> d = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> d.move_to_end('a')
>>> d
OrderedDict([('b', 2), ('c', 3), ('a', 1)])
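The equality rule from the second point can be checked in a few lines. This is a small illustration using only the standard library; the variable names are arbitrary.

```python
from collections import OrderedDict

a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])

# Two OrderedDicts holding the same items in a different order are not equal.
ordered_equal = (a == b)

# Two plain dicts holding the same items are equal regardless of order.
plain_equal = (dict(a) == dict(b))

# An OrderedDict compared with a plain dict uses the order-insensitive rule.
mixed_equal = (a == dict(b))

print(ordered_equal, plain_equal, mixed_equal)
```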

I. The Core Implementation of OrderedDict: A Clever Combination of Two Structures

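OrderedDict is essentially an “ordered dictionary”: it possesses the characteristics of a regular dictionary while maintaining the insertion order of keys. The analysis below follows the pure-Python implementation in Lib/collections/__init__.py (CPython normally swaps in an equivalent C implementation at import time). Its implementation logic revolves around two core ideas: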

  1. Inherits from dict: An instance of OrderedDict is itself a dictionary, so self stores the key-value pairs directly. It reuses all basic operations of a regular dictionary instead of reimplementing key-value storage and retrieval.
  2. New External Data Structure: Records the order of keys through an additional ordered structure, without affecting the O(1) (constant time) operational efficiency of a regular dictionary.

To meet the requirement of “not slowing down efficiency,” OrderedDict chooses a combination of “doubly linked list + auxiliary dictionary,” where each serves its purpose while cooperating:

1. Doubly Linked List: Maintaining Key Order

A doubly linked list is an ordered data structure in which a node can be inserted or removed in O(1) time once the node itself is in hand. In OrderedDict, each linked list node stores only one key, and the order of the nodes directly corresponds to the insertion order of the keys.

2. Auxiliary Dictionary: Accelerating Node Lookup

The shortcoming of a linked list is “lookup”—to find the node corresponding to a key, one must scan from the head of the list, which takes an average of O(n) (linear time) and can severely slow down overall efficiency. Therefore, OrderedDict adds an auxiliary dictionary (self.__map) that specifically stores the mapping of “keys to linked list nodes,” allowing direct location of the corresponding node through the key, compressing lookup time to O(1).

3. Overall Structure Breakdown

The internal structure of OrderedDict can be divided into three parts, which work together to achieve “ordered storage + efficient operations”:

  • self: The base dictionary that directly stores key-value pairs (e.g., {'a': 1, 'b': 2, 'c': 3}).
  • Doubly Linked List (self.__root as the starting point): Links all nodes corresponding to keys in insertion order, serving as the core carrier of “orderliness.”
  • Auxiliary Dictionary (self.__map): Keys are the keys in OrderedDict, and values are the corresponding linked list nodes, responsible for quick node lookup.
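To make the three parts concrete, here is a minimal sketch of how they could fit together. TinyOrderedDict and _Node are hypothetical teaching names, not the standard-library code, and the sketch uses ordinary strong references throughout (the real implementation uses weak references for some pointers).

```python
class _Node:
    """One linked-list entry: a key plus prev/next pointers (hypothetical)."""
    __slots__ = ("prev", "next", "key")


class TinyOrderedDict(dict):
    """Hypothetical teaching class; a simplified stand-in for OrderedDict."""

    def __init__(self):
        super().__init__()
        # Sentinel root node: an empty list is a root that points at itself.
        self._root = root = _Node()
        root.prev = root.next = root
        root.key = None
        self._map = {}  # key -> node, for O(1) node lookup

    def __setitem__(self, key, value):
        if key not in self:
            # New key: append a node just before the root, i.e. at the tail.
            node = _Node()
            node.key = key
            last = self._root.prev
            node.prev, node.next = last, self._root
            last.next = node
            self._root.prev = node
            self._map[key] = node
        # Store (or overwrite) the value in the base dict.
        super().__setitem__(key, value)

    def __iter__(self):
        # Walk from the first real node until we are back at the root.
        node = self._root.next
        while node is not self._root:
            yield node.key
            node = node.next


d = TinyOrderedDict()
d["b"] = 2
d["a"] = 1
d["b"] = 20        # existing key: the value changes, the position does not
keys = list(d)     # keys come back in first-insertion order
print(keys, d["b"])
```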

II. Key Method Analysis: Taking __setitem__ as an Example

To understand how OrderedDict works, we can look at its __setitem__ method (i.e., the operation of “assigning a value to the dictionary,” such as d["aa"] = 4) and see how it synchronously updates the three internal structures:


def __setitem__(self, key, value, dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
    if key not in self:
        # 1. Create a linked list node for the new key and store it in the auxiliary dictionary __map
        self.__map[key] = link = Link()
        # 2. Locate the root node of the linked list and the last node
        root = self.__root
        last = root.prev
        # 3. Adjust the previous and next pointers of the new node to insert it at the end of the linked list
        link.prev, link.next, link.key = last, root, key
        last.next = link
        root.prev = proxy(link)  # Use weak reference to avoid circular references
    # 4. Store the key-value pair in the base dictionary self like a regular dictionary.
    #    This runs outside the if block, so assigning to an existing key still updates its value.
    dict_setitem(self, key, value)

When executing d["aa"] = 4 to insert a new key-value pair, the three structures will be updated synchronously:

  1. Base Dictionary self: Adds the 'aa': 4 key-value pair.
  2. Auxiliary Dictionary self.__map: Adds the mapping of 'aa' to the corresponding linked list node.
  3. Doubly Linked List: Inserts the node corresponding to 'aa' at the end of the linked list, ensuring the order is consistent with the insertion order.

Similarly, the __delitem__ (delete key) and pop (delete and return key value) methods also follow the logic of “synchronously updating the three structures”—first modifying the base dictionary, then adjusting the doubly linked list and auxiliary dictionary, ensuring the order remains accurate after the operation.
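The deletion side can be sketched in isolation. The helper names below (Node, append, unlink, keys) are hypothetical and use plain strong references; the point is only that once the auxiliary dictionary hands back the node, splicing it out touches its two neighbors and nothing else.

```python
class Node:
    """Hypothetical list node used only for this illustration."""
    __slots__ = ("prev", "next", "key")

    def __init__(self, key=None):
        self.key = key
        self.prev = self.next = self  # a lone node points at itself


def append(root, node):
    """Insert node just before the sentinel root, i.e. at the tail."""
    last = root.prev
    node.prev, node.next = last, root
    last.next = node
    root.prev = node


def unlink(node):
    """Detach node from its neighbors in O(1); no scan of the list is needed."""
    node.prev.next = node.next
    node.next.prev = node.prev
    node.prev = node.next = None


def keys(root):
    """Collect keys by walking from the first real node back to the root."""
    out, cur = [], root.next
    while cur is not root:
        out.append(cur.key)
        cur = cur.next
    return out


root = Node()
index = {}  # plays the role of the auxiliary dictionary
for k in ("a", "b", "c"):
    index[k] = Node(k)
    append(root, index[k])

unlink(index.pop("b"))  # look the node up by key, then splice it out
remaining = keys(root)
print(remaining)
```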

III. Implementation of Ordered Iteration: Traversing the Doubly Linked List

The ordered iteration of OrderedDict (e.g., for key in d) is essentially a traversal of the doubly linked list. Its __iter__ method (iterator method) is implemented as follows:


def __iter__(self):
    'od.__iter__() <==> iter(od)'
    root = self.__root
    curr = root.next  # Start from the first valid node of the linked list
    while curr is not root:  # Traverse until reaching the root node (the linked list is circular)
        yield curr.key  # Return each key one by one
        curr = curr.next

The traversal logic is straightforward: starting from the first valid node of the linked list (root.next), it follows the next pointers one node at a time until it arrives back at the root node (curr is root), yielding each node’s key along the way, thus achieving “iteration in insertion order.”
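Because the list is doubly linked, the same walk can run in the opposite direction by following prev instead of next, which is how the pure-Python version supports reversed(). A short check with the real class:

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
d.move_to_end("a")             # relinks the node for 'a' at the tail

forward = list(d)              # follows the list from head to tail
backward = list(reversed(d))   # follows the list from tail to head

print(forward, backward)
```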

IV. Two Interesting Implementation Details

In the code of OrderedDict, there are two design details that are considered “textbook-level,” solving practical problems while reflecting the cleverness of Python programming.

1. Using Weak References to Avoid Circular References

CPython reclaims most objects through “reference counting,” but reference counting alone cannot handle “circular references”: when two objects reference each other, neither count ever reaches zero. Such objects are freed only when the separate cyclic garbage collector runs, so their memory is released later than necessary.

In the doubly linked list, when a new node is inserted at the tail:

  • The new node’s next points to the root node (link.next = root).
  • The root node’s prev points to the new node (root.prev = link).

If both pointers were ordinary (strong) references, this would create a “new node ↔ root node” circular reference. To avoid that, OrderedDict stores the prev pointer as a weak reference created with weakref.proxy:

root.prev = proxy(link)  # proxy is an alias for weakref.proxy

A weak reference does not increase the object’s reference count, so no strong reference cycle forms and a node that is no longer in use can be reclaimed by reference counting alone.
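The behavior of weakref.proxy can be observed directly. The Link class below is a hypothetical stand-in for a list node, and the immediate cleanup after del relies on CPython’s reference counting; other interpreters may free the object later.

```python
import weakref


class Link:
    """Hypothetical node class; the '__weakref__' slot permits weak references."""
    __slots__ = ("prev", "next", "key", "__weakref__")


link = Link()
link.key = "aa"

# The proxy forwards attribute access but does not keep the node alive.
proxy = weakref.proxy(link)
key_via_proxy = proxy.key

# Drop the only strong reference; in CPython the node is freed right away.
del link

try:
    proxy.key
    proxy_is_dead = False
except ReferenceError:
    # Accessing a proxy whose target is gone raises ReferenceError.
    proxy_is_dead = True

print(key_via_proxy, proxy_is_dead)
```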

2. Using object() as a Default Value Sentinel

The pop method of OrderedDict needs to accurately distinguish between “key exists” and “key does not exist”: if the key exists, delete and return the corresponding value; if the key does not exist, return the user-specified default value (or raise an error if not specified).

To implement this logic, OrderedDict defines a private variable __marker, which is set to object():

class OrderedDict(dict):
    __marker = object()  # The unique object created at class definition
    def pop(self, key, default=__marker):
        marker = self.__marker
        # Call the ordinary dictionary's pop, using marker as the default value
        result = dict.pop(self, key, marker)
        if result is not marker:
            # Key exists, update the doubly linked list (code omitted)
            return result
        # Key does not exist, handle the default value logic
        if default is marker:
            raise KeyError(key)
        return default

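Why use object()? Because each call to object() creates a new object whose identity is shared with nothing else, and the check uses is (identity) rather than ==. A value such as None would not work as the marker, since None is a perfectly valid default for a caller to pass. With a private object() sentinel, no default value the user passes in can be confused with __marker, so the method can tell precisely whether a default was supplied and whether the key existed.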
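The same sentinel pattern works outside a class. In the sketch below, pop_or_default and _MISSING are hypothetical names chosen for illustration; the example stores None as a value and passes None as a default to show that neither is mistaken for “nothing supplied.”

```python
_MISSING = object()  # module-private sentinel, compared by identity only


def pop_or_default(mapping, key, default=_MISSING):
    """Hypothetical helper: remove key and return its value, else use default."""
    result = mapping.pop(key, _MISSING)
    if result is not _MISSING:
        return result          # key existed, even if its value is None
    if default is _MISSING:
        raise KeyError(key)    # key absent and no default supplied
    return default             # key absent, explicit default (may be None)


data = {"a": None}
found = pop_or_default(data, "a")           # the stored value, which is None
fallback = pop_or_default(data, "a", None)  # key is gone; explicit default
try:
    pop_or_default(data, "a")
    raised = False
except KeyError:
    raised = True

print(found, fallback, raised)
```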

V. Conclusion: Design Insights from OrderedDict

The essence of OrderedDict is to balance “orderliness” and “efficient operations” by combining “doubly linked lists + auxiliary dictionaries” based on regular dictionaries. Its design philosophy teaches us:

  • When the basic data structure cannot meet the needs, functionality can be expanded by “combining multiple data structures” rather than developing from scratch.
  • The core of performance optimization is “leveraging strengths and avoiding weaknesses”—using the orderliness of linked lists to maintain order and using the quick lookup of dictionaries to compensate for the shortcomings of linked lists.

Today, although built-in dictionaries support order, the unique equality rules of OrderedDict and methods like move_to_end still make it irreplaceable in specific scenarios. Understanding its implementation principles not only helps us better utilize this tool but also enhances our understanding of “data structure combination design.”
