The Misconception of C++ Vector Doubling Capacity Strategy

In C++, std::vector is a dynamic array that can automatically adjust its capacity at runtime as needed. Many beginners believe that the capacity growth strategy of std::vector is simply “doubling” (i.e., each time it expands to twice its original size), but this is not the case.

The fact is, the C++ standard does not specify a specific growth strategy for std::vector. Different compilers (such as MSVC, GCC, Clang) are free to choose what they consider the optimal growth strategy.

Let’s verify this with code, a simple example:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> vec;

    for (int i = 0; i < 100; i++) {
        vec.emplace_back(i);
        std::cout << "i:" << i << " size:" << vec.size() << " capacity:" << vec.capacity() << std::endl;
    }
    return 0;
}

The output of this code on MSVC (simplified output):

i:0 size:1 capacity:1
i:1 size:2 capacity:2
i:2 size:3 capacity:3
i:3 size:4 capacity:4
i:4 size:5 capacity:6
i:5 size:6 capacity:6
i:6 size:7 capacity:9
i:8 size:9 capacity:9
i:9 size:10 capacity:13
i:12 size:13 capacity:13
i:13 size:14 capacity:19
i:18 size:19 capacity:19
i:19 size:20 capacity:28
i:27 size:28 capacity:28
i:28 size:29 capacity:42
i:41 size:42 capacity:42
i:42 size:43 capacity:63
i:62 size:63 capacity:63
i:63 size:64 capacity:94
i:93 size:94 capacity:94
i:94 size:95 capacity:141
i:99 size:100 capacity:141

As we can see, after instantiating the std::vector, the capacity defaults to 0, the first four additions all expand by 1, the fifth time it expands by 2, and the seventh time it expands by 3; 4, 6, 9, 14, 21, 31, 47, which is not a doubling strategy.

On GCC or Clang, the output is:

i:0 size:1 capacity:1
i:1 size:2 capacity:2
i:2 size:3 capacity:4
i:3 size:4 capacity:4
i:4 size:5 capacity:8
i:7 size:8 capacity:8
i:8 size:9 capacity:16
i:15 size:16 capacity:16
i:16 size:17 capacity:32
i:31 size:32 capacity:32
i:32 size:33 capacity:64
i:63 size:64 capacity:64
i:64 size:65 capacity:128
i:99 size:100 capacity:128

GCC/Clang follows a doubling growth strategy.

The above are the observed phenomena, and now let’s analyze from the source code implementation perspective.

MSVC source code:

    _CONSTEXPR20 pointer _Emplace_reallocate(const pointer _Whereptr, _Valty&amp;&amp;... _Val) {
        // reallocate and insert by perfectly forwarding _Val at _Whereptr
        _Alty&amp; _Al        = _Getal();
        auto&amp; _My_data    = _Mypair._Myval2;
        pointer&amp; _Myfirst = _My_data._Myfirst;
        pointer&amp; _Mylast  = _My_data._Mylast;

        _STL_INTERNAL_CHECK(_Mylast == _My_data._Myend); // check that we have no unused capacity

        const auto _Whereoff = static_cast<size_type>(_Whereptr - _Myfirst);
        const auto _Oldsize  = static_cast<size_type>(_Mylast - _Myfirst);

        if (_Oldsize == max_size()) {
            _Xlength();
        }

        const size_type _Newsize = _Oldsize + 1;
        size_type _Newcapacity   = _Calculate_growth(_Newsize);

        const pointer _Newvec           = _STD _Allocate_at_least_helper(_Al, _Newcapacity);
        ...
    }

    _CONSTEXPR20 size_type _Calculate_growth(const size_type _Newsize) const {
        // given _Oldcapacity and _Newsize, calculate geometric growth
        const size_type _Oldcapacity = capacity();
        const auto _Max              = max_size();

        if (_Oldcapacity > _Max - _Oldcapacity / 2) {
            return _Max; // geometric growth would overflow
        }

        // Calculate new capacity
        const size_type _Geometric = _Oldcapacity + _Oldcapacity / 2;

        if (_Geometric < _Newsize) {
            return _Newsize; // geometric growth would be insufficient
        }

        return _Geometric; // geometric growth is sufficient
    }

GCC source code:

#if __cplusplus >= 201103L
  template<typename _Tp, typename _Alloc>
    template<typename... _Args>
#if __cplusplus > 201402L
      _GLIBCXX20_CONSTEXPR
      typename vector<_Tp, _Alloc>::reference
#else
      void
#endif
      vector<_Tp, _Alloc>::
      emplace_back(_Args&amp;&amp;... __args)
      {
 if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
   {
     _GLIBCXX_ASAN_ANNOTATE_GROW(1);
     _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
         std::forward<_Args>(__args)...);
     ++this->_M_impl._M_finish;
     _GLIBCXX_ASAN_ANNOTATE_GREW(1);
   }
 else
   _M_realloc_insert(end(), std::forward<_Args>(__args)...);
#if __cplusplus > 201402L
 return back();
#endif
      }
#endif

#if __cplusplus >= 201103L
  template<typename _Tp, typename _Alloc>
    template<typename... _Args>
      _GLIBCXX20_CONSTEXPR
      void
      vector<_Tp, _Alloc>::_M_realloc_insert(iterator __position, _Args&amp;&amp;... __args)
#else
  template<typename _Tp, typename _Alloc>
    void
    vector<_Tp, _Alloc>::_M_realloc_insert(iterator __position, const _Tp&amp; __x)
#endif
    {
      // Calculate new capacity  
      const size_type __len =
 _M_check_len(size_type(1), "vector::_M_realloc_insert");
      pointer __old_start = this->_M_impl._M_start;
      pointer __old_finish = this->_M_impl._M_finish;
      const size_type __elems_before = __position - begin();
      pointer __new_start(this->_M_allocate(__len));
      pointer __new_finish(__new_start);
      __try
 {
   // The order of the three operations is dictated by the C++11
   // case, where the moves could alter a new element belonging
   // to the existing vector.  This is an issue only for callers
   // taking the element by lvalue ref (see last bullet of C++11
   // [res.on.arguments]).
   _Alloc_traits::construct(this->_M_impl,
       __new_start + __elems_before,
#if __cplusplus >= 201103L
       std::forward<_Args>(__args)...);
#else
       __x);
#endif
   __new_finish = pointer();

#if __cplusplus >= 201103L
   if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
     {
       __new_finish = _S_relocate(__old_start, __position.base(),
      __new_start, _M_get_Tp_allocator());

       ++__new_finish;

       __new_finish = _S_relocate(__position.base(), __old_finish,
      __new_finish, _M_get_Tp_allocator());
     }
   else
#endif
     {
       __new_finish
  = std::__uninitialized_move_if_noexcept_a
  (__old_start, __position.base(),
   __new_start, _M_get_Tp_allocator());

       ++__new_finish;

       __new_finish
  = std::__uninitialized_move_if_noexcept_a
  (__position.base(), __old_finish,
   __new_finish, _M_get_Tp_allocator());
     }
 }
      __catch(...)
 {
   if (!__new_finish)
     _Alloc_traits::destroy(this->_M_impl,
       __new_start + __elems_before);
   else
     std::_Destroy(__new_start, __new_finish, _M_get_Tp_allocator());
   _M_deallocate(__new_start, __len);
   __throw_exception_again;
 }
#if __cplusplus >= 201103L
      if _GLIBCXX17_CONSTEXPR (!_S_use_relocate())
#endif
 std::_Destroy(__old_start, __old_finish, _M_get_Tp_allocator());
      _GLIBCXX_ASAN_ANNOTATE_REINIT;
      _M_deallocate(__old_start,
      this->_M_impl._M_end_of_storage - __old_start);
      this->_M_impl._M_start = __new_start;
      this->_M_impl._M_finish = __new_finish;
      this->_M_impl._M_end_of_storage = __new_start + __len;
    }

      _GLIBCXX20_CONSTEXPR
      size_type
      _M_check_len(size_type __n, const char* __s) const
      {
 if (max_size() - size() < __n)
   __throw_length_error(__N(__s));

    // New capacity = current size + max(current size, 1)
 const size_type __len = std::max)(size(), __n);
 return (__len < size() || __len > max_size()) ? max_size() : __len;
      }

From the source code, we can see that MSVC adopts an approximate 1.5 times growth strategy, while GCC and Clang typically adopt a doubling (2 times) strategy.

In summary, the growth strategy of std::vector entirely depends on the compiler’s standard library implementation. For GCC/Clang, the doubling strategy is correct, but for MSVC, it is not.

Moreover, this is independent of the Modern C++ standard, as it holds true for both C++11 and C++23.

It can also be concluded that when inserting only a small amount of data, the performance impact is even greater due to the high frequency of memory reallocations. The best practice is to allocate space in advance for a small amount of data to reduce the overhead caused by memory allocation during growth.

I knew you were “watching”

Related posts

Leave a Comment Cancel reply