In C++, <span>std::vector</span> is a dynamic array that can automatically adjust its capacity at runtime as needed. Many beginners believe that the capacity growth strategy of <span>std::vector</span> is simply “doubling” (i.e., each time it expands to twice its original size), but this is not the case.
The fact is, the C++ standard does not specify a specific growth strategy for <span>std::vector</span>. Different compilers (such as MSVC, GCC, Clang) are free to choose what they consider the optimal growth strategy.
Let’s verify this with code, a simple example:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> vec;
for (int i = 0; i < 100; i++) {
vec.emplace_back(i);
std::cout << "i:" << i << " size:" << vec.size() << " capacity:" << vec.capacity() << std::endl;
}
return 0;
}
The output of this code on MSVC (simplified output):
i:0 size:1 capacity:1
i:1 size:2 capacity:2
i:2 size:3 capacity:3
i:3 size:4 capacity:4
i:4 size:5 capacity:6
i:5 size:6 capacity:6
i:6 size:7 capacity:9
i:8 size:9 capacity:9
i:9 size:10 capacity:13
i:12 size:13 capacity:13
i:13 size:14 capacity:19
i:18 size:19 capacity:19
i:19 size:20 capacity:28
i:27 size:28 capacity:28
i:28 size:29 capacity:42
i:41 size:42 capacity:42
i:42 size:43 capacity:63
i:62 size:63 capacity:63
i:63 size:64 capacity:94
i:93 size:94 capacity:94
i:94 size:95 capacity:141
i:99 size:100 capacity:141
As we can see, after instantiating the <span>std::vector</span>, the capacity defaults to 0, the first four additions all expand by 1, the fifth time it expands by 2, and the seventh time it expands by 3; 4, 6, 9, 14, 21, 31, 47, which is not a doubling strategy.
On GCC or Clang, the output is:
i:0 size:1 capacity:1
i:1 size:2 capacity:2
i:2 size:3 capacity:4
i:3 size:4 capacity:4
i:4 size:5 capacity:8
i:7 size:8 capacity:8
i:8 size:9 capacity:16
i:15 size:16 capacity:16
i:16 size:17 capacity:32
i:31 size:32 capacity:32
i:32 size:33 capacity:64
i:63 size:64 capacity:64
i:64 size:65 capacity:128
i:99 size:100 capacity:128
GCC/Clang follows a doubling growth strategy.
The above are the observed phenomena, and now let’s analyze from the source code implementation perspective.
MSVC source code:
_CONSTEXPR20 pointer _Emplace_reallocate(const pointer _Whereptr, _Valty&&... _Val) {
// reallocate and insert by perfectly forwarding _Val at _Whereptr
_Alty& _Al = _Getal();
auto& _My_data = _Mypair._Myval2;
pointer& _Myfirst = _My_data._Myfirst;
pointer& _Mylast = _My_data._Mylast;
_STL_INTERNAL_CHECK(_Mylast == _My_data._Myend); // check that we have no unused capacity
const auto _Whereoff = static_cast<size_type>(_Whereptr - _Myfirst);
const auto _Oldsize = static_cast<size_type>(_Mylast - _Myfirst);
if (_Oldsize == max_size()) {
_Xlength();
}
const size_type _Newsize = _Oldsize + 1;
size_type _Newcapacity = _Calculate_growth(_Newsize);
const pointer _Newvec = _STD _Allocate_at_least_helper(_Al, _Newcapacity);
...
}
_CONSTEXPR20 size_type _Calculate_growth(const size_type _Newsize) const {
// given _Oldcapacity and _Newsize, calculate geometric growth
const size_type _Oldcapacity = capacity();
const auto _Max = max_size();
if (_Oldcapacity > _Max - _Oldcapacity / 2) {
return _Max; // geometric growth would overflow
}
// Calculate new capacity
const size_type _Geometric = _Oldcapacity + _Oldcapacity / 2;
if (_Geometric < _Newsize) {
return _Newsize; // geometric growth would be insufficient
}
return _Geometric; // geometric growth is sufficient
}
GCC source code:
#if __cplusplus >= 201103L
template<typename _Tp, typename _Alloc>
template<typename... _Args>
#if __cplusplus > 201402L
_GLIBCXX20_CONSTEXPR
typename vector<_Tp, _Alloc>::reference
#else
void
#endif
vector<_Tp, _Alloc>::
emplace_back(_Args&&... __args)
{
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
{
_GLIBCXX_ASAN_ANNOTATE_GROW(1);
_Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
std::forward<_Args>(__args)...);
++this->_M_impl._M_finish;
_GLIBCXX_ASAN_ANNOTATE_GREW(1);
}
else
_M_realloc_insert(end(), std::forward<_Args>(__args)...);
#if __cplusplus > 201402L
return back();
#endif
}
#endif
#if __cplusplus >= 201103L
template<typename _Tp, typename _Alloc>
template<typename... _Args>
_GLIBCXX20_CONSTEXPR
void
vector<_Tp, _Alloc>::_M_realloc_insert(iterator __position, _Args&&... __args)
#else
template<typename _Tp, typename _Alloc>
void
vector<_Tp, _Alloc>::_M_realloc_insert(iterator __position, const _Tp& __x)
#endif
{
// Calculate new capacity
const size_type __len =
_M_check_len(size_type(1), "vector::_M_realloc_insert");
pointer __old_start = this->_M_impl._M_start;
pointer __old_finish = this->_M_impl._M_finish;
const size_type __elems_before = __position - begin();
pointer __new_start(this->_M_allocate(__len));
pointer __new_finish(__new_start);
__try
{
// The order of the three operations is dictated by the C++11
// case, where the moves could alter a new element belonging
// to the existing vector. This is an issue only for callers
// taking the element by lvalue ref (see last bullet of C++11
// [res.on.arguments]).
_Alloc_traits::construct(this->_M_impl,
__new_start + __elems_before,
#if __cplusplus >= 201103L
std::forward<_Args>(__args)...);
#else
__x);
#endif
__new_finish = pointer();
#if __cplusplus >= 201103L
if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
{
__new_finish = _S_relocate(__old_start, __position.base(),
__new_start, _M_get_Tp_allocator());
++__new_finish;
__new_finish = _S_relocate(__position.base(), __old_finish,
__new_finish, _M_get_Tp_allocator());
}
else
#endif
{
__new_finish
= std::__uninitialized_move_if_noexcept_a
(__old_start, __position.base(),
__new_start, _M_get_Tp_allocator());
++__new_finish;
__new_finish
= std::__uninitialized_move_if_noexcept_a
(__position.base(), __old_finish,
__new_finish, _M_get_Tp_allocator());
}
}
__catch(...)
{
if (!__new_finish)
_Alloc_traits::destroy(this->_M_impl,
__new_start + __elems_before);
else
std::_Destroy(__new_start, __new_finish, _M_get_Tp_allocator());
_M_deallocate(__new_start, __len);
__throw_exception_again;
}
#if __cplusplus >= 201103L
if _GLIBCXX17_CONSTEXPR (!_S_use_relocate())
#endif
std::_Destroy(__old_start, __old_finish, _M_get_Tp_allocator());
_GLIBCXX_ASAN_ANNOTATE_REINIT;
_M_deallocate(__old_start,
this->_M_impl._M_end_of_storage - __old_start);
this->_M_impl._M_start = __new_start;
this->_M_impl._M_finish = __new_finish;
this->_M_impl._M_end_of_storage = __new_start + __len;
}
_GLIBCXX20_CONSTEXPR
size_type
_M_check_len(size_type __n, const char* __s) const
{
if (max_size() - size() < __n)
__throw_length_error(__N(__s));
// New capacity = current size + max(current size, 1)
const size_type __len = std::max)(size(), __n);
return (__len < size() || __len > max_size()) ? max_size() : __len;
}
From the source code, we can see that MSVC adopts an approximate 1.5 times growth strategy, while GCC and Clang typically adopt a doubling (2 times) strategy.
In summary, the growth strategy of <span>std::vector</span> entirely depends on the compiler’s standard library implementation. For GCC/Clang, the doubling strategy is correct, but for MSVC, it is not.
Moreover, this is independent of the Modern C++ standard, as it holds true for both C++11 and C++23.
It can also be concluded that when inserting only a small amount of data, the performance impact is even greater due to the high frequency of memory reallocations. The best practice is to allocate space in advance for a small amount of data to reduce the overhead caused by memory allocation during growth.
I knew you were “watching”