C++ Guide: `wstring` – Wide Character String Type
1. What is `wstring`?
`std::wstring` is the wide character (`wchar_t`) string type provided by the C++ standard library. Unlike `std::string`, which stores `char` elements, `std::wstring` stores one `wchar_t` per element and is commonly used to hold Unicode text, making it suitable for applications that require multilingual support, such as Chinese, Japanese, Korean, and other non-ASCII languages.
2. Underlying Data Type of `wstring`
📌 `std::string` vs. `std::wstring`
String Type | Stored Character Type | Common Encoding |
---|---|---|
`std::string` | `char` (single-byte) | ASCII / UTF-8 |
`std::wstring` | `wchar_t` (wide character) | UTF-16 / UTF-32 |
Note: `std::wstring` uses `wchar_t` to store characters, and the size of `wchar_t` varies across operating systems:
- Windows: `wchar_t` is typically 2 bytes (UTF-16)
- Linux/macOS: `wchar_t` is typically 4 bytes (UTF-32)

`wstring` is suitable for programs that work with wide character sets (WCS), including programs that convert multi-byte character set (MBCS) text to wide characters for processing.
3. Basic Usage of `wstring`
📌 1️⃣ Creating `wstring`
#include <iostream>
#include <string>

int main() {
    std::wstring ws1 = L"你好,世界!"; // L prefix indicates wide string
    std::wstring ws2 = L"Hello, Wide World!";
    std::wcout << L"Wide character string: " << ws1 << std::endl;
    std::wcout << L"English string: " << ws2 << std::endl;
    return 0;
}
💡 Note:
- Use the `L` prefix to indicate a wide character string, for example `L"你好"`.
- **`std::wcout` is used to output `wstring`** (`std::cout` cannot directly output `wstring`).
- The locale must be set for `wstring` to display correctly (see section 5 below).
📌 2️⃣ Common Operations on `wstring`
#include <cstddef>
#include <iostream>
#include <string>

int main() {
    std::wstring ws = L"宽字符字符串"; // "wide character string"

    // Length (number of wchar_t elements, not bytes)
    std::wcout << L"String length: " << ws.length() << std::endl;

    // Concatenate strings
    ws += L" - 追加内容"; // " - additional content"
    std::wcout << L"After concatenation: " << ws << std::endl;

    // Access character
    std::wcout << L"First character: " << ws[0] << std::endl;

    // Find substring (the appended text contains it, so the search succeeds)
    std::size_t pos = ws.find(L"追加");
    if (pos != std::wstring::npos) {
        std::wcout << L"Found '追加', position: " << pos << std::endl;
    }
    return 0;
}
🛠 Common Functions:
Function | Purpose |
---|---|
`length()` | Get string length |
`append()` or `+=` | Concatenate strings |
`find(L"substring")` | Find substring |
`substr(start, length)` | Get substring |
`compare()` | Compare strings |
`empty()` | Check if empty |
4. Converting between `string` and `wstring`
📌 1️⃣ `wstring` → `string` (narrow character)
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>

std::string wstringToString(const std::wstring& wstr) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wstr);
}

int main() {
    std::wstring ws = L"你好,世界!";
    std::string s = wstringToString(ws);
    std::cout << "Converted string: " << s << std::endl;
    return 0;
}
📝 `std::wstring_convert` is an encoding conversion tool introduced in C++11 (and deprecated since C++17); here it converts a `wstring` to a UTF-8 encoded `string`.
📌 2️⃣ `string` → `wstring` (wide character)
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>

std::wstring stringToWstring(const std::string& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(str); // expects UTF-8 input
}

int main() {
    std::string s = "Hello, 世界!"; // assumes the source file is compiled as UTF-8
    std::wstring ws = stringToWstring(s);
    std::wcout << L"Converted wstring: " << ws << std::endl;
    return 0;
}
5. Resolving the Issue of `std::wcout` Not Displaying Chinese Characters Correctly
In the Windows cmd console or a Linux terminal, using `std::wcout` directly may not display a `wstring` correctly. Solution:
📌 1️⃣ Windows (UTF-16)
#include <clocale>
#include <iostream>
#include <string>

int main() {
    std::setlocale(LC_ALL, ""); // Use the environment's locale to support Chinese output
    std::wstring ws = L"你好,世界!";
    std::wcout << L"Correctly displaying wide character: " << ws << std::endl;
    return 0;
}
💡 `setlocale(LC_ALL, "")` replaces the default "C" locale with the locale of the user's environment, which allows `std::wcout` to correctly display Unicode characters that the terminal's encoding supports.
📌 2️⃣ Linux/macOS (UTF-32)
The Linux terminal typically uses UTF-8, so it is recommended to use `wstring_convert` for conversion:
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>

int main() {
    std::wstring ws = L"你好,世界!";
    // Convert wstring → string
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    std::string utf8_str = converter.to_bytes(ws);
    std::cout << "UTF-8 display: " << utf8_str << std::endl;
    return 0;
}
6. Summary
Operation | Method |
---|---|
Create `wstring` | `std::wstring ws = L"你好";` |
Output `wstring` | `std::wcout << ws;` (requires `setlocale(LC_ALL, "")`) |
Concatenate strings | `ws += L"追加";` |
Find substring | `ws.find(L"substring")` |
`wstring` to `string` | `std::wstring_convert<std::codecvt_utf8<wchar_t>>` |
`string` to `wstring` | `std::wstring_convert<std::codecvt_utf8<wchar_t>>` |
🚀 `wstring` is suitable for multilingual support and international applications, but be aware of the character encoding of the platform!