simdjson: An Ultra-Fast C++ Library That Parses Gigabytes of JSON per Second
Have you encountered these issues in web services, big data processing, log analysis, database systems, or real-time communication?
“JSON parsing is too slow, becoming a bottleneck in the system!”
“Is there a faster solution than RapidJSON or nlohmann/json?”
“Can we parse several GB of JSON per second on a single core?”
“How can we leverage the SIMD instructions of modern CPUs to accelerate text processing?”
The answer is simdjson.
What is simdjson?
simdjson is a very fast JSON parsing library for C++. It uses the SIMD (Single Instruction, Multiple Data) instructions of modern CPUs (such as SSE, AVX, and ARM NEON) to process JSON text in parallel, and can parse gigabytes of data per second on a single core. It can be integrated as a single header plus a single source file (simdjson.h and simdjson.cpp); it is not header-only.
It is currently one of the fastest JSON parsers available and leads in many published benchmarks.
It was developed by Daniel Lemire, Geoff Langdale, and other contributors, and is used in areas such as:
- Database systems (such as ClickHouse, Apache Doris, StarRocks)
- Search engines
- Log processing
- Web servers
- Blockchain and smart contract parsing
- High-frequency trading systems
Core Features
| Feature | Description |
|---|---|
| Extreme Performance | Uses SIMD for parallel parsing, far exceeding traditional parsers |
| Single-File Distribution | Only requires simdjson.h and simdjson.cpp |
| No External Dependencies | Pure C++, no need for Boost, nlohmann, etc. |
| DOM + On-Demand APIs | Supports a tree-based API (DOM) and a lazy, streaming-style API (On-Demand) |
| Minimal Copying | The On-Demand API reads values from the input buffer instead of building an intermediate tree |
| Validating Parser | Syntax checking and structure building happen in a single pass |
| Cross-Platform | Supports x86, ARM, PowerPC, WebAssembly |
| Automatic CPU Instruction Set Detection | Selects the best implementation at runtime, with a portable fallback |
| Production Ready | Used by several large companies in online systems |
| Apache 2.0 License | Can be used in commercial projects |
Installation and Integration
Method 1: Directly Use Single File (Recommended)
wget https://github.com/simdjson/simdjson/releases/latest/download/simdjson.h
wget https://github.com/simdjson/simdjson/releases/latest/download/simdjson.cpp
Then include in your project:
#include "simdjson.h"
Compile with optimizations enabled. The example in this article uses std::string_view, so build it as C++17:
g++ -O3 -std=c++17 your_app.cpp simdjson.cpp -o your_app
Method 2: Using CMake
git clone https://github.com/simdjson/simdjson.git
cd simdjson
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
make -j
sudo make install
CMakeLists.txt:
find_package(simdjson REQUIRED)
target_link_libraries(your_app PRIVATE simdjson::simdjson)
Method 3: Using vcpkg
vcpkg install simdjson
First Example: Parsing JSON and Accessing Data
simdjson_demo.cpp
#include "simdjson.h"
#include <cstdint>
#include <iostream>
#include <string_view>

using namespace simdjson; // brings the _padded literal into scope

int main() {
    // JSON input; _padded appends the trailing padding the parser requires
    padded_string json = R"({
        "name": "Alice",
        "age": 30,
        "is_student": false,
        "scores": [85, 92, 78],
        "address": {
            "city": "Beijing",
            "zipcode": "100000"
        }
    })"_padded;

    // Create parser (reusable across documents)
    ondemand::parser parser;
    try {
        // Start iterating; values are parsed lazily as they are accessed
        ondemand::document doc = parser.iterate(json);

        // Access fields (each conversion throws simdjson_error on failure)
        std::string_view name = doc["name"];
        uint64_t age = doc["age"];
        bool is_student = doc["is_student"];
        std::string_view city = doc["address"]["city"];

        std::cout << "Name: " << name << "\n";
        std::cout << "Age: " << age << "\n";
        std::cout << "Is Student: " << (is_student ? "true" : "false") << "\n";
        std::cout << "City: " << city << "\n";

        // Iterate over array
        std::cout << "Scores: ";
        for (auto score : doc["scores"]) {
            std::cout << int64_t(score) << " ";
        }
        std::cout << "\n";
    } catch (const simdjson_error& error) {
        // simdjson reports failures by throwing simdjson_error
        std::cerr << "Error: " << error.what() << std::endl;
        return 1;
    }
    return 0;
}
Compile and Run
g++ -O3 -std=c++17 simdjson_demo.cpp simdjson.cpp -o simdjson_demo
./simdjson_demo
Output:
Name: Alice
Age: 30
Is Student: false
City: Beijing
Scores: 85 92 78
Core Technical Principles
1. On-Demand Parsing
- Does not pre-build a complete DOM tree
- Only parses the corresponding part when accessing fields
- Saves memory and time
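To make the on-demand idea concrete, here is a toy sketch in plain C++. It is not simdjson's implementation: `lazy_find` is a hypothetical helper that assumes a flat, well-formed object with no nested containers and no escaped quotes. It only illustrates the principle of scanning as far as the requested key and returning a view into the original buffer instead of building a tree.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of simdjson): looks up `key` in a flat JSON
// object and returns a view of its value that points into `json` itself.
// Scanning stops as soon as the key is found; nothing is copied or allocated.
std::optional<std::string_view> lazy_find(std::string_view json, std::string_view key) {
    constexpr auto npos = std::string_view::npos;
    std::size_t pos = json.find('{');
    if (pos == npos) return std::nullopt;
    ++pos;
    for (;;) {
        // Locate the next key between a pair of quotes.
        const std::size_t key_open = json.find('"', pos);
        if (key_open == npos) return std::nullopt;
        const std::size_t key_close = json.find('"', key_open + 1);
        if (key_close == npos) return std::nullopt;
        const std::string_view current = json.substr(key_open + 1, key_close - key_open - 1);

        // Locate the start of the value after the colon.
        const std::size_t colon = json.find(':', key_close + 1);
        if (colon == npos) return std::nullopt;
        const std::size_t value_begin = json.find_first_not_of(" \t\r\n", colon + 1);
        if (value_begin == npos) return std::nullopt;

        std::string_view value;
        std::size_t after_value = 0;
        if (json[value_begin] == '"') {
            // String value: return the contents without the surrounding quotes.
            const std::size_t close = json.find('"', value_begin + 1);
            if (close == npos) return std::nullopt;
            value = json.substr(value_begin + 1, close - value_begin - 1);
            after_value = close + 1;
        } else {
            // Number or literal: runs until the next ',' or '}'.
            after_value = json.find_first_of(",}", value_begin);
            if (after_value == npos) return std::nullopt;
            value = json.substr(value_begin, after_value - value_begin);
            while (!value.empty() && (value.back() == ' ' || value.back() == '\t' ||
                                      value.back() == '\r' || value.back() == '\n')) {
                value.remove_suffix(1);
            }
        }
        if (current == key) return value;  // found: the rest is never scanned

        // Otherwise move past the separator to the next field.
        const std::size_t separator = json.find_first_of(",}", after_value);
        if (separator == npos || json[separator] == '}') return std::nullopt;
        pos = separator + 1;
    }
}
```

Calling `lazy_find(json, "name")` on the demo-style input `{"name": "Alice", "age": 30}` stops after the first field. The real library applies the same principle on top of its SIMD-built structural index and handles nesting, escapes, and validation.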
2. SIMD Parallel Processing
- Uses 128-, 256-, or 512-bit vector registers to scan JSON characters in parallel
- For example: quickly classifies quotes, brackets, and commas using byte-shuffle table lookups such as vpshufb (x86) or vqtbl1q_u8 (ARM NEON)
- Processes 32 bytes per instruction with AVX2, far exceeding traditional character-by-character parsing
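The sketch below shows the simplest form of this idea: compare 16 bytes against one character in a single instruction and collapse the result into a bitmask. It is an illustration under stated assumptions, not simdjson's code. The function names `match_mask16`, `popcount16`, and `count_char` are hypothetical, it uses SSE2 compare-and-movemask rather than table lookups that classify many characters at once, and it falls back to a scalar loop when SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Returns a 16-bit mask: bit i is set when block[i] == target.
// `block` must point to at least 16 readable bytes.
std::uint32_t match_mask16(const char* block, char target) {
#if defined(__SSE2__)
    const __m128i chunk  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
    const __m128i needle = _mm_set1_epi8(target);
    // One compare handles all 16 bytes; movemask packs the results into bits.
    return static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle)));
#else
    // Portable fallback that produces the same mask one byte at a time.
    std::uint32_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        if (block[i] == target) mask |= (1u << i);
    }
    return mask;
#endif
}

// Counts the set bits in a mask.
std::size_t popcount16(std::uint32_t mask) {
    std::size_t count = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Counts occurrences of `target` in `text`, 16 bytes per step.
std::size_t count_char(std::string_view text, char target) {
    assert(target != '\0');  // the zero padding below would otherwise match
    std::size_t total = 0;
    std::size_t offset = 0;
    for (; offset + 16 <= text.size(); offset += 16) {
        total += popcount16(match_mask16(text.data() + offset, target));
    }
    if (offset < text.size()) {
        char tail[16] = {};  // zero-padded copy of the final partial block
        std::memcpy(tail, text.data() + offset, text.size() - offset);
        total += popcount16(match_mask16(tail, target));
    }
    return total;
}
```

For the input `{"a":1,"b":[2,3]}`, the quote mask of the first 16 bytes has bits 1, 3, 7, and 9 set. A parser can then use bit operations on such masks to work out which characters lie inside strings without branching on each byte.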
3. Staged Parsing
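- Stage 1: Structural indexing (SIMD accelerated): locates strings and structural characters and validates UTF-8
- Stage 2: Value parsing and validation: walks the structural index to parse strings, numbers, and literals
- DOM construction is optional: the DOM API builds the full tree during stage 2, while the On-Demand API performs this work lazily as fields are accessed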
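A minimal sketch of the staged approach follows, assuming a scalar stand-in for the SIMD pass. The names `find_structurals` and `brackets_balanced` are hypothetical and this is not the library's code. The first function records where structural characters sit outside strings; the second walks that index without rescanning the text.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Stage 1 (scalar stand-in for the SIMD pass): record the offsets of
// structural characters that are not inside a string.
std::vector<std::size_t> find_structurals(std::string_view json) {
    std::vector<std::size_t> index;
    bool in_string = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (in_string) {
            if (c == '\\') {
                ++i;  // skip the escaped character
            } else if (c == '"') {
                in_string = false;
            }
            continue;
        }
        switch (c) {
            case '"':
                in_string = true;
                break;
            case '{': case '}': case '[': case ']': case ':': case ',':
                index.push_back(i);
                break;
            default:
                break;
        }
    }
    return index;
}

// Stage 2 (one small example of a later pass): visit only the indexed
// positions and check that braces and brackets are properly nested.
bool brackets_balanced(std::string_view json, const std::vector<std::size_t>& index) {
    std::vector<char> open_stack;
    for (const std::size_t pos : index) {
        const char c = json[pos];
        if (c == '{' || c == '[') {
            open_stack.push_back(c);
        } else if (c == '}' || c == ']') {
            const char expected = (c == '}') ? '{' : '[';
            if (open_stack.empty() || open_stack.back() != expected) return false;
            open_stack.pop_back();
        }
    }
    return open_stack.empty();
}
```

Separating the passes means the later stage never has to decide whether a brace or comma belongs to a string, because the index contains only positions that carry structural meaning.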
4. Automatic CPU Detection
simdjson::get_active_implementation()->name(); // Returns the name of the implementation in use
// For example: "icelake", "haswell", "westmere", "arm64", "fallback"
Real-World Application Scenarios
| Scenario | Description |
|---|---|
| Log Processing | Fast parsing of JSON logs (such as Nginx, Fluent Bit) |
| Database | Systems such as ClickHouse, Apache Doris, and StarRocks use it to accelerate JSON processing |
| Web API | Enhances throughput of RESTful interfaces |
| Big Data | JSON ingestion stages in data pipelines (for example, feeding Spark or Flink jobs) |
| Blockchain | Parsing transaction and block data |
| Configuration Loading | Quickly read large JSON configuration files |
Performance Comparison (Benchmark Tests)
| Parser | Speed (GB/s) | Memory Usage | Features |
|---|---|---|---|
| simdjson | 2.5 – 4.0+ | Low | SIMD accelerated, fastest |
| RapidJSON | 1.0 – 1.8 | Medium | Fast, with only limited optional SIMD use |
| nlohmann/json | 0.3 – 0.6 | High | Easy to use, but slow |
| ArduinoJson | 0.5 – 1.0 | Very low | Embedded specific |
| jansson | 0.2 – 0.4 | Medium | Traditional C library |
On modern CPUs, simdjson can reach parsing speeds of over 3 GB/s. The figures above are indicative only; actual throughput depends on hardware, compiler, and input data.
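Because such figures depend on the machine and the input, it helps to measure on your own data. The sketch below shows one plausible way to derive a GB/s number: time several runs and divide the input size by the best elapsed time. The helper names `gigabytes_per_second` and `measure_throughput` are hypothetical, and `parse` stands for whatever parsing call you want to time.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>
#include <string>

// Converts a byte count and an elapsed time into GB/s (1 GB = 1e9 bytes).
// Returns 0 when the elapsed time is not positive (timer too coarse).
double gigabytes_per_second(std::size_t bytes, double seconds) {
    if (seconds <= 0.0) return 0.0;
    return (static_cast<double>(bytes) / 1e9) / seconds;
}

// Runs `parse(input)` several times and reports throughput for the fastest
// run, which reduces noise from caches warming up and from other processes.
template <typename ParseFn>
double measure_throughput(const std::string& input, ParseFn&& parse, int repetitions = 10) {
    double best_seconds = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repetitions; ++i) {
        const auto start = std::chrono::steady_clock::now();
        parse(input);
        const auto stop = std::chrono::steady_clock::now();
        const double elapsed = std::chrono::duration<double>(stop - start).count();
        if (elapsed < best_seconds) best_seconds = elapsed;
    }
    return gigabytes_per_second(input.size(), best_seconds);
}
```

To compare parsers fairly, pass each one the same input, build with the same optimization flags, and make sure the parsed result is actually used so the compiler cannot remove the work.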
Why Choose simdjson?
- Unmatched Speed: Fully utilizes SIMD, outperforming traditional solutions
- Production Ready: Used in projects such as Node.js, ClickHouse, and Meta Velox
- Easy Integration: Single file, no dependencies
- Modern C++: Supports RAII, exceptions, move semantics
- Apache 2.0 License: Freely usable for commercial projects
- Actively Maintained: Continuous optimization, supports new architectures (like Apple Silicon)
Learning Resources
- Official Website: https://simdjson.org
- GitHub Repository: https://github.com/simdjson/simdjson
- Documentation: https://simdjson.org/api/
- Blog Articles: Daniel Lemire's blog, https://lemire.me/blog/
- Performance Analysis: https://github.com/simdjson/simdjson/blob/master/doc/benchmarks.md
Conclusion
simdjson is more than just a JSON library; it combines a deep understanding of modern CPU architecture with careful algorithm engineering.
It achieves:
- Parsing several GB of JSON per second
- Very low memory overhead
- Single-pass parsing with validation
Whether you are developing:
- High-throughput web services
- Real-time logging systems
- Big data analytics platforms
- Embedded devices (supports ARM)
simdjson can provide you with unprecedented JSON processing speed.
“In the age of data explosion, speed is competitiveness.”
Integrate simdjson into your project now and unleash the full potential of your CPU!