simdjson: A Powerful C++ Library

simdjson: An Ultra-Fast C++ Library for Parsing Gigabytes of JSON per Second

Have you encountered these issues in web services, big data processing, log analysis, database systems, or real-time communication?

“JSON parsing is too slow, becoming a bottleneck in the system!”
“Is there a faster solution than RapidJSON or nlohmann/json?”
“Can we parse several GB of JSON per second on a single core?”
“How can we leverage the SIMD instructions of modern CPUs to accelerate text processing?”

The answer is: simdjson

What is simdjson?

simdjson is a very fast JSON parsing library for C++, distributed as a single header plus a single source file. It uses the SIMD (Single Instruction, Multiple Data) instructions of modern CPUs (such as SSE4.2, AVX2, AVX-512, and ARM NEON) to process JSON text in parallel, which lets it parse gigabytes of JSON per second on a single core.

It is currently one of the fastest JSON parsers available and leads many published benchmarks.

It was developed by Daniel Lemire, Geoff Langdale, and others, and is widely used in:

  • Database systems (such as ClickHouse, Apache Doris, and StarRocks)
  • Search engines
  • Log processing pipelines
  • Web servers and runtimes (such as Node.js)
  • Blockchain and smart contract parsing
  • High-frequency trading systems

Core Features

Feature | Description
Extreme Performance | Uses SIMD to parse in parallel, far exceeding conventional parsers
Single-Header Distribution | Only requires simdjson.h and simdjson.cpp
No External Dependencies | Pure C++; no need for Boost or any other library
DOM + On-Demand APIs | Supports a full tree (DOM) as well as lazy, forward-only On-Demand parsing
Minimal Copying | On-Demand reads values directly from the original input buffer where possible
Validating Parser | Syntax and UTF-8 checks happen as part of parsing, with no separate validation pass
Cross-Platform | Supports x86-64, ARM64, and PowerPC, plus a portable fallback for other targets
Automatic CPU Detection | Selects the best implementation at runtime (with a fallback mechanism)
Production Ready | Used in production systems at several large companies
Apache 2.0 License | Can be used in commercial projects

Installation and Integration

Method 1: Use the Single-Header Release (Recommended)

wget https://github.com/simdjson/simdjson/releases/latest/download/simdjson.h
wget https://github.com/simdjson/simdjson/releases/latest/download/simdjson.cpp

Then include in your project:

#include "simdjson.h"

Compile with optimizations enabled. C++17 is recommended (it provides std::string_view natively); C++11 is the minimum the library supports:

g++ -O3 -std=c++17 your_app.cpp simdjson.cpp -o your_app

Method 2: Using CMake

git clone https://github.com/simdjson/simdjson.git
cd simdjson
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j
sudo make install

CMakeLists.txt:

find_package(simdjson REQUIRED)
target_link_libraries(your_app PRIVATE simdjson::simdjson)

Method 3: Using vcpkg

vcpkg install simdjson

First Example: Parsing JSON and Accessing Data

simdjson_demo.cpp

#include "simdjson.h"
#include <iostream>

int main() {
    using namespace simdjson; // makes the _padded literal visible

    // JSON input as a padded_string: simdjson requires a few extra bytes
    // of padding after the data, which the _padded literal provides
    auto json = R"({
        "name": "Alice",
        "age": 30,
        "is_student": false,
        "scores": [85, 92, 78],
        "address": {
            "city": "Beijing",
            "zipcode": "100000"
        }
    })"_padded;

    // Create parser (reusable across documents)
    simdjson::ondemand::parser parser;

    try {
        // Start iterating; values are parsed lazily as they are accessed
        simdjson::ondemand::document doc = parser.iterate(json);

        // Access fields (On-Demand is forward-only, so following the
        // document order is the cheapest access pattern)
        std::string_view name = doc["name"];
        uint64_t age = doc["age"];
        bool is_student = doc["is_student"];
        std::string_view city = doc["address"]["city"];

        std::cout << "Name: " << name << "\n";
        std::cout << "Age: " << age << "\n";
        std::cout << "Is Student: " << (is_student ? "true" : "false") << "\n";
        std::cout << "City: " << city << "\n";

        // Iterate over array
        std::cout << "Scores: ";
        for (auto score : doc["scores"]) {
            std::cout << int64_t(score) << " ";
        }
        std::cout << "\n";

    } catch (const simdjson::simdjson_error& error) {
        // simdjson reports failures by throwing simdjson_error
        std::cerr << "Error: " << error.what() << std::endl;
    }

    return 0;
}

Compile and Run

g++ -O3 -std=c++17 simdjson_demo.cpp simdjson.cpp -o simdjson_demo
./simdjson_demo

Output:

Name: Alice
Age: 30
Is Student: false
City: Beijing
Scores: 85 92 78 

Core Technical Principles

1. On-Demand Parsing

  • Does not pre-build a complete DOM tree
  • Only parses the corresponding part when accessing fields
  • Saves memory and time

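2. SIMD Parallel Processing

  • Uses 128-, 256-, or 512-bit registers (depending on the CPU) to scan JSON characters in parallel
  • For example: classifies quotes, brackets, and commas with byte-shuffle table lookups (pshufb on x86, vqtbl1q on ARM) and locates quoted regions with carry-less multiplication (pclmulqdq on x86)
  • Processes the input in 64-byte blocks, far faster than traditional character-by-character parsing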
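As a rough illustration of the bitmask idea behind this scanning step, the sketch below builds a 64-bit mask of structural characters for one block of up to 64 bytes and clears the bits that fall inside strings. It is plain scalar C++ rather than SIMD, the function names (match_mask, prefix_xor, structural_mask) are made up for this example and are not part of simdjson's API, and escaped quotes are deliberately ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>

// Bit i of the result is set when block[i] == target (first 64 bytes only).
// A SIMD kernel produces the same mask with a few compare/movemask
// instructions instead of this byte-by-byte loop.
inline std::uint64_t match_mask(const std::string& block, char target) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < block.size() && i < 64; ++i) {
        if (block[i] == target) mask |= (std::uint64_t{1} << i);
    }
    return mask;
}

// Prefix XOR: bit i of the result is the XOR of input bits 0..i.
// Applied to a mask of quote positions, it marks every byte from an
// opening quote up to (but not including) the matching closing quote.
inline std::uint64_t prefix_xor(std::uint64_t bits) {
    bits ^= bits << 1;
    bits ^= bits << 2;
    bits ^= bits << 4;
    bits ^= bits << 8;
    bits ^= bits << 16;
    bits ^= bits << 32;
    return bits;
}

// Mask of structural characters ({ } [ ] : ,) that lie outside strings.
// Simplification: backslash-escaped quotes are not handled.
inline std::uint64_t structural_mask(const std::string& block) {
    std::uint64_t structurals = 0;
    for (char c : {'{', '}', '[', ']', ':', ','}) {
        structurals |= match_mask(block, c);
    }
    const std::uint64_t in_string = prefix_xor(match_mask(block, '"'));
    return structurals & ~in_string;
}
```

For the block {"a,b":[1,2]}, structural_mask sets bits 0, 6, 7, 9, 11, and 12; the comma at index 3 is masked out because it sits inside a string. On x86, a prefix XOR of this kind can be computed with a single carry-less multiplication by an all-ones operand.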

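3. Two-Stage Parsing

  1. Stage 1: Structural indexing (SIMD accelerated): locates structural characters and string boundaries and validates UTF-8
  2. Stage 2: Value parsing and validation: walks the structural index to parse strings, numbers, and literals, building the DOM when the DOM API is used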
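The split between an indexing stage and a later walking stage can be sketched in a few lines. This is a simplified scalar model, not simdjson's implementation: find_structurals and max_depth are hypothetical helpers, and the second stage only checks bracket nesting instead of parsing values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stage 1 (sketch): record the position of every structural character
// that is not inside a string. The real library does this with SIMD;
// this loop only shows what the stage produces.
inline std::vector<std::size_t> find_structurals(const std::string& json) {
    std::vector<std::size_t> index;
    bool in_string = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (in_string) {
            if (c == '\\') {
                ++i;                 // skip the escaped character
            } else if (c == '"') {
                in_string = false;   // closing quote
            }
        } else if (c == '"') {
            in_string = true;        // opening quote
        } else if (c == '{' || c == '}' || c == '[' || c == ']' ||
                   c == ':' || c == ',') {
            index.push_back(i);
        }
    }
    return index;
}

// Stage 2 (sketch): walk the index instead of the raw text. Returns the
// maximum nesting depth, or -1 if brackets and braces do not match.
inline int max_depth(const std::string& json,
                     const std::vector<std::size_t>& index) {
    std::vector<char> open_stack;
    int deepest = 0;
    for (std::size_t pos : index) {
        const char c = json[pos];
        if (c == '{' || c == '[') {
            open_stack.push_back(c);
            if (static_cast<int>(open_stack.size()) > deepest) {
                deepest = static_cast<int>(open_stack.size());
            }
        } else if (c == '}' || c == ']') {
            const char expected = (c == '}') ? '{' : '[';
            if (open_stack.empty() || open_stack.back() != expected) return -1;
            open_stack.pop_back();
        }
    }
    return open_stack.empty() ? deepest : -1;
}
```

The point of the split is that the second stage never rescans string contents or whitespace: it jumps from one recorded position to the next.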

4. Automatic CPU Detection

simdjson::get_active_implementation()->name(); // View the current implementation in use
// Returns a name such as: "icelake", "haswell", "westmere", "arm64", "fallback"

Real-World Application Scenarios

Scenario | Description
Log Processing | Fast parsing of JSON logs (such as those emitted by Nginx or Fluent Bit)
Database | Engines such as ClickHouse use it to accelerate JSON functions
Web API | Raises the throughput of RESTful interfaces
Big Data | Bulk JSON ingestion in analytics pipelines
Blockchain | Parsing transaction and block data
Configuration Loading | Quickly reading large JSON configuration files

Performance Comparison (Benchmark Tests)

Parser | Speed (GB/s) | Memory Usage | Notes
simdjson | 2.5 – 4.0+ | Low | SIMD accelerated, fastest
RapidJSON | 1.0 – 1.8 | Medium | Fast; SIMD limited to optional whitespace skipping
nlohmann/json | 0.3 – 0.6 | High | Easy to use, but slow
ArduinoJson | 0.5 – 1.0 | Very low | Designed for embedded systems
jansson | 0.2 – 0.4 | Medium | Traditional C library

These figures are approximate and depend heavily on the CPU, the compiler, and the input document.

On modern CPUs, simdjson can achieve parsing speeds of over 3 GB/s.
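Throughput figures of this kind are simply bytes parsed divided by elapsed time, so it is easy to measure your own. The helper below is a minimal sketch under a few assumptions: gigabytes_per_second and best_throughput are hypothetical names, 1 GB is taken as 10^9 bytes, and parse_once stands for whatever parser call you want to time.

```cpp
#include <chrono>
#include <cstddef>

// Convert a byte count and an elapsed time into GB/s (1 GB = 10^9 bytes).
// Returns 0 when the elapsed time is not positive.
inline double gigabytes_per_second(std::size_t bytes,
                                   std::chrono::duration<double> elapsed) {
    if (elapsed.count() <= 0.0) return 0.0;
    return static_cast<double>(bytes) / 1e9 / elapsed.count();
}

// Run parse_once() `repeats` times and return the best observed
// throughput. Taking the best run reduces noise from cold caches and
// scheduling, but it is still only a rough measurement.
template <typename ParseFn>
double best_throughput(std::size_t input_bytes, int repeats,
                       ParseFn parse_once) {
    double best = 0.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        parse_once();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        const double rate = gigabytes_per_second(input_bytes, elapsed);
        if (rate > best) best = rate;
    }
    return best;
}
```

For example, 1.5 × 10^9 bytes parsed in 0.5 seconds corresponds to 3 GB/s. When timing a real parser, make sure the callable actually consumes the parsed values, otherwise the compiler may optimize part of the work away.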

Why Choose simdjson?

  • Unmatched Speed: Fully utilizes SIMD, outperforming traditional solutions
  • Production Ready: Used in projects such as Node.js, ClickHouse, and Meta's Velox
  • Easy Integration: One header and one source file, no dependencies
  • Modern C++: Supports RAII, move semantics, and both exceptions and exception-free error codes
  • Apache 2.0 License: Freely usable for commercial projects
  • Actively Maintained: Continuous optimization, supports new architectures (like Apple Silicon)

Learning Resources

  • Official Website: https://simdjson.org
  • GitHub Repository: https://github.com/simdjson/simdjson
  • Documentation: https://simdjson.org/api/
  • Blog Articles: Daniel Lemire’s blog, https://lemire.me/blog/
  • Performance Analysis: https://github.com/simdjson/simdjson/blob/master/doc/benchmarks.md

Conclusion

simdjson is not just a JSON library; it is a strong example of combining modern CPU architecture with careful algorithm engineering.

It achieves:

  • Parsing several GB of JSON per second
  • Extremely low memory overhead
  • Single-pass parsing with full validation

Whether you are developing:

  • High-throughput web services
  • Real-time logging systems
  • Big data analytics platforms
  • Embedded devices (supports ARM)

simdjson can provide you with unprecedented JSON processing speed.

“In the age of data explosion, speed is competitiveness.”

Integrate simdjson into your project now and unleash the full potential of your CPU!
