Implementing a Lightweight Embedded Database in C++

Creating a Mini Embedded Database with C++ – A Practical Journey from 0 to 1

Databases might sound intimidating, but a simple one takes surprisingly little C++. In this post I will walk you through a lightweight embedded key-value store that, despite its small size, supports the basic CRUD operations (create, read, update, delete). The project will help you understand the core principles behind databases, and it may be useful in small projects of your own.

Overall Design

Essentially, a database is a tool for managing data. Our mini database mainly consists of the following parts:

  • Data File Management
  • Serialization and Deserialization of Records
  • Index Mechanism
  • Basic Operation Interface

Let’s take a look at the specific code implementation:

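// MiniDB.h
#pragma once

#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>

using namespace std;  // kept for brevity; prefer std:: qualifiers in real headers

class MiniDB {
private:
    string dbFile;                        // path of the data file
    fstream fs;                           // reopened for each operation
    unordered_map<string, size_t> index;  // key -> offset of its latest record

    void loadIndex();                     // rebuilds the index by scanning the file

public:
    struct Record {
        string key;
        string value;
    };

    explicit MiniDB(const string& filename) : dbFile(filename) {
        loadIndex();
    }

    bool put(const string& key, const string& value);
    bool get(const string& key, string& value);
    bool remove(const string& key);
};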

Data Storage Design

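🤔 To store data in a file, we need to decide how to lay it out. I chose the simplest method: every record is written in the same format, key length + key + value length + value. Each put appends a record to the end of the file, so updating a key simply appends a newer record and points the index at it.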

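bool MiniDB::put(const string& key, const string& value) {
    fs.open(dbFile, ios::binary | ios::out | ios::app);
    if (!fs) return false;

    // In append mode the put position may read as 0 until the first write,
    // so seek to the end explicitly before recording the offset.
    fs.seekp(0, ios::end);
    size_t keyLen = key.length();
    size_t valueLen = value.length();
    size_t pos = static_cast<size_t>(fs.tellp());

    // Record layout: key length, key bytes, value length, value bytes.
    fs.write(reinterpret_cast<const char*>(&keyLen), sizeof(keyLen));
    fs.write(key.data(), keyLen);
    fs.write(reinterpret_cast<const char*>(&valueLen), sizeof(valueLen));
    fs.write(value.data(), valueLen);

    fs.flush();
    bool ok = fs.good();
    fs.close();
    if (ok) index[key] = pos;  // only index records that were fully written
    return ok;
}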
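To make the record layout concrete, here is a minimal in-memory sketch of the same key length + key + value length + value format. The helper names encodeRecord and decodeRecord are hypothetical and not part of MiniDB, and the sketch assumes the lengths are stored as fixed-width uint64_t values in host byte order (the listing above uses size_t, whose width depends on the platform).

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: encode one record as
// [key length][key bytes][value length][value bytes].
// Assumption: lengths are fixed-width uint64_t in host byte order.
std::string encodeRecord(const std::string& key, const std::string& value) {
    std::string out;
    auto appendLen = [&out](std::uint64_t n) {
        char raw[sizeof n];
        std::memcpy(raw, &n, sizeof n);
        out.append(raw, sizeof n);
    };
    appendLen(key.size());
    out += key;
    appendLen(value.size());
    out += value;
    return out;
}

// Hypothetical helper: decode a record produced by encodeRecord.
// Returns false if the buffer ends before the record is complete.
bool decodeRecord(const std::string& buf, std::string& key, std::string& value) {
    std::size_t pos = 0;
    auto readField = [&](std::string& field) {
        std::uint64_t len = 0;
        if (buf.size() - pos < sizeof len) return false;
        std::memcpy(&len, buf.data() + pos, sizeof len);
        pos += sizeof len;
        if (buf.size() - pos < len) return false;
        field.assign(buf, pos, static_cast<std::size_t>(len));
        pos += static_cast<std::size_t>(len);
        return true;
    };
    return readField(key) && readField(value);
}
```

Because every field carries its own length, keys and values may contain any bytes, including '\0', and a reader can tell when a record has been cut short.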

Index Mechanism

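For fast database queries, an index is essential. I used a simple hash table that maps each key to the file position of its most recent record. The index lives only in memory, so it is rebuilt at startup by scanning the data file from beginning to end: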

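void MiniDB::loadIndex() {
    fs.open(dbFile, ios::binary | ios::in);
    if (!fs) return;  // no file yet: start with an empty index

    while (true) {
        size_t pos = static_cast<size_t>(fs.tellg());
        size_t keyLen = 0;
        if (!fs.read(reinterpret_cast<char*>(&keyLen), sizeof(keyLen))) break;  // end of file

        // Read the key straight into a string; no manual new[]/delete[] needed.
        string key(keyLen, '\0');
        if (!fs.read(&key[0], keyLen)) break;  // truncated record

        size_t valueLen = 0;
        if (!fs.read(reinterpret_cast<char*>(&valueLen), sizeof(valueLen))) break;
        fs.seekg(valueLen, ios::cur);  // skip the value; only the offset is indexed
        if (!fs) break;

        index[key] = pos;  // a later record for the same key replaces the earlier offset
    }
    fs.close();
}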
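The scan above can be tried without touching the disk. The following sketch runs the same idea against any std::iostream, such as a std::stringstream. The helpers appendRecord and buildIndex are hypothetical names, and lengths are assumed to be fixed-width uint64_t values in host byte order.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: append one record and return the offset where it starts.
// Assumption: lengths are fixed-width uint64_t in host byte order.
std::uint64_t appendRecord(std::ostream& out, const std::string& key,
                           const std::string& value) {
    out.seekp(0, std::ios::end);
    const std::uint64_t start = static_cast<std::uint64_t>(out.tellp());
    const std::uint64_t keyLen = key.size();
    const std::uint64_t valueLen = value.size();
    out.write(reinterpret_cast<const char*>(&keyLen), sizeof keyLen);
    out.write(key.data(), static_cast<std::streamsize>(keyLen));
    out.write(reinterpret_cast<const char*>(&valueLen), sizeof valueLen);
    out.write(value.data(), static_cast<std::streamsize>(valueLen));
    return start;
}

// Hypothetical helper: scan every record and map each key to the offset
// of its most recent record.
std::unordered_map<std::string, std::uint64_t> buildIndex(std::istream& in) {
    std::unordered_map<std::string, std::uint64_t> index;
    in.clear();
    in.seekg(0, std::ios::beg);
    for (;;) {
        const std::uint64_t start = static_cast<std::uint64_t>(in.tellg());
        std::uint64_t keyLen = 0, valueLen = 0;
        if (!in.read(reinterpret_cast<char*>(&keyLen), sizeof keyLen)) break;
        std::string key(static_cast<std::size_t>(keyLen), '\0');
        if (!in.read(&key[0], static_cast<std::streamsize>(keyLen))) break;
        if (!in.read(reinterpret_cast<char*>(&valueLen), sizeof valueLen)) break;
        in.seekg(static_cast<std::streamoff>(valueLen), std::ios::cur);  // skip the value
        if (!in) break;
        index[key] = start;  // later records overwrite earlier offsets
    }
    return index;
}
```

Writing "a", then "b", then "a" again to a std::stringstream and calling buildIndex leaves two entries, with "a" pointing at the third record. That is how an append-only file handles updates: the newest record wins.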

Query Operation

With the index in place, querying data becomes straightforward; we simply jump to the corresponding position to read:

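bool MiniDB::get(const string& key, string& value) {
    auto it = index.find(key);
    if (it == index.end()) return false;

    fs.open(dbFile, ios::binary | ios::in);
    if (!fs) return false;
    fs.seekg(it->second);

    // Skip the key, then read the length-prefixed value.
    size_t keyLen = 0;
    fs.read(reinterpret_cast<char*>(&keyLen), sizeof(keyLen));
    fs.seekg(keyLen, ios::cur);

    size_t valueLen = 0;
    fs.read(reinterpret_cast<char*>(&valueLen), sizeof(valueLen));

    bool ok = false;
    if (fs) {
        // Read straight into a string so values containing '\0' bytes survive intact.
        string buf(valueLen, '\0');
        fs.read(&buf[0], valueLen);
        ok = static_cast<bool>(fs);
        if (ok) value.swap(buf);
    }
    fs.close();
    return ok;
}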

⚠️ Friendly reminder: Always check whether file operations succeed, and don’t forget to close the file. If you allocate buffers with new[], release them with delete[] on every return path, or let std::string or std::vector manage the memory for you.

Delete Operation

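Deleting records is a bit more complicated. To keep it simple, we use a lazy delete: the key is removed from the in-memory index, and the record's bytes stay in the file. Note that this version writes no delete marker to disk, so a removed key becomes visible again when loadIndex() rescans the file on the next startup.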

bool MiniDB::remove(const string& key) {
    if (index.find(key) == index.end()) return false;
    index.erase(key);
    return true;
}
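One common way to make a deletion survive a restart is to append a delete marker (often called a tombstone) instead of only touching the in-memory index. The sketch below is a hypothetical extension, not the format used by MiniDB above: it assumes a one-byte marker in front of every record and fixed-width uint64_t lengths. For brevity, replay maps keys directly to values rather than to file offsets.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Assumption: each record starts with a one-byte marker.
const char kLive = 0;       // marker, key, value
const char kTombstone = 1;  // marker, key (no value)

void writeString(std::ostream& out, const std::string& s) {
    const std::uint64_t len = s.size();
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(s.data(), static_cast<std::streamsize>(len));
}

bool readString(std::istream& in, std::string& s) {
    std::uint64_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    s.assign(static_cast<std::size_t>(len), '\0');
    return static_cast<bool>(in.read(&s[0], static_cast<std::streamsize>(len)));
}

void appendPut(std::ostream& out, const std::string& key, const std::string& value) {
    out.put(kLive);
    writeString(out, key);
    writeString(out, value);
}

void appendDelete(std::ostream& out, const std::string& key) {
    out.put(kTombstone);
    writeString(out, key);
}

// Replay the log from the current read position: later entries win,
// and a tombstone removes the key.
std::unordered_map<std::string, std::string> replay(std::istream& in) {
    std::unordered_map<std::string, std::string> live;
    char marker = 0;
    std::string key, value;
    while (in.get(marker) && readString(in, key)) {
        if (marker == kTombstone) {
            live.erase(key);
        } else if (readString(in, value)) {
            live[key] = value;
        } else {
            break;  // truncated record
        }
    }
    return live;
}
```

With this layout, a startup scan sees the tombstone after the original record and drops the key, so the deletion is remembered even though the old bytes are still in the file.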

This mini database still has many areas for improvement:

  • Compress Storage Space
  • Support Concurrent Access
  • Implement Transaction Mechanism
  • Add Caching Layer

However, for an introductory project, we won’t consider these features for now. Once you master the basics of CRUD operations, you can explore those advanced features later.

🔔 There’s a small trick in the code: I used tellg() and tellp() to get the file position. These two functions are used for reading and writing respectively, so don’t mix them up.
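A small experiment shows the difference between the two calls. This sketch uses a std::stringstream, which keeps separate read and write positions. A file stream is backed by a single file position, so there the two values usually move together. The struct Positions and the function positionsAfter are hypothetical names used only for this illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical illustration type: the two stream positions side by side.
struct Positions {
    std::streamoff get;  // reported by tellg(): where the next read happens
    std::streamoff put;  // reported by tellp(): where the next write happens
};

// Write `text` into a fresh string stream, read back `consumed` characters,
// and report both positions.
Positions positionsAfter(const std::string& text, std::size_t consumed) {
    std::stringstream ss;
    ss << text;  // advances the put position
    std::string sink(consumed, '\0');
    if (consumed > 0) {
        ss.read(&sink[0], static_cast<std::streamsize>(consumed));  // advances the get position
    }
    return { static_cast<std::streamoff>(ss.tellg()),
             static_cast<std::streamoff>(ss.tellp()) };
}
```

Calling positionsAfter("hello", 2) yields a get position of 2 and a put position of 5: the write position tracks how much has been written, while the read position tracks how much has been consumed.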

The most important aspect of writing a database is ensuring the consistency and reliability of its data. Even if the program crashes, data should not be lost or corrupted. This requires careful design of both the file format and the order in which operations are performed.
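As one small illustration of designing the format with crashes in mind: because every field is length-prefixed, a reader can detect a record that was only partly written before a crash. The sketch below uses a hypothetical helper, validPrefixLength, and assumes fixed-width uint64_t lengths in host byte order. It reports how many leading bytes of a data buffer form complete records.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical recovery helper: returns how many leading bytes of `data`
// form complete key length + key + value length + value records.
// Assumption: lengths are fixed-width uint64_t in host byte order.
std::size_t validPrefixLength(const std::string& data) {
    std::size_t pos = 0;  // end of the last complete record
    for (;;) {
        std::size_t cursor = pos;
        bool complete = true;
        for (int field = 0; field < 2; ++field) {  // key first, then value
            std::uint64_t len = 0;
            if (data.size() - cursor < sizeof len) { complete = false; break; }
            std::memcpy(&len, data.data() + cursor, sizeof len);
            cursor += sizeof len;
            if (data.size() - cursor < len) { complete = false; break; }
            cursor += static_cast<std::size_t>(len);
        }
        if (!complete) return pos;  // the tail after pos is empty or torn
        pos = cursor;
    }
}
```

A loader could truncate the file to this length before appending new records. Note that this only catches truncation; detecting bytes that were corrupted in place would need something extra, such as a per-record checksum.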

When you combine all of this code, you have a small working key-value database. Although the functionality is simple, it covers the core ideas: an on-disk record format, an in-memory index, and the basic CRUD operations.
