tsink – High-Performance Rust Embedded Time Series Database

Project Overview

tsink is a high-performance embedded time series database engine written in Rust, built for efficient storage and retrieval of time series data. It is open source under the MIT license, and its lightweight design makes it well suited to applications that handle large volumes of time series data.

Core Features

🚀 High Performance

Uses the Gorilla compression algorithm, averaging only about 1.37 bytes per data point

Zero-copy reads through memory-mapped files for efficient disk access

Performs well even on a single core of an AMD Ryzen 7940HS processor
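To give a feel for why Gorilla-style compression gets so small, here is a minimal sketch of the two transforms the algorithm is built on: delta-of-delta for timestamps and XOR against the previous value for floats. It omits the variable-length bit packing that produces the final byte savings, and the function names are illustrative assumptions, not part of tsink's API.

```rust
/// Delta-of-delta transform for timestamps (illustrative, not tsink's code).
/// The first element is stored raw; every later element is the change in
/// the interval between consecutive timestamps.
fn delta_of_delta(timestamps: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(timestamps.len());
    let mut prev = 0i64;
    let mut prev_delta = 0i64;
    for (i, &t) in timestamps.iter().enumerate() {
        if i == 0 {
            out.push(t);
        } else {
            let delta = t - prev;
            out.push(delta - prev_delta);
            prev_delta = delta;
        }
        prev = t;
    }
    out
}

/// XOR each value's bit pattern with the previous one (illustrative).
/// Repeated values become 0, and similar values share many zero bits.
fn xor_with_previous(values: &[f64]) -> Vec<u64> {
    let mut prev = 0u64;
    values
        .iter()
        .map(|v| {
            let bits = v.to_bits();
            let x = bits ^ prev;
            prev = bits;
            x
        })
        .collect()
}
```

For samples taken at a regular interval, the timestamp stream collapses to mostly zeros after the first two entries, which is what a bit-level encoder can then store in very few bits.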

🔒 Concurrency Safety

Lock-free read design

Configurable write worker thread pool supports concurrent writes

Built-in write timeout mechanism (default 60 seconds)
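The combination of a bounded writer pool and a write timeout can be pictured as a counting limiter: a writer waits for a free slot and gives up once the timeout elapses. The sketch below uses only the standard library; `WriterSlots` and its methods are hypothetical names for illustration and say nothing about how tsink implements this internally.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical limiter allowing at most `max` concurrent writers.
struct WriterSlots {
    free: Mutex<usize>,
    freed: Condvar,
}

impl WriterSlots {
    fn new(max: usize) -> Self {
        Self { free: Mutex::new(max), freed: Condvar::new() }
    }

    /// Take a slot, waiting up to `timeout` for one to become free.
    /// Returns false if no slot was available when the wait ended.
    fn acquire(&self, timeout: Duration) -> bool {
        let guard = self.free.lock().unwrap();
        let (mut guard, _) = self
            .freed
            .wait_timeout_while(guard, timeout, |free| *free == 0)
            .unwrap();
        if *guard == 0 {
            return false;
        }
        *guard -= 1;
        true
    }

    /// Return a slot and wake one waiting writer.
    fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}
```

A caller would pair each successful `acquire` with a `release` once its write completes, and treat a `false` result as a timed-out write.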

💾 Storage Flexibility

Supports pure in-memory mode or persistent disk storage

Automatic time partitioning (configurable partition duration)

Write-ahead logging (WAL) support ensures data durability and crash recovery
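As a rough illustration of how a write-ahead log enables crash recovery, the sketch below appends fixed-size records to a log and replays them, ignoring a torn record at the tail. The 16-byte record layout and the function names are assumptions made for this example; tsink's actual WAL format is not described here.

```rust
use std::convert::TryInto;

/// Append one (timestamp, value) record as 16 little-endian bytes.
/// The layout is an assumption for illustration only.
fn append_record(log: &mut Vec<u8>, ts: i64, value: f64) {
    log.extend_from_slice(&ts.to_le_bytes());
    log.extend_from_slice(&value.to_le_bytes());
}

/// Replay every complete record. A partial record at the end of the log
/// (for example from a crash in the middle of a write) is skipped.
fn replay(log: &[u8]) -> Vec<(i64, f64)> {
    log.chunks_exact(16)
        .map(|rec| {
            let ts = i64::from_le_bytes(rec[..8].try_into().unwrap());
            let value = f64::from_le_bytes(rec[8..].try_into().unwrap());
            (ts, value)
        })
        .collect()
}
```

On restart, replaying the log rebuilds the in-memory state that had not yet been flushed to a disk partition. A real log would also need checksums and an fsync policy, which this sketch leaves out.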

📊 Advanced Features

Supports multi-dimensional labels for metric data

Automatic data expiration (configurable retention period)

Container awareness (cgroup support) optimizes resource allocation in container environments

Automatically handles out-of-order data points

Architecture Design

tsink uses a linear, time-ordered partitioning model that divides time series data into partitions with well-defined time bounds:

┌─────────────────────────────────────────┐
│ tsink Storage                           │
├─────────────────────────────────────────┤
│                                         │
│  ┌───────────────┐ Active Partition     │
│  │ Memory Part.  │◄─ (Writable)         │
│  └───────────────┘                      │
│                                         │
│  ┌───────────────┐ Buffer Partition     │
│  │ Memory Part.  │◄─ (Out-of-order)     │
│  └───────────────┘                      │
│                                         │
│  ┌───────────────┐                      │
│  │ Disk Part. 1  │◄─ Read-only          │
│  └───────────────┘ (Memory-mapped)      │
│                                         │
│  ┌───────────────┐                      │
│  │ Disk Part. 2  │◄─ Read-only          │
│  └───────────────┘                      │
│  ...                                    │
└─────────────────────────────────────────┘

Partition Lifecycle

  1. Active Partition: Receives new writes, kept in memory
  2. Buffer Partition: Handles out-of-order writes within the recent time window
  3. Flush: When the active partition is full, flush to disk
  4. Disk Partition: Read-only, memory-mapped to improve query efficiency
  5. Expiration: Automatically removes old partitions based on retention policy
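The lifecycle above can be sketched as three small decisions: which time window a timestamp belongs to, which partition accepts a write, and when a partition has aged out. Everything here is a simplified assumption for illustration: the names are hypothetical, the buffer is assumed to cover exactly one partition duration behind the active one, and older writes are assumed to be rejected.

```rust
/// Where a write is routed (hypothetical model, not tsink's types).
#[derive(Debug, PartialEq)]
enum Target {
    Active,
    Buffer,
    Rejected,
}

/// Start of the partition window that contains `ts`.
fn partition_start(ts: i64, duration: i64) -> i64 {
    ts - ts.rem_euclid(duration)
}

/// Route a write relative to the start of the active partition.
/// Assumes the buffer covers one partition duration of late data.
fn route(ts: i64, active_start: i64, duration: i64) -> Target {
    if ts >= active_start {
        Target::Active
    } else if ts >= active_start - duration {
        Target::Buffer
    } else {
        Target::Rejected
    }
}

/// A partition is expired once its end falls before `now - retention`.
fn is_expired(partition_end: i64, now: i64, retention: i64) -> bool {
    partition_end < now - retention
}
```

With a one-hour partition duration in seconds, `partition_start(1600000120, 3600)` gives `1599998400`, and a point a few minutes before that boundary would be routed to the buffer partition under these assumptions.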

Quick Start

Basic Usage

use tsink::{DataPoint, Row, StorageBuilder, Storage, TimestampPrecision};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create storage instance (default settings)
    let storage = StorageBuilder::new()
        .with_timestamp_precision(TimestampPrecision::Seconds)
        .build()?;

    // Insert data points
    let rows = vec![
        Row::new("cpu_usage", DataPoint::new(1600000000, 45.5)),
        Row::new("cpu_usage", DataPoint::new(1600000060, 47.2)),
        Row::new("cpu_usage", DataPoint::new(1600000120, 46.8)),
    ];
    storage.insert_rows(&rows)?;

    // Query data points
    let points = storage.select("cpu_usage", &[], 1600000000, 1600000121)?;
    for point in points {
        println!("Timestamp: {}, Value: {}", point.timestamp, point.value);
    }

    storage.close()?;
    Ok(())
}

Persistent Storage Configuration

use tsink::{StorageBuilder, Storage};
use std::time::Duration;

let storage = StorageBuilder::new()
    .with_data_path("./tsink-data")                      // Enable disk persistence
    .with_partition_duration(Duration::from_secs(3600))  // 1 hour partition
    .with_retention(Duration::from_secs(7*24*3600))      // 7 days retention
    .with_wal_buffer_size(8192)                          // 8KB WAL buffer
    .build()?;
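Partition duration and retention together determine roughly how many partitions are alive at once. The helper below is a back-of-the-envelope estimate, not a tsink function: it assumes partitions are dropped as soon as they fall outside the retention window, and the name `partitions_per_retention` is made up for this example.

```rust
use std::time::Duration;

/// Number of partition-length windows needed to cover the retention
/// period, rounded up. Returns None for a zero partition duration.
/// Hypothetical helper for estimation only.
fn partitions_per_retention(retention: Duration, partition: Duration) -> Option<u64> {
    let p = partition.as_secs();
    if p == 0 {
        return None;
    }
    Some((retention.as_secs() + p - 1) / p)
}
```

For the configuration above (1 hour partitions, 7 days retention) this gives 168 partitions, which is a useful number to keep in mind when choosing a partition duration, since each disk partition is memory-mapped separately.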

Advanced Features

Multi-Dimensional Label Metrics

use tsink::{DataPoint, Label, Row};

// Create metrics with labels
let rows = vec![
    Row::with_labels(
        "http_requests",
        vec![
            Label::new("method", "GET"),
            Label::new("status", "200"),
            Label::new("endpoint", "/api/users"),
        ],
        DataPoint::new(1600000000, 150.0),
    ),
    // More labeled data points...
];

// Query specific label combinations
let points = storage.select(
    "http_requests",
    &[
        Label::new("method", "GET"),
        Label::new("status", "200"),
    ],
    1600000000,
    1600000100,
)?;

// Query all label combinations for the metric
let all_results = storage.select_all("http_requests", 1600000000, 1600000100)?;
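One common way to treat a metric name plus its labels as a single series identity is to build a canonical key with the labels sorted, so that label order does not matter. The sketch below illustrates that general idea only. It is an assumption, not a description of how tsink identifies or matches series, and the `|` and `=` separators are not escaped.

```rust
/// Build an order-independent key from a metric name and its labels.
/// Hypothetical helper; the key format is made up for illustration.
fn series_key(metric: &str, labels: &[(&str, &str)]) -> String {
    let mut sorted = labels.to_vec();
    sorted.sort();
    let mut key = String::from(metric);
    for (name, value) in sorted {
        key.push('|');
        key.push_str(name);
        key.push('=');
        key.push_str(value);
    }
    key
}
```

With a key like this, `[("status", "200"), ("method", "GET")]` and `[("method", "GET"), ("status", "200")]` resolve to the same series.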

Concurrent Operations

use std::sync::Arc;
use std::thread;
use tsink::{DataPoint, Row, Storage, StorageBuilder};

let storage = Arc::new(StorageBuilder::new().build()?);

// Spawn multiple writer threads
let mut handles = vec![];
for _worker_id in 0..10 {
    let storage = storage.clone();
    let handle = thread::spawn(move || {
        for i in 0..1000 {
            let row = Row::new(
                "concurrent_metric",
                DataPoint::new(1600000000 + i, i as f64),
            );
            storage.insert_rows(&[row]).unwrap();
        }
    });
    handles.push(handle);
}

// Wait for all threads to finish
for handle in handles {
    handle.join().unwrap();
}

Production Environment Configuration

let storage = StorageBuilder::new()
    .with_data_path("/var/lib/tsink")
    .with_retention(Duration::from_secs(30*24*3600))       // 30 days retention
    .with_timestamp_precision(TimestampPrecision::Milliseconds)
    .with_max_writers(16)                                  // Maximum 16 writers
    .with_write_timeout(Duration::from_secs(60))
    .with_partition_duration(Duration::from_secs(6*3600))  // 6 hours partition
    .with_wal_buffer_size(16384)                           // 16KB WAL buffer
    .build()?;

https://github.com/h2337/tsink
