Project Overview
tsink is a high-performance embedded time series database engine written in Rust, built for efficient storage and retrieval of time series data. It is open source under the MIT license and lightweight, which makes it well suited to applications that handle large volumes of time series data.
Core Features
🚀 High Performance
Uses the Gorilla compression algorithm, averaging about 1.37 bytes per data point
Zero-copy reads through memory-mapped files for efficient disk access
Performs well in benchmarks run on a single core of an AMD Ryzen 7940HS processor
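To give a feel for why Gorilla-style compression reaches such a small per-point size, here is a minimal sketch of its two core transforms: delta-of-delta for timestamps and XOR against the previous value for floats. The function names are hypothetical and this is not tsink's encoder; a real encoder bit-packs the results, so a zero costs a single bit.

```rust
/// Delta-of-delta transform for timestamps (hypothetical helper, not tsink's API).
/// The first element is stored verbatim, the second as a plain delta,
/// and every later one as the change between consecutive deltas.
fn delta_of_delta(timestamps: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(timestamps.len());
    let mut prev_ts = 0i64;
    let mut prev_delta = 0i64;
    for (i, &ts) in timestamps.iter().enumerate() {
        if i == 0 {
            out.push(ts);
        } else {
            let delta = ts - prev_ts;
            out.push(delta - prev_delta);
            prev_delta = delta;
        }
        prev_ts = ts;
    }
    out
}

/// XOR each value's bit pattern with the previous one (hypothetical helper).
/// Repeated or slowly changing values produce zeros or few significant bits.
fn xor_with_previous(values: &[f64]) -> Vec<u64> {
    let mut prev = 0u64;
    values
        .iter()
        .map(|v| {
            let bits = v.to_bits();
            let xored = bits ^ prev;
            prev = bits;
            xored
        })
        .collect()
}
```

For points sampled every 60 seconds, `delta_of_delta` yields the first timestamp, then 60, then zeros, which is the regularity the bit-packing stage exploits.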
🔒 Concurrency Safety
Lock-free read design
Configurable write worker thread pool supports concurrent writes
Built-in write timeout mechanism (default 60 seconds)
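The combination of a bounded writer pool and a write timeout can be pictured as a counting limiter that gives up after a deadline. The sketch below uses only the standard library; `WriterSlots` and its methods are hypothetical names, and tsink's internal mechanism may differ.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical limiter: at most `max` writers may hold a slot at once.
struct WriterSlots {
    free: Mutex<usize>,
    cv: Condvar,
}

impl WriterSlots {
    fn new(max: usize) -> Self {
        WriterSlots { free: Mutex::new(max), cv: Condvar::new() }
    }

    /// Wait up to `timeout` for a free slot. Returns false if none became free.
    fn acquire(&self, timeout: Duration) -> bool {
        let guard = self.free.lock().unwrap();
        // Sleep while no slot is free, waking on release or when the timeout elapses.
        let (mut guard, _timed_out) = self
            .cv
            .wait_timeout_while(guard, timeout, |free| *free == 0)
            .unwrap();
        if *guard == 0 {
            return false; // timed out with every slot still taken
        }
        *guard -= 1;
        true
    }

    /// Return a slot and wake one waiting writer.
    fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}
```

A writer would call `acquire` with the configured timeout before inserting and `release` afterwards, returning an error to the caller when `acquire` yields false.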
💾 Storage Flexibility
Supports pure in-memory mode or persistent disk storage
Automatic time partitioning (configurable partition duration)
Write-ahead logging (WAL) support ensures data durability and crash recovery
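As an illustration of how a write-ahead log supports crash recovery, the following sketch appends records to a byte buffer and replays them, dropping a record that was only partially written. The record layout and the names `WalRecord`, `append`, and `replay` are assumptions made for this example; tsink's on-disk format is not described here, and a production WAL would also add checksums and explicit syncs.

```rust
/// One logged write. The field layout is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct WalRecord {
    metric: String,
    timestamp: i64,
    value: f64,
}

/// Copy the first N bytes of a slice into a fixed-size array.
fn read_array<const N: usize>(bytes: &[u8]) -> [u8; N] {
    let mut buf = [0u8; N];
    buf.copy_from_slice(&bytes[..N]);
    buf
}

/// Append as: u32 name length, name bytes, i64 timestamp, f64 value (little-endian).
fn append(log: &mut Vec<u8>, rec: &WalRecord) {
    log.extend_from_slice(&(rec.metric.len() as u32).to_le_bytes());
    log.extend_from_slice(rec.metric.as_bytes());
    log.extend_from_slice(&rec.timestamp.to_le_bytes());
    log.extend_from_slice(&rec.value.to_le_bytes());
}

/// Replay every complete record; a torn record at the tail is ignored.
fn replay(log: &[u8]) -> Vec<WalRecord> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos + 4 <= log.len() {
        let name_len = u32::from_le_bytes(read_array(&log[pos..])) as usize;
        let name_start = pos + 4;
        let end = name_start + name_len + 16; // name + i64 + f64
        if end > log.len() {
            break; // crash happened mid-write; stop at the last complete record
        }
        let metric =
            String::from_utf8_lossy(&log[name_start..name_start + name_len]).into_owned();
        let ts_start = name_start + name_len;
        let timestamp = i64::from_le_bytes(read_array(&log[ts_start..]));
        let value = f64::from_le_bytes(read_array(&log[ts_start + 8..]));
        out.push(WalRecord { metric, timestamp, value });
        pos = end;
    }
    out
}
```

On startup, the replayed records would be re-inserted into the in-memory partition before new writes are accepted.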
📊 Advanced Features
Supports multi-dimensional labels for metric data
Automatic data expiration (configurable retention period)
Container awareness (cgroup support) optimizes resource allocation in container environments
Automatically handles out-of-order data points
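Container awareness usually means sizing thread pools from the cgroup CPU limit rather than the host's core count. The sketch below parses the contents of a cgroup v2 `cpu.max` file, which holds either `<quota> <period>` or `max <period>`. Both function names are hypothetical, and how tsink actually reads cgroup limits is an assumption not confirmed by this document.

```rust
/// Parse the contents of a cgroup v2 `cpu.max` file.
/// Returns the CPU limit in cores, or None when unlimited or malformed.
fn cpu_limit_from_cpu_max(contents: &str) -> Option<f64> {
    let mut parts = contents.split_whitespace();
    let quota = parts.next()?;
    let period: f64 = parts.next()?.parse().ok()?;
    if quota == "max" || period <= 0.0 {
        return None; // "max" means no limit is set
    }
    let quota: f64 = quota.parse().ok()?;
    Some(quota / period)
}

/// Pick a worker count: the cgroup limit rounded up, capped by the host CPU count.
/// Falls back to the host count when no limit applies.
fn worker_count(cpu_max: Option<&str>, host_cpus: usize) -> usize {
    let host = host_cpus.max(1);
    match cpu_max.and_then(cpu_limit_from_cpu_max) {
        Some(limit) => (limit.ceil() as usize).clamp(1, host),
        None => host,
    }
}
```

In practice the string would come from reading `/sys/fs/cgroup/cpu.max`, with `None` passed when the file does not exist.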
Architecture Design
tsink adopts a linear sequential partitioning model, dividing time series data into clearly defined time-bound blocks:
┌─────────────────────────────────────────┐
│              tsink Storage              │
├─────────────────────────────────────────┤
│                                         │
│  ┌───────────────┐  Active Partition    │
│  │ Memory Part.  │◄─ (Writable)         │
│  └───────────────┘                      │
│                                         │
│  ┌───────────────┐  Buffer Partition    │
│  │ Memory Part.  │◄─ (Out-of-order)     │
│  └───────────────┘                      │
│                                         │
│  ┌───────────────┐                      │
│  │ Disk Part. 1  │◄─ Read-only          │
│  └───────────────┘  (Memory-mapped)     │
│                                         │
│  ┌───────────────┐                      │
│  │ Disk Part. 2  │◄─ Read-only          │
│  └───────────────┘                      │
│         ...                             │
└─────────────────────────────────────────┘
Partition Lifecycle
- Active Partition: Receives new writes, kept in memory
- Buffer Partition: Handles out-of-order writes within the recent time window
- Flush: When the active partition is full, it is flushed to disk
- Disk Partition: Read-only, memory-mapped to improve query efficiency
- Expiration: Automatically removes old partitions based on retention policy
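The lifecycle above can be modeled with a few pure functions. This is a simplified sketch under stated assumptions: partitions are aligned to multiples of the partition duration, the out-of-order window is exactly one partition long, and writes older than that window are rejected. None of these details are confirmed by this document, and all names are hypothetical.

```rust
/// Where a write lands in this simplified model.
#[derive(Debug, PartialEq)]
enum Route {
    Active,
    Buffer,
    TooOld, // assumption: this document does not say what happens outside the window
}

/// Start of the partition containing `ts`, assuming aligned partitions.
fn partition_start(ts: i64, duration: i64) -> i64 {
    ts - ts.rem_euclid(duration)
}

/// Route a timestamp relative to the start of the active partition.
fn route(ts: i64, active_start: i64, duration: i64) -> Route {
    if ts >= active_start {
        Route::Active
    } else if ts >= active_start - duration {
        Route::Buffer
    } else {
        Route::TooOld
    }
}

/// Drop partitions that end at or before `now - retention`.
fn expire(partition_starts: &mut Vec<i64>, duration: i64, now: i64, retention: i64) {
    partition_starts.retain(|&start| start + duration > now - retention);
}
```

With one-hour partitions (`duration = 3600`) and an active partition starting at 7200, a point at 7000 goes to the buffer partition while a point at 100 falls outside the window.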
Quick Start
Basic Usage
use tsink::{DataPoint, Row, Storage, StorageBuilder, TimestampPrecision};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create storage instance (default settings)
    let storage = StorageBuilder::new()
        .with_timestamp_precision(TimestampPrecision::Seconds)
        .build()?;

    // Insert data points
    let rows = vec![
        Row::new("cpu_usage", DataPoint::new(1600000000, 45.5)),
        Row::new("cpu_usage", DataPoint::new(1600000060, 47.2)),
        Row::new("cpu_usage", DataPoint::new(1600000120, 46.8)),
    ];
    storage.insert_rows(&rows)?;

    // Query data points
    let points = storage.select("cpu_usage", &[], 1600000000, 1600000121)?;
    for point in points {
        println!("Timestamp: {}, Value: {}", point.timestamp, point.value);
    }

    storage.close()?;
    Ok(())
}
Persistent Storage Configuration
use tsink::{StorageBuilder, Storage};
use std::time::Duration;

let storage = StorageBuilder::new()
    .with_data_path("./tsink-data")                      // Enable disk persistence
    .with_partition_duration(Duration::from_secs(3600))  // 1 hour partition
    .with_retention(Duration::from_secs(7 * 24 * 3600))  // 7 days retention
    .with_wal_buffer_size(8192)                          // 8KB WAL buffer
    .build()?;
Advanced Features
Multi-Dimensional Label Metrics
use tsink::{DataPoint, Label, Row};

// Create metrics with labels
let rows = vec![
    Row::with_labels(
        "http_requests",
        vec![
            Label::new("method", "GET"),
            Label::new("status", "200"),
            Label::new("endpoint", "/api/users"),
        ],
        DataPoint::new(1600000000, 150.0),
    ),
    // More labeled data points...
];

// Query specific label combinations
let points = storage.select(
    "http_requests",
    &[
        Label::new("method", "GET"),
        Label::new("status", "200"),
    ],
    1600000000,
    1600000100,
)?;

// Query all label combinations for the metric
let all_results = storage.select_all("http_requests", 1600000000, 1600000100)?;
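A labeled metric is typically identified by its name together with its full label set, independent of the order in which labels are written. One common way to achieve that, shown here as a hypothetical sketch and not as tsink's internal representation, is to sort the labels before building a series key.

```rust
/// Hypothetical series key: the metric name followed by its labels sorted by name.
/// Labels are plain (name, value) pairs here instead of tsink's `Label` type.
fn series_key(metric: &str, labels: &[(&str, &str)]) -> String {
    let mut sorted: Vec<(&str, &str)> = labels.to_vec();
    sorted.sort();
    let mut key = String::from(metric);
    for (name, value) in sorted {
        // Produces segments such as {method=GET}
        key.push_str(&format!("{{{}={}}}", name, value));
    }
    key
}
```

Under this scheme, `[("status", "200"), ("method", "GET")]` and `[("method", "GET"), ("status", "200")]` map to the same series.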
Concurrent Operations
use std::sync::Arc;
use std::thread;
use tsink::{DataPoint, Row, Storage, StorageBuilder};

let storage = Arc::new(StorageBuilder::new().build()?);

// Spawn multiple writer threads
let mut handles = vec![];
for _worker_id in 0..10 {
    let storage = storage.clone();
    let handle = thread::spawn(move || {
        for i in 0..1000 {
            let row = Row::new(
                "concurrent_metric",
                DataPoint::new(1600000000 + i, i as f64),
            );
            storage.insert_rows(&[row]).unwrap();
        }
    });
    handles.push(handle);
}

// Wait for all threads
for handle in handles {
    handle.join().unwrap();
}
Production Environment Configuration
let storage = StorageBuilder::new()
    .with_data_path("/var/lib/tsink")
    .with_retention(Duration::from_secs(30 * 24 * 3600))         // 30 days retention
    .with_timestamp_precision(TimestampPrecision::Milliseconds)
    .with_max_writers(16)                                        // Maximum 16 writers
    .with_write_timeout(Duration::from_secs(60))
    .with_partition_duration(Duration::from_secs(6 * 3600))      // 6 hours partition
    .with_wal_buffer_size(16384)                                 // 16KB WAL buffer
    .build()?;
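When choosing retention and partition duration together, it helps to estimate how many partitions stay on disk in steady state. The helper below is a hypothetical back-of-the-envelope calculation using only `std::time::Duration`; the real count may differ slightly because of the in-memory active and buffer partitions.

```rust
use std::time::Duration;

/// Rough number of full partitions that fit inside the retention window.
/// Hypothetical helper for capacity planning, not part of tsink's API.
fn partitions_in_retention(retention: Duration, partition: Duration) -> u64 {
    // Guard against a zero-length partition duration.
    retention.as_secs() / partition.as_secs().max(1)
}
```

For the configuration above (30 days of retention, 6-hour partitions) this gives 120 partitions; 7 days with 1-hour partitions gives 168.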
https://github.com/h2337/tsink