FPZIP: A Remarkable C++ Library for Multi-Dimensional Floating Point Array Compression

FPZIP is a C/C++ library developed by Lawrence Livermore National Laboratory for the compression of multi-dimensional floating point arrays. It supports lossless compression of 1D, 2D, and 3D single precision (float) and double precision (double) arrays, and also allows lossy compression by specifying the number of precision bits to retain. The table below provides a quick overview of its core features:

Feature              Supported Options
-------------------  ------------------------------------------------------------------
Compression type     Lossless, lossy
Data dimensions      1D, 2D, 3D arrays
Data types           Single precision (float), double precision (double)
Precision control    Single precision: 8, 16, 24, 32 bits; double precision: 16, 32, 48, 64 bits
Input/output         Memory buffers, files

🔧 Installing the FPZIP Library

On Debian/Ubuntu systems, you can install the FPZIP development library directly via the package manager:

sudo apt-get install libfpzip-dev

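This installs the libfpzip-dev package, which provides the header file (fpzip.h) and the library needed to compile and link against FPZIP. When building your program, link it with -lfpzip (for example, gcc example.c -o example -lfpzip).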

💻 Using FPZIP for Compression and Decompression

The FPZIP library provides a simple C interface. Let’s look at a complete code example to see how to use it to compress and decompress a one-dimensional float array.

#include <fpzip.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int num_floats = 1000;                         // 1000 example values
    const size_t raw_size = num_floats * sizeof(float);
    // Incompressible data can come out slightly larger than the input,
    // so the compressed buffer gets some headroom beyond the raw size
    const size_t buffer_size = 1024 + raw_size;

    float* original_data = (float*)malloc(raw_size);
    float* decompressed_data = (float*)malloc(raw_size);
    unsigned char* compressed_buffer = (unsigned char*)malloc(buffer_size);
    FPZ* fpz = NULL;
    size_t compressed_size = 0;
    int status = 1; // Exit code; becomes 0 only if the round trip verifies

    if (!original_data || !decompressed_data || !compressed_buffer) {
        fprintf(stderr, "Out of memory\n");
        goto cleanup;
    }
    for (int i = 0; i < num_floats; i++) {
        original_data[i] = (float)i / 10.0f; // Generate some example data
    }

    // --- Compression: attach an output stream to the memory buffer ---
    fpz = fpzip_write_to_buffer(compressed_buffer, buffer_size);
    if (!fpz) {
        fprintf(stderr, "Failed to create compression stream\n");
        goto cleanup;
    }
    fpz->type = FPZIP_TYPE_FLOAT; // Data type: float
    fpz->prec = 0;                // 0 = full precision (lossless)
    fpz->nx = num_floats;         // First (fastest-varying) dimension
    fpz->ny = 1;                  // Second dimension
    fpz->nz = 1;                  // Third dimension
    fpz->nf = 1;                  // Number of scalar fields

    if (!fpzip_write_header(fpz)) {
        fprintf(stderr, "Failed to write compression header\n");
        fpzip_write_close(fpz);
        goto cleanup;
    }
    compressed_size = fpzip_write(fpz, original_data); // Compressed bytes, 0 on error
    fpzip_write_close(fpz); // Finish the stream and free the FPZ object
    if (compressed_size == 0) {
        fprintf(stderr, "Compression failed\n");
        goto cleanup;
    }
    printf("Original size: %zu bytes, compressed size: %zu bytes (%.2f%% of original)\n",
           raw_size, compressed_size, 100.0 * (double)compressed_size / (double)raw_size);

    // --- Decompression: attach an input stream to the same buffer ---
    fpz = fpzip_read_from_buffer(compressed_buffer);
    if (!fpz) {
        fprintf(stderr, "Failed to create decompression stream\n");
        goto cleanup;
    }
    // The header restores type, prec, nx, ny, nz and nf; check them before
    // decoding into a buffer sized for num_floats floats
    if (!fpzip_read_header(fpz) || fpz->type != FPZIP_TYPE_FLOAT ||
        (size_t)fpz->nx * fpz->ny * fpz->nz * fpz->nf != (size_t)num_floats) {
        fprintf(stderr, "Missing or unexpected decompression header\n");
        fpzip_read_close(fpz);
        goto cleanup;
    }
    if (fpzip_read(fpz, decompressed_data) == 0) { // Returns 0 on error
        fprintf(stderr, "Decompression failed\n");
        fpzip_read_close(fpz);
        goto cleanup;
    }
    fpzip_read_close(fpz); // Free the FPZ object

    // Lossless mode: every value must round-trip exactly
    status = 0;
    for (int i = 0; i < num_floats; i++) {
        if (original_data[i] != decompressed_data[i]) {
            status = 1;
            break;
        }
    }
    printf("Decompression data verification: %s\n", status == 0 ? "PASS" : "FAIL");

cleanup:
    free(original_data);
    free(compressed_buffer);
    free(decompressed_data);
    return status;
}
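The verification loop in the example compares values with ==, which is the right test only for lossless mode (prec = 0). With a lossy prec, the decompressed values differ slightly from the originals, so a tolerance-based check is more useful. Below is a minimal sketch using hypothetical helpers, max_abs_error and within_tolerance, which are not part of fpzip:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of fpzip): returns the largest absolute
   difference between corresponding elements of two float arrays. */
static double max_abs_error(const float* a, const float* b, size_t n)
{
    assert(a != NULL && b != NULL);
    double max_err = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)a[i] - (double)b[i]; /* widen to avoid float rounding */
        if (d < 0.0)
            d = -d;
        if (d > max_err)
            max_err = d;
    }
    return max_err;
}

/* Hypothetical helper: nonzero if every element of b is within tol of a. */
static int within_tolerance(const float* a, const float* b, size_t n, double tol)
{
    return max_abs_error(a, b, n) <= tol;
}
```

In the example program, the exact comparison loop could be replaced with a call such as within_tolerance(original_data, decompressed_data, num_floats, tol), where tol is an error bound chosen for your application.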

Core Logic of the Code Explained:

  1. Data Initialization: Allocate and fill the one-dimensional float array original_data.
  2. Configure Compression Parameters: Create an output stream with fpzip_write_to_buffer and set the key parameters in the returned FPZ structure:
    • type: The data type (FPZIP_TYPE_FLOAT or FPZIP_TYPE_DOUBLE).
    • prec: The number of bits of precision to retain. 0 means full precision (lossless); for float data, a smaller value such as 24 gives lossy compression.
    • nx, ny, nz: The size of each array dimension, with nx varying fastest.
    • nf: The number of fields; typically 1, for a single scalar field.
  3. Execute Compression: Call fpzip_write_header to write the metadata, then fpzip_write, which returns the number of compressed bytes (0 on error).
  4. Execute Decompression: Call fpzip_read_from_buffer, then fpzip_read_header (which restores type, prec, and the dimensions from the stream), and finally fpzip_read.
  5. Resource Cleanup: Call fpzip_write_close and fpzip_read_close to finish each stream and release its FPZ object, then free the dynamically allocated memory.
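To get a feel for what prec = 24 means for float data, the sketch below keeps only the leading prec bits of a value's 32-bit IEEE 754 representation and zeroes the rest. It assumes prec counts the sign and exponent bits as well, so 24 bits leave 15 of the 23 stored significand bits. truncate_float_bits is a hypothetical helper for illustration only; fpzip applies its own internal mapping before discarding bits, so its reconstructed values need not match this function bit for bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration (not fpzip code): keep the `prec` most
   significant bits of a float's 32-bit pattern and zero the rest.
   Assumes prec includes the sign and exponent bits; 0 or 32 means
   full precision, so the value is returned unchanged. */
static float truncate_float_bits(float value, int prec)
{
    assert(prec >= 0 && prec <= 32);
    if (prec == 0 || prec == 32)
        return value;

    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);           /* read the bit pattern */
    uint32_t keep_mask = ~((UINT32_C(1) << (32 - prec)) - 1u);
    bits &= keep_mask;                            /* drop low-order bits */

    float result;
    memcpy(&result, &bits, sizeof result);        /* back to a float */
    return result;
}
```

Zeroing low significand bits moves the value toward zero, and with 15 significand bits kept the relative error stays below 2^-15 (about 3 × 10^-5). For example, truncate_float_bits(0.1f, 24) is slightly smaller than 0.1f but within that bound, while truncate_float_bits(1.0f, 24) returns 1.0f unchanged because its low bits are already zero.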

🛠️ Advanced Applications and Scenarios of FPZIP

After mastering the basic usage, let’s explore some advanced features and application scenarios of FPZIP.

  • Controlling Compression Precision: FPZIP lets you trade precision for compression ratio. Set the fpz->prec field to the number of bits to retain (0 keeps full precision). For single precision data, prec = 24 discards the 8 least significant bits of each value, which usually improves the compression ratio noticeably. This is useful when the application can tolerate a small relative error in the data.

  • Multi-Dimensional Array Compression: The real strength of FPZIP is in handling multi-dimensional arrays. For example, for a 100x100x100 single precision 3D array, set the parameters as follows:

    fpz->type = FPZIP_TYPE_FLOAT;
    fpz->nx = 100;
    fpz->ny = 100;
    fpz->nz = 100;
    fpz->nf = 1;

    With the dimensions set correctly, FPZIP can exploit the spatial correlation of the data along all three axes, which generally yields a better compression ratio than treating the same values as a flat one-dimensional stream.

  • HDF5 Filter Plugin: FPZIP can also serve as a compression filter for the HDF5 format, which is convenient for the large HDF5 files common in scientific computing. After installing and configuring the fpzip_plugin, you can enable FPZIP compression directly when creating HDF5 datasets, reducing both the storage footprint and the amount of data moved during I/O.
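Why do correct dimensions help? fpzip predicts each value from neighbours that have already been coded (its predictor is based on the Lorenzo predictor) and encodes only the prediction residuals, so smaller residuals mean better compression. The simplified sketch below compares a 3D Lorenzo predictor with a plain 1D "previous value" predictor on the same data. It is not fpzip's code: it works directly in double arithmetic, treats neighbours outside the array as 0, and assumes the data is laid out with x varying fastest (like a C array data[nz][ny][nx]). The names sample_at, lorenzo_residual, count_nonzero_3d and count_nonzero_1d are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Value at (x, y, z) with x varying fastest; coordinates outside the
   array read as 0 (a simplification for the boundary). */
static double sample_at(const double* f, int nx, int ny, int nz, int x, int y, int z)
{
    if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz)
        return 0.0;
    return f[(size_t)x + (size_t)nx * ((size_t)y + (size_t)ny * (size_t)z)];
}

/* 3D Lorenzo residual: the value minus a prediction built from the seven
   neighbours that precede it in x-fastest traversal order. */
static double lorenzo_residual(const double* f, int nx, int ny, int nz,
                               int x, int y, int z)
{
    double pred = sample_at(f, nx, ny, nz, x - 1, y, z)
                + sample_at(f, nx, ny, nz, x, y - 1, z)
                + sample_at(f, nx, ny, nz, x, y, z - 1)
                - sample_at(f, nx, ny, nz, x - 1, y - 1, z)
                - sample_at(f, nx, ny, nz, x - 1, y, z - 1)
                - sample_at(f, nx, ny, nz, x, y - 1, z - 1)
                + sample_at(f, nx, ny, nz, x - 1, y - 1, z - 1);
    return sample_at(f, nx, ny, nz, x, y, z) - pred;
}

/* Number of nonzero residuals when the data is treated as a 3D array. */
static size_t count_nonzero_3d(const double* f, int nx, int ny, int nz)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    size_t count = 0;
    for (int z = 0; z < nz; z++)
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++)
                if (lorenzo_residual(f, nx, ny, nz, x, y, z) != 0.0)
                    count++;
    return count;
}

/* Number of nonzero residuals when the same data is a flat 1D stream,
   each value predicted by the one before it (the first by 0). */
static size_t count_nonzero_1d(const double* f, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double pred = (i > 0) ? f[i - 1] : 0.0;
        if (f[i] - pred != 0.0)
            count++;
    }
    return count;
}
```

For a 4x4x4 linear field f(x, y, z) = x + 2y + 3z, the 3D predictor leaves nonzero residuals only at the 9 points on the three edges through the origin, whereas the 1D predictor leaves 63 of the 64 residuals nonzero, because each row and plane boundary breaks its simple "previous value" pattern.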

💎 Summary

The FPZIP library provides an efficient, flexible, and reliable solution for compressing floating point array data. It is particularly suitable for handling multi-dimensional floating point data with spatial correlation, such as data from scientific computing, numerical simulations, or large numerical datasets. With its simple C interface, you can easily integrate compression functionality into your C or C++ programs.
