Best Practices of Alibaba Group's Graph Database GDB

Guest Speaker: Dou Chao, Alibaba Technical Expert

Editor: Wang Shuai, Kingsoft Cloud

Produced by: DataFunTalk

Introduction: Alibaba Cloud Graph Database (GDB) is a self-developed, real-time, reliable online graph database service. It supports the Property Graph model and is built for storing and querying highly connected data. GDB is the only graph database product in China to enter the Forrester Wave evaluation report and is the best-practice graph database within Alibaba Group. Its parallel executor draws on the Volcano model executor and NUMA-aware Morsel-driven parallel execution, and it has significantly improved performance in recommendation and query scenarios for several of Alibaba's star applications, such as DingTalk and Hema Fresh. Since its launch, GDB has served over 100 application scenarios and is widely used both inside and outside Alibaba Group across multiple industries.

Today’s introduction includes the following four aspects:

Product Introduction
Product Capabilities
Graph Executor
Application Scenarios

Graph Database GDB – Product Introduction

1. Self-developed Product by Alibaba

Graph Database (GDB) is a product developed in-house by Alibaba. It is a real-time, reliable online graph database service that supports the Property Graph model and is designed for storing and querying highly connected data.

Product Homepage: https://www.aliyun.com/product/gdb
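
As a rough illustration of the Property Graph model mentioned above, the following minimal in-memory sketch stores labeled vertices and edges that each carry key-value properties. The class and method names (PropertyGraph, add_vertex, add_edge, out) are hypothetical and chosen only for this example; they are not GDB's API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: labeled vertices and edges with properties."""

    def __init__(self):
        self.vertices = {}   # vertex id -> Vertex
        self.out_edges = {}  # vertex id -> list of outgoing Edge objects

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        self.out_edges.setdefault(vid, [])

    def add_edge(self, src, dst, label, **props):
        self.out_edges[src].append(Edge(src, dst, label, props))

    def out(self, vid, label):
        """Return the vertices reached from vid over edges with this label."""
        return [self.vertices[e.dst]
                for e in self.out_edges[vid] if e.label == label]


g = PropertyGraph()
g.add_vertex("u1", "person", name="Alice")
g.add_vertex("u2", "person", name="Bob")
g.add_edge("u1", "u2", "knows", since=2020)

names = [v.properties["name"] for v in g.out("u1", "knows")]
print(names)
```

The point of the model is that both vertices and edges carry their own labels and properties, so a relationship such as "knows" can store attributes (here, since) without a separate join table.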

GDB originated from the Tair Key-Value model, which evolved into Tair Multi-Model to serve high-concurrency queries and writes within the group. With the trend of cloud migration, the Tair GDB graph database was incubated, primarily targeting complex, highly connected scenarios.

2. Product Advantages

GDB is the only graph database product in China to enter the Forrester Wave evaluation report. It is the best practice graph database within Alibaba Group: continuously refined through over 30 star applications such as Hema Fresh, Tmall Supermarket, DingTalk, Gaode, Cainiao, and Ele.me, with more than 100 internal and external best practices since moving to the cloud.

① Compatible with Mainstream Graph Query Languages: Supports property graphs – compatible with mainstream query languages Gremlin and Cypher, reducing development barriers.

② High Performance, Low Cost:

Compared to self-built Neo4j, costs are reduced by 40%.
Performance for large-scale data queries outperforms Neo4j, Neptune, etc.
Support for read-only nodes will be available soon to further improve read performance.

③ High Availability: Instances are highly available with automatic failover of nodes, ensuring business continuity with a 99.95% SLA guarantee.

④ Ease of Use and Maintenance:

Supports ACID transactions; automatic indexing, Schema-Free.
Includes commonly used graph algorithms.
Supports clients in Go, Java, Python, etc.
Provides various data import functions.
Offers backup and recovery, automatic upgrades, monitoring alarms, failover features, significantly reducing maintenance costs.
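
To put the 99.95% SLA figure above in concrete terms, the short calculation below converts an availability percentage into a downtime budget. It is only a sketch: the function name is hypothetical, and it assumes availability is measured against wall-clock time over a 30-day month or a 365-day year, which may differ from the actual SLA terms.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Downtime budget in minutes for a given availability percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


monthly_minutes = allowed_downtime_minutes(99.95, 30)
yearly_hours = allowed_downtime_minutes(99.95, 365) / 60

# Roughly 21.6 minutes per 30-day month and about 4.4 hours per year.
print(round(monthly_minutes, 1), round(yearly_hours, 2))
```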

3. Output Methods

GDB includes various forms such as public cloud, private cloud (for banks and research institutions), and integrable versions (for evaluation scenarios). The table below introduces the base, storage, product capabilities, and selling points of each form.

Graph Database GDB – Product Capabilities

1. Convenient Access to Graph Services

The Graph Database GDB is compatible with various query languages, allowing users to migrate smoothly, and can import data from multiple data sources and channels.

① Query Languages

Compatible with the two major mainstream graph query languages, Gremlin and Cypher, facilitating smooth migration to Alibaba Cloud’s GDB.
Supports open-source TinkerPop drivers, version 3.3.3 and above.
Compatibility is officially guaranteed for Java, Python, Go, .NET, and REST.
Supports Gremlin Console.

② Data Import

Supports multiple data sources such as Neo4j, MySQL, ODPS…
Supports multiple data channels such as OSS and DataWorks.

2. Graph Instance Management & Maintenance-Free

Once on the cloud, GDB provides managed graph instances, so customers who use it on Alibaba Cloud are spared most maintenance costs.

(1) Periodic Backup, One-Click Recovery

Data is periodically backed up, and in case of damage or loss, can be recovered with one click.

(2) 24-Hour Monitoring, Custom Alerts

Instances are monitored around the clock, with customizable alert rules and configurations for conditions such as CPU saturation and data anomalies. Data unavailability can trigger phone alerts.

(3) One-Click Resizing, Elastic Scaling

Based on customer traffic increases and decreases, one-click resizing and elastic scaling of storage capacity can be achieved.

(4) Graph Visualization

Provides native visualization services, allowing testing based on this visualization product.

3. Product Forms

(1) Basic Instances

Basic instances are single-node instances with the following characteristics:

Low cost
No high availability requirement

(2) High Availability Instances

High availability instances use a primary-backup architecture with the following characteristics:

Failover in seconds
Automatic backup recovery

(3) Read-Only Instance Clusters

If customers have high QPS demands, they can use read-only instance clusters with the following characteristics:

Horizontal query scaling
Read-write separation

Graph Executor

1. Overall Architecture

(1) Diverse Access

At the access layer, it is compatible with the two major industry mainstream syntaxes, Gremlin and Cypher, while covering multi-language SDKs as well as permission management, authentication management, and account management.

(2) Self-developed Parser

The self-developed parser is significantly faster than the native parser.

(3) Query Optimizer

Includes rule-based optimizer (RBO) and cost-based optimizer (CBO).

(4) Graph Executor

The graph executor can perform mixed loads and parallel execution.

(5) Graph Storage Engine

The graph storage engine is native graph storage, including automatic indexing and ACID transactions.

(6) Efficient Import, Service Control

2. Database Executor

Throughout the history of database systems, optimizing the SQL query optimizer has been a central concern of SQL engine development. Equally important at execution time, however, are the query scheduler and the plan executor, which carry out the execution plan that the optimizer produces.

In the early days of relational databases, computers were constrained by IO capacity, so computation accounted for only a small share of query time and the scheduler and executor mattered less. Query performance depended mainly on which execution plan the optimizer chose.

As hardware advanced and IO capacity improved, the scheduler and executor became increasingly important.

3. Volcano Model Executor

① The Volcano model is a mature interpreted execution model in the database field, characterized by:

Each relational algebra operator is abstracted as an iterator that exposes a next interface
Each call to next produces one data tuple
The root node repeatedly calls next to obtain the complete result
Adopted by databases such as MySQL, SQL Server, and Oracle
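
The iterator interface described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GDB's executor: the operator classes (Scan, Filter, Project) and the run helper are hypothetical names, and None is used here to signal that an operator has no more tuples.

```python
class Scan:
    """Leaf operator: yields the rows of an in-memory table one by one."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None marks the end of the stream


class Filter:
    """Pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row


class Project:
    """Keeps only the requested columns of each row."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}


def run(root):
    """The root is called repeatedly until the whole result is produced."""
    results = []
    while (row := root.next()) is not None:
        results.append(row)
    return results


people = [
    {"name": "Alice", "age": 31},
    {"name": "Bob", "age": 17},
    {"name": "Carol", "age": 45},
]
plan = Project(Filter(Scan(people), lambda r: r["age"] >= 18), ["name"])
adults = run(plan)
print(adults)
```

Because every operator only knows its child's next interface, new operators can be added without touching existing ones. The cost is one call per operator per tuple, which is the overhead discussed in the disadvantages that follow.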

② Advantages

Operator decoupling, easy to extend
Low resource consumption (especially memory)

③ Disadvantages

High overhead from virtual function calls
Poor CPU cache utilization

④ Optimization Solutions

Batch processing: each call returns multiple rows instead of one
Fewer virtual function calls
Higher cache hit rate
Multi-row processing makes SIMD parallelism (vectorization) possible
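
The batching idea in the list above can be illustrated by changing the iterator interface so that each call returns a list of rows. The sketch below is illustrative only: BatchScan, BatchFilter, and next_batch are hypothetical names, an empty list marks the end of the stream, and pure Python does not itself use SIMD, so only the reduction in call count is shown.

```python
class BatchScan:
    """Leaf operator that returns up to batch_size values per call."""

    def __init__(self, values, batch_size):
        self.values = values
        self.batch_size = batch_size
        self.pos = 0
        self.calls = 0  # counts how often next_batch was invoked

    def next_batch(self):
        self.calls += 1
        batch = self.values[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return batch  # an empty list marks the end of the stream


class BatchFilter:
    """Applies the predicate to a whole batch in one call."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next_batch(self):
        while True:
            batch = self.child.next_batch()
            if not batch:
                return []
            kept = [v for v in batch if self.predicate(v)]
            if kept:  # skip batches where nothing passed the filter
                return kept


def run_batches(root):
    out = []
    while (batch := root.next_batch()):
        out.extend(batch)
    return out


scan = BatchScan(list(range(10_000)), batch_size=1024)
evens = run_batches(BatchFilter(scan, lambda v: v % 2 == 0))

# 10,000 rows are fetched with 11 scan calls instead of 10,001.
print(len(evens), scan.calls)
```

In a compiled engine, the inner loop over a batch is also where the compiler or hand-written kernels can apply SIMD instructions, since the same operation runs over contiguous values.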

4. Morsel-Driven Parallelism

As data volumes grow and a single core can no longer keep up, the execution flow is split into operator pipelines that each work on small blocks of data. This spreads work across cores while sparing the CPU the frequent context switches that lower cache hit rates.

NUMA-aware scheduling makes efficient use of multi-core CPUs and speeds up memory reads. The system uses a thread pool with a fixed number of threads. The data processed by each query is divided into fine-grained units (morsels), and each operator pipeline over a morsel is wrapped as a task and handed to the thread pool for execution.
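
The scheduling scheme described above can be sketched with Python's standard thread pool. This is a simplified illustration under several assumptions: MORSEL_SIZE, WORKERS, and the function names are hypothetical, the pipeline is a toy filter-map-aggregate, NUMA-local placement of morsels is not modeled, and CPython's global interpreter lock means the example shows the task structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

MORSEL_SIZE = 1000  # hypothetical morsel size, in rows
WORKERS = 4         # fixed number of worker threads


def split_into_morsels(rows, size):
    """Cut the input into fine-grained, fixed-size units of work."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def pipeline(morsel):
    """One operator pipeline run over one morsel:
    filter (multiples of 3) -> map (square) -> partial aggregate (sum)."""
    return sum(x * x for x in morsel if x % 3 == 0)


def run_query(rows):
    morsels = split_into_morsels(rows, MORSEL_SIZE)
    # Each (pipeline, morsel) pair is a task handed to the fixed-size pool.
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        partials = list(pool.map(pipeline, morsels))
    # Combine the per-morsel partial aggregates into the final result.
    return sum(partials)


rows = list(range(10_000))
total = run_query(rows)
print(total)
```

Because morsels are small, a worker that finishes early simply picks up the next morsel, which keeps all threads busy without creating one thread per query or per operator.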

5. GDB Parallel Executor

(1) Introduction to GDB Parallel Executor

Drawing on the Volcano model and NUMA-aware Morsel-driven parallel execution, GDB primarily adopts a Volcano model executor that returns data in batches, so each traversal step operates on a batch of data.

The executor is divided into modules such as task recognition and decomposition and resource management. It supports mixed loads (parallel decomposition of long tasks and iterative execution of short tasks), parallelism recognition, and task decomposition. The parts that interact with storage include index and data iteration and asynchronous serialization of results.

Operators can be parallelized Morsel-style: an operator is split into multiple Morsel streams that run on multiple CPUs. When an aggregation operator is encountered, the streams are gathered and returned as one result, which allows serial and parallel processing to be mixed.

Under mixed loads, short tasks run directly, while long tasks are split and executed on the thread pool. Execution uses multi-level exception handling: exceptions at the Gather node are returned first, while a sub-task exception is first returned to its parent task, after which the remaining sub-tasks are terminated and released as a batch.
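
The mixed-load behavior and exception handling described above might look roughly like the following sketch. It is not GDB's implementation: SPLIT_THRESHOLD, CHUNK, SubTaskError, and the function names are all hypothetical, the per-batch work is a stand-in (summing vertex ids), and cancellation here only stops sub-tasks that have not started yet.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

SPLIT_THRESHOLD = 1000  # hypothetical: larger inputs count as long tasks
CHUNK = 250             # hypothetical sub-task size


class SubTaskError(Exception):
    """Raised by a sub-task; surfaces at the parent (gather) level."""


def process_batch(vertex_ids):
    """Stand-in for per-batch traversal work."""
    if any(v < 0 for v in vertex_ids):
        raise SubTaskError("invalid vertex id in batch")
    return sum(vertex_ids)


def execute(vertex_ids):
    # Short task: run directly on the calling thread.
    if len(vertex_ids) <= SPLIT_THRESHOLD:
        return process_batch(vertex_ids)

    # Long task: split into sub-tasks and run them on the thread pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(process_batch, vertex_ids[i:i + CHUNK])
                   for i in range(0, len(vertex_ids), CHUNK)]
        total = 0
        try:
            # Gather step: combine sub-task results as they complete.
            for fut in as_completed(futures):
                total += fut.result()
        except SubTaskError:
            # The sub-task error reaches the parent first; the parent then
            # cancels all remaining sub-tasks together and re-raises.
            for fut in futures:
                fut.cancel()
            raise
        return total


short_result = execute(list(range(10)))
long_result = execute(list(range(5000)))
print(short_result, long_result)
```

Routing short tasks around the pool avoids paying scheduling overhead for queries that finish quickly, while the single gather point gives one place to decide how failures are reported and cleaned up.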

(2) Parallel Executor Case Study

① DingTalk Friend Recommendations

a. Scenario

Second-degree friend recommendations

For new users coming in, find out who their friends are.

Batch processing of adjacent nodes

In a subsidiary with thousands of employees, the adjacent nodes can be processed with a Group By. The Morsel streams described above can be used for this processing.
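
A second-degree recommendation with a group-by step can be sketched on a small in-memory graph. The data, the function name recommend, and the ranking by number of mutual friends are assumptions made for illustration; the source does not describe DingTalk's actual ranking logic.

```python
from collections import Counter

# Hypothetical undirected friendship graph as adjacency sets.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}


def recommend(user, graph, top_k=3):
    """Rank friends-of-friends by how many mutual friends they share."""
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:                        # first hop
        for candidate in graph.get(friend, set()):  # second hop
            if candidate != user and candidate not in direct:
                counts[candidate] += 1           # group by candidate
    return counts.most_common(top_k)


suggestions = recommend("ann", friends)
print(suggestions)
```

Each first-hop friend's adjacency list can be expanded independently, which is why this kind of query splits naturally into parallel streams whose partial counts are merged at the end.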

b. Business Value

GDB provides DingTalk with storage and query capabilities for hundreds of billions of relationships, managing relationships between people, between people and enterprises, and between enterprises. It provides core capabilities for features such as corporate identification, user intimacy calculation, executive prediction, and colleague relationship prediction.

② Hema Fresh

a. Scenario: Hema Fresh Online Recommendations

Associated dish recommendations
Multiple path queries for recipes

The response time of DSL requests in Hema's business was reduced from 210 ms to under 100 ms. Before optimization, the flatMap step took 140 ms and the whole DSL request took 210 ms. After optimization, the flatMap step took 33 ms and the whole DSL request took 82 ms.
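
The relative improvement implied by these timings can be worked out directly. The helper name below is hypothetical; the inputs are the four figures quoted above.

```python
def reduction_percent(before_ms, after_ms):
    """Percentage by which latency dropped."""
    return (before_ms - after_ms) / before_ms * 100


flatmap_reduction = reduction_percent(140, 33)
total_reduction = reduction_percent(210, 82)

# flatMap: about 76% lower; whole DSL request: about 61% lower.
print(round(flatmap_reduction, 1), round(total_reduction, 1))
```

The flatMap step saved 107 ms of the 128 ms total saving, so most of the end-to-end gain came from that one step.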

6. Other Database Product Executors

(1) Oracle 10g

Oracle 10g's parallel execution is an optimization built on Oracle7. There are multiple Oracle execution nodes, each with access to the full data, which is stored on a shared disk. It adopts the PSC mechanism: every execution node shares the plan tree, and PX distributes the load across the nodes, with different execution operators running on different nodes. Serial execution on a single machine can be combined with parallel execution across multiple machines. After execution, the results are aggregated and returned as one.

(2) PolarDB-X Parallel Executor

Alibaba’s PolarDB-X is based on the Volcano model, with multiple MySQL nodes in the storage layer. The aggregation operators in the upper layer can send execution plans to the MySQL nodes in parallel.

Simple statements pushed down to the storage layer for MySQL execution
Volcano executor, Gather
DataChunk batch pulling

(3) Nebula Graph Query Engine

Nebula, as an open-source graph query engine, has the following characteristics:

① Storage and computation separation

② Volcano execution engine

③ Optimizer + Plan Cache

④ Execution plan optimization, filtering pushdown

⑤ Scheduler + Executor

⑥ Each Executor goes through four stages: create-open-execute-close

⑦ Nebula parallel implementation

Query engine scheduling
Parallel jobs sent to the storage engine

Application Scenarios

Since GDB’s launch, there have been over 100 application scenarios, with more than 30 star applications within Alibaba Group, including social/recommendation, risk control, knowledge graphs, and relationship management; among external users, it covers industries such as internet, e-commerce, gaming, finance, education, government, and pharmaceuticals.

That’s all for today’s sharing, thank you, everyone.

Guest Speaker

Dou Chao

Alibaba Technical Expert

Dou Chao holds a master’s degree from Xidian University (Xi’an University of Electronic Science and Technology). He previously worked on TFS storage development at Tencent and is now mainly responsible for developing the GDB graph database.
