Using TinkerPop Framework for CRUD Operations on GDB

This article covers what to watch for when using GDB as the storage layer in server-side development. It implements several common examples with the TinkerPop framework, highlighting the key points of CRUD operations on GDB and the differences between the two ways of submitting operations to GDB.

Introduction

As one of the better-known general-purpose graph database server frameworks, TinkerPop (address: https://tinkerpop.apache.org/) with Gremlin is a common choice. Some colleagues surveyed several graph databases used at Alibaba and summarized their advantages and disadvantages as follows:

GDB
  Query language: Partially supports Gremlin; supports Cypher
  Data loading: Batch: MaxCompute, OSS; Incremental: SDK
  Commercialization: Alibaba Cloud product
  Management and operation: RDS management
  Features: Alibaba Cloud's graph engine based on open-source Gremlin, fully reliant on Alibaba Cloud infrastructure, with commercial applications as a cloud product. It offers high availability, easy maintenance, and ACID transactions.

GraphCompute
  Query language: Supports Gremlin
  Data loading: Batch: MaxCompute; Incremental: Blink, SDK
  Commercialization: Based on Dataworks
  Management and operation: Based on Dataworks
  Features: Developed in collaboration with Alibaba DAMO Academy, with a self-built distributed execution engine focused on optimizing single-node parallel execution in graph computing. Built-in algorithms for graph queries and analysis, with strong support for machine learning scenarios. Seamlessly integrated with Dataworks, with a good user interface.

TuGraph
  Query language: Self-developed graph query language; partial Gremlin syntax
  Commercialization: Ant Group, currently open for trial
  Management and operation: Self-developed IDE
  Features: Distributed graph database developed for the financial sector's complex, giant networks and ultra-large data updated in real time, offering high performance and availability.

Neo4j
  Query language: Supports Cypher
  Commercialization: Open-source product

Most of the databases above support Gremlin syntax. Because the business requires ACID transactions to guarantee write consistency, GDB was chosen as the data store.
Among the links on the TinkerPop official website there are API docs for tinkerpop-java. In practice, however, we found that many of the APIs the framework provides fail when run against GDB, with errors such as:
org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Invalid OpProcessor requested
Searching online for GDB turns up little in the way of complete solutions, so this article shares our experience with it.

Background

Because the business needs to process and store graph data, GDB was chosen as the store. Most of the official GDB examples submit string scripts (Script) directly for CRUD operations on the database. However:

  1. Many business operations share similar initial filter conditions and then diverge. Gremlin reads as a chain of steps, so with plain string concatenation the code cannot know the data type returned by the previous step, which leads to missing or incorrect logic;

  2. Writing vertices and adding edges depends on complex business deduplication logic that must be wrapped in transactions, but the official documentation gives no transaction examples;

  3. Some APIs provided by TinkerPop cannot be used on GDB and fail with puzzling errors (e.g., SessionedClient, tx, Transaction).

Additionally, my backend coding skills still have room to improve; the layering in the examples is for reference only, and corrections and discussion are welcome.

Dependency References

After reviewing the available material, the most suitable Maven dependency for GDB is:

https://github.com/aliyun/alibabacloud-gdb-java-sdk

After cloning it with git, run mvn install (no release package is available in the public Maven repository).

The reasons are:

  1. The official TinkerPop Maven package does not implement transaction support for GDB

  2. GDB-Java-SDK version 1.0.1 comes with neither source code nor documentation

The alibabacloud-gdb-java-driver dependency references TinkerPop's gremlin-driver and gremlin-core packages at version 3.4.2; it also pulls in junit.jupiter and slf4j, so pay attention to package exclusions.

The rest of this article is based on the APIs provided by GDB-JAVA-SDK.

DAO Layer Encapsulation

Since GDB supports both Script and Bytecode submission modes, the DAO layer can be built in the following ways:

  1. Implement a Mybatis interface and template to assemble script templates; this capability has been patented (address: http://www.xjishu.com/zhuanli/55/202210251927.html). This implementation is quite cumbersome, and examples will be added later;

  2. Concatenate the script with a StringBuilder, and submit the script and its parameters through the GDBClient's submitAsync method. Note that each parameter is looked up in the HashMap by its key;

    public String insert(DataDO dataDO) throws CoreSystemException {
        //Param Checker...

        //Generate the entity id first; it is referenced by the properties below
        String id = UUID.randomUUID().toString();

        StringBuilder script = new StringBuilder();
        //Fill entity Label
        script.append(String.format("g.addV('%s')", dataDO.getLabel()));

        //Fill entity properties; the property name doubles as the parameter key,
        //so the script holds the key and the value comes from the parameter map
        for (String property : dataDO.getPropertyCreationParamList()) {
            script.append(String.format(".property('%s', %s)", property, property));
        }
        script.append(String.format(".property('propId', '%s')", id));
        script.append(String.format(".property('creatorId', %d)", dataDO.getCreatorId()));
        script.append(String.format(".property('area', %d)", dataDO.getArea()));

        //Set id
        script.append(String.format(".property(id, '%s')", id));

        //Submit; the HashMap<String, Object> supplies the values for the parameter keys
        List<Result> results = gdbClient.queryResultSync(script.toString(),
                                                         dataDO.getPropertyCreationParamMap());

        //Result processing
        dataDO.setId(id);
        if (results.size() != 1) {
            return null;
        }
        return id;
    }

  3. Construct the operation flow through GraphTraversal, and compile it into Bytecode for submission through Client. The operation flow is assembled as follows:

    public GraphTraversal insertTraversal(DataDO dataDO, GraphTraversal traversal) {
        //Param Checker...

        //Generate the id first; it is referenced by the properties below
        String id = UUID.randomUUID().toString();
        dataDO.setId(id);

        //Fill entity Label
        traversal = traversal.addV(dataDO.getLabel());

        //Fill entity properties; here the loop passes the concrete values
        for (String property : dataDO.getPropertyCreationParamList()) {
            traversal = traversal.property(property, dataDO.getPropertyCreationParamMap().get(property));
        }
        traversal = traversal.property("propId", id)
            .property("creatorId", dataDO.getCreatorId())
            .property("area", dataDO.getArea())
            .property(T.id, id);

        //Do event log append...

        //Return the assembled operation flow without executing the actual insertion
        return traversal;
    }

To submit the operation flow, use GdbClient's submit method (no timeout; internally it calls submitAsync and then calls get directly) or its exec method (takes a timeout parameter; internally it calls submitAsync and polls the completion status in a loop).
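
The difference between the two waiting styles can be illustrated with plain JDK futures. This is only a sketch of the idea, not the SDK's code: WaitStrategies, waitForever, and waitWithTimeout are hypothetical names, and a CompletableFuture stands in for whatever submitAsync actually returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative only: two ways to wait for an asynchronous submission. */
final class WaitStrategies {

    /** Blocks until the result arrives, with no upper bound (the "submit" style). */
    static <T> T waitForever(CompletableFuture<T> pending) {
        return pending.join();
    }

    /** Polls the completion status until a deadline passes (the "exec" style). */
    static <T> T waitWithTimeout(CompletableFuture<T> pending, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!pending.isDone()) {
            if (System.nanoTime() - deadline >= 0) {
                // Give up and make sure the pending work is not left dangling
                pending.cancel(true);
                throw new IllegalStateException("no result after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(5); // short pause between status checks
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ex);
            }
        }
        return pending.join();
    }
}
```

The unbounded wait is simpler, but a stuck query blocks the calling thread indefinitely; the polling variant bounds the wait at the cost of having to choose a timeout per operation.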

Note

Of options 2 and 3 above, a script concatenated as a String supports variable-argument predicates such as within and without, whereas under Bytecode such operations fail with a missing-required-parameter exception: actual identifier, but expect expect : boolean int long float double string
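
For Script mode, one way to keep values out of the script text is to emit a binding name for every value, including one name per element of a within predicate. The sketch below uses only the JDK; BoundScript and its methods are hypothetical helpers, not part of TinkerPop or the GDB SDK, and it assumes the server resolves binding names such as p0 from the parameter map submitted with the script.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative only: a Gremlin script plus the bindings to submit with it. */
final class BoundScript {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private final StringBuilder script = new StringBuilder("g.V()");
    private final Map<String, Object> bindings = new LinkedHashMap<>();

    /** Appends has('key', pN) and stores the value under the binding name pN. */
    BoundScript has(String key, Object value) {
        String safeKey = checked(key);
        String name = nextName();
        bindings.put(name, value);
        script.append(".has('").append(safeKey).append("', ").append(name).append(')');
        return this;
    }

    /** Appends has('key', within(pN, pM, ...)) with one binding per value. */
    BoundScript hasWithin(String key, List<?> values) {
        String safeKey = checked(key);
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        List<String> names = new ArrayList<>();
        for (Object value : values) {
            String name = nextName();
            bindings.put(name, value);
            names.add(name);
        }
        script.append(".has('").append(safeKey).append("', within(")
              .append(String.join(", ", names)).append("))");
        return this;
    }

    String script() {
        return script.toString();
    }

    Map<String, Object> bindings() {
        return Collections.unmodifiableMap(bindings);
    }

    private String nextName() {
        return "p" + bindings.size();
    }

    /** Property keys are interpolated into the script, so only plain identifiers pass. */
    private static String checked(String key) {
        if (key == null || !IDENTIFIER.matcher(key).matches()) {
            throw new IllegalArgumentException("unsafe property key: " + key);
        }
        return key;
    }
}
```

Calling new BoundScript().has("area", 3).hasWithin("name", Arrays.asList("John", "Jane")) yields the script g.V().has('area', p0).has('name', within(p1, p2)) with the bindings p0=3, p1=John, p2=Jane. The script and the map would then be passed together to the client's script submission method.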

TinkerPop (version 3.6.0 or above) supports custom DSLs: annotations let you repackage Gremlin steps as operations with domain-specific semantics, so that domain-object (DO) methods can be called directly when orchestrating an operation flow. For example:

Add dependency:

<dependency>
  <groupId>org.apache.tinkerpop</groupId>
  <artifactId>gremlin-annotations</artifactId>
  <version>3.6.0</version><!-- It is best to align the version number with GdbClient's dependency -->
</dependency>

Add interface:

@GremlinDsl
public interface SocialTraversalDsl<S, E> extends GraphTraversal.Admin<S, E> {
    public default GraphTraversal<S, Vertex> knows(String personName) {
        return out("knows").hasLabel("person").has("name", personName);
    }

    public default <E2 extends Number> GraphTraversal<S, E2> youngestFriendsAge() {
        return out("knows").hasLabel("person").values("age").min();
    }

    public default GraphTraversal<S, Long> createdAtLeast(int number) {
        return outE("created").count().is(P.gte(number));
    }
}

Introduce the interface and use it (note that the knows method can be used here)

SocialTraversalSource social = traversal(SocialTraversalSource.class).withEmbedded(graph);
social.V().has("name","marko").knows("josh");

Service Layer Encapsulation

The Service layer wires the required DAOs together into business operations, which are either transactional or non-transactional.

In the TinkerPop framework, a single sessionless submission to the client is equivalent to one transaction: if an error occurs during evaluation, the entire statement has no effect.

The g.tx() method returns a Transaction for managing the execution of multiple statements manually; it provides begin, commit, close, and rollback operations.

Therefore, the Service layer divides operations into NoSession operations and Session operations, depending on whether a transaction is needed.

Generally, create, modify, and delete operations are Session operations, because whether the next step runs depends on the outcome of the previous query. This applies especially to inserts, which must add both vertices and the edges between them. Until a vertex has been inserted successfully its ID is not known, so the edge insertion cannot be chained directly onto it, and two separate operations are required.

NoSession operations are not covered in detail here: simply submit the complete Script or Bytecode and read the execution result.

TinkerPop Official Usage Methods

According to the official example, when the database supports transactions, g.tx() returns a Transaction object. Its begin method returns a new GraphTraversalSource bound to that transaction. This object can be reused for several CRUD operations, and their results can feed logical checks. Finally, commit confirms the operations, or rollback undoes them.

As shown in the following example:

GraphTraversalSource g = traversal().withRemote(conn);
Transaction tx = g.tx();
// spawn a GraphTraversalSource from the Transaction. Traversals spawned
// from gtx will essentially be bound to tx
// Start transaction
GraphTraversalSource gtx = tx.begin();
try {
    //Add vertex 1
    List<Vertex> personResult = gtx.addV("person").property("name", "John").toList();
    Assert.isTrue(!CollectionUtils.isEmpty(personResult), "Failed in create PersonNode");
    //Add vertex 2
    List<Vertex> softwareResult = gtx.addV("software").property("name", "YYDS").toList();
    Assert.isTrue(!CollectionUtils.isEmpty(softwareResult), "Failed in create SoftwareNode");
    //Add relationship from vertex 1 to vertex 2
    List<Edge> edgeCreateResult = gtx.addE("create")
        .from(gtx.V(personResult.get(0).id())).to(gtx.V(softwareResult.get(0).id())).toList();
    //Submit result
    tx.commit();
    log.info("Success create Record [{} -create-> {}]", "John", "YYDS");
    return edgeCreateResult.get(0);
} catch (Exception ex) {
    //Rollback transaction
    tx.rollback();
    throw ex;
} finally {
    tx.close();
}

In this example, fully creating the record “John create YYDS” requires two vertex inserts. The edge depends on the IDs of both vertices, so it can only be added after the earlier operations succeed. All three steps are submitted through gtx, which was spawned from tx, so they can be rolled back together on failure.

GDB Practical Usage

However, following TinkerPop's official example directly on GDB results in an Invalid OpProcessor exception. The reason is that GDB does not support the g.tx() API (see Gremlin compatibility, address: https://help.aliyun.com/document_detail/102883.html), so a Transaction object cannot be obtained directly. Submitting Script strings can still open a transaction, and this is how the open, commit, and rollback methods of the SDK's GdbClient are implemented.

In the official framework, the Transaction object wraps a SessionedClient object. When this client submits operations, it attaches a SessionId and additional SessionSettings parameters so that they are committed or rolled back within the same transaction.

The GDB SDK implements a special GdbClient class that turns the submission of Bytecode transactions into a partly Script-based submission. Therefore, in GDB, a transaction can be controlled by submitting scripts such as “g.tx().open()”, which switches transactions on in the same way as the official ScriptDemo; transactions can also be submitted through the methods GdbClient provides.

Only the Bytecode submission method is shown here (in Script mode the logic matches the official example; each operation flow is simply submitted as a String through client.submit).

//Create Cluster connection pool and obtain SessionedClient
GdbClient.SessionedClient client = GdbCluster.build(gdbAHost).port(gdbAPort).credentials(gdbAUsername, gdbAPassword)
        .serializer(Serializers.GRAPHBINARY_V1D0)
        .create().connect(UUID.randomUUID().toString()).init();
//Used to store execution results
StringBuffer result = new StringBuffer();
try {
    //Key point: call batchTransaction method to implement transaction submission
    client.batchTransaction((tx, gtx) -> {
        try {
            //Key point: call tx (actually GdbClient) exec method to submit Bytecode operation flow and obtain execution results
            //Cannot directly use gtx's toList method (will produce Invalid OpProcessor exception)
            List<Result> personResult = tx.exec(
                    gtx.addV("person").property("name", "John")
                    , GQL_TIMEOUNT
            );
            Assert.isTrue(!CollectionUtils.isEmpty(personResult), "Failed in create PersonNode");
            List<Result> softwareResult = tx.exec(
                    gtx.addV("software").property("name", "YYDS")
                    , GQL_TIMEOUNT
            );
            Assert.isTrue(!CollectionUtils.isEmpty(softwareResult), "Failed in create SoftwareNode");
            //Key point: the returned result is different from directly executing bytecode
            Vertex person = personResult.get(0).getVertex();
            Vertex software = softwareResult.get(0).getVertex();
            List<Result> edgeCreateResult = tx.exec(
                    gtx.addE("create").from(gtx.V(person.id())).to(gtx.V(software.id()))
                    , GQL_TIMEOUNT
            );
            log.info("Success create Record [{} -create-> {}]", "John", "YYDS");
            result.append(edgeCreateResult.get(0).getEdge().id());
        } catch (Exception ex) {
            //log it
            throw ex;
        }
    });
} finally {
    //Key point: always manually close the client
    client.close();
}
return result.toString();

Pay attention to every key point marked in this example: none of these steps can be replaced with TinkerPop's standard practices.
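
For comparison, the Script-mode flow mentioned earlier can be sketched with plain strings. This is a hedged illustration, not the SDK's implementation: ScriptTransaction is a hypothetical helper, the Consumer stands in for a session-bound script submission call, and the commit and rollback script texts are assumed to mirror the “g.tx().open()” script named above.

```java
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: wraps script statements in open/commit/rollback scripts. */
final class ScriptTransaction {
    static final String OPEN = "g.tx().open()";
    static final String COMMIT = "g.tx().commit()";     // assumed counterpart of OPEN
    static final String ROLLBACK = "g.tx().rollback()"; // assumed counterpart of OPEN

    /**
     * Submits OPEN, then every statement, then COMMIT through the same submitter.
     * If any submission fails, submits ROLLBACK and rethrows the original failure.
     */
    static void run(Consumer<String> sessionSubmit, List<String> statements) {
        sessionSubmit.accept(OPEN);
        try {
            for (String statement : statements) {
                sessionSubmit.accept(statement);
            }
            sessionSubmit.accept(COMMIT);
        } catch (RuntimeException ex) {
            try {
                sessionSubmit.accept(ROLLBACK);
            } catch (RuntimeException rollbackFailure) {
                // Keep the original cause visible even if the rollback also fails
                ex.addSuppressed(rollbackFailure);
            }
            throw ex;
        }
    }
}
```

All scripts must go through the same session for this to work, which is why the helper takes a single submitter rather than creating clients itself; as with the Bytecode example, the caller remains responsible for closing the client.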

Conclusion

This article documents how transaction operations in GDB differ from those in the standard TinkerPop framework, and gathers the useful GDB help documents and related information. If the opportunity arises, we can go on to discuss common usage and optimization of graph databases for graph retrieval.

Team Introduction

We are the New Products Platform Technical Team of the Taobao Technology Department. Building on Taobao's big data, we have established a complete new product development and innovation incubation platform covering consumer insights, macro and segmented market analysis, competitive analysis, market strategy research, and product innovation mechanisms. The platform gives brands, merchants, and industries scalable capabilities for incubating and operating new products, consolidates new product incubation mechanisms and operating strategies, and forms a complete, big-data-driven new product operation platform spanning market research, new product R&D, and new product marketing.

Author|Mao Diao (Jing Yuchen)
Editor|Chengzi Jun