dracule_redrose Posts - Answer Overflow

dracule_redrose

Created by dracule_redrose on 3/22/2024 in #questions

Serialization Issue

I have a weird error. When I connect with the JanusGraph Gremlin client using conf/remote-graph-binary.yaml, I get results. But when I use my Java application, I get java.io.IOException: Serializer for custom type 'janusgraph.RelationIdentifier' not found. From searching online, this appears to be a serialization issue. As far as I can tell, the Gremlin client and my Java application have similar configurations, yet the Gremlin client has no serialization problem.

Contents of conf/remote-graph-binary.yaml:

hosts: [localhost]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}

Code setting up the serialization:

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.structure.io.binary.TypeSerializerRegistry;
import org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1;
import org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry;
...
        // Register JanusGraph's custom types (such as RelationIdentifier) with GraphBinary
        TypeSerializerRegistry typeSerializerRegistry = TypeSerializerRegistry.build()
                .addRegistry(JanusGraphIoRegistry.getInstance())
                .create();

        // Build cluster and connect client; the serializer carries the JanusGraph registry
        Cluster cluster = Cluster.build(host)
                .port(port)
                .serializer(new GraphBinaryMessageSerializerV1(typeSerializerRegistry))
                .maxConnectionPoolSize(1)
                .minConnectionPoolSize(1)
                .maxInProcessPerConnection(1)
                .minSimultaneousUsagePerConnection(1)
                .maxSimultaneousUsagePerConnection(1)
                .create();
        Client client = cluster.connect();
...
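For context: in GraphBinary, a custom type is written together with its type name (here janusgraph.RelationIdentifier), and the reading side looks up a serializer registered under that exact name. The IOException above appears to be what that lookup reports when nothing is registered for the name. The following is a toy model of that name-keyed lookup in plain Java. ToyRegistry, tryRead and the stub deserializer are hypothetical names invented for illustration; they are not TinkerPop's TypeSerializerRegistry API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy model of a name-keyed custom-type lookup; NOT TinkerPop's API.
public class CustomTypeLookupDemo {

    // Maps a custom type name to a function that decodes its payload.
    static final class ToyRegistry {
        private final Map<String, Function<byte[], Object>> byName = new HashMap<>();

        ToyRegistry register(String typeName, Function<byte[], Object> deserializer) {
            byName.put(typeName, deserializer);
            return this;
        }

        // The reader only knows the type name that was written on the wire.
        Object read(String typeName, byte[] payload) throws IOException {
            Function<byte[], Object> deserializer = byName.get(typeName);
            if (deserializer == null) {
                throw new IOException("Serializer for custom type '" + typeName + "' not found");
            }
            return deserializer.apply(payload);
        }
    }

    // Returns "ok: <value>" or "error: <message>" so both outcomes are easy to compare.
    static String tryRead(ToyRegistry registry, String typeName) {
        try {
            return "ok: " + registry.read(typeName, new byte[0]);
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String type = "janusgraph.RelationIdentifier";
        ToyRegistry withoutJanusGraph = new ToyRegistry();
        ToyRegistry withJanusGraph = new ToyRegistry()
                .register(type, payload -> "RelationIdentifier(stub)");
        // prints "error: Serializer for custom type 'janusgraph.RelationIdentifier' not found"
        System.out.println(tryRead(withoutJanusGraph, type));
        // prints "ok: RelationIdentifier(stub)"
        System.out.println(tryRead(withJanusGraph, type));
    }
}
```

If the real driver behaves like this model, the error suggests that the serializer doing the reading in the Java application has no entry for that type name, even though the code registers JanusGraphIoRegistry. One thing that may be worth checking (a guess, not a confirmed diagnosis) is whether the JanusGraph and TinkerPop library versions on the application's classpath match the ones the server and the Gremlin client use.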

Apache TinkerPop

Created by dracule_redrose on 3/19/2024 in #questions

Design decision related to multiple heterogeneous relational graphs

I'm working with over 100k instances of heterogeneous, relational graphs with node and edge attributes; each graph has around 5k vertices and 10k edges. Vertices are of 3 types with 10 attributes (7 numerical, 3 string), and edges are of 5 types with 8 attributes (4 numerical, 4 string). Given the size and complexity of the data, running queries such as traversal paths, average clustering coefficients, and identifying nodes that participate in clustering triangles across all of these instances is a significant challenge.

So far I've been using a naive Gremlin Server setup with an in-memory database to run my queries on one graph instance, but this approach clearly isn't sustainable for persisting multiple graphs or for memory efficiency: a single graph instance consumes about 1.2 GB of RAM, so holding all 100k instances in memory would take on the order of 120 TB.

Based on the feedback I got from the Gremlin Google group (https://groups.google.com/g/gremlin-users/c/UotOZFVvi3k/m/-hVd2oNNAQAJ), I'm exploring a switch to JanusGraph with a Berkeley DB backend to support persistent storage of multiple graphs. Given this data structure and these requirements, especially the need to load and query individual graph instances efficiently, possibly in a serializable fashion, do you think JanusGraph with Berkeley DB is a viable solution? Or are there alternative approaches I should consider for managing and querying this volume of graph data effectively?

I looked for similar questions; the closest match I found was https://discord.com/channels/838910279550238720/1087383361129037845, but it asks how to manage multiple graphs in Gremlin Server.
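The clustering-coefficient and triangle queries are defined independently of the storage backend, so a plain in-memory reference implementation can be handy for cross-checking Gremlin results on a single instance. This is a minimal Java sketch, assuming each graph is projected to an undirected simple graph (edge types, directions and attributes ignored) with vertices renumbered 0..n-1; the class and method names are hypothetical. It uses the usual local clustering coefficient C(v) = 2T(v) / (k(v)(k(v) - 1)), where T(v) is the number of triangles through v and k(v) is its degree, sets C(v) = 0 when k(v) < 2 (one common convention), and averages C(v) over all vertices.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reference sketch; assumes an undirected simple graph with vertex ids 0..n-1.
public final class ClusteringSketch {
    private final List<Set<Integer>> adj;

    public ClusteringSketch(int n, int[][] edges) {
        adj = new ArrayList<>(n);
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
        for (int[] e : edges) {
            if (e[0] == e[1]) continue;   // ignore self-loops
            adj.get(e[0]).add(e[1]);      // store both directions (undirected);
            adj.get(e[1]).add(e[0]);      // the sets also drop parallel edges
        }
    }

    /** Number of triangles containing v: pairs of neighbours that are themselves adjacent. */
    public int triangles(int v) {
        List<Integer> nbrs = new ArrayList<>(adj.get(v));
        int count = 0;
        for (int i = 0; i < nbrs.size(); i++)
            for (int j = i + 1; j < nbrs.size(); j++)
                if (adj.get(nbrs.get(i)).contains(nbrs.get(j))) count++;
        return count;
    }

    /** Local clustering coefficient; 0 for vertices with degree < 2. */
    public double localClustering(int v) {
        int k = adj.get(v).size();
        if (k < 2) return 0.0;
        return 2.0 * triangles(v) / (k * (double) (k - 1));
    }

    /** Mean of the local coefficients over all vertices. */
    public double averageClustering() {
        if (adj.isEmpty()) return 0.0;
        double sum = 0.0;
        for (int v = 0; v < adj.size(); v++) sum += localClustering(v);
        return sum / adj.size();
    }

    /** Vertices that belong to at least one triangle, in ascending order. */
    public Set<Integer> verticesInTriangles() {
        Set<Integer> out = new TreeSet<>();
        for (int v = 0; v < adj.size(); v++) if (triangles(v) > 0) out.add(v);
        return out;
    }

    public static void main(String[] args) {
        // Triangle 0-1-2 plus a pendant edge 2-3.
        ClusteringSketch g = new ClusteringSketch(4, new int[][] {{0, 1}, {1, 2}, {2, 0}, {2, 3}});
        System.out.println(g.localClustering(2));
        System.out.println(g.averageClustering());
        System.out.println(g.verticesInTriangles()); // prints "[0, 1, 2]"
    }
}
```

With about 10k edges on 5k vertices, the average degree is 2 × 10,000 / 5,000 = 4, so the per-vertex neighbour-pair checks stay cheap; whether that holds for every instance depends on the actual degree distribution.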
