rpuga Comments - Answer Overflow

rpuga

•Created by rpuga on 12/18/2024 in #questions

Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

For the rest of the solution to this post, see my related but separate post titled: "Graph does not support the provided graph computer: SparkGraphComputer"

8 replies

JanusGraph

•Created by rpuga on 1/6/2025 in #questions

Graph does not support the provided graph computer: SparkGraphComputer

@gdotv, I wanted to mention that the above solution also answers the related but separate question I asked previously on running OLAP queries from G.V(). It may be useful to other G.V() users as well. 😉

7 replies

JanusGraph

•Created by rpuga on 1/6/2025 in #questions

Graph does not support the provided graph computer: SparkGraphComputer

OK, finally I was able to figure out the issue on my own. To make things work, since JanusGraphFactory fails to create a graph, I replaced

graphManager: org.janusgraph.graphdb.management.JanusGraphManager

with

graphManager: org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager

The rest of the relevant parts of the .yaml file look like this:

host: 0.0.0.0
port: 8182
evaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
# graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphManager: org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager
graphs: {
  graph: conf/janusgraph-cql-es-server.properties,
  olapgraph: conf/hadoop-graph/spark-cql-es.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
         .......      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/spark-janusgraph.groovy]}}}}

and, importantly, scripts/spark-janusgraph.groovy looks like this:

def globals = [:]

globals << [hook : [
        onStartUp: { ctx ->
            ctx.logger.info("Executed once at startup of Gremlin Server.")
        },
        onShutDown: { ctx ->
            ctx.logger.info("Executed once at shutdown of Gremlin Server.")
        }
] as LifeCycleHook]

globals << [g : graph.traversal(), og : olapgraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)]
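
A simplified, stdlib-only model of why swapping the graphManager helps. As far as I understand, TinkerPop's DefaultGraphManager opens each entry under `graphs` with GraphFactory.open, which picks the implementation class from the file's `gremlin.graph` property, whereas the stack trace in the related update shows the JanusGraph server going through JanusGraphFactory.open, which insists on a root-level `storage.backend`. This is a toy model, not TinkerPop or JanusGraph code: `GraphManagerModel` and its methods are hypothetical, and the properties map is an assumed, abridged stand-in for conf/hadoop-graph/spark-cql-es.properties.

```java
import java.util.Map;

// Toy model only: mimics how two graph managers could treat the same properties file.
// Nothing here calls TinkerPop or JanusGraph; all names are hypothetical.
class GraphManagerModel {

    // GraphFactory-style path: pick the implementation from gremlin.graph.
    static String openWithGraphFactory(Map<String, String> props) {
        String impl = props.get("gremlin.graph");
        if (impl == null) {
            throw new IllegalArgumentException("model: no gremlin.graph setting");
        }
        // Keep only the simple class name, e.g. "HadoopGraph".
        return "opened " + impl.substring(impl.lastIndexOf('.') + 1);
    }

    // JanusGraphFactory-style path: a root-level storage.backend is mandatory.
    static String openWithJanusGraphFactory(Map<String, String> props) {
        if (!props.containsKey("storage.backend")) {
            throw new IllegalArgumentException("Need to set configuration value: root.storage.backend");
        }
        return "opened JanusGraph on " + props.get("storage.backend");
    }

    public static void main(String[] args) {
        // Abridged, assumed stand-in for conf/hadoop-graph/spark-cql-es.properties.
        Map<String, String> olap = Map.of(
            "gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph",
            "janusgraphmr.ioformat.conf.storage.backend", "cql");
        System.out.println(openWithGraphFactory(olap)); // opened HadoopGraph
        try {
            openWithJanusGraphFactory(olap);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Need to set configuration value: root.storage.backend
        }
    }
}
```

Under this model, the OLAP properties file only opens through the GraphFactory path, which matches what the Gremlin Console showed.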

7 replies

JanusGraph

•Created by rpuga on 1/6/2025 in #questions

Graph does not support the provided graph computer: SparkGraphComputer

I wanted to provide an update on this issue. From the JanusGraph server logs (attached), it's clear that the JanusGraph server uses JanusGraphFactory to open the graph:

org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:115) ~[janusgraph-core-1.1.0.jar:?]

The same error can be reproduced in the Gremlin Console:

plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin>
gremlin> graph = JanusGraphFactory.open("conf/hadoop-graph/spark-cql-es.properties")
23:18:43 INFO  org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
Need to set configuration value: root.storage.backend
Type ':help' or ':h' for help.
Display stack trace? [yN]n

whereas using GraphFactory instead of JanusGraphFactory works fine:

gremlin> graph = GraphFactory.open("conf/hadoop-graph/spark-cql-es.properties")
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin>

Why is janusgraphmr.ioformat.conf.storage.backend=cql read/accepted by GraphFactory but not by JanusGraphFactory?
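
One possible reading, sketched as a stdlib-only model: GraphFactory never looks at `storage.backend` itself; it only needs `gremlin.graph` (HadoopGraph here), and, as I understand it, the JanusGraph Hadoop input format later receives the `janusgraphmr.ioformat.conf.*` keys with that prefix stripped. JanusGraphFactory instead reads the file as a plain JanusGraph config, where the prefixed key is just an unrelated name and the root `storage.backend` stays unset. `IoFormatConfModel` and `stripPrefix` are hypothetical, and the `storage.hostname` value is an assumed example.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model (not JanusGraph code): turning keys under a namespace prefix
// into the plain settings that a Hadoop input format would see.
class IoFormatConfModel {
    static final String PREFIX = "janusgraphmr.ioformat.conf.";

    // Keep only the prefixed entries, with the prefix removed (sorted for stable printing).
    static Map<String, String> stripPrefix(Map<String, String> props, String prefix) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
            "gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph",
            PREFIX + "storage.backend", "cql",
            PREFIX + "storage.hostname", "cassandra1");
        // Root-level lookup, as a JanusGraph config reader would do: nothing there.
        System.out.println(props.get("storage.backend")); // null
        // Prefix-stripped view, as the input format would see it.
        System.out.println(stripPrefix(props, PREFIX)); // {storage.backend=cql, storage.hostname=cassandra1}
    }
}
```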

7 replies

JanusGraph

•Created by rpuga on 12/18/2024 in #questions

Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

BTW: I'm using JanusGraph 1.1.0. Through lots of trial and error, I found that these Maven dependencies work for me:

spark-gremlin 3.7.3
cassandra-hadoop-util 1.1.0
janusgraph-core 1.1.0
janusgraph-cql 1.1.0
janusgraph-es 1.1.0
janusgraph-hadoop 1.1.0

slf4j-api 2.0.13
slf4j-simple 2.0.13
log4j-slf4j-impl 2.23.1
log4j-core 2.23.1
jackson-databind 2.17.2
jackson-annotations 2.17.2
jackson-core 2.17.2
netty-transport-native-epoll 4.1.109.Final

8 replies

JanusGraph

•Created by rpuga on 12/18/2024 in #questions

Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

Thanks, @gdotv. It would be great if the JanusGraph folks could follow up on how to expose a GraphTraversalSource. In the meantime, I've made progress on the Java question by following this old-ish post by @Bo: https://li-boxuan.medium.com/spark-on-janusgraph-tinkerpop-a-pagerank-example-43950189b159 I was able to include all missing dependencies and compile/run this example:

package org.example.gdb;

import org.apache.commons.configuration2.Configuration;
...
import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_OUTPUT_LOCATION;

public class JGGremlinSpark {
    public static void main(String[] args) throws Exception {
        // getSparkGraphConfig() is a helper (not shown) that builds the HadoopGraph/Spark configuration
        Configuration sparkGraphConfiguration = getSparkGraphConfig();
        sparkGraphConfiguration.setProperty(Constants.GREMLIN_SPARK_GRAPH_STORAGE_LEVEL, "MEMORY_AND_DISK");
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_GRAPH_WRITER, GraphSONOutputFormat.class.getCanonicalName());
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_OUTPUT_LOCATION, "/home/hadoop/jgspark_test/hadoop_output");
        sparkGraphConfiguration.setProperty(SparkLauncher.EXECUTOR_MEMORY, "1g");
        // try-with-resources closes the graph (Graph is AutoCloseable) once the job finishes
        try (Graph graph = GraphFactory.open(sparkGraphConfiguration)) {
            long startTime = System.currentTimeMillis();
            // OLAP traversal source: each traversal below runs as a SparkGraphComputer job
            GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
            final Long vCount = g.V().count().next();
            final Long eCount = g.E().count().next();
            System.out.println("V count = " + vCount);
            System.out.println("E count = " + eCount);
            long duration = (System.currentTimeMillis() - startTime) / 1000;
            System.out.println("Finished JGGremlinSpark test - elapsed time = " + duration + " seconds.");
        }
    }
}

8 replies

JanusGraph

•Created by HailDevil on 10/29/2024 in #questions

Edge cardinality SIMPLE documentation not clear

I remember running an example test for the same question. If I remember correctly, direction does count, and you can add one edge per direction (A->B and B->A), since by default you are working with a directed graph. This should also be easy to test on a small toy graph.
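
To make that recollection concrete, here is a toy Java model in which SIMPLE is enforced per ordered (out, in) vertex pair for a given edge label. It is not JanusGraph code, and the ordered-pair rule is exactly the assumption a quick test on a real graph should confirm; `SimpleEdgeModel` and `addEdge` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of SIMPLE edge multiplicity, ASSUMING direction matters:
// at most one edge per (label, outVertex, inVertex) triple.
class SimpleEdgeModel {
    // Ordered key, so A->B and B->A are different entries.
    record EdgeKey(String label, String out, String in) {}

    private final Set<EdgeKey> edges = new HashSet<>();

    // Returns true if the edge was added, false if it would violate SIMPLE.
    boolean addEdge(String label, String out, String in) {
        return edges.add(new EdgeKey(label, out, in));
    }

    public static void main(String[] args) {
        SimpleEdgeModel g = new SimpleEdgeModel();
        System.out.println(g.addEdge("knows", "A", "B")); // true
        System.out.println(g.addEdge("knows", "B", "A")); // true: opposite direction
        System.out.println(g.addEdge("knows", "A", "B")); // false: duplicate A->B
    }
}
```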

4 replies

JanusGraph

•Created by rpuga on 5/27/2024 in #questions

Unable to load GraphSON file

Ah, got it. Given the similarity between the exported JSON and GraphSON v1.0, I thought they were the same. Thanks for the clarification, @gdotv. Interestingly, the example GraphSON v1.0 from the TinkerPop documentation does not work either, though I did not think it was deprecated (maybe it's mentioned somewhere in the docs and I missed it...).

8 replies

JanusGraph

•Created by rpuga on 5/26/2024 in #questions

Gremlin statement exceeds the maximum compilation size

I see... unfortunately, it seems that this is a hard constraint. Thanks.

5 replies

JanusGraph

•Created by rpuga on 5/27/2024 in #questions

Unable to load GraphSON file

Basically, I used the G.V() IDE (@gdotv) to create a single node in a graph and then exported it to JSON format. Now I'm trying to import that single-node graph into JanusGraph. To my understanding, the JSON format exported by G.V() follows GraphSON version 1.0 as documented here: https://tinkerpop.apache.org/docs/3.7.2/dev/io/#graphson so I thought importing it should also work. Here is a similar example taken from the TinkerPop documentation (I copied only the first node from the TinkerGraph GraphSON example and reformatted it so that the brackets close properly):

{"vertices":[{"id":1,"label":"person","type":"vertex","properties":{"name":[{"id":0,"value":"marko"}]}}]}

I get the same "Label can not be null" error when trying to load this in a G.V() playground as well, so this does not seem to be an issue unique to JanusGraph. Maybe GraphSON version 1.0 is no longer supported? Or maybe there is a way to specify which GraphSON version to use when loading the graph?

8 replies

JanusGraph

•Created by rpuga on 4/21/2024 in #questions

~20% write performance hit when using custom str IDs?

Hi @Boxuan Li, here is an example: Int ID: 3232243223; Str ID: "3232243223". Everything except the ID data type is exactly the same. Each inserted vertex has two properties (besides a label and the custom ID): a Long and a String. I expected some write performance degradation from using custom string IDs, but I was surprised to see a ~20% performance impact, which seems quite significant. That's why I was wondering whether this is a known issue.

13 replies

Apache TinkerPop

•Created by rpuga on 3/23/2024 in #questions

mergeE(): increment counter on match

Thanks @Kelvin Lawrence, your solution works great! Adapting it to the example graph I was using, it looks like this:

g.mergeE([(T.label):'called', (from):p1, (to):p2]).
  option(Merge.onCreate, ['num_calls': 1]).
  option(Merge.onMatch, property('num_calls', union(values('num_calls'), constant(1)).sum()).constant([:]))
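
The onCreate/onMatch pair amounts to an upsert with increment. As an in-memory analogy only (plain Java, no Gremlin or graph API), `Map.merge` expresses the same rule: store 1 when the (label, from, to) key is new, otherwise add 1 to the stored count. `CallCounter` and its key record are hypothetical; the vertex names are taken from the example graph.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory analogy of the mergeE() upsert above; not a graph API.
class CallCounter {
    // Stands in for the edge match criteria: (T.label, from, to).
    record EdgeKey(String label, String from, String to) {}

    private final Map<EdgeKey, Integer> numCalls = new HashMap<>();

    // onCreate -> store 1; onMatch -> existing value + 1. Returns the new count.
    int call(String from, String to) {
        return numCalls.merge(new EdgeKey("called", from, to), 1, Integer::sum);
    }

    public static void main(String[] args) {
        CallCounter c = new CallCounter();
        System.out.println(c.call("marko", "maria")); // 1 (created)
        System.out.println(c.call("marko", "maria")); // 2 (matched, incremented)
        System.out.println(c.call("maria", "marko")); // 1 (other direction, new edge)
    }
}
```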

16 replies

Apache TinkerPop

•Created by dracule_redrose on 3/22/2024 in #questions

Serialization Issue

I have faced a similar issue in the past (though mostly with gremlin-python), and @Boxuan Li suggested a solution in the JanusGraph Discord server. It was something along these lines:

private static MessageSerializer createGraphBinaryMessageSerializerV1() {
  final GraphBinaryMessageSerializerV1 serializer = new GraphBinaryMessageSerializerV1();
  final Map<String, Object> config = new HashMap<>();
  config.put(GraphBinaryMessageSerializerV1.TOKEN_IO_REGISTRIES, Collections.singletonList(JanusGraphIoRegistry.class.getName()));
  serializer.configure(config, Collections.emptyMap());
  return serializer;
}

and

Cluster cluster = Cluster.build()
    .addContactPoint(gremlinServer)
    .port(gremlinServerPort)
    // .serializer(new GraphBinaryMessageSerializerV1(typeSerializerRegistry))
    .serializer(createGraphBinaryMessageSerializerV1())
    .create();

Also, here is how he suggested setting up the serializers in the JanusGraph server config file: https://github.com/Citegraph/citegraph/blob/main/backend/src/main/resources/gremlin-server-cql.yaml I hope this leads you closer to a solution.

5 replies

Apache TinkerPop

•Created by rpuga on 3/23/2024 in #questions

mergeE(): increment counter on match

BTW, I was able to obtain the result I need with a query that looks like this:

p1 = g.addV('person').property('name', 'marko').next()
p2 = g.addV('person').property('name', 'maria').next()
g.V(p1).as("v1").
V(p2).as("v2").
coalesce(
    select("v1").outE("called").where(inV().has(id, select("v2").id())),
    addE("called").from("v1").to("v2").property("num_calls", 0)).
    as("e").
property(
    "num_calls",
    union(select("e").values("num_calls").unfold(), constant(1)).sum())

but I'm wondering how something similar can be done with mergeE()

16 replies

JanusGraph

•Created by rpuga on 1/21/2024 in #questions

Changing default ES index name prefix

I was able to submit a pull request with the documentation fix (#4225)

10 replies

JanusGraph

•Created by rpuga on 1/21/2024 in #questions

Changing default ES index name prefix

Sure, I'd be happy to contribute. I have not done this before, so I'll first need to read the related info and guidelines for contributing to the documentation (I found the janusgraph/CONTRIBUTING.md guide on GitHub).

10 replies

JanusGraph

•Created by rpuga on 1/21/2024 in #questions

Changing default ES index name prefix

It worked, thank you so much! Does this mean that this page in the JanusGraph docs is incorrect? https://docs.janusgraph.org/index-backend/elasticsearch/ It explicitly mentions index.[X].elasticsearch.index-name as the option name, whereas the following page correctly indicates index.[X].index-name as the option to use. https://docs.janusgraph.org/configs/configuration-reference/
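
A small stdlib-only sketch of how the option resolves, under stated assumptions: `[X]` is the index backend name chosen in `index.[X].backend` (here `search`, matching `index.search.backend`), the prefix falls back to `janusgraph` when `index.[X].index-name` is absent, and each mixed index becomes `<prefix>_<lowercased index name>`, which is consistent with the `janusgraph_*` indices listed in the original question. `EsIndexNames` and its methods are hypothetical, and `nameVMixedIndex` is a guessed index name.

```java
import java.util.Locale;
import java.util.Map;

// Sketch, not JanusGraph code: which key sets the ES index-name prefix,
// assuming the prefix defaults to "janusgraph" when the key is absent.
class EsIndexNames {
    // X is the backend name used in index.[X].backend, e.g. "search".
    static String prefixKey(String x) {
        return "index." + x + ".index-name";
    }

    // ES index name for a mixed index: <prefix>_<lowercased index name>.
    static String resolve(Map<String, String> props, String x, String mixedIndex) {
        String prefix = props.getOrDefault(prefixKey(x), "janusgraph");
        return prefix + "_" + mixedIndex.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The key from the original question is ignored under this model...
        Map<String, String> wrong = Map.of("index.search.elasticsearch.index-name", "jgidxtest");
        System.out.println(resolve(wrong, "search", "nameVMixedIndex")); // janusgraph_namevmixedindex
        // ...while the key from the configuration reference takes effect.
        Map<String, String> right = Map.of("index.search.index-name", "jgidxtest");
        System.out.println(resolve(right, "search", "nameVMixedIndex")); // jgidxtest_namevmixedindex
    }
}
```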

10 replies

JanusGraph

•Created by rpuga on 1/21/2024 in #questions

Changing default ES index name prefix

@Oleksandr Porunov, thank you for your answer. I tried starting from completely new instances of JanusGraph, Cassandra, and Elasticsearch (single-node clusters for the server as well as for the storage and index backends). Although I set index-name to a different value, JanusGraph still used janusgraph as the prefix for the ES indices. I think I'm doing something wrong in the configuration file. Here is what I currently have:

gremlin.graph = org.janusgraph.core.JanusGraphFactory

storage.backend = cql
storage.hostname = cassandra1
storage.cql.keyspace = jgidxtest
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25

index.search.backend = elasticsearch
index.search.hostname = elasticsearch1
index.search.elasticsearch.index-name = jgidxtest

Despite changing the name, ES created these indices:

$ curl -XGET http://localhost:9200/_cat/indices/
yellow open janusgraph_lastseenemixedindex  VOTTeC_5T9-UqcpUIKCKww 1 1 0 0 227b 227b 227b
yellow open janusgraph_namevmixedindex      sfqlgiy4QY66T_vnUI1gsw 1 1 0 0 227b 227b 227b
yellow open janusgraph_firstseenemixedindex zkD21PZjRRSvuPtC5gy74A 1 1 0 0 227b 227b 227b

whereas I was expecting indices that start with jgidxtest_. I suspect I'm doing something wrong when setting index.{yourIndex}.index-name. Specifically, I'm not entirely sure what {yourIndex} should be. The JanusGraph docs mention index.[X].elasticsearch.index-name, but again, I'm not sure what [X] should be in my case. I've seen some examples online where [X] is simply replaced with search, which is what I did, but that does not seem to work. Any thoughts on what the correct configuration line should be?

10 replies