Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

Hi, I'm able to run OLAP queries against my graph DB from the Gremlin Console by following the directions provided here: https://docs.janusgraph.org/advanced-topics/hadoop/. However, I would also like to run OLAP queries without the console, both from an embedded JanusGraph Java application and from G.V(). In G.V(), I tried this with Groovy Mode selected for query submission:
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().groupCount().by(label()).toList()
but I get the following error:
No such property: SparkGraphComputer for class: Script10
I'm assuming this is because the needed plugins are not loaded. I tried:
:plugin use tinkerpop.hadoop
:plugin use tinkerpop.spark
but these do not work outside of the console. Any suggestions, @gdotv? Also, is there any example code I could use to run OLAP queries from a Java application? Thanks!
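One thing I still plan to try (this is an assumption on my part, not something I've verified in G.V()): referencing the class by its fully qualified name, which should sidestep the console plugin imports as long as the spark-gremlin jars are on the classpath of whatever script engine evaluates the query:

```groovy
// Same query, but with SparkGraphComputer fully qualified instead of
// relying on the tinkerpop.spark plugin to import it.
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
g.V().groupCount().by(label()).toList()
```

Whether this works from G.V() presumably depends on where its Groovy mode actually evaluates the script.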
2 Replies
gdotv · 4d ago
Hey, good question! G.V() isn't designed to run against embedded graphs, as it always connects to a graph database via the Gremlin driver. In your case, I believe you would need to expose a graph traversal source to your in-memory graph on your server. G.V() is a console replacement only up to a point: it runs scripts and queries against the server, but does not include the more advanced console plugins such as Hadoop. Hopefully this provides more context. Maybe folks on the JanusGraph team can provide more information on how to expose a graph traversal source from a Spark-computed job.
rpuga (OP) · 3d ago
Thanks, @gdotv. It would be great if the JanusGraph folks could follow up on how to expose a GraphTraversalSource. In the meantime, I've made progress on the Java question by following this old-ish post by @Bo: https://li-boxuan.medium.com/spark-on-janusgraph-tinkerpop-a-pagerank-example-43950189b159 I've been able to include all the missing dependencies and compile/run this example:
package org.example.gdb;

import org.apache.commons.configuration2.Configuration;
...
import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_OUTPUT_LOCATION;

public class JGGremlinSpark {
    public static void main(String[] args) throws Exception {
        Configuration sparkGraphConfiguration = getSparkGraphConfig();
        sparkGraphConfiguration.setProperty(Constants.GREMLIN_SPARK_GRAPH_STORAGE_LEVEL, "MEMORY_AND_DISK");
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_GRAPH_WRITER, GraphSONOutputFormat.class.getCanonicalName());
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_OUTPUT_LOCATION, "/home/hadoop/jgspark_test/hadoop_output");
        sparkGraphConfiguration.setProperty(SparkLauncher.EXECUTOR_MEMORY, "1g");
        Graph graph = GraphFactory.open(sparkGraphConfiguration);

        long startTime = System.currentTimeMillis();
        GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
        final Long vCount = g.V().count().next();
        final Long eCount = g.E().count().next();
        System.out.println("V count = " + vCount);
        System.out.println("E count = " + eCount);
        long duration = (System.currentTimeMillis() - startTime) / 1000;
        System.out.println("Finished JGGremlinSpark test - elapsed time = " + duration + " seconds.");
    }
}
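The getSparkGraphConfig() helper isn't shown above; it essentially assembles the same keys as the spark-cql.properties file described in the JanusGraph Hadoop docs. A minimal sketch of that configuration follows, assuming a CQL backend; the hostname, keyspace, and spark.master values are placeholders for my local setup and will differ per environment:

```properties
# Tell TinkerPop this is a Hadoop graph backed by JanusGraph's CQL input format.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

# Point the input format at the storage backend (placeholders).
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

# Spark settings (local master as a placeholder).
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```

Properties like the storage level, graph writer, and output location that my code sets programmatically could equally live in this file.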
BTW: I'm using JanusGraph 1.1.0. Through lots of trial and error, I found these Maven dependencies to work for me:
spark-gremlin 3.7.3
cassandra-hadoop-util 1.1.0
janusgraph-core 1.1.0
janusgraph-cql 1.1.0
janusgraph-es 1.1.0
janusgraph-hadoop 1.1.0

slf4j-api 2.0.13
slf4j-simple 2.0.13
log4j-slf4j-impl 2.23.1
log4j-core 2.23.1
jackson-databind 2.17.2
jackson-annotations 2.17.2
jackson-core 2.17.2
netty-transport-native-epoll 4.1.109.Final
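In pom.xml form, those coordinates look roughly like this. This is a sketch only: the groupIds are what I believe these artifacts use on Maven Central, so double-check them and the versions against your JanusGraph distribution:

```xml
<!-- Sketch: groupIds assumed from Maven Central naming; verify before use. -->
<dependency>
  <groupId>org.apache.tinkerpop</groupId>
  <artifactId>spark-gremlin</artifactId>
  <version>3.7.3</version>
</dependency>
<dependency>
  <groupId>org.janusgraph</groupId>
  <artifactId>cassandra-hadoop-util</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>org.janusgraph</groupId>
  <artifactId>janusgraph-cql</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>org.janusgraph</groupId>
  <artifactId>janusgraph-hadoop</artifactId>
  <version>1.1.0</version>
</dependency>
<!-- janusgraph-core, janusgraph-es, and the SLF4J/Log4j/Jackson/Netty
     artifacts listed above follow the same pattern. -->
```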