Running OLAP queries on Janusgraph outside the Gremlin Console (from Java and G.V())

Hi, I'm able to run OLAP queries against my graph DB from the Gremlin Console, by following the directions provided here: https://docs.janusgraph.org/advanced-topics/hadoop/ However, I would like to also run OLAP queries without using the console, from an embedded Janusgraph Java application as well as from G.V(). In G.V(), I tried this while selecting Groovy Mode for query submission:
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().groupCount().by(label()).toList()
but I get the following error:
No such property: SparkGraphComputer for class: Script10
I'm assuming this is because the needed plugins are not loaded. I tried:
:plugin use tinkerpop.hadoop
:plugin use tinkerpop.spark
but these commands don't work outside of the console. Any suggestions, @gdotv? Also, is there any example code I could use to run OLAP queries from a Java application? Thanks!
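For context, the `spark-cql.properties` file referenced above follows the Hadoop-graph template from the JanusGraph docs; a rough sketch (hostnames, keyspace, and Spark master are deployment-specific and shown here only as placeholders):

```properties
# OLAP entry point: HadoopGraph reads JanusGraph data via an input format
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

# Connection settings used by the CQL input format
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph

# Spark settings
spark.master=local[*]
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```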
gdotv · 2mo ago
Hey, good question! G.V() isn't designed to run against embedded graphs, as it connects to a graph database via the Gremlin driver. In your case, I believe you would need to expose a graph traversal source for your in-memory graph on your server. G.V() is a console replacement only up to a point: it runs scripts and queries against the server, but it does not include the more advanced console plugins such as Hadoop. Hopefully this provides more context; maybe folks on the JanusGraph team can provide more information on how to expose a graph traversal source from a Spark-computed job.
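For what it's worth, the usual way to expose a traversal source from Gremlin Server is through an init script listed in the `scripts` section of `gremlin-server.yaml`; a hedged sketch (the file name and the `g_olap` binding are illustrative, not prescribed):

```groovy
// scripts/olap-init.groovy -- referenced from gremlin-server.yaml.
// 'graph' is the Graph instance bound in the 'graphs' section of the server config.
// Adding the OLAP traversal source to 'globals' makes it addressable from remote
// clients (such as G.V()) under the name 'g_olap'.
globals << [g_olap : graph.traversal().withComputer(SparkGraphComputer)]
```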
Solution
Solution — rpuga · 2mo ago
Thanks, @gdotv. It would be great if the JanusGraph folks can follow up on how to expose a GraphTraversalSource. In the meantime, I've been able to make progress on the question about using Java by following this old-ish post by @Bo: https://li-boxuan.medium.com/spark-on-janusgraph-tinkerpop-a-pagerank-example-43950189b159 I've been able to include all missing dependencies and compile/run this example:
package org.example.gdb;

import org.apache.commons.configuration2.Configuration;
...
import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_OUTPUT_LOCATION;

public class JGGremlinSpark {
    public static void main(String[] args) throws Exception {
        Configuration sparkGraphConfiguration = getSparkGraphConfig();
        sparkGraphConfiguration.setProperty(Constants.GREMLIN_SPARK_GRAPH_STORAGE_LEVEL, "MEMORY_AND_DISK");
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_GRAPH_WRITER, GraphSONOutputFormat.class.getCanonicalName());
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_OUTPUT_LOCATION, "/home/hadoop/jgspark_test/hadoop_output");
        sparkGraphConfiguration.setProperty(SparkLauncher.EXECUTOR_MEMORY, "1g");
        Graph graph = GraphFactory.open(sparkGraphConfiguration);

        long startTime = System.currentTimeMillis();
        GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
        final Long vCount = g.V().count().next();
        final Long eCount = g.E().count().next();
        System.out.println("V count = " + vCount);
        System.out.println("E count = " + eCount);
        long duration = (System.currentTimeMillis() - startTime) / 1000;
        System.out.println("Finished JGGremlinSpark test - elapsed time = " + duration + " seconds.");
    }
}
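With an OLAP traversal source like the `g` above, the label histogram from the original question translates directly to Java. A sketch continuing the example (it assumes the same `g` plus `import org.apache.tinkerpop.gremlin.structure.T;` and `import java.util.Map;`, and it needs a live Spark/JanusGraph setup to actually run):

```java
// Equivalent of the Groovy g.V().groupCount().by(label()) -- executes as a Spark job
Map<Object, Long> labelCounts = g.V().groupCount().by(T.label).next();
labelCounts.forEach((label, count) -> System.out.println(label + " = " + count));
```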
rpuga (OP) · 2w ago
BTW: I'm using JanusGraph 1.1.0. Through lots of trial and error, I found these maven dependencies to work for me:
spark-gremlin 3.7.3
cassandra-hadoop-util 1.1.0
janusgraph-core 1.1.0
janusgraph-cql 1.1.0
janusgraph-es 1.1.0
janusgraph-hadoop 1.1.0

slf4j-api 2.0.13
slf4j-simple 2.0.13
log4j-slf4j-impl 2.23.1
log4j-core 2.23.1
jackson-databind 2.17.2
jackson-annotations 2.17.2
jackson-core 2.17.2
netty-transport-native-epoll 4.1.109.Final
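In Maven coordinates, the list above would look roughly like this (the group IDs shown are the usual ones for these artifacts; only a few entries are spelled out, the rest follow the same pattern):

```xml
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>spark-gremlin</artifactId>
    <version>3.7.3</version>
</dependency>
<dependency>
    <groupId>org.janusgraph</groupId>
    <artifactId>janusgraph-cql</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.janusgraph</groupId>
    <artifactId>janusgraph-hadoop</artifactId>
    <version>1.1.0</version>
</dependency>
<!-- ...janusgraph-core, janusgraph-es, cassandra-hadoop-util, slf4j, log4j,
     jackson, and netty-transport-native-epoll follow the same pattern -->
```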
For the rest of the solution to this post, see my related but separate post titled: "Graph does not support the provided graph computer: SparkGraphComputer"
