Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

Hi, I'm able to run OLAP queries against my graph DB from the Gremlin Console by following the directions provided here: https://docs.janusgraph.org/advanced-topics/hadoop/. However, I would also like to run OLAP queries without the console, both from an embedded JanusGraph Java application and from G.V(). In G.V(), I tried this with Groovy Mode selected for query submission:
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().groupCount().by(label()).toList()
but I get the following error:
No such property: SparkGraphComputer for class: Script10
I'm assuming this is because the needed plugins are not loaded. I tried:
:plugin use tinkerpop.hadoop
:plugin use tinkerpop.spark
but these do not work outside of the console. Any suggestions, @gdotv? Also, is there any example code I could use to run OLAP queries from a Java application? Thanks!
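One thing I still plan to try (this is an assumption on my part, not something I've verified in G.V()): referencing the class by its fully qualified name, which should sidestep the console plugin imports as long as the spark-gremlin jars are on the classpath of whatever script engine evaluates the query:

```groovy
// Same query, but with SparkGraphComputer fully qualified instead of
// relying on the tinkerpop.spark plugin to import it.
graph = GraphFactory.open('/opt/janusgraph/conf/hadoop-graph/spark-cql.properties')
g = graph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
g.V().groupCount().by(label()).toList()
```

Whether this works from G.V() presumably depends on where its Groovy mode actually evaluates the script.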
2 Replies
gdotv · 4d ago
Hey, good question! G.V() isn't designed to run against embedded graphs, as it always connects to a graph database via the Gremlin driver. In your case, I believe you would need to expose a graph traversal source to your in-memory graph on your server. G.V() is a console replacement only up to a point: it runs scripts and queries against the server, but does not include the more advanced console plugins such as Hadoop. Hopefully this provides more context. Maybe folks on the JanusGraph team can provide more information on how to expose a graph traversal source from a Spark-computed job.
rpuga (OP) · 3d ago
Thanks, @gdotv. It would be great if the JanusGraph folks could follow up on how to expose a GraphTraversalSource. In the meantime, I've made progress on the Java question by following this old-ish post by @Bo: https://li-boxuan.medium.com/spark-on-janusgraph-tinkerpop-a-pagerank-example-43950189b159 I've been able to include all the missing dependencies and compile/run this example:
package org.example.gdb;

import org.apache.commons.configuration2.Configuration;
...
import static org.apache.tinkerpop.gremlin.hadoop.Constants.GREMLIN_HADOOP_OUTPUT_LOCATION;

public class JGGremlinSpark {
    public static void main(String[] args) throws Exception {
        Configuration sparkGraphConfiguration = getSparkGraphConfig();
        sparkGraphConfiguration.setProperty(Constants.GREMLIN_SPARK_GRAPH_STORAGE_LEVEL, "MEMORY_AND_DISK");
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_GRAPH_WRITER, GraphSONOutputFormat.class.getCanonicalName());
        sparkGraphConfiguration.setProperty(GREMLIN_HADOOP_OUTPUT_LOCATION, "/home/hadoop/jgspark_test/hadoop_output");
        sparkGraphConfiguration.setProperty(SparkLauncher.EXECUTOR_MEMORY, "1g");
        Graph graph = GraphFactory.open(sparkGraphConfiguration);

        long startTime = System.currentTimeMillis();
        GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
        final Long vCount = g.V().count().next();
        final Long eCount = g.E().count().next();
        System.out.println("V count = " + vCount);
        System.out.println("E count = " + eCount);
        long duration = (System.currentTimeMillis() - startTime) / 1000;
        System.out.println("Finished JGGremlinSpark test - elapsed time = " + duration + " seconds.");
    }
}
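The getSparkGraphConfig() helper isn't shown above; it essentially assembles the same keys as the spark-cql.properties file described in the JanusGraph Hadoop docs. A minimal sketch of that configuration follows, assuming a CQL backend; the hostname, keyspace, and spark.master values are placeholders for my local setup and will differ per environment:

```properties
# Tell TinkerPop this is a Hadoop graph backed by JanusGraph's CQL input format.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

# Point the input format at the storage backend (placeholders).
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

# Spark settings (local master as a placeholder).
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```

Properties like the storage level, graph writer, and output location that my code sets programmatically could equally live in this file.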
BTW: I'm using JanusGraph 1.1.0. Through lots of trial and error, I found these Maven dependencies to work for me:
spark-gremlin 3.7.3
cassandra-hadoop-util 1.1.0
janusgraph-core 1.1.0
janusgraph-cql 1.1.0
janusgraph-es 1.1.0
janusgraph-hadoop 1.1.0

slf4j-api 2.0.13
slf4j-simple 2.0.13
log4j-slf4j-impl 2.23.1
log4j-core 2.23.1
jackson-databind 2.17.2
jackson-annotations 2.17.2
jackson-core 2.17.2
netty-transport-native-epoll 4.1.109.Final
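In pom.xml form, those coordinates look roughly like this. This is a sketch only: the groupIds are what I believe these artifacts use on Maven Central, so double-check them and the versions against your JanusGraph distribution:

```xml
<!-- Sketch: groupIds assumed from Maven Central naming; verify before use. -->
<dependency>
  <groupId>org.apache.tinkerpop</groupId>
  <artifactId>spark-gremlin</artifactId>
  <version>3.7.3</version>
</dependency>
<dependency>
  <groupId>org.janusgraph</groupId>
  <artifactId>cassandra-hadoop-util</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>org.janusgraph</groupId>
  <artifactId>janusgraph-cql</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>org.janusgraph</groupId>
  <artifactId>janusgraph-hadoop</artifactId>
  <version>1.1.0</version>
</dependency>
<!-- janusgraph-core, janusgraph-es, and the SLF4J/Log4j/Jackson/Netty
     artifacts listed above follow the same pattern. -->
```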