Using Spark with JanusGraph on a Cassandra storage backend

Hello there! Hope you are having an awesome day. I am trying to run OLAP queries against my JanusGraph setup. The graph is stored in Cassandra, but we access it through our own custom storage backend class. I suspect the graph is being created without using my storage backend. Can you please help me figure out what is wrong? I am creating the graph with the following code and configuration:
graph = org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(executorConfig.getCassandraGraphConfiguration().getJanusGraphConfig());
cassandra:
  janusGraphConfig:
    gremlin:
      graph: "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph"
      hadoop:
        graphReader: "org.janusgraph.hadoop.formats.cql.CqlInputFormat"
        graphWriter: "org.apache.hadoop.mapreduce.lib.output.NullOutputFormat"
        jarsInDistributedCache: true
        inputLocation: none
        outputLocation: output
        persistContext: true
    janusgraphmr:
      ioformat:
        conf:
          storage:
            backend: "com.company.department.diskstorage.companycassandra.CompanyCassandraStoreManager"
            cassandra:
              keyspace: "janusgraph-olap"
    storage:
      backend: "com.company.department.diskstorage.companycassandra.CompanyCassandraStoreManager"
      cassandra:
        keyspace: "janusgraph-olap"
    cassandra:
      input:
        keyspace: "janusgraph-olap"
        partitioner:
          class: "org.apache.cassandra.dht.Murmur3Partitioner"
        widerows: true
        columnfamily: "edgestore"
    spark:
      master: "k8s:https://kubernetes.default.svc:443"
      executor.memory: 1g
      serializer: "org.apache.spark.serializer.KryoSerializer"
      kryo.registrator: "org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator"
    query:
      batch: true
      batch-property-prefetch: true
      fast-property: true
      force-index: false
    graph:
      replace-instance-if-exists: true
    ids:
      block-size: 10000000
6 Replies
deepak8347 (OP), 17mo ago
The current error message is:
Caused by: java.lang.UnsupportedOperationException: You must set the initial output address to a Cassandra node with setInputInitialAddress
at org.apache.cassandra.hadoop.cql3.CqlInputFormat.validateConfiguration(CqlInputFormat.java:112)
at org.apache.cassandra.hadoop.cql3.CqlInputFormat.getSplits(CqlInputFormat.java:121)
at org.janusgraph.hadoop.formats.cql.CqlBinaryInputFormat.getSplits(CqlBinaryInputFormat.java:53)
at org.janusgraph.hadoop.formats.util.HadoopInputFormat.getSplits(HadoopInputFormat.java:55)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.HadoopElementIterator.<init>(HadoopElementIterator.java:65)
I have attached the complete stack trace as a file.
Bo, 17mo ago
You should use cql for janusgraphmr.ioformat.conf.storage.backend.
NOTE: Neither the default CQLStoreManager nor your custom CompanyCassandraStoreManager has anything to do with OLAP. OLAP relies on Cassandra's official cassandra-hadoop-util package, which defines how to partition and read Cassandra data in parallel.
deepak8347 (OP), 17mo ago
Thanks @boxuanli for helping!
1. I changed the backend to cql, but I am still getting the same error. Do I need any additional configuration?
janusgraphmr:
  ioformat:
    conf:
      storage:
        backend: cql
2. The CompanyCassandraStoreManager is what lets us connect to our Cassandra cluster, and our OLTP JanusGraph already works with this storage backend. It defines how we connect to Cassandra; in particular, it holds the CassandraManaged object, which manages the Cassandra client and takes care of authentication.
public class CompanyCassandraStoreManager extends DistributedStoreManager implements KeyColumnValueStoreManager {
    static final String CONSISTENCY_QUORUM = "QUORUM";
    static final String EDGESTORE_LOCK = "edgestore_lock_";
    .......
    static final String TXLOG = "txlog";
    private final CassandraManaged cassandraManaged;
Does this mean I cannot use this storage backend, and that I should instead specify the hostname and port as shown in the example (https://docs.janusgraph.org/advanced-topics/hadoop/)?
Bo, 17mo ago
"Can I not use this storage backend"
No, you cannot use it.
"I should specify the hostname and port"
Yes.
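As a rough illustration of the two answers above, here is a minimal sketch of the storage settings an OLAP read would carry once the custom store manager is replaced with cql. It assumes the key names used in the JanusGraph Hadoop example linked earlier in the thread (storage.hostname, storage.port, storage.cql.keyspace under the janusgraphmr.ioformat.conf prefix). The class name OlapConfigSketch, the build helper, and the hostname are hypothetical, and the result would still need to be converted into whatever configuration object getJanusGraphConfig() returns in the poster's setup.

```java
import java.util.Properties;

// Hypothetical helper, not part of JanusGraph: collects the OLAP input settings in one place.
class OlapConfigSketch {
    // Prefix under which the Hadoop input format looks up JanusGraph storage options.
    static final String PREFIX = "janusgraphmr.ioformat.conf.storage.";

    static Properties build(String hostname, int port, String keyspace) {
        Properties p = new Properties();
        p.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph");
        p.setProperty("gremlin.hadoop.graphReader", "org.janusgraph.hadoop.formats.cql.CqlInputFormat");
        // Use the built-in cql backend instead of the custom store manager class.
        p.setProperty(PREFIX + "backend", "cql");
        // Contact point for Cassandra; the error in this thread is raised when no initial address is set.
        p.setProperty(PREFIX + "hostname", hostname);
        p.setProperty(PREFIX + "port", Integer.toString(port));
        p.setProperty(PREFIX + "cql.keyspace", keyspace);
        return p;
    }

    public static void main(String[] args) {
        // "cassandra.example.internal" is a placeholder hostname, not a value from the thread.
        Properties p = build("cassandra.example.internal", 9042, "janusgraph-olap");
        p.stringPropertyNames().stream().sorted()
            .forEach(k -> System.out.println(k + "=" + p.getProperty(k)));
    }
}
```

The same keys can equally be written in the nested YAML form used in the original post; the point is only that the OLAP side needs a plain hostname, port, and keyspace rather than a store manager class.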
deepak8347 (OP), 17mo ago
How do we authenticate to Cassandra in that case? Can the password also be passed as a parameter?
Bo, 17mo ago
Stack Overflow
Janusgraph OLAP traversal - Connection with cassandra using trussto...
We have a working setup of Janusgraph 0.5.2 version where we are able to insert and query (OLTP) the data as per need. We are exploring JanusGraph OLAP traversal for some reporting and analytical