JanusGraph•2y ago

Using Spark with JanusGraph on a Cassandra storage backend

Hello there! Hope you are having an awesome day. I am trying to run OLAP queries against my JanusGraph setup. The graph is stored in Cassandra, but we connect through our own custom storage backend class. I think the graph is being created without using my storage backend. Can you please help me figure out what is wrong? I am creating the graph with the following code and configuration:

graph = org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(executorConfig.getCassandraGraphConfiguration().getJanusGraphConfig());

cassandra:
  janusGraphConfig:
    gremlin:
      graph: "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph"
      hadoop:
        graphReader: "org.janusgraph.hadoop.formats.cql.CqlInputFormat"
        graphWriter: "org.apache.hadoop.mapreduce.lib.output.NullOutputFormat"
        jarsInDistributedCache: true
        inputLocation: none
        outputLocation: output
        persistContext: true
    janusgraphmr:
      ioformat:
        conf:
          storage:
            backend: "com.company.department.diskstorage.companycassandra.CompanyCassandraStoreManager"
            cassandra:
              keyspace: "janusgraph-olap"
    storage:
      backend: "com.company.department.diskstorage.companycassandra.CompanyCassandraStoreManager"
      cassandra:
        keyspace: "janusgraph-olap"
    cassandra:
      input:
        keyspace: "janusgraph-olap"
        partitioner:
          class: "org.apache.cassandra.dht.Murmur3Partitioner"
        widerows: true
        columnfamily: "edgestore"
    spark:
      master: "k8s:https://kubernetes.default.svc:443"
      executor.memory: 1g
      serializer: "org.apache.spark.serializer.KryoSerializer"
      kryo.registrator: "org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator"
    query:
      batch: true
      batch-property-prefetch: true
      fast-property: true
      force-index: false
    graph:
      replace-instance-if-exists: true
    ids:
      block-size: 10000000
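
For readers mapping this YAML onto dotted property keys such as janusgraphmr.ioformat.conf.storage.backend, here is a minimal sketch of how a nested configuration could be flattened into those keys. The class name NestedConfigFlattener is hypothetical, and this is only an assumption about what a helper like getJanusGraphConfig() might do, not the actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: flattens a nested map (for example, parsed YAML)
// into dotted property keys of the kind JanusGraph and TinkerPop read.
final class NestedConfigFlattener {

    // Returns a new map in which every leaf value is keyed by its dotted path.
    static Map<String, Object> flatten(Map<String, ?> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, ?> node, Map<String, Object> out) {
        for (Map.Entry<String, ?> entry : node.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Descend into nested sections, extending the dotted prefix.
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flattenInto(key, child, out);
            } else {
                out.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        // A small slice of the configuration above, built by hand.
        Map<String, Object> storage = new LinkedHashMap<>();
        storage.put("backend", "cql");
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("storage", storage);
        Map<String, Object> ioformat = new LinkedHashMap<>();
        ioformat.put("conf", conf);
        Map<String, Object> janusgraphmr = new LinkedHashMap<>();
        janusgraphmr.put("ioformat", ioformat);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("janusgraphmr", janusgraphmr);

        flatten(root).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Keys that already contain dots, such as executor.memory under spark, simply get the parent prefix added in front of them.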

6 Replies

deepak8347OP•2y ago

The current error message is:

Caused by: java.lang.UnsupportedOperationException: You must set the initial output address to a Cassandra node with setInputInitialAddress
        at org.apache.cassandra.hadoop.cql3.CqlInputFormat.validateConfiguration(CqlInputFormat.java:112)
        at org.apache.cassandra.hadoop.cql3.CqlInputFormat.getSplits(CqlInputFormat.java:121)
        at org.janusgraph.hadoop.formats.cql.CqlBinaryInputFormat.getSplits(CqlBinaryInputFormat.java:53)
        at org.janusgraph.hadoop.formats.util.HadoopInputFormat.getSplits(HadoopInputFormat.java:55)
        at org.apache.tinkerpop.gremlin.hadoop.structure.io.HadoopElementIterator.<init>(HadoopElementIterator.java:65)

I have added the complete stack trace in the attached file, Error_Message.

Bo•2y ago

You should use cql for janusgraphmr.ioformat.conf.storage.backend. NOTE: neither the default CQLStoreManager nor your custom CompanyCassandraStoreManager is involved in OLAP. OLAP relies on Cassandra's official cassandra-hadoop-util package, which defines how to partition and read Cassandra data in parallel.

deepak8347OP•2y ago

Thanks @boxuanli for helping! 1. I changed the backend to cql, but I am still getting the same error. Do I need any additional configuration?

    janusgraphmr:
      ioformat:
        conf:
          storage:
            backend: cql

2. CompanyCassandraStoreManager lets us create the connection to our Cassandra cluster. Our OLTP JanusGraph works with this storage backend. It holds the details of how we connect to Cassandra; in particular, it contains the CassandraManaged instance, which manages the Cassandra client and takes care of authentication.

public class CompanyCassandraStoreManager extends DistributedStoreManager implements KeyColumnValueStoreManager {
    static final String CONSISTENCY_QUORUM = "QUORUM";
    static final String EDGESTORE_LOCK = "edgestore_lock_";
    .......
    static final String TXLOG = "txlog";
    private final CassandraManaged cassandraManaged;

Can I not use this storage backend? Should I specify the hostname and port instead, as shown in the example (https://docs.janusgraph.org/advanced-topics/hadoop/)?

Bo•2y ago

"Can I not use this storage backend"

No, you cannot use it.

"I should specify the hostname and port"

Yes.
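
A minimal sketch of what that could look like as flat properties, assuming the stock cql backend and the key names used in the JanusGraph Hadoop documentation example. The class name OlapCqlConfigSketch and the host cassandra.example.internal are hypothetical placeholders, and the Spark settings from the original configuration are left out for brevity. Note that the documentation example uses storage.cql.keyspace for the cql backend, not storage.cassandra.keyspace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: builds the flat HadoopGraph properties for reading
// a JanusGraph keyspace from Cassandra through the stock cql backend.
final class OlapCqlConfigSketch {

    static Map<String, Object> olapProperties(String hostname, int port, String keyspace) {
        Map<String, Object> p = new LinkedHashMap<>();
        // Same Hadoop graph settings as in the original configuration.
        p.put("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph");
        p.put("gremlin.hadoop.graphReader", "org.janusgraph.hadoop.formats.cql.CqlInputFormat");
        p.put("gremlin.hadoop.graphWriter", "org.apache.hadoop.mapreduce.lib.output.NullOutputFormat");
        p.put("gremlin.hadoop.jarsInDistributedCache", true);
        p.put("gremlin.hadoop.inputLocation", "none");
        p.put("gremlin.hadoop.outputLocation", "output");
        // Input format settings: stock cql backend plus an explicit contact point,
        // instead of a custom store manager class.
        p.put("janusgraphmr.ioformat.conf.storage.backend", "cql");
        p.put("janusgraphmr.ioformat.conf.storage.hostname", hostname);
        p.put("janusgraphmr.ioformat.conf.storage.port", port);
        p.put("janusgraphmr.ioformat.conf.storage.cql.keyspace", keyspace);
        // Cassandra input settings carried over from the original configuration.
        p.put("cassandra.input.partitioner.class", "org.apache.cassandra.dht.Murmur3Partitioner");
        p.put("cassandra.input.widerows", true);
        return p;
    }

    public static void main(String[] args) {
        // Placeholder host; 9042 is the usual CQL native transport port.
        Map<String, Object> props = olapProperties("cassandra.example.internal", 9042, "janusgraph-olap");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
        // With TinkerPop on the classpath, a map like this could be passed to
        // GraphFactory.open(props), as in the original code.
    }
}
```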

deepak8347OP•2y ago

How do we authenticate to Cassandra? Do we also pass the password as a parameter?

Bo•2y ago

https://stackoverflow.com/questions/76152740/janusgraph-olap-traversal-connection-with-cassandra-using-trusstore-config-not/76179225#76179225

