shivam.choudhary
JanusGraph
Created by shivam.choudhary on 12/14/2023 in #questions
OLAP using Spark cluster taking much more time than expected
Hi All, we have set up a Spark cluster to run OLAP queries on JanusGraph with Bigtable as the storage backend. Details:
Backend: *Bigtable*
Vertices: *~4 billion*
Data in backend: *~3.6 TB*
Spark workers: *2 workers, each with 6 CPUs and 25 GB RAM*
Spark executors: *6 executors per worker, each with 1 CPU and 4 GB RAM*
Now I'm trying to count all the vertices with the label ticket, which we know number on the order of ~100k. The query fired to do that is as follows:
graph = GraphFactory.open('conf/hadoop-graph/read-hbase-cluster.properties')
// restrict the loaded subgraph to 'ticket' vertices via a GraphFilter
g = graph.traversal().withComputer(Computer.compute(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer).vertices(hasLabel('ticket')))
g.V().count()
The query has been running for the past 36 hours and is still not complete. Looking at the average read throughput (>50 MB/s), it should have read the full ~3.6 TB by now. Is it possible to use indexes while running the OLAP query, so that only the relevant subgraph is loaded into Spark RDDs (currently it scans the full graph)?
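For reference, the workaround we are considering if OLAP cannot use indexes: mirror the label into an indexed property and count over OLTP instead. This is only a sketch under that assumption, run against the JanusGraph instance itself rather than the HadoopGraph above; the property key 'entityType' and index name 'byEntityType' are hypothetical, not part of our schema:
// hypothetical schema change: store the label as an indexed property
mgmt = graph.openManagement()
entityType = mgmt.makePropertyKey('entityType').dataType(String.class).make()
mgmt.buildIndex('byEntityType', Vertex.class).addKey(entityType).buildCompositeIndex()
mgmt.commit()
// once the index is enabled and populated, an OLTP count can use it directly
g = graph.traversal()
g.V().has('entityType', 'ticket').count().next()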
2 replies
JanusGraph
Created by shivam.choudhary on 8/14/2023 in #questions
Impact of ID Pool Initialisation on Query Performance
Greetings everyone, we're currently operating a JanusGraph setup with cluster.max-partitions set to 1024 and ids.num-partitions set to 10. Our primary goal is to ensure high availability for the cluster instances. However, we've noticed that initialization of the ID pool is causing disruptions during server restarts. The root cause seems to be that an ID pool thread is initialized for each partition until every partition has an ID pool. After a server restart, the ID pool is initialized lazily, triggered by write operations, and this process has been negatively impacting query execution performance. To mitigate this, we're exploring an eager initialization approach for the ID pool threads. Is there a way to achieve this? Thank you for your attention and assistance in addressing this matter.
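In case it clarifies what we mean by eager initialization: we are not aware of a built-in switch for this, so the sketch below is the kind of warm-up we have in mind, issuing throwaway writes right after startup so ID-block acquisition happens before real traffic arrives (the 'warmup' label and the loop count are arbitrary placeholders):
// hypothetical warm-up, run once after a restart and before serving traffic
g = graph.traversal()
100.times {
    v = g.addV('warmup').next()   // each insert may force an ID block allocation
    g.V(v).drop().iterate()       // discard the throwaway vertex again
}
g.tx().commit()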
5 replies
JanusGraph
Created by shivam.choudhary on 7/30/2023 in #questions
JanusGraph Instance startup failure due to id block allocation
No description
2 replies
JanusGraph
Created by shivam.choudhary on 7/27/2023 in #questions
Data Storage with TTL
Hi Everyone, we have a requirement to store around ~80 million records daily in our graph, with a TTL of 90 days on this data (roughly ~7 billion records retained at any time). The issue is that vertex TTL can only be set on static vertex labels, and we don't want that because it blocks further updates on those vertices (correct me if I'm wrong). Please suggest a way to store this data so that we can still have a TTL. We are using the Bigtable backend, so would it be possible to simply set a GC policy (90 days) on the Bigtable column families?
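For reference, this is the schema-level TTL we looked at; it is only a sketch, and setStatic() is exactly the restriction we want to avoid (the 'record' label is a placeholder):
import java.time.Duration
mgmt = graph.openManagement()
record = mgmt.makeVertexLabel('record').setStatic().make()   // vertex TTL requires a static label
mgmt.setTTL(record, Duration.ofDays(90))                     // the whole vertex expires after 90 days
mgmt.commit()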
4 replies
JanusGraph
Created by shivam.choudhary on 6/27/2023 in #questions
JanusGraph metrics data having value 0 for most metrics
I have a JanusGraph server with metrics enabled along with JMX metrics enabled. The issue is that all metrics starting with org_janusgraph report a constant 0. I can see all the metrics named org_apache_tinkerpop, but not the JanusGraph ones. Can someone please suggest what I'm missing or how I can enable them?
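For completeness, this is roughly how we enable metrics; a minimal sketch built programmatically with the options we believe are relevant (our real config lives in a properties file, and the storage setting here is a placeholder for our actual Bigtable config):
graph = JanusGraphFactory.build().
    set('storage.backend', 'hbase').      // placeholder; we actually point at Bigtable
    set('metrics.enabled', true).         // master switch for JanusGraph's own metrics
    set('metrics.jmx.enabled', true).     // expose those metrics over JMX
    open()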
11 replies
JanusGraph
Created by shivam.choudhary on 5/31/2023 in #questions
Reading from Instance B the writes done on Instance A
We are facing an issue with a JanusGraph cluster of around 20 instances. If we write a vertex on instance A and then try to read it from another instance, it takes a few seconds for the write to become visible there. I have checked that storage.batch-loading is set to false.
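For context, our current suspicion is the instance-local database cache, which can serve entries that are stale for up to its configured lifetime. A sketch of what we are testing (values illustrative, and the storage setting is a placeholder for our Bigtable setup):
graph = JanusGraphFactory.build().
    set('storage.backend', 'hbase').      // placeholder for our Bigtable setup
    set('cache.db-cache', false).         // bypass the instance-local cache so reads hit the backend
    // alternatively, keep the cache but bound staleness with cache.db-cache-time
    open()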
5 replies