shivam.choudhary
Apache TinkerPop
Created by shivam.choudhary on 12/10/2023 in #questions
Implementing Graph Filter for Sub-Graph Loading in Spark Cluster with JanusGraph
Hello, I'm currently using JanusGraph 0.6.4 with Bigtable as the storage backend and running into difficulties when attempting to run OLAP queries on my graph via SparkGraphComputer. The graph is quite large, containing billions of vertices, and I'm only able to execute queries on significantly smaller graphs. My queries are run through the Gremlin console, and the problem appears to be related to loading the graph into the Spark RDD. I'm interested in applying a filter to load only vertices and edges with specific labels before running the query. I've noticed that creating a vertex program and using a graph filter as described in the TinkerPop documentation (https://tinkerpop.apache.org/docs/current/reference/#graph-filter) loads only the specified subgraph into the Spark RDDs:
graph.computer().
vertices(hasLabel("person")).
vertexProperties(__.properties("name")).
edges(bothE("knows")).
program(PageRankVertexProgram...)
Is it possible to implement this filtering directly through the Gremlin console? I've attempted to use g.V().limit(1), but without success. I suspect this is because the entire graph is being loaded into the RDD for this query as well. Here's the code I used:
graph = GraphFactory.open("conf/hadoop-graph/read-hbase-cluster.properties")
hg = graph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
hg.V().limit(1)
Any insights or suggestions would be greatly appreciated. Thank you.
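For reference, a graph filter can also be built directly in the Gremlin console by constructing a Computer and passing it to withComputer(). A minimal sketch, assuming the standard TinkerPop Computer API; the "person"/"knows" labels are illustrative, not tied to this schema:

```groovy
// Sketch: apply a graph filter from the Gremlin console so that only the
// matching sub-graph is loaded into the Spark RDDs.
graph = GraphFactory.open("conf/hadoop-graph/read-hbase-cluster.properties")
hg = graph.traversal().withComputer(
    Computer.compute(SparkGraphComputer).
        vertices(hasLabel("person")).   // load only "person" vertices
        edges(__.bothE("knows")))       // load only "knows" edges
hg.V().count()                          // runs over the filtered sub-graph only
```

The filter is applied at load time, before any traversal runs, so steps like limit(1) on the traversal itself cannot prevent the full graph from being read into the RDD.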
4 replies
Apache TinkerPop
Created by shivam.choudhary on 7/31/2023 in #questions
User-Agent Metric Not Exposed in Gremlin Server - Need Help Troubleshooting
Hey everyone, I've been working with Gremlin and noticed that we can pass the User-Agent in requests to the Gremlin server. According to the documentation (reference: https://tinkerpop.apache.org/docs/current/reference/#metrics), the server should maintain a metric called user-agent.*, which counts the number of connection requests from clients providing a specific user agent. We have already implemented sending the User-Agent in our HTTP requests to the Gremlin server, but the metric mentioned in the documentation doesn't seem to be exposed or working as expected. Has anyone encountered a similar issue? Do we need to enable the metric in some way, or could there be something else causing the problem? Any help or insights on this matter would be greatly appreciated. Thanks!
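One thing worth checking (an assumption, not confirmed from this post): server metrics, including the user-agent.* counters, are only surfaced through whichever metrics reporters are enabled in gremlin-server.yaml. A sketch of the relevant config section, following the stock file's layout:

```yaml
# Sketch (assumption): enable one or more reporters in gremlin-server.yaml
# so that server metrics such as user-agent.* are actually exposed.
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000}}
```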
9 replies
Apache TinkerPop
Created by shivam.choudhary on 7/18/2023 in #questions
ReadOnlyStrategy for remote script execution to make a read only server instance
Hi all, I am setting up a read-only cluster of Gremlin Server. I have configured the initialization script like this: globals << [g : traversal().withEmbedded(graph).withStrategies(ReferenceElementStrategy)] Now when I'm using g and sending a write request to the Gremlin server, I get the proper exception and cannot add data. The issue I'm facing is that I can still access the graph instance directly and send requests like graph.traversal().addV() in place of g.addV(). Is there a way I can restrict this and make the server reject write requests entirely? TIA.
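One way to harden the g binding itself is to register ReadOnlyStrategy alongside ReferenceElementStrategy in the init script. A sketch:

```groovy
// Sketch: add ReadOnlyStrategy so traversals spawned from g reject mutations.
// Note (assumption): this only guards g; a client that can reach the raw
// graph binding can still call graph.traversal().addV(), so the graph
// variable itself should not be exposed in the script-engine bindings.
globals << [g : traversal().withEmbedded(graph).
                            withStrategies(ReadOnlyStrategy.instance(),
                                           ReferenceElementStrategy)]
```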
13 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
Hi folks, I'm observing high latency in query evaluation whenever the JanusGraph server starts or restarts, and the degradation lasts for at least 5 minutes.
* I'm using parameterized queries for the JanusGraph server requests, so I expect some increased latency whenever the server starts/restarts, but the issue is that the degradation does not go away for at least 5 minutes and evaluation latency goes from around 300 ms to 5,000 ms.
* JanusGraph is deployed in a Kubernetes cluster with 20 pods, so every time I redeploy the JanusGraph cluster this issue arises, which results in timeouts on the client side.
I'd like to know whether there is some way to add all the parameterized queries to the cache in advance, so that by the time a started/restarted JanusGraph pod is ready to serve requests, all the parameterized queries are already cached.
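A possible workaround is a warm-up step in each pod's readiness flow that submits every parameterized query once before the pod receives traffic, so the server-side script cache is populated up front. A rough sketch with the Gremlin Java driver from the console; the host, query texts, and bindings are placeholders:

```groovy
// Sketch: pre-warm the server-side script cache by submitting each
// parameterized query once before the pod is marked ready.
// "localhost" and the query/binding values below are placeholders.
cluster = Cluster.build("localhost").create()
client = cluster.connect()
warmupQueries = ["g.V(vid).valueMap()", "g.V(vid).out(lbl).limit(10)"]
warmupQueries.each { q ->
    client.submit(q, [vid: 1L, lbl: "knows"]).all().get()  // compiles and caches the script
}
cluster.close()
```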
24 replies
Apache TinkerPop
Created by shivam.choudhary on 3/10/2023 in #questions
Verifying the count of ingested vertex and edges after bulk loading in Janusgraph.
I have bulk loaded around 600k vertices and 800k edges into my JanusGraph cluster backed by Bigtable. I want to verify the number of vertices with a given label 'A' using a Gremlin query, but I'm getting an evaluation timeout error. The evaluation timeout is set to 5 minutes. The Gremlin query used is g.V().hasLabel('A').count(). Can anyone help me with how I can verify the count of vertices and edges loaded into the graph? Thanks.
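Two things commonly tried for a full-scan count like this (both sketches under assumptions, not confirmed for this setup): raising the evaluation timeout for just the counting traversal, or running the count as an OLAP job instead of an OLTP scan:

```groovy
// Sketch 1: raise the evaluation timeout for this one traversal
// (value is in milliseconds; 30 minutes here is an arbitrary choice).
g.with('evaluationTimeout', 1800000L).V().hasLabel('A').count()

// Sketch 2: run the count as an OLAP job via SparkGraphComputer
// (assumes a traversal source configured for Spark/HadoopGraph),
// which is generally better suited to scanning a large graph.
g.withComputer(SparkGraphComputer).V().hasLabel('A').count()
```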
4 replies