shivam.choudhary
Apache TinkerPop
Created by shivam.choudhary on 12/10/2023 in #questions
Implementing Graph Filter for Sub-Graph Loading in Spark Cluster with JanusGraph
Hello, I'm currently using JanusGraph 0.6.4 with Bigtable as the storage backend and running into difficulties when attempting to run OLAP queries on my graph via SparkGraphComputer. The graph is quite large, containing billions of vertices, and I'm only able to execute queries on significantly smaller graphs. My queries are run through the Gremlin console, and the problem appears to be related to loading the graph into the Spark RDD. I'm interested in applying a filter to load only vertices and edges with specific labels before running the query. I've noticed that creating a vertex program and using a graph filter as described in the TinkerPop documentation (https://tinkerpop.apache.org/docs/current/reference/#graph-filter) loads only the specified subgraph into the Spark RDDs:
graph.computer().
vertices(hasLabel("person")).
vertexProperties(__.properties("name")).
edges(bothE("knows")).
program(PageRankVertexProgram...)
Is it possible to implement this filtering directly through the Gremlin console? I've attempted to use g.V().limit(1), but without success. I suspect this is because the entire graph is being loaded into the RDD for this query as well. Here's the code I used:
graph = GraphFactory.open("conf/hadoop-graph/read-hbase-cluster.properties")
hg = graph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
hg.V().limit(1)
Any insights or suggestions would be greatly appreciated. Thank you.
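For reference, a graph filter can also be built directly in the Gremlin console by constructing a Computer and passing it to withComputer(). A minimal sketch, assuming the standard TinkerPop Computer API; the "person"/"knows" labels are illustrative, not tied to this schema:

```groovy
// Sketch: apply a graph filter from the Gremlin console so that only the
// matching sub-graph is loaded into the Spark RDDs.
graph = GraphFactory.open("conf/hadoop-graph/read-hbase-cluster.properties")
hg = graph.traversal().withComputer(
    Computer.compute(SparkGraphComputer).
        vertices(hasLabel("person")).   // load only "person" vertices
        edges(__.bothE("knows")))       // load only "knows" edges
hg.V().count()                          // runs over the filtered sub-graph only
```

The filter is applied at load time, before any traversal runs, so steps like limit(1) on the traversal itself cannot prevent the full graph from being read into the RDD.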
4 replies
Apache TinkerPop
Created by shivam.choudhary on 7/31/2023 in #questions
User-Agent Metric Not Exposed in Gremlin Server - Need Help Troubleshooting
Hey everyone, I've been working with Gremlin and noticed that we can pass the User-Agent in requests to the Gremlin server. According to the documentation (reference: https://tinkerpop.apache.org/docs/current/reference/#metrics), the server should maintain a metric called user-agent.*, which counts the number of connection requests from clients providing a specific user agent. We have already implemented sending the User-Agent in our HTTP requests to the Gremlin server, but the metric mentioned in the documentation doesn't seem to be exposed or working as expected. Has anyone encountered a similar issue? Do we need to enable the metric in some way, or could there be something else causing the problem? Any help or insights on this matter would be greatly appreciated. Thanks!
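One thing worth checking (an assumption, not confirmed from this post): server metrics, including the user-agent.* counters, are only surfaced through whichever metrics reporters are enabled in gremlin-server.yaml. A sketch of the relevant config section, following the stock file's layout:

```yaml
# Sketch (assumption): enable one or more reporters in gremlin-server.yaml
# so that server metrics such as user-agent.* are actually exposed.
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000}}
```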
9 replies
Apache TinkerPop
Created by shivam.choudhary on 7/18/2023 in #questions
ReadOnlyStrategy for remote script execution to make a read only server instance
Hi all, I am setting up a read-only cluster of Gremlin Server. I have configured the initialization script like this: globals << [g : traversal().withEmbedded(graph).withStrategies(ReferenceElementStrategy)] Now when I'm using g and sending a write request to the Gremlin server, I get the proper exception and cannot add data. The issue I'm facing is that I can still access the graph instance directly and send requests like graph.traversal().addV() in place of g.addV(). Is there a way I can restrict this and make the server reject write requests entirely? TIA.
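One way to harden the g binding itself is to register ReadOnlyStrategy alongside ReferenceElementStrategy in the init script. A sketch:

```groovy
// Sketch: add ReadOnlyStrategy so traversals spawned from g reject mutations.
// Note (assumption): this only guards g; a client that can reach the raw
// graph binding can still call graph.traversal().addV(), so the graph
// variable itself should not be exposed in the script-engine bindings.
globals << [g : traversal().withEmbedded(graph).
                            withStrategies(ReadOnlyStrategy.instance(),
                                           ReferenceElementStrategy)]
```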
13 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
Hi folks, I'm observing high latency in query evaluation whenever the JanusGraph server starts or restarts, and the degradation lasts for at least 5 minutes.
* I'm using parameterized queries for the JanusGraph server requests, so I expect some increased latency whenever the server starts/restarts, but the issue is that the degradation does not go away for at least 5 minutes and evaluation latency goes from around 300 ms to 5,000 ms.
* JanusGraph is deployed in a Kubernetes cluster with 20 pods, so every time I redeploy the JanusGraph cluster this issue arises, which results in timeouts on the client side.
I'd like to know whether there is some way to add all the parameterized queries to the cache in advance, so that by the time a started/restarted JanusGraph pod is ready to serve requests, all the parameterized queries are already cached.
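A possible workaround is a warm-up step in each pod's readiness flow that submits every parameterized query once before the pod receives traffic, so the server-side script cache is populated up front. A rough sketch with the Gremlin Java driver from the console; the host, query texts, and bindings are placeholders:

```groovy
// Sketch: pre-warm the server-side script cache by submitting each
// parameterized query once before the pod is marked ready.
// "localhost" and the query/binding values below are placeholders.
cluster = Cluster.build("localhost").create()
client = cluster.connect()
warmupQueries = ["g.V(vid).valueMap()", "g.V(vid).out(lbl).limit(10)"]
warmupQueries.each { q ->
    client.submit(q, [vid: 1L, lbl: "knows"]).all().get()  // compiles and caches the script
}
cluster.close()
```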
24 replies
Apache TinkerPop
Created by shivam.choudhary on 3/10/2023 in #questions
Verifying the count of ingested vertex and edges after bulk loading in Janusgraph.
I have bulk loaded around 600k vertices and 800k edges into my JanusGraph cluster backed by Bigtable. I want to verify the number of vertices with a given label 'A' using a Gremlin query, but I'm getting an evaluation timeout error. The evaluation timeout is set to 5 minutes. The Gremlin query used is g.V().hasLabel('A').count(). Can anyone help me with how I can verify the count of vertices and edges loaded into the graph? Thanks.
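Two things commonly tried for a full-scan count like this (both sketches under assumptions, not confirmed for this setup): raising the evaluation timeout for just the counting traversal, or running the count as an OLAP job instead of an OLTP scan:

```groovy
// Sketch 1: raise the evaluation timeout for this one traversal
// (value is in milliseconds; 30 minutes here is an arbitrary choice).
g.with('evaluationTimeout', 1800000L).V().hasLabel('A').count()

// Sketch 2: run the count as an OLAP job via SparkGraphComputer
// (assumes a traversal source configured for Spark/HadoopGraph),
// which is generally better suited to scanning a large graph.
g.withComputer(SparkGraphComputer).V().hasLabel('A').count()
```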
4 replies