shivam.choudhary Comments - Answer Overflow

shivam.choudhary

•Created by shivam.choudhary on 8/14/2023 in #questions

Impact of ID Pool Initialisation on Query Performance

@Boxuan Li, any suggestions on this? Currently the ID pools are initialised only after the instance starts receiving write requests, and queries time out for ~15 minutes until all the ID pools are initialised. Is there a way to initialise the ID pools before we send write traffic to the JanusGraph instance?

5 replies

•Created by aschwartz on 12/13/2023 in #questions

Reindexing using the Mgmt System

Hi @aschwartz, if you use the first approach, queries that are eligible for that index will not run as expected. If you enable the index before reindexing has finished, every query eligible for that index will try to use it even though its entries do not exist yet, resulting in empty results. With the second approach, the index won't be eligible for use until the reindexing job completes, after which it automatically moves to the ENABLED state. https://docs.janusgraph.org/schema/index-management/index-lifecycle/#:~:text=.commit()%3B-,Index%20states%20and%20transitions,-The%20diagram%20below

4 replies

Apache TinkerPop

•Created by shivam.choudhary on 12/10/2023 in #questions

Implementing Graph Filter for Sub-Graph Loading in Spark Cluster with JanusGraph

Thanks @spmallette, I was able to run the query this way from the Gremlin Console. I'm marking the question as answered since I was able to apply the filter successfully. The job has now been running for the past 36 hours but hasn't completed. My Spark cluster has 12 executors and the graph data is being read at over 50 MB/sec; even allowing for Spark scanning the full graph of 3.6 TB, I would have expected the job to be complete by now. The OLAP query I fired is:

g.withComputer(Computer.compute().vertices(hasLabel('ticket'))).V().count()

Is there anything I might be missing here? Thanks in advance.
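The expectation that the job "should have been completed by now" can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), reads "mb" as megabytes, and considers two interpretations of the reported rate (aggregate across the cluster, or per executor). The function name `scan_hours` is made up for illustration.

```python
# Rough scan-time estimate from the figures in the post.
# Assumptions (not confirmed by the post): decimal units, constant throughput,
# and no overhead beyond reading the data.

TB = 10**12
MB = 10**6

GRAPH_BYTES = 3600 * 10**9      # 3.6 TB of graph data
RATE_BYTES_PER_SEC = 50 * MB    # "above 50mb/sec", taken here as a lower bound
EXECUTORS = 12


def scan_hours(total_bytes: int, bytes_per_second: int) -> float:
    """Hours needed to read total_bytes at a constant bytes_per_second."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return total_bytes / bytes_per_second / 3600


# Case 1: 50 MB/sec is the aggregate rate across the whole cluster.
aggregate_hours = scan_hours(GRAPH_BYTES, RATE_BYTES_PER_SEC)

# Case 2: 50 MB/sec is the rate of each of the 12 executors.
per_executor_hours = scan_hours(GRAPH_BYTES, RATE_BYTES_PER_SEC * EXECUTORS)

print(f"aggregate rate:    {aggregate_hours:.1f} h")
print(f"per-executor rate: {per_executor_hours:.1f} h")
```

Under either reading, a pure full scan at the reported rate would take about 20 hours or less, so a 36-hour runtime suggests the time is going somewhere other than raw reads.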

4 replies

•Created by shivam.choudhary on 8/14/2023 in #questions

Impact of ID Pool Initialisation on Query Performance

Hi everyone, we are still facing the above issue; any help would be greatly appreciated. TIA.

5 replies

•Created by 4j4y. on 8/29/2023 in #questions

Does the quintillion edges limitation still exist after introducing custom vertex id support?

Also, sometimes we have to restart for the following reasons: 1. Mapping additional props to a vertex/edge label. 2. Changing the state of a newly created index from INSTALLED to REGISTERED. In both cases the changes are not reflected on their own unless we restart. We tried waiting for hours for the changes to be communicated to all the instances via the backend, but they only take effect after redeployment.

7 replies

•Created by 4j4y. on 8/29/2023 in #questions

Does the quintillion edges limitation still exist after introducing custom vertex id support?

I’m not sure how you calculate ids exhaustion.

@porunov Most of our ids were wasted by server restarts, as we run JanusGraph Server on Kubernetes pods (~20-30 pods). Also, we set cluster.max-partitions to 1024 without knowing at the start that this reserves 10 bits of the id, leaving 2^50 ids for edges and 2^49 for vertices. Initially we ingested a lot of data into the graph with block-size set to 1 million, and recently, while going through the JanusGraph code, we found that the id pool for the edge namespace has a block size 8 times the base block size, which equates to 8 million per id pool. So one deployment costs us 8M * (1024 id pools) * (~25 pods) ≈ 204.8 billion edge ids, and we had a lot of deployments in the starting phase. We are planning to move JanusGraph Server to VMs and reduce the block size, but unfortunately we cannot change cluster.max-partitions as it is a fixed config. Please let us know if our understanding is wrong, and whether there are any further steps we can take to reduce the wastage of ids.
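The per-deployment estimate above can be reproduced with a few lines of arithmetic. This is a sketch of the poster's own worst-case model (every pod reserves one edge-id block in every partition on each deployment), not a statement of how JanusGraph necessarily allocates blocks; the function names are made up for illustration and the constants are the figures quoted in the post.

```python
# Worked arithmetic for the id-wastage estimate in the post.
# The 8x edge multiplier and the "one block per partition per pod" model
# are taken from the post, not verified against JanusGraph here.

BASE_BLOCK_SIZE = 1_000_000   # ids.block-size
EDGE_BLOCK_MULTIPLIER = 8     # edge namespace block = 8 x base block (per the post)
MAX_PARTITIONS = 1024         # cluster.max-partitions
PODS = 25                     # "~20-30 pods"; the post uses ~25


def partition_bits(max_partitions: int) -> int:
    """Bits needed to encode a partition id; requires a power of two."""
    if max_partitions <= 0 or max_partitions & (max_partitions - 1):
        raise ValueError("max_partitions must be a positive power of two")
    return max_partitions.bit_length() - 1


def edge_ids_reserved_per_deployment(
    base_block: int, multiplier: int, partitions: int, pods: int
) -> int:
    """Edge ids reserved if every pod claims one edge block in every partition."""
    return base_block * multiplier * partitions * pods


bits = partition_bits(MAX_PARTITIONS)
reserved = edge_ids_reserved_per_deployment(
    BASE_BLOCK_SIZE, EDGE_BLOCK_MULTIPLIER, MAX_PARTITIONS, PODS
)

print(f"partition bits: {bits}")
print(f"edge ids reserved per deployment: {reserved:,}")
```

Because the result is linear in the block size, lowering ids.block-size reduces the per-deployment reservation proportionally under this model, which is the lever the post proposes since cluster.max-partitions is fixed.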

7 replies

•Created by shivam.choudhary on 7/31/2023 in #questions

User-Agent Metric Not Exposed in Gremlin Server - Need Help Troubleshooting

Hello @colegreer, I'm including the User-Agent header as a part of the handshake request headers. We're dealing with several clients for our Gremlin server, and we intended to utilise User-Agent metrics to monitor the throughput we're getting from these different clients. However, considering your remark that "It will not look for that header in subsequent messages sent via the existing connection," it appears that we can only monitor the count of connections per client. This leads me to believe that we might not have the capability to track it in the manner we thought?

9 replies

•Created by shivam.choudhary on 7/31/2023 in #questions

User-Agent Metric Not Exposed in Gremlin Server - Need Help Troubleshooting

Hello @colegreer and @spmallette , I've attempted to send WebSocket requests to the server with the 'User-Agent' header, but unfortunately, I'm still not able to observe the metrics even under these conditions. Could you kindly assist me in identifying what I might be overlooking? Thank you in advance for your help!

9 replies

•Created by shivam.choudhary on 7/18/2023 in #questions

ReadOnlyStrategy for remote script execution to make a read only server instance

Yeah makes sense, thanks for the clarification

13 replies

•Created by shivam.choudhary on 7/18/2023 in #questions

ReadOnlyStrategy for remote script execution to make a read only server instance

This is mentioned in the doc which I missed initially: https://tinkerpop.apache.org/docs/current/reference/#configuration-steps-withoutstrategies

13 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

I'm currently working on setting up the read-only JanusGraph instances and will be able to test this soon with the load we have on the current JanusGraph instances.

24 replies

•Created by shivam.choudhary on 7/18/2023 in #questions

ReadOnlyStrategy for remote script execution to make a read only server instance

Sure, let me go through the guideline and I will create one. There's one more thing I found: with g, too, I can override the ReadOnlyStrategy set during initialisation by using the withoutStrategies(ReadOnlyStrategy) configuration.

13 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

No, we haven't changed anything related to the backend connection, but we did change ids.block-size when we were initially ingesting data into the graph, as the data volume was huge. Please find the janusgraph.properties below:

properties:
  storage.backend: hbase
  storage.directory: null
  storage.hbase.ext.google.bigtable.instance.id: ##########
  storage.hbase.ext.google.bigtable.project.id: ##########
  storage.hbase.ext.google.bigtable.app_profile.id: ############
  storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase2_x.BigtableConnection
  storage.hbase.short-cf-names: true
  storage.hbase.table: ###########
  cache.db-cache: false
  cache.db-cache-clean-wait: 20
  cache.db-cache-time: 180000
  cache.db-cache-size: 0.5
  cluster.max-partitions: 1024
  graph.replace-instance-if-exists: true
  metrics.enabled: true
  metrics.jmx.enabled: true
  ids.block-size: "1000000"
  query.batch: true
  query.limit-batch-size: true
  schema.constraints: true
  schema.default: none
  storage.batch-loading: false
  storage.hbase.scan.parallelism: 10

24 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

Sorry, I had a mix-up last time: ids.num-partitions is set to 10 and cluster.max-partitions is set to 1024.

24 replies

•Created by shivam.choudhary on 7/18/2023 in #questions

ReadOnlyStrategy for remote script execution to make a read only server instance

Hi @boxuanli, yes, I have configured storage.read-only = true, but I was curious whether the same can be achieved using strategies. Can you please tell me more about how I can avoid exposing the graph instance to users? The graph instance is not being set as a global variable in the initialization script on server start.

13 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

I'm still trying to rule out whether StandardIDPool has anything to do with this problem. I have tried several approaches but nothing has worked so far. Since the config that sets the size of the StandardIDPool cannot be changed, this is proving a bit challenging.

24 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

@spmallette I checked the longRunCompilationCount metric, which counts the events where script compilation took longer than the expectedCompilationTime (I configured it as 100 milliseconds), and it came out as 0. (Actually it was 1, because of a script that is evaluated by default on startup and took around 2403 ms.) This means query compilation is not taking much time, yet the latency we observe is still of the order of ~500 ms for a few minutes after startup.

24 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

@boxuanli I checked it and the value is set to 1024. Since the config is fixed for the lifetime of the graph, we wanted the graph to be sufficiently partitioned so that we could create partitioned vertex labels for supernodes. As of now we haven't needed to create a partitioned label; are there issues we might face down the line because of this? Also, we use Bigtable as our storage backend, and I'm not sure how it can help here as we mostly have 1 or 2 Bigtable nodes depending on the traffic.

24 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

unfortunately it does not yet handle parameters

We have designed our architecture around the use of parameters in queries, so it won't be possible to switch away from it for now. By the way, I profiled the JVM and it turns out that the ~1000 threads being created belong to StandardIDPool.

24 replies

•Created by shivam.choudhary on 6/27/2023 in #questions

[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts

No description

24 replies