shivam.choudhary
JanusGraph
Created by shivam.choudhary on 8/14/2023 in #questions
Impact of ID Pool Initialisation on Query Performance
@Boxuan Li any suggestions on this? Currently the ID pools get initialised only after the instance starts receiving write requests, and queries time out for ~15 minutes until all the ID pools are initialised. Is there a way to initialise the ID pools before we send write traffic to the JanusGraph instance?
5 replies
JanusGraph
Created by aschwartz on 12/13/2023 in #questions
Reindexing using the Mgmt System
Hi @aschwartz, if you use the first approach, queries that are eligible for the index will not run as expected. Once you enable the index while reindexing is still incomplete, every eligible query will try to use an index that has not been populated yet, resulting in empty results. With the second approach, the index is not eligible for use until the reindexing job completes, after which it automatically moves to the ENABLED state. https://docs.janusgraph.org/schema/index-management/index-lifecycle/#:~:text=.commit()%3B-,Index%20states%20and%20transitions,-The%20diagram%20below
4 replies
Apache TinkerPop
Created by shivam.choudhary on 12/10/2023 in #questions
Implementing Graph Filter for Sub-Graph Loading in Spark Cluster with JanusGraph
Thanks @spmallette, I was able to fire the query this way via the Gremlin console. I'm marking the question as answered, as I was able to apply the filter successfully. However, the job has now been running for the past 36 hours and still hasn't completed. My Spark cluster has 12 executors and the graph data is being read at above 50 MB/sec. Even if Spark scans the full graph of 3.6 TB, I'd guess the job should have completed by now. The OLAP query I fired is:
g.withComputer(Computer.compute().vertices(hasLabel('ticket'))).V().count()
Is there anything I might be missing here? Thanks in advance.
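The expectation in the post can be checked with a quick estimate. The sketch below is only a back-of-the-envelope calculation: it assumes decimal units and that the quoted 50 MB/sec is the steady aggregate read rate across all executors, neither of which the post confirms.

```python
# Rough estimate of a full-graph scan, using the figures quoted in the post.
# Assumptions (not stated in the post): 1 TB = 10**12 bytes, 1 MB = 10**6 bytes,
# and 50 MB/sec is the constant aggregate read rate for the whole Spark job.

def scan_hours(total_bytes: float, bytes_per_second: float) -> float:
    """Return the hours needed to read total_bytes at a steady bytes_per_second."""
    return total_bytes / bytes_per_second / 3600


graph_size_bytes = 3.6 * 10**12  # 3.6 TB, as stated in the post
read_rate = 50 * 10**6           # 50 MB/sec, the lower bound quoted in the post

estimate = scan_hours(graph_size_bytes, read_rate)
print(f"Estimated full scan: {estimate:.1f} hours")
```

Under those assumptions the scan alone would take roughly 20 hours, so a job still running after 36 hours is slower than the read rate by itself would suggest.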
4 replies
JanusGraph
Created by shivam.choudhary on 8/14/2023 in #questions
Impact of ID Pool Initialisation on Query Performance
Hi Everyone, we are still facing the above issue, any help will be greatly appreciated. TIA.
5 replies
JanusGraph
Created by 4j4y. on 8/29/2023 in #questions
Does the quintillion edges limitation still exist after introducing custom vertex id support?
Also, we sometimes have to restart for the following reasons: 1. Mapping additional props to a vertex/edge label. 2. Changing the state of a newly created index from INSTALLED to REGISTERED. In both cases the changes are not reflected unless we restart. We tried waiting for hours for the changes to be communicated to all the instances via the backend, but they only take effect after a redeployment.
7 replies
JanusGraph
Created by 4j4y. on 8/29/2023 in #questions
Does the quintillion edges limitation still exist after introducing custom vertex id support?
I’m not sure how you calculate ids exhaustion.
@porunov Most of our ids got wasted due to server restarts, as we have JanusGraph server running on Kubernetes pods (~20-30 pods). Also, we have set cluster.max-partitions to 1024, which we didn't know at the start would reserve 10 bits of the id, giving 2^50 ids for edges and 2^49 for vertices. Initially we ingested a lot of data into the graph with block-size set to 1 million, and recently, while going through the JanusGraph code, we found that the JanusGraph id pool for the edge namespace has a block size 8 times the base block size, which equates to 8 million per id pool. So one deployment costs us 8m * (1024 id pools) * (~25 pods) = ~204 billion edge ids, and we had a lot of deployments in the starting phase. We are planning to move JanusGraph server to VMs and reduce the block size, but unfortunately we cannot change cluster.max-partitions as it is a fixed config. Please let us know if our understanding is wrong, and whether there are any more steps we can take to reduce the wastage of ids.
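The arithmetic in the post can be reproduced with a short sketch. The 8x edge multiplier, the 1024 id pools and the ~25 pods are taken from the post as reported, not verified against the JanusGraph source. The function name and the smaller block size of 10,000 are hypothetical, chosen only to illustrate how the planned reduction would scale.

```python
# Reproduces the per-deployment id cost described in the post.
# All inputs are the poster's figures; the function name is hypothetical.

def edge_ids_reserved_per_deployment(
    block_size: int, edge_multiplier: int, id_pools: int, pods: int
) -> int:
    """Edge ids reserved when every pod allocates one block from every id pool."""
    return block_size * edge_multiplier * id_pools * pods


current = edge_ids_reserved_per_deployment(
    block_size=1_000_000,  # ids.block-size from the post
    edge_multiplier=8,     # multiplier the poster reports finding in the code
    id_pools=1024,         # one pool per partition (cluster.max-partitions)
    pods=25,               # approximate pod count from the post
)
print(f"Current settings: {current:,} edge ids per deployment")

# Hypothetical smaller block size, purely to show how the cost scales.
reduced = edge_ids_reserved_per_deployment(10_000, 8, 1024, 25)
print(f"With block size 10,000: {reduced:,} edge ids per deployment")
```

With the poster's figures this gives 204,800,000,000 ids per deployment, matching the ~204 billion quoted. Since the cost is linear in the block size, reducing it shrinks the per-deployment reservation by the same factor.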
7 replies
Apache TinkerPop
Created by shivam.choudhary on 7/31/2023 in #questions
User-Agent Metric Not Exposed in Gremlin Server - Need Help Troubleshooting
Hello @colegreer, I'm including the User-Agent header as a part of the handshake request headers. We're dealing with several clients for our Gremlin server, and we intended to utilise User-Agent metrics to monitor the throughput we're getting from these different clients. However, considering your remark that "It will not look for that header in subsequent messages sent via the existing connection," it appears that we can only monitor the count of connections per client. This leads me to believe that we might not have the capability to track it in the manner we thought?
9 replies
Apache TinkerPop
Created by shivam.choudhary on 7/31/2023 in #questions
User-Agent Metric Not Exposed in Gremlin Server - Need Help Troubleshooting
Hello @colegreer and @spmallette , I've attempted to send WebSocket requests to the server with the 'User-Agent' header, but unfortunately, I'm still not able to observe the metrics even under these conditions. Could you kindly assist me in identifying what I might be overlooking? Thank you in advance for your help!
9 replies
Apache TinkerPop
Created by shivam.choudhary on 7/18/2023 in #questions
ReadOnlyStrategy for remote script execution to make a read only server instance
Yeah makes sense, thanks for the clarification
13 replies
Apache TinkerPop
Created by shivam.choudhary on 7/18/2023 in #questions
ReadOnlyStrategy for remote script execution to make a read only server instance
13 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
Currently I'm working on setting up the read-only JanusGraph instances. I will be able to test it soon with the load we have on the current JanusGraph instances.
24 replies
Apache TinkerPop
Created by shivam.choudhary on 7/18/2023 in #questions
ReadOnlyStrategy for remote script execution to make a read only server instance
Sure, let me go through the guidelines and I will create one. There's one more thing I found: with g, I can also override the ReadOnlyStrategy set during initialisation by calling withoutStrategies(ReadOnlyStrategy).
13 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
No, we haven't changed anything related to the backend connection, but we did change ids.block-size when we were initially ingesting data into the graph, as the data was huge. Please find the janusgraph.properties below:
properties:
storage.backend: hbase
storage.directory: null
storage.hbase.ext.google.bigtable.instance.id: ##########
storage.hbase.ext.google.bigtable.project.id: ##########
storage.hbase.ext.google.bigtable.app_profile.id: ############
storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase2_x.BigtableConnection
storage.hbase.short-cf-names: true
storage.hbase.table: ###########
cache.db-cache: false
cache.db-cache-clean-wait: 20
cache.db-cache-time: 180000
cache.db-cache-size: 0.5
cluster.max-partitions: 1024
graph.replace-instance-if-exists: true
metrics.enabled: true
metrics.jmx.enabled: true
ids.block-size: "1000000"
query.batch: true
query.limit-batch-size: true
schema.constraints: true
schema.default: none
storage.batch-loading: false
storage.hbase.scan.parallelism: 10
24 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
Sorry, I had a mix-up last time: ids.num-partitions is set to 10 and cluster.max-partitions is set to 1024.
24 replies
Apache TinkerPop
Created by shivam.choudhary on 7/18/2023 in #questions
ReadOnlyStrategy for remote script execution to make a read only server instance
Hi @boxuanli, yes, I have configured storage.read-only = true, but I was curious whether the same can be achieved using strategies. Can you please tell me more about how I can avoid exposing the graph instance to users? The graph instance is not being set as a global variable in the initialisation script on server start.
13 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
I'm still trying to find a way to rule out whether StandardIDPool has anything to do with this problem. I have tried several ways but nothing has worked so far. Since the config that sets the size of the StandardIDPool cannot be changed, it is getting a bit challenging.
24 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
@spmallette I checked this metric, longRunCompilationCount, which gives the count of events where the script compilation time was more than the expectedCompilationTime (I configured it as 100 milliseconds), but it came out to be 0. (Actually it was 1, due to a script which gets evaluated on startup by default and took around 2403ms.) This means that query compilation is not taking that much time, but the latency we are observing is still of the order of ~500ms for a few minutes after startup.
24 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
@boxuanli I checked it and the value is set to 1024. As the config is fixed for the lifetime of the graph, we wanted the graph to be sufficiently partitioned so that we could create partitioned vertex labels for supernodes. So far we haven't needed to create a partitioned label. Are there issues we might face down the line because of this? Also, we use Bigtable as our storage backend, and I'm not sure how it can help here, as we mostly run 1 or 2 Bigtable nodes depending on the traffic.
24 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
unfortunately it does not yet handle parameters
We have designed the architecture around the use of parameters in queries, so it won't be possible to switch away from it for now. By the way, I profiled the JVM, and it turns out that the ~1000 threads being created belong to StandardIDPool.
24 replies
Apache TinkerPop
Created by shivam.choudhary on 6/27/2023 in #questions
[parameterized queries] Increased time in query evaluation when gremlin server starts/restarts
24 replies