aschwartz
JanusGraph
Created by aschwartz on 1/23/2024 in #questions
Vertex ID collisions
Hello! I'm following up on my questions from https://discord.com/channels/981533699378135051/1198566881360093274/1198566881360093274 and https://discord.com/channels/981533699378135051/1188579609667711047

We want to migrate our existing JanusGraph from Bigtable to Cassandra. One hypothesis was to leverage JG 1.0 custom vertex IDs to import a GraphSON file while keeping the original IDs. The experiment went like this:
1. Export the graph with JG 0.6, using graph.io(graphson()).writeGraph("/tmp/my-graph.json")
2. Bootstrap a JG 1.0 instance against Cassandra, setting graph.set-vertex-id to true.
3. Import the graph with graph.io(IoCore.graphson()).readGraph('/tmp/my-graph.json') (we make sure to call graph.tx().commit() at the end)
4. Rebuild our Elastic indices.
5. Set graph.set-vertex-id to false and shut down the instance.
6. Bring back JG 0.6.4 working with Cassandra, and resume operation.

The import worked and we were able to see the vertices with their original IDs. The show-stopper came later: when we tried resuming our business logic, we started experiencing vertex ID collisions. New vertices would get the IDs of existing vertices and overwrite them, resulting in a corrupted graph.

Some things we experimented with that did not work:
* Explicitly setting ids.authority.conflict-avoidance-mode to GLOBAL_AUTO.
* Increasing ids.block-size to a number slightly higher than the number of vertices we import (~26,000)

@Florian Hockmann @Boxuan Li I would appreciate any ideas and insights on this. I know that jumping between JG versions doesn't sound like a great idea, but we are still working on migrating our Python codebase to TinkerPop 3.7.0.

Many thanks in advance!
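For anyone reproducing this, one offline sanity check on the export is to look at which vertex IDs the GraphSON file actually contains (count, range, and any repeats) before and after the import. The sketch below is only an illustration under assumptions: it assumes the file written by writeGraph has one JSON vertex per line with a top-level "id" field, which may be either a plain number or a typed wrapper such as {"@type": "g:Int64", "@value": 4136}. The helper names vertex_id and id_stats are hypothetical, not part of JanusGraph or TinkerPop, and this does not explain or fix the collisions described above.

```python
import json
import sys


def vertex_id(raw):
    """Unwrap a GraphSON id: typed ({"@type": ..., "@value": ...}) or plain."""
    if isinstance(raw, dict) and "@value" in raw:
        return raw["@value"]
    return raw


def id_stats(lines):
    """Return (distinct_count, min_id, max_id, duplicate_ids).

    `lines` is any iterable of strings, one JSON vertex per line
    (assumed layout of the exported file). Blank lines are skipped.
    """
    seen = set()
    duplicates = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        vid = vertex_id(json.loads(line)["id"])
        if vid in seen:
            duplicates.add(vid)
        seen.add(vid)
    if not seen:
        return 0, None, None, duplicates
    return len(seen), min(seen), max(seen), duplicates


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python id_stats.py /tmp/my-graph.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        count, lowest, highest, dups = id_stats(handle)
    print(f"vertices={count} min_id={lowest} max_id={highest} duplicates={len(dups)}")
```

Running it against an export taken before the migration and another taken after resuming writes would show whether the set of IDs changed, which may help narrow down where new IDs start overlapping with imported ones.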
2 replies
JanusGraph
Created by aschwartz on 1/21/2024 in #questions
Migrating from Bigtable to Cassandra
Hello! I want to explore migrating our JanusGraph storage from Bigtable to Cassandra. One suggestion is to check Google's Dataflow export to Parquet, but I'm not sure whether the underlying storage schema would be the same. I'm hoping not to resort to a GraphSON export, because we want to keep the vertex IDs, and we still have some showstoppers when testing JG 1.0. If anyone has ever attempted this, I'll be happy to hear suggestions. Many thanks in advance.
2 replies
JanusGraph
Created by aschwartz on 12/24/2023 in #questions
Using custom vertex IDs for import/export
Hi all, According to some tests done a long time ago, exporting and importing data using io.graphson read / write does not preserve vertex IDs. Will enabling custom vertex IDs allow us to preserve them? The use case in question is migrating JanusGraph between different storage backends (Bigtable to Cassandra). Ideally we'd like to disable custom IDs again afterwards. I've seen some places describe the setting as global offline, and others as fixed. I'm not sure which is right. Many thanks!
6 replies
JanusGraph
Created by aschwartz on 12/13/2023 in #questions
Reindexing using the Mgmt System
Hi all! We have an internal debate on how best to perform a reindex after adding a new index. On JanusGraph 0.6, which of these options is preferred, and why?
mgmt.buildIndex('IDX_NAME', Vertex.class).addKey(key1, Mapping.STRING.asParameter()).addKey(key2, Mapping.TEXTSTRING.asParameter()).addKey(key3).buildMixedIndex("search")
mgmt.commit()

ManagementSystem.awaitGraphIndexStatus(graph, 'IDX_NAME').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
mgmt.updateIndex('IDX_NAME', SchemaAction.REINDEX).get()
mgmt.commit()
vs (this looks more like the examples in the documentation)
mgmt.buildIndex('IDX_NAME', Vertex.class).addKey(key1, Mapping.STRING.asParameter()).addKey(key2, Mapping.TEXTSTRING.asParameter()).addKey(key3).buildMixedIndex("search")
mgmt.commit()

ManagementSystem.awaitGraphIndexStatus(graph, 'IDX_NAME').status(SchemaStatus.REGISTERED).call()
mgmt.updateIndex('IDX_NAME', SchemaAction.REINDEX).get()
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'IDX_NAME').status(SchemaStatus.ENABLED).call()
A follow-up question: what happens to the index while the index job is running? Is it usable? Does it transition between states? Thanks!
4 replies