Performance issue in large graphs

When performing changes on a large graph (ca. 100K nodes, 500K edges) stored in a single Kryo file, I am experiencing huge delays. As an example: while the graph is still small I can change 10K nodes in minutes, but once the graph is large the same changes take more than an hour. Is there an easy solution, e.g., breaking the graph down and saving it in smaller files? Any suggestion is helpful. My initial preference is saving to a file system (local or network). Thanks for your suggestions/solutions.
6 Replies
kelvinl2816 · 9mo ago
Are you using TinkerGraph and saving the updates to file, or are you using some other graph database store? In general what you describe is actually quite a small graph, but I'm not sure what technology stack you are using.
Tanvir (OP) · 9mo ago
Hi Kelvin, I am using TinkerGraph. I also tried JanusGraph; it was even slower there, so I had to switch back to TinkerGraph.
kelvinl2816 · 9mo ago
How exactly are you using Kryo here? Are you serializing the whole graph out periodically and then reloading it?
Tanvir (OP) · 9mo ago
I am reading the whole graph and then adding the nodes.
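[Editor's note: if this is the standard TinkerPop whole-graph round trip, it would look roughly like the sketch below. The file name, vertex selection, and property are illustrative assumptions, not from the thread. Note that every save rewrites the entire file, regardless of how few elements actually changed, which matches the slowdown described above.]

```java
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.io.IoCore;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class GryoRoundTrip {
    public static void main(String[] args) throws Exception {
        // Load the entire graph from the Gryo (Kryo-based) file into memory.
        Graph graph = TinkerGraph.open();
        graph.io(IoCore.gryo()).readGraph("graph.kryo"); // file name is illustrative

        // Apply the updates -- e.g. touch ~10K vertices.
        graph.traversal().V().limit(10_000)
             .property("updated", true).iterate();

        // Rewrite the whole graph back to disk. This serializes all
        // ~100K vertices and ~500K edges, not just the changed ones.
        graph.io(IoCore.gryo()).writeGraph("graph.kryo");
        graph.close();
    }
}
```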
Solution
kelvinl2816 · 9mo ago
I'm not sure if any of the other serializations such as GraphML or GraphSON might perform better, but I would say this is likely not a common way we see those graphs used, so we may not have much data on which techniques work best. With the exception of TinkerGraph, which is often used as an in-memory, somewhat ephemeral, graph, we typically see persistent graph stores used, where the data is persisted on disk by the database and you do not need to constantly reload it each time. If you ever have the need to look at commercial graph database engines, you will find tools like bulk loaders that make loading the data easier/faster. I wonder if @spmallette has any thoughts on this one?
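[Editor's note: swapping serializers in TinkerPop is a one-line change, so benchmarking the alternatives mentioned here is cheap. A minimal sketch follows; the file names are illustrative.]

```java
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.io.IoCore;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class SerializerComparison {
    public static void main(String[] args) throws Exception {
        Graph graph = TinkerGraph.open();
        graph.io(IoCore.gryo()).readGraph("graph.kryo");

        // Same graph, three serializers -- time each write to compare.
        graph.io(IoCore.gryo()).writeGraph("graph.kryo");     // binary; typically the most compact
        graph.io(IoCore.graphson()).writeGraph("graph.json"); // JSON; human-readable
        graph.io(IoCore.graphml()).writeGraph("graph.xml");   // XML; interoperable, but limited type support

        graph.close();
    }
}
```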
Tanvir (OP) · 9mo ago
Thanks a lot for the prompt response. Agreed, a persistent store will help a lot; that is probably the solution I will have to opt for in the long run.
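[Editor's note: since JanusGraph already came up earlier in the thread, here is a minimal sketch of what the persistent-store approach could look like with its local BerkeleyDB backend, which keeps the data on the local file system as the OP prefers. The directory path is an assumption. The thread reports JanusGraph being slower in an earlier attempt, so this shows only the pattern, not a tuning guide; for the initial bulk load, JanusGraph's storage.batch-loading option and periodic commits typically help.]

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class PersistentStoreSketch {
    public static void main(String[] args) throws Exception {
        // Data lives on local disk; opening the graph does not reload
        // everything into memory, and commits persist only the changes.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "berkeleyje")
                .set("storage.directory", "/data/janusgraph") // illustrative path
                .open();

        // Mutate ~10K vertices, then commit just those changes.
        graph.traversal().V().limit(10_000)
             .property("updated", true).iterate();
        graph.tx().commit();

        graph.close();
    }
}
```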