Performance issue in large graphs
When performing changes in a large graph (ca. 100K nodes, 500K edges) stored in a single Kryo file, I am experiencing huge delays. As an example, when initially writing the graph I can change 10K nodes in minutes, but once the graph is large the same changes take more than an hour. Is there any easy solution, e.g., breaking the graph down and saving it in smaller files? Any suggestion is helpful. My initial preference is saving to a file system (local or network). Thanks for your suggestions/solutions.
Are you using TinkerGraph and saving the updates to file, or are you using some other graph database store? In general what you describe is actually quite a small graph, but I'm not sure what technology stack you are using.
Hi Kelvin, I am using TinkerGraph. I also tried JanusGraph - it was even slower there, so I had to switch back to TinkerGraph.
How exactly are you using Kryo here? Are you serializing the whole graph out periodically and then reloading it?
I am reading the whole graph and then adding the nodes.
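For context, a minimal sketch of that read-modify-rewrite pattern using TinkerGraph's Gryo (Kryo-based) I/O might look like the following; the file name, label, and property are hypothetical:

```java
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.structure.io.IoCore;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class GryoRoundTrip {
    public static void main(String[] args) throws Exception {
        // Load the entire graph from the Gryo (Kryo-based) file into memory.
        TinkerGraph graph = TinkerGraph.open();
        graph.io(IoCore.gryo()).readGraph("graph.kryo"); // hypothetical file name

        // Apply the updates in memory, e.g. adding/changing 10K nodes.
        for (int i = 0; i < 10_000; i++) {
            Vertex v = graph.addVertex("node"); // hypothetical label
            v.property("idx", i);               // hypothetical property
        }

        // Persisting means re-serializing the WHOLE graph, no matter how
        // small the change set is -- the cost grows with total graph size
        // (100K nodes / 500K edges), not with the number of updated nodes.
        graph.io(IoCore.gryo()).writeGraph("graph.kryo");
        graph.close();
    }
}
```

If this matches your setup, the last step would explain why save time scales with the size of the graph rather than with the size of the update.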
Solution
I'm not sure whether any of the other serializations, such as GraphML or GraphSON, might perform better, but this is likely not a common way we see these graphs used, so we may not have much data on which techniques work best. With the exception of TinkerGraph, which is often used as an in-memory, somewhat ephemeral graph, we typically see persistent graph stores used, where the data is persisted on disk by the database and you do not need to constantly reload it each time. If you ever need to look at commercial graph database engines, you will find tools like bulk loaders that make loading the data easier/faster. I wonder if @spmallette has any thoughts on this one?
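If you want to experiment, the same io() API can target the other formats mentioned above; a minimal comparison sketch, with hypothetical file names:

```java
import org.apache.tinkerpop.gremlin.structure.io.IoCore;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class FormatComparison {
    public static void main(String[] args) throws Exception {
        TinkerGraph graph = TinkerGraph.open();
        graph.io(IoCore.gryo()).readGraph("graph.kryo"); // hypothetical source file

        // Write the same graph out in the alternative formats to compare
        // file size and write time. GraphSON is text-based (JSON) and
        // GraphML is XML, so both are often larger/slower than binary Gryo,
        // but measuring on your own data is the only reliable check.
        graph.io(IoCore.graphson()).writeGraph("graph.json");
        graph.io(IoCore.graphml()).writeGraph("graph.xml");
        graph.close();
    }
}
```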
Thanks a lot for the prompt response. Agreed, a persistent store will help a lot. I think that is probably the solution I have to opt for in the long run.
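For the long run, the key difference is that a persistent store commits changes incrementally instead of re-serializing the whole graph on every save. As a rough sketch of what that could look like with JanusGraph and a local BerkeleyDB backend (the storage directory, label, and property are hypothetical, and whether it outperforms your current setup would need measuring, given it was slower for you before):

```java
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class PersistentStoreSketch {
    public static void main(String[] args) {
        // Open (or create) a local, disk-backed graph; no upfront full load.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "berkeleyje")
                .set("storage.directory", "/tmp/graphdb") // hypothetical path
                .open();

        // Only the changed elements are written on commit, rather than
        // rewriting the entire graph file.
        Vertex v = graph.addVertex("node"); // hypothetical label
        v.property("idx", 1);               // hypothetical property
        graph.tx().commit();

        graph.close();
    }
}
```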