What would be the ideal way to set up deep learning on JanusGraph data?

I have set up a simple knowledge graph on JanusGraph and want to use the data to train a deep learning model. I found the DGL library (DGL-KE: https://dglke.dgl.ai/doc/index.html), which seems to be the most widely used option, but I am not sure how to go about this. DGL accepts training data as simple comma-separated triples that define the from_vertex, relationship, and to_vertex, e.g. "London", "isCapitalOf", "UK". I have multiple relationship types and eventually want to get embeddings of the data and use them for recommendations with vector similarity search. Any help would be appreciated. Thank you.
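To make the end goal concrete, here is a minimal sketch of the "recommend by vector similarity" step, assuming the trained embeddings have already been loaded into a plain dict of entity name to vector. The function names and the three-dimensional vectors are made up for illustration; real embeddings would come from the training run, and a dedicated vector index would likely replace this brute-force scan on larger graphs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_id, embeddings, k=3):
    """Return up to k (entity, score) pairs most similar to query_id, best first."""
    query = embeddings[query_id]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != query_id  # do not recommend the query entity itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, invented for this example only.
embeddings = {
    "London": [0.9, 0.1, 0.0],
    "Paris": [0.8, 0.2, 0.1],
    "UK": [0.0, 1.0, 0.2],
}
```

Calling top_k_similar("London", embeddings, k=2) ranks the other entities by cosine similarity to "London".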
4 Replies
porunov, 6mo ago
I haven't personally used DGL, so I can't offer much direction there. Does DGL support distributed training? If so, I would look into OLAP via Spark to train embeddings in parallel on each node and then reduce those embeddings in a final stage into a single combined model. However, I haven't looked into DGL and don't know whether it can even merge weights afterwards, so that's something to check. If your graph is small, you might not need parallel training and could probably use normal OLTP via Gremlin. For example, you can produce a list of maps with the out-vertex, edge label, and in-vertex, as you asked above, with something like this: g.E().project("from_vertex", "relationship", "to_vertex").by(outV().id()).by(label()).by(inV().id()).toList()
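As a rough sketch of the next step, the list of maps returned by a query like that can be written out as one triple per line for the training tool to read. The helper name write_triples, the file name, and the sample rows below are hypothetical, and the tab delimiter is an assumption, so check the expected format and delimiter against the DGL-KE documentation. Note also that the query above returns vertex IDs; if you want readable names instead, you would project a property (for example a "name" property, if your schema has one) rather than id().

```python
import csv


def write_triples(rows, path, delimiter="\t"):
    """Write from_vertex/relationship/to_vertex maps as one delimited triple per line.

    Returns the number of triples written.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        for row in rows:
            writer.writerow(
                [row["from_vertex"], row["relationship"], row["to_vertex"]]
            )
            count += 1
    return count


# Hypothetical rows shaped like the maps the Gremlin query would return.
rows = [
    {"from_vertex": "London", "relationship": "isCapitalOf", "to_vertex": "UK"},
    {"from_vertex": "Paris", "relationship": "isCapitalOf", "to_vertex": "France"},
]
```

For example, write_triples(rows, "train.tsv") would produce a two-line file. The function accepts any iterable, so a larger graph could be streamed through it in batches instead of being collected with toList() first.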
karthikraju (OP), 6mo ago
It does support distributed training, but my graph is not that big yet, so I will give the query you suggested a try first. Thank you!
Bo, 6mo ago
Using OLAP via Spark for training (or even inference) is not really practical because it is slow. If you have big data, I think the best way is to export the data first.
karthikraju (OP), 6mo ago
The scenario right now is that I am already importing data from MongoDB to construct the graph. I don't know how I will be able to maintain everything if I have to export it as well.
