What would be the ideal way to set up deep learning on JanusGraph data?
I have set up a simple knowledge graph on JanusGraph and want to use the data to train a deep learning model. I found the DGL-KE library (https://dglke.dgl.ai/doc/index.html), which seems to be widely used, but I am not sure how to go about this.
DGL-KE accepts training data as simple comma-separated strings that define the from_vertex, the relationship, and the to_vertex, e.g. "London", "isCapitalOf", "UK".
I have multiple relationship types and eventually want to get embeddings of the data and use them for recommendations via vector similarity search.
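For the recommendation part, what I picture doing with the trained embeddings is a plain cosine-similarity lookup, roughly like this (the file names and the id-to-name mapping are placeholders for whatever DGL-KE actually outputs):

import numpy as np

# Placeholder: DGL-KE can save entity embeddings to a .npy file after training.
entity_emb = np.load('entity_emb.npy')            # shape: (num_entities, dim)
entities = open('entities.tsv').read().split()    # placeholder id-to-name mapping

def top_k_similar(idx, k=5):
    # Cosine similarity between one entity and all others.
    v = entity_emb[idx]
    sims = entity_emb @ v / (np.linalg.norm(entity_emb, axis=1) * np.linalg.norm(v))
    best = np.argsort(-sims)                      # indices sorted by descending similarity
    return [(entities[i], float(sims[i])) for i in best[1:k + 1]]  # skip the entity itself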
Any help would be appreciated. Thank you!
I haven't personally used DGL, so I can't direct you too much there.
Does DGL support distributed training? If so, I would look into OLAP via Spark to train embeddings in parallel on each node and then reduce those embeddings in the final stage into a single combined model.
However, I haven't looked into DGL and don't know whether it can even merge weights afterwards, so that's something to check.
If your graph is small, you might not need parallel training and could probably use normal OLTP via Gremlin.
For example, you can produce a list of maps with outV, edge label, and inV, as you described above, using something like this:
g.E().project("from_vertex", "relationship", "to_vertex").by(outV().id()).by(label()).by(inV().id()).toList()
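If you're working from Python, here's a rough (untested) sketch of dumping that into a DGL-style triples file with gremlinpython; the server endpoint, file name, and delimiter are just assumptions:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Assumes a Gremlin Server in front of JanusGraph on the default port.
conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

triples = (g.E()
             .project('from_vertex', 'relationship', 'to_vertex')
             .by(__.outV().id_())
             .by(__.label())
             .by(__.inV().id_())
             .toList())

# Write one triple per line; adjust the delimiter to whatever format DGL-KE expects.
with open('train.tsv', 'w') as f:
    for t in triples:
        f.write(f"{t['from_vertex']}\t{t['relationship']}\t{t['to_vertex']}\n")

conn.close()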
It does support distributed training, but my graph is not that big yet, so I will try the query you suggested first. Thank you!
Using OLAP via Spark for training (or even inference) is not really acceptable because of how slow it is. If you have big data, I think the best way is to export the data first.
The scenario right now is that I am already importing data from MongoDB to construct the graph. I don't know how I will be able to maintain everything if I have to export it as well.