How to run the MapReduce reindexing job

Has anyone succeeded in running the MapReduce reindexing job? We ran into the usual dependency nightmare. I assume we should bundle all the dependencies into an uber-jar, right? Otherwise we would have to put the JanusGraph dependencies on the YARN node classpath, no?
5 Replies
Bo (15mo ago)
I used to have a working setup on a YARN cluster, but I can't find it anymore. IIRC, an uber-jar is the way to go.
Regarding "we should put in the yarn node classpath the janusgraph dependencies": I'm not 100% sure, but I don't think I did this.
dgreco (OP, 15mo ago)
Thank you 🙏 The uber-jar seems the most plausible solution. Then there is the usual mess of putting all the dependencies together. One last point: did you ever think about creating a reindexing job based on Spark instead of MR? It would be more portable; MR is tied to the Hadoop environment, YARN, etc.
Bo (15mo ago)
I agree we should gradually move away from MapReduce, or at least allow people to reindex using Spark. I don't foresee that happening in the near future unless someone is willing to tackle it. There's an ad hoc way to do it yourself: use a Spark job to scan all vertices, update the properties that you want to reindex, and commit. It could be a no-op update that just rewrites the value in place without changing it, but it will trigger reindexing for that vertex/edge (if I recall correctly).
Bo (15mo ago)
In case you don't know how to "use Spark job to scan all vertices ... and commit", here's an example: https://github.com/Citegraph/citegraph/blob/main/backend/src/main/java/io/citegraph/data/spark/loader/VertexPropertyEnricher.java
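For readers who only want the shape of that scan/update/commit loop, here is a minimal sketch in plain Java. It is not taken from the linked file and does not use the JanusGraph or Spark APIs: `GraphTx` is a hypothetical stand-in for a graph transaction, `InMemoryTx` is a fake that exists only so the sketch runs, and `touchAll` and `batchSize` are illustrative names. The assumption is that in a real job each Spark partition would run a loop like this against its own JanusGraph transaction, and that rewriting a property with its current value triggers reindexing, as recalled above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NoOpReindexSketch {

    /** HYPOTHETICAL transaction interface; an assumption, not a JanusGraph API. */
    interface GraphTx {
        Object getProperty(long vertexId, String key); // null when the property is absent
        void setProperty(long vertexId, String key, Object value);
        void commit();
    }

    /**
     * Rewrites `key` on each vertex with its current value (a no-op update),
     * committing after every `batchSize` writes and once more for any remainder.
     * Returns the number of vertices that were rewritten.
     */
    static int touchAll(Iterator<Long> vertexIds, String key, int batchSize, GraphTx tx) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int touched = 0;
        int pending = 0;
        while (vertexIds.hasNext()) {
            long id = vertexIds.next();
            Object value = tx.getProperty(id, key);
            if (value == null) {
                continue; // vertex does not carry the property; nothing to rewrite
            }
            tx.setProperty(id, key, value); // same value written back in place
            touched++;
            pending++;
            if (pending == batchSize) {
                tx.commit();
                pending = 0;
            }
        }
        if (pending > 0) {
            tx.commit(); // flush the last partial batch
        }
        return touched;
    }

    /** In-memory fake, present only to make the sketch runnable. */
    static final class InMemoryTx implements GraphTx {
        final Map<Long, Map<String, Object>> vertices = new LinkedHashMap<>();
        int commits = 0;

        void put(long vertexId, String key, Object value) {
            vertices.computeIfAbsent(vertexId, k -> new HashMap<>()).put(key, value);
        }

        @Override
        public Object getProperty(long vertexId, String key) {
            Map<String, Object> props = vertices.get(vertexId);
            return props == null ? null : props.get(key);
        }

        @Override
        public void setProperty(long vertexId, String key, Object value) {
            put(vertexId, key, value);
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        InMemoryTx tx = new InMemoryTx();
        tx.put(1L, "name", "a");
        tx.put(2L, "name", "b");
        tx.put(3L, "name", "c");
        tx.put(4L, "age", 7); // no "name" property, so it is skipped
        List<Long> ids = new ArrayList<>(tx.vertices.keySet());
        int touched = touchAll(ids.iterator(), "name", 2, tx);
        System.out.println(touched + " vertices touched, " + tx.commits + " commits");
    }
}
```

Committing in batches rather than once per vertex or once at the end is a design choice made for this sketch; the right batch size for a real cluster is something you would have to tune.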
dgreco (OP, 15mo ago)
Thank you so much, very helpful!