dgreco
JanusGraph
Created by dgreco on 1/15/2024 in #questions
Idempotent upsert, is that possible?
Thank you so much
13 replies
Yes, it seems to be working; we tested all the different scenarios, and I think that this is strictly related to how data are stored in the backend. We observed the same behavior using Cassandra.
With the internal id this mechanism doesn't work: you don't get an idempotent upsert, so you would need to check whether the vertex exists (by some property) before inserting. In a highly concurrent scenario with an eventually consistent backend like Cassandra, I think the risk of ghost nodes is very high. Moreover, keeping transactions on (bulkloading = false) potentially synchronises all the writers to ensure consistency. A possible solution could be a single-writer model, where all equal nodes are always written by the same writer process. I think it's possible; we thought about a potential implementation based on Spark Streaming. The question is how to scale the insertion. With the solution we implemented, we reached almost 500K vertex insertions per second on our cluster.
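To make the single-writer model above concrete, here is a minimal sketch of how equal nodes could always be routed to the same writer. It is plain Python, not Spark Streaming or JanusGraph code; `writer_for`, `partition`, and the number of writers are hypothetical names chosen for illustration.

```python
import hashlib


def writer_for(key: str, num_writers: int) -> int:
    """Map a node key to a writer index in a stable, process-independent way."""
    # SHA-256 is used instead of Python's built-in hash(), which is salted
    # per process and would route the same key differently on each run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_writers


def partition(keys, num_writers: int):
    """Group keys into one bucket per writer (hypothetical helper)."""
    buckets = [[] for _ in range(num_writers)]
    for key in keys:
        buckets[writer_for(key, num_writers)].append(key)
    return buckets


if __name__ == "__main__":
    # Duplicate keys end up in the same bucket, so only one writer sees them.
    for index, bucket in enumerate(partition(["a", "b", "a", "c"], 3)):
        print(index, bucket)
```

Since every occurrence of a key lands on one writer, that writer can deduplicate locally without coordinating with the others. Whether this scales as well as the approach described above would have to be measured.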
Did it happen when creating vertexes with the same custom-defined id? This should work only if you insert vertexes with the same id.
BTW, we tested the same approach with the Cassandra backend, and we got the same result, so it could be a generic recipe for enabling fast streaming of vertexes and edges
we use sha256
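As an illustration of the SHA-256 idea, here is a minimal sketch of deriving a deterministic vertex id from a natural key, so that writing the same vertex twice hits the same id. The `vertex_id` function, the label/key layout, and the `InMemoryStore` class are assumptions made for this example; the store is only a stand-in for a key-value backend and is not the JanusGraph API.

```python
import hashlib


def vertex_id(label: str, natural_key: str) -> str:
    """Derive a deterministic id from a vertex label and its natural key."""
    # The unit separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = f"{label}\x1f{natural_key}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class InMemoryStore:
    """Stand-in for a key-value backend (hypothetical, for illustration only)."""

    def __init__(self):
        self.rows = {}

    def upsert(self, vid: str, properties: dict) -> None:
        # Writing the same id again overwrites in place, so repeats are harmless.
        self.rows.setdefault(vid, {}).update(properties)


if __name__ == "__main__":
    store = InMemoryStore()
    vid = vertex_id("person", "alice@example.com")
    store.upsert(vid, {"name": "Alice"})
    store.upsert(vid, {"name": "Alice"})  # replayed write, no duplicate vertex
    print(len(store.rows))
```

Because the id depends only on the input data, no existence check is needed before the write. That is the property that makes the upsert idempotent in this sketch.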
JanusGraph
Created by dgreco on 9/28/2023 in #questions
How to run the mapreduce reindexing job
Thank you so much, very helpful.
9 replies
One last point: did you ever think about creating a reindexing job based on Spark instead of MR? It would be more portable; MR is restricted to the Hadoop environment, YARN, etc.
Thank you 🙏 The uber-jar seems the most plausible solution. Then there is the usual mess of putting all the dependencies together.
JanusGraph
Created by dgreco on 7/13/2023 in #questions
Janusgraph limits
Interesting, thanks. I wish I had an example with HBase.
4 replies