dgreco Posts - Answer Overflow

dgreco

•Created by dgreco on 1/15/2024 in #questions

Idempotent upsert, is that possible?

For our project, we need to insert vertices and edges at a very high rate using Spark Streaming. After many tests, we found an approach that seems very promising. In our context, sporadic vertex collisions can occur, so instead of checking whether a vertex exists before inserting it, we decided to use a custom ID. The ID is a hash generated from the vertex properties, so it effectively represents the vertex value: if two vertices have the same property values, they have the same hash. With this approach, we skip the check and simply insert the vertex again, and we find that the existing vertex is overwritten exactly. Looking at the HBase storage layout, it seems OK. We ran additional tests and couldn't find any counterexample. We wonder whether this approach could be dangerous. Performance-wise, by removing the transactions with the existence checks, we reach more than 600K vertex insertions per second. Any comments on this?
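To make the idea concrete, here is a minimal sketch of deriving a deterministic ID from the vertex content. It is not our actual ingestion code: the `vertex_id` and `upsert` helpers, the SHA-256 choice, and the in-memory `store` dict standing in for the graph backend are all assumptions made for illustration.

```python
import hashlib
import json


def vertex_id(label: str, properties: dict) -> str:
    """Derive a deterministic ID from a vertex's label and property values."""
    # Canonical JSON (sorted keys, fixed separators) so that key order
    # never changes the resulting hash.
    canonical = json.dumps(
        {"label": label, "properties": properties},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the graph store: writing the same ID twice
# overwrites the same entry instead of creating a duplicate.
store = {}


def upsert(label: str, properties: dict) -> str:
    """Write the vertex without any existence check and return its ID."""
    vid = vertex_id(label, properties)
    store[vid] = {"label": label, "properties": dict(properties)}
    return vid


a = upsert("person", {"name": "Ada", "born": 1815})
b = upsert("person", {"born": 1815, "name": "Ada"})  # same values, other key order
c = upsert("person", {"name": "Grace", "born": 1906})
```

In this sketch the second write lands on the same key as the first, which is the property we rely on. The sketch only holds if every property that defines vertex identity goes into the hash and the serialization is stable across writers.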

13 replies

JanusGraph

•Created by dgreco on 12/1/2023 in #questions

Accelerating the vertex upsert

We need to accelerate the ingestion rate; the scenario is pretty typical. We may encounter repeated vertices with new relationships, so at each vertex insertion we have to check whether the vertex has already been inserted. Is there a particular recipe for speeding up this step? I assume that this check causes contention in order to maintain consistency. We are considering introducing an external in-memory cache where we accumulate all the vertex IDs, and checking the cache before hitting the DB. Any other suggestions?
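The cache idea could look roughly like the sketch below. Everything here is hypothetical: `SeenCache`, `ensure_vertex`, and the `exists_in_db` / `insert_into_db` callbacks are illustrative names, and a plain `set` plays the role of the database. It also assumes a single writer; with several Spark executors, a local cache like this would not by itself prevent two writers from inserting the same vertex at once.

```python
from collections import OrderedDict


class SeenCache:
    """Bounded LRU set of vertex IDs already known to exist in the database."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._ids = OrderedDict()

    def __contains__(self, vid) -> bool:
        if vid in self._ids:
            self._ids.move_to_end(vid)  # mark as most recently used
            return True
        return False

    def add(self, vid) -> None:
        self._ids[vid] = True
        self._ids.move_to_end(vid)
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the least recently used ID


def ensure_vertex(vid, cache, exists_in_db, insert_into_db) -> bool:
    """Make sure the vertex exists; return True if an insert was issued."""
    if vid in cache:
        return False  # cache hit: no database round trip at all
    inserted = False
    if not exists_in_db(vid):
        insert_into_db(vid)
        inserted = True
    cache.add(vid)  # cache only IDs confirmed to be in the database
    return inserted


# Hypothetical in-memory "database" that records every existence lookup.
db = set()
lookups = []


def exists(vid):
    lookups.append(vid)
    return vid in db


cache = SeenCache(capacity=2)
results = [ensure_vertex(v, cache, exists, db.add) for v in ["a", "b", "a", "a", "c", "a"]]
```

In this run only the first sighting of each ID reaches the database; the repeated `"a"` entries are answered from the cache. Because an evicted ID simply falls back to the database check, a bounded cache costs extra lookups but should not cause duplicates in the single-writer case.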

2 replies

JanusGraph

•Created by dgreco on 10/5/2023 in #questions

Benchmarks

We are trying to benchmark the ingestion rate of JanusGraph (JG), using Solr as the indexing engine. Are there any numbers already available?

2 replies

JanusGraph

•Created by dgreco on 9/28/2023 in #questions

How to run the MapReduce reindexing job

Has anyone succeeded in running the MapReduce reindexing job? We ran into the usual dependency nightmare. I assume we should bundle all the dependencies into an uber-jar, right? Otherwise, we would have to put the JanusGraph dependencies on the YARN node classpath, no?

9 replies

JanusGraph

•Created by dgreco on 7/13/2023 in #questions

JanusGraph limits

I need to build a massive knowledge graph that aggregates many kinds of information, and the logical model of the system naturally fits a property graph. I may need to manage approximately 200B vertices and 500B edges. I'm familiar with HBase, and I know it scales pretty well. I'm just wondering whether any of you have used JanusGraph (with Solr) at a similar scale. Pros? Cons? Hints? Is it risky?

4 replies
