Custom ID best practice
Hey, I'm looking into using custom IDs for my DB vertices and came across this:
https://github.com/JanusGraph/janusgraph/issues/1221#issuecomment-938060054
It mentions potentially moving to UUIDs internally. Can I ask what the overall consensus/traction on this is? And does this mean it's now preferable for new graphs to use custom IDs with UUIDs? Thank you.
4 Replies
There's no native support for UUID. But you can convert your UUID to string format and use that as your custom ID.
Note that a UUID string contains "-", which is a reserved character in JanusGraph. You need to either replace it with something else or override the reserved character as described here: https://docs.janusgraph.org/advanced-topics/custom-vertex-id/#override-reserved-character
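As a rough sketch of the "convert it to something else" option, the snippet below strips the hyphens from a UUID so that the resulting 32-character string no longer contains the reserved character, and puts them back when the original UUID is needed. It uses only `java.util.UUID`; the class and method names (`UuidVertexId`, `toVertexId`, `fromVertexId`) are made up for illustration and are not part of JanusGraph.

```java
import java.util.UUID;

// Hypothetical helper, not a JanusGraph API.
class UuidVertexId {

    // Drop the hyphens so the ID no longer contains the reserved "-" character.
    public static String toVertexId(UUID uuid) {
        return uuid.toString().replace("-", "");
    }

    // Re-insert the hyphens (8-4-4-4-12 grouping) and parse back into a UUID.
    public static UUID fromVertexId(String id) {
        String dashed = id.replaceFirst(
                "(\\w{8})(\\w{4})(\\w{4})(\\w{4})(\\w{12})",
                "$1-$2-$3-$4-$5");
        return UUID.fromString(dashed);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        String vertexId = toVertexId(original);
        System.out.println(vertexId + " (" + vertexId.length() + " chars)");
        System.out.println(fromVertexId(vertexId).equals(original));
    }
}
```

The hyphen-free string would then be passed as the custom vertex ID when creating the vertex. Stripping the hyphens also shortens the ID from 36 to 32 characters.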
Note that string IDs are usually larger than the 8-byte Long data type, so using them adds some overhead.
I would use custom vertex id in String data type in the following cases:
- Data migration from the other system where String ids were used.
- Application specific logic which prefers string ids.
- 8 bytes isn’t enough for you for some reason.
In all other cases I would stick to Long data type.
By storing UUIDs as strings you will end up using 36 bytes instead of 16 bytes (native UUID). That is 2.25 times bigger than a native UUID and 4.5 times bigger than the default Long data type.
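The numbers above can be checked with a few lines of Java. This is only a sketch of the raw payload sizes (UTF-8 string, 16-byte binary UUID, 8-byte long); it does not measure how JanusGraph or the storage backend actually encodes IDs, and the class name `IdSizeComparison` is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper that compares raw ID payload sizes only.
class IdSizeComparison {

    // Canonical UUID text form: 32 hex digits plus 4 hyphens.
    public static int stringIdBytes(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Binary form: two 64-bit halves packed into one buffer.
    public static int binaryUuidBytes(UUID uuid) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array().length;
    }

    // Size of the default Long ID type.
    public static int longIdBytes() {
        return Long.BYTES;
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        int asString = stringIdBytes(uuid);
        int asBinary = binaryUuidBytes(uuid);
        int asLong = longIdBytes();
        System.out.println("string: " + asString + " bytes");
        System.out.println("binary UUID: " + asBinary + " bytes");
        System.out.println("long: " + asLong + " bytes");
        System.out.println("string / binary UUID = " + (double) asString / asBinary);
        System.out.println("string / long = " + (double) asString / asLong);
    }
}
```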
Thanks for the clarification guys, got confused a little bit because I saw mention of possible performance gain from using custom ID. And some people pushing for string ID replacing Long ID as the default option.
Would be nice if the docs mentioned what the recommended approach is and why. Useful for new users like me 🙂
There's a trade-off, and which option is more performant depends on your use case. As @Oleksandr Porunov mentioned, string IDs are usually bigger than the 8-byte Long data type, which means more network and storage overhead.
On the other hand, what we have seen a lot in the past is that without the support for custom string ID (prior to 1.0.0 there was no such support), customers ended up creating a vertex property to store their "ID"s. Since they also need to look up vertices by their custom "ID" field, they ended up creating an index. Default long ID + custom "ID" property + index has way more overhead than just a custom string ID.