JanusGraph 10mo ago
rpuga

~20% write performance hit when using custom str IDs?

Hi, I've been experimenting with custom vertex IDs. I have a process that reads data from a file and writes nodes (or updates them with mergeV()) to a JanusGraph 1.0.0 instance with a Cassandra+ES backend. Keeping the client code and test data exactly the same, I noticed a ~20% write slowdown when writing nodes with custom string IDs rather than custom int IDs. The ID values are identical in both cases; the only difference is that in one case I convert the int to a string before submitting the query to the JG server (I'm using parameterized scripts submitted via gremlin-python). Is this a known issue? I could not find anything about it in the documentation.
3 Replies
Bo 10mo ago
Can you provide an example pair of int ID and string ID? Is the string ID significantly longer than the int ID? Do your nodes have properties and edges? If it's all dummy data (no properties), then it's expected that int IDs (represented as the long data type in JanusGraph) would be faster. After all, a long typically uses less memory than a String.
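To make the size argument concrete, here is a rough sketch that compares a fixed 8-byte long with the UTF-8 bytes of the same number written as text. The sample ID is arbitrary and the helper name `encoded_sizes` is made up for illustration. JanusGraph and Cassandra use their own encodings, so the numbers only show the general direction, not actual storage cost.

```python
import struct


def encoded_sizes(raw_id: int) -> dict:
    """Compare two naive encodings of the same ID (illustrative only)."""
    as_long = struct.pack(">q", raw_id)    # fixed-width 8-byte signed long
    as_text = str(raw_id).encode("utf-8")  # one byte per ASCII digit
    return {"long_bytes": len(as_long), "string_bytes": len(as_text)}


print(encoded_sizes(1234567890))
# {'long_bytes': 8, 'string_bytes': 10}
```

A two-byte difference per ID looks too small to explain a large slowdown by itself, which is why the questions about ID length and properties matter.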
rpuga (OP) 10mo ago
Hi @Boxuan Li, here is an example:
Int ID: 3232243223
Str ID: "3232243223"
Everything except the ID data type is exactly the same. Each inserted vertex has 2 properties (besides a label and the custom ID): a Long and a String. I expected some write performance degradation from using custom str IDs, but I was surprised to see a ~20% performance impact, which seems quite significant. That's why I was wondering if this is a known issue.
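For anyone trying to reproduce this, the two requests could look roughly like the sketch below. It is not the poster's actual code: the script text uses `addV` as a simple stand-in for the `mergeV()` queries, and the label `item`, the property names `num` and `name`, and the endpoint URL are all hypothetical. The gremlin-python import is deferred so the request-building part runs without a server.

```python
from typing import Any, Dict, Tuple

# Hypothetical parameterized Groovy script; the label and property names are assumptions.
SCRIPT = "g.addV(lbl).property(T.id, vid).property('num', num).property('name', name)"


def build_request(raw_id: int, as_string: bool) -> Tuple[str, Dict[str, Any]]:
    """Return the same script with bindings that differ only in the ID type."""
    vid = str(raw_id) if as_string else raw_id
    bindings = {
        "lbl": "item",              # hypothetical vertex label
        "vid": vid,                 # custom ID, int or str
        "num": raw_id,              # the Long property
        "name": f"item-{raw_id}",   # the String property
    }
    return SCRIPT, bindings


def submit(raw_id: int, as_string: bool, url: str = "ws://localhost:8182/gremlin"):
    """Send one request; needs gremlinpython installed and a reachable server."""
    from gremlin_python.driver import client  # deferred import

    c = client.Client(url, "g")
    try:
        script, bindings = build_request(raw_id, as_string)
        return c.submit(script, bindings).all().result()
    finally:
        c.close()


script_int, bindings_int = build_request(3232243223, as_string=False)
script_str, bindings_str = build_request(3232243223, as_string=True)
print(bindings_int["vid"], repr(bindings_str["vid"]))
```

The script string is identical in both cases, so any timing difference would come from how the `vid` binding is serialized and stored, not from script compilation.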
Bo 10mo ago
One thing worth testing is to use JanusGraph as a library, i.e. write Java code that calls the JanusGraph library directly to do the insertion. This will help us judge whether the overhead comes from network and serialization.

Another thing worth testing (orthogonal to the above): when you use the numeric ID type, add another dummy String property that stores the string representation of the ID.

And no, this is not a known issue. People usually see a great performance improvement with custom IDs because they were previously using JanusGraph auto-generated IDs, so that's not a fair comparison with your case. What you have found is interesting and surprised me as well.
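The second test could be wired up along these lines, as a sketch only. The variant names, the `id_as_text` property, and the `write` callback are all hypothetical; in a real run `write` would submit the payload to the server, while here a no-op stands in so the harness runs anywhere.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def build_payloads(variant: str, ids: Iterable[int]) -> List[Dict[str, Any]]:
    """Build per-vertex payloads up front so timing covers only the writes."""
    payloads = []
    for raw_id in ids:
        if variant == "int_id":
            payloads.append({"vid": raw_id})
        elif variant == "str_id":
            payloads.append({"vid": str(raw_id)})
        elif variant == "int_id_plus_str_prop":
            # Numeric ID plus a dummy String property holding the same digits.
            payloads.append({"vid": raw_id, "id_as_text": str(raw_id)})
        else:
            raise ValueError(f"unknown variant: {variant}")
    return payloads


def time_writes(write: Callable[[Dict[str, Any]], None],
                payloads: List[Dict[str, Any]]) -> float:
    """Return elapsed seconds for calling `write` once per payload."""
    start = time.perf_counter()
    for payload in payloads:
        write(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    ids = range(3232243223, 3232243223 + 1000)
    for variant in ("int_id", "str_id", "int_id_plus_str_prop"):
        elapsed = time_writes(lambda p: None, build_payloads(variant, ids))
        print(f"{variant}: {elapsed:.6f}s")
```

If the third variant lands close to the string-ID timing, the cost is probably in handling the extra string data in general; if it stays close to the int-ID timing, the string ID path itself is the more likely culprit.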
