How do you use 128-bit values with varbinary and what is the fastest way to search for rows?
I have been researching the fastest way to insert data into SQL tables, & I'm having a hard time using Drizzle with PlanetScale because PS doesn't support foreign keys; which means that PS is auto-sharded & the autoincrement primary keys are not going to be the same. You don't want to use UUID because 1.) it exposes your MAC address & 2.) the values don't always increase, i.e. non-monotonic.
The current industry choice is ULID, which uses a 128-bit pattern where you take the JS new Date().getTime() & shift the lower 48-bits into the MSB & the lower 80 bits are a cryptographically-secure random number. (CSRN) generated at runtime.
I don't like the idea of having to pay to generate random numbers at runtime so I created an npm package called linearid (LID) that uses a 128-bit value where the LSB is a 64-bit CSRN created upon server initialization, & the MSB is a 42-bit millisecond timestamp followed by a 22-bit spin ticker that gets reset every millisecond & incremented each time you call LIDNext(). If you call LIDNext() more than 2^22 (4,194,304) times in one millisecond it will spin wait until the next millisecond. This gives you a 128-bit monotonically increasing unique ID that you can use to look up data in SQL tables. Then to search for the SQL row, you extract the seconds timestamp from the LID, & search for the row by date range using the filesystems timestamps so you don't have to do a full table scan. Instead you search through all of the SQL rows created starting at the millisecond the LID was generated to something like three seconds after, which is much faster for searching for iNodes on disk.
I read a StackOverflow benchmark that showed it was faster to search for 128-bit UIDs using a varbinary as opposed to two 64-bit bigints. There is no documentation on how you work with 128-bit varbinary. How do I create a table entry & search for a table entry using the above described search pattern? What is the fastest way to search for 128-bit LIDs/ULIDs?
7 Replies
In this example i'm passing in eq(users.uid, LID()) (LID() returns two BigInt [msb, lsb]) which isn't right because the second that the LID gets assigned is not necessarily the same second as the SQL timestamp, so I need to search for a range of some number of seconds after the the LID was generated. I haven't figured out yet. The algorithm does rely on the clocks being accurate.
You need to use a Buffer to represent your UUID when interacting with the DB e.g.
Buffer.from('975b2add-6362-4cda-ae30-a9e56361f857'.replace(/-/g, ''),'hex')
Looks like one step closer but I can't just pass in a bigint[].
This function appears to do it, but I'm still having trouble importing my npm linearid package into my Next.JS project. When I try to load my website it says "ReferenceError: module is not defined" I think I don't have my npm package.json configured correctly.
This link has detailed information on how to optimize MySQL search qurries (i.e. PlanetScale is MySQL). I'm not sure if my timestamp method is the fastest. You really want a 64-bit index to search with B-Trees. I'm thinking that maybe I can XOR the two 64-bit LSW and MSW. The LSW is a cryptographically-secure randomly generated number. I would image there would be some clustering around the timestamp and zeros of the spin ticker. https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-index-design-guide?view=sql-server-ver15
SQL Server and Azure SQL index architecture and design guide - SQL ...
Learn about designing efficient indexes in SQL Server and Azure SQL to achieve good database and application performance. Read about index architecture and best practices.
Why do you want two 64 bit integers?
Storing UUIDs as 128bit binary is the industry standard.
Where a database has higher level type support for "UUID", it will still be the 128 bit binary value under the hood