What's the best way to prevent duplicate entries?
I am building a voting app for college sports. A user can go and submit their Top 25 teams each week in a season.
My API route looks like
Right now I can just keep submitting the same form over and over again. Is there a quick and dirty way to just see if this user has submitted a vote for that week and year? Currently year is just autopopulated.
6 Replies
"Quick and dirty" and "the best way" are sort of at odds with one another, but the former could just be
I'm not terribly proficient with SQL, but I think better would be to just use some
.notExists()
conditional operator in the db.insert()
chain, or to combine the SELECT and INSERT into a single transaction. Ideally, you might implement some measures to prevent spammed requests from sneaking through on a race condition - API rate-limiting and SQL locks if your db supports them.
If the table is expected to be large, you may also benefit from indexing userId
, year
, and week
- or combining them into a compound key/index. But maybe just make a note of that and save it for a later benchmarking/tweaking pass down the road.
Other things to consider:
- It's not great that a request with a key like asdfasdgrank_23456245734567asdfasdf
will still hit the database. That's clearly a malformed key - the server should respond with an error before ever touching the db. You know where rank_
's going to appear in the string - match the whole key, and restrict the digits to like (\d|1[0-9]|2[0-5])
. My preference would be to send an array of objects in the request body and just specify rank
as it's own dedicated property; no reason to encode data into a string then decode it back to data again.
- Make sure the schema has some sort of relational constraint on teamId
, and ideally perform some sort of validation on it prior to the db query.
- That second .map()
can error out with a vague cannot read properties of null
error since the first .map()
can return null
- that's not terribly useful information to return to the front-end or send to Sentry.
All together, short of using a dedicated validation library and any sort of better query, the following might be an improvement:
This has been super helpful! I’m using Supabase for my DB and dealing with it and Drizzle is all new to me. My IDs for the teamId are basically coming from Sanity CMS, so I guess I could make a query to Ssnity to validate the teamIds 🤔
Your suggestion on the array I’ll look into. Right now I have a zod schema for a form that uses react-hook-form and everything comes through as an object with the 25 rankings like so
rank_1: teamId
.
Another question I have for you since I’ve gotten mixed responses on it… you touched on the indexing part. Someone also suggested making it unique on the userId, week, and year but it would require a 3rd column because I and inserting 25 rows into the database for every ballot submitted. Would it be “better” to just have a single row per ballot but have a column for every rank?My IDs for the teamId are basically coming from Sanity CMS, so I guess I could make a query to Ssnity to validate the teamIdsMaybe not a bad idea. My thought with all the validation was mostly just to avoid any more expensive calls like DB operations is there was any hint of anything wrong or malicious with the request. To that end, you might validate the type and format of all of the teamIds first, then after the request body's been successfully mapped to vote objects hit up Sanity to make sure they actually exist. But if you're just prototyping, this might also be a little much - I have a long history of over-engineering things before I really need to 👍. To the end of over-engineering things, if Sanity hits are costly in time or money, I might seek to find a way to cache teamIds somewhere less expensive.
Your suggestion on the array I’ll look into.No biggie either way - an array is just my subjective preference. I'm also presently calf-deep in JSON:API where all data is moved around in arrays, so my personal preference might be somewhat skewed :P
Right now I have a zod schema for a form [...]Glad to hear it! Writing validation functions from scratch sucks... I haven't tried zod yet, but I believe you could parse the request body against the schema to skip most of this validation?
Another question I have for you since I’ve gotten mixed responses on it… you touched on the indexing part.Definitely don't take anything I say about SQL as gospel or even well-informed - I really need to spend some dedicated time studying it some day instead of just relying on what I've picked up over the years. So take the below as only my best understanding and preference rather than anything close to a "best practice."
Someone also suggested making it unique on the userId, week, and year but it would require a 3rd columnNot necessarily - the
UNIQUE
constraint can be applied at the table level to multiple columns together (creating a "composite key") without explicitly needing an extra column (instead that data is stored separately from the table row data as an index). It looks like in Drizzle ORM this is achieved through the refiner
callback you can provide as the second argument to a table schema constructor (https://orm.drizzle.team/docs/indexes-constraints#unique), so your refiner might look something like
I believe this is one of the ways in which you could prevent duplicate entries with out the extra SELECT query, and I think you'd do that by just throwing .onConflictDoNothing()
on to your db.insert()
chain - but it may need to be further configured to target those columns (https://orm.drizzle.team/docs/insert#on-conflict-do-nothing).
Would it be “better” to just have a single row per ballot but have a column for every rank?I think it depends on what else you intend to be doing with that data... like if you want to calculate a team's overall ranking across all users in a week or season, or a user's most favored team, or some such. Consider how you'd achieve those calculations with Drizzle/SQL for either scenario. Row-per-ballot is definitely smaller and would likely outperform row-per-vote for most tasks, but I feel like row-per-vote is more flexible and the queries would be easier to compose without resorting to SQL... I think I'd probably stick with row-per-vote unless it becomes evident that row-per-ballot offers necessary advantages.
The only thing I will be querying for is to make a page similar to this https://www.espn.com/college-football/rankings. And then maybe to say how each voter voted
Also, were you suggesting to change the submitted values from
to something like
Yeah that's what I was thinking. I just like that you can iterate over the array and test it's length and such, instead of pulling k/v pairs from the object. I also like that the format has a better parity with your data model. Likewise, the original request format would map better to the row-per-ballot strategy, as then
rank_15
directly corresponds to a column/property on your model.
When the API data format closely aligns with the resource model, you get to skip over some parsing/mapping work which you'd otherwise need to do 👍I started on this and ran into an issue where the do nothing on conflict causes the primary key to still increment even though no inserts are happening