What's the best way to prevent duplicate entries?

I am building a voting app for college sports. A user can go and submit their Top 25 teams each week in a season. My API route looks like
import * as Sentry from '@sentry/nextjs'

import { auth } from '@clerk/nextjs/server'
import { db } from '@/server/db'
import { votes } from '@/server/db/schema'

export async function POST(req: Request) {
try {
const body = await req.json()
const user = auth()

if (!user.userId) {
return new Response('Unauthorized', { status: 401 })
}

const votesConverted = Object.entries(body)
.map(([key, value]) => {
const match = key.match(/rank_(\d+)/)
if (!match) return null
const rank = parseInt(match[1], 10)
return { teamId: String(value), rank: rank } // Convert 'value' to a string
})
.map((vote) => ({
userId: user.userId,
week: 1,
teamId: vote!.teamId,
rank: vote!.rank,
}))

await db.insert(votes).values(votesConverted)

return new Response('OK', { status: 200 })
} catch (error) {
Sentry.captureException(error)
return Response.error()
}
}
import * as Sentry from '@sentry/nextjs'

import { auth } from '@clerk/nextjs/server'
import { db } from '@/server/db'
import { votes } from '@/server/db/schema'

export async function POST(req: Request) {
try {
const body = await req.json()
const user = auth()

if (!user.userId) {
return new Response('Unauthorized', { status: 401 })
}

const votesConverted = Object.entries(body)
.map(([key, value]) => {
const match = key.match(/rank_(\d+)/)
if (!match) return null
const rank = parseInt(match[1], 10)
return { teamId: String(value), rank: rank } // Convert 'value' to a string
})
.map((vote) => ({
userId: user.userId,
week: 1,
teamId: vote!.teamId,
rank: vote!.rank,
}))

await db.insert(votes).values(votesConverted)

return new Response('OK', { status: 200 })
} catch (error) {
Sentry.captureException(error)
return Response.error()
}
}
Right now I can just keep submitting the same form over and over again. Is there a quick and dirty way to just see if this user has submitted a vote for that week and year? Currently year is just autopopulated.
6 Replies
A Dapper Raccoon
"Quick and dirty" and "the best way" are sort of at odds with one another, but the former could just be
const hasAlreadyVoted = !!await db.select(1).from(votes).where(
and(
eq(votes.userId, user.userId),
eq(votes.year, sql`(YEAR(CURDATE()))`),
eq(votes.week, 1),
)
).limit(1);
const hasAlreadyVoted = !!await db.select(1).from(votes).where(
and(
eq(votes.userId, user.userId),
eq(votes.year, sql`(YEAR(CURDATE()))`),
eq(votes.week, 1),
)
).limit(1);
I'm not terribly proficient with SQL, but I think better would be to just use some .notExists() conditional operator in the db.insert() chain, or to combine the SELECT and INSERT into a single transaction. Ideally, you might implement some measures to prevent spammed requests from sneaking through on a race condition - API rate-limiting and SQL locks if your db supports them. If the table is expected to be large, you may also benefit from indexing userId, year, and week - or combining them into a compound key/index. But maybe just make a note of that and save it for a later benchmarking/tweaking pass down the road. Other things to consider: - It's not great that a request with a key like asdfasdgrank_23456245734567asdfasdf will still hit the database. That's clearly a malformed key - the server should respond with an error before ever touching the db. You know where rank_'s going to appear in the string - match the whole key, and restrict the digits to like (\d|1[0-9]|2[0-5]). My preference would be to send an array of objects in the request body and just specify rank as it's own dedicated property; no reason to encode data into a string then decode it back to data again. - Make sure the schema has some sort of relational constraint on teamId, and ideally perform some sort of validation on it prior to the db query. - That second .map() can error out with a vague cannot read properties of null error since the first .map() can return null - that's not terribly useful information to return to the front-end or send to Sentry. All together, short of using a dedicated validation library and any sort of better query, the following might be an improvement:
function teamIdIsValid(teamId: any) {
if (typeof teamId !== 'string') return false;
// ...
return true;
}

function rankIsValid(rank: any) {
if (typeof rank !== 'number') return false;
if (rank % 1) return false;
if (rank < 1 || rank > 25) return false;
return true;
}
function teamIdIsValid(teamId: any) {
if (typeof teamId !== 'string') return false;
// ...
return true;
}

function rankIsValid(rank: any) {
if (typeof rank !== 'number') return false;
if (rank % 1) return false;
if (rank < 1 || rank > 25) return false;
return true;
}
if (!Array.isArray(body))
throw new Error('Invalid request body.');

// ...

const ranksSubmitted = new Set();
const votes = body.map(
({teamId, rank, ...rest}) => {
if (Object.keys(rest).length > 0)
throw new Error('Unknown properties in vote object.');

if (!teamIdIsValid(teamId))
throw new Error('Invalid vote teamId.');

if (!rankIsValid(rank))
throw new Error('Invalid vote rank.');

if (ranksSubmitted.has(rank))
throw new Error('Duplicate vote rank.');
else
ranksSubmitted.add(rank);

return {
userId: user.userId,
week: 1,
teamId,
rank,
};
}
);

// ...

if (hasAlreadyVoted)
throw new Error('User has already voted this week.');

// ...
if (!Array.isArray(body))
throw new Error('Invalid request body.');

// ...

const ranksSubmitted = new Set();
const votes = body.map(
({teamId, rank, ...rest}) => {
if (Object.keys(rest).length > 0)
throw new Error('Unknown properties in vote object.');

if (!teamIdIsValid(teamId))
throw new Error('Invalid vote teamId.');

if (!rankIsValid(rank))
throw new Error('Invalid vote rank.');

if (ranksSubmitted.has(rank))
throw new Error('Duplicate vote rank.');
else
ranksSubmitted.add(rank);

return {
userId: user.userId,
week: 1,
teamId,
rank,
};
}
);

// ...

if (hasAlreadyVoted)
throw new Error('User has already voted this week.');

// ...
jsingleton37
jsingleton37OP8mo ago
This has been super helpful! I’m using Supabase for my DB and dealing with it and Drizzle is all new to me. My IDs for the teamId are basically coming from Sanity CMS, so I guess I could make a query to Ssnity to validate the teamIds 🤔 Your suggestion on the array I’ll look into. Right now I have a zod schema for a form that uses react-hook-form and everything comes through as an object with the 25 rankings like sorank_1: teamId. Another question I have for you since I’ve gotten mixed responses on it… you touched on the indexing part. Someone also suggested making it unique on the userId, week, and year but it would require a 3rd column because I and inserting 25 rows into the database for every ballot submitted. Would it be “better” to just have a single row per ballot but have a column for every rank?
A Dapper Raccoon
My IDs for the teamId are basically coming from Sanity CMS, so I guess I could make a query to Ssnity to validate the teamIds
Maybe not a bad idea. My thought with all the validation was mostly just to avoid any more expensive calls like DB operations is there was any hint of anything wrong or malicious with the request. To that end, you might validate the type and format of all of the teamIds first, then after the request body's been successfully mapped to vote objects hit up Sanity to make sure they actually exist. But if you're just prototyping, this might also be a little much - I have a long history of over-engineering things before I really need to 👍. To the end of over-engineering things, if Sanity hits are costly in time or money, I might seek to find a way to cache teamIds somewhere less expensive.
Your suggestion on the array I’ll look into.
No biggie either way - an array is just my subjective preference. I'm also presently calf-deep in JSON:API where all data is moved around in arrays, so my personal preference might be somewhat skewed :P
Right now I have a zod schema for a form [...]
Glad to hear it! Writing validation functions from scratch sucks... I haven't tried zod yet, but I believe you could parse the request body against the schema to skip most of this validation?
Another question I have for you since I’ve gotten mixed responses on it… you touched on the indexing part.
Definitely don't take anything I say about SQL as gospel or even well-informed - I really need to spend some dedicated time studying it some day instead of just relying on what I've picked up over the years. So take the below as only my best understanding and preference rather than anything close to a "best practice."
Someone also suggested making it unique on the userId, week, and year but it would require a 3rd column
Not necessarily - the UNIQUE constraint can be applied at the table level to multiple columns together (creating a "composite key") without explicitly needing an extra column (instead that data is stored separately from the table row data as an index). It looks like in Drizzle ORM this is achieved through the refiner callback you can provide as the second argument to a table schema constructor (https://orm.drizzle.team/docs/indexes-constraints#unique), so your refiner might look something like
(table) => ({
isUniqueVote: unique().on(table.userId, table.year, table.week),
})
(table) => ({
isUniqueVote: unique().on(table.userId, table.year, table.week),
})
I believe this is one of the ways in which you could prevent duplicate entries with out the extra SELECT query, and I think you'd do that by just throwing .onConflictDoNothing() on to your db.insert() chain - but it may need to be further configured to target those columns (https://orm.drizzle.team/docs/insert#on-conflict-do-nothing).
Would it be “better” to just have a single row per ballot but have a column for every rank?
I think it depends on what else you intend to be doing with that data... like if you want to calculate a team's overall ranking across all users in a week or season, or a user's most favored team, or some such. Consider how you'd achieve those calculations with Drizzle/SQL for either scenario. Row-per-ballot is definitely smaller and would likely outperform row-per-vote for most tasks, but I feel like row-per-vote is more flexible and the queries would be easier to compose without resorting to SQL... I think I'd probably stick with row-per-vote unless it becomes evident that row-per-ballot offers necessary advantages.
jsingleton37
jsingleton37OP8mo ago
The only thing I will be querying for is to make a page similar to this https://www.espn.com/college-football/rankings. And then maybe to say how each voter voted Also, were you suggesting to change the submitted values from
{
"rank_1": "",
"rank_2": "",
"rank_3": "",
"rank_4": "",
"rank_5": ""
}
{
"rank_1": "",
"rank_2": "",
"rank_3": "",
"rank_4": "",
"rank_5": ""
}
to something like
[
{
"teamId": ",
"rank": 1
},
{
"teamId": "",
"rank": 2
},
{
"teamId": "",
"rank": 3
},
{
"teamId": "",
"rank": 4
},
{
"teamId": "",
"rank": 5
}
]
[
{
"teamId": ",
"rank": 1
},
{
"teamId": "",
"rank": 2
},
{
"teamId": "",
"rank": 3
},
{
"teamId": "",
"rank": 4
},
{
"teamId": "",
"rank": 5
}
]
A Dapper Raccoon
Yeah that's what I was thinking. I just like that you can iterate over the array and test it's length and such, instead of pulling k/v pairs from the object. I also like that the format has a better parity with your data model. Likewise, the original request format would map better to the row-per-ballot strategy, as then rank_15 directly corresponds to a column/property on your model. When the API data format closely aligns with the resource model, you get to skip over some parsing/mapping work which you'd otherwise need to do 👍
jsingleton37
jsingleton37OP8mo ago
I started on this and ran into an issue where the do nothing on conflict causes the primary key to still increment even though no inserts are happening

Did you find this page helpful?