Raw SQL much faster (200x) than Drizzle for some queries

For some of our queries we see a huge difference in speed between executing the query with Drizzle vs. a direct raw Postgres query without Drizzle. The raw query takes between 10-20ms to return, however if we run the same query with Drizzle, then it can take up to 5000ms to return. This seems to happen for queries that involve pg_vector for some reason. Has anybody noticed this before? Does Drizzle do anything beside creating a raw SQL query and sending it to the DB that could explain this huge difference? We are using Neon as Db provider and our app is running on Cloudflare Workers. We have tried using the neon/serverless driver as well as the "pg" client (using Cloudflare Hyperdrive).
3 Replies
T3NED
T3NED7mo ago
do you have an example reproduction, or at least the raw query vs drizzle query? im using pgvector on a large dataset (250m+ rows) and have no problem with it
Tilon
Tilon7mo ago
Here is the Drizzle query (actually query + subquery). I do not have the raw query right now, but will be able to add it here as well tomorrow. Essentially it's a search based on CLIP embeddings + additional scoring on product attributes that match a search query.
const similarity = db
.select({
styleId: Variants.styleId,
id: Variants.id,
distance: sql`
${Variants.avgClipEmbedding} <=> ${toVector(embedding)}
`.as("distance"),
})
.from(Variants)
.where(
and(
eq(Variants.tenantId, tenantId),
exists(
sql`
(
SELECT 1
FROM ${VariantSizeOptions}
WHERE ${and(
eq(VariantSizeOptions.tenantId, Variants.tenantId),
eq(VariantSizeOptions.styleId, Variants.styleId),
eq(VariantSizeOptions.variantId, Variants.id),
eq(VariantSizeOptions.enabled, true),
filter?.sizes?.length
? inArray(VariantSizeOptions.size, filter.sizes)
: undefined,
filter?.lengths?.length
? inArray(VariantSizeOptions.length, filter.lengths)
: undefined
)}
)
`
)
)
)
.orderBy(asc(sql`"distance"`))
.limit(1000)
.as("similarity");

const query = db
.select({
styleId: similarity.styleId,
id: similarity.id,
score: sql`(1 - ${similarity.distance}) *
${SearchService.CLIP_SEARCH_SCORE_BOOST_FACTOR}
`
.mapWith(Number)
.as("score"),
})
.from(similarity)
.where(isNotNull(similarity.distance))
.orderBy(desc(sql`"score"`), desc(sql`"styleId"`), desc(sql`"id"`));

return query;
const similarity = db
.select({
styleId: Variants.styleId,
id: Variants.id,
distance: sql`
${Variants.avgClipEmbedding} <=> ${toVector(embedding)}
`.as("distance"),
})
.from(Variants)
.where(
and(
eq(Variants.tenantId, tenantId),
exists(
sql`
(
SELECT 1
FROM ${VariantSizeOptions}
WHERE ${and(
eq(VariantSizeOptions.tenantId, Variants.tenantId),
eq(VariantSizeOptions.styleId, Variants.styleId),
eq(VariantSizeOptions.variantId, Variants.id),
eq(VariantSizeOptions.enabled, true),
filter?.sizes?.length
? inArray(VariantSizeOptions.size, filter.sizes)
: undefined,
filter?.lengths?.length
? inArray(VariantSizeOptions.length, filter.lengths)
: undefined
)}
)
`
)
)
)
.orderBy(asc(sql`"distance"`))
.limit(1000)
.as("similarity");

const query = db
.select({
styleId: similarity.styleId,
id: similarity.id,
score: sql`(1 - ${similarity.distance}) *
${SearchService.CLIP_SEARCH_SCORE_BOOST_FACTOR}
`
.mapWith(Number)
.as("score"),
})
.from(similarity)
.where(isNotNull(similarity.distance))
.orderBy(desc(sql`"score"`), desc(sql`"styleId"`), desc(sql`"id"`));

return query;
We are using the HSNW index for pgvector At this point we have around 250K rows of "Variants" (CLIP embeddings), so not that much really. Using Postgres 16, the raw query equivalent of above would even return under 10ms, just with Drizzle we see sometimes total query times of several seconds for some reason. Sometime the Drizzle queries are also "fast" and in the 100-400 ms range, however at least one out of 10 queries takes several seconds.
Angelelz
Angelelz7mo ago
It is understandable that the drizzle queries run slower that raw sql. Drizzle has to build the query from the object you give it. That's why drizzle offers prepared queries as a way to continue using the type-safety of drizzle with the speed closer to raw sql
Want results from more Discord servers?
Add your server