Performance questions

I'm using Drizzle ORM with Planetscale DatabaseJS driver and my application has spots where I want to decrease number of HTTP requests to my database. I have following questions: 1. Does Drizzle+DatabaseJS make separate HTTP request for each query? (I know I should be asking this question to DatabaseJS creators, but asking here in case someone is aware of that) 2. If it does, what would be the best way to group several queries so that they are performed in one HTTP request?
32 Replies
Dan
Dan2y ago
1. Yes 2. That'd require batching implementation which we currently don't have Out of curiosity, what's the reason behind decreasing the number of HTTP requests?
BREAD
BREADOP2y ago
Thank you for your answer In my understanding, each HTTP roundtrip will add additional overhead resulting in increased response time. I have spots in my application where I want to achieve as quick response as possible. I know I can do that using your sql operator, but it does not provide type-safety which is not good Basically, in my current case I need to perform several inserts into different tables. Maybe any way to build query using Drizzle(to ensure type-safety) then turn it into SQL and execute in one request? It would be great if I could do something like this
const query1 = db.insert(table1).values(...).toSQL()

const query2 = db.insert(table2).values(...).toSQL()

await db.execute(`${query1}${query2}`)
const query1 = db.insert(table1).values(...).toSQL()

const query2 = db.insert(table2).values(...).toSQL()

await db.execute(`${query1}${query2}`)
BREAD
BREADOP2y ago
Inside of MySqlInsert class I see following methods. Maybe I can use one of those for my use case?
BREAD
BREADOP2y ago
As far as I can see, I can get an object consisting out of query and params. For now I can't find a way of turning this staff into plain query
BREAD
BREADOP2y ago
So you are passing query and args separately to Planetscale driver directly and they use their formatter(if no custom is passed) to transform it into query. Maybe I can reuse it to transform drizzle query into plain SQL and then execute it
BREAD
BREADOP2y ago
@Dan Kochetov Using the code below I was able to transform it into query. Only thing is that I had to use as any[] but it should not be a problem since Drizzle handles type-safety anyway. As far as I can see queries are not prepended with semicolon. Construction on second ss should result in working query, but haven't checked in runtime. What do you think? If it's too cringe pls don't ban me ;D
insert into `orders` (`user_id`, `status`, `products_amount`, `shipping_price`, `shipping_method`) values ('user_2MrQqlBvNUnKgSPHwXqd6p10xUI', 'pending', 50000, 0, 'free')
insert into `orders` (`user_id`, `status`, `products_amount`, `shipping_price`, `shipping_method`) values ('user_2MrQqlBvNUnKgSPHwXqd6p10xUI', 'pending', 50000, 0, 'free')
Dan
Dan2y ago
If the driver supports running multiple statements in a single query - sure, go for it, looks like a good enough replacement of batching
Mosaad
Mosaad2y ago
if those queries don’t depend on each other, i use promise all
Dan
Dan2y ago
promise.all just runs the queries in parallel, it's still multiple HTTP requests if your goal is to decrease latency, it's fine
mcgrealife
mcgrealife2y ago
@BREAD have you considered edge runtimes? In my case, the cold start time of lambdas node runtime is where most of the overhead occurs. In edge runtimes though, the cold start is milliseconds. From there, Most multiple round trips to the database are near-negligible difference on total duration. As long as the edge runtime region is limited to be near the PlanetScale MySql server (both in US-east in my case)
BREAD
BREADOP2y ago
@mcgrealife thanks for tip. Will definitely try it out
Mosaad
Mosaad2y ago
worth noting that CloudFlare recently announced “Smart Placement” which automatically puts your worker closer to the database source. working great for me with planetscale
BREAD
BREADOP2y ago
@Dan Kochetov Hi, have another question regarding performance. It is related to relational queries. I have following code:
await db
.query
.orders
.findMany({
with: {
orderItems: {
with: {
product: true
}
},
},
where: (orders, { eq }) => eq(orders.userId, userId),
})
await db
.query
.orders
.findMany({
with: {
orderItems: {
with: {
product: true
}
},
},
where: (orders, { eq }) => eq(orders.userId, userId),
})
It results in query I attached in the file query.txt The query explanation is attached in the form of screenshot Judging by query explanation it seems like such query structure is not able to leverage indexes on tables and performs full table scans Is there any way I can optimise it? As far as I can see in the docs (https://orm.drizzle.team/docs/joins) simple drizzle joins does not use JSON functions and aggregations. Are there any plans on optimising relational queries?
Dan
Dan2y ago
Indexes usage depends on a lot of factors, including number of rows in the tables Query planner might not always use an index even if it can be used, if it thinks it'll be more efficient not to
BREAD
BREADOP2y ago
@Dan Kochetov Even if we ignore indexes, it looks like Drizzle in the case of relational queries relies on multiple subqueries that does not have any constraints. Meaning that each subquery will select whole table. In addition, if we take a look at my case, I need to select orders that are bound to specific customer, but as far as I can see, the query built by Drizzle will select all orders with all joined and aggregated data using subqueries and only then will perform WHERE lookup on resulting derived table. Since this table is derived, not an original one stored in DB, it makes sense it does not utilise indexes, since there are none
Dan
Dan2y ago
Are you sure it works like that in the query planner? For example, the query planner in Postgres is smart enough to optimize those queries properly.
BREAD
BREADOP2y ago
@Dan Kochetov I'm using PlanetScale which uses MySQL under the hood. As you can see in query explanation(I've used EXPLAIN <query> to analyse it on DB level) the outermost derived table, from which final select with WHERE clause happens, is estimated to have ~98k rows(I have ~100k records in orders table right now). The type of JOIN is ALL meaning full table scan is performed. Additionally, in 3 cases out of 5 type of JOIN is ALL. In these three cases relatively large amount of rows is expected to be returned. And in the same three cases no index is being used. In two cases it uses filesort for full table scan
Dan
Dan2y ago
I see, I'll think about it
BREAD
BREADOP2y ago
@Dan Kochetov Btw, also noticed, that such query results in large amount of row reads. I also guess that it will only grow with tables size the query is performed against. To compare with, simple query with two joins(that should return the same data, just in different format)
select orders.id as order_id from orders left join order_items on orders.id = order_items.order_id left join products on order_items.product_id = products.id where orders.user_id = ?
select orders.id as order_id from orders left join order_items on orders.id = order_items.order_id left join products on order_items.product_id = products.id where orders.user_id = ?
Results in only 4 reads and this amount should not depend on table size, but on number of rows connected with each other(orders->order_items->products)
Dan
Dan2y ago
Got it. Yes, this is expected, if indexes are not used. I'll look into how it can be improved. Thanks for investigation!
BREAD
BREADOP2y ago
@Dan Kochetov I've played around and thought about how can it be optimised and may suggest couple of ideas. Hope, you will find something useful here The first one is based on fact that Drizzle tries to perform only one query to retrieve all relational data. I changed a structure a bit so that it takes advantage of indexes. Example and result are attached. Query explanation as well. So there are two major changes. Firstly, WHERE clause with the main condition is used in outermost subquery, which actually selects from main table orders. It allows query planner to utilise index. Secondly, subquery used for left join is kind of simpler version of outermost subquery. It selects orders by user_id but also joins order items. Such structure allows to utilise the index in here as well. The resulting data format looks like the one drizzle can handle right now. The idea here is to use subqueries that are able to leverage indexes. In query explanation you can see that the estimated amount of row reads is minimal. On the other hand, such structure implies repetitive queries resulting in reading the same rows several times, but it is still way less than selecting whole tables The another idea I had is from 'Keep it simple stupid' category and might not be compatible with Drizzle mindset. Separate queries(4th screenshot). Such approach will allow to keep amount of row reads at minimum, no complicated queries need to be built for deeply nested relations. From disadvantages: such approach completely differs from what Drizzle is currently doing; multiple queries/roundtrips are going to be needed instead of one(the amount will grow depending on depth of relation to select); additional overhead on client(especially when selecting large amounts of data since each step is being returned to client )
Dan
Dan2y ago
Thanks a lot for the effort you made! However, I'm not sure this approach can be applied, because the query you've optimized solves a specific case, and I don't think it can be applied to the general case. I see that you've put the where clause in the innermost subquery, which works for your particular case when you're only filtering by the orders table column, but it's also possible to filter by relational fields that are added in different subqueries (and we're considering implementing filtering by deeply nested relational fields as well). That's why I've added the where clause to the outermost query instead, where all the relations are joined. Also, since our where API accepts any SQL condition, it wouldn't really be possible to split it based on the column to, for example, add the user_id condition to the inner subquery and move the rest to the top.
BREAD
BREADOP2y ago
@Dan Kochetov Hm, maybe it's possible to think about it in following way: 1. Every relational query starts from some main table. In current case orders. Condition applied on this level is applicable to main table only. Let's call it main condition 2. Each time Drizzle goes deeper it builds subquery for LEFT JOIN. Each such subquery contains main condition to make sure we select only relevant data from main table and make use of indexes. In case some additional condition on relation is needed, it can be added as usual(using Drizzle where API). Since it is being applied on specific relation, it will be used only there, I don't think it has to be repeated like the main condition. Relational query config has nested nature, therefore 1st level condition have to be applied to 1st level relation and so on(I attached query example where I added conditions to order_items and products) If I properly understand the meaning of filtering by deeply nested relational fields then I believe it should be possible to implement it using suggested structure. As I already mentioned, condition will be applied to respective subquery where table, being joined right now, is introduced Is there a specific reason, why you want to apply conditions after relations are joined, not before/in the process of it (like I did for order_items and products)? In theory, if we need to filter out specific, let's say, products, why not doing it in subquery where this table is being introduced?
Dan
Dan2y ago
Is there a specific reason, why you want to apply conditions after relations are joined
because all the conditions are applied at the same time (it's a single SQL chunk), and it's possible that one of the conditions may reference a relation field that is only joined in one of the outer subqueries and thus is not available in the innermost query. Also, the query you've sent uses two joins at the same level, which won't work I've implemented it like this initially, and it resulted in items duplication in the result You need to add one join per subquery for this to work correctly the where condition is defined like this (for example):
where: and(eq(orders.id, id), sql`json_array_length(${orders.orderItems}) > 0`))
where: and(eq(orders.id, id), sql`json_array_length(${orders.orderItems}) > 0`))
So here, there are two conditions, one of them can be applied to the innermost query and another one cannot, because it references a relation field that is joined in one of the outer subqueries. Since we can't split the condition (it can be any SQL expression, really), the only solution is to always apply it to the outermost subquery, that has all the relations joined.
BREAD
BREADOP2y ago
@Dan Kochetov I see in current implementation Drizzle relies on multiple GROUP BY. Can we not use it to ensure that only distinct records will be selected
Dan
Dan2y ago
Not really sure, this requires further research
BREAD
BREADOP2y ago
@Dan Kochetov To clarify, will these two WHERE clauses be combined and applied on the same level?
BREAD
BREADOP2y ago
If the conditions would be scoped to the subqueries, there shouldn't be an issue with referencing table that is out of scope, if condition is built following Drizzle's type hints Regarding your condition example
where: and(eq(orders.id, id), sql`json_array_length(${orders.orderItems}) > 0`))
where: and(eq(orders.id, id), sql`json_array_length(${orders.orderItems}) > 0`))
Well, if I understand correctly, sql operator does not provide any type-safety and therefore should not be treated as a tool that provides guarantee because the user is in complete control and can pass anything in there whether it's correct or not
Dan
Dan2y ago
no, see the example condition I've provided. It's what you can specify on the root query level, and it uses fields that come from subqueries from different levels.
BREAD
BREADOP2y ago
@Dan Kochetov As far as I understand, this example filters order that have less than 0 order items. The same can be achieved, but using the HAVING clause (condition is different because in my case there can't be order with no items). Not sure if in Drizzle query I can access specific column, maybe with JSON functions but I'm not familiar with them enough. It is also possible to filter specific values without relying on JSON functions(check last two screenshots). It is still not possible to access deeper level joined table orders_orderItems_product but it seems to me that it's not possible in Drizzle either. Let me know if I'm wrong
itay
itay14mo ago
how is this strategy working for you to batch multiple statements together? also what is that format() method you are using? did you implement it or is it a lib?
BREAD
BREADOP14mo ago
Hey, didn't try it out myself. The format() method was taken from the planetscale driver if I remember correctly
Want results from more Discord servers?
Add your server