SQLite 'WHERE table.relationship (one to many) > 0'
Hi! I'm trying to figure out how to query for rows that have a relationship at least defined once.
Can't seem to find any examples of this in the docs. Any help would be greatly appreciated, thanks.
67 Replies
👋 you can use sub query in where if you want:
https://drizzle.run/dxdp67ytednpzy3ercnabpfb
Drizzle Run
Sub query filter in Query API - Drizzle Run
Thanks a bunch! Will try this out.
Works ✅
It seems like in production (with 3000 videos), this query is suuuuuuper slow (takes 10 seconds or more):
Maybe you know @Raphaël M (@rphlmr) ⚡ if anything might be terrible in terms of performance 😄 Maybe some index I'm missing? Maybe the order by?
Do you have indexes on anything involved in your relations?
It makes a huge difference
I don’t think I’ve looked at that!
Was surprised this dataset causes this amount of lag
Where would you optimize given the query above?
timestamps.videoId and videos.id but I guess they are indexed (primary key?). Check in your DB in case it is not.
fields and references of
thumbnailJob
relation
videos.createdAt
too
+ if not all columns of thumbnailJob
are required (same for video), try to select what you really need (https://orm.drizzle.team/docs/rqb#partial-fields-select)They are indeed PKs. I’ll look at the others! Maybe the sorted by field of createdAt actually
Indexing the two fields that are ordered made a huge difference. Dope!
Takes on average 300ms which is still pretty slow though if you ask me, hmm...
I'll try to optimize further 🙂
If you can, try select and joins :p
then you will be 1:1 with a "handmade" sql query
Could you elaborate what you mean? I suck at this language 😂😂😂
The query API will be reviewed again by the dev team because other people have complained about it and SQLite as well.
Drizzle has 2 APIs: Query (the one you use) and Select (the "classic" one that looks like SQL)
Drizzle ORM - Select
Drizzle ORM is a lightweight and performant TypeScript ORM with developer experience in mind.
I see. Yeah I did not use sqlite to have slow read speeds
So they are different from eachother? Interesting
something like that maybe:
yes, the Query API will use some json db function to transform the result.
The Select API will not but Drizzle handles the final mapping
Man this is insanely more readable IMO
This query results in around 100ms, which is a great improvement. Still slow though maybe? Not sure.
Uncommenting the innerJoin of the thumbnail jobs table makes it take up to 8 seconds to run. Not sure if the change I made from what you posted (no
on
was the issue). What you posted didn't compile because innerJoin expected a second argument.
That inner join should be indexed because of PK/FK relationship, right?yes
oupsy, yes innerJoin needs the "on" :tired:
You start to love SQL.
1. The inner join makes it ultra slow
2. The output is now a JSON of
thumbnail_jobs
instead of the video perspective thumbnailJob
I guess those are the two things that are changed nowthe initial query could be improved. the sub query is no more needed here. It can be an innerJoin too
does thumbnailJobs has an index on videoId ?
a unique index maybe?
If your database is huge and you have just added the index, I wonder if it is instant or needs some time to build it.
Commenting out the
where(subquery)
reduces it (even with the thumbnail job inner join) to about 10ms(!!!), so I think you're on to something here.
There's nothing more than a one to one. Maybe that makes it unique? Not sure how this ORM differentiates between 1tm and 1t1 relationships
How would you conceptualize this? You would inner join on what condition? And what table? Trying to wrap my head around itinnerJoin on timestamps with timestamps.videoId
Except if you have a many-to-many relationship here (you will have as many rows as there are matches).
There shouldn't be a many-to-many relationship anywhere in the model thus far
you have a unique timestamps.videoId ?
But that wouldn't restrict videos from only returning video rows based on the conditions of at least 1 timestamp existing?
Yes, it's a PK UUID
yes that's it.
Are you saying yes that understanding is correct, or yes it would restrict? 😄
Yes, for both, it will restrict, and I think this is what you understood. (sorry, not my native language 😬)
Are you thinking something like:
innerJoin is like => ok if we match, else, not this video in the final result
leftJoin is like => ok if we match, else, would have null for the joined table results
maybe, yes. Not sure about the having if
timestamps.videoId
is unique
if there is no match on videos.id, this video will not be in the final resultI'm remembering slowly how joins work by having this discussion 😄
Did you feel like a superhero who regains his power?
It should be unique as it's a FK!
yes if it is not a composite fk 😄
I think I feel like a bank person walking with a fancy suit to work, tripping on every piece of dog poop there is
That's fair. It is not 😄
Dropped the group by, dropped the having
looks good to me
About 100 ms now
Now, should I tell you how to change the result shape?
Removing the joins nets like 3ms~~
It's completely broken, so that'd be great 😄
how is the result? does it have the expected columns?
select
can tell what to get from the tables
So the videos is newly nested (didn't exist as a level before). And timestamps is not an array anymore. And thumbnail_jobs is not camelcase singular 😄
Drizzle ORM - Goodies
Drizzle ORM is a lightweight and performant TypeScript ORM with developer experience in mind.
ohhh I think we miss something about timestamps here isn't it?
can you have many timestamps for a video?
Yes!!!
so now you have as many videos as you have timestamp matches lol
it's named a cardinality something
so a groupby is required too
So, to not lose your time, let's recap
you have the same request time for both queries?
I'm calcing the time elapsed with
console.time
and console.timeEnd
and this is done locallyare they equal or one is faster?
When you say both queries, what do you mean?
this one vs?
there has been like 10 variants 😄
the one using
query
vs the last with select
Oh. The original one took 12 seconds or more. This one takes 300 ms
(select is faster)
but that was also without the indexes
wait a second. They are about the same now. I think both are around 300 ms
I'm lying to you...
the old one takes 300 ms (non-select)
ok good (if I can say that lol)
so it is useful I try to help you to finish this query with timestamps array
Yes! I did try to add
timestamps: getTableColumns(timestamps),
but then it turned out to be 150ms, and it only returned an objectoh I am curious about the returned values of this one (with groupBy)
yes to array is not built in 😬
we need to agregate the results to group timestamps
we can do that in plain javascript but the db can do that too (I'm sure for postgres, but sqlite ... 😄 )
Trying to find the docs on this conversion... Any guidance?
This query is way too slow 😦 I need to stop for now, but will continue to optimize this
If I can't make this at least sub 50ms, then I feel something is off
This database should be insanely quick to read...
every step of the way is causing me problems. Never encountered something like this in the world of SQL. Always used postgres and maybe I should've stuck with that
yes sorry I was building a helper:
Hi again @Raphaël M (@rphlmr) ⚡ . Thanks again for the assistance. I tried your helper and it worked, but I couldn't manage to make it quicker than this:
This query takes roughly 130 ms for a limit of 12 videos
(It turns out I don't actually need more than the count)
130 ms is still slow if you ask me... like, really slow haha
but it's better than 300 that I had
Hey thanks for reporting back.
Could you share the generated sql of this request? (you can enable logger in drizzle https://orm.drizzle.team/docs/goodies#logging)
Drizzle ORM - Goodies
Drizzle ORM is a lightweight and performant TypeScript ORM with developer experience in mind.
Does this help?
(default logging on localhost)
Seems like these queries when running in production on Fly (same dataset) takes 350-400 ms 😦
(also with a limit of 12)
It will depend on how far your server is from your database 😬
Formatted it gives that
for me, looks like there is too many sub select that query the same thing
My db is running on a mounted volume in what’s called LiteFS. So every query runs on the same machine
And the times printed above are from the ORM typescript code
Hmmm, looks funky indeed!
i wonder if this could work
Oooh you’re using the extras field?
I’ll try that when I get back home!
Really curious how the deployed version of identical data takes 3 times longer
Will have to investigate that
Ok works as expected in drizzle run
Great! Is it any quicker?
nope, you have to split your declaration if you want to prevent errors on the column alias
ex: you rename
timestampsCount
extra, it will still works
of course you can where: gt(sql
"timestampsCount", 0),
But be careful, you have no guard against a misspelling.
My personal rule: whether to split or not, do what you think is best for your future selfThanks Raphael. I’ve been talking to people in the Fly community, and I got an answer like this:
litefs just primarily handles the global replication + write node to primary right? If you're reading from your local sqlite file, it should take microseconds for simple queries and single digit ms for complex ones. I have queries that join/filter/select json props and they average ~5ms.Something seems very off here I’m going to try to see what happens without drizzle
I think it is more a driver issue than Drizzle itself because drizzle only build queries that are then passed to the driver. (i can be wrong though)
Interesting reporting you are sharing here!
@Andrew Sherman are you aware of issues like this?
SQLite on fly litefs, better sqlite 3 driver, a query that is very long with a simple join (better with select api but not that fast), column indexed.
Someone speculated that my deployed variant being slower might be Fly machine config (shared cpu instead of performance), but that doesn’t explain the 130 ms one locally…
I must have overlooked one important detail: FK:s are NOT automatically indexed. I didn't actually know this. After adding the FK index to
timestamps.video_id
these are the results:
@Raphaël M (@rphlmr) ⚡ this is an improvement of astronomical proportionsWow !
Thanks for sharing. I thought they were too
Actually they are implicitly on association table with a composite PK made with them, but it is not your case here.
Today I learned!
I know what to do after holidays on my client project 😅
I learnt as well! Thanks a bunch for the back and forth. Really appreciated your help.