Running queries with DISTINCT
Hi team!
Is there a way to run queries with something like .distinct(["column_name"])?
We know about summarize (https://xata.io/docs/sdk/summarize#what-can-summarize-do), but it has a limit on 1000 results (and defaults to 20).
We are aware of the alternative using code and filtering out repeated results... Or running direct SQL queries to the API.
What would be the recommended way?
Thanks!
Summarize
Perform calculations and get insights on a set of records using the Xata SDK
3 Replies
Hey Eusebio, the SQL endpoint is also limited to 1000 results. I don't think there is a way, via any of the SQL/API/SDK methods, to do this natively within Xata beyond that limit. We had to place some limits to keep resource utilization in check. In the near future we'll offer dedicated environments where I am fairly sure we'll be able to lift such limitations. I added a feature request: https://feedback.xata.io/feature-requests/p/distinct-with-unrestricted-result-count to take note to address this.
For now you can use the unique count aggregation https://xata.io/docs/sdk/aggregate#unique-count to get the number of unique results and based on that judge if you can use summarize to get the actual results, of if you need to do it with a "code scan" via pagination. Note that, as per the docs, unique count also turns to an approximate approach if it matches over 40k results.
Ok, that's a good approach we can follow. Thanks for adding the FR!
In our case is of course, to avoid getting duplicated results, given that 2 entities can belong to a single other entity, ...
Another option is to duplicate that column adding another "flavor" of it that also has uniqueness enabled. You can get guaranteed distinct values from that column. It will generate a uniqueness error though when writting a duplicate value so you'd need some extra handling in the write/update logic for it.