Vector search with filtering.
Hi!
I'd like to understand the order of operations in vector search with filtering. There are two possibilities:
1. The search runs first and the filter is applied afterwards, so the results may be empty.
2. The filter runs first and the search runs only over the matching documents, so we always get the requested number of documents.
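To make the difference between the two orderings concrete, here is a minimal brute-force sketch in plain Python. It is not Xata's implementation. The chunk data, the `topic` field and the helpers `top_k`, `post_filter` and `pre_filter` are hypothetical. Option 1 (post-filtering) takes the top `k` chunks by similarity and then drops the ones that fail the filter, so it can return fewer than `k` results, or none. Option 2 (pre-filtering) ranks only the chunks that pass the filter, so it returns `k` results whenever at least `k` chunks match.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(chunks, query, k):
    """Brute-force nearest neighbours: the k chunks most similar to the query."""
    return sorted(chunks, key=lambda c: cosine(c["vec"], query), reverse=True)[:k]


def post_filter(chunks, query, k, pred):
    """Option 1: search first, then filter. May return fewer than k, even none."""
    return [c for c in top_k(chunks, query, k) if pred(c)]


def pre_filter(chunks, query, k, pred):
    """Option 2: filter first, then search only the matching chunks."""
    return top_k([c for c in chunks if pred(c)], query, k)


# Hypothetical data: three "a" chunks near the query, two "b" chunks far from it.
chunks = [
    {"id": 1, "topic": "a", "vec": [1.0, 0.0]},
    {"id": 2, "topic": "a", "vec": [0.9, 0.1]},
    {"id": 3, "topic": "a", "vec": [0.8, 0.2]},
    {"id": 4, "topic": "b", "vec": [0.1, 0.9]},
    {"id": 5, "topic": "b", "vec": [0.0, 1.0]},
]
query = [1.0, 0.0]


def is_b(chunk):
    """Hypothetical filter: keep only chunks with topic "b"."""
    return chunk["topic"] == "b"


print([c["id"] for c in post_filter(chunks, query, 2, is_b)])  # []
print([c["id"] for c in pre_filter(chunks, query, 2, is_b)])   # [4, 5]
```

With `k = 2`, post-filtering keeps the two "a" chunks from the search and then filters them out, leaving nothing. Pre-filtering still returns two "b" chunks.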
I assumed Xata used the second approach, and my code was written around it. At first the program worked without problems, but then errors started to appear (probably because of more extensive testing), and it turned out that the first approach was actually being used.
Is that the case? Thank you.
Hi, yes, we do pre-filtering by default, so it's option 2.
Can you tell me more about the test you ran where this doesn't seem to be the case?
The database contains more than two chunks.
Perhaps filtering has started to work differently?
Hi, there haven't been any updates to the vector search endpoint recently, so any difference in results would definitely not be due to a change in the filtering order.
In your screenshots, I see that without a filter there are 3 results, while with the filter applied there are 2. That means the record with topic 61 does not match the filter. Does that not look correct to you?
The question is: do you search first and then filter, so that an empty result may be returned, or do you filter first, then search only the matching chunks and return the requested number of chunks?
We filter first and then search only the chunks that matched the filter.
Then the search should return the requested number of documents whenever more than that many remain after filtering. But that's not what the image shows: 3 chunks are requested and only 2 are returned, even though the database contains many of them.
This would be because there are only 2 documents that match both the filter and the vector query. The size parameter is the maximum number of results to return; if there aren't enough matches, the API returns only those that satisfy the query parameters.
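A minimal sketch of why `size` acts as an upper bound rather than a guaranteed count under pre-filtering. This is plain Python, not Xata's API: the function `filtered_search`, the `doc` field and the sample chunks are hypothetical. The database holds five chunks, but only two pass the filter, so asking for 3 results yields 2.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def filtered_search(chunks, query, size, pred):
    """Hypothetical pre-filtered search: `size` caps the result count."""
    candidates = [c for c in chunks if pred(c)]
    # heapq.nlargest returns at most `size` items, fewer if there are fewer candidates.
    return heapq.nlargest(size, candidates, key=lambda c: cosine(c["vec"], query))


# Hypothetical data: five chunks in total, only two belong to document "x".
chunks = [
    {"id": 1, "doc": "x", "vec": [1.0, 0.0]},
    {"id": 2, "doc": "x", "vec": [0.6, 0.8]},
    {"id": 3, "doc": "y", "vec": [0.9, 0.1]},
    {"id": 4, "doc": "y", "vec": [0.7, 0.3]},
    {"id": 5, "doc": "z", "vec": [0.5, 0.5]},
]


def in_doc_x(chunk):
    """Hypothetical filter: keep only chunks from document "x"."""
    return chunk["doc"] == "x"


results = filtered_search(chunks, [1.0, 0.0], 3, in_doc_x)
print([c["id"] for c in results])  # [1, 2]
```

The result length is `min(size, number of chunks passing the filter)`, so getting 2 results for a size of 3 is expected whenever only 2 chunks satisfy the filter, regardless of how many chunks the database holds overall.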
Is there a document returned by the vector query without the filter that should match the filter, but is missing when the filter is applied? If so, can you grant us access to your workspace by enabling "Allow support to view your workspace" under Settings, and share the vector query you are using here, so we can reproduce and investigate?
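The check asked for above can be sketched as a small comparison of the two result sets. This is a hypothetical helper, not part of any Xata SDK. It assumes each result record is a dict with an `id` key plus the fields the filter looks at, and that `pred` mirrors the filter used in the query.

```python
def missing_from_filtered(unfiltered, filtered, pred):
    """Ids of unfiltered hits that satisfy the filter but are absent from the filtered results."""
    filtered_ids = {r["id"] for r in filtered}
    return [r["id"] for r in unfiltered if pred(r) and r["id"] not in filtered_ids]


# Hypothetical results of the same vector query with size=3, without and with a filter.
unfiltered = [
    {"id": "rec_a", "topic": 1},
    {"id": "rec_b", "topic": 2},
    {"id": "rec_c", "topic": 2},
]
filtered = [{"id": "rec_b", "topic": 2}, {"id": "rec_c", "topic": 2}]


def has_topic_2(record):
    """Hypothetical predicate mirroring the query's filter."""
    return record["topic"] == 2


print(missing_from_filtered(unfiltered, filtered, has_topic_2))  # []
```

With pre-filtering and the same query and size, an unfiltered hit that satisfies the filter should also appear in the filtered results (barring ties in similarity), because filtering can only remove higher-ranked competitors. An empty list means the filtered results are consistent. A non-empty list points at records worth including in a report.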
Thank you for granting us access. Could you post the vector queries that demonstrate the problem here as text, so we can take a look?