hi guys, i'm curious about the pricing model. so pricing is based on stored and queried vector dimensions. what if i use metadata filtering, so i'm not querying against the whole vdb but rather a smaller set of data? what would the cost structure look like? are the queried vector dimensions adjusted based on the filtered space instead of the total space?
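For context, a metadata-filtered query through the Workers binding looks roughly like the sketch below. The VECTORIZE binding name, the "category" field, and the 768-dimension placeholder vector are assumptions for illustration; it only shows the mechanics of filtering, not how billing counts the queried dimensions.

```ts
// Minimal sketch of a metadata-filtered Vectorize query from a Worker.
// Types come from @cloudflare/workers-types; names are placeholders.
export interface Env {
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Stand-in embedding; in practice this comes from your embedding model.
    const queryVector: number[] = new Array(768).fill(0.1);

    // The filter narrows which vectors are eligible matches.
    const matches = await env.VECTORIZE.query(queryVector, {
      topK: 5,
      filter: { category: "docs" },
    });

    return Response.json(matches);
  },
};
```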

Is there a concept of semantic caching? What about caching for searches based on similarity?
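Vectorize doesn't advertise a built-in semantic cache as far as I know, but one way to approximate it is to keep a separate index of answered-query embeddings and reuse an answer when a new query is similar enough. A rough sketch, with the binding name, the 0.95 threshold, and the metadata shape all assumed:

```ts
// Sketch of a similarity-based cache on top of a dedicated Vectorize index.
export interface Env {
  CACHE_INDEX: VectorizeIndex;
}

export async function cachedLookup(
  env: Env,
  queryEmbedding: number[],
  queryId: string,
  computeAnswer: () => Promise<string>
): Promise<string> {
  // Look for a previously answered query that is semantically close.
  const result = await env.CACHE_INDEX.query(queryEmbedding, { topK: 1 });
  const best = result.matches[0];
  if (best && best.score > 0.95) {
    const [cached] = await env.CACHE_INDEX.getByIds([best.id]);
    const answer = cached?.metadata?.answer;
    if (typeof answer === "string") return answer; // cache hit
  }

  // Cache miss: compute the answer and store it for future similar queries.
  const answer = await computeAnswer();
  await env.CACHE_INDEX.upsert([
    { id: queryId, values: queryEmbedding, metadata: { answer } },
  ]);
  return answer;
}
```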
Hey, is there any doc/link about using Cloudflare Vectorize with llamaindex?
AI AutoRAG funnel led to error page

Hi CF team, has there been an issue with Vectorize lately? Indexing is taking more than 5 minutes before vectors become queryable. I know it is an async process, but this is a long time. Can anyone provide more details on this?
Hi @beingmudit we are seeing increased Vectorize usage since AutoRAG launch, we are scaling our infra for WAL processing to get the metrics back to normal.
@beingmudit done, it should be under 30 seconds now. Thanks for reporting. Because it didn't cause any errors on our side, we hadn't caught the delay in our monitoring.
Thanks @yevgen for resolving this. Yeah, I didn't find any issues on the status page either.
@yevgen Is there any way to resync errored files in the vector DB?
Hi, is there a way of accessing each vector to add metadata tags to it? I used the standard R2 ingest but did not add any metadata, and I cannot seem to retrieve a vector based on its ID. The ID appears to be a random value (a hash?). I see that each chunk returns the same basic format:
Chunk 4: Cosine Sim. 0.6107, Relevancy 0.0017, Metadata 450db445-638b-4bb9-a0e8-16af21562e23/291385e0-f9c0-4126-96b4-6b1e0c1dfab1/1997-011.pdf
Chunk 1: Cosine Sim. 0.6169, Relevancy 0.0006, Metadata 450db445-638b-4bb9-a0e8-16af21562e23/291385e0-f9c0-4126-96b4-6b1e0c1dfab1/1997-008.pdf
Chunk 3: Cosine Sim. 0.6140, Relevancy 0.0019, Metadata 450db445-638b-4bb9-a0e8-16af21562e23/291385e0-f9c0-4126-96b4-6b1e0c1dfab1/1997-012.pdf

So the metadata appears to be the same apart from the file name and the chunk number. Is there a way of determining these values? I assume the first one is the "processedUpToMutation" ID, but I'm not sure what the second value might be, or perhaps it's the other way around. I need to understand this to be able to recover a specific set of vector values for a chunk. Or is there a way of exporting the vectors from the vector database without knowing the IDs?
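If the vector IDs can be obtained some other way (e.g. recorded at ingest time), a rough sketch of reading vectors back and re-upserting them with metadata via the Workers binding could look like this. The binding name, the source of the IDs, and the added "source" tag are assumptions:

```ts
// Sketch: fetch vectors by ID and re-upsert them with extra metadata merged in.
export interface Env {
  VECTORIZE: VectorizeIndex;
}

export async function tagVectors(env: Env, ids: string[]): Promise<void> {
  // Fetch the existing vectors (values + current metadata) by ID.
  const vectors = await env.VECTORIZE.getByIds(ids);

  // Re-upsert each vector with additional metadata attached.
  await env.VECTORIZE.upsert(
    vectors.map((v) => ({
      id: v.id,
      values: v.values,
      metadata: { ...v.metadata, source: "r2-ingest" }, // hypothetical tag
    }))
  );
}
```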
Hey all, I’m experiencing unexpectedly slow insert performance with Cloudflare Vectorize during a large-scale vector insertion. Over 12 hours, I successfully inserted about 2.5 million documents individually or in very small groups (1-2 vectors at a time). However, after about 36 hours, my process is still at around 1.9 million vectors total. It appears that Vectorize is batching inserts at about 1,000 vectors each, rather than the advertised batches of up to 200,000 vectors for improved throughput.
My understanding was that Vectorize would automatically batch inserts at these larger sizes to optimize performance, but this doesn’t seem to be happening. Do I need to explicitly batch my inserts (e.g., in groups of 5,000 vectors) to achieve better efficiency, or is there something else going on here?
Could anyone from Cloudflare clarify how batching works internally with Vectorize and suggest the best practices or architecture adjustments for optimizing large-scale vector insert operations?
Thanks so much
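In case it helps, a minimal sketch of explicitly batching inserts on the client side, assuming the Workers binding and a 1,000-vector batch size (check the current Vectorize limits for the real per-call maximum):

```ts
// Sketch of batching inserts instead of sending 1-2 vectors per call.
export interface Env {
  VECTORIZE: VectorizeIndex;
}

export async function insertInBatches(
  env: Env,
  vectors: VectorizeVector[],
  batchSize = 1000
): Promise<void> {
  for (let i = 0; i < vectors.length; i += batchSize) {
    const batch = vectors.slice(i, i + batchSize);
    // One insert call per batch greatly reduces per-request overhead
    // compared to inserting vectors one at a time.
    await env.VECTORIZE.insert(batch);
  }
}
```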
delete vectors from AutoRAG vector store when file is deleted
We have a new AutoRAG project, and we regularly add new files and delete old files from the data store.
We observed that when we delete a file from the data source (i.e. the R2 bucket), the vectors for that file are still available in the vector store, and search results still return those vectors.
Ideally we should have some way to delete them, or they should get deleted automatically.
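For a self-managed pipeline (AutoRAG's managed index may behave differently), a rough sketch of cleaning up vectors when an R2 object is deleted, assuming the chunk vector IDs were recorded at ingest time in a hypothetical KV namespace:

```ts
// Sketch: delete the vectors created for a file's chunks when the file is removed.
export interface Env {
  VECTORIZE: VectorizeIndex;
  CHUNK_IDS: KVNamespace; // hypothetical map: R2 object key -> JSON array of vector IDs
}

export async function onFileDeleted(env: Env, objectKey: string): Promise<void> {
  const stored = await env.CHUNK_IDS.get(objectKey);
  if (!stored) return; // nothing recorded for this object

  const ids: string[] = JSON.parse(stored);
  await env.VECTORIZE.deleteByIds(ids); // async mutation on the index
  await env.CHUNK_IDS.delete(objectKey);
}
```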
Hi Vectorize team,
I’m really enjoying Vectorize—great work on building it! I’m running into issues though with metadata filtering on a number field.
Issue: I have two indexes with ~3M vectors each, with each vector representing a document. I have a metadata index on the field "authored" which contains a UNIX timestamp representing the date of authorship. Doing a query on date ranges like 1970-1979 or 1939-1955 consistently causes a 504 error (code 7009: upstream unavailable) after a long wait. Without the filter, queries work fine. Ranges of 5 years or less usually work, but not always.
What’s happening: It seems like filtering on number fields ($gt and $lt) with large result sets triggers a timeout.
Request: Could you investigate this? It’d be awesome to know any limits on metadata filtering and if this can be fixed in the future.
Thanks for your help and for making a solid product!
-Nick
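For reference, the shape of the range-filtered query described above, as a minimal sketch; the binding name, topK, and exact timestamps are placeholders:

```ts
// Sketch of a number-range metadata filter ($gt / $lt) on an indexed field.
export interface Env {
  VECTORIZE: VectorizeIndex;
}

export async function queryByDateRange(
  env: Env,
  queryVector: number[]
): Promise<VectorizeMatches> {
  // "authored" is a metadata-indexed number field holding a UNIX timestamp.
  // Wide ranges over a ~3M-vector index are what reportedly trigger the 7009 timeout.
  return env.VECTORIZE.query(queryVector, {
    topK: 10,
    filter: {
      authored: { $gt: 0, $lt: 315532800 }, // roughly 1970 through 1979 (UTC)
    },
  });
}
```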
Heya all, my friend is looking to use Vectorize to have an AI that can read from his BookStack instance. He mentioned Vectorize to create the index. Does this make sense?
I recommend starting with AutoRAG https://blog.cloudflare.com/introducing-autorag-on-cloudflare/
much easier to get it up and running.
I'd like to raise this again: is there any update?
Do you expect to have indexes under 1M vectors with 3072 dimensions? Heavier vectors will be slower to insert, slower to train, and slower to query. There are a lot of tradeoffs once we allow vectors with 3072 dimensions. It's not in the plans to support this in Q2.
Separately, you get diminishing returns with vector dimensions. It varies per model of course, but with AWS's Titan Embeddings V2, for example, they found this:
"We measured the accuracy of the vectors generated by Amazon Titan Text Embeddings V2 and we observed that vectors with 512 dimensions keep approximately 99 percent of the accuracy provided by vectors with 1024 dimensions. Vectors with 256 dimensions keep 97 percent of the accuracy. This means that you can save 75 percent in vector storage (from 1024 down to 256 dimensions) and keep approximately 97 percent of the accuracy provided by larger vectors."

I'd wager it's a similar story for most models. Maybe your use case is different and you really need the 3k, but 3072 dimensions is wildly expensive vs say 1024 or 512. Just a thought.
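Since storage scales linearly with dimensions, a quick back-of-envelope shows how fast the stored-dimension total grows; the 1,000,000-vector count below is purely an illustrative assumption:

```ts
// Back-of-envelope: stored vector dimensions at different embedding sizes
// for a hypothetical index of 1,000,000 vectors.
const vectorCount = 1_000_000;

for (const dims of [3072, 1024, 512, 256]) {
  const storedDimensions = vectorCount * dims;
  console.log(`${dims} dims -> ${storedDimensions.toLocaleString()} stored dimensions`);
}
// 3072 dims stores 3x the dimensions of 1024 and 12x those of 256,
// which is where the "wildly expensive" comparison above comes from.
```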