Hey all, I’m experiencing unexpectedly slow insert performance with Cloudflare Vectorize during a large-scale vector insertion. Over 12 hours, I successfully inserted about 2.5 million documents individually or in very small groups (1-2 vectors at a time). However, after about 36 hours, my process is still at around 1.9 million vectors total. It appears that Vectorize is batching inserts at about 1,000 vectors each, rather than the advertised batches of up to 200,000 vectors for improved throughput.
My understanding was that Vectorize would automatically batch inserts at these larger sizes to optimize performance, but this doesn’t seem to be happening. Do I need to explicitly batch my inserts (e.g., in groups of 5,000 vectors) to achieve better efficiency, or is there something else going on here?
Could anyone from Cloudflare clarify how batching works internally with Vectorize and suggest the best practices or architecture adjustments for optimizing large-scale vector insert operations?
Thanks so much
4 Replies
Hey @Nick, if you want higher upsert throughput, supply Vectorize with fewer, larger batches of vectors. The closer each batch is to the Vectorize upsert batch limit of 5000, the higher the throughput you'll observe.
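For anyone landing here later, a minimal sketch of what "fewer, larger batches" could look like in TypeScript. The `VectorRecord` and `UpsertableIndex` types, the `chunk` and `upsertInBatches` helpers, and the `MAX_BATCH_SIZE` constant are hypothetical names made up for illustration; the only thing assumed about the index is that it exposes an `upsert` method that accepts an array of vectors. The 5000 figure is the limit quoted in the reply above, and the limit may differ depending on which client you use, so check the limits documentation before relying on it.

```typescript
// Hypothetical minimal shapes for illustration; they stand in for whatever
// vector and index types your Vectorize client actually provides.
interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, string | number | boolean>;
}

interface UpsertableIndex {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
}

// Batch limit quoted in this thread; lower it if your client enforces less.
const MAX_BATCH_SIZE = 5000;

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Send the vectors as a few large batches, one request at a time.
// Returns the number of upsert requests that were issued.
async function upsertInBatches(
  index: UpsertableIndex,
  vectors: VectorRecord[],
  batchSize: number = MAX_BATCH_SIZE,
): Promise<number> {
  const batches = chunk(vectors, batchSize);
  for (const batch of batches) {
    await index.upsert(batch);
  }
  return batches.length;
}
```

With batches of 5,000, 2.5 million vectors works out to 500 upsert requests, versus well over a million requests when sending 1-2 vectors per call.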
Great, thanks so much, guys.
@garvitg @Alex Graham And is there any difference between inserting and upserting in terms of throughput?
I understand upserting will overwrite if IDs match, but if I can guarantee that IDs are unique, is there any benefit to upserting instead of inserting?
Thanks again