Too-large requests to serverless Infinity vector embedding cause errors
I keep running into "400 Bad Request" errors from this service, and I finally discovered it was because my requests were too large and hitting this constraint: https://github.com/runpod-workers/worker-infinity-embedding/blob/acd1a2a81714a14d77eedfe177231e27b18a48bd/src/utils.py#L14
Is this a hard limit?
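For context, my reading of the linked line is that it's a Pydantic-style validation cap, roughly like the sketch below. The constant names are my own, and the 8192 figure is the item cap mentioned later in this thread, so treat this as an illustration rather than the repo's exact code:

```python
# Sketch of a request-size cap like the one linked above (names and exact
# values are assumptions, not copied from the repo).
from typing import Annotated

from pydantic import BaseModel, Field, StringConstraints

MAX_ITEMS = 8192  # assumed per-request item cap, per the discussion below

class EmbeddingRequest(BaseModel):
    # Each input string is length-limited, and the whole list is capped
    # at MAX_ITEMS; exceeding either raises a ValidationError, which the
    # worker would surface as a 400 Bad Request.
    input: Annotated[
        list[Annotated[str, StringConstraints(max_length=8192 * 15)]],
        Field(min_length=1, max_length=MAX_ITEMS),
    ]

# e.g. EmbeddingRequest(input=["hi"] * (MAX_ITEMS + 1)) fails validation
```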
I don't know why they set this limit, but try opening an issue on the GitHub repo.
For now, you can chunk your requests into smaller batches that fit under that limit; see the sketch below.
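Something like this client-side chunking works. The 8192 cap mirrors the linked constraint, and `embed_batch` is a user-supplied placeholder for whatever call sends one request to the endpoint, not part of the worker's API:

```python
# Hypothetical client-side chunking; MAX_ITEMS mirrors the linked constraint.
from typing import Callable, Iterator

MAX_ITEMS = 8192

def chunk(texts: list[str], size: int = MAX_ITEMS) -> Iterator[list[str]]:
    """Yield successive batches no larger than the worker's item cap."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]

def embed_all(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
) -> list[list[float]]:
    """Call embed_batch once per chunk and concatenate the results."""
    vectors: list[list[float]] = []
    for batch in chunk(texts):
        vectors.extend(embed_batch(batch))
    return vectors
```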
Yep, that's what I started doing, but with a cap of 8192 items it's hard to get anywhere near the GPU's memory limits.
@zilli if you can, open a PR for that or create an issue; I'm not sure what the intent was behind this limitation.
I finally remembered to start building the Docker image (the last one took 2.5 hours...). I'll try it out tomorrow, and if it works, I'll put in that PR.
...and the build failed because the hardcoded nightly version of PyTorch (for the end-of-life CUDA 12.1.0) is no longer available 😅
I'm rebuilding after updating dependencies, but the PR won't be just the string length change.