RunPod•3mo ago
zilli

Too big requests for serverless infinity vector embedding cause errors

I keep running into "400 Bad Request" server errors for this service, and finally discovered that it was because my requests were too large and running into this constraint: https://github.com/runpod-workers/worker-infinity-embedding/blob/acd1a2a81714a14d77eedfe177231e27b18a48bd/src/utils.py#L14
INPUT_STRING = StringConstraints(max_length=8192 * 15, strip_whitespace=True)
ITEMS_LIMIT = {
    "min_length": 1,
    "max_length": 8192,
}
Is this a hard limit?
4 Replies
nerdylive
nerdylive•3mo ago
I don't know why they set this limit, but try opening an issue on the GitHub repo. For now, you can chunk your requests into smaller batches that fit under that size.
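The chunking workaround above can be sketched like this; a minimal client-side helper that splits a list of texts into batches under the worker's 8192-item cap (the function name `chunk_requests` is my own, not part of the worker):

```python
# Limits taken from the worker's utils.py
MAX_ITEMS = 8192         # max items per request
MAX_CHARS = 8192 * 15    # per-string cap; longer strings must be split upstream

def chunk_requests(texts, max_items=MAX_ITEMS):
    """Yield sublists of at most max_items texts, preserving order."""
    for i in range(0, len(texts), max_items):
        yield texts[i:i + max_items]

# 20,000 texts become three requests: 8192 + 8192 + 3616
batches = list(chunk_requests(["doc"] * 20000))
print([len(b) for b in batches])  # → [8192, 8192, 3616]
```

Each batch would then be sent as its own request to the serverless endpoint.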
zilli
zilliOP•3mo ago
Yep, that's what I started doing, but it's hard to come close to the memory limits of the GPU with a cap of 8192 items
flash-singh
flash-singh•2mo ago
@zilli if you can, open a PR for that or create an issue; I'm not sure what the intent was behind this limitation.
zilli
zilliOP•2mo ago
I finally remembered to start building the Docker image (the last one took 2.5 hours...). I'll try it out tomorrow and, if it works, put in that PR. ...And the build failed, because the hardcoded nightly version of PyTorch (for the end-of-life CUDA 12.1.0) is unavailable 😅 I'm rebuilding after updating the dependencies, but the PR won't be just the string-length change.
