Too-large requests to serverless Infinity vector embedding cause errors
I keep running into "400 Bad Request" errors from this service, and I finally discovered it was because my requests were too large and hitting this constraint: https://github.com/runpod-workers/worker-infinity-embedding/blob/acd1a2a81714a14d77eedfe177231e27b18a48bd/src/utils.py#L14
Is this a hard limit?
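For context, my reading of the linked line is that it's a Pydantic-style validation cap, roughly like the sketch below. The constant names are my own, and the 8192 figure is the item cap mentioned later in this thread, so treat this as an illustration rather than the repo's exact code:

```python
# Sketch of a request-size cap like the one linked above (names and exact
# values are assumptions, not copied from the repo).
from typing import Annotated

from pydantic import BaseModel, Field, StringConstraints

MAX_ITEMS = 8192  # assumed per-request item cap, per the discussion below

class EmbeddingRequest(BaseModel):
    # Each input string is length-limited, and the whole list is capped
    # at MAX_ITEMS; exceeding either raises a ValidationError, which the
    # worker would surface as a 400 Bad Request.
    input: Annotated[
        list[Annotated[str, StringConstraints(max_length=8192 * 15)]],
        Field(min_length=1, max_length=MAX_ITEMS),
    ]

# e.g. EmbeddingRequest(input=["hi"] * (MAX_ITEMS + 1)) fails validation
```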
I don't know why they set this limit, but try opening an issue on the GitHub repo.
For now, you can chunk your requests into smaller batches that fit under that limit; see the sketch below.
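Something like this client-side chunking works. The 8192 cap mirrors the linked constraint, and `embed_batch` is a user-supplied placeholder for whatever call sends one request to the endpoint, not part of the worker's API:

```python
# Hypothetical client-side chunking; MAX_ITEMS mirrors the linked constraint.
from typing import Callable, Iterator

MAX_ITEMS = 8192

def chunk(texts: list[str], size: int = MAX_ITEMS) -> Iterator[list[str]]:
    """Yield successive batches no larger than the worker's item cap."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]

def embed_all(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
) -> list[list[float]]:
    """Call embed_batch once per chunk and concatenate the results."""
    vectors: list[list[float]] = []
    for batch in chunk(texts):
        vectors.extend(embed_batch(batch))
    return vectors
```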
Yep, that's what I started doing, but with a cap of 8192 items it's hard to get anywhere near the GPU's memory limits.
@zilli if you can, open a PR for that or create an issue; I'm not sure what the intent was behind this limitation.
I finally remembered to start building the Docker image (the last one took 2.5 hours...). I'll try it out tomorrow, and if it works, I'll put in that PR.
...and the build failed because the hardcoded nightly version of PyTorch (for the end-of-life CUDA 12.1.0) is no longer available 😅
I'm rebuilding after updating dependencies, but the PR won't be just the string length change.