Deploying BLOOM on RunPod serverless vLLM using OpenAI compatibility, issue with CUDA?
Hey guys, I'm getting this error and I really need help:
InternalServerError Traceback (most recent call last)
Cell In[42], line 4
1 def get_translation(claim, model=model):
2 user_prompt = f"Translate the following claim into English: '{claim}'. You must always make sure your final response is prefixed with 'Translated Claim:' followed by the translated claim."
----> 4 response = client.chat.completions.create(
5 model=model,
6 messages=[
7 {"role": "user", "content": user_prompt}
8 ],
9 temperature=0,
10 #max_tokens=100,
11 )
12 if not response or not response.choices:
13 print("Error: Model response is empty or malformed.")
File ~/anaconda3/envs/venv_name/lib/python3.12/site-packages/openai/_utils/_utils.py:274, in required_args.<locals>.inner.<locals>.wrapper(*args, **kwargs)
272 msg = f"Missing required argument: {quote(missing[0])}"
...
(...)
1048 retries_taken=options.get_max_retries(self.max_retries) - retries,
1049 )
InternalServerError: Error code: 500 - {'error': 'Error processing the request'}
For context, I'm trying to run the bloom-7b1 model for inference from VS Code through RunPod, and I'm using the OpenAI compatibility layer to send the requests.
The function that triggers the error when called is shown in the traceback above.
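In case the client setup matters, this is roughly how everything is wired up on my end; the endpoint ID, API key env var name, and base URL format below are placeholders rather than my exact values:

import os
from openai import OpenAI

# Client pointed at the RunPod serverless endpoint's OpenAI-compatible route
# (endpoint ID and env var name are placeholders)
client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

model = "bigscience/bloom-7b1"  # assuming this matches what the worker serves

def get_translation(claim, model=model):
    user_prompt = (
        f"Translate the following claim into English: '{claim}'. "
        "You must always make sure your final response is prefixed with "
        "'Translated Claim:' followed by the translated claim."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0,
        # max_tokens=100,
    )
    if not response or not response.choices:
        print("Error: Model response is empty or malformed.")
        return None
    return response.choices[0].message.content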
I was looking into possible causes: could this be a CUDA version issue? bloom-7b1's documentation says it runs on CUDA 11.5, but vLLM only supports 11.8 and 12.x, and my serverless endpoint is currently on 12. If that's the problem, how can I make it work?
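Something I've also been meaning to try (reusing the same client as in the snippet above) is listing the models the worker reports, just to rule out a mismatch between the model name I pass in and what the endpoint actually serves, although that wouldn't tell me anything about CUDA:

# List the model IDs the OpenAI-compatible vLLM worker exposes
# (client configured as in the snippet above)
for m in client.models.list():
    print(m.id)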