Deploying BLOOM on RunPod serverless vLLM using OpenAI compatibility, issue with CUDA?
Hey guys, I'm getting this error and I really need help:
InternalServerError Traceback (most recent call last)
Cell In[42], line 4
1 def get_translation(claim, model=model):
2 user_prompt = f"Translate the following claim into English: '{claim}'. You must always make sure your final response is prefixed with 'Translated Claim:' followed by the translated claim."
----> 4 response = client.chat.completions.create(
5 model=model,
6 messages=[
7 {"role": "user", "content": user_prompt}
8 ],
9 temperature=0,
10 #max_tokens=100,
11 )
12 if not response or not response.choices:
13 print("Error: Model response is empty or malformed.")
File ~/anaconda3/envs/venv_name/lib/python3.12/site-packages/openai/_utils/_utils.py:274, in required_args.<locals>.inner.<locals>.wrapper(*args, **kwargs)
272 msg = f"Missing required argument: {quote(missing[0])}"
...
(...)
1048 retries_taken=options.get_max_retries(self.max_retries) - retries,
1049 )
InternalServerError: Error code: 500 - {'error': 'Error processing the request'}
For context, I'm trying to run the bloom-7b1 model for inference from VS Code through RunPod, and I'm using the OpenAI compatibility layer to send the requests.
The function that triggers the error when called is shown in the traceback above.
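In case the client setup matters, this is roughly how everything is wired up on my end; the endpoint ID, API key env var name, and base URL format below are placeholders rather than my exact values:

import os
from openai import OpenAI

# Client pointed at the RunPod serverless endpoint's OpenAI-compatible route
# (endpoint ID and env var name are placeholders)
client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

model = "bigscience/bloom-7b1"  # assuming this matches what the worker serves

def get_translation(claim, model=model):
    user_prompt = (
        f"Translate the following claim into English: '{claim}'. "
        "You must always make sure your final response is prefixed with "
        "'Translated Claim:' followed by the translated claim."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0,
        # max_tokens=100,
    )
    if not response or not response.choices:
        print("Error: Model response is empty or malformed.")
        return None
    return response.choices[0].message.content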
I was looking into possible causes: could this be a CUDA version issue? bloom-7b1's documentation says it runs on CUDA 11.5, but vLLM only supports 11.8 and 12.x, and my serverless endpoint is currently on 12. If that's the problem, how can I make it work?
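Something I've also been meaning to try (reusing the same client as in the snippet above) is listing the models the worker reports, just to rule out a mismatch between the model name I pass in and what the endpoint actually serves, although that wouldn't tell me anything about CUDA:

# List the model IDs the OpenAI-compatible vLLM worker exposes
# (client configured as in the snippet above)
for m in client.models.list():
    print(m.id)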