Pandafan
RunPod
Created by Pandafan on 8/11/2024 in #⚡|serverless
Deploying BLOOM on RunPod serverless vLLM using OpenAI compatibility, issue with CUDA?
Hey guys, I'm getting this error and I really need help:

InternalServerError                       Traceback (most recent call last)
Cell In[42], line 4
      1 def get_translation(claim, model=model):
      2     user_prompt = f"Translate the following claim into English: '{claim}'. You must always make sure your final response is prefixed with 'Translated Claim:' followed by the translated claim."
----> 4     response = client.chat.completions.create(
      5         model=model,
      6         messages=[
      7             {"role": "user", "content": user_prompt}
      8         ],
      9         temperature=0,
     10         #max_tokens=100,
     11     )
     12     if not response or not response.choices:
     13         print("Error: Model response is empty or malformed.")

File ~/anaconda3/envs/venv_name/lib/python3.12/site-packages/openai/_utils/_utils.py:274, in required_args.<locals>.inner.<locals>.wrapper(*args, **kwargs)
    272     msg = f"Missing required argument: {quote(missing[0])}"
    ...
    (...)
   1048     retries_taken=options.get_max_retries(self.max_retries) - retries,
   1049 )

InternalServerError: Error code: 500 - {'error': 'Error processing the request'}

For context, I'm trying to run the bloom-7b1 model for inference from VS Code through RunPod, and I'm using the OpenAI compatibility layer to send requests. The function that triggers the error when called is attached.

I was looking into possibilities: could this be an issue with the CUDA version? bloom-7b1's documentation says it runs on CUDA 11.5, but vLLM only supports 11.8 and 12, and my serverless endpoint is currently on 12. How can I make it work if this is the issue?
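For reference, here's roughly how my client and the full function are set up. The endpoint ID and API key are placeholders, and the base URL is what I pieced together from RunPod's OpenAI-compatibility docs, so treat those details as my assumptions rather than a confirmed working config:

```python
from openai import OpenAI

# Placeholders for my actual values
ENDPOINT_ID = "<my-endpoint-id>"
RUNPOD_API_KEY = "<my-runpod-api-key>"

# Model name the vLLM serverless worker was deployed with (my assumption)
model = "bigscience/bloom-7b1"

# Base URL per RunPod's OpenAI compatibility docs, as I understand them
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=RUNPOD_API_KEY,
)

def get_translation(claim, model=model):
    user_prompt = (
        f"Translate the following claim into English: '{claim}'. "
        "You must always make sure your final response is prefixed with "
        "'Translated Claim:' followed by the translated claim."
    )
    # This is the call that raises the 500 InternalServerError
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0,
    )
    if not response or not response.choices:
        print("Error: Model response is empty or malformed.")
        return None
    return response.choices[0].message.content
```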
1 reply