RunPod
•Created by naaviii on 8/29/2024 in #⚡|serverless
Urgent: Issue with RunPod vLLM Serverless Endpoint
We are encountering a critical issue with the RunPod vLLM serverless endpoint. Specifically, when a network volume is attached, the following code fails:
response = client.completions.create(
    model="llama3-dumm/llm",
    prompt=["hello? How are you "],
    temperature=0.8,
    max_tokens=600,
)
But the following works:
response = client.chat.completions.create(
    model="llama3-dumm/llm",
    messages=[{'role': 'user', 'content': "hell0"}],
    max_tokens=100,
    temperature=0.9,
)
And this is the client object (import included; endpoint_id is a placeholder for our actual endpoint ID):
from openai import OpenAI

client = OpenAI(
    api_key=api_key,
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)
This behavior is unusual and suggests there might be a bug. Given our tight deadline, could you please investigate this issue as soon as possible? Your prompt assistance would be greatly appreciated. Thank you very much for your help.