RunPod
•Created by naaviii on 8/29/2024 in #⚡|serverless
Urgent: Issue with RunPod vLLM Serverless Endpoint
We are encountering a critical issue with the RunPod vLLM serverless endpoint. Specifically, when a network volume is attached, the following code fails:
response = client.completions.create(
    model="llama3-dumm/llm",
    prompt=["hello? How are you "],
    temperature=0.8,
    max_tokens=600,
)
But the following works:
response = client.chat.completions.create(
    model="llama3-dumm/llm",
    messages=[{'role': 'user', 'content': "hell0"}],
    max_tokens=100,
    temperature=0.9,
)
And this is the client object (import included; endpoint_id is a placeholder for our actual endpoint ID):
from openai import OpenAI

client = OpenAI(
    api_key=api_key,
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)
This behavior is unusual and suggests there might be a bug. Given our tight deadline, could you please investigate this issue as soon as possible? Your prompt assistance would be greatly appreciated. Thank you very much for your help.