Request Format for the RunPod vLLM Worker
I have been using the format below with the RunPod vLLM worker to make use of the chat history functionality. I kept getting an error that input was missing from the JSON request, so this is what works for me now:
{
  "input": {
    "prompt": "Tell me why RunPod is the best GPU provider",
    "sampling_params": {
      "max_tokens": 100
    },
    "apply_chat_template": true,
    "stream": true
  }
}
Did the input change recently?
No, it's always been like that. Everything sent in the payload to serverless needs to be in input, and the output that is returned is in output. You need to move the payload above to be in the input field.
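For illustration, here is a minimal sketch of sending a wrapped payload to a serverless endpoint via the runsync API; ENDPOINT_ID and RUNPOD_API_KEY are placeholders, not values from this thread:
// Minimal sketch: POST a wrapped payload to a RunPod serverless endpoint.
// ENDPOINT_ID and RUNPOD_API_KEY are placeholders.
const ENDPOINT_ID = "your-endpoint-id";

const response = await fetch(`https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.RUNPOD_API_KEY}`,
  },
  // Everything the worker reads must sit under "input".
  body: JSON.stringify({
    input: {
      prompt: "Tell me why RunPod is the best GPU provider",
      sampling_params: { max_tokens: 100 },
    },
  }),
});

const result = await response.json();
// The worker's generation comes back under "output".
console.log(result.output);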
Tried that, and it gave this error:
2024-01-15T23:05:16.864801581Z TypeError: Object of type AsyncEngineDeadError is not JSON serializable
Hi @Concept, were you able to figure out the right format for the vLLM chat interface? I'm facing the same issue.
// chatHistory holds the accumulated conversation as a single prompt string
const requestBody = {
  input: {
    prompt: chatHistory,
    sampling_params: {
      max_tokens: 2000,
    },
    apply_chat_template: true,
    stream: true,
  },
};
This worked for me
Cool, yeah, that worked for me too. I was wondering if the messages property gives better results for chat conversations.
I’m not too sure if there’s a difference
Hi, to send chat history with multiple messages, you must use messages instead of prompt, as shown in the example below.
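Here is a sketch of a multi-turn request body in the messages format; the roles follow the OpenAI-style role/content structure, and the conversation content and sampling values are just illustrative:
// Sketch of a multi-turn request: "messages" replaces "prompt",
// and each turn is an OpenAI-style role/content pair.
const requestBody = {
  input: {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Tell me why RunPod is the best GPU provider" },
      { role: "assistant", content: "RunPod offers on-demand GPUs with per-second billing." },
      { role: "user", content: "How does its serverless option work?" },
    ],
    sampling_params: {
      max_tokens: 2000,
    },
    stream: true,
  },
};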