Request Format for the RunPod vLLM Worker
I have been using the format below with the RunPod vLLM worker to make use of the chat history functionality. I kept getting an error that input was missing from the JSON request, so this is what works for me now:
{
  "input": {
    "prompt": "Tell me why RunPod is the best GPU provider",
    "sampling_params": {
      "max_tokens": 100
    },
    "apply_chat_template": true,
    "stream": true
  }
}
Did the input change recently?
No, it's always been like that. Everything sent in the payload to serverless needs to be in input, and the output that is returned is in output. You need to move the payload above to be in the input field.
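For illustration, here is a minimal sketch of sending a wrapped payload to a serverless endpoint via the runsync API; ENDPOINT_ID and RUNPOD_API_KEY are placeholders, not values from this thread:
// Minimal sketch: POST a wrapped payload to a RunPod serverless endpoint.
// ENDPOINT_ID and RUNPOD_API_KEY are placeholders.
const ENDPOINT_ID = "your-endpoint-id";

const response = await fetch(`https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.RUNPOD_API_KEY}`,
  },
  // Everything the worker reads must sit under "input".
  body: JSON.stringify({
    input: {
      prompt: "Tell me why RunPod is the best GPU provider",
      sampling_params: { max_tokens: 100 },
    },
  }),
});

const result = await response.json();
// The worker's generation comes back under "output".
console.log(result.output);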
Tried that, and it gave this error:
2024-01-15T23:05:16.864801581Z TypeError: Object of type AsyncEngineDeadError is not JSON serializable
Hi @Concept, were you able to figure out the right format for the vLLM chat interface? I'm facing the same issue.
// chatHistory holds the accumulated conversation as a single prompt string
const requestBody = {
  input: {
    prompt: chatHistory,
    sampling_params: {
      max_tokens: 2000,
    },
    apply_chat_template: true,
    stream: true,
  },
};
This worked for me
Cool, yeah, that worked for me too. I was wondering if the messages property gives better results for chat conversations.
I’m not too sure if there’s a difference
Hi, to send chat history with multiple messages, you must use messages instead of prompt, as shown in the example below.
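Here is a sketch of a multi-turn request body in the messages format; the roles follow the OpenAI-style role/content structure, and the conversation content and sampling values are just illustrative:
// Sketch of a multi-turn request: "messages" replaces "prompt",
// and each turn is an OpenAI-style role/content pair.
const requestBody = {
  input: {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Tell me why RunPod is the best GPU provider" },
      { role: "assistant", content: "RunPod offers on-demand GPUs with per-second billing." },
      { role: "user", content: "How does its serverless option work?" },
    ],
    sampling_params: {
      max_tokens: 2000,
    },
    stream: true,
  },
};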