RunPod•15mo ago

quick python vLLM endpoint example please?

…I’ve been on this for 2 hours and the best I can get so far is having a bunch of stuff endlessly ‘queued’. I’m getting responses from the test thing on the ‘my endpoints’ page, but my Python script isn’t working… 😅

2 Replies

Solution

Toxibunny•15mo ago

Here's the answer btw

import requests
import json

url = "https://api.runpod.ai/v2/llama2-13b-chat/runsync"
headers = {
    "accept": "application/json",
    "authorization": "7YEK9P00D6BR8UY8WORQ3Y61XA8X2VDSZT5VGV54",
    "content-type": "application/json",
}
data = {
    "input": {
        "prompt": "Who is the president of the United States?",
        "sampling_params": {
            "max_tokens": 16,
            "n": 1,
            "best_of": None,
            "presence_penalty": 0,
            "frequency_penalty": 0,
            "temperature": 0.7,
            "top_p": 1,
            "top_k": -1,
            "use_beam_search": False,
            "stop": [
                "None"
            ],
            "ignore_eos": False,
            "logprobs": None
        }
    }
}

response = requests.post(url, headers=headers, data=json.dumps(data))

print(response.json())
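If you want to reuse the 'data' layout above for different prompts, here is a minimal sketch that wraps it in a function. It keeps the same URL pattern, header names, and `input` / `sampling_params` nesting as the snippet above. The names `build_request`, `run_sync`, and the `RUNPOD_API_KEY` environment variable are hypothetical, not from the thread, and the timeout and status check are additions of mine rather than anything RunPod requires.

```python
import os


def build_request(endpoint_id, api_key, prompt, **sampling_overrides):
    """Return (url, headers, payload) in the same shape as the snippet above."""
    # Defaults copied from the working example; override any of them by keyword.
    sampling_params = {"max_tokens": 16, "temperature": 0.7, "top_p": 1}
    sampling_params.update(sampling_overrides)

    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    headers = {
        "accept": "application/json",
        "authorization": api_key,  # sent as-is, mirroring the post above
        "content-type": "application/json",
    }
    payload = {"input": {"prompt": prompt, "sampling_params": sampling_params}}
    return url, headers, payload


def run_sync(endpoint_id, prompt, timeout=120, **sampling_overrides):
    """POST the prompt and return the decoded JSON response."""
    # Imported here so build_request() can be used without requests installed.
    import requests

    # Hypothetical variable name: keeps the key out of the script itself.
    api_key = os.environ["RUNPOD_API_KEY"]
    url, headers, payload = build_request(
        endpoint_id, api_key, prompt, **sampling_overrides
    )
    # json= serializes the dict, so json.dumps() is not needed.
    response = requests.post(url, headers=headers, json=payload, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response.json()
```

Calling `run_sync("llama2-13b-chat", "Who is the president of the United States?", max_tokens=64)` would send the same request as above with a longer completion limit.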

ToxibunnyOP•15mo ago

...for the part I was stuck on, at least. the formatting of the 'data' part...
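For the other symptom in the question (jobs sitting in 'queued'), one option is to submit the job asynchronously and poll until it finishes. This is only a sketch under assumptions that are not from this thread: that the endpoint exposes `/run` and `/status/{id}` next to `/runsync`, that the submit response contains an `id` field, and that the status strings are the ones listed in `TERMINAL_STATES`. The helper names `wait_for_job` and `run_async` are hypothetical. Check the RunPod docs before relying on any of it.

```python
import time

# Assumed status names for a finished job; not confirmed in this thread.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(fetch_status, poll_interval=2.0, max_wait=300.0, sleep=time.sleep):
    """Call fetch_status() until the job is finished or max_wait seconds pass."""
    waited = 0.0
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if waited >= max_wait:
            raise TimeoutError(
                f"job still {job.get('status')!r} after {max_wait} seconds"
            )
        sleep(poll_interval)
        waited += poll_interval


def run_async(endpoint_id, api_key, payload, **wait_kwargs):
    """Submit payload to the (assumed) /run route and poll /status/{id}."""
    # Imported here so wait_for_job() can be used without requests installed.
    import requests

    base = f"https://api.runpod.ai/v2/{endpoint_id}"
    headers = {"authorization": api_key, "content-type": "application/json"}

    submitted = requests.post(f"{base}/run", headers=headers, json=payload, timeout=30)
    submitted.raise_for_status()
    job_id = submitted.json()["id"]  # assumed field name

    def fetch_status():
        reply = requests.get(f"{base}/status/{job_id}", headers=headers, timeout=30)
        reply.raise_for_status()
        return reply.json()

    return wait_for_job(fetch_status, **wait_kwargs)
```

Passing the status fetcher in as a callable keeps the polling loop separate from the HTTP calls, so the loop can be tried with a fake fetcher before pointing it at a real endpoint.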
