RunPod, 10mo ago
JonahJ

Quick Python vLLM endpoint example, please?

…I’ve been on this for 2 hours and the best I can get so far is a bunch of requests endlessly ‘queued’. I’m getting responses from the test tool on the ‘My Endpoints’ page, but my Python script isn’t working… 😅
Solution
JonahJ, 10mo ago
Here's the answer btw
import requests
import json

# /runsync is the synchronous route: the call waits for the job's result.
url = "https://api.runpod.ai/v2/llama2-13b-chat/runsync"
headers = {
    "accept": "application/json",
    # Put your own RunPod API key here (redacted; don't post a real key publicly).
    "authorization": "YOUR_RUNPOD_API_KEY",
    "content-type": "application/json",
}
# The request body: everything goes under "input", with the prompt and
# the sampling parameters nested inside it.
data = {
    "input": {
        "prompt": "Who is the president of the United States?",
        "sampling_params": {
            "max_tokens": 16,
            "n": 1,
            "best_of": None,
            "presence_penalty": 0,
            "frequency_penalty": 0,
            "temperature": 0.7,
            "top_p": 1,
            "top_k": -1,
            "use_beam_search": False,
            "stop": [
                "None"
            ],
            "ignore_eos": False,
            "logprobs": None
        }
    }
}

# Serialize the dict to JSON and send it as the POST body.
response = requests.post(url, headers=headers, data=json.dumps(data))

print(response.json())
JonahJ (OP), 10mo ago
...for the part I was stuck on, at least: the formatting of the 'data' part...
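Since the shape of 'data' was the sticking point, here is a minimal sketch of a helper that builds the same nested structure (`"input"` wrapping `"prompt"` and `"sampling_params"`) so it can be checked locally before sending anything. `build_payload` is a hypothetical name, not part of any RunPod library, and it is an assumption that the endpoint accepts a reduced set of sampling parameters rather than the full list in the post above.

```python
import json


def build_payload(prompt, max_tokens=16, temperature=0.7, stop=None):
    """Hypothetical helper: wrap a prompt in the {"input": {...}} envelope.

    Assumption: the endpoint tolerates omitted sampling parameters.
    """
    sampling_params = {
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Only include "stop" when real stop strings are given.
    if stop:
        sampling_params["stop"] = list(stop)
    return {"input": {"prompt": prompt, "sampling_params": sampling_params}}


payload = build_payload("Who is the president of the United States?")

# json.dumps turns Python None into JSON null; the string "None" stays a string.
body = json.dumps(payload)
print(body)
```

The resulting `body` string can be passed as `data=body` to `requests.post`, exactly as in the script above. One thing worth noticing: Python's `None` serializes to JSON `null`, whereas `"None"` in quotes is sent as the literal text "None", so leaving `stop` out entirely may be the clearer way to say "no stop strings".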