naaviii
RunPod
Created by naaviii on 8/29/2024 in #⚡|serverless
Urgent: Issue with Runpod vllm Serverless Endpoint (22 replies)
No worries @NERDDISCO, could you give me an ETA if possible, so that our team can plan accordingly?
Hello @Tim aka NERDDISCO, is there any update from the team regarding the issue?
Hello Tim, is there any update from the team regarding the issue?
Hi Tim, just checking for any update?
Thanks a lot, Tim, for the quick response 🙂
Please help, we need your support.
This was working fine a few days back; have there been major changes in the library versions?
Yes, in fact I am now getting the error even without using a network volume.

from openai import OpenAI

api_key = "xxxxxxxxx"
endpoint_id = "vllm-xxxxx"

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
    api_key=api_key,
)

# Create a completion
response = client.completions.create(
    model="microsoft/Phi-3.5-mini-instruct",
    prompt="Runpod is the best platform because",
    temperature=0,
    max_tokens=100,
)

# Print the response
print(response)
print(response.choices[0].text)
################Output###############################
{
"delayTime": 104,
"error": "handler: 'NoneType' object has no attribute 'headers' \ntraceback: Traceback (most recent call last):\n File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\", line 192, in run_job_generator\n async for output_partial in job_output:\n File \"/src/handler.py\", line 13, in handler\n async for batch in results_generator:\n File \"/src/engine.py\", line 151, in generate\n async for response in self._handle_chat_or_completion_request(openai_request):\n File \"/src/engine.py\", line 179, in _handle_chat_or_completion_request\n response_generator = await generator_function(request, raw_request=None)\n File \"/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_completion.py\", line 129, in create_completion\n raw_request.headers):\nAttributeError: 'NoneType' object has no attribute 'headers'\n",
"executionTime": 1191,
"id": "sync-9c9ccd0f-7e42-4f6a-8c5d-d430004b399f-e1",
"status": "FAILED"
}
This is the basic code that I have used.
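For reference, since the traceback comes from vLLM's create_completion handler, one way to narrow the problem down is to send the same prompt through the chat completions route on the same endpoint and see whether only the legacy completions route fails. This is a diagnostic sketch, not a fix confirmed in this thread, and it assumes the vLLM worker also exposes the chat route.

from openai import OpenAI

api_key = "xxxxxxxxx"       # same placeholder key as above
endpoint_id = "vllm-xxxxx"  # same placeholder endpoint ID as above

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
    api_key=api_key,
)

# Same model and sampling settings, but via the chat completions route
# instead of the legacy completions route that raises the AttributeError.
response = client.chat.completions.create(
    model="microsoft/Phi-3.5-mini-instruct",
    messages=[{"role": "user", "content": "Runpod is the best platform because"}],
    temperature=0,
    max_tokens=100,
)

print(response.choices[0].message.content)

If the chat route works while the completions route still fails, that would point at the completions code path in the worker/vLLM version rather than at the endpoint configuration.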