SvenBrnn
RunPod
Created by lsdvaibhavvvv on 1/30/2025 in #⚡|serverless
I am not able to hit the api in serverless ollama server llama3.2 model, here is the screenshot
Which serverless Ollama repository are you using as a base? The one I found in the RunPod blog didn't really work well anymore, so I ended up building my own: see https://discord.com/channels/912829806415085598/1334451089256480768/1334451089256480768 If you still have problems with this, I can try to help.
3 replies
RunPod
Created by tzushi on 2/5/2025 in #⚡|serverless
"Error decoding stream response" on Completed OpenAI compatible stream requests
Have a look here: https://github.com/SvenBrnn/runpod-worker-ollama/blob/master/wrapper/src/engine.py#L88 I compared the default OpenAI response (from vLLM) to the one I produced; the difference was that the stream did not return raw JSON but "data: <json string>" lines. With that change it worked for me.
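The gist of the fix, as a minimal sketch (function names are mine, not the actual engine.py code): each stream chunk has to be framed as a Server-Sent Events line, not emitted as bare JSON.
```python
import json

def format_stream_chunk(chunk: dict) -> str:
    # OpenAI-compatible clients expect SSE framing: each chunk is a
    # "data: <json>" line followed by a blank line, not bare JSON.
    return f"data: {json.dumps(chunk)}\n\n"

def end_of_stream() -> str:
    # The stream is terminated with a literal [DONE] marker.
    return "data: [DONE]\n\n"
```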
3 replies
RunPod
Created by nikolai on 12/15/2024 in #⚡|serverless
Consistently timing out after 90 seconds
It is the OpenAI stream 🙂 I've built myself an Ollama worker that works with the OpenAI endpoint and Open WebUI to call it. For now I've just disabled streaming to make it work, since with streaming it stops writing after about a minute.
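Disabling streaming on the client side looks roughly like this (a sketch using the standard OpenAI Python SDK; the endpoint ID, API key and model name are placeholders):
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1",
    api_key="<YOUR RUNPOD API KEY>",
)

# stream=False returns the whole completion in one response,
# sidestepping the ~90 second cut-off on long streams.
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=False,
)
print(response.choices[0].message.content)
```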
32 replies
RunPod
Created by nikolai on 12/15/2024 in #⚡|serverless
Consistently timing out after 90 seconds
Do we get any information when this is live? It also happens on my own worker when I enable serverless. I was thinking I was still doing something wrong on my side, but if this happens for vLLM too, I might just want to wait until your fix is live to see if my code works then too.
32 replies
RunPod
Created by Mohamed Nagy on 1/26/2025 in #⚡|serverless
How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1
@Mohamed Nagy you can have a look at my two JSON files to see how the /openai/xxx requests are passed to the input; I figured it was a good way to just add these to my repo for later testing 😂
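If you want to capture those payloads yourself, a handler like this dumps every incoming job input to a JSON file (just a sketch; the shape of job["input"] is what I observed, not a documented guarantee):
```python
import json
import time

import runpod

def handler(job):
    # Save the raw input so the /openai/... request shape can be
    # inspected and replayed later as a test fixture.
    with open(f"/tmp/input-{int(time.time() * 1000)}.json", "w") as f:
        json.dump(job["input"], f, indent=2)
    return {"ok": True}

runpod.serverless.start({"handler": handler})
```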
18 replies
RunPod
Created by Mohamed Nagy on 1/26/2025 in #⚡|serverless
How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1
Just a way to save some money; testing can be a bit expensive, especially when you don't really want to run anything but only find out how stuff works internally.
18 replies
RunPod
Created by Mohamed Nagy on 1/26/2025 in #⚡|serverless
How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1
I am running this on a GPU instance now; however, I used a CPU instance to figure out, with prints, what gets sent for the different OpenAI commands. Together with the code of the vLLM worker, I figured out how to fully integrate my own worker with the OpenAI endpoint.
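The integration then mostly comes down to dispatching on the route the proxy passes along. A rough sketch, assuming the input carries openai_route and openai_input the same way the vLLM worker receives them:
```python
def handler(job):
    inp = job["input"]
    route = inp.get("openai_route")
    payload = inp.get("openai_input", {})

    if route == "/v1/chat/completions":
        return handle_chat(payload)   # hypothetical helper
    if route == "/v1/models":
        return list_models()          # hypothetical helper
    return {"error": f"unsupported route: {route}"}
```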
18 replies
RunPod
Created by Bj9000 on 1/27/2025 in #⚡|serverless
Serverless quants
I was also searching for that last week. I ended up giving up, as there seems to be a ticket about this on the GitHub of the vLLM worker: https://github.com/runpod-workers/worker-vllm/issues/98 It doesn't seem to be on their task list anytime soon, so I ended up building my own Ollama-based runner.
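With an Ollama-based runner, quants are just part of the model tag. A small sketch with the ollama Python client (the tag is an example; pick whichever quant you need):
```python
import ollama

# The quantization is encoded in the tag; q4_K_M is one common quant.
model = "llama3.2:3b-instruct-q4_K_M"
ollama.pull(model)

reply = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": "Hi!"}],
)
print(reply["message"]["content"])
```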
10 replies