SvenBrnn Comments - Answer Overflow

SvenBrnn


RunPod

•Created by ammar on 3/17/2025 in #⚡｜serverless

Ollama serverless?

It will, however, work fully with all endpoints, including the OpenAI endpoints.

7 replies

RunPod

•Created by ammar on 3/17/2025 in #⚡｜serverless

Ollama serverless?

No, it's most likely not. There is no cache implemented like the one vLLM can use, so startup will take a bit longer than with vLLM. The wrapper just starts an Ollama instance inside the container and translates RunPod requests so Ollama can understand and answer them. It's also nice for GGUF models or models from the official Ollama repository; however, it's just a small project. I just added automatic container updates today, so a new container should now be ready at most 24 hours after each new Ollama version.

7 replies

RunPod

•Created by lsdvaibhavvvv on 1/30/2025 in #⚡｜serverless

I am not able to hit the api in serverless ollama server llama3.2 model , Here is the screenshot

Which serverless Ollama repository are you using as a base? The one I found in the RunPod blog did not really work well anymore, so I ended up building my own. See https://discord.com/channels/912829806415085598/1334451089256480768/1334451089256480768 If you still have problems with this, I can try to help.

3 replies

RunPod

•Created by tzushi on 2/5/2025 in #⚡｜serverless

"Error decoding stream response" on Completed OpenAI compatible stream requests

Have a look here: https://github.com/SvenBrnn/runpod-worker-ollama/blob/master/wrapper/src/engine.py#L88 I compared the default OpenAI response (from vLLM) to the one I produced. The difference was that the stream does not return bare JSON but a "data: <json string>" line. With that change, it worked for me.
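To illustrate the "data: <json string>" point, here is a minimal sketch of wrapping stream chunks that way. It is not the code from the linked engine.py; the function name `to_sse_lines` is made up for this example, and the closing `[DONE]` line follows the usual OpenAI streaming convention rather than anything confirmed in this thread.

```python
import json
from typing import Iterable, Iterator


def to_sse_lines(chunks: Iterable[dict]) -> Iterator[str]:
    """Wrap each chunk as a 'data: <json string>' line (hypothetical helper)."""
    for chunk in chunks:
        # Each chunk is serialized to JSON and prefixed with "data: ",
        # followed by a blank line, instead of being sent as bare JSON.
        yield f"data: {json.dumps(chunk)}\n\n"
    # Assumption: OpenAI-style streams conventionally end with this sentinel.
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    example_chunks = [{"choices": [{"delta": {"content": "Hi"}}]}]
    for line in to_sse_lines(example_chunks):
        print(line, end="")
```

A client that expects OpenAI-style streaming can then strip the `data: ` prefix and decode what follows as JSON.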

3 replies

RunPod

•Created by nikolai on 12/15/2024 in #⚡｜serverless

Consistently timing out after 90 seconds

It is the OpenAI stream 🙂 I've built myself an Ollama worker that works with the OpenAI endpoint, and I use open-webui to call it. For now I've just disabled streaming to make it work, because with streaming it stops writing after about 1 minute.

32 replies

RunPod

•Created by nikolai on 12/15/2024 in #⚡｜serverless

Consistently timing out after 90 seconds

Will we be notified when this is live? It also happens on my own worker when I enable serverless. I was thinking I was still doing something wrong on my side, but if this happens for vLLM too, I might just wait until your fix is live to see if my code works then too.

32 replies

RunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

@Mohamed Nagy you can have a look at my two JSON files to see how the /openai/xxx requests are sent to the input. I figured it was a good idea to just add these to my repo for later testing 😂

18 replies

RunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

It's just a way to save some money. Testing can be a bit expensive, especially when you don't really want to run anything but only find out how stuff works internally.

18 replies

RunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

I am running this on a GPU instance now; however, I used a CPU instance with print statements to figure out what gets sent for the different OpenAI commands. Together with the code of the vLLM worker, that let me figure out how to fully integrate my own worker with the OpenAI endpoint.
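A rough sketch of that print-based approach, assuming the usual RunPod handler shape where the request arrives under the job's "input" key. The name `debug_handler` is invented here, and nothing model-related runs, so a cheap CPU instance is enough.

```python
import json


def debug_handler(job: dict) -> dict:
    """Print the raw job input and echo it back (hypothetical debugging handler)."""
    job_input = job.get("input", {})
    # Dump exactly what the endpoint delivered so it can be read in the logs.
    print(json.dumps(job_input, indent=2))
    return {"received": job_input}


# On a real worker this would be registered with the RunPod SDK, e.g.:
#   import runpod
#   runpod.serverless.start({"handler": debug_handler})

if __name__ == "__main__":
    debug_handler({"id": "local-test", "input": {"prompt": "hello"}})
```

Saving the printed payloads as JSON files makes it possible to replay them locally later without paying for another run.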

18 replies

RunPod

•Created by Bj9000 on 1/27/2025 in #⚡｜serverless

Serveless quants

I was also searching for it last week. I ended up giving up, as there seems to be a ticket about this on the GitHub of the vLLM worker: https://github.com/runpod-workers/worker-vllm/issues/98 It doesn't seem to be on their task list anytime soon, so I ended up building my own Ollama-based runner.

10 replies

RunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

Just have a look at https://github.com/SvenBrnn/runpod-worker-ollama/tree/master/wrapper/src I built this last week by trying things out on a CPU instance and figured out how stuff comes in when it comes from an OpenAI endpoint: https://github.com/SvenBrnn/runpod-worker-ollama/blob/master/test_inputs/openai_completion.json https://github.com/SvenBrnn/runpod-worker-ollama/blob/master/test_inputs/openai_get_models.json
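As a sketch of what handling such inputs could look like: the field names `openai_route` and `openai_input` below are an assumption based on how the vLLM worker reads its job input, so check them against the two JSON files above. The function `route_job` and its return values are made up for illustration and are not taken from the linked repo.

```python
def route_job(job: dict) -> dict:
    """Decide what a job is asking for based on its input (hypothetical helper)."""
    job_input = job.get("input", {})
    # Assumed field names; verify against the test_inputs JSON files.
    route = job_input.get("openai_route")
    payload = job_input.get("openai_input") or {}

    if route == "/v1/models":
        return {"kind": "list_models"}
    if route in ("/v1/chat/completions", "/v1/completions"):
        return {
            "kind": "completion",
            "stream": bool(payload.get("stream", False)),
            "payload": payload,
        }
    # No OpenAI route present: treat it as a plain RunPod request.
    return {"kind": "native", "payload": job_input}


if __name__ == "__main__":
    sample = {
        "input": {
            "openai_route": "/v1/chat/completions",
            "openai_input": {"model": "llama3.2", "stream": True},
        }
    }
    print(route_job(sample))
```

Each `kind` would then be mapped to the matching call on the model server inside the worker.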

18 replies
