Too many failed requests
Thank you for your reply. I wanted to test how many requests it could manage. I'm still learning about LLMs and how to host them, and I wasn't aware that a GPU cloud isn't well suited to handling many concurrent requests. Could you kindly explain a bit more about why serverless is preferable in this context compared to GPU clouds, or point me to any documentation I could check for more details?
7 replies
How can I use the ollama Docker image?
I think I'm missing something here. I thought the template was for pulling and running a Docker image. If that's the case, shouldn't I add the run command from the ollama documentation (
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
) to the Container Start Command input?
12 replies