VllM Memory Error / Runpod Error?
https://pastebin.com/vjSgS4up
I get this error when I tried to start my vllm mistral serverless, it ended up fixing itself by just increasing the GPU to 24GB GPU Pro; which made me guess the GPU just wasn't good enough (even though it was my CPU indicating a 100% usage).
But I guess the problem I have is how do I stop it from erroring out and repeating infinitely if it happens again? Does runpod or VLLM is it possible to catch this somehow?
(The pastebin shows it worked eventually, cause that was a log from my second request after I upgraded the GPU, but otherwise it just kept going for a bit till i manually killed it)
Pastebin
2024-02-16 22:15:28.531[2akn5byerrxpel][info]Finished running gener...
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
4 Replies
@Alpay Ariyak ;D not sure if u know, or if this is more of a generic runpod thing
Also wondering.. if you happen to know if the mistral is just really dumb ;D.... or maybe something weird?
I asked it like:
"Hello world", "Tell me a funny joke", etc:
And it responds very weirdly? It seems to always begin with a
Hey,
So mistralai/Mistral-7B-v0.1 is a completion/base model, so rather than being something you can chat with, its purpose is to complete the text you give it
You have 2 options:
1. Use a chat/instruct model, such as mistralai/Mistral-7B-Instruct-v0.1 - this is the best option
2. Set a chat template using the CUSTOM_CHAT_TEMPLATE env variable. You can find jinja chat templates in tokenizer_config.json files of chat/instruct models. E.g. here's Mistral Instruct's chat template: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/9ab9e76e2b09f9f29ea2d56aa5bd139e4445c59e/tokenizer_config.json#L32. If you really wanted to use Base mistral instead of Instruct, you would copy the template and set it as the CUSTOM_CHAT_TEMPLATE var. But you will get the best performance out of the first option
In terms of this issue:
Try setting MAX_MODEL_LENGTH env var to a number under 24144 that will be enough for you