Getting timeout with network volume
I want to deploy the Llama 3.1 70B model as a serverless endpoint, but the cold start takes too long (1-3 minutes). To work around this, I tried using a network volume, but now the initial model download never completes: I keep getting a timeout after waiting 6-7 minutes. In short, the model cannot be downloaded from the HuggingFace servers and saved to the network volume. I am using vLLM. Thanks for your help.
1 Reply
Currently a serverless worker has a maximum cold start of about 5-6 minutes, and you're likely running into this limit. I advise you to download the model using a GPU cloud pod and then have the serverless worker run it from the network volume cache.
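A rough sketch of the download step, meant to be run once from a GPU cloud pod that has the network volume attached. Everything named here is an assumption rather than something stated in the thread: the `/workspace` mount path, the `models/` subfolder layout, the repo id `meta-llama/Llama-3.1-70B-Instruct`, and the `HF_TOKEN` environment variable (the Llama repos are gated, so a token is normally needed). It uses `snapshot_download` from the `huggingface_hub` package.

```python
import os
from pathlib import Path

# Assumptions (not from the thread): where the pod mounts the network volume,
# and which Llama 3.1 70B repo you actually want.
POD_VOLUME = Path("/workspace")
MODEL_ID = "meta-llama/Llama-3.1-70B-Instruct"


def model_dir(volume_root, model_id):
    """Return a stable folder on the volume for a given model id.

    The "models/<org>--<name>" layout is a hypothetical convention chosen
    for this example, not something vLLM or the platform requires.
    """
    return Path(volume_root) / "models" / model_id.replace("/", "--")


def download_to_volume(volume_root=POD_VOLUME, model_id=MODEL_ID):
    """Download the full model snapshot into the network volume."""
    # Imported here so the path helper above works without the package installed.
    from huggingface_hub import snapshot_download

    target = model_dir(volume_root, model_id)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(
        repo_id=model_id,
        local_dir=str(target),
        token=os.environ.get("HF_TOKEN"),  # assumed env var holding your HF token
    )
    return target
```

On the pod you would call `download_to_volume()` once and wait for it to finish, since there is no cold-start limit there. The serverless worker can then point vLLM at the same folder instead of the HuggingFace repo id, for example by passing the path as the model argument. Note that the volume may be mounted at a different path in the serverless worker than in the pod, so check the mount path before hard-coding it.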