zethos
RunPod
Created by zethos on 1/30/2025 in #⛅|pods
Cannot find any model weights with `/models/huggingface-cache/hub/models...`
Hi, I built a Docker image following "STEP-2" in the README. I created a template with that Docker image and the environment variables below:

MODEL_NAME="migtissera/Tess-3-Mistral-Large-2-123B"
MAX_MODEL_LEN=65536
TENSOR_PARALLEL_SIZE=8
GPU_MEMORY_UTILIZATION=0.92
ENABLE_CHUNKED_PREFILL=1
NCCL_P2P_DISABLE=1
OMP_NUM_THREADS=1
ENFORCE_EAGER=1

The Docker image: snbhanja/tess3mistrallarge128b:latest

I tried to deploy this to a serverless endpoint with 8x 48GB GPUs. I get the error below, although it did not occur the very first time the endpoint was deployed:

RuntimeError: Cannot find any model weights with `/models/huggingface-cache/hub/models--migtissera--Tess-3-Mistral-Large-2-123B/snapshots/8047f7cc9615909650b6a4ae5d13719d3e11594d

Even if I delete the serverless endpoint and recreate it from scratch, I get the same error.

Full log: https://github.com/user-attachments/files/18603761/logs.11.txt
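A quick way to narrow this down is to check whether the snapshot directory the error points at actually contains any weight files. The helper below is a hypothetical sketch (the patterns and function name are assumptions, not part of vLLM or RunPod); it just globs for the usual weight-file extensions under a Hugging Face cache snapshot:

```python
# Hypothetical helper: check whether a HF cache snapshot directory
# actually contains model weight files. The error above suggests the
# snapshot path exists in config but the weights are missing from the volume.
from pathlib import Path

WEIGHT_PATTERNS = ("*.safetensors", "*.bin", "*.pt")

def find_weight_files(snapshot_dir: str) -> list[str]:
    """Return any weight files found under snapshot_dir (recursively)."""
    root = Path(snapshot_dir)
    if not root.is_dir():
        return []
    found: list[str] = []
    for pattern in WEIGHT_PATTERNS:
        found.extend(str(p) for p in root.rglob(pattern))
    return sorted(found)
```

If this returns an empty list on the worker, the cache on the network volume is incomplete (e.g. an interrupted download), and re-downloading the model into the cache is the likely fix.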
2 replies
RunPod
Created by zethos on 1/29/2025 in #⚡|serverless
Need help fixing long-running deployments in serverless vLLM
No description
18 replies
RunPod
Created by zethos on 1/13/2024 in #⚡|serverless
#How to upload a file using an upload API in GPU serverless?
This is my current code; there is a separate FastAPI server running alongside the handler.
import requests
import runpod
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Session with retries for transient gateway errors
automatic_session = requests.Session()
retries = Retry(total=10, backoff_factor=0.1, status_forcelist=[502, 503, 504])
automatic_session.mount('http://', HTTPAdapter(max_retries=retries))


def run_inference(params):
    config = {
        "baseurl": "http://127.0.0.1:8080",
        "api": {
            "health_check": ("GET", "/health_check"),
            "upload": ("POST", "/upload"),
        },
        "timeout": 600
    }

    api_name = params["api_name"]

    if api_name in config["api"]:
        api_config = config["api"][api_name]
    else:
        raise Exception("Method '%s' not yet implemented" % api_name)

    api_verb, api_path = api_config

    response = {}

    if api_verb == "GET":
        response = automatic_session.get(
            url='%s%s' % (config["baseurl"], api_path),
            timeout=config["timeout"])

    elif api_verb == "POST":
        if "upload" in api_path:
            print("Inside upload rp")
            # For "upload" endpoints, use multipart form data
            files = {'file': ('filename', params['file'].read())}  # Assuming the file is passed in params
            response = automatic_session.post(
                url='%s%s' % (config["baseurl"], api_path),
                files=files,
                timeout=config["timeout"])
        else:
            # For other POST requests, use JSON
            response = automatic_session.post(
                url='%s%s' % (config["baseurl"], api_path),
                json=params,
                timeout=config["timeout"])

    return response.json()


def handler(event):
    result = run_inference(event["input"])
    return result


if __name__ == "__main__":
    runpod.serverless.start({"handler": handler})
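One thing to note: the serverless `event["input"]` arrives as plain JSON, so a file object cannot be passed through it and `params['file'].read()` will fail. A common workaround (sketched below under that assumption; the `file_b64` field name is hypothetical, not a RunPod convention) is to base64-encode the file bytes on the client and decode them inside the handler before posting them to the upload endpoint:

```python
# Sketch: carry file bytes through a JSON-only serverless input.
# "file_b64" is a made-up field name for illustration.
import base64

def encode_upload(raw: bytes) -> dict:
    """Client side: wrap raw file bytes in a JSON-safe payload."""
    return {"file_b64": base64.b64encode(raw).decode("ascii")}

def decode_upload(params: dict) -> bytes:
    """Handler side: recover the original file bytes from the payload."""
    return base64.b64decode(params["file_b64"])
```

The decoded bytes can then be sent as multipart form data, e.g. `files = {'file': ('filename', decode_upload(params))}`, in place of the `.read()` call above.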
4 replies