jackson hole Comments - Answer Overflow

jackson hole

•Created by jackson hole on 1/13/2025 in #⚡｜serverless

I want to increase/decrease workers by code or script, can you help? (GraphQL)

Hello! With the help of @Eren and, of course, @nerdylive, I finally put together a script that anyone can use as is.

import requests
import json
import argparse
import logging

# Logging setup
logging.basicConfig(
    filename="runpod_scheduler.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

API_KEY = "your_api_key_here"
URL = f"https://api.runpod.io/graphql?api_key={API_KEY}"


# Parse command-line arguments
parser = argparse.ArgumentParser(description="Update RunPod workersMin value")
parser.add_argument("--minWorkers", type=int, required=True, help="Value for workersMin")
parser.add_argument("--id", type=str, required=True, help="Endpoint ID")
parser.add_argument("--name", type=str, required=True, help="Endpoint Name")
args = parser.parse_args()


# GraphQL mutation for saving endpoint
mutation = """
mutation saveEndpoint($input: EndpointInput!) {
  saveEndpoint(input: $input) {
    gpuIds
    id
    name
    workersMin
  }
}
"""

# Payload data
variables = {
    "input": {
        "gpuIds": "AMPERE_24,-NVIDIA L4,-NVIDIA RTX A5000",
        "name": f"{args.name} -fb",
        "id": args.id,
        "workersMin": args.minWorkers,
    }
}



# Construct the request
payload = {
    "operationName": "saveEndpoint",
    "query": mutation,
    "variables": variables
}

try:
    logging.info(f"Mutating for the endpoint `{args.name=}` and `{args.id=}` ({args.minWorkers=})")
    # Send the request; the timeout keeps a stalled connection from hanging the script
    response = requests.post(URL, headers={"Content-Type": "application/json"}, data=json.dumps(payload), timeout=30)

    # GraphQL servers can return HTTP 200 with an "errors" field, so check both
    if response.status_code == 200 and "errors" not in response.json():
        logging.info(f"Successfully updated workersMin to {args.minWorkers}\nResponse:\n{response.json()}")
    else:
        # response.text is safe to log even when the body is not valid JSON
        logging.error(f"Failed to update workersMin. Status: {response.status_code}, Response: {response.text}")

except Exception as e:
    logging.error(f"Error: {str(e)}")
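As a minimal sketch of how the script above could be split into testable pieces: the helper names `build_payload` and `graphql_errors` are hypothetical and not part of any RunPod library, and the sample values in the usage note are made up. The sketch assumes the common GraphQL convention that failures are reported in a top-level `errors` list, which may arrive with HTTP 200.

```python
SAVE_ENDPOINT_MUTATION = """
mutation saveEndpoint($input: EndpointInput!) {
  saveEndpoint(input: $input) {
    id
    name
    workersMin
  }
}
"""


def build_payload(endpoint_id, name, workers_min, gpu_ids):
    """Assemble the JSON body for the saveEndpoint mutation (hypothetical helper)."""
    return {
        "operationName": "saveEndpoint",
        "query": SAVE_ENDPOINT_MUTATION,
        "variables": {
            "input": {
                "id": endpoint_id,
                "name": name,
                "gpuIds": gpu_ids,
                "workersMin": workers_min,
            }
        },
    }


def graphql_errors(body):
    """Return the error messages in a decoded GraphQL response, or [] if there are none."""
    return [err.get("message", str(err)) for err in body.get("errors") or []]
```

For example, `build_payload("abc123", "my-endpoint", 2, "AMPERE_24")` produces a body that can be passed to `requests.post(..., json=payload)`, and `graphql_errors(response.json())` being non-empty can be treated as a failure even when the status code is 200.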

13 replies

RunPod

•Created by jackson hole on 1/13/2025 in #⚡｜serverless

I want to increase/decrease workers by code or script, can you help? (GraphQL)

The network hack is a Gem 💎 Will surely try this!! Thanks!!

13 replies

RunPod

•Created by jackson hole on 1/13/2025 in #⚡｜serverless

I want to increase/decrease workers by code or script, can you help? (GraphQL)

Let's hope for the best!

13 replies

RunPod

•Created by jackson hole on 1/8/2025 in #⚡｜serverless

Some basic confusion about the `handlers`

EXACTLY! Thanks @3WaD !! And of course @xeith_ for your detailed walkthrough!! 🤗

10 replies

RunPod

•Created by jackson hole on 1/8/2025 in #⚡｜serverless

Some basic confusion about the `handlers`

So basically, this FastAPI app is implemented asynchronously, and I guess that should do the job!

10 replies

RunPod

•Created by jackson hole on 1/8/2025 in #⚡｜serverless

Some basic confusion about the `handlers`

Actually my code structure looks like this:

from fastapi import FastAPI, Request

import functions  # assumed: the local module that defines generate_story

app = FastAPI()

@app.post("/get_response")
async def get_response(request: Request) -> dict:
    request_dict = await request.json()
    print("JSON:", request_dict)

    payload = {"topic": request_dict["story_topic"]}
    story = await functions.generate_story(payload)

    # The route returns a plain dict, which FastAPI serializes to JSON
    return {"response": story}

And that calls the appropriate async function:

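async def generate_story(payload):
    # A plain dict is used here because a vLLM SamplingParams object cannot be unpacked with **
    sampling_params = {
        "temperature": 0.42,
        "max_tokens": 2048,
        "top_p": 0.734,
        "stop": ["Note", "note"],
        # repetition_penalty is a vLLM extension, so it goes through extra_body
        "extra_body": {"repetition_penalty": 1.0},
    }

    prompt = "<>"

    ### the below will call the runpod openai endpoint ###
    # Assumes `client` is an openai.AsyncOpenAI instance; drop `await` for a sync client
    completion = await client.completions.create(
        model="iqbalamo93",
        prompt=prompt,
        **sampling_params)
    return completion.choices[0].text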
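To illustrate why awaiting the model call matters in an async route, here is a minimal sketch that uses only `asyncio`. `fake_generate` and the 0.05 s sleep are hypothetical stand-ins for an awaited client call, not the real RunPod or OpenAI client. An awaited call hands control back to the event loop so other requests can start, whereas a blocking call inside `async def` would hold the loop until it returns.

```python
import asyncio

events = []


async def fake_generate(job_id):
    """Stand-in for an awaited model call (hypothetical, not the real client)."""
    events.append(f"start {job_id}")
    await asyncio.sleep(0.05)  # yields to the event loop, like an awaited HTTP call
    events.append(f"end {job_id}")
    return f"story {job_id}"


async def main():
    # Three requests in flight at once on a single event loop
    return await asyncio.gather(*(fake_generate(i) for i in range(3)))


results = asyncio.run(main())
print(events)
```

All three `start` entries are logged before any `end` entry, which shows the calls overlapping rather than running one after another.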

10 replies

RunPod

•Created by jackson hole on 1/8/2025 in #⚡｜serverless

Some basic confusion about the `handlers`

Yes, that handler. So I am basically utilizing that already!? Wow.

10 replies

RunPod

•Created by jackson hole on 1/7/2025 in #⚡｜serverless

How to monitor the LLM inference speed (generation token/s) with vLLM serverless endpoint?

Absolutely, but I found Discord (and nerdylive's support) faster 😉

7 replies

RunPod

•Created by jackson hole on 1/7/2025 in #⚡｜serverless

How to monitor the LLM inference speed (generation token/s) with vLLM serverless endpoint?

Oh yeah, I thought RunPod had built-in support for this. Thanks!
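Without built-in support, one rough way to estimate generation speed from the client side is to time the request and divide the number of generated tokens by the elapsed time. This is only a sketch: `tokens_per_second` and `timed` are hypothetical helpers, and the result includes network and queue time, so it understates the raw generation speed of the model.

```python
import time


def tokens_per_second(completion_tokens, elapsed_seconds):
    """Generation throughput; returns 0.0 when no time was measured."""
    if elapsed_seconds <= 0:
        return 0.0
    return completion_tokens / elapsed_seconds


def timed(call):
    """Run call() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start
```

With an OpenAI-compatible client, the token count would typically come from `completion.usage.completion_tokens` on the response returned by the timed call.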

7 replies

RunPod

•Created by jackson hole on 1/3/2025 in #⚡｜serverless