RunPod•15mo ago

Does async generator allow a worker to take off multiple jobs? Concurrency Modifier?

I was reading the RunPod docs and saw the example below. Does an async generator_handler mean that if I send, for example, 5 jobs, one worker will just keep picking up new jobs? I also tried adding:

"concurrency_modifier": 4

But when I queued up 10 jobs, it first maxed out the workers and then left the remaining jobs sitting in the queue, rather than each worker picking up jobs up to the concurrency_modifier limit. https://docs.runpod.io/serverless/workers/handlers/handler-async

import runpod
import asyncio

async def async_generator_handler(job):
    for i in range(5):
        # Generate an asynchronous output token
        output = f"Generated async token output {i}"
        yield output

        # Simulate an asynchronous task, such as processing time for a large language model
        await asyncio.sleep(1)


# Configure and start the RunPod serverless function
runpod.serverless.start(
    {
        "handler": async_generator_handler,  # Required: Specify the async handler
        "return_aggregate_stream": True,  # Optional: Aggregate results are accessible via /run endpoint
    }
)

Asynchronous Handler | RunPod Documentation

RunPod supports the use of asynchronous handlers, enabling efficient handling of tasks that benefit from non-blocking operations. This feature is particularly useful for tasks like processing large datasets, interacting with APIs, or handling I/O-bound operations.

43 Replies

Alpay Ariyak•15mo ago

Try "concurrency_modifier": lambda x: <MAX CONCURRENCY, e.g. 4>

J.OP•15mo ago

I'm not sure it works? Hm. I sent 3 requests to the handler, and the third request just sits there by itself.

import asyncio
import runpod


async def async_generator_handler(job):
    '''
    async generator handler job
    '''
    for i in range(5):
        # Generate an asynchronous output token
        output = f"Generated async token output {i}; \n {job}"
        yield output

        # Simulate an asynchronous task, such as processing time for a large language model
        await asyncio.sleep(30)


# Configure and start the RunPod serverless function
runpod.serverless.start(
    {
        "handler": async_generator_handler,
        "return_aggregate_stream": False,
        "concurrency_modifier": lambda x: 4,
    }
)

Alpay Ariyak•15mo ago

@Justin Merrell

Justin Merrell•15mo ago

Do you see anything in the logs?

J.OP•15mo ago

Sorry, meant to respond. I don't see anything in my logs; it just responds with what is expected: Generated async token output {i}; \n {job}. So the async generator works. My expectation is that if I send 10 requests into the queue with the async generator handler:

1. The number of workers first scales up to the max (say, 3 for example).
2. That leaves 7 jobs in the queue.
3. Even though each worker is waiting on the asyncio sleep for 30 seconds, because the handler is an async generator the worker keeps pulling from the queue, so each worker takes up to X-1 more jobs, where X is the value returned by the lambda.

That way my jobs don't just sit in the queue, and each worker handles things more efficiently. @Justin Merrell Is the above maybe not the expected behavior? Honestly, I would just love to have one worker take more jobs at once; I just haven't been able to get even a hello world working.

Justin Merrell•15mo ago

The concurrency modifier is set at the worker level and tells the worker how many jobs it should try to maintain at any given time.
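
To illustrate what "maintain" means here, the following is a minimal stand-alone sketch in plain asyncio. It is not the RunPod SDK's implementation: `fake_handler`, `run_worker`, the semaphore, and the starting concurrency of 1 are all assumptions made for the illustration. It only shows one worker keeping at most the number of jobs returned by the modifier in flight.

```python
import asyncio


async def fake_handler(job, state):
    """Hypothetical async handler that tracks how many jobs overlap."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # Stand-in for non-blocking work
    state["in_flight"] -= 1
    return f"done {job}"


async def run_worker(jobs, concurrency_modifier):
    """Process all jobs while keeping at most N of them in flight."""
    state = {"in_flight": 0, "peak": 0}
    limit = concurrency_modifier(1)  # Assumed starting concurrency of 1
    gate = asyncio.Semaphore(limit)

    async def guarded(job):
        async with gate:  # Waits here once `limit` jobs are running
            return await fake_handler(job, state)

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, state["peak"]


# Ten queued jobs, one worker, modifier that always asks for 4
results, peak = asyncio.run(run_worker(range(10), lambda current: 4))
print(len(results), peak)
```

In this sketch the worker never runs more than 4 jobs at once, and the remaining jobs are picked up as earlier ones finish.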

J.OP•15mo ago

I think one of the issues is that this maybe isn't working properly? I can see, for example, that the first two jobs have a delay time of 4 to 8 seconds, while the others have an ever-increasing delay time (due to sitting in the queue).

J.OP•15mo ago

I just feel this is unexpected behavior: the worker still isn't picking up the number of jobs defined by the concurrency_modifier. I wouldn't expect the jobs to just sit in the queue, so hopefully you can give advice or check on it at some point. Below is the exact code; it feels a bit weird that I can't get something that seems pretty simple to work.

import asyncio
import runpod


async def async_generator_handler(job):
    '''
    async generator handler job
    '''
    for i in range(5):
        # Generate an asynchronous output token
        output = f"Generated async token output {i}; \n {job}"
        yield output

        # Simulate an asynchronous task, such as processing time for a large language model
        await asyncio.sleep(10)


# Configure and start the RunPod serverless function
runpod.serverless.start(
    {
        "handler": async_generator_handler,
        "return_aggregate_stream": True,
        "concurrency_modifier": lambda x: 4,
    }
)

Justin Merrell•15mo ago

Try running it again with 1 active worker and 1 max worker

J.OP•15mo ago

@Justin Merrell Tried it out, and still the same result, where essentially with one active worker, 1 max worker, only the first request is taken

J.OP•15mo ago

@Justin Merrell Is it maybe b/c it isn't concurrency_modifier but rather max_concurrency?

J.OP•15mo ago

I was looking at the RunPod SDK: https://github.com/runpod/runpod-python/blob/main/runpod/serverless/core.py#L175 Maybe this is the code block where you guys are grabbing max_concurrency from, as some sort of key?

J.OP•15mo ago

I see you do similar things for return_aggregate_stream, and so on.

Justin Merrell•15mo ago

Where is this screenshot coming from? And what does your entire handler file look like?

J.OP•15mo ago

The screenshot is coming from: https://github.com/runpod/runpod-python/blob/bd68176db7feb6059a3d73b41f6ec6d0d83e0f15/runpod/serverless/core.py#L218C1-L229C1

J.OP•15mo ago

And this is the entire handler.py file

J.OP•15mo ago

I basically just took: https://github.com/blib-la/runpod-worker-helloworld and then replaced the rp_handler.py with the code block I posted in the Discord. *Well, I'm trying max_concurrency now; before it was concurrency_modifier.

Justin Merrell•15mo ago

Oh, the code file is different; I'm reworking the RunPod SDK now to make sure things are clearer. And change "max_concurrency" to "concurrency_modifier".

J.OP•15mo ago

Yeah, it was: concurrency_modifier > but it wasn't working 😅.

import asyncio
import runpod


async def async_generator_handler(job):
    '''
    async generator handler job
    '''
    for i in range(5):
        # Generate an asynchronous output token
        output = f"Generated async token output {i}; \n {job}"
        yield output

        # Simulate an asynchronous task, such as processing time for a large language model
        await asyncio.sleep(10)


# Configure and start the RunPod serverless function
runpod.serverless.start(
    {
        "handler": async_generator_handler,
        "return_aggregate_stream": True,
        "concurrency_modifier": lambda x: 4,
    }
)

Yeah, anyway, it seems neither concurrency_modifier nor max_concurrency works right now 😔, at least as far as I can tell.

Justin Merrell•15mo ago

Is it the for loop that is blocking? To be honest, I am not a fan of Python async haha. https://stackoverflow.com/questions/53486744/making-async-for-loops-in-python @PatrickR did we ever release that example to the docs?

J.OP•15mo ago

Hmm, I'm not too sure haha. I was just trying to follow RunPod's other example for generators in general: https://docs.runpod.io/serverless/workers/handlers/handler-async

I guess I wouldn't expect the for loop / asyncio.sleep() to block the concurrency_modifier? My thought is that the SDK is probably just calling the handler more times without waiting for each call to complete, which is why the handler is declared with async def. I know you said the SDK on GitHub is out of date, but I imagine that even after refactoring the logic is the same:

        for job in jobs:
            asyncio.create_task(_process_job(config, job, serverless_hook), name=job['id'])
            await asyncio.sleep(0)

It gets a list of jobs and then creates a task with asyncio.create_task( per job in the list, each running _process_job(config, job, serverless_hook).

Yeah T-T. Python async really sucks. I have a huge hate for it after having to deal with asyncio myself before. I also thought this was close enough to the worker-vllm handler, which you guys had pointed to? https://github.com/runpod-workers/worker-vllm/blob/main/src/handler.py
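
The reasoning above (one task per job, with a handler that awaits) can be checked locally with plain asyncio. This is only a sketch of the pattern quoted from the SDK, not the SDK itself: `process_job`, the `events` list, and the job dicts are hypothetical stand-ins.

```python
import asyncio

events = []  # Records (job_id, token) in the order tokens are produced


async def generator_handler(job):
    """Async generator handler with the same shape as the one in this thread."""
    for i in range(2):
        yield i
        await asyncio.sleep(0.05)  # Hands control back to the event loop


async def process_job(job):
    """Hypothetical stand-in for the SDK's per-job processing."""
    async for token in generator_handler(job):
        events.append((job["id"], token))


async def main():
    jobs = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
    tasks = []
    for job in jobs:
        # One task per job, as in the loop quoted above
        tasks.append(asyncio.create_task(process_job(job), name=job["id"]))
        await asyncio.sleep(0)
    await asyncio.gather(*tasks)


asyncio.run(main())
print(events)
```

Every job records its first token before any job records its second one, so a plain for loop with an awaited sleep inside does not by itself keep other tasks from starting.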

Justin Merrell•15mo ago

it might need that async for

J.OP•15mo ago

Hm... I'll give that a try. I'll just get rid of the for loop; I'm not sure you can do an async for on a range(5) iterator. At least, the linter is throwing me errors.
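
For reference, the linter complaint matches what Python does at runtime: `async for` only works on objects that implement `__aiter__`, and `range` does not. The sketch below shows the failure and one way around it; `arange` is a hypothetical helper written for this example, not part of asyncio or the RunPod SDK.

```python
import asyncio


async def arange(count):
    """Hypothetical helper: an async generator that counts like range()."""
    for i in range(count):
        yield i
        await asyncio.sleep(0)  # Give other tasks a chance to run


async def broken():
    # range() has no __aiter__, so "async for" over it raises TypeError
    async for i in range(5):
        pass


async def working():
    # An async generator is a valid target for "async for"
    return [i async for i in arange(5)]


async def main():
    try:
        await broken()
        raised = False
    except TypeError:
        raised = True
    return raised, await working()


raised, values = asyncio.run(main())
print(raised, values)
```

A regular `for` loop inside an `async def` is fine as long as the loop body awaits something, which the original handler already does.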

Justin Merrell•15mo ago

Maybe adding an await async.sleep(10) in there?

J.OP•15mo ago

I think it needs to be await asyncio.sleep(10); async.sleep(10) just gives me an error. I tried the below and it still failed:

import asyncio
import runpod


async def async_generator_handler(job):
    '''
    async generator handler job
    '''
    # Simulate an asynchronous task, such as processing time for a large language model
    await asyncio.sleep(10)
    # Generate an asynchronous output token
    output = f"Generated async token output: {job}"
    yield output


# Configure and start the RunPod serverless function
runpod.serverless.start(
    {
        "handler": async_generator_handler,
        "return_aggregate_stream": True,
        "max_concurrency": lambda x: 4,
    }
)

# depot build -t justinwlin/runpodhelloworldasync:1.4 . --platform linux/amd64 --push

PatrickR•15mo ago

https://github.com/runpod/docs/pull/24 I have a pull request for review.

GitHub

Adds documentation on concurrent workers by rachfop · Pull Request ...

Adds documentation on concurrent workers.

J.OP•15mo ago

@PatrickR / @Justin Merrell Does it need to be a lambda x: 4? Or does it need to be an integer, going by the documentation that PatrickR just posted? Oh wait, it probably does need to be a lambda; I see that adjust_concurrency is a function taking in the current_concurrency.

FYI, it would be great if you could add some info on what current_concurrency is in the docs. The way it is right now, all I can assume is that it's probably the number of jobs in the queue? Or is it the default current_concurrency, which is 1? I actually think it's the latter, now that I've read through what the code is doing.

Also, I don't know whether update_request_rate belongs in the documentation, because as far as I know this isn't something the client can normally access. I can't tell the rate at which jobs are entering the queue, so the current example, where that rate is randomized, doesn't correspond to anything I could do as the client. Anyone else reading the document would be confused and wonder how to modify the concurrency based on the number of jobs in the queue (which is why I feel the documentation is a bit misleading).
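
One way to picture a modifier that does not depend on the queue's request rate is sketched below. Everything here is an assumption for illustration: `make_concurrency_modifier` and `is_busy` are hypothetical names, and the sketch assumes the value returned by the modifier is passed back in as `current_concurrency` on the next call, which this thread does not confirm.

```python
def make_concurrency_modifier(min_level, max_level, is_busy):
    """Build a modifier that steps concurrency up or down by one.

    is_busy is a hypothetical zero-argument callable supplied by you,
    for example a check on memory headroom or recent job latency.
    """
    def adjust_concurrency(current_concurrency):
        if is_busy():
            proposed = current_concurrency - 1  # Back off under pressure
        else:
            proposed = current_concurrency + 1  # Take on one more job
        # Clamp the result to the allowed range
        return max(min_level, min(max_level, proposed))

    return adjust_concurrency


# Local walk-through with a fake load signal
busy = {"flag": False}
modifier = make_concurrency_modifier(1, 4, lambda: busy["flag"])

level = 1
history = []
for _ in range(5):
    level = modifier(level)
    history.append(level)

busy["flag"] = True  # Pretend the worker is now under pressure
history.append(modifier(level))
print(history)
```

The signal is something the worker can observe about itself, which avoids needing the rate at which jobs enter the queue.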

J.OP•15mo ago

FYI, I tried a modified version of the function in the docs, and also tried following the docs exactly, just with high_request_rate_threshold = 1, and I got an error both times.

{
  "delayTime": 22778,
  "error": "{\"error_type\": \"<class 'KeyError'>\", \"error_message\": \"'data'\", \"error_traceback\": \"Traceback (most recent call last):\\n  File \\\"/usr/local/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\\\", line 134, in run_job\\n    job_output = await handler_return if inspect.isawaitable(handler_return) else handler_return\\n  File \\\"/rp_handler.py\\\", line 10, in process_request\\n    return f\\\"Processed: {job['data']}\\\"\\nKeyError: 'data'\\n\", \"hostname\": \"fnyqdflnz8kizc-64411280\", \"worker_id\": \"fnyqdflnz8kizc\", \"runpod_version\": \"1.3.7\"}",
  "executionTime": 2168,
  "id": "50089ca9-d87a-4266-9f98-b7f68e39f7e2-u1",
  "status": "FAILED"
}

J.OP•15mo ago

Anyways, I'll leave this for now 😅 I guess this just isn't working right now.

PatrickR•15mo ago

@justin The function adjust_concurrency should be written by you and for your use case. The documentation is showing what you could use and make. But there is no way the example can accurately handle each user's use case. Also, this was added in 1.4, so make sure your Python SDK is at least on that version.

"runpod_version": "1.3.7"}

Change log: https://github.com/runpod/runpod-python/blob/bd68176db7feb6059a3d73b41f6ec6d0d83e0f15/CHANGELOG.md?plain=1#L82C12-L82C17 You can check your SDK version by running:

import runpod

version = runpod.version.get_version()

print(f"RunPod version number: {version}")
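
The worker's SDK version also appears in the error payload itself, so it can be read from a failed response without redeploying. A small sketch, using a trimmed copy of the FAILED response from this thread; `parse_version` and `MIN_VERSION` are names made up for the example, and the parser assumes purely numeric version parts.

```python
import json

# Trimmed copy of the FAILED response shown earlier in this thread
response = {
    "error": json.dumps({
        "error_type": "<class 'KeyError'>",
        "error_message": "'data'",
        "runpod_version": "1.3.7",
    }),
    "status": "FAILED",
}

MIN_VERSION = (1, 4)  # Per the message above, the feature was added in 1.4


def parse_version(text):
    """Turn '1.3.7' into (1, 3, 7); assumes purely numeric parts."""
    return tuple(int(part) for part in text.split("."))


# The "error" field is itself a JSON-encoded string, so decode it first
details = json.loads(response["error"])
worker_version = parse_version(details["runpod_version"])
supports_modifier = worker_version >= MIN_VERSION
print(worker_version, supports_modifier)
```

For the response above, the worker reports 1.3.7, which is below 1.4.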

J.OP•15mo ago

Got it~ will give it a try! Thank you! #THANKS to @PatrickR and @Justin Merrell The documentation examples work 🙂

Solution

J.•15mo ago

import runpod
import asyncio
import random


async def process_request(job):
    await asyncio.sleep(10)  # Simulate processing time
    return f"Processed: {job['data']}"  # KeyError if the job has no 'data' key

def adjust_concurrency(current_concurrency):
    """
    Returns a fixed concurrency level of 5, ignoring the current value.
    """
    return 5

# Start the serverless function with the handler and concurrency modifier
runpod.serverless.start({
    "handler": process_request,
    "concurrency_modifier": adjust_concurrency
})

J.OP•15mo ago

😄

J.OP•15mo ago

5 jobs instantly picked up! Oh lol, but I still got an error... hmm, I'll keep playing with it xDDD but interesting.

{
  "delayTime": 14453,
  "error": "{\"error_type\": \"<class 'KeyError'>\", \"error_message\": \"'data'\", \"error_traceback\": \"Traceback (most recent call last):\\n  File \\\"/usr/local/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\\\", line 135, in run_job\\n    job_output = await handler_return if inspect.isawaitable(handler_return) else handler_return\\n  File \\\"/rp_handler.py\\\", line 8, in process_request\\n    return f\\\"Processed: {job['data']}\\\"\\nKeyError: 'data'\\n\", \"hostname\": \"e0adia1urlntl9-64410cdc\", \"worker_id\": \"e0adia1urlntl9\", \"runpod_version\": \"1.6.0\"}",
  "executionTime": 10150,
  "id": "b32f3b62-2129-4044-9f8c-b32baf33ea33-u1",
  "status": "FAILED"
}

Lol, so close. I'll keep messing with it, but at least it threw 5 errors in parallel xD. Oh, it's probably because my job doesn't have a ['data'] key. Got it.

Justin Merrell•15mo ago

Change it to input, you were on prototype docs lol
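
A handler along these lines reads the payload from "input" and avoids the KeyError seen above. This is a local sketch only: the job dicts are hypothetical samples, no RunPod worker is involved, and returning a descriptive dict for a missing key is just one possible choice.

```python
import asyncio


async def process_request(job):
    """Handler that reads the payload from the job's "input" key."""
    await asyncio.sleep(0.01)  # Simulate processing time
    payload = job.get("input")
    if payload is None:
        # Describe the problem instead of raising KeyError
        return {"error": "job has no 'input' key"}
    return f"Processed: {payload}"


# Local check with hypothetical job dicts
sample_job = {"id": "local-test", "input": {"prompt": "hello"}}
ok = asyncio.run(process_request(sample_job))
missing = asyncio.run(process_request({"id": "local-test-2"}))
print(ok)
print(missing)
```

To run it on a worker, the same function would be passed as "handler" to runpod.serverless.start together with the concurrency_modifier from the solution above.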

J.OP•15mo ago

hehe thank u thoooo 😄

PatrickR•15mo ago

Yes. Just fixed docs to use the right key.

J.OP•15mo ago

Yay Im so happy

PatrickR•15mo ago

Thanks for playing along! Super great to see it helping.

J.OP•15mo ago

Yeah, I'm trying to work on a busybox for RunPod with an easy base template for OpenSSH / Jupyter Notebook / all that, which I can just call and set up easily for all my future Docker images, so this is super helpful~ I wanted to add a concurrency example, and I also wanted to add some concurrency to my current projects so I'm not just wasting GPU power. Thank you guys ;D
