RunPod
Created by Ammar Ahmed on 10/4/2024 in #⚡|serverless
How can I make a single worker handle multiple requests concurrently before starting the next worker
import queue
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

class ModelPool:
    def __init__(self, max_models=3):
        # queue.Queue is thread-safe on its own, so no extra Lock is needed;
        # holding a lock around a blocking get() could deadlock the pool.
        self.model_queue = queue.Queue()
        self.max_models = max_models

        # Initialize the pool with a set number of models
        for _ in range(max_models):
            self.model_queue.put(self._create_model())

    def _create_model(self):
        """Load and return a new instance of the model."""
        model_id = "SG161222/Realistic_Vision_V2.0"
        scheduler = DPMSolverMultistepScheduler.from_pretrained(model_id, subfolder="scheduler")
        pipeline = DiffusionPipeline.from_pretrained(
            model_id, scheduler=scheduler, torch_dtype=torch.float16, cache_dir="model_cache"
        )
        pipeline = pipeline.to("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
        return pipeline

    def get_model(self):
        """Get a model from the pool (blocks until one is available)."""
        return self.model_queue.get()

    def return_model(self, model):
        """Return a model to the pool."""
        self.model_queue.put(model)

model_pool = ModelPool(max_models=10)
This is the pool, which is loaded into memory. Every time a request hits the server, it gets a model from this pool.
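For context, a minimal sketch of how such a pool might sit inside a RunPod handler; the handler body, the "prompt" input field, and the return shape are assumptions for illustration, not taken from the original post:

import asyncio

async def handler(job):
    prompt = job["input"]["prompt"]  # assumed input field
    # Acquire a model off the event loop; with the pool sized to match
    # the concurrency limit, this should rarely block for long.
    pipeline = await asyncio.to_thread(model_pool.get_model)
    try:
        # Run the blocking diffusers call in a thread so other concurrent
        # jobs on this worker are not starved while this one generates.
        result = await asyncio.to_thread(pipeline, prompt)
        image = result.images[0]
    finally:
        # Always hand the model back, even if generation fails.
        model_pool.return_model(pipeline)
    # Encoding/uploading the image is omitted for brevity.
    return {"status": "completed"}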
Fixed it. I created a model pool that keeps as many models loaded as the max concurrency. It reduced the time to below 10 seconds 😀
Also, processing a single request is fast, but when multiple requests are processed concurrently, processing becomes very slow.
Yes, I have FlashBoot enabled.
It seems like concurrent requests are taking too long to get processed together.
It's taking time to load the model into memory
okay
RRunPod
Created by Ammar Ahmed on 10/4/2024 in #⚡|serverless
How can I make a single worker handle multiple requests concurrently before starting the next worker
ohh okay. Thanks
yes
Yes, I found it in the docs and was figuring out how to implement it in Python. Will it go with the input in handler.py?
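For reference, a minimal sketch of what this can look like in handler.py, using the concurrency_modifier option from the RunPod serverless docs; the fixed limit of 10 and the handler body are placeholders:

import runpod

def concurrency_modifier(current_concurrency):
    # Let this worker take up to 10 jobs at once; the number is a
    # placeholder and should match the size of the model pool.
    return 10

async def handler(job):
    # Placeholder body; real work would pull a model from the pool here.
    return {"echo": job["input"]}

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})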
Essentially, I want to modify how the queue behaves so that premium requests jump ahead of regular ones in the processing order. Can I modify the queue behavior or set some priority rules for incoming requests in RunPod?
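Since the RunPod docs don't describe a priority option for the endpoint queue (as noted below), any prioritization would have to happen on the application side before jobs reach the endpoint. A sketch of that idea (not a RunPod feature; all names here are hypothetical), using Python's PriorityQueue so premium jobs always dequeue first:

import itertools
import queue

PREMIUM, REGULAR = 0, 1     # lower number = higher priority
_order = itertools.count()  # tie-breaker keeps FIFO order within a tier
job_queue = queue.PriorityQueue()

def submit(job, premium=False):
    priority = PREMIUM if premium else REGULAR
    job_queue.put((priority, next(_order), job))

def next_job():
    # Tuples compare element-wise, so premium jobs come out before
    # regular ones, and equal priorities come out in arrival order.
    _, _, job = job_queue.get()
    return job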
What I need is a way to ensure that premium customers' requests get processed with higher priority over regular ones, without spinning up new instances or separate endpoints. I couldn't find any documentation in RunPod about queue management to allow this kind of prioritization.
Got your point but I'm trying to manage request prioritization between premium and regular customers using the same server.
Yeah, thanks, I did it using the concurrent handler. Another thing: we're developing this as a service for our app, so we have premium and regular customers who will access this server for image generation. I checked the docs but couldn't find anything related to modifying RunPod's queue management. Is there any way to give premium customers higher priority?