Blah Blah · 10mo ago

Problem when writing a multiprocessing handler

Hi there! I'm running into an issue writing a handler that processes two tasks in parallel (using ThreadPoolExecutor). I load the models with Hugging Face's transformers library and run inference through LangChain. I tested the handler on Google Colab and it works fine, so I built my Docker template and created an endpoint on RunPod, but at inference time I consistently get an error: CUDA error: device-side assert triggered. I never see this when testing the handler on Colab. How can I handle that, and in particular, what can cause this error? I'm using a 48 GB GPU, which is more than enough for my models (around 18 GB in total), so it shouldn't be a resource issue.
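For context, the handler described above presumably looks something like the following. This is a hypothetical reconstruction, not the OP's actual code: the model name, input format, and generation settings are all assumptions.
```python
from concurrent.futures import ThreadPoolExecutor

import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "some-org/some-model"  # placeholder; the OP's models aren't named

# Loaded once at worker startup, onto the single 48 GB GPU.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).to("cuda")

def run_task(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)

def handler(job):
    prompts = job["input"]["prompts"]
    # Two threads hitting the same CUDA context at once -- the pattern
    # that passes a Colab test but can surface device-side asserts once
    # it runs inside a serverless worker.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(run_task, prompts))
    return {"outputs": results}

runpod.serverless.start({"handler": handler})
```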
3 Replies
ashleyk · 10mo ago
If you're trying to process concurrent jobs, you need to follow this doc: https://docs.runpod.io/serverless/workers/handlers/handler-concurrency
Concurrent Handlers | RunPod Documentation
RunPod supports asynchronous functions for request handling, enabling a single worker to manage multiple tasks concurrently through non-blocking operations. This capability allows for efficient task switching and resource utilization.
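Based on that doc, the concurrent version would look roughly like this. A minimal sketch, not a definitive implementation: `run_task` is a hypothetical stand-in for the blocking transformers/LangChain inference call, and returning a fixed 2 mirrors the OP's two-task goal; check the exact `concurrency_modifier` signature against the linked page.
```python
import asyncio

import runpod

def run_task(prompt: str) -> str:
    # Placeholder for the real (blocking) GPU inference call.
    return prompt.upper()

async def handler(job):
    # Offload the blocking call to a thread so the event loop stays
    # free to accept and interleave other jobs while this one runs.
    result = await asyncio.to_thread(run_task, job["input"]["prompt"])
    return {"output": result}

def concurrency_modifier(current_concurrency: int) -> int:
    # Ask the platform to feed this worker up to 2 jobs at a time.
    return 2

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```
The key difference from the thread-pool version is that the platform, not the handler, fans out the jobs: each job still runs one at a time on the GPU, which sidesteps multiple threads racing on the same CUDA context.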
Blah Blah (OP) · 10mo ago
Thanks! I'll try that. I naively assumed I wouldn't have to change anything from my local handler code. Hopefully that solves the problem.
justin · 10mo ago
GitHub
Runpod-OpenLLM-Pod-and-Serverless/handler.py at main · justinwlin/R...
A repo for OpenLLM to run pod. Contribute to justinwlin/Runpod-OpenLLM-Pod-and-Serverless development by creating an account on GitHub.