•Created by Blah Blah on 2/23/2024 in #⚡|serverless
Problem when writing a multiprocessing handler
Hi there! I ran into an issue while writing a handler that processes two tasks in parallel (using ThreadPoolExecutor). I load the models with Hugging Face's transformers library and run the inference through LangChain. I tested the handler on Google Colab and it works fine, so I built my Docker template and created an endpoint on RunPod. But when it comes to inference, I consistently get this error: CUDA error: device-side assert triggered. I never see this error when testing the handler on Colab.
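Roughly, the handler is structured like this. This is a minimal sketch using transformers directly (the LangChain wiring is omitted; the model names, prompt handling, and generation parameters are placeholders, and the handler signature follows the RunPod Python SDK):

```python
import runpod
from concurrent.futures import ThreadPoolExecutor
from transformers import AutoModelForCausalLM, AutoTokenizer

# Both models are loaded once, outside the handler (names are placeholders).
tokenizer_a = AutoTokenizer.from_pretrained("org/model-a")
model_a = AutoModelForCausalLM.from_pretrained("org/model-a").to("cuda")
tokenizer_b = AutoTokenizer.from_pretrained("org/model-b")
model_b = AutoModelForCausalLM.from_pretrained("org/model-b").to("cuda")

executor = ThreadPoolExecutor(max_workers=2)

def run_inference(tokenizer, model, prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def handler(job):
    prompt = job["input"]["prompt"]
    # Submit both tasks in parallel, as described above.
    future_a = executor.submit(run_inference, tokenizer_a, model_a, prompt)
    future_b = executor.submit(run_inference, tokenizer_b, model_b, prompt)
    return {"task_a": future_a.result(), "task_b": future_b.result()}

runpod.serverless.start({"handler": handler})
```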
How can I handle this, and more specifically, what can cause this error? I'm using a 48 GB GPU, which is more than enough for my models (they take around 18 GB in total), so it can't be a resource issue.
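For context, a device-side assert usually means an invalid index inside a kernel (for example a token ID outside the model's embedding range) rather than memory pressure, and the call site in the traceback is unreliable because CUDA launches are asynchronous. One way to get a traceback that points at the kernel that actually failed is to force synchronous launches:

```python
import os

# Device-side asserts surface asynchronously, so the Python traceback often
# points at a later, unrelated CUDA call. Forcing synchronous launches makes
# the traceback land on the failing kernel. This must be set before torch
# initializes CUDA (at the top of the handler file, or via the endpoint's
# environment variables).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```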