•Created by Antek on 3/28/2024 in #⚡|serverless
Enabling and Viewing Logs for Serverless Jobs in Runpod
I've been using Runpod to run serverless jobs with the WhisperX model and have encountered a situation where I need to debug and monitor my job executions more closely. For this purpose, I've added basic Python print statements throughout my code to log essential information and milestones.
However, I am unable to find these logs or any related output in the Serverless -> Logs tab within the Runpod interface. I have checked the documentation but couldn't locate any guidelines on how to enable logging for serverless jobs or view them once they're produced.
Could you provide detailed instructions or point me to the relevant section of the documentation on how to:
1. Enable logging for serverless jobs in Runpod, specifically whether any configuration or code adjustments are needed so that my print statements (or any other logging output) are captured.
2. Access or view these logs within the Runpod platform once my job has executed, particularly if there's a specific tab or section I might have overlooked.
Understanding how to effectively log and retrieve these logs will significantly aid in debugging and optimizing my applications.
My endpoint id: m3lzr86sanjxt5
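For reference, here's roughly how my worker emits output. I'm assuming the Serverless log tab surfaces whatever the worker writes to stdout/stderr, so I route everything there explicitly. This is a simplified sketch of my handler; the actual `runpod.serverless.start({"handler": handler})` wiring and the WhisperX call are omitted:

```python
import logging
import sys

# Route log records to stdout so the platform's log collector can capture them.
# force=True replaces any handlers a library may have installed earlier.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,
)
log = logging.getLogger("whisperx-worker")

def handler(job):
    """Simplified job handler: logs milestones instead of bare print()."""
    job_input = job.get("input", {})
    log.info("job received: id=%s", job.get("id"))
    # flush=True so print output isn't held in the stdout buffer
    print("starting transcription", flush=True)
    # ... run WhisperX on job_input here ...
    log.info("job finished")
    return {"status": "ok"}
```

In a real worker this function would be passed to `runpod.serverless.start`; the point is that everything goes through stdout unbuffered.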
•Created by Antek on 3/28/2024 in #⚡|serverless
Subject: CUFFT_INTERNAL_ERROR on Specific GPU Models While Running WhisperX Model
Hello,
I've encountered a recurring CUFFT_INTERNAL_ERROR while running the WhisperX model for audio transcription on Runpod, specifically with certain GPU models. Below are the details:
Successful Execution:
RTX A5000: GPU Utilization 96%, Successful
RTX 3090: GPU Utilization 97-100%, Successful
RTX A4000: GPU Utilization 97%, Successful
RTX A4500: GPU Utilization 96%, Successful
Error Encountered:
L4: GPU Utilization 100%, Error: CUFFT_INTERNAL_ERROR
RTX 4000 Ada: GPU Utilization 31%, Error: CUFFT_INTERNAL_ERROR
The issue seems to occur irrespective of the GPU's memory capacity, as it affects both 24 GB (L4) and 16 GB (RTX 4000 Ada) GPUs. The same script and audio file work without issues on Google Colab using T4 or V100 GPUs.
P.S. My endpoint ID: m3lzr86sanjxt5
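In case it helps narrow this down: the cards that succeed are all compute capability 8.6 or lower, while both failing cards (L4 and RTX 4000 Ada) are Ada-generation sm_89 parts, which is consistent with reports of CUFFT_INTERNAL_ERROR from PyTorch builds compiled against older CUDA toolkits running on sm_89 hardware. Until that's confirmed, my stopgap is to restrict the endpoint to pre-Ada GPUs. A minimal sketch of that filter follows; the capability table and the `filter_pre_ada` helper are my own illustration, not a RunPod API:

```python
# Compute capabilities (SM versions) of the GPUs mentioned above.
COMPUTE_CAPABILITY = {
    "RTX A5000": (8, 6),
    "RTX 3090": (8, 6),
    "RTX A4000": (8, 6),
    "RTX A4500": (8, 6),
    "L4": (8, 9),
    "RTX 4000 Ada": (8, 9),
    "T4": (7, 5),
    "V100": (7, 0),
}

def filter_pre_ada(gpus):
    """Keep only GPUs below sm_89 (Ada), where the cuFFT error was not observed.

    Unknown models are excluded as a conservative default.
    """
    return [g for g in gpus if COMPUTE_CAPABILITY.get(g, (99, 0)) < (8, 9)]
```

Applying this to my endpoint's GPU preference list would drop L4 and RTX 4000 Ada while keeping the Ampere cards that work.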