Worker hangs for a really long time, performance is not close to what it should be
Hi, I'm working with a transcription and diarization endpoint. The Docker image works great; I tested it locally and also inside a worker. I SSH into the worker and tested using:
The processing time is around 1 minute for this video (11 min), which works great. These are the logs I get from running the same request inside the worker -> Message.txt appended.
Once I test this endpoint using a normal request, the worker behaves completely abnormally, taking more than 5-6 minutes just to start the transcription, then even more minutes transcribing. The really weird part is that I tested the handler in the worker itself using SSH. I have no idea how to debug this or what might be happening:
This should happen in a matter of seconds, just like the logs from the execution within the worker show:
Manual execution
Weirdly enough, I have another endpoint using just transcription, it's also fast:
@Madiator2011 here
Probably it's downloading the input file
though it looks like you use CPU and not GPU
Is that custom code?
yes, it's custom code, but I have every model in cache
if I log in to the worker with SSH
the code runs smoothly in just 1-2 min for an 11-minute video
and even 6-8 minutes for a 2.5-hour video
but once the worker is actually called it behaves completely differently
do you use a local file uploaded to the worker, or download the input file from a remote location?
the file is in AWS S3. It does take more time, but that is not the problem; you can see it actually prints the type of video and everything, but once it starts transcription or diarization it hangs
performance of just transcription is close now
but once the file is too big
it takes a lot of time
why is it that this does not take that much time when I manually execute the handler over SSH in the worker?
I think it's probably that you are giving a path to a local file
could you explain better? what I do is:
1. Send S3 path to worker
2. Download the object as bytes, just using memory; that way I do not write the file to disk
3. Apply transcription; for small files this handles it well, for bigger files the difference is huge (see the sketch after this list)
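A minimal sketch of steps 2-3 above, assuming boto3 and faster-whisper; the bucket/key arguments and the medium model are placeholders, not the original handler's code:
```python
import io

import boto3
from faster_whisper import WhisperModel

s3 = boto3.client("s3")
# Model assumed to be cached in the image, as described above.
model = WhisperModel("medium", device="cuda", compute_type="float16")

def transcribe_from_s3(bucket: str, key: str):
    # Step 2: download the S3 object into memory instead of writing it to disk.
    obj = s3.get_object(Bucket=bucket, Key=key)
    audio_file = io.BytesIO(obj["Body"].read())

    # Step 3: faster-whisper accepts a file-like object directly.
    segments, info = model.transcribe(audio_file)
    return list(segments), info
```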
Though you're still getting latency
Shouldn't the performance over SSH be the same then?
reading a file from a remote server will be slower than reading a locally stored file
But if you have the local file, you need to load it into memory either way; I'm doing the same? Are you saying that even if I download the object as bytes, it will be faster to download the file and then open it as bytes? Seems weird
you're also adding network latency when reading the file
I know, but I have been saying the difference is mainly the GPU process, not reading the file. I know that can take time and it does, but why should this affect the GPU process once the file is already loaded into Python memory?
I do not know the code of your worker so I can't help
Here is my code:
the file contents is the file in memory
same video, now it hangs forever, I even changed to a 24 GB Pro GPU
more than 10 minutes and still no transcription; the file is already loaded per the logs
the performance is not stable at all, still working at great speed over SSH
why is the worker working faster over SSH?? It does not make any sense, it's supposed to have the same performance
in the image I sent, the worker is completely stuck. I just ran the handler manually over SSH in the worker and it works exceptionally fast; this is what I expect from the serverless worker
worker still frozen ... the SSH run already finished
@justin please, anyone that can help, I'm being billed for every second, can anyone from RunPod actually help me please?
@Justin Merrell
@Polar
@Finley
tagging admins in the hope someone actually helps. @Madiator2011 did not answer any question nor try to help me. Please, we really want to scale using RunPod; I did not have any problem using constant pods, but serverless has been completely unpredictable
I am trying to catch up on the issue, what is the concern again?
Have you tried adding any print statements within your handler to see where it might be getting caught up?
yes, after the print in the left image
ic| type(audio_file): <class '_io.BytesIO'>
the transcription "starts", so at leat I should see the message transcribe.py :263 2024-02-04 22:35:25,366 Processing audio with duration 02:36:32.192
, as you can see in the successfull run:
Run inside the active worker with ssh:
As you can see after between loading the file and start transcription only seconds go by.
While in the serverless worker being called, sometimes it just gets stuck before transcription, printing only the file type as you can see in the image on the left
Only sometimes the worker actually works as expected but is not stable, and the main questions is why it takes like 3 minutes doing it frm ssh, the worker should work exactly as the ssh no?
@Justin Merrell Not sure I am following why you are SSHing into your worker
To test the hardware. If I SSH, basically I'm using the image and also the hardware, so if I run
python handler_tmp.py --test_input $INPUT_DATA
then the results should be similar, otherwise why would you offer an SSH command to connect to the active worker?
And that is not even the problem @Justin Merrell, why on earth is this Pro GPU not working? This function was working on a pod with an A4000, and using an RTX 4090 it gets stuck?? It does not make any sense. More than 12 minutes and still the transcription has not started
It's like the code is not even being executed properly; I should be able to see at least the transcription logs, not just the final segments which are also a print
May you paste the job id and endpoint id here?
From the screenshots it appears that something is getting caught within the handler functions
of course, sending it right away
jyylcod6owxt9i
this only happens with larger files it seems, I have more than 40 GB of container disk so it should be enough
is there anything I can do to help debug this? @Justin Merrell
Is it getting caught up trying to do CPU work? I am seeing that one of the jobs failed and was retried
It seems that way, but in some executions I was able to see GPU utilization
Can you add print statements while your handler code is processing things just to confirm it is still working
I just canceled the request that was retrying
I have print statements; I'll write what content should appear at each step:
1. Init ->
ic| device: 'cuda', compute: 'float16'
2. Download the file from AWS S3 in bytes form; at the end we should see: ic| type(audio_file): <class '_io.BytesIO'>
3. Transcription, when it starts we should see:
4. After finishing transcription, segment should print: ic| segments: [Segment(id=1, seek=2428,...
5. After segments a work_dir print should appear: ic| work_dir: 'dev/tmp/test_files'
since we are saving the file there
6. A final message after saving the file should appear: ic| f'{filename}_{info.language}_segments.json': 'ππ 1) Environments PEPE-20240131_140530-Meeting Recording_es_segments.json'
I'm using the icecream printing library, that is the ic
I cannot put more prints, as there is no middle step; these prints were designed to debug the system but are not quite helpful now in the worker
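One way to get a middle step, if it helps: faster-whisper's transcribe() returns segments lazily, so a print inside the iteration loop works as a heartbeat while transcription is running. A rough sketch assuming the same ic logging and the medium model; the function name is illustrative, not the original handler:
```python
from faster_whisper import WhisperModel
from icecream import ic

model = WhisperModel("medium", device="cuda", compute_type="float16")

def transcribe_with_heartbeat(audio_file):
    # transcribe() returns a generator, so segments are decoded during iteration.
    segments, info = model.transcribe(audio_file)
    ic(info.duration)  # confirms the audio was decoded and accepted

    collected = []
    for segment in segments:
        # One line per decoded segment: proof the GPU is still making progress.
        ic(segment.start, segment.end)
        collected.append(segment)
    return collected, info
```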
@Justin Merrell Which step is it getting hung up on?
number 3, transcription never seems to start
this image
@Justin Merrell
And what code in the handler should be running that?
I'm suspecting that the code might not recognize the stream file as a correct file, though I won't be able to check it until morning
this is the code:
an async version of running the faster-whisper medium model, previously cached in the Dockerfile
@Justin Merrell
why async?
it was previously handled like that, and afterwards we upload using async as well
are you saying I should try 100% sync and maybe it'll work? Shouldn't we see an error from asyncio?
Solution
Take a look at our implementation of Fast Whisper https://github.com/runpod-workers/worker-faster_whisper/blob/main/src/rp_handler.py
Your code is already blocking, async is likely just introducing complexities
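For comparison, a minimal fully synchronous handler in the spirit of the linked example, assuming the runpod Python SDK, boto3, and faster-whisper; the input keys ("bucket", "key") and the medium model are assumptions, not the original endpoint's schema:
```python
import io

import boto3
import runpod
from faster_whisper import WhisperModel

# Loaded once per worker, outside the handler, so warm requests skip model init.
model = WhisperModel("medium", device="cuda", compute_type="float16")
s3 = boto3.client("s3")

def handler(job):
    job_input = job["input"]

    # Download the S3 object straight into memory.
    obj = s3.get_object(Bucket=job_input["bucket"], Key=job_input["key"])
    audio_file = io.BytesIO(obj["Body"].read())

    # Plain blocking call; the worker handles one job at a time, so no asyncio is needed.
    segments, info = model.transcribe(audio_file)
    return {
        "language": info.language,
        "segments": [
            {"start": s.start, "end": s.end, "text": s.text} for s in segments
        ],
    }

runpod.serverless.start({"handler": handler})
```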
I'll try sync mode, will be posting updates here
4 endpoints were corrected by switching to sync methods. I had to sacrifice 1 function that combined async logic and could not be replaced as easily. Now the testing matches the results obtained over SSH. Still wondering why it didn't work with async; everything was going smoothly on a normal GPU with that code
@Justin Merrell thanks a lot for the help and listening
@Madiator2011 hope you can work on support skills; basically the first 50 messages were lost even though the problem was visible, you just did not ask any helpful question but just assumed, hope you can work on that
either way, thanks a lot, I'll be testing async logic next week. If I manage to mix it correctly I'll be posting results here so other people can benefit, thanks
I'm sorry that I could not help with that, but you tagged me late at night