Worker hangs for a really long time, performance is not close to what it should be
Hi, I'm working with a transcription and diarization endpoint. The Docker image works great; I tested it locally and also inside a worker. I SSH into the worker and tested using:
The processing time is around 1 minute for this video (11 min), which works great. These are the logs I get from running the same request inside the worker -> Message.txt appended.
Once I test this endpoint using a normal request, the worker behaves completely abnormally, taking more than 5-6 minutes just to start the transcription, then even more minutes transcribing. The really weird part is that I tested the handler in the worker itself using SSH. I have no idea how to debug this or what might be happening:
This should happen in a matter of seconds, just like the logs from the execution within the worker show:
Manual execution
Weirdly enough, I have another endpoint using just transcription, it's also fast:
@Madiator2011 here
Probably it's downloading the input file
though it looks like you use CPU and not GPU
Is that custom code?
yes, it's custom code, but I have every model in cache
if I log in to the worker with SSH
the code runs smoothly in just 1-2 min for an 11-minute video
and even 6-8 minutes for a 2.5-hour video
but once the worker is actually called it behaves completely differently
do you use a local file uploaded to the worker, or download the input file from a remote location?
the file is in AWS S3. It does take more time, but that is not the problem; you can see it actually prints the type of video and everything, but once it starts transcription or diarization it hangs
performance of just transcription is close now
but once the file is too big
it takes a lot of time
why is it that this does not take that much time when I manually execute the handler over SSH in the worker?
I think it's probably that you are giving a path to a local file
could you explain better? what I do is:
1. Send S3 path to worker
2. Download the object as bytes, just using memory; that way I do not write the file to disk
3. Apply transcription; for small files this handles it well, for bigger files the difference is huge (see the sketch after this list)
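A minimal sketch of steps 2-3 above, assuming boto3 and faster-whisper; the bucket/key arguments and the medium model are placeholders, not the original handler's code:
```python
import io

import boto3
from faster_whisper import WhisperModel

s3 = boto3.client("s3")
# Model assumed to be cached in the image, as described above.
model = WhisperModel("medium", device="cuda", compute_type="float16")

def transcribe_from_s3(bucket: str, key: str):
    # Step 2: download the S3 object into memory instead of writing it to disk.
    obj = s3.get_object(Bucket=bucket, Key=key)
    audio_file = io.BytesIO(obj["Body"].read())

    # Step 3: faster-whisper accepts a file-like object directly.
    segments, info = model.transcribe(audio_file)
    return list(segments), info
```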
Though you're still getting latency
Shouldn't the performance over SSH be the same then?
reading a file from a remote server will be slower than reading a locally stored file
But if you have the local file, you need to load it into memory either way; I'm doing the same? Are you saying that even if I download the object as bytes, it will be faster to download the file and then open it as bytes? Seems weird
you're also adding network latency when reading the file
I know, but I have been saying the difference is mainly the GPU process, not reading the file. I know that can take time and it does, but why should this affect the GPU process once the file is already loaded into Python memory?
I do not know the code of your worker so I can't help
Here is my code:
the file contents is the file in memory
same video, now it hangs forever, I even changed to a 24 GB Pro GPU
more than 10 minutes and still no transcription; the file is already loaded per the logs
the performance is not stable at all, still working at great speed over SSH
why is the worker working faster over SSH?? It does not make any sense, it's supposed to have the same performance
in the image I sent, the worker is completely stuck. I just ran the handler manually over SSH in the worker and it works exceptionally fast; this is what I expect from the serverless worker
worker still frozen ... the SSH run already finished
@justin please, anyone that can help, I'm being billed for every second, can anyone from RunPod actually help me please?
@Justin Merrell
@Polar
@Finley
tagging admins in the hope someone actually helps. @Madiator2011 did not answer any question nor try to help me. Please, we really want to scale using RunPod; I did not have any problem using constant pods, but serverless has been completely unpredictable
I am trying to catch up on the issue, what is the concern again?
Have you tried adding any print statements within your handler to see where it might be getting caught up?
yes, after the print in the left image
ic| type(audio_file): <class '_io.BytesIO'>
the transcription "starts", so at leat I should see the message transcribe.py :263 2024-02-04 22:35:25,366 Processing audio with duration 02:36:32.192
, as you can see in the successfull run:
Run inside the active worker with ssh:
As you can see after between loading the file and start transcription only seconds go by.
While in the serverless worker being called, sometimes it just gets stuck before transcription, printing only the file type as you can see in the image on the left
Only sometimes the worker actually works as expected but is not stable, and the main questions is why it takes like 3 minutes doing it frm ssh, the worker should work exactly as the ssh no?
@Justin Merrell Not sure I am following why you are SSHing into your worker
To test the hardware. If I SSH, basically I'm using the image and also the hardware, so if I run
python handler_tmp.py --test_input $INPUT_DATA
then the results should be similar, otherwise why would you offer an SSH command to connect to the active worker?
And that is not even the problem @Justin Merrell, why on earth is this Pro GPU not working? This function was working on a pod with an A4000, and using an RTX 4090 it gets stuck?? It does not make any sense. More than 12 minutes and still the transcription has not started
It's like the code is not even being executed properly; I should be able to see at least the transcription logs, not just the final segments which are also a print
May you paste the job id and endpoint id here?
From the screenshots it appears that something is getting caught within the handler functions
of course, sending it right away
jyylcod6owxt9i
this only happens with larger files it seems, I have more than 40 GB of container disk so it should be enough
is there anything I can do to help debug this? @Justin Merrell
Is it getting caught up trying to do CPU work? I am seeing that one of the jobs failed and was retried
It seems that way, but in some executions I was able to see GPU utilization
Can you add print statements while your handler code is processing things just to confirm it is still working
I just canceled the request that was retrying
I have print statements; I'll write what content should appear at each step:
1. Init ->
ic| device: 'cuda', compute: 'float16'
2. Download the file from AWS S3 in bytes form; at the end we should see: ic| type(audio_file): <class '_io.BytesIO'>
3. Transcription, when it starts we should see:
4. After finishing transcription, segment should print: ic| segments: [Segment(id=1, seek=2428,...
5. After segments a work_dir print should appear: ic| work_dir: 'dev/tmp/test_files'
since we are saving the file there
6. A final message after saving the file should appear: ic| f'{filename}_{info.language}_segments.json': 'ππ 1) Environments PEPE-20240131_140530-Meeting Recording_es_segments.json'
I'm using the icecream printing library, that is the ic
I cannot put more prints, as there is no middle step; these prints were designed to debug the system but are not quite helpful now in the worker
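One way to get a middle step, if it helps: faster-whisper's transcribe() returns segments lazily, so a print inside the iteration loop works as a heartbeat while transcription is running. A rough sketch assuming the same ic logging and the medium model; the function name is illustrative, not the original handler:
```python
from faster_whisper import WhisperModel
from icecream import ic

model = WhisperModel("medium", device="cuda", compute_type="float16")

def transcribe_with_heartbeat(audio_file):
    # transcribe() returns a generator, so segments are decoded during iteration.
    segments, info = model.transcribe(audio_file)
    ic(info.duration)  # confirms the audio was decoded and accepted

    collected = []
    for segment in segments:
        # One line per decoded segment: proof the GPU is still making progress.
        ic(segment.start, segment.end)
        collected.append(segment)
    return collected, info
```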
@Justin Merrell Which step is it getting hung up on?
number 3, transcription never seems to start
this image
@Justin Merrell
And what code in the handler should be running that?
I'm suspecting that the code might not recognize the stream file as a correct file, though I won't be able to check it until morning
this is the code:
an async version of running the faster-whisper medium model, previously cached in the Dockerfile
@Justin Merrell
why async?
it was previously handled like that, and afterwards we upload using async as well
are you saying I should try 100% sync and maybe it'll work? Shouldn't we see an error from asyncio?
Solution
Take a look at our implementation of Fast Whisper https://github.com/runpod-workers/worker-faster_whisper/blob/main/src/rp_handler.py
Your code is already blocking, async is likely just introducing complexities
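For comparison, a minimal fully synchronous handler in the spirit of the linked example, assuming the runpod Python SDK, boto3, and faster-whisper; the input keys ("bucket", "key") and the medium model are assumptions, not the original endpoint's schema:
```python
import io

import boto3
import runpod
from faster_whisper import WhisperModel

# Loaded once per worker, outside the handler, so warm requests skip model init.
model = WhisperModel("medium", device="cuda", compute_type="float16")
s3 = boto3.client("s3")

def handler(job):
    job_input = job["input"]

    # Download the S3 object straight into memory.
    obj = s3.get_object(Bucket=job_input["bucket"], Key=job_input["key"])
    audio_file = io.BytesIO(obj["Body"].read())

    # Plain blocking call; the worker handles one job at a time, so no asyncio is needed.
    segments, info = model.transcribe(audio_file)
    return {
        "language": info.language,
        "segments": [
            {"start": s.start, "end": s.end, "text": s.text} for s in segments
        ],
    }

runpod.serverless.start({"handler": handler})
```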
I'll try sync mode, will be posting updates here
4 endpoints were corrected by switching to sync methods. I had to sacrifice 1 function that combined async logic and could not be replaced as easily. Now the testing matches the results obtained over SSH. Still wondering why it didn't work with async; everything was going smoothly on a normal GPU with that code
@Justin Merrell thanks a lot for the help and listening
@Madiator2011 hope you can work on support skills; basically the first 50 messages were lost even though the problem was visible, you just did not ask any helpful question but just assumed, hope you can work on that
either way, thanks a lot, I'll be testing async logic next week. If I manage to mix it correctly I'll be posting results here so other people can benefit, thanks
I'm sorry that I could not help with that, but you tagged me late at night