Avoiding hallucinations/repetitions when using the faster-whisper worker?
worker:
https://github.com/runpod-workers/worker-faster_whisper
Hi everyone, as the title suggests, I'm running into an issue where the transcription occasionally repeats the same word or sentence over and over.
When this occurs, it ruins the rest of the transcription from that point onward.
About 90% of the time my use case involves long audio recordings, ranging from 40 to 120 minutes.
From what I've read, this seems to be a fairly common Whisper issue, but I haven't found any consistent solutions.
Some things I've tried:
- Using large-v2 instead of large-v3
- Enabling VAD
Other than that, I haven't adjusted any other parameters.
Any help will be greatly appreciated!
1 Reply
I don't have a lot of experience with this, but maybe you can tweak some of the Whisper settings?
This might be worth trying, though:
Instead of transcribing the entire 40-120 minute file at once, split it into smaller chunks (e.g., 5-10 minute segments). Transcribe each chunk individually and then concatenate the results. This can help prevent the model from getting lost in long files. Be aware that you may have to manually adjust the seams between the chunks.
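A rough sketch of that chunking idea in Python, in case it helps. Everything here is an assumption for illustration: `chunk_spans` and `transcribe_in_chunks` are made-up helper names, and `transcribe_span` is a placeholder for whatever you use to cut out a slice of audio and send it to the worker (that part is not shown). The optional `overlap_s` is my own addition, not something the worker provides; it makes neighbouring chunks share a few seconds so the seams are easier to fix by hand.

```python
from typing import Callable, List, Tuple


def chunk_spans(total_s: float, chunk_s: float = 600.0,
                overlap_s: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start, end) spans in seconds that cover [0, total_s]."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break
        # Step back by the overlap so the next chunk re-covers the seam.
        start = end - overlap_s
    return spans


def transcribe_in_chunks(total_s: float,
                         transcribe_span: Callable[[float, float], str],
                         chunk_s: float = 600.0,
                         overlap_s: float = 0.0) -> str:
    """Transcribe each span separately and join the non-empty texts.

    transcribe_span(start, end) is a hypothetical callback that should
    return the text for that slice of the recording.
    """
    texts = [transcribe_span(s, e).strip()
             for s, e in chunk_spans(total_s, chunk_s, overlap_s)]
    return " ".join(t for t in texts if t)
```

For a 100-minute file, `chunk_spans(6000, 600)` gives ten 10-minute spans. With a non-zero overlap the shared seconds will be transcribed twice, so the duplicated words at each seam still need to be trimmed manually.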
also check these: