Support for Speaker Diarization on Cloudflare Workers AI
Hi everyone,
I’m currently using Cloudflare Workers AI for speech-to-text transcription with Whisper-large-v3-turbo, and it works great. However, I also need speaker diarization to differentiate between multiple speakers in an audio file.
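For context, a minimal sketch of this kind of setup might look like the following. It is illustrative only, not my exact code: `AiBinding`, `toBase64`, and `transcribe` are hypothetical names, and the base64 `audio` input and the `text`/`segments` output fields reflect my understanding of the model's schema and are assumptions here.

```typescript
// Minimal structural type for the Workers AI binding (env.AI).
// Hypothetical: only the `run` method used below is modeled.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

interface Transcript {
  text: string;
  segments: Segment[];
}

// Encode raw audio bytes as base64 (assumed input format for this model).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return btoa(binary);
}

// Send the audio to Whisper and keep only the fields used downstream.
async function transcribe(ai: AiBinding, audio: ArrayBuffer): Promise<Transcript> {
  const result = (await ai.run("@cf/openai/whisper-large-v3-turbo", {
    audio: toBase64(new Uint8Array(audio)),
  })) as { text?: string; segments?: Segment[] };
  return { text: result.text ?? "", segments: result.segments ?? [] };
}
```

In a Worker this would be called as `transcribe(env.AI, await request.arrayBuffer())`. The returned segments carry timestamps but no speaker labels, which is exactly the gap I'm trying to fill.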
Right now, the best open-source option I've found is Pyannote, but in practice it needs a GPU to run at a reasonable speed, and it seems too heavy for Cloudflare Workers given their resource limits.
Is there any way to run speaker diarization on Cloudflare Workers AI (e.g., an optimized model or workaround)?
Alternatively, has anyone successfully implemented lightweight diarization within Cloudflare’s ecosystem (Workers, KV, R2, etc.)?
I’d love to keep everything within Cloudflare rather than using third-party services. Any suggestions or insights would be greatly appreciated!
Thanks in advance! 🚀