Support for Speaker Diarization on Cloudflare Workers AI

Hi everyone, I’m currently using Cloudflare Workers AI for speech-to-text transcription with Whisper-large-v3-turbo, and it works great. However, I also need speaker diarization to differentiate between multiple speakers in an audio file. Right now, the best open-source option seems to be Pyannote, but it requires a GPU and seems too heavy to run on Cloudflare Workers due to resource limits. Is there any way to run speaker diarization on Cloudflare Workers AI (e.g., an optimized model or workaround)? Alternatively, has anyone successfully implemented lightweight diarization within Cloudflare’s ecosystem (Workers, KV, R2, etc.)? I’d love to keep everything within Cloudflare rather than using third-party services. Any suggestions or insights would be greatly appreciated! Thanks in advance! 🚀