Split audio file into 20 MB chunks

I'm trying to figure out how to take the file at this URL and send it to OpenAI in chunks of 20 MB: https://www.podtrac.com/pts/redirect.mp3/pdst.fm/e/chrt.fm/track/3F7F74/traffic.megaphone.fm/SCIM6504498504.mp3?updated=1710126905 Any help would be amazing!!
1 Reply
varsill, 9mo ago
Hello! If I understand correctly, you want to split the given .mp3 file into several .mp3 files of a desired size and then pass them as input to an OpenAI service with an API like this one: https://platform.openai.com/docs/api-reference/audio/createTranscription. If so, you would first need to read the .mp3 with some kind of HTTP client, then parse the input bytestream to split it into MP3 frames, then accumulate the frames into larger chunks of ~20 MB, and finally send an HTTP request with each chunk to the service. You could do it with Membrane, but you would need to write some custom elements. Then you could create a pipeline like this:
child(%Membrane.Hackney.Source{location: <mp3 URL>})
|> child(MP3.Parser)
|> child(Aggregator)
|> child(HTTP.Sink)
where:
* Membrane.Hackney.Source is already available in the :membrane_hackney_plugin package,
* MP3.Parser would split the bytestream into MP3 frames based on the MP3 frame header (it shouldn't be difficult, but we could surely help you write that element; for some information about MP3, see https://www.codeproject.com/Articles/8295/MPEG-Audio-Frame-Header),
* Aggregator would accumulate the MP3 frames to create buffers of ~20 MB,
* HTTP.Sink would prepare HTTP requests that are compliant with the OpenAI API and send them (you could use https://github.com/benoitc/hackney for that purpose).
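To make the parsing and aggregation steps more concrete, here is a minimal sketch in plain Elixir of the logic that the custom MP3.Parser and Aggregator elements could wrap. The module name `MP3Chunker` and its functions are hypothetical (they are not part of Membrane or any package), it only handles Layer III frames, and it resyncs by skipping one byte at a time, so it may occasionally mistake bytes inside an ID3 tag for a frame header.

```elixir
defmodule MP3Chunker do
  # Hypothetical helper module, an illustration only (not a Membrane element).

  # Layer III bitrates in kbit/s for bitrate index 1..14.
  @bitrates_v1 {32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320}
  @bitrates_v2 {8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160}

  # Returns {:ok, frame_length_in_bytes} if `data` starts with a valid
  # MPEG Layer III frame header, :error otherwise.
  def frame_length(
        <<0x7FF::11, version::2, 0b01::2, _protection::1, bitrate_idx::4, rate_idx::2,
          padding::1, _rest::bitstring>>
      )
      when version != 0b01 and bitrate_idx in 1..14 and rate_idx != 0b11 do
    {bitrates, rates, coeff} =
      case version do
        0b11 -> {@bitrates_v1, {44_100, 48_000, 32_000}, 144}
        0b10 -> {@bitrates_v2, {22_050, 24_000, 16_000}, 72}
        0b00 -> {@bitrates_v2, {11_025, 12_000, 8_000}, 72}
      end

    bitrate = elem(bitrates, bitrate_idx - 1) * 1000
    {:ok, div(coeff * bitrate, elem(rates, rate_idx)) + padding}
  end

  def frame_length(_data), do: :error

  # Splits `data` into complete frames. Returns {frames, rest}, where `rest`
  # is an incomplete tail to be prepended to the next piece of input.
  def split_frames(data, acc \\ []) do
    case frame_length(data) do
      {:ok, len} when byte_size(data) >= len ->
        <<frame::binary-size(len), rest::binary>> = data
        split_frames(rest, [frame | acc])

      {:ok, _len} ->
        # Valid header, but the frame is not fully received yet.
        {Enum.reverse(acc), data}

      :error when byte_size(data) >= 4 ->
        # Not a frame header: drop one byte and try to resync.
        <<_skipped, rest::binary>> = data
        split_frames(rest, acc)

      :error ->
        # Too few bytes to decide; keep them for later.
        {Enum.reverse(acc), data}
    end
  end

  # Groups whole frames into binaries of at most `max_bytes` each
  # (a single frame larger than `max_bytes` still forms its own chunk).
  def chunk_frames(frames, max_bytes) do
    Enum.chunk_while(
      frames,
      {[], 0},
      fn frame, {acc, size} ->
        if acc != [] and size + byte_size(frame) > max_bytes do
          {:cont, IO.iodata_to_binary(Enum.reverse(acc)), {[frame], byte_size(frame)}}
        else
          {:cont, {[frame | acc], size + byte_size(frame)}}
        end
      end,
      fn
        {[], _size} -> {:cont, {[], 0}}
        {acc, _size} -> {:cont, IO.iodata_to_binary(Enum.reverse(acc)), {[], 0}}
      end
    )
  end
end
```

With the whole file in memory as `mp3_bytes`, something like `{frames, _rest} = MP3Chunker.split_frames(mp3_bytes)` followed by `MP3Chunker.chunk_frames(frames, 20 * 1024 * 1024)` would yield binaries of at most 20 MB, each cut on a frame boundary. In a streaming pipeline the same functions would be called per incoming buffer, carrying `rest` over between calls.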