Volume normalization
I know this might be a bit of an out-of-scope question, but I couldn't really find any documentation on this online. I found ffmpeg has the option of volume normalization, but it would be nice to not analyze the entire file beforehand.
I have a folder with a bunch of mp3s or oggs and I'm playing them using the example on the website.
The issue is that some are much louder than others.
Of course, I can fix the files, but that would be no fun!
Is there a way to dynamically set the peak volume (maybe based on the first 10 seconds of the file or so) to a certain level?
I want to avoid scanning the entire file with ffmpeg, because what if I have a super long file?
Any help is much appreciated!
8 Replies
- What's your exact discord.js (npm list discord.js) and node (node -v) version?
- Not a discord.js issue? Check out #other-js-ts.
- Consider reading #how-to-get-help to improve your question!
- Explain what exactly your issue is.
- Post the full error stack trace, not just the top part!
- Show your code!
- Issue solved? Press the button!
You must first convert whatever format you have into raw audio (PCM). After that, apply digital signal processing to it to get the result you want.
FFmpeg is excellent at digital audio signal processing, but if you want to avoid it, you have to implement your own. For volume normalization, you have to detect peaks first and then write something like a dynamics compressor.
I don't know many of the helper libs on npm for this, but on the web, the Web Audio API exists.
Also, if you are doing this in real time, make sure your algorithm is fast: a processing delay above roughly 10 to 20 ms per chunk can cause audible lag. One workaround is to use SIMD, so it might be worth exploring WebAssembly or native code.
And another thing: you should not process the entire file at once, it is not efficient. Use streams and process it in chunks.
tl;dr: if everything happens inside a web browser, just use the Web Audio API, otherwise implement your own DSP
No, I'm running it from Node actually.
I know about ffmpeg very well but I didn't find anything suitable...
I'm pretty sure I can just use FS to get a read stream from the file and give that to the djs audio player, but I'm not sure what to do in the middle between FS and djs to analyze chunks of the audio or a single chunk and normalize the volume for the entire stream
I guess my issue is that I don't know what tools I could use
Basic idea of audio processing is like
source
-> transformer
-> destination
(basically a pipeline of streams)
you can construct it like this (note: this is just example code, don't expect it to work)
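For illustration, a minimal sketch of such a source -> transformer -> destination pipeline using only Node's built-in stream module. It assumes the audio has already been decoded to s16le PCM; GainTransform and scaleS16le are hypothetical names, not part of discord.js or prism-media, and the in-memory source and destination are stand-ins for a real decoder and player.

```javascript
const { Readable, Transform, Writable, pipeline } = require('node:stream');

// Scale whole s16le samples in `pcm` by `factor`, clamped to the 16-bit range.
function scaleS16le(pcm, factor) {
  const out = Buffer.alloc(pcm.length - (pcm.length % 2));
  for (let i = 0; i < out.length; i += 2) {
    const scaled = Math.round(pcm.readInt16LE(i) * factor);
    out.writeInt16LE(Math.max(-32768, Math.min(32767, scaled)), i);
  }
  return out;
}

// Hypothetical transformer: raw PCM in, scaled raw PCM out.
class GainTransform extends Transform {
  constructor(factor) {
    super();
    this.factor = factor;
    this.leftover = Buffer.alloc(0); // odd trailing byte carried to the next chunk
  }

  _transform(chunk, _encoding, callback) {
    const data = Buffer.concat([this.leftover, chunk]);
    const usable = data.length - (data.length % 2); // whole samples only
    this.leftover = data.subarray(usable);
    callback(null, scaleS16le(data.subarray(0, usable), this.factor));
  }
}

// source -> transformer -> destination, with in-memory stand-ins for both ends.
const samples = Buffer.alloc(4);
samples.writeInt16LE(1000, 0);
samples.writeInt16LE(-2000, 2);

const received = [];
const source = Readable.from([samples]);
const destination = new Writable({
  write(chunk, _encoding, callback) {
    received.push(chunk);
    callback();
  },
});

pipeline(source, new GainTransform(0.5), destination, (err) => {
  if (err) throw err;
  const result = Buffer.concat(received);
  console.log(result.readInt16LE(0), result.readInt16LE(2));
});
```

In a bot, the source would be whatever decodes the mp3/ogg to PCM, and the transformed stream would be handed to the player, for example via createAudioResource with inputType set to StreamType.Raw in @discordjs/voice, which as far as I know expects 48 kHz stereo s16le.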
The simplest transformer of all is the volume transformer, where sample = sample * (a value between 0 and 1), which you'd use like chunk.readInt16LE(...) * 0.5 (50% volume), assuming the raw audio is in s16le format; you'd then write the result back using chunk.writeInt16LE(...). This transformer exists in prism-media's source code, so you can take a look.
Thank you very much for the detailed example @twlite!
I understand that. I guess my issue is mostly about the logic of implementing the audio leveling, because if I just follow the example, it would simply decrease the overall volume by 50%, right?
But I would like to be able to normalize the volume of, let's say, individual files, if possible without scanning them first, using only a sample of, let's say, the first 10 seconds or so.
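One way this could be approached, sketched under assumptions: buffer only the first part of the PCM stream, measure its peak, derive a single gain from it, and apply that same gain to everything that follows. HeadPeakNormalizer, peakOfS16le and applyGain are hypothetical names; the sketch assumes s16le input whose chunks contain whole samples. For 48 kHz stereo s16le, 10 seconds is 10 * 48000 * 2 * 2 = 1,920,000 bytes.

```javascript
const { Transform } = require('node:stream');

const BYTES_PER_SAMPLE = 2; // s16le

// Largest absolute sample value in an s16le buffer (0..32768).
function peakOfS16le(pcm) {
  let peak = 0;
  for (let i = 0; i + 1 < pcm.length; i += BYTES_PER_SAMPLE) {
    peak = Math.max(peak, Math.abs(pcm.readInt16LE(i)));
  }
  return peak;
}

// Multiply every sample by `gain`, clamping to the 16-bit range.
// Assumption: `pcm` holds whole samples (a trailing odd byte is dropped).
function applyGain(pcm, gain) {
  const out = Buffer.alloc(pcm.length - (pcm.length % BYTES_PER_SAMPLE));
  for (let i = 0; i < out.length; i += BYTES_PER_SAMPLE) {
    const v = Math.round(pcm.readInt16LE(i) * gain);
    out.writeInt16LE(Math.max(-32768, Math.min(32767, v)), i);
  }
  return out;
}

// Hypothetical transformer: analyses the first `analysisBytes` of the stream,
// then uses one fixed gain for the rest of it.
class HeadPeakNormalizer extends Transform {
  constructor({ analysisBytes, targetPeak = 16384 }) {
    super();
    this.analysisBytes = analysisBytes;
    this.targetPeak = targetPeak;
    this.head = [];
    this.headLength = 0;
    this.gain = null; // unknown until the analysis window is full
  }

  _decideGain() {
    const head = Buffer.concat(this.head);
    const peak = peakOfS16le(head);
    this.gain = peak === 0 ? 1 : this.targetPeak / peak; // silence: leave as-is
    this.head = [];
    return applyGain(head, this.gain);
  }

  _transform(chunk, _encoding, callback) {
    if (this.gain !== null) return callback(null, applyGain(chunk, this.gain));
    this.head.push(chunk);
    this.headLength += chunk.length;
    if (this.headLength < this.analysisBytes) return callback(); // keep buffering
    callback(null, this._decideGain());
  }

  _flush(callback) {
    // Stream ended before the window filled: decide from what was collected.
    callback(null, this.gain === null ? this._decideGain() : undefined);
  }
}
```

The trade-off is that playback cannot start until the analysis window has been read, and if a later part of the file is louder than the analysed head, the clamp will clip it.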
Dynamic range compression
Dynamic range compression (DRC) or simply compression is an audio signal processing operation that reduces the volume of loud sounds or amplifies quiet sounds, thus reducing or compressing an audio signal's dynamic range. Compression is commonly used in sound recording and reproduction, broadcasting, live sound reinforcement and some instrument...
I did this a while ago, decoding to PCM using prism-media, but now I'd recommend something lighter and more up to date, like the @evan/opus module.
Then I would calculate the root mean square (RMS) value of each buffer of PCM audio, check whether it was higher than a fixed amplitude, and adjust the volume accordingly.
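A minimal sketch of that per-buffer RMS idea, assuming s16le PCM. rmsOfS16le and limitBufferRms are hypothetical names, and this is a guess at the general shape of the approach rather than the code that was actually used.

```javascript
// Root mean square of the samples in an s16le buffer.
function rmsOfS16le(pcm) {
  const count = Math.floor(pcm.length / 2);
  if (count === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < count; i++) {
    const sample = pcm.readInt16LE(i * 2);
    sumOfSquares += sample * sample;
  }
  return Math.sqrt(sumOfSquares / count);
}

// If the buffer's RMS exceeds `maxRms`, scale it down so the RMS equals `maxRms`;
// quieter buffers pass through unchanged (gain 1).
function limitBufferRms(pcm, maxRms) {
  const rms = rmsOfS16le(pcm);
  const gain = rms > maxRms ? maxRms / rms : 1;
  const count = Math.floor(pcm.length / 2);
  const out = Buffer.alloc(count * 2);
  for (let i = 0; i < count; i++) {
    const scaled = Math.round(pcm.readInt16LE(i * 2) * gain);
    out.writeInt16LE(Math.max(-32768, Math.min(32767, scaled)), i * 2);
  }
  return { out, gain };
}
```

Because the gain can jump from one buffer to the next, this tends to produce audible steps at buffer boundaries. Smoothing the gain over time, which is what a compressor's attack and release settings do, is one way to soften that.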
This attempt worked in theory but it wasn’t really good, hopefully it gives you some ideas tho
Yeah, after reading that I was wondering whether it would be possible to somehow set the base volume to a given dB level.
There are no fade-ins or fade-outs, it's linear SFX, so that would work for me.
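For the "given dB" idea, a small sketch of the conversions involved, assuming levels are expressed in dBFS (decibels relative to full scale) and full scale is 32767 for s16le. The function names are hypothetical.

```javascript
const FULL_SCALE = 32767; // largest positive s16le sample value

// Convert a level in dBFS (0 = full scale, negative = quieter) to a sample amplitude.
function dbfsToAmplitude(dbfs) {
  return FULL_SCALE * Math.pow(10, dbfs / 20);
}

// Convert a sample amplitude (peak or RMS) to dBFS.
function amplitudeToDbfs(amplitude) {
  return 20 * Math.log10(amplitude / FULL_SCALE);
}

// Linear gain that would move a measured amplitude to the target level.
function gainToReach(targetDbfs, measuredAmplitude) {
  return dbfsToAmplitude(targetDbfs) / measuredAmplitude;
}
```

For example, a target of -6 dBFS corresponds to roughly half of full scale, so a file whose measured peak is at a quarter of full scale would get a gain of about 2.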