discord.js - Imagine an app•16mo ago

Volume normalization

I know this might be a bit of an out-of-scope question, but I couldn't really find any documentation on this online. I found that ffmpeg has a volume normalization option, but it would be nice not to have to analyze the entire file beforehand. I have a folder with a bunch of mp3s or oggs and I'm playing them using the example on the website. The issue is that some are much louder than others. Of course, I can fix the files, but that would be no fun! Is there a way to dynamically set the peak volume (maybe based on the first 10 seconds of the file or so) to a certain level? I want to avoid scanning the entire file with ffmpeg, because what if I have a super long file? Any help is much appreciated!

8 Replies

d.js toolkit•16mo ago

- What's your exact discord.js (npm list discord.js) and node (node -v) version?
- Not a discord.js issue? Check out #other-js-ts.
- Consider reading #how-to-get-help to improve your question!
- Explain what exactly your issue is.
- Post the full error stack trace, not just the top part!
- Show your code!
- Issue solved? Press the button!

Twilight•16mo ago

You must first convert whatever format you have into raw audio. After that, apply digital signal processing (DSP) algorithms to make it do what you want. FFmpeg is excellent at digital audio signal processing, but if you want to avoid it, you'll have to implement your own. For volume normalization, you have to detect peaks first and then write something like a dynamics compressor. I don't know many helper libs on npm for this, but for the web, the Web Audio API exists. Also, if you are doing this in real time, make sure your algorithm is fast; a delay above roughly 10 to 20 ms can cause lag. One workaround is to use SIMD, so it might be worth exploring WebAssembly or native code. And another thing: you should not process the entire file at once, since that is not efficient. Use streams and process it in chunks.

tl;dr: if everything happens inside a web browser, just use the Web Audio API; otherwise implement your own DSP.

kriskotooBG (OP)•16mo ago

No, I'm actually running it from Node. I know ffmpeg very well, but I didn't find anything suitable... I'm pretty sure I can just use fs to get a read stream from the file and give that to the djs audio player, but I'm not sure what to put between fs and djs to analyze chunks of the audio (or a single chunk) and normalize the volume for the entire stream. I guess my issue is that I don't know what tools I could use.

Twilight•16mo ago

The basic idea of audio processing is source -> transformer -> destination (basically a pipeline of streams). You can construct it like this (note: this is just example code, don't expect it to work):

const fs = require('node:fs');
const { Transform } = require('node:stream');
const prism = require('prism-media');
const { StreamType } = require('@discordjs/voice');

// read your file
const input = fs.createReadStream('./my-file.ogg');
// create ogg demuxer, in this case it is used to parse opus packets
// (constructor options are elided in the original example)
const oggDemuxer = new prism.opus.OggDemuxer(...);
// create opus decoder to get the raw audio packets
const decoder = new prism.opus.Decoder(...);

// create decoder pipeline
const pcm = input.pipe(oggDemuxer).pipe(decoder);

// define your transformer (Node's base class is Transform, not Transformer)
class MyTransformer extends Transform {
  _transform(chunk, _, done) {
    // perform some magic with the chunk
    const modifiedChunk = ...;
    this.push(modifiedChunk);
    done();
  }
}

// now pipe the raw audio stream to your transformer
const stream = pcm.pipe(new MyTransformer());

// now pass it to djs/voice for example, make sure to set stream type to raw
// (passToDiscordJS is a placeholder, not a real function)
passToDiscordJS(stream, { type: StreamType.Raw });

The simplest transformer of all is the volume transformer, where volume = volume * (value between 0 and 1). You'd use it like chunk.readInt16LE(...) * 0.5 (50% volume), assuming the raw audio is in s16le format, and write the result back using chunk.writeInt16LE(...). This transformer exists inside prism-media's source code, so you can take a look there.
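As a rough illustration of the volume transformer described above, here is a minimal sketch using only Node's built-in stream module. The names scaleChunk and VolumeTransform are made up for this example and are not prism-media's API. It assumes s16le PCM and that every chunk contains whole 16-bit samples.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper: multiply every signed 16-bit little-endian sample by `volume`.
// Assumes the chunk holds whole samples (an even number of bytes).
function scaleChunk(chunk, volume) {
  const out = Buffer.alloc(chunk.length);
  for (let i = 0; i + 1 < chunk.length; i += 2) {
    const scaled = Math.round(chunk.readInt16LE(i) * volume);
    // Clamp to the int16 range so gains above 1 cannot make writeInt16LE throw.
    out.writeInt16LE(Math.max(-32768, Math.min(32767, scaled)), i);
  }
  return out;
}

// Hypothetical transform stream that applies a fixed volume to raw PCM.
class VolumeTransform extends Transform {
  constructor(volume) {
    super();
    this.volume = volume;
  }

  _transform(chunk, _encoding, done) {
    done(null, scaleChunk(chunk, this.volume));
  }
}
```

With the pipeline from the earlier example, this would be used as pcm.pipe(new VolumeTransform(0.5)) to halve the volume.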

kriskotooBG (OP)•16mo ago

Thank you very much for the detailed example @twlite! I understand that. I guess my issue is mostly about the logic of how to implement the audio leveling, because if I just follow the example it would decrease the volume in general by 50%, right? But I would like to be able to normalize the volume of, let's say, individual files, if possible without scanning them first, using a sample of, let's say, 10 seconds or so.
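One way the "sample the first 10 seconds" idea could be sketched: buffer the start of the stream, measure its peak, pick a single gain, and apply that gain to everything that follows. The class and helper names below are hypothetical, and the defaults (48 kHz, stereo, s16le, a target peak of 0.8 of full scale) are assumptions for this example. Later samples that are louder than the analyzed window are simply clamped, which can distort.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper: loudest absolute sample in the chunk, as a fraction of full scale (0..1).
function measurePeak(chunk) {
  let peak = 0;
  for (let i = 0; i + 1 < chunk.length; i += 2) {
    peak = Math.max(peak, Math.abs(chunk.readInt16LE(i)) / 32768);
  }
  return peak;
}

// Hypothetical helper: scale s16le samples by `gain`, clamped to the int16 range.
function applyGain(chunk, gain) {
  const out = Buffer.alloc(chunk.length);
  for (let i = 0; i + 1 < chunk.length; i += 2) {
    const s = Math.round(chunk.readInt16LE(i) * gain);
    out.writeInt16LE(Math.max(-32768, Math.min(32767, s)), i);
  }
  return out;
}

class PeakNormalizer extends Transform {
  constructor({ targetPeak = 0.8, analysisSeconds = 10, sampleRate = 48000, channels = 2 } = {}) {
    super();
    this.targetPeak = targetPeak;
    // 2 bytes per 16-bit sample
    this.analysisBytes = analysisSeconds * sampleRate * channels * 2;
    this.pending = [];
    this.pendingBytes = 0;
    this.peak = 0;
    this.gain = null; // stays null until the analysis window has been read
  }

  // Fix the gain from the measured peak and release the buffered audio.
  _decideGain() {
    this.gain = this.peak > 0 ? this.targetPeak / this.peak : 1;
    for (const chunk of this.pending) this.push(applyGain(chunk, this.gain));
    this.pending = [];
  }

  _transform(chunk, _encoding, done) {
    if (this.gain !== null) return done(null, applyGain(chunk, this.gain));
    this.peak = Math.max(this.peak, measurePeak(chunk));
    this.pending.push(chunk);
    this.pendingBytes += chunk.length;
    if (this.pendingBytes >= this.analysisBytes) this._decideGain();
    done();
  }

  // Files shorter than the analysis window are handled when the input ends.
  _flush(done) {
    if (this.gain === null) this._decideGain();
    done();
  }
}
```

The trade-off is that nothing is emitted until the analysis window has been read, and with these defaults about 1.9 MB of PCM is held in memory per stream. For local files that delay is usually short, because decoding runs much faster than real time.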

Twilight•16mo ago

https://en.wikipedia.org/wiki/Dynamic_range_compression

Dynamic range compression

Dynamic range compression (DRC) or simply compression is an audio signal processing operation that reduces the volume of loud sounds or amplifies quiet sounds, thus reducing or compressing an audio signal's dynamic range. Compression is commonly used in sound recording and reproduction, broadcasting, live sound reinforcement and some instrument...

Elucid•16mo ago

I did this a while ago, decoding to PCM using prism-media, but now I'd recommend something lighter and more up to date, like the @evan/opus module. Then I would calculate the root mean square (RMS) value of each buffer of PCM audio, check whether it was higher than a fixed amplitude, and modify the volume accordingly. This attempt worked in theory, but it wasn't really good. Hopefully it gives you some ideas, though.
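The per-buffer RMS check described above might look roughly like the sketch below. The function names and the threshold parameter are hypothetical, and the audio is assumed to be s16le PCM. Because the gain is recomputed independently for every buffer, the volume can jump between buffers; that may be one reason this kind of approach sounds rough without smoothing, though that is a guess rather than something stated in the thread.

```javascript
// Hypothetical helper: RMS level of an s16le chunk, as a fraction of full scale (0..1).
function rmsOfChunk(chunk) {
  const samples = Math.floor(chunk.length / 2);
  if (samples === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples; i++) {
    const s = chunk.readInt16LE(i * 2) / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples);
}

// Hypothetical helper: gain that pulls a too-loud chunk down to `maxRms`.
// Chunks at or below the threshold are left untouched (gain 1).
function gainForChunk(chunk, maxRms) {
  const rms = rmsOfChunk(chunk);
  return rms > maxRms ? maxRms / rms : 1;
}
```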

kriskotooBG (OP)•16mo ago

Yeah, after reading that I was wondering whether it would be possible to somehow set the base volume to a given dB. There are no fade-ins or fade-outs, these are linear SFX, so that would work for me.
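For setting a level "in dB", the usual relationship between decibels and a linear amplitude multiplier is gain = 10^(dB / 20), where 0 dB is digital full scale (dBFS) and quieter levels are negative. A small sketch, with hypothetical function names:

```javascript
// Convert a level in decibels to a linear amplitude multiplier.
function dbToGain(db) {
  return Math.pow(10, db / 20);
}

// Convert a linear amplitude multiplier back to decibels.
function gainToDb(gain) {
  return 20 * Math.log10(gain);
}

// Hypothetical helper: gain needed to move a measured peak (0..1 of full scale)
// to a target level given in dBFS, e.g. -3 for "3 dB below full scale".
function gainToReach(targetDb, measuredPeak) {
  return dbToGain(targetDb) / measuredPeak;
}
```

For example, a file whose measured peak is 0.25 of full scale needs a gain of about 2.0 to reach roughly -6 dBFS, since -6 dB corresponds to an amplitude of about 0.5.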
