Thursday, 9 May 2019

Set proper buffer size for PCM audio stream in node.js

I want to mix live PCM audio data, where new audio streams can be created at any time and existing ones can close at any time. The streams are not synchronized; each one starts and ends whenever it likes. The output should be a PCM-encoded audio file that mixes all these streams, with the correct amount of silence between the pieces of audio.

As a disclaimer: I am completely new to stream programming in node.js and only have a basic understanding of how to work with PCM data in audio streams. Please correct me if my whole foundation is wrong.

The full source code of my current naive implementation is available on GitHub, but I'll also outline here what I think are the parts relevant to my issue.

A single readable stream is supposed to generate the output PCM stream. Every time the destination this stream is piped to tries to read n bytes from it, the stream asks each of its inputs (one writable stream per incoming PCM stream) for n bytes of PCM data. Each input then returns either n bytes of its buffered audio data or, if it doesn't have enough, its buffered audio data padded with silence up to n bytes.
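A minimal sketch of the pad-and-mix step described above, assuming 16-bit signed little-endian PCM (the sample format is not stated in this post). `padWithSilence` and `mixBuffers` are hypothetical helpers, not the `mixingFunction` from the repository, and summing with clamping is only one common way to mix.

```javascript
// Hypothetical helper: return exactly `size` bytes, padding with silence.
// Buffer.alloc zero-fills, and zero is silence for signed PCM.
function padWithSilence (buffer, size) {
    if (buffer.length >= size) {
        return buffer.subarray(0, size)
    }
    return Buffer.concat([buffer, Buffer.alloc(size - buffer.length)])
}

// Hypothetical helper: mix equally sized buffers of 16-bit signed LE samples
// by summing them sample by sample and clamping to the 16-bit range.
function mixBuffers (buffers) {
    if (buffers.length === 0) {
        return Buffer.alloc(0)
    }
    const length = buffers[0].length
    const mixed = Buffer.alloc(length)
    for (let offset = 0; offset + 1 < length; offset += 2) {
        let sum = 0
        for (const buffer of buffers) {
            sum += buffer.readInt16LE(offset)
        }
        sum = Math.max(-32768, Math.min(32767, sum))
        mixed.writeInt16LE(sum, offset)
    }
    return mixed
}

// Usage: one input has a full 4 bytes, the other only 2 and gets padded.
const full = Buffer.alloc(4)
full.writeInt16LE(1000, 0)
full.writeInt16LE(30000, 2)
const short = padWithSilence(Buffer.from([0x10, 0x27]), 4) // sample 10000, then silence
const out = mixBuffers([full, short])
console.log(out.readInt16LE(0), out.readInt16LE(2)) // 11000 30000
```

Clamping avoids integer wrap-around when several loud inputs overlap, at the cost of hard clipping.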

The issue is that I tested this against the node speaker package, which lets me pipe directly to my speakers. My Readable's _read method receives the number of bytes requested. My speakers (or their drivers?) only request as much data as they need, since they don't buffer anything. The amount of data requested therefore matches exactly the amount of data coming in at that sampling rate.

When I try to save the data to a file instead (after MP3-encoding it), the file write stream calls _read far more often, and requests far more data per call, than the speaker does. Since I pad any shortfall with silence, the result is a file as large as can be written in that amount of time, consisting of almost pure silence. In fact, as far as I could tell from skimming through it, I couldn't hear anything at all.
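To put rough numbers on that mismatch, here is a small sketch assuming 16-bit stereo PCM at 44100 Hz. `bytesForElapsed` is a hypothetical helper, and the request size of 65536 bytes is purely illustrative, not a value taken from any particular stream.

```javascript
const NS_PER_SEC = 1e9

// Hypothetical helper: how many bytes of PCM correspond to a span of
// wall-clock time, rounded down to whole frames (one sample per channel).
function bytesForElapsed (elapsedNs, format) {
    const frameSize = format.channels * format.bytesPerSample
    // Multiply before dividing so whole-number inputs give exact results
    const frames = Math.floor(elapsedNs * format.samplingRate / NS_PER_SEC)
    return frames * frameSize
}

const format = { samplingRate: 44100, channels: 2, bytesPerSample: 2 }

// Suppose 10 ms passed since the last read, but the consumer asks for 64 KiB.
const realTimeBytes = bytesForElapsed(10 * 1e6, format)
const requestedBytes = 65536
const silencePadding = Math.max(0, requestedBytes - realTimeBytes)

console.log(realTimeBytes, silencePadding) // 1764 63772
```

With these assumed numbers, almost all of every read would be padding, which is consistent with a file that is mostly silence.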

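import { Writable } from 'stream'

// Nanoseconds per second, used to convert process.hrtime() tuples
const NS_PER_SEC = 1e9

export default class Input extends Writable {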

    readSamples (size, time) {
        this.lastRead = time

        // If our buffer is smaller than what's requested, fill it up with silence
        if (this.buffer.length < size) {
            let drainedBuffer = Buffer.concat([this.buffer, this.silence(size - this.buffer.length)])
            // Everything buffered has been handed out, so start over with an empty buffer
            this.buffer = Buffer.alloc(0)

            return drainedBuffer
        }

        // Unshift the first _size_ elements from the buffer
        let buffer = this.buffer.slice(0, size)
        this.buffer = this.buffer.slice(size)

        return buffer
    }

    _write (chunk, encoding, next) {
        // Calculate how many samples we should be receiving by now
        let timeDifference = process.hrtime(this.lastRead)
        let timeDifferenceInNs = timeDifference[0] * NS_PER_SEC + timeDifference[1]

        const channels = 2
        const samplingRate = 44100

        let samplesInChunk = chunk.length / channels
        let samplesRequired = Math.floor(timeDifferenceInNs / NS_PER_SEC * samplingRate)

        if (samplesInChunk < samplesRequired) {
            this.buffer = Buffer.concat([this.buffer, this.silence(samplesRequired - samplesInChunk)])
        }

        this.buffer = Buffer.concat([this.buffer, chunk])

        next()
    }

}


import { Readable } from 'stream'

class Mixer extends Readable {

    _read (size) {
        if (typeof size === 'undefined') {
            // Calculate the number of samples that should be requested
            // if size is not specified.

            let timeSinceLastRead = process.hrtime(this.lastReadTime)

            let nanosecondsSinceLastRead = timeSinceLastRead[0] * NS_PER_SEC + timeSinceLastRead[1]
            // Round down so that size is always a whole number
            let samples = Math.floor(nanosecondsSinceLastRead / NS_PER_SEC * this.options.samplingRate)

            size = samples
        }

        this.lastReadTime = process.hrtime()

        // this.inputs also includes an input that only
        // emits silence. This way even when no other inputs are
        // connected, there's still some silent data coming through
        // for proper timing

        let buffers = this.inputs.map(input => {
            return input.readSamples(size, this.lastReadTime)
        })

        let mixedBuffer = this.mixingFunction(buffers)
        this.push(mixedBuffer)
    }

}


My questions now:

  • How do I properly buffer the data and only send as much data as is present (plus silence), instead of relying on how much data is requested from the stream target?
  • Is it the correct approach to buffer the input data within the Input class as it becomes available and only return it when readSamples is called? How do I make sure that the timing of Mixer calling readSamples coincides with the audio sources writing their data to the input, so that valid input is always available?

While writing this up and looking at the code, I noticed one more thing I need to account for: in the input, silence only ever needs to be added at the beginning, when the first data arrives through _write, to give the input the correct starting offset relative to the other inputs. If this input's PCM stream ever goes silent, it will stream PCM-encoded silence itself, so there is no need to add silence artificially at the end.
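A minimal sketch of that idea, assuming 16-bit stereo PCM at 44100 Hz. `StartAlignedBuffer` is a hypothetical stand-in for the Input class (without the stream plumbing), and `elapsedNs` is assumed to be the time between some shared reference point, such as the mixer's last read, and the arrival of the chunk.

```javascript
// Hypothetical class: prepends silence once, before the first chunk only.
class StartAlignedBuffer {
    constructor (format) {
        this.format = format
        this.buffer = Buffer.alloc(0)
        this.started = false
    }

    // elapsedNs is only used for the very first chunk
    write (chunk, elapsedNs) {
        if (!this.started) {
            const frameSize = this.format.channels * this.format.bytesPerSample
            const frames = Math.floor(elapsedNs * this.format.samplingRate / 1e9)
            // Buffer.alloc zero-fills, and zero is silence for signed PCM
            this.buffer = Buffer.alloc(frames * frameSize)
            this.started = true
        }
        this.buffer = Buffer.concat([this.buffer, chunk])
    }
}

const input = new StartAlignedBuffer({ samplingRate: 44100, channels: 2, bytesPerSample: 2 })
input.write(Buffer.from([1, 2, 3, 4]), 1e7) // first chunk arrives 10 ms in: 1764 bytes of silence first
input.write(Buffer.from([5, 6, 7, 8]), 2e7) // later chunks are appended as-is
console.log(input.buffer.length) // 1772
```

Rounding the offset down to whole frames keeps the channels aligned, so the left and right samples do not swap places after the padding.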


