Hi all,
I have the following apps:
- an application server on Fly.io hosting my Elixir/Phoenix app
- an AI server hosting OpenAI Whisper Large V3 on a dedicated HuggingFace endpoint
In the app, users can record audio and have it transcribed.
I would like the user’s browser to send chunks of audio data of a configurable duration (e.g. 5 seconds each; later I would also like each chunk to overlap the previous one by a configurable duration, e.g. 2 seconds). The idea is that neither server (app nor AI) ever holds the entire audio in memory, and that inference is sped up by sending smaller audio files.
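To make the intent concrete, here is a rough sketch of the windowing I have in mind, operating on mono PCM samples (the overlappingWindows helper and its parameter names are mine, just for illustration):

// Sketch: split a mono Float32Array of PCM samples into overlapping windows.
// Assumes chunkMs > overlapMs (e.g. 5000 ms chunks with a 2000 ms overlap).
function overlappingWindows(samples, sampleRate, chunkMs, overlapMs) {
  const chunkLen = Math.floor((sampleRate * chunkMs) / 1000);
  const step = chunkLen - Math.floor((sampleRate * overlapMs) / 1000);
  const windows = [];
  for (let start = 0; start < samples.length; start += step) {
    // subarray clamps the end, so the last window may be shorter
    windows.push(samples.subarray(start, start + chunkLen));
  }
  return windows;
}

With 16 kHz audio, 5 s chunks and a 2 s overlap, each window would be 80,000 samples and start 48,000 samples after the previous one.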
I know that there are some examples of Phoenix LiveView with Bumblebee (see this one), however:
- I’m not using Bumblebee
- those examples send the entire audio to the AI model
I’ve tried the following in my microphone.js hook:
const SAMPLING_RATE = 16_000;
const CHUNK_DURATION_IN_MS = 5_000;
const CHUNK_OVERLAP_DURATION_IN_MS = 2_000;
const MIME_TYPE = "audio/ogg;codecs=opus";
const MicrophoneV2 = {
  mounted() {
    this.mediaRecorder = null;
    this.recording = false;

    this.el.addEventListener("click", () => {
      if (this.isRecording()) {
        this.stopRecording();
      } else {
        this.startRecording();
      }
    });
  },
  startRecording() {
    this.audioChunks = [];

    navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
      this.mediaRecorder = new MediaRecorder(stream, { mimeType: MIME_TYPE });

      this.mediaRecorder.addEventListener("dataavailable", (event) => {
        if (event.data.size > 0) {
          this.audioChunks.push(event.data);
          this.processChunks();
        }
      });

      // start(timeslice) makes MediaRecorder emit a blob every CHUNK_DURATION_IN_MS
      this.mediaRecorder.start(CHUNK_DURATION_IN_MS);

      // requestData() forces an extra "dataavailable" event on the same cadence
      this.updateInterval = setInterval(() => {
        this.mediaRecorder.requestData();
      }, CHUNK_DURATION_IN_MS);
    });
  },
  stopRecording() {
    this.mediaRecorder.addEventListener("stop", () => {
      this.processChunks();
    });

    this.mediaRecorder.stop();
    clearInterval(this.updateInterval);
  },
  processChunks() {
    if (this.audioChunks.length < 1) return;

    const audioBlob = new Blob(this.audioChunks, { type: MIME_TYPE });

    audioBlob
      .arrayBuffer()
      .then((buffer) => {
        const context = new AudioContext({ sampleRate: SAMPLING_RATE });

        context.decodeAudioData(buffer, (audioBuffer) => {
          const pcmBuffer = this.audioBufferToPcm(audioBuffer);
          // match the server's endianness, passed in via the data-endianness attribute
          const convertedBuffer = this.convertEndianness32(
            pcmBuffer,
            this.getEndianness(),
            this.el.dataset.endianness
          );
          this.upload("audio", [new Blob([convertedBuffer])]);
        });
      })
      .then(() => {
        this.audioChunks = [];
      })
      .catch((error) => {
        console.error("Error decoding audio data", error);
      });
  },
  isRecording() {
    return this.mediaRecorder && this.mediaRecorder.state === "recording";
  },
  audioBufferToPcm(audioBuffer) {
    const numChannels = audioBuffer.numberOfChannels;
    const length = audioBuffer.length;

    const size = Float32Array.BYTES_PER_ELEMENT * length;
    const buffer = new ArrayBuffer(size);
    const pcmArray = new Float32Array(buffer);

    const channelDataBuffers = Array.from(
      { length: numChannels },
      (x, channel) => audioBuffer.getChannelData(channel)
    );

    // Average all channels upfront, so the PCM is always mono
    for (let i = 0; i < pcmArray.length; i++) {
      let sum = 0;

      for (let channel = 0; channel < numChannels; channel++) {
        sum += channelDataBuffers[channel][i];
      }

      pcmArray[i] = sum / numChannels;
    }

    return buffer;
  },
  convertEndianness32(buffer, from, to) {
    if (from === to) return buffer;

    // If the endianness differs, swap the four bytes of every 32-bit float in place
    // (an ArrayBuffer can't be indexed directly, so go through a byte view)
    const bytes = new Uint8Array(buffer);

    for (let i = 0; i < bytes.length; i += 4) {
      const b1 = bytes[i];
      const b2 = bytes[i + 1];
      const b3 = bytes[i + 2];
      const b4 = bytes[i + 3];
      bytes[i] = b4;
      bytes[i + 1] = b3;
      bytes[i + 2] = b2;
      bytes[i + 3] = b1;
    }

    return buffer;
  },
  getEndianness() {
    const buffer = new ArrayBuffer(2);
    const int16Array = new Uint16Array(buffer);
    const int8Array = new Uint8Array(buffer);

    int16Array[0] = 1;

    if (int8Array[0] === 1) {
      return "little";
    } else {
      return "big";
    }
  },
};
export { MicrophoneV2 };
The problem is in the processChunks method, once I empty the audioChunks like so:

this.audioChunks = [];

After that point, audioContext.decodeAudioData fails: it doesn’t work on partial content (seen here); it only works with the entirety of the audio:

Uncaught (in promise) DOMException: The buffer passed to decodeAudioData contains invalid content which cannot be decoded successfully.
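My understanding (an assumption on my part) is that only the first blob emitted by MediaRecorder carries the Ogg container header, and every later blob is mid-stream data that can’t be decoded on its own. A minimal sketch of what I observe (tryDecode and chunks are hypothetical names; chunks holds the blobs collected from the "dataavailable" events, in order):

async function tryDecode(blob) {
  const buffer = await blob.arrayBuffer();
  const context = new AudioContext({ sampleRate: SAMPLING_RATE });
  // promise form of decodeAudioData: resolves with an AudioBuffer or rejects
  return context.decodeAudioData(buffer);
}

tryDecode(chunks[0]); // resolves: this blob starts with the container header
tryDecode(chunks[1]); // rejects with the DOMException above (assumed: no header, mid-stream data)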
I’ve attempted this, but to no avail, and I would like to avoid complicated binary handling.
How can I send a series of audio chunks from the browser to the Phoenix server?