OpenAI Realtime Integration with Membrane WebRTC

I am interested in using the Membrane Framework as a proxy for the OpenAI Realtime API, and the livebook provided in the examples does almost precisely what I need.

What I am struggling to understand is how this would work in an actual production environment.

In particular, I noticed that the pipeline initiated here:

    {:ok, _supervisor, pipeline} =
      Membrane.Pipeline.start_link(OpenAIPipeline,
        openai_ws_opts: openai_ws_opts,
        webrtc_source_ws_port: 8829,
        webrtc_sink_ws_port: 8831
      )

accepts only one connection, and when this connection drops, it shuts down with:

    (ex_webrtc 0.4.1) lib/ex_webrtc/dtls_transport.ex:313: ExWebRTC.DTLSTransport.handle_ice_data/2
    (ex_webrtc 0.4.1) lib/ex_webrtc/dtls_transport.ex:287: ExWebRTC.DTLSTransport.handle_info/2
    (stdlib 6.0.1) gen_server.erl:2173: :gen_server.try_handle_info/3
    (stdlib 6.0.1) gen_server.erl:2261: :gen_server.handle_msg/6
    (stdlib 6.0.1) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: {:ex_ice, #PID<0.992.0>, {:data, <<21, 254, 253, 0, 1, 0, 0, 0, 0, 0, 1, 0, 26, 0, 1, 0, 0, 0, 0, 0, 1, 174, 202, 11, 139, 249, 54, 50, 51, 36, 148, 29, 72, 144, 63, 215, 19, 63, 224>>}}

12:18:34.765 [error] GenServer #PID<0.1051.0> terminating
** (stop) {[], []}
Last message: {:ex_ice, #PID<0.1050.0>, {:data, <<21, 254, 253, 0, 1, 0, 0, 0, 0, 0, 1, 0, 18, 11, 40, 203, 221, 56, 144, 220, 98, 1, 229, 33, 223, 76, 248, 36, 232, 117, 65>>}}

12:18:34.773 [error] <0.325.0>/:webrtc_sink/ Terminating with reason: {:membrane_child_crash, :webrtc, {:timeout_value, [{:gen_server, :loop, 7, [file: ~c"gen_server.erl", line: 2078]}, {ExDTLS, :handle_data, 2, [file: ~c"lib/ex_dtls.ex", line: 168]}, {ExWebRTC.DTLSTransport, :handle_ice_data, 2, [file: ~c"lib/ex_webrtc/dtls_transport.ex", line: 313]}, {ExWebRTC.DTLSTransport, :handle_info, 2, [file: ~c"lib/ex_webrtc/dtls_transport.ex", line: 287]}, {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 2173]}, {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 2261]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}]}}

12:18:34.773 [error] <0.325.0>/ Terminating with reason: {:membrane_child_crash, :webrtc_sink, {:membrane_child_crash, :webrtc, {:timeout_value, [{:gen_server, :loop, 7, [file: ~c"gen_server.erl", line: 2078]}, {ExDTLS, :handle_data, 2, [file: ~c"lib/ex_dtls.ex", line: 168]}, {ExWebRTC.DTLSTransport, :handle_ice_data, 2, [file: ~c"lib/ex_webrtc/dtls_transport.ex", line: 313]}, {ExWebRTC.DTLSTransport, :handle_info, 2, [file: ~c"lib/ex_webrtc/dtls_transport.ex", line: 287]}, {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 2173]}, {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 2261]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}]}}}

Is this the expected behavior?

If yes, then how would I use that in production? Should I reserve a bunch of ports, start a pool of pipelines, and assign my clients to them?

What about authentication of users? I guess I would need to generate a token that they would pass in the websocket URL, but how do I hook up the code server-side to validate it?

Any hints much appreciated <3


Hi Hubert!
You are right: one Membrane Pipeline is able to handle only one client at a time, so you would have to reserve a bunch of ports to be able to handle multiple clients at once.
However, I am currently working on LiveView components that will be able to exchange WebRTC signaling messages over the LiveView WebSocket, so using them should remove the need to reserve many ports. I can let you know when they are ready.
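Until then, one pipeline per client is a reasonable pattern. A minimal sketch of the "pool of pipelines" idea, with one pipeline and one port pair per client, might look like this (the module names, the `DynamicSupervisor`, and the port-allocation scheme are my own assumptions, not part of the example):

```elixir
defmodule MyApp.PipelinePool do
  # Hypothetical helper: starts one OpenAIPipeline per client under a
  # DynamicSupervisor named MyApp.PipelineSupervisor, which is assumed to be
  # running in your application's supervision tree.
  def start_pipeline_for(client_id, openai_ws_opts) do
    # Derive a unique port pair per client; any free-port strategy would do.
    source_port = 9_000 + client_id * 2
    sink_port = source_port + 1

    spec = %{
      id: {:openai_pipeline, client_id},
      start:
        {Membrane.Pipeline, :start_link,
         [
           OpenAIPipeline,
           [
             openai_ws_opts: openai_ws_opts,
             webrtc_source_ws_port: source_port,
             webrtc_sink_ws_port: sink_port
           ]
         ]},
      # Don't restart when the client disconnects and the pipeline terminates.
      restart: :temporary
    }

    DynamicSupervisor.start_child(MyApp.PipelineSupervisor, spec)
  end
end
```

Since `Membrane.Pipeline.start_link/2` returns `{:ok, supervisor_pid, pipeline_pid}`, the supervisor treats the third element as extra start info; marking the child `:temporary` keeps a dropped connection from triggering restarts.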

BTW, we also developed a similar demo that uses Boombox.

This demo works even better, because it allows you to interrupt the chat while it speaks. Boombox spawns a Membrane Pipeline under the hood, so if a Membrane Pipeline works well in your case, Boombox should too, and it has a simpler API.
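For reference, a minimal Boombox invocation could look roughly like this (the exact option shapes are my assumption from memory; check the Boombox README for the current API):

```elixir
# Hedged sketch: Boombox takes simple input/output descriptors and spawns the
# Membrane pipeline internally. The :webrtc tuples take a signaling websocket
# address, analogous to the ports in the pipeline example above.
Boombox.run(
  input: {:webrtc, "ws://localhost:8829"},
  output: {:webrtc, "ws://localhost:8831"}
)
```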

I think that using the LiveView components I mentioned would also solve the authentication problem, since the authentication provided by the Phoenix application secures the WebSocket used by LiveView.
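On the token question from the original post: with a plain Phoenix socket, the usual pattern is to sign a short-lived token server-side, have the client pass it in the socket params, and verify it in `connect/3`. A sketch (the module names are placeholders):

```elixir
# Issuing the token, e.g. in a controller or on page render:
token = Phoenix.Token.sign(MyAppWeb.Endpoint, "user socket", user_id)

defmodule MyAppWeb.UserSocket do
  use Phoenix.Socket

  @impl true
  def connect(%{"token" => token}, socket, _connect_info) do
    # Reject tokens older than one day.
    case Phoenix.Token.verify(MyAppWeb.Endpoint, "user socket", token, max_age: 86_400) do
      {:ok, user_id} -> {:ok, assign(socket, :user_id, user_id)}
      {:error, _reason} -> :error
    end
  end

  def connect(_params, _socket, _connect_info), do: :error

  @impl true
  def id(socket), do: "user_socket:#{socket.assigns.user_id}"
end
```

With LiveView, this largely comes for free: the LiveView socket already carries the session, so whatever authenticates the HTTP request (e.g. a session-based auth plug) also gates the WebSocket.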


Hmm, I got an error when I tried to run the Boombox example. I'll report the details when I'm in front of that machine again. But thanks!

I am also building a similar project and had the same question as you. I managed to get it working with the Membrane WebRTC plugin and Membrane.WebRTC.SignalingChannel. I use the SignalingChannel to pass the messages to the LiveView, which then sends them to the JS Hook as events. Here is a simple repo showing a LiveView version of Echo, which just receives the audio and video from the client and sends it back unchanged, using a LiveView and Membrane WebRTC. You can add any Membrane elements in the middle of the pipeline to communicate with OpenAI, etc.
The repo is still messy, but it should give you an idea. This can obviously be extended to multiple listeners with PubSub, like in the LiveBroadcaster example, but I haven't done that yet. It would just be a matter of publishing the SignalingChannel messages to the right PubSub channel, with your LiveViews listening on that channel.
Here is the repo:

If you have any questions, let me know and I'll try to explain, but I am still learning Membrane and ExWebRTC, so I will do my best. Here is a simple diagram of the architecture as well.
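The relay described above might look roughly like this in the LiveView. To be clear, the SignalingChannel message shape and function names below are my assumptions from memory, not verbatim from the repo, so treat this as a sketch of the idea only:

```elixir
defmodule MyAppWeb.EchoLive do
  use Phoenix.LiveView

  alias Membrane.WebRTC.SignalingChannel

  @impl true
  def mount(_params, _session, socket) do
    # Create a signaling channel and hand it to the pipeline; the pipeline's
    # WebRTC endpoint exchanges SDP/ICE messages through it.
    signaling = SignalingChannel.new()

    {:ok, _supervisor, _pipeline} =
      Membrane.Pipeline.start_link(EchoPipeline, signaling: signaling)

    {:ok, assign(socket, signaling: signaling)}
  end

  # Signaling messages arriving from the pipeline side are pushed to the
  # browser's JS hook as LiveView events.
  @impl true
  def handle_info({SignalingChannel, _pid, message, _metadata}, socket) do
    {:noreply, push_event(socket, "webrtc_signaling", message)}
  end

  # Messages coming back from the JS hook are fed into the signaling channel.
  @impl true
  def handle_event("webrtc_signaling", message, socket) do
    SignalingChannel.signal(socket.assigns.signaling, message)
    {:noreply, socket}
  end
end
```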


I don’t mean to pester, but this seems like a really useful project for enabling Membrane with LiveView. Are you still working on it?

Still learning RTC and Membrane, so forgive my ignorance. Would the LiveView app be responsible for “routing” the ExWebRTC messages to the “correct” Membrane pipeline?

For example, if each group of users has an exclusive OpenAI Realtime connection, would the LiveView server be responsible for creating and publishing to the respective Membrane pipeline instance for each group?

Yes, it would be the LiveView’s responsibility.
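Concretely, that routing could be a per-group lookup: a LiveView asks a registry for its group’s pipeline and starts one if none exists yet. All names below are hypothetical, and the sketch ignores start races and stale-entry cleanup that production code would need:

```elixir
defmodule MyApp.GroupPipelines do
  # Assumes a Registry named MyApp.PipelineRegistry is started in the
  # application's supervision tree, e.g.
  #   {Registry, keys: :unique, name: MyApp.PipelineRegistry}
  def get_or_start(group_id, opts) do
    case Registry.lookup(MyApp.PipelineRegistry, group_id) do
      # Another LiveView in this group already started the pipeline: reuse it.
      [{_owner, pipeline}] ->
        {:ok, pipeline}

      [] ->
        {:ok, _supervisor, pipeline} =
          Membrane.Pipeline.start_link(OpenAIPipeline, opts)

        # Register the pipeline pid under the group id so later LiveViews in
        # the same group find it. Note: the entry is owned by the calling
        # LiveView process here, so a real implementation would rather have
        # the pipeline (or a dedicated owner process) register itself.
        Registry.register(MyApp.PipelineRegistry, group_id, pipeline)
        {:ok, pipeline}
    end
  end
end
```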

It is available now there: GitHub - membraneframework/membrane_webrtc_live at implement-webrtc-components

Feel free to use this branch as a dependency in your project; the code there should already work fine. Currently I am waiting on a review from a person who is on holiday, so IDK when I will merge it.
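Pointing at the branch from `mix.exs` would look something like this (the dependency name is my guess from the repository name):

```elixir
# In mix.exs, under deps/0: pull the package straight from the branch on GitHub.
{:membrane_webrtc_live,
 github: "membraneframework/membrane_webrtc_live",
 branch: "implement-webrtc-components"}
```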

In the long run these LiveView components will probably be moved to the :membrane_webrtc_plugin package, so if you discover one day that this repository no longer exists, look for a new version of membrane_webrtc_plugin, or at least its repository in the membraneframework organisation on GitHub.

In case of questions feel free to ask :wink:

BTW, the LiveViews from the linked repo solve the problem of which WebRTC signaling messages should be sent to which Membrane elements.
