I am trying to create a media streaming server in Elixir, with an initial focus on RTMP publishing and playback. I chose Elixir/Erlang because it seemed like a perfect candidate, but I seem to be having trouble.
The test setup consists of three applications: an RTMP publisher (OBS Studio, a third-party tool), an RTMP viewer (VLC), and my Elixir server. Both the publisher and the viewer connect to my Elixir server over localhost. The publisher sends video and audio data to the server, and each packet gets relayed to the viewer, all over TCP. The publisher is currently set to send 2500kbps, and network traffic shows it pretty close to this.
When running the test I notice the video is stuttering a lot. VLC debug messages show it’s receiving frames inconsistently and trying to compensate for it.
After getting help from people in IRC and looking through observer, I think I have pretty much pinpointed the issue to the :gen_tcp.send() calls being slow. They are so slow, in fact, that I have observed a single send call taking 5-10 seconds to complete.
Since I know Erlang is heavily used in switches, I can’t believe that the performance I’m getting is normal. Lowering my video’s bitrate to 500kbps does give smoother playback, but I can still tell there is an issue.
For reference, the code I have so far is up at https://github.com/KallDrexx/mmids-temp. Note that this is a temporary repository. I plan to split each of the apps up into their own repositories, slap an MIT license on them, then upload them to Hex once I have this thing stabilized.
Based on diagnostics I coded, a 2500kbps video averages 200-250 messages per second going from the publisher to the viewing client.
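To put those numbers in perspective, here is a rough back-of-the-envelope calculation of the average message size. It assumes 1 kbps = 1000 bits per second and treats the whole bitrate as message payload, ignoring RTMP framing overhead. The module name is made up for illustration.

```elixir
defmodule MessageSize do
  # Average bytes per message for a given bitrate (kbps) and message rate.
  # Assumes 1 kbps = 1000 bits/s and 8 bits per byte.
  def avg_bytes(bitrate_kbps, msgs_per_second) do
    bytes_per_second = bitrate_kbps * 1000 / 8
    bytes_per_second / msgs_per_second
  end
end

for rate <- [200, 250] do
  IO.puts("#{rate} msg/s -> #{MessageSize.avg_bytes(2500, rate)} bytes per message")
end
```

So each send is only on the order of 1.2-1.6 KB, which is not a large payload for a single TCP write.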
What is the architecture?
The general architecture I have right now is that when any type of client connects, I use ranch to spawn a gen_server. This server receives TCP binary data (using raw flags), attempts to deserialize any RTMP messages contained in it, reacts to messages that can/should be reacted to, and sends any responses back to the client. This all occurs within a single gen_server and no other processes are involved.
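As a rough illustration of that shape (not the code in the repository), the sketch below shows a single GenServer that owns one socket, receives raw binary chunks as messages, and replies on the same socket. It leaves ranch out and uses plain :gen_tcp so it stays self-contained. ConnectionSketch and handle_bytes/1 are hypothetical names, and handle_bytes/1 is only a stand-in for the real RTMP deserialization and handling.

```elixir
defmodule ConnectionSketch do
  use GenServer

  def start_link(socket), do: GenServer.start_link(__MODULE__, socket)

  @impl true
  def init(socket), do: {:ok, %{socket: socket}}

  @impl true
  def handle_info({:tcp, socket, data}, state) do
    # Hypothetical stand-in for "deserialize RTMP, react, build responses".
    response = handle_bytes(data)
    :ok = :gen_tcp.send(socket, response)
    # Re-arm the socket so the next chunk is delivered as a message.
    :ok = :inet.setopts(socket, active: :once)
    {:noreply, state}
  end

  def handle_info({:tcp_closed, _socket}, state), do: {:stop, :normal, state}
  def handle_info({:tcp_error, _socket, reason}, state), do: {:stop, reason, state}

  # Placeholder logic: acknowledge how many bytes arrived.
  defp handle_bytes(data), do: ["ack:", Integer.to_string(byte_size(data))]
end
```

The process that accepted the socket would hand it over with :gen_tcp.controlling_process/2 and set active: :once. Note that in this layout the send happens inside the same process that reads, so a slow send also stalls reading and message handling.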
For demonstration purposes, when a viewing client requests playback I use pg2 to subscribe to a specific channel for audio and video data. Publishing clients that are publishing a/v data on that same stream key push that data to all subscribed clients. The viewing clients then receive the a/v data, wrap it in RTMP messages, serialize those into binary, and send them off across the network pipe.
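The subscribe/publish flow can be sketched roughly as follows. Since pg2 was removed in OTP 24, this sketch swaps in Elixir's Registry with duplicate keys as a stand-in. It is not what my code does, and the module name, registry name, and message shape are all invented for illustration.

```elixir
defmodule StreamPubSub do
  # Hypothetical registry name; any unique atom works.
  @registry StreamPubSubRegistry

  def start_link, do: Registry.start_link(keys: :duplicate, name: @registry)

  # Called from a viewing client's connection process.
  def subscribe(stream_key) do
    {:ok, _owner} = Registry.register(@registry, stream_key, nil)
    :ok
  end

  # Called from the publisher's connection process for every a/v packet.
  def publish(stream_key, type, payload) do
    Registry.dispatch(@registry, stream_key, fn entries ->
      for {pid, _value} <- entries, do: send(pid, {:av_data, type, payload})
    end)
  end
end
```

Each subscriber then receives {:av_data, type, payload} in its mailbox and is responsible for serializing and sending it, which is why a slow send shows up as a growing message queue on the viewer's process.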
What have I tried?
First I tried using :os.system_time(:milli_seconds) to determine how long any audio/video data packet took to get from deserialization on the publisher side to right before binary serialization for the viewing client. I noticed that it would start out extremely fast, then long pauses (5-10 seconds) would occur, then a batch of packets would get processed, then another pause, etc…
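A minimal sketch of that kind of measurement is below. It uses System.monotonic_time/1 rather than :os.system_time/1, since the monotonic clock is not affected by wall-clock adjustments, and it uses the :millisecond unit name because :milli_seconds is the older, deprecated spelling. LatencyProbe and its functions are hypothetical names.

```elixir
defmodule LatencyProbe do
  # Tag a packet with the time it was deserialized on the publisher side.
  def stamp(payload), do: {System.monotonic_time(:millisecond), payload}

  # Call right before serializing for the viewer to see how long it waited.
  def elapsed_ms({received_at, _payload}) do
    System.monotonic_time(:millisecond) - received_at
  end
end

stamped = LatencyProbe.stamp(<<0, 1, 2>>)
Process.sleep(20)
IO.puts("packet waited #{LatencyProbe.elapsed_ms(stamped)} ms")
```

The stamp is taken in one process and read in another, which is fine on a single node because monotonic time is shared across all processes in the same VM.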
Then I was reminded about observer, and I loaded it and saw the following graph: https://dl.dropboxusercontent.com/u/6753359/observer1.PNG. The I/O graph told me that while inbound traffic was smooth, outbound was being staggered.
I then opened the process for the server managing the viewing client. I noticed the message queue length was constantly increasing, never decreasing, and the process was constantly stuck in the
In doing some Googling I came across this thread talking about slow send() performance. While it didn’t have a definitive fix, it did mention batching up the binary for the send() call so I wasn’t calling it 200 times every second.
The first thing I tried was to use a timer. Instead of calling send() for every message, I put the binary in an iodata queue held in the gen_server’s state. I then added :timer.send_interval(100, :send_queue) to my initialization, thinking I could send data once every 100ms.
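The timer approach looks roughly like the sketch below. It is a simplified reconstruction rather than the exact code in the repository: BatchedSender and enqueue/2 are hypothetical names, and in my server the queueing happens inside the connection gen_server itself rather than in a separate process.

```elixir
defmodule BatchedSender do
  use GenServer

  @flush_interval_ms 100

  def start_link(socket), do: GenServer.start_link(__MODULE__, socket)

  # Queue some iodata to go out on the next flush.
  def enqueue(pid, iodata), do: GenServer.cast(pid, {:enqueue, iodata})

  @impl true
  def init(socket) do
    # Delivers :send_queue to this process every 100 ms.
    {:ok, _timer} = :timer.send_interval(@flush_interval_ms, :send_queue)
    {:ok, %{socket: socket, queue: []}}
  end

  @impl true
  def handle_cast({:enqueue, iodata}, state) do
    # Nesting lists keeps the append O(1); iodata can be arbitrarily nested.
    {:noreply, %{state | queue: [state.queue, iodata]}}
  end

  @impl true
  def handle_info(:send_queue, %{queue: []} = state), do: {:noreply, state}

  def handle_info(:send_queue, state) do
    # One send call for everything queued since the last tick.
    case :gen_tcp.send(state.socket, state.queue) do
      :ok -> {:noreply, %{state | queue: []}}
      {:error, reason} -> {:stop, reason, state}
    end
  end
end
```

Because the queue is iodata, nothing is copied or concatenated until the send itself, so batching this way is cheap on the Elixir side.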
This did not give any better results beyond managing the message queue better. What I noticed with observer and this timer was odd: as I kept pressing the refresh hotkey, I would see my queue keep growing for up to 5-10 seconds and then drop to zero again. This repeated over and over, and on every refresh the process was still stuck on prim_inet:send/3. It seems to me that send is just taking a ridiculous amount of time. Changing the timer interval up or down did not help noticeably.
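The same information can be sampled without the observer GUI using Process.info/2. The sketch below is only an illustration: QueueProbe is a hypothetical helper name, and the spawned process stands in for the viewer's connection process.

```elixir
defmodule QueueProbe do
  # Returns the mailbox length and the function the process is currently in,
  # or nil if the process is no longer alive.
  def snapshot(pid) do
    Process.info(pid, [:message_queue_len, :current_function])
  end
end

# Stand-in for a busy connection process that is not draining its mailbox.
pid = spawn(fn -> Process.sleep(5_000) end)
send(pid, :packet_1)
send(pid, :packet_2)
IO.inspect(QueueProbe.snapshot(pid))
```

Calling snapshot/1 in a loop with a short sleep gives a crude timeline of queue growth and of how often the process is caught inside the send call.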
The last thing I tried was to drop the interval and instead send after every X messages queued, allowing me to batch messages together but in smaller batches than the interval method produced. This didn’t help by a noticeable amount either, and was worse for managing the message queue.
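The count-based variant can be sketched as a small pure data structure that tells the caller when to flush. CountBatcher and its fields are hypothetical names, and the default batch size of 10 is an arbitrary placeholder for X.

```elixir
defmodule CountBatcher do
  defstruct queue: [], count: 0, batch_size: 10

  # Adds one message. Returns {:flush, iodata, batcher} once batch_size
  # messages have accumulated, otherwise {:buffer, batcher}.
  def push(%__MODULE__{} = batcher, iodata) do
    queue = [batcher.queue, iodata]
    count = batcher.count + 1

    if count >= batcher.batch_size do
      {:flush, queue, %{batcher | queue: [], count: 0}}
    else
      {:buffer, %{batcher | queue: queue, count: count}}
    end
  end
end
```

On {:flush, iodata, batcher} the caller would pass iodata to a single :gen_tcp.send/2 call. One weakness of this variant is that a partial batch sits unsent until enough further messages arrive.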
So what now?
I’m not quite sure how to proceed from here. I can’t believe that sending data via TCP is really that bad on a VM that I hear so much praise for regarding low latency and soft-realtime behavior.
At the end of the day, when the final system is built, I am hoping to get 50 inputs sending data to 150 outputs (based on the performance I’ve seen from other third-party products). So it’s a bit disconcerting that I can’t even get 1 in, 1 out working reliably.
Does anyone have any advice on where I go from here?