Where is the state of a WebSocket kept? What object represents it?

mikejm · October 13, 2024, 4:20am

If you configure a WebSocket as shown below, where is the state for that WebSocket held? What object represents the WebSocket? How can you invoke its functions for that connection?

The given simple demo Hex WebSocket code is:

Mix.install([:bandit, :websock_adapter])

defmodule EchoServer do
  def init(options) do
    {:ok, options}
  end

  def handle_in({"ping", [opcode: :text]}, state) do
    {:reply, :ok, {:text, "pong"}, state}
  end

  def terminate(:timeout, state) do
    {:ok, state}
  end
end

defmodule Router do
  use Plug.Router

  plug Plug.Logger
  plug :match
  plug :dispatch

  get "/" do
    send_resp(conn, 200, """
    Use the JavaScript console to interact using websockets

    sock  = new WebSocket("ws://localhost:4000/websocket")
    sock.addEventListener("message", console.log)
    sock.addEventListener("open", () => sock.send("ping"))
    """)
  end

  get "/websocket" do
    conn
    |> WebSockAdapter.upgrade(EchoServer, [], timeout: 60_000)
    |> halt()
  end

  match _ do
    send_resp(conn, 404, "not found")
  end
end

require Logger
webserver = {Bandit, plug: Router, scheme: :http, port: 4000}
{:ok, _} = Supervisor.start_link([webserver], strategy: :one_for_one)
Logger.info("Plug now running on localhost:4000")
Process.sleep(:infinity)

I assumed from that code that EchoServer is the object containing the state of the websocket connection, and thus one EchoServer is made per connected client. And you will just need a reference to that EchoServer to invoke its functions (somehow). But I was told this is not the case.

Somewhere the WebSocket must be managed and represented with state and as an object. So where is it if not there?

Or is it the conn which holds the connection state (and is upgraded to include EchoServer functions)? Is it conn’s PID we need then to keep? And if so, can we invoke its EchoServer functions after it is upgraded to this?

handle_in takes state as an argument, so is this not representing a GenServer-like unit with its own state?

I wrote a longer question along these lines here, but perhaps that was too verbose. The question really just boils down to which class contains the socket and how to invoke its functions externally on demand (e.g. sending the user a text message through the socket). Thanks for any help.

kokolegorille · October 13, 2024, 7:37am

You do not have objects in the BEAM. You should not try to think in OOP; it will slow You down.

EchoServer is just a module and has no reference, not really

Yes exactly, it is the state, represented as a struct. It has no internal methods, but Module contains functions to apply to…

You don’t do conn.do_something() but Module.do_something(conn)

This allows… live reloading of the system. Just think about reloading an object

The EchoServer is just a module, but it has the shape of a GenServer

You have data, You have modules that contain functions to apply on data, and You have processes

Processes are the closest to what You might think of an object. They are similar to actors, and exchange messages, but they are live, and they have a pid to uniquely identify them

They are what makes the BEAM so unique. Until You understand what a process is, You will have a lot of questions about why things are done like this

cmo · October 13, 2024, 7:59am

You can’t go round using offensive terms such as “object” and “class” on the forum for a functional language.

Think of processes rather than objects. Lots of processes are implemented as behaviours other than GenServer, but they often follow a similar style. Something calls start or start_link, the process's init is called and returns the state. Then, it is basically a recursive function waiting for messages that calls itself with possibly updated state.

That WebSock and WebSockAdapter stuff is upgrading a conn. I’m not sure if it is creating a new process or using the existing one.

If you want someone to know your PID, they either need to have started you, you send it to them, or you register it somewhere (e.g. Registry).
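
As a rough sketch of the "register it somewhere" option: the registry name MyApp.ConnRegistry, the "user:42" key and the message shapes below are all made up for illustration. A process registers itself under a key, and any other process that knows the key can look up the PID and send to it.

```elixir
# Hypothetical names: MyApp.ConnRegistry and the "user:42" key are illustrative only.
{:ok, _} = Registry.start_link(keys: :unique, name: MyApp.ConnRegistry)

parent = self()

# A process registers *itself* under a key, then waits for a message.
worker =
  spawn(fn ->
    {:ok, _owner} = Registry.register(MyApp.ConnRegistry, "user:42", nil)
    send(parent, :registered)

    receive do
      {:say, text} -> send(parent, {:worker_got, text})
    end
  end)

# Wait until the registration has happened before looking it up.
receive do
  :registered -> :ok
after
  1_000 -> raise "worker did not register in time"
end

# Anyone who knows the key can now find the PID and send it a message.
[{found_pid, _value}] = Registry.lookup(MyApp.ConnRegistry, "user:42")
send(found_pid, {:say, "hello"})

result =
  receive do
    {:worker_got, text} -> text
  after
    1_000 -> :timeout
  end
```

Here found_pid is the same PID that spawn returned, and the registration disappears automatically when the registered process exits.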

I’d read the book Elixir in Action if I were you. It explains all this and more extremely well.

benwilson512 · October 13, 2024, 1:30pm

Your question here and in your other websocket thread I think reflects a common misunderstanding of how state is managed in Elixir. A GenServer is not some special construct that enables state. Any process that loops on itself can use one of the arguments of the loop as the state holding value.

To illustrate this, take a very simple ping pong GenServer:

defmodule PingPong do
  use GenServer
  
  def init(_) do
    {:ok, %{count: 0}}
  end
  
  def handle_call(:ping, _, state) do
    state = %{state | count: state.count + 1}
    {:reply, "pong #{state.count}", state}
  end
end

iex(4)> {:ok, pid1} = GenServer.start_link(PingPong, [])
{:ok, #PID<0.127.0>}
iex(5)> pid1 |> GenServer.call(:ping)
"pong 1"
iex(6)> pid1 |> GenServer.call(:ping)
"pong 2"
iex(7)> pid1 |> GenServer.call(:ping)
"pong 3"

We have some basic state that we are holding and incrementing, which impacts our replies.

Let’s build basically the same thing from scratch:

defmodule PingPongBasic do
  def start_link() do
    spawn_link(fn ->
      loop(%{count: 0})
    end)
  end
  
  def call(target_pid, msg) do
    send(target_pid, {:call, msg, self()})
    
    receive do
      {:reply, reply} ->
        reply
    after
      5000 ->
        :timeout
    end
  end
  
  def loop(state) do
    receive do
      {:call, :ping, from_pid} ->
        send(from_pid, {:reply, "pong #{state.count}"})
        state = %{state | count: state.count + 1}
        loop(state)
    end
  end
end

iex(9)> pid2 = PingPongBasic.start_link
#PID<0.135.0>
iex(10)> PingPongBasic.call(pid2, :ping)
"pong 0"
iex(11)> PingPongBasic.call(pid2, :ping)
"pong 1"
iex(12)> PingPongBasic.call(pid2, :ping)
"pong 2"

The key part here is loop and noticing how the “state” is literally just a recursive loop where we pass in an argument over and over every iteration. Each iteration we can pass in some transformation of that value.

At its core a GenServer has literally this exact same loop. What makes a GenServer “special” is really just that it complies with a bunch of conventions about how you send messages to and get replies from these processes. Ultimately then ANY process that performs this infinite loop / receive / send pattern acts like a mutable state holder.

Now in your EchoServer of course you don’t see any looping, but you’re passing the EchoServer module TO WebSockAdapter and way deep down inside of all of that code, is one of those loops. Let me take my toy example from earlier and show you what I mean:

defmodule GenServerBasic do
  def start_link(handler, initial_state) do
    spawn_link(fn ->
      loop(handler, initial_state)
    end)
  end
  
  def call(target_pid, msg) do
    send(target_pid, {:call, msg, self()})
    
    receive do
      {:reply, reply} ->
        reply
    after
      5000 ->
        :timeout
    end
  end
  
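  def loop(handler, state) do
    receive do
      {:call, msg, from_pid} ->
        # Let the handler module compute the reply and the next state
        {:reply, reply, new_state} = handler.handle_call(msg, state)
        send(from_pid, {:reply, reply})
        # Recurse with the new state; this argument is where the "state" lives
        loop(handler, new_state)
    end
  end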
end


defmodule PingPongBasic2 do
  def start_link() do
    GenServerBasic.start_link(__MODULE__, %{count: 0})
  end
  
  def handle_call(:ping, state) do
    state = %{state | count: state.count + 1}
    {:reply, "pong #{state.count}", state}
  end
end

iex(17)> pid3 = GenServerBasic.start_link(PingPongBasic2, %{count: 0})
#PID<0.146.0>
iex(18)> GenServerBasic.call(pid3, :ping)
"pong 1"
iex(19)> GenServerBasic.call(pid3, :ping)
"pong 2"
iex(20)> GenServerBasic.call(pid3, :ping)
"pong 3"

What I’ve done here is basically extract all of this looping boilerplate into its own module, and then you have a very simple PingPongBasic2 which captures the actual program logic that matters. It’s very important to understand that when I’m passing in PingPongBasic2 this isn’t some singleton state value but more like a function pointer. You can call start_link as many times as you like and they’re all different processes doing their own individual loops.

Throw some |> dbg calls in there and check it out!

To answer where the state is held: it is held in the loop of that websocket’s process. You don’t see that loop because the websocket code has extracted that out for you, and just gives you the current value each iteration when it gets a message.
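
A minimal sketch of how such a hidden loop can also hand ordinary messages to the handler. LoopWithInfo, Counter and the message shapes are invented for this illustration and are not the actual WebSock or Bandit internals; the idea is only that anything arriving in the process mailbox that is not a call gets passed to a handle_info-style callback together with the current state.

```elixir
defmodule LoopWithInfo do
  def start_link(handler, initial_state) do
    spawn_link(fn -> loop(handler, initial_state) end)
  end

  defp loop(handler, state) do
    receive do
      {:call, msg, from_pid} ->
        {:reply, reply, new_state} = handler.handle_call(msg, state)
        send(from_pid, {:reply, reply})
        loop(handler, new_state)

      other ->
        # Any other message sent to this PID is handed to the handler as well.
        {:ok, new_state} = handler.handle_info(other, state)
        loop(handler, new_state)
    end
  end
end

defmodule Counter do
  def handle_call(:get, state), do: {:reply, state.count, state}
  def handle_info({:add, n}, state), do: {:ok, %{state | count: state.count + n}}
end

pid = LoopWithInfo.start_link(Counter, %{count: 0})

# A plain send/2 from "somewhere else" ends up in Counter.handle_info/2.
send(pid, {:add, 5})
send(pid, {:call, :get, self()})

count =
  receive do
    {:reply, c} -> c
  after
    1_000 -> :timeout
  end
```

Both messages come from the same sender, so they are handled in order and the :get call sees the count after the :add.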

PS: You can check out a GenServer’s loop here: otp/lib/stdlib/src/gen_server.erl at 6d4731b9b5c70686bfaea67f2cd6f4f912e6da06 (erlang/otp on GitHub)

sodapopcan · October 13, 2024, 2:00pm

It’s already been said in the above very thorough answers, but just wanted to add how I phrase it to people which is that modules and processes really have nothing to do with each other. A process runs code that happens to be stored in a module or any number of modules. A process is its own concept that facilitates running code but doesn’t own any of the code itself. We often set up a “link” between a module and a process, but that is simply a “user space” pattern and not something dictated by the VM.
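
To make that concrete, here is a small sketch (the Whoami module is made up for the example). The same module function is run by two different processes, and self() reports whichever process is running it at that moment.

```elixir
defmodule Whoami do
  # Returns the PID of whichever process happens to be running this function.
  def pid, do: self()
end

caller = self()

# Run the very same module function inside a freshly spawned process.
spawn(fn -> send(caller, {:spawned_pid, Whoami.pid()}) end)

spawned_pid =
  receive do
    {:spawned_pid, pid} -> pid
  after
    1_000 -> raise "no reply from spawned process"
  end

# Called directly, the function reports the current process instead.
own_pid = Whoami.pid()
```

The same holds for callbacks such as init or handle_info in a handler module: self() there is the process that is running the callback, not the module.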

mikejm · October 14, 2024, 2:00am

For background, I’ve read Elixir in Action (have it next to me), Designing Elixir Systems with OTP, Concurrent Data Processing in Elixir, and I have (though didn’t read) Programming Elixir 1.6 (but nothing in index regarding anything like this for that one). I have also read loads of online tutorials and Hex documents, and the ElixirSchool website.

I can see no explanation in any of that for how to do what I’m describing, which is just to manage a websocket connection for a user in a meaningful way.

By contrast, I’ve never read a single Javascript or C++ or C# book in my life. Yet I can easily code in all of them. I have never programmed Rust. But with a few hours last night I set up some Rust code that works from Elixir using Rustler. All easy.

I believe the main reason Elixir has such low uptake in the general coding community is precisely this issue. To counter that, this forum is honestly fantastic and everyone here is super helpful and positive, which helps.

Progress So Far
A user makes a websocket connection. But there is no object to represent the websocket connection as per @cmo and @benwilson512 - just a recursive function looping over and over to persist it and act on received messages from it. That was a good explanation.

The custom functions for this recursion are defined in EchoServer. The rest are hidden out of reach. State can thus exist for this loop because it keeps getting passed back into the loop as an argument. Same as a GenServer. Okay sure.

The Question
So to the question: if we want to add a message into this websocket EchoServer loop (e.g. a command to “send the user a message that says ‘Hello’”) from somewhere else, how do we do this? Only that function loop can send the user a websocket message. So how do we tell the system from somewhere else to make it do so?

Global MailBox?
The only idea I can think of is this: if we can set the user name for that connection into the websocket loop’s state on creation, then every user’s websocket loop, as specified by EchoServer, can continuously check a data repository somewhere for commands to perform for that user.

I.e. make an :ets table like “messages per user” which serves as a node mail repository. Then every user’s websocket loop (somewhere in EchoServer) constantly checks that :ets table (or adds to it) for any messages it must send to or receive for any given user.

Is this what we are supposed to do? If so, why doesn’t anything explain that (or does something explain that somewhere I missed)? Which function in the EchoServer loop is the recursion where we’d want to do this “check for mail” function?

The Code
We create the websocket with:

  get "/websocket" do
    conn
    |> WebSockAdapter.upgrade(EchoServer, [], timeout: 60_000) 
    #SET USER NAME INTO SECOND ARGUMENT WHICH IS INIT STATE PRESUMABLY
    |> halt()
  end

Then for EchoServer we have:

defmodule EchoServer do

  #SET USER NAME INTO STATE 'OPTIONS' HERE
  def init(options) do
    {:ok, options}
  end

  #CHANGE THIS TO A GLOBAL HANDLE_IN FUNCTION?
  #IS "HANDLE_IN" THE LOOP THAT IS RUNNING CONSTANTLY HERE ON THE SOCKET?
  #CHECK THE 'GENERAL MAILBOX' ETS TABLE HERE TO RESPOND TO OR SEND MSG THEN AS PART OF THIS LOOP?
  def handle_in({"ping", [opcode: :text]}, state) do
    {:reply, :ok, {:text, "pong"}, state}
  end

  def terminate(:timeout, state) do
    {:ok, state}
  end
end

Are my annotations in that code correct? Is that the general idea? Any further help is greatly appreciated.

mikejm · October 14, 2024, 3:11am

I found a good page with some further examples that seem to show this is roughly the right track.

https://kobrakai.de/kolumne/bare-websockets

This would be a replacement for EchoServer in the prior example:

defmodule MyAppWeb.ConnectionTimer do
  use MyAppWeb, :verified_routes
  @behaviour WebSock

  @impl true
  def init(%{path_params: %{"name" => name}}) do
    path = ~p"/ws/connection_timer/#{name}"
    schedule_alert()
    {:ok, %{start: now(), path: path}}
  end

  @impl true
  def handle_in({"request_timer", opcode: :text}, state) do
    {:push, {:text, "Connected to #{state.path} for #{diff(state.start)}s."}, state}
  end

  def handle_in(_, state) do
    {:ok, state}
  end

  @impl true
  def handle_info(:alert, state) do
    schedule_alert()
    {:push, {:text, "Alert for #{state.path} after #{diff(state.start)}s."}, state}
  end

  def handle_info(_, state) do
    {:ok, state}
  end

  defp now, do: System.monotonic_time()
  defp schedule_alert, do: Process.send_after(self(), :alert, :timer.seconds(15))
  defp diff(start), do: System.convert_time_unit(now() - start, :native, :second)
end

The differences are:

he handles the upgrade with a module to do the whole upgrade called MyAppWeb.WebsocketUpgrade (vs. in the EchoServer example, the upgrade is done briefly in the router)
MyAppWeb.ConnectionTimer replaces EchoServer, with added initial state and a timer function to occur regularly.

There are I believe four functions (maybe more? I dunno) in WebSock

init - take in initial state on upgrade
handle_in - responds to client side messages sent to server
handle_info - responds to processes like the defp schedule_alert, do: Process.send_after(self(), :alert, :timer.seconds(15)) he uses there to send recurrent messages back through
terminate - called on loss of connection for whatever reason

So basically I believe my description is roughly correct. The checks on the hypothetical “messages” table I described for inter-socket communication (or socket-to-other-system communication) could occur on the timer he creates there. He even describes Phoenix Channels as working roughly this way. A similar explanation of Phoenix is here.

I do think it is bizarre that none of the four Elixir books I own, nor any of the official documentation, provides an explanation for this. I think the formal Elixir community is shooting themselves in the foot by not giving more examples like this to make it clear how simple things should work.

Why is there only one example on the entire internet, from a German programmer’s blog in April 2024, of how to configure a basic websocket like this?

It would take no time to understand if adequate info was out there. The problem is the info is (besides this one page I just found now) almost nowhere to be found.

cmo · October 14, 2024, 3:21am

Think of handle_in as something like an event handler. Define a handle_in for each type of message you’re going to send to the process.

To send messages to that process, you can register its PID under the user’s id with something like Registry or create a PubSub topic and subscribe to it, e.g. subscribe("users:#{user_id}"), when the websocket starts (usually in init).
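
A rough sketch of the Registry variant. The names UserSocket, MyApp.SocketRegistry, the {:user, id} key, the {:send_text, text} message and the %{user_id: ...} initial state are all assumptions made for this example. The callback return shapes follow the WebSock behaviour ({:ok, state} and {:push, {:text, ...}, state}); `@behaviour WebSock` is left out only so the snippet compiles without the websock dependency.

```elixir
defmodule UserSocket do
  # In a real handler you would add `@behaviour WebSock` here.

  # Runs inside the connection's process, so self() is the PID that gets registered.
  def init(%{user_id: user_id} = state) do
    {:ok, _} = Registry.register(MyApp.SocketRegistry, {:user, user_id}, nil)
    {:ok, state}
  end

  # Frames coming from the browser.
  def handle_in({"ping", [opcode: :text]}, state) do
    {:push, {:text, "pong"}, state}
  end

  # Plain messages sent to this process from anywhere else in the system.
  def handle_info({:send_text, text}, state) do
    {:push, {:text, text}, state}
  end

  def handle_info(_other, state), do: {:ok, state}

  def terminate(_reason, _state), do: :ok

  # Helper for other processes: look the user up and message their socket process.
  def send_text(user_id, text) do
    case Registry.lookup(MyApp.SocketRegistry, {:user, user_id}) do
      [{pid, _value}] ->
        send(pid, {:send_text, text})
        :ok

      [] ->
        {:error, :not_connected}
    end
  end
end
```

With a child such as {Registry, keys: :unique, name: MyApp.SocketRegistry} in the supervision tree and the route calling WebSockAdapter.upgrade(conn, UserSocket, %{user_id: id}, timeout: 60_000), any process could then call UserSocket.send_text(id, "Hello") to have a text frame pushed to that user.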

IIRC, all this process state, messaging and registration is covered pretty well in Elixir in Action. You build a lot of the common parts of OTP/Elixir to learn how they work. The BEAM is a different way of doing things to C# etc, so you need to find the thing that’s going to make it click for you. I’d encourage you to play around with observer so you can see how things are configured in a running system.

mikejm · October 14, 2024, 3:53am

Yeah, thanks @cmo. I went for a walk after posting that and realized while walking that the solution, as you said, is not to poll an ets message dump table on a timer. Instead, you can store the pid of the EchoServer-type module in init, using self(), in a username & pid registry like Syn, and then send messages as needed by looking other users up there in handle_in or handle_info, just like the timer function does here.

I think you are correct in that most users I have encountered on this forum seem to believe the Elixir documentation is adequate. I think that is part of the problem. I disagree completely as I have said numerous times on here. Of all the languages I have learned it is by far the worst documented.

Documentation is not simply a matter of “this function takes this argument.”

Just as important you need to ask: What are the 20-100 most common tasks a person would want to accomplish in a global sense with this language? And do we have clear examples of how to do that in this language?

That German blog post explains it in under 100 lines of code. But prior to April 2024 so far as I can tell there are zero examples anywhere on the Internet demonstrating this function. Even the top Elixir books provide no examples of how to track and use websockets like this. And it is one of the most critical things anyone would want to learn in Elixir.

cmo · October 14, 2024, 4:35am

There are a lot of examples in Elixir documentation, much more than a lot of languages (Go, Rust, Zig, F#, Odin…). C# docs are great and MDN is excellent for web stuff. Do the ASP.NET docs show you how to set up a websocket server? The .NET docs failed at teaching me how to run a multi-threaded application. Seemed like a simple request to me.

With Elixir docs, sometimes it is a matter of looking in the right place, which can be frustrating, and some libraries do lack good or enough examples. But you always have the source code linked from the docs, which is invaluable, and then you can read the tests to see how things work.

And, if you think the docs could be better, you should open a pull request to improve them. It can be hard for the author of the library to know what it is like for someone new to the topic because they know it so well.

benwilson512 · October 14, 2024, 2:40pm

I don’t want to downplay your struggle here but I do find it pretty confusing if I’m honest. I wonder if in part it’s because you’re picking this sort of “middle tier” between a complete from scratch solution and a completely off the shelf solution.

The stuff you are asking to do around tracking connections and sending messages is exactly what Phoenix channels and Phoenix pubsub accomplish, and there are endless tutorials in any format you could think of for how to use those tools to do these things. In under an hour you’ll have the whole thing.

josevalim · October 15, 2024, 6:29am

Some examples are:

And I am linking only to books. You will find courses, talks, articles, etc on a huge amount of topics Elixir is used for, including WebSockets.

As @benwilson512 pointed out, the dissonance is that you are rolling your own stack, while the huge majority of community members would pick an off the shelf solution, such as Phoenix Channels, which tackles many concerns for you, such as fault tolerance, multiplexing, and distributed pubsub, and are better documented. It is equivalent to asking the C# community for docs on how to implement SignalR instead of on how to use SignalR. The latter will be plenty more available than the former.

Your first comment mentions objects, classes, etc, and none of those are actual concepts in Elixir. We also likely wouldn’t approach problems in terms of a global queue either. We completely understand the learning curve can be steep, especially if you are jumping under the layers, but the benefit is that you will come out with a whole different understanding on how to implement those systems. You have gotten plenty of replies and help. Be patient with others, and especially yourself.

mikejm · October 21, 2024, 4:52am

Thanks. I think you are correct. I also appreciated your explanation of how the conn/Websock holds state without being a “process” or object. Ie. it is just looping on itself.

However, I would like to not use Phoenix channels or PubSub as I would like to understand how to actually communicate between these different things manually. I wish to create my own system for doing so for a variety of reasons.

I have been playing with WebSock and other things. I believe I now understand how to communicate from a WebSock out to any given Elixir process in response to user input. E.g. one can create a GenServer per conn/Websock and store its PID into the state of that, then use Process.send to this PID as needed under handle_in. Presto. You have client → any service in Elixir messaging.

However, I still don’t understand how to do the opposite. I.e. I don’t know how any given Elixir process can utilize a given active WebSock/conn and send the client a message on the Elixir process’s own initiative.

The Websock/conn is not a process and has no PID so can’t accept messages. It is not an object so no reference to it exists. But there is some process (Supervisor) running it, right? So is that where we need to go? Create and employ a custom supervisor for the Websocket/conn with custom message handling perhaps?

It appears that is what is happening in the timer example here where he says:

  defp schedule_alert, do: Process.send_after(self(), :alert, :timer.seconds(15))

What does self() reference here? As far as I can tell it is the supervisor that is running the Websock/conn, since the websock/conn itself has no PID to interface with processes.

I made a new thread here to more specifically ask that as my understanding is now a bit better.

Any help on this (here or there) would be very, very, very appreciated.

I think it’s the only major thing I still don’t know how to do in Elixir in order to build out my whole system. Obviously there is some way to do this, as Phoenix does it. So what is the basic trick?

Also sorry for venting my frustration. I am getting there in any case.

BartOtten · October 21, 2024, 4:49pm

You are listing a few mutable, object-oriented languages as easy, which is true once you have learned one OO language: they share the same concepts, so you have no “wrong ideas” when writing code in them.

Elixir is your first encounter with the concepts it represents: immutability and functional programming. Your existing knowledge now collides with these other concepts. You think of a dead Object while you should think of a live Process. However, once you have learned them, other languages and frameworks using those concepts (such as Akka on the JVM) come easily.

Which concepts cause a collision is just a matter of what you learned first.