Erlang/Elixir Interactive Shell jobs get stuck when printing to output

So I have this little problem with both Erlang and Elixir interactive shells. Which probably means I do not understand something basic and it is intended behavior. In short:

  1. I start outputting something to the console from default job
  2. I start another job (Ctrl + G, then s 'Elixir.IEx' and c 2
  3. I go back to initial job Ctrl + G then c 1

The job numer 1 is stuck. Does not output anything. I can interrupt it, however, and then it works again, until I switch context to another job.

I have tried this on Elixir 1.5 @ Erlang 19.3 and 20.0, and on both these Erlang versions with Erlang code (io:format instead of IO.puts). The result is always the same: when I switch back to a job that attempted to output something to the shell while it was in the background, this job is stuck.

Is it stuck, or does it just not print something anymore? Can you add a counter to verify?

It’s stuck. Does not print anymore.

How do you know it is stuck when it doesn’t print anymore? Perhaps its still writing, but its not printed on your screen but to /dev/null or something similar instead? Have you tried writing to a file instead to ensure it is really stuck? Have you checked using etop or observer if the process is still alive?

I know it’s stuck and not printing because the original example I had involved sending and receiving a message. I wanted to send a message to job 1, and print the message to the console. This works, when job 1 is currently focused, and job 2 (the one that does send) is in background.

If I send a message from job 2, to job 1, while job 1 is in background, it is stuck.

I originally thought the problem was related to receiving messages, but I managed to narrow it down to actual printing message, then I realized it is the same problem in both Erlang io:format and Elixir IO.puts/inspect

So as I understand you, everything is working fine and the processes do not have any trouble running in the backround until they try to write something to stdout? Or to any file?

This really seems to be a bug in erl then…

I wonder if it’s related to [erlang-questions] “New” vs. “old” console behavior: bug or feature?

The user.erl process forwards that data to the shell. The shell
attempts to evaluate it, and if there’s not enough data, it asks for
more. user.erl then blocks until it can get more data to respond to the
io request.

It sounds like the user process can get stuck in a selective receive - so if it’s responsible for anything else that would be blocked too.

REPL? A bit more (and less) than that

PS: Though it sounds like the behaviour of the old shell - not the current one with user_drv (though according to observer user is running).

Seems the first shell’s group leader is getting stuck here as soon as you try to connect back to it.

iex(1)> spid = self()
#PID<0.84.0>
iex(2)> {:group_leader, leader_1} = Process.info spid, :group_leader
{:group_leader, #PID<0.53.0>}
iex(3)> {:group_leader, leader} = Process.info leader_1, :group_leader
{:group_leader, #PID<0.35.0>}
iex(4)> {:group_leader, ^leader} = Process.info leader, :group_leader
{:group_leader, #PID<0.35.0>}
iex(5)> defmodule Foo do
...(5)>   def loop do
...(5)>     IO.puts "Looping..."             
...(5)>     :timer.sleep(1000)
...(5)>     loop()
...(5)>   end
...(5)> end
{:module, Foo,
 <<70, 79, 82, 49, 0, 0, 4, 8, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 108, 0,
   0, 0, 12, 10, 69, 108, 105, 120, 105, 114, 46, 70, 111, 111, 8, 95, 95, 105,
   110, 102, 111, 95, 95, 9, 102, 117, ...>>, {:loop, 0}}
iex(6)> Foo.loop()
Looping...
Looping...
Looping...
Looping...
Looping...

User switch command
 --> s 'Elixir.IEx'
 --> c 2
Interactive Elixir (1.5.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> leader_1 = pid("0.53.0")
#PID<0.53.0>
iex(2)> Process.info leader_1, :message_queue_len
{:message_queue_len, 0}
iex(3)> Process.info leader_1, :messages
{:messages, []}
iex(4)> Process.info leader_1, :current_location
{:current_location, {:group, :server_loop, 3, [file: 'group.erl', line: 113]}}
iex(5)> 
User switch command
 --> c 1

User switch command
 --> c 2
nil
iex(6)> Process.info leader_1, :message_queue_len
{:message_queue_len, 2}
iex(7)> Process.info leader_1, :messages         
{:messages, [{#PID<0.51.0>, {:data, '\r'}}, {#PID<0.51.0>, {:data, '\r'}}]}
iex(8)> Process.info leader_1, :current_location 
{:current_location, {:group, :server_loop, 3, [file: 'group.erl', line: 113]}}
iex(9)> Process.whereis :user_drv                
#PID<0.51.0>
iex(10)>

Variation on the theme:

iex(1)> leader_1 = Process.group_leader()
#PID<0.53.0>
iex(2)> {:group_leader, leader} = Process.info leader_1, :group_leader
{:group_leader, #PID<0.35.0>}
iex(3)> {:group_leader, ^leader} = Process.info leader_1, :group_leader
{:group_leader, #PID<0.35.0>}
iex(4)> defmodule Foo do
...(4)> 
...(4)>   def start(),
...(4)>     do: (Kernel.spawn_link &loop/0)
...(4)> 
...(4)>   def loop do
...(4)>     IO.puts "Looping..."             
...(4)>     :timer.sleep(1000)
...(4)>     loop()
...(4)>   end
...(4)> end
{:module, Foo,
 <<70, 79, 82, 49, 0, 0, 4, 228, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 141, 
   0, 0, 0, 15, 10, 69, 108, 105, 120, 105, 114, 46, 70, 111, 111, 8, 95, 95,
   105, 110, 102, 111, 95, 95, 9, 102, 117, ...>>, {:loop, 0}}
iex(5)> 
nil
iex(6)> Foo.start()
Looping...
#PID<0.127.0>
Looping...
Looping...
Looping...
Looping...
Looping...
Looping...
iex(7)> 
User switch command
 --> s 'Elixir.IEx'
 --> c 2
Interactive Elixir (1.5.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> leader_1 = pid("0.53.0")
#PID<0.53.0>
iex(2)> Process.info leader_1, :message_queue_len
{:message_queue_len, 0}
iex(3)> Process.info leader_1, :messages
{:messages, []}
iex(4)> Process.info leader_1, :current_location
{:current_location, {:group, :more_data, 5, [file: 'group.erl', line: 638]}}
iex(5)> 
User switch command
 --> c 1

nil
iex(8)> Process.info leader_1, :message_queue_len
{:message_queue_len, 0}
iex(9)> Process.info leader_1, :messages
{:messages, []}
iex(10)> Process.info leader_1, :current_location
{:current_location, {:group, :server_loop, 3, [file: 'group.erl', line: 113]}}
iex(11)> foo_pid = pid("0.127.0")
#PID<0.127.0>
iex(12)> Process.info foo_pid, :group_leader
{:group_leader, #PID<0.53.0>}
iex(13)> Process.info foo_pid, :current_location 
{:current_location, {:io, :execute_request, 2, [file: 'io.erl', line: 572]}}
iex(14)>                                   

Basically the spawned process is stuck here presumably waiting for a response from the group_leader which never comes. Looks like the response to the IO request got lost because the group_leader is waiting for more data.

2 Likes

I have had a few hangs of newly opened job too just now. It certainly looks like a bug in either Erlang or some lib it depends on.

I am on Arch Linux, and asdf-compiled Erlang and Elixir, but will try something more standard like Ubuntu and Erlang Solutions packages to see if the problem is there too.

I’m wondering if the shell switching code is manipulating the group_leaders themselves or the group_leader setting on the processes - that could potentially create a window during which there is the possibility of IO requests/replies being lost. I ran the experiment on OSX Sierra 10.12.6 (Homebrew).

And I’ve run this test on Ubuntu, Erlang 18 - 20, the issue is present there too.

Wanted to report a bug at Erlang bugtracker but it seems down :frowning: https://bugs.erlang.org/

1 Like

Just to follow-up, I indeed found a bug in Erlang/OTP. This has been already resolved by the awesome OTP team, and is working correctly on master.

The next release that will include the fix will be 20.1.

2 Likes

I be damned! This sounds very similar to my problem… now I gotta study this page here =)