Logger hanging at service startup on Windows 7

I’m having a problem where my application (running as a windows service using erl_srv.exe) is failing to start. In summary, I start the computer and boot windows 7. My app, running as a service set to auto start, fails to start automatically. I need to manually start it. Here is the top of the log file:

2018-09-04 11:04:34.166 I -- Child Logger.ErrorHandler of Supervisor Logger.Supervisor started
Pid: #PID<0.82.0>
Start Call: Logger.Watcher.start_link({:error_logger, Logger.ErrorHandler, {true, true, 500}})
2018-09-04 11:10:00.796 I -- Child Logger.ErrorHandler of Supervisor Logger.Supervisor started
Pid: #PID<0.82.0>
Start Call: Logger.Watcher.start_link({:error_logger, Logger.ErrorHandler, {true, true, 500}})
2018-09-04 11:10:00.952 I -- Application logger started at :"nymiagent@127.0.0.1"

The first couple of lines are the Logger starting up but never completing. This is the failed automatic start attempt. The last few lines are the manual start attempt, which succeeds. I expect to see “Application logger started…” but don’t on the first attempt, only on the second. We are using logger_file_backend to record the log file to local disk.

I’ve got 2 questions:

  1. Is there another log I’m not aware of which logs the erts boot and startup process?
  2. Has anyone seen this issue or similar?

It only happens on Windows 7. Windows 10 doesn’t seem to be affected.

Thanks.

I don’t have this problem. I’m running elixir 1.6.6, works fine on win7 and win10.

One thing I do that’s not included by default is to use sc to register my service, something like sc sidtype %service_name% unrestricted.

Maybe try to be explicit about your dependencies, I define them in mix.exs like this:

  def application do
    [mod: {YourServer, []},
     applications: [:logger, :logger_file_backend,
      :cowboy, :plug, :crypto],
     included_applications: [:sasl, :os_mon]
    ]
  end

So what is happening here is that ERTS is crashing in the Elixir.Logger module at boot. Here is the top of the erlang crash dump. I’m not very skilled at reading this but it looks like a timeout is occurring in in gen_event call function? That line appears truncated.

=erl_crash_dump:0.5
Tue Sep 18 14:53:00 2018
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,logger,{bad_return,{{'Elixir.Logger.App',start,[normal,[]]},{'EXIT',{timeout,{gen_event,call,['Elixir.Logger','Elixir.Logger.
System version: Erlang/OTP 20 [erts-9.3] [smp:4:4] [ds:4:4:10] [async-threads:10]
Compiled: Tue Mar 13 21:38:23 2018
Taints: erl_tracer,zlib
Atoms: 9654
Calling Thread: scheduler:3
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work: 
Current Port: 
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process: 
=scheduler:2
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work: 
Current Port: 
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process: 
=scheduler:3
Scheduler Sleep Info Flags: 
Scheduler Sleep Info Aux Work: DELAYED_AW_WAKEUP | THR_PRGR_LATER_OP
Current Port: 
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.0.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x024cb1c0 (init:sleep/1 + 40)
Current Process CP: 0x00000000 (invalid)
Current Process Limited Stack Trace:
0x00e1f99c:SReturn addr 0x24C9444 (init:boot_loop/2 + 1708)
0x00e1f9a0:SReturn addr 0x750E1C54 (<terminate process normally>)
=scheduler:4
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work: 
Current Port: 
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:

Without an -onfail option in the erl_srv add command line the Event Viewer shows an ErlSrv Warning message saying that the error was ignored. When I add -onfail restart I get an error saying it crashes repeatedly and was terminated. Oddly, the logfile only shows on failed attempt (as in the original post) not multiple attempts as this event log entry would suggest. It still starts manually after boot.

I’m starting to think it’s something to do with the config of this machine as I’m not seeing this problem on any other systems we test on.

The truncated line gives a hint, if you look at elixir’s logger source code, there is only a single place where it does :gen_event.call/3, it’s here https://github.com/elixir-lang/elixir/blob/master/lib/logger/lib/logger.ex#L642, this should return :ok but obviously it didn’t so we can conclude file logger backend failed to configure, and the most probable reason for that is your filesystem, and this is where it gets tricky… check permissions, check disk health, check everything :wink: