A number of times I’ve run into some error or other while developing in Ash: the exception is printed along with a stack trace, but it’s all library and framework stack frames. The message will be something like “Input Invalid” with “widget_id: has already been taken”, but without a bunch of IO.inspect debugging it’s not clear what part of the application code is calling the library in a way that causes the error.
I know that Ash runs queries in separate processes, and that’s probably the stack trace I’m seeing; handing work off to other processes is a common pattern in Elixir apps. Is there a generally accepted approach to dealing with this, other than manually instrumenting code with debugging output?
Reading this back, I suspect that a custom Ash tracer that writes a stack trace to ETS before handing over to another process might be a step in the right direction for this kind of debugging. However, I don’t know how it would be stitched back together when an exception is raised.
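A rough sketch of the capture half of that idea (the module name and table name are made up, and wiring it into Ash’s tracer behaviour is left out since the exact callback names depend on the Ash version; only Process.info/2 and the :ets calls are standard):

defmodule MyApp.CallsiteStore do
  # Hypothetical helper: captures the current process's stacktrace into a
  # public ETS table, keyed by pid, before work is handed to another process.
  @table :callsite_traces

  def init do
    :ets.new(@table, [:named_table, :public, :set])
  end

  def remember_callsite do
    {:current_stacktrace, trace} = Process.info(self(), :current_stacktrace)
    :ets.insert(@table, {self(), trace})
  end

  # Look up the stored call site for a given pid, e.g. from an error handler.
  def callsite_for(pid), do: :ets.lookup(@table, pid)
end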
So there are a couple of things we can do to help with this. The first is providing more breadcrumbs in our error messages. Additionally, you can try extending the backtrace depth when your app starts, by calling :erlang.system_flag(:backtrace_depth, depth) in your application’s start/2 callback:
def start(_type, _args) do
  :erlang.system_flag(:backtrace_depth, 1000)
  ...
end
Thanks for that suggestion - I found that the default is 8, and anything higher than 64 is capped at 64.
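For reference, this is easy to check in iex (system_info/1 and system_flag/2 are standard Erlang; the capped result reflects what I observed):

:erlang.system_info(:backtrace_depth)        # => 8 (the default)
:erlang.system_flag(:backtrace_depth, 1000)  # returns the previous value
:erlang.system_info(:backtrace_depth)        # => 64 (values above 64 are capped)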
Interesting. I just picked a high number to be sure. Are you finding that the stacktrace is too deep to find your app code? I think we can actually do something about this: when you call an action, we could store the stacktrace at that point and then append it to the end of returned stack traces, to ensure that there is always a stack frame from your app.
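A minimal sketch of that idea, independent of Ash (the module and function names are made up; Process.info/2, Task.async/1, Task.await/1, and reraise/2 are all standard):

defmodule StitchedTrace do
  # Capture the caller's stacktrace, run work in another process, and
  # append the caller's frames to any exception raised over there.
  def run(fun) do
    {:current_stacktrace, caller_trace} = Process.info(self(), :current_stacktrace)

    result =
      Task.async(fn ->
        try do
          {:ok, fun.()}
        rescue
          e -> {:error, e, __STACKTRACE__}
        end
      end)
      |> Task.await()

    case result do
      {:ok, value} ->
        value

      {:error, e, worker_trace} ->
        # Worker frames first, then the caller's, so there is
        # always at least one frame from the app.
        reraise e, worker_trace ++ caller_trace
    end
  end
end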
I haven’t seen a stack frame from application code in these cases, no. I figured it was because of that handover to the process that actually performs the query, so the stack trace is limited to that process. In the world of microservices, distributed tracing is used to aggregate a picture of what is happening, so I guess something like that would be handy for OTP if it doesn’t exist already.
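On whether something like that exists already: Elixir’s Task does propagate a breadcrumb of caller pids via the :"$callers" process dictionary key, which at least identifies the originating process, though not its stack frames. A quick illustration (standard Task behavior in recent Elixir versions):

Task.async(fn ->
  # The chain of pids that led to this task, nearest caller first.
  Process.get(:"$callers")
end)
|> Task.await()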
Interesting… I think we’ll have to address this on a case-by-case basis. Next time you have a bad stacktrace, post it here or as an issue and I’ll figure out how to make it not bad.