When I use the EXLA library (v0.4) in a regular mix setup, all is well. But as soon as I create a release, the application can’t start up and fails with this error message:
=SUPERVISOR REPORT==== 25-Nov-2022::21:35:09.197883 ===
supervisor: {local,kernel_sup}
errorContext: start_error
reason: {on_load_function_failed,'Elixir.EXLA.NIF',
{error,
{load_failed,
"Failed to load NIF library: 'dlopen(/path/to/project/_build/prod/rel/project/lib/exla-0.4.0/priv/libexla.so, 0x0002): Library not loaded: @loader_path/xla_extension/lib/libxla_extension.so\n Referenced from: <3967AEA8-ED44-3A08-83C4-AF702F989050> /path/to/project/_build/prod/rel/project/lib/exla-0.4.0/priv/libexla.so\n Reason: tried: '/path/to/project/_build/prod/rel/project/lib/exla-0.4.0/priv/xla_extension/lib/libxla_extension.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OS@loader_path/xla_extension/lib/libxla_extension.so' (no such file), '/path/to/project/_build/prod/rel/project/lib/exla-0.4.0/priv/xla_extension/lib/libxla_extension.so' (no such file), '/usr/local/lib/libxla_extension.so' (no such file), '/usr/lib/libxla_extension.so' (no such file, not in dyld cache)'"}}}
offender: [{pid,undefined},
{id,kernel_safe_sup},
{mfargs,{supervisor,start_link,
[{local,kernel_safe_sup},kernel,safe]}},
{restart_type,permanent},
{significant,false},
{shutdown,infinity},
{child_type,supervisor}]
... followed by several other reports, which are similar.
I don’t think the libxla_extension.so file is expected to be in any of the places where the runtime goes looking. When installing EXLA, this is the command that is run:
install_name_tool -change bazel-out/darwin_arm64-opt/bin/tensorflow/compiler/xla/extension/libxla_extension.so @loader_path/xla_extension/lib/libxla_extension.so \
  -change bazel-out/darwin-opt/bin/tensorflow/compiler/xla/extension/libxla_extension.so @loader_path/xla_extension/lib/libxla_extension.so cache/libexla.so
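To make the lookup concrete, here is a minimal sketch of how the rewritten reference resolves. It assumes that @loader_path expands to the directory containing the library that holds the reference (here libexla.so), which is consistent with the first path dyld reports trying in the error above. The function name resolve_loader_path is a hypothetical helper for illustration, not part of EXLA or the macOS toolchain.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Assumption: @loader_path expands to the directory of the library
# that contains the reference (libexla.so in this case).
resolve_loader_path() {
  lib=$1                      # path of the library holding the reference
  ref=$2                      # install name recorded in that library
  prefix='@loader_path/'
  dir=$(dirname "$lib")
  case "$ref" in
    "$prefix"*) printf '%s/%s\n' "$dir" "${ref#"$prefix"}" ;;  # relative to the loader
    *)          printf '%s\n' "$ref" ;;                        # left untouched
  esac
}

# Paths taken from the error message above.
resolve_loader_path \
  /path/to/project/_build/prod/rel/project/lib/exla-0.4.0/priv/libexla.so \
  @loader_path/xla_extension/lib/libxla_extension.so
```

With the paths from the error message, this prints the priv/xla_extension/lib/libxla_extension.so location inside the release, which is the file dyld reports as missing. On macOS, otool -L run against libexla.so lists the install names actually recorded in the library, so the two can be compared.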
I want to emphasise that running the application with mix does not cause any problems. Running the release on a Linux-based machine (I do a Docker build and deploy to Fly.io) causes no issues either. So I’m struggling to understand why the combination of mix release + macOS + EXLA is problematic.
It may be relevant that I’m running on a Mac with an Intel CPU.
Skimming the changes to the EXLA Makefile, I’m curious why this change was made, because it introduced an extra path element, which is exactly where the mismatch occurs in my situation. But that’s only a guess.
Thanks in advance for reading so far…