Torchx: OOM (out of memory) on Windows

mortenlund · October 20, 2025, 11:54am

Hi!

I am using Elixir 1.18.3 on OTP 27.
I am using Torchx 0.10, compiled with LIBTORCH_TARGET=cu128 and LIBTORCH_VERSION=2.8.0.

I am trying to hunt down a serious memory leak / no-free situation when using Torchx.
This is very easy to reproduce using the code below.

The issue is that the virtual memory of the process keeps growing without bound.

GPU memory, on the other hand, is released at intervals, as I would expect libtorch to do, which keeps GPU usage at an acceptable level.

I thought I was hunting down a memory leak in the Ortex library, but the problem turned out to be in Torchx, which I used to prepare some data that I then sent into Ortex.

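defmodule Mix.Tasks.Simple do
  # Declares the Mix.Task behaviour that @impl below refers to.
  use Mix.Task

  @impl Mix.Task
  def run(_args) do
    # Run the allocation loop 10,000 times.
    runner(10_000)
  end

  defp runner(0), do: :ok

  defp runner(ittr) do
    # Allocate the tensor in a separate, short-lived process.
    # The task returns the tensor, and the caller discards it.
    Task.async(fn ->
      Nx.broadcast(0, {1, 3, 640, 640})
      |> Nx.backend_transfer(Nx.default_backend())
    end)
    |> Task.await()

    Process.sleep(50)

    runner(ittr - 1)
  end
end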

I have also tried calling :erlang.garbage_collect to release the memory, but that does nothing.
As you can see, I also put the code into a Task, as I thought the memory might be “trapped” in my parent process.
This did nothing either.

Anyone have any ideas?

I would really like to run this in WSL and use EXLA instead, but due to peripherals I have to stay on Windows.

polvalente · October 20, 2025, 12:19pm

Please report this as an issue on the Nx repository.

Do you know if this bug happens on Linux or Mac too?

edit:

After re-reading your code, I have a follow-up question: Where did you call :erlang.garbage_collect? What happens if you call it right after Task.await?

mortenlund · October 20, 2025, 12:31pm

I will try to build for Linux and check.

Will create an issue.

Thanks for the reply!

mortenlund · October 20, 2025, 12:31pm

I tried both: inside the Task and also right after the Task.await.
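
For anyone reproducing this, the two placements could look roughly like the sketch below. It is not the exact code from the test project: the module name `GcPlacementSketch` is made up, and it assumes Nx is a dependency with a default backend such as Torchx configured.

```elixir
defmodule GcPlacementSketch do
  # Hypothetical module, for illustration only.
  def step do
    task =
      Task.async(fn ->
        _tensor =
          Nx.broadcast(0, {1, 3, 640, 640})
          |> Nx.backend_transfer(Nx.default_backend())

        # Placement 1: collect inside the task, before it exits.
        :erlang.garbage_collect()
        :ok
      end)

    result = Task.await(task)

    # Placement 2: collect in the calling process, right after Task.await/1.
    :erlang.garbage_collect()
    result
  end
end
```

Unlike the original reproduction, this version returns `:ok` from the task instead of the tensor, so the caller never receives a reference to it.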

polvalente · October 20, 2025, 12:42pm

Thanks. I’d have expected either to have worked.
Also try to use Nx.backend_deallocate inside the task, so we can see if there’s a chance that function itself is busted.

mortenlund · October 20, 2025, 12:48pm

I have also tried using Nx.backend_deallocate, with the same result.
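
A minimal sketch of that variant, assuming the same setup as the reproduction above (the module name `DeallocateSketch` is hypothetical):

```elixir
defmodule DeallocateSketch do
  # Hypothetical module, for illustration only.
  def step do
    Task.async(fn ->
      tensor =
        Nx.broadcast(0, {1, 3, 640, 640})
        |> Nx.backend_transfer(Nx.default_backend())

      # Explicitly release the backend buffer instead of waiting for GC.
      # Nx.backend_deallocate/1 returns :ok or :already_deallocated.
      Nx.backend_deallocate(tensor)
    end)
    |> Task.await()
  end
end
```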

Currently trying to build torchx for WSL to check for same problem there.

mortenlund · October 20, 2025, 12:58pm

So far in my tests, this does not behave the same way on Linux.
I will try to downgrade libtorch in the Windows test, as the latest version for Linux is 2.7.1.

Using htop, VIRT, RES and SHR all stayed stable during the test.
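
The same numbers can also be sampled from inside the BEAM on Linux, which makes it easier to log them per iteration. The sketch below is an assumption-laden helper, not something used in the test above: the module name `ProcStatusSketch` is made up, it is Linux-only (it reads `/proc/self/status`), and it only covers `VmSize` and `VmRSS`, which roughly correspond to htop's VIRT and RES.

```elixir
defmodule ProcStatusSketch do
  # Hypothetical helper, Linux only.
  @fields ~w(VmSize VmRSS)

  # Parses the text of /proc/<pid>/status into a map of field name => kB.
  def parse(status_text) do
    for line <- String.split(status_text, "\n"),
        [key, rest] <- [String.split(line, ":", parts: 2)],
        key in @fields,
        {kb, _unit} <- [Integer.parse(String.trim(rest))],
        into: %{} do
      {key, kb}
    end
  end

  # Samples the memory counters of the current OS process (the BEAM itself).
  def sample do
    "/proc/self/status" |> File.read!() |> parse()
  end
end
```

Calling `ProcStatusSketch.sample()` every few hundred iterations of `runner/1` and printing the result with `IO.inspect/1` would show whether `VmSize` keeps climbing.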

mortenlund · October 20, 2025, 1:16pm

I can confirm that this only happens on Windows, if my understanding of htop on Linux is correct.

I have now also tried libtorch 2.7.1 on Windows, and the result was the same as with 2.8.0.

polvalente · October 20, 2025, 1:24pm

It’s very likely this is a Windows-only bug. Please report this as an issue with a summary of these findings!

mortenlund · October 20, 2025, 1:24pm

Issue created here:

github.com/elixir-nx/nx

Torchx - Memory allocation / release problem (Only on Windows?)

opened 01:19PM - 20 Oct 25 UTC

m0rt3nlund

Hi!

As discussed in https://elixirforum.com/t/torchx-oom-on-windows/73005/6 thread there is a problem on Windows where the allocations for tensors never gets released.

My setup was on Elixir 1.18.3, OTP 27.
I have tried both TorchX with libtorch 2.7.1 and 2.8.0 with same results.

Code to reproduce:

```elixir
defmodule Mix.Tasks.Simple do
  @impl Mix.Task
  def run(_args) do
    10000
    |> runner()
  end

  defp runner(0), do: :ok

  defp runner(ittr) do
    Task.async(fn ->
      tensor =
        Nx.broadcast(0, {1, 3, 640, 640})
        |> Nx.backend_transfer(Nx.default_backend())
    end)
    |> Task.await()

    Process.sleep(50)

    runner(ittr - 1)
  end
end
```

Maybe someone else on Windows could verify my test?

polvalente · November 3, 2025, 4:15pm

Just to update the thread here, Torchx on main has been refactored to use elixir-nx/fine for the NIFs. This both fixes the bug and makes it easier to maintain the NIFs!
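
Until a release includes the refactor, one way to try it might be to point the dependencies at the repository directly. This is only a sketch: it assumes the usual sparse-checkout layout of the elixir-nx/nx repository (an `nx` and a `torchx` directory), so check the Torchx README for the currently recommended entry.

```elixir
# mix.exs (sketch): track Nx and Torchx from the repository's default branch.
defp deps do
  [
    # Assumption: Torchx on main needs the matching Nx from the same repository.
    {:nx, github: "elixir-nx/nx", sparse: "nx", override: true},
    {:torchx, github: "elixir-nx/nx", sparse: "torchx", override: true}
  ]
end
```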

mortenlund · November 3, 2025, 5:44pm

Great, @polvalente!