ydg33
June 15, 2024, 3:29am
Hello!
This is probably a newbie question, but I noticed that this code, which stacks 10,000 tensors, takes 56 seconds the first time I run it:
for(_ <- 1..10_000, do: Nx.broadcast(0, {1024}))
|> Nx.stack()
while stacking on the BinaryBackend and then transferring to EXLA only takes 0.4 seconds:
zero = Nx.tensor([0], backend: Nx.BinaryBackend)
for(_ <- 1..10_000, do: Nx.broadcast(zero, {1024}))
|> Nx.stack()
|> Nx.backend_transfer(EXLA.Backend)
The first version also takes an additional minute whenever the length of the list changes. Why is that? I noticed the current function was stuck on EXLA.NIF.mlir_compile/7. Is there a new version of stack being compiled for each input list size?
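One way to probe the question is to time both snippets at a few list lengths: if a repeated call with the same length is fast while a new length is slow again, that is consistent with one compilation per input size. The sketch below is hypothetical (StackBench and its functions are not part of Nx) and depends only on Nx; the target backend is an argument, so it can be exercised on Nx.BinaryBackend without a GPU.

```elixir
Mix.install([{:nx, "~> 0.7.2"}])

defmodule StackBench do
  # Hypothetical helper (not part of Nx): runs a zero-arity function and
  # returns {elapsed_milliseconds, result}.
  def timed(fun) do
    {microseconds, result} = :timer.tc(fun)
    {microseconds / 1000, result}
  end

  # Mirrors the first snippet: every row is created on `backend`
  # and the stack itself also runs on `backend`.
  def stack_on(backend, n, width \\ 1024) do
    zero = Nx.tensor(0, backend: backend)

    for(_ <- 1..n, do: Nx.broadcast(zero, {width}))
    |> Nx.stack()
  end

  # Mirrors the second snippet: build and stack on Nx.BinaryBackend,
  # then move the finished tensor to `backend` with a single transfer.
  def stack_then_transfer(backend, n, width \\ 1024) do
    zero = Nx.tensor(0, backend: Nx.BinaryBackend)

    for(_ <- 1..n, do: Nx.broadcast(zero, {width}))
    |> Nx.stack()
    |> Nx.backend_transfer(backend)
  end
end
```

With :exla added to Mix.install, calling StackBench.timed(fn -> StackBench.stack_on(EXLA.Backend, 10_000) end) twice, then once with 5_000, and comparing the first element of each result would show whether the cost follows the list length. Timings depend entirely on the machine.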
Confirmed on my laptop with an NVIDIA GeForce RTX 4070 Mobile GPU.
When running
for(_ <- 1..10_000, do: Nx.broadcast(0, {1024}))
|> Nx.stack(name: :articles)
the GPU memory stayed nearly full the whole time, and I killed the run after about 2 minutes.
Here’s what $ nvidia-smi showed:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8              3W /   35W |    7213MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      5627      C   ....17.0-otp-26/.mix/escripts/livebook        7204MiB |
+-----------------------------------------------------------------------------------------+
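The 7213MiB reading does not necessarily mean the stack itself needs that much memory: EXLA's CUDA client preallocates a large fraction of device memory by default, so nvidia-smi can show the GPU as nearly full regardless of the workload. The fragment below is a sketch of a Livebook setup cell; the client options (:platform, :preallocate) are assumed to match EXLA's client configuration and should be checked against the docs for the installed EXLA version.

```elixir
Mix.install(
  [{:nx, "~> 0.7.2"}, {:exla, "~> 0.7.2"}],
  config: [
    # Make EXLA the default backend, as in the first snippet of the thread.
    nx: [default_backend: EXLA.Backend],
    # Assumed EXLA client options: allocate GPU memory on demand instead of
    # reserving a large fraction of the device at startup. The host client
    # is listed explicitly so it stays available alongside CUDA.
    exla: [
      clients: [
        cuda: [platform: :cuda, preallocate: false],
        host: [platform: :host]
      ]
    ]
  ]
)
```

With preallocation off, the reported usage should track what the process has actually requested; this does not by itself change how long the stack takes to compile.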
Then I fully restarted Livebook and ran:
zero = Nx.tensor([0], backend: Nx.BinaryBackend)
for(_ <- 1..10_000, do: Nx.broadcast(zero, {1024}))
|> Nx.stack(name: :articles)
|> Nx.backend_transfer(EXLA.Backend)
It finished within 3 seconds.
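When EXLA.Backend is the global default, as it evidently was for the first snippet, the same workaround can be written so that even the bare Nx.broadcast(0, {1024}) calls run on the BinaryBackend: Nx.with_default_backend/2 switches the default only while the given function runs. A minimal sketch; BinaryStack, zeros_stacked/3 and zeros_direct/3 are hypothetical names, and the target backend is a parameter so the module can be tried without EXLA.

```elixir
Mix.install([{:nx, "~> 0.7.2"}])

defmodule BinaryStack do
  # Hypothetical helper (not part of Nx): builds and stacks the rows on
  # Nx.BinaryBackend, whatever the global default backend is, then moves
  # the finished tensor to `target` with a single transfer.
  def zeros_stacked(n, width, target) do
    Nx.with_default_backend(Nx.BinaryBackend, fn ->
      # Inside this function, Nx.broadcast(0, ...) creates its tensor on
      # Nx.BinaryBackend instead of the global default.
      for(_ <- 1..n, do: Nx.broadcast(0, {width}))
      |> Nx.stack()
    end)
    |> Nx.backend_transfer(target)
  end

  # When every row is identical, as in this repro, one broadcast builds
  # the same {n, width} tensor without creating or stacking a list.
  def zeros_direct(n, width, target) do
    Nx.broadcast(Nx.tensor(0, backend: target), {n, width})
  end
end
```

In the Livebook above, BinaryStack.zeros_stacked(10_000, 1024, EXLA.Backend) would mirror the 3-second run. zeros_direct/3 only applies because every row in this repro is the same zero vector; it sidesteps the stack rather than explaining why stacking on EXLA is slow.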
Elixir Version: 1.17.0 OTP 26
Livebook Version: 0.12.1
Nx Version: 0.7.2
EXLA Version: 0.7.2
ydg33
June 15, 2024, 8:49am
Wow! Thanks all for the verification & for the fix!