Livebook cuda 12.2 XLA out of memory at 11006MiB

jonatanklosko · September 4, 2023, 10:43am

XLA reserves memory upfront and then allocates within that reservation as needed. This behaviour can be customized with the client options preallocate: false or a different :memory_fraction. However, I don’t think this will help with the OOM error.
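For reference, here is a rough sketch of how those client options could be set from a Livebook setup cell. The package versions and the choice of :cuda as the default client are assumptions for illustration, and 0.5 is only an example value for :memory_fraction.

```elixir
# Livebook setup cell. Package versions are assumptions, adjust to your setup.
Mix.install(
  [
    {:bumblebee, "~> 0.4"},
    {:exla, "~> 0.6"}
  ],
  config: [
    nx: [default_backend: EXLA.Backend],
    exla: [
      default_client: :cuda,
      clients: [
        # Allocate GPU memory on demand instead of reserving it upfront.
        cuda: [platform: :cuda, preallocate: false],
        # Alternatively, keep preallocation but reserve a smaller share
        # (0.5 is an example value):
        # cuda: [platform: :cuda, memory_fraction: 0.5],
        host: [platform: :host]
      ]
    ]
  ]
)
```

This only changes how the memory is reserved, not how much the model needs in total.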

We have yet to do more optimisations for Stable Diffusion, but there are two things you can try:

Load the parameters onto the CPU with Bumblebee.load_model(..., backend: {EXLA.Backend, client: :host})
Enable lazy transfers in serving defn options: defn_options: [compiler: EXLA, lazy_transfers: :always]

This way, instead of placing all parameters on the GPU, they will be transferred as needed.

Also make sure to try with batch size 1.
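Putting the suggestions together, a minimal sketch could look like the following. The repository names, step count, sequence length and prompt are assumptions for illustration, and older Bumblebee releases may additionally need a params_filename option when loading the unet and vae.

```elixir
# Repository names below are assumptions for illustration.
repository_id = "CompVis/stable-diffusion-v1-4"

# Keep the parameters in host (CPU) memory instead of placing them on the GPU.
cpu = {EXLA.Backend, client: :host}

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/clip-vit-large-patch14"})

{:ok, clip} =
  Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"}, backend: cpu)

{:ok, unet} =
  Bumblebee.load_model({:hf, repository_id, subdir: "unet"}, backend: cpu)

{:ok, vae} =
  Bumblebee.load_model({:hf, repository_id, subdir: "vae"},
    architecture: :decoder,
    backend: cpu
  )

{:ok, scheduler} = Bumblebee.load_scheduler({:hf, repository_id, subdir: "scheduler"})

serving =
  Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler,
    num_steps: 20,
    num_images_per_prompt: 1,
    # Compile for batch size 1 to keep memory usage as low as possible.
    compile: [batch_size: 1, sequence_length: 60],
    # Transfer parameters to the GPU as needed rather than all at once.
    defn_options: [compiler: EXLA, lazy_transfers: :always]
  )

Nx.Serving.run(serving, "numbat in forest, detailed, digital art")
```

Lazy transfers trade some speed for lower peak GPU memory, since parameters are copied over as they are needed.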