Bumblebee Llama 2 example runs out of memory on a Kaggle CPU/GPU instance

I created a Jupyter Notebook to run Livebook on Kaggle. It is a CPU+GPU instance with 29GB of CPU memory and 15.9GB of GPU memory; the GPU is a P100.

The Livebook app came up successfully. However, when running the Llama 2 example from the Bumblebee documentation, I got the following error:

** (RuntimeError) Out of memory while trying to allocate 90177536 bytes.
(exla 0.6.1) lib/exla/device_buffer.ex:55: EXLA.DeviceBuffer.unwrap!/1
(exla 0.6.1) lib/exla/device_buffer.ex:22: EXLA.DeviceBuffer.place_on_device/4
(exla 0.6.1) lib/exla/backend.ex:46: EXLA.Backend.from_binary/3
(bumblebee 0.4.2) lib/bumblebee/conversion/pytorch/loader.ex:79: Bumblebee.Conversion.PyTorch.Loader.object_resolver/1
(unpickler 0.1.0) lib/unpickler.ex:828: Unpickler.resolve_object/2
(unpickler 0.1.0) lib/unpickler.ex:818: anonymous fn/2 in Unpickler.finalize_stack_items/2
(elixir 1.15.7) lib/map.ex:957: Map.get_and_update/3
#cell:aq3ma36lrddcxej7ksiueluzuh4wwxhw:4: (file)

It happened while loading the model and creating the serving, before Livebook could move on to the inference step.

One thing I noticed is that Kaggle shows 29GB of CPU memory, while the Livebook runtime reports 32GB. Is this a problem?

I’m able to run the same example on a Windows laptop with WSL. The Ubuntu instance is assigned 38GB of memory, and everything works fine.

What are the CPU and GPU memory requirements to run the Llama-2-7b-chat-hf model in Livebook?

After trying some combinations, I managed to bring up the Llama-2-7b model on Kaggle’s P100 GPU.

To summarize, the following two changes made the difference:

  • Load the model onto the CPU instead of the GPU, which saves GPU memory:
{:ok, model_info} = Bumblebee.load_model(repo, backend: {EXLA.Backend, client: :host})
  • Change the sequence_length from 1028 to 768:
serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    compile: [batch_size: 1, sequence_length: 768],
    stream: true,
    defn_options: [compiler: EXLA, lazy_transfers: :always]
  )
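For context, here is how the two changes fit into the full setup, as a hedged sketch. It assumes the `meta-llama/Llama-2-7b-chat-hf` repo from the Bumblebee example and an `HF_TOKEN` environment variable for the gated weights; your repo and auth setup may differ.

```elixir
# Hypothetical repo spec; Llama 2 weights are gated, so an auth token is assumed.
repo = {:hf, "meta-llama/Llama-2-7b-chat-hf", auth_token: System.fetch_env!("HF_TOKEN")}

# Load the parameters onto the CPU (host) backend to spare the ~16GB of GPU memory.
{:ok, model_info} = Bumblebee.load_model(repo, backend: {EXLA.Backend, client: :host})
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    # A smaller sequence_length shrinks the compiled computation's buffers.
    compile: [batch_size: 1, sequence_length: 768],
    stream: true,
    # lazy_transfers: :always moves parameters to the GPU on demand
    # instead of uploading them all at once.
    defn_options: [compiler: EXLA, lazy_transfers: :always]
  )
```

The combination matters: the parameters live on the host, and `lazy_transfers: :always` lets EXLA copy them to the device only as the computation needs them.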

Changing max_new_tokens between 128 and 512 didn’t help with the OOM.

generation_config =
  Bumblebee.configure(generation_config,
    max_new_tokens: 256,
    strategy: %{type: :multinomial_sampling, top_p: 0.6}
  )
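For completeness, a minimal sketch of running the serving, assuming the `serving` built above and a placeholder prompt. With stream: true, Nx.Serving.run/2 returns a stream of text chunks rather than a single result.

```elixir
# Consume the generated text chunk by chunk as the model produces them.
serving
|> Nx.Serving.run("What is the capital of France?")
|> Enum.each(&IO.write/1)
```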