Best hosting provider for Axon training?

AndyL · October 13, 2022, 5:08am

My dev machines are old and can’t handle Axon training runs. I’d like to rent GPU-enabled servers by the hour from a provider like Linode, DigitalOcean, GCP, Azure, AWS, etc.

Which provider is best? Any tips on the optimal configuration of RAM, CPU, and GPU?

Thanks in advance!

te_chris · October 13, 2022, 8:31am

Worth checking out GCP Vertex AI; see "Create a model using custom training" in the Vertex AI documentation on Google Cloud.

They have a nice API where you can run one-off training jobs with GPUs. You just need to specify a Docker container.

seanmor5 · October 13, 2022, 7:22pm

It’s worth noting that you can prototype on GPUs with Colab for free:

lucaong · April 14, 2023, 10:50am

Reviving this thread to add that I recently tried https://brev.dev, and I must say that the first impression is great: you can spin up a development environment in minutes, add/remove a GPU as needed, and fully customize it to your needs. There are some rough edges, and you can feel it’s a startup, but the idea is pretty good.

It is a sweet spot somewhere between setting up your own cloud instance from scratch, and running your notebook on Colab: you fully control your machine, which is a Linux instance, but do not have to mess with the details of the AWS/Google Cloud configuration, and can start from a template that already includes most of what you need.

For example, starting from the base Python + Jupyter template, I installed Elixir and a couple of other tools, and I had an environment capable of running Nx, Axon and Livebook on the GPU in minutes.

Some things are great:

Choosing CPU/GPU and changing hardware as needed is really simple
It’s really just a Linux box, so setting it up to your needs is easy if you know your way around Linux
You are not limited in the tools to use: there are templates to set up Python for ML, but you can just install Elixir and whatever you want on top of it. At the same time, the templates save you from having to figure out yourself details like GPU drivers, etc.
Auto stop of idle instances can save you from wasting money if you forget to switch it off
Integration with VSCode is super smooth (I mostly use terminal Vim, and setting that up was as easy as copying my config files over, but simple VSCode integration is a great plus)

Some things are not as polished as I would like, for example:

Copying files between your local machine and the remote one could use better tooling: you can simply scp, but you have to figure out the host, where Brev put its SSH key (in ~/.brev/brev.pem), and which user it uses, all things that could be automated by a brev scp command.
Port forwarding (to reach Livebook or a dev server from your local machine) is available, but could be made more convenient with some manifest file, so you would not have to redo it every time you start your dev environment. More broadly, I’d prefer to have my configuration scripted, rather than relying on the web UI, which is sometimes lacking
Overall, you can feel it’s an early-stage startup, with all the drawbacks that implies. I am not sure I would trust them enough yet to commit to them for a use case that would lock me in
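
To make the scp and port-forwarding points above more concrete, here is a minimal Python sketch that builds the two commands by hand. Only the key location (~/.brev/brev.pem) comes from the post; the function names, the "HOST" and "USER" placeholders, the file names, and port 8080 are assumptions for illustration, so substitute whatever your own instance uses. It relies only on the standard scp and ssh flags (-i, -N, -L), not on any Brev tooling.

```python
import os
import shlex


def scp_command(host, user, local_path, remote_path, key_path="~/.brev/brev.pem"):
    """Build an scp argument list that copies local_path to the remote machine."""
    return [
        "scp",
        "-i", os.path.expanduser(key_path),  # identity file; default is the path from the post
        local_path,
        f"{user}@{host}:{remote_path}",
    ]


def port_forward_command(host, user, port, key_path="~/.brev/brev.pem"):
    """Build an ssh argument list that forwards localhost:port to the same remote port."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),
        "-N",                                # no remote command, only forwarding
        "-L", f"{port}:localhost:{port}",    # local port -> remote localhost:port
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # "HOST" and "USER" are placeholders, not real Brev values.
    print(shlex.join(scp_command("HOST", "USER", "data.csv", "data.csv")))
    print(shlex.join(port_forward_command("HOST", "USER", 8080)))
```

The printed commands can be pasted into a terminal, or the lists can be passed to subprocess.run. Keeping them in a small script is one way to avoid redoing the setup by hand each time the environment starts.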

In sum, I like the concept: spinning up cloud-based, on-demand development environments that you fully control, but starting from useful templates that solve the most common needs.