I built a Reddit monitoring tool with Elixir (and avoided a huge GPU bill)

Hey! Just launched https://buzzbear.ai after quite a bit of time working on it on and off. It monitors Reddit and uses semantic matching to find posts related to whatever you want to track… your product, competitors, topics you care about, whatever. And sends you emails when it finds relevant stuff.

I wanted to share how Elixir made this way easier than expected.

The Stack

  • Phoenix LiveView for everything UI
  • Oban for background jobs
  • Bumblebee for ML
  • Libcluster over Tailscale for connecting my local GPU to the cloud

The fun part: Running ML Models on my personal GPU instead of renting cloud GPUs

I needed to run text similarity models to match Reddit posts against user queries. Checked GPU instance pricing and… yeah, no. $700+/month for something that would sit idle most of the time.

Then I remembered: I already have a perfectly good GPU sitting in my closet.

Here’s the setup: the ML model runs on a local server with a GPU. My Phoenix app runs on Fly.io. They talk via distributed Erlang over a Tailscale VPN. From the code’s perspective, it’s just calling another node in the cluster. Libcluster handles discovery, Tailscale handles the secure connection, and Erlang handles the RPC.
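To make that concrete, here’s a minimal sketch of what such a setup could look like. The node name, Tailscale hostname, and module/function names are all illustrative, not BuzzBear’s actual config:

```elixir
# config/runtime.exs -- libcluster topology using the static Epmd
# strategy, pointing at a Tailscale MagicDNS hostname (placeholder).
config :libcluster,
  topologies: [
    gpu_cluster: [
      strategy: Cluster.Strategy.Epmd,
      config: [hosts: [:"ml@gpu-box.tailnet.ts.net"]]
    ]
  ]
```

From the Phoenix side, calling the GPU node is then an ordinary RPC:

```elixir
# Runs MyApp.Embeddings.embed(texts) on the GPU node
# (hypothetical module and function).
:erpc.call(:"ml@gpu-box.tailnet.ts.net", MyApp.Embeddings, :embed, [texts])
```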

The best part? When I eventually need to scale, I just spin up more GPU nodes and libcluster automatically discovers them. The calling code doesn’t change at all. Right now my janky home server handles everything (~1M comparisons/hour on a single RTX 3080). Later, I can move to proper cloud GPUs without rewriting anything.

What surprised me

Distributed Erlang was easier than expected. I thought clustering would be this complex thing. Nope. Install Tailscale, configure libcluster, done. It just works.

Bumblebee is legit. Running production ML workloads on consumer hardware works great. The EXLA backend with CUDA just works out of the box.

Oban is a cheat code. Retries when Reddit is flaky. Concurrent processing. Cron scheduling. I didn’t have to build any of this. Just define a worker and Oban does the rest.
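For anyone curious what “just define a worker” looks like, here’s a hedged sketch of a Reddit-polling Oban worker; the module, queue, and helper names are made up for illustration:

```elixir
defmodule MyApp.Workers.FetchSubreddit do
  # Retries with backoff come for free on any non-:ok return
  # or raised exception, up to max_attempts.
  use Oban.Worker, queue: :reddit, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"subreddit" => subreddit}}) do
    with {:ok, posts} <- MyApp.Reddit.fetch_new_posts(subreddit) do
      MyApp.Matching.enqueue_comparisons(posts)
    end
  end
end

# Enqueueing (e.g. from an Oban cron entry):
# %{subreddit: "elixir"} |> MyApp.Workers.FetchSubreddit.new() |> Oban.insert()
```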


Anyway, this was a fun build. The Elixir ecosystem made so much of this just work that I actually enjoyed building it instead of fighting infrastructure.

Happy to answer questions about any of this!

26 Likes

Welcome to the forum!

Congratulations on the launch, and thanks for sharing the story :slight_smile:

Here are a few questions:

  • You mentioned a few use cases both in your post and on the landing page. If you already have paying customers, is there any real use case you can share? I’m curious what people would use it for!
  • Are you using Fly’s Postgres?
  • How do you batch comparison jobs?
  • Are you using hybrid search or vector distance?
  • How are you approaching marketing & sales?
  • You mentioned Elixir made it easier than expected, and several positive surprises. What has been the greatest challenge with building/maintaining the system?

Wish you and BuzzBear a successful 2026! Cheers!

1 Like

You mentioned a few use cases both in your post and on the landing page. If you already have paying customers, is there any real use case you can share? I’m curious what people would use it for!

I don’t actually have paying customers yet; everyone is on the free tier, since I launched just a couple of days ago. So no complete case studies as of yet. But here’s an example use case off the top of my head:

  • A deployment tool founder monitoring for posts like “our CI pipeline is a nightmare.” When we find a matching post, we email him, and he can reply to the post with how he can help.

Are you using Fly’s Postgres?

Nope, using Crunchy Data. Crunchy’s pricing worked better for my needs, plus they integrate well with Tailscale, which fits nicely with the rest of the system.

How do you batch comparison jobs?

  • All comparison jobs are placed in an Oban queue
  • A highly concurrent Oban worker then reads from the queue and then makes RPC calls to the node running Bumblebee
  • Nx.Serving is configured to automatically batch incoming requests that arrive within a short time window
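The Nx.Serving batching mentioned above could be set up roughly like this. This is a sketch: the model choice, batch size, and names are assumptions, not BuzzBear’s actual values:

```elixir
repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    compile: [batch_size: 32, sequence_length: 128],
    defn_options: [compiler: EXLA]
  )

# In the GPU node's supervision tree; requests that arrive within
# batch_timeout (ms) of each other are batched into one GPU pass.
children = [
  {Nx.Serving, serving: serving, name: MyApp.EmbeddingServing, batch_timeout: 50}
]
```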

Are you using hybrid search or vector distance?

It’s mostly vector distance currently, but if a user supplies specific keywords, we use those to find matches regardless of the vector distance.
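A rough sketch of that decision logic, under the assumption that matching uses cosine similarity over embeddings; the threshold and the `cosine_similarity/2` helper are hypothetical:

```elixir
def match?(post, query, threshold \\ 0.75) do
  keyword_hit? =
    query.keywords != [] and
      Enum.any?(query.keywords, fn kw ->
        String.contains?(String.downcase(post.text), String.downcase(kw))
      end)

  # A keyword hit matches regardless of vector distance;
  # otherwise fall back to embedding similarity.
  keyword_hit? or cosine_similarity(post.embedding, query.embedding) >= threshold
end
```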

How are you approaching marketing & sales?

Honestly, still figuring this out… I’ve just posted on a couple of subreddits so far. Might run a couple of ad campaigns.

You mentioned Elixir made it easier than expected, and several positive surprises. What has been the greatest challenge with building/maintaining the system?

Oban made concurrency and scaling easy, but debugging jobs was initially hard when something went wrong. Adding extensive logging/metrics, plus the Oban Web dashboard recently becoming open source, made it much easier to investigate issues and find patterns in failed jobs.

And Thank you! Appreciate the questions and well wishes, wish you a great 2026 too :folded_hands:

3 Likes

When you do need to scale more, you might take a look at FLAME (I’m on mobile or I would send you a link, but if you search for it on Hex you’ll find it).

This works especially well with background jobs that call a GPU backed instance because then you can let it scale to zero when it’s not in use. Depending on your local electricity costs and your volume, this may actually be more cost-efficient right now, assuming you are running the GPU system at home all the time.
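For reference, the FLAME pattern being suggested looks roughly like this; the pool name, sizing, and the function being offloaded are illustrative:

```elixir
# In the supervision tree: a pool of on-demand machines that
# scales to zero when idle.
children = [
  {FLAME.Pool,
   name: MyApp.GpuRunner,
   min: 0,
   max: 2,
   max_concurrency: 10,
   idle_shutdown_after: :timer.minutes(5)}
]

# Later, e.g. inside an Oban job -- the closure runs on a
# freshly booted (or reused) remote machine:
embeddings = FLAME.call(MyApp.GpuRunner, fn -> MyApp.Embeddings.embed(texts) end)
```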

I think Oban also has a Cloud Protocol that allows you to trigger scaling if you don’t want to use FLAME, or if there are parts of the chain that need to scale separately.

Oban is a cheat code. Retries when Reddit is flaky. Concurrent processing. Cron scheduling. I didn’t have to build any of this. Just define a worker and Oban does the rest.

Are you paying for the Reddit API? What exactly is flaky?

Wow! This is great, was actually not aware of this. Will definitely play around with it.

Pretty cool article I found talking about how it can be used for inference: Next-Gen Machine Learning with FLAME and Nx: Beyond Serverless Solutions

I guess “flaky” is a mischaracterization… it’s more that we hit their API rate limits.

Hi @yosalama

Congrats on the launch. I loved hearing how happy you were with libcluster & Oban.

Couple of questions if you don’t mind:

  • Did you assess whether calling out to an LLM made more sense than running off your own GPU? I don’t know that I’d be comfortable serving paying customers from my own PC - especially since MY 3080 is busy running XCOM and Total War! I’ve been using Gemini Flash and it’s been pretty reliable.
  • Are you paying for access to the Reddit API, or just using individual account credentials?
  • Are you running multiple instances on Fly (can’t recall what their geo-clustering is called) or just one?

Anyway, good luck with the site and thanks for posting.

Just a quick FYI for anyone using Google’s LLMs - if everything stops working and you keep getting Rate Limit errors, check that they haven’t deprecated the model … They don’t seem to provide any warning for this and their choice of error message isn’t great.

I’m actually not using LLMs for the initial matching! I’m running embedding models via Bumblebee… which made self-hosting on a 3080 pretty feasible. Cost was definitely a factor, but honestly I also just wanted an excuse to dig into the Bumblebee/EXLA stack… it’s been fun to work with.

For Fly… just a single instance right now. Traffic is low enough that geo-distribution isn’t necessary yet.

2 Likes