Scaling LiveView for low latency in multiple regions

That’s very cool. It seems Cloudflare uses that tech to offer a geo load balancing service of sorts, but it’s enterprise-only: https://developers.cloudflare.com/load-balancing/understand-basics/traffic-steering#geo-steering

For normal people they have a service that looks more like DNS round robin, but it might work very well for the two-server use case: https://developers.cloudflare.com/load-balancing/understand-basics/traffic-steering#dynamic-steering. I think I remember Route 53 having a similar service.

I haven’t used it yet, but we’re on GCP GKE, and our main ingress is a cloud load balancer which exposes a single IPv4 address and can, theoretically, do global load balancing. https://cloud.google.com/load-balancing/docs/load-balancing-overview

It can do global load balancing AND you can have instances across the globe on the same virtual network, so you can actually run a normal globally distributed BEAM setup. Then just do some smart ETS caching and you’re good.
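The “smart ETS caching” part can start very small. A minimal read-through cache sketch, with made-up module and table names and no expiry logic, just to show the shape:

```elixir
defmodule MyApp.Cache do
  use GenServer

  @table :my_app_cache

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(nil) do
    # :read_concurrency tunes the table for many parallel readers per node
    :ets.new(@table, [:named_table, :public, read_concurrency: true])
    {:ok, nil}
  end

  # Read-through: serve from local ETS, fall back to the (slow,
  # possibly cross-region) fetch_fun and memoize the result.
  def get(key, fetch_fun) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        value

      [] ->
        value = fetch_fun.()
        :ets.insert(@table, {key, value})
        value
    end
  end
end
```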

Not sure how the BEAM would handle the latencies in practice. I’ve read of issues there, but it would be simple enough to try. Just keep an eye on your egress bill :grimacing:

And then you don’t need a fancy database; just use Cloud SQL’s read replication.
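Ecto makes the app side of that straightforward: define a second, read-only repo pointed at the replica endpoint and route read-only queries to it. A rough sketch with placeholder names and hostnames (both repos still need to be started in your supervision tree):

```elixir
# config/runtime.exs (hostnames are placeholders)
config :my_app, MyApp.Repo,
  hostname: "primary.us-central1.example.internal"

config :my_app, MyApp.Repo.Replica,
  hostname: "replica.europe-west3.example.internal"

# lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres

  defmodule Replica do
    use Ecto.Repo,
      otp_app: :my_app,
      adapter: Ecto.Adapters.Postgres,
      read_only: true
  end
end

# Writes go to the primary, reads can hit the nearby replica:
#   MyApp.Repo.insert!(changeset)
#   MyApp.Repo.Replica.all(Event)
```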

Yeah, that’s what I’m thinking when it comes to it: an audit of our LV usage to cut out any unnecessary interaction, then a multi-region GKE cluster with read replicas in each region, and see how it works. We’re currently using discrete horizontal scaling - i.e. no distributed BEAM - without experiencing any real drawbacks, so it might just work. When I get some time (hah) I might spin up a multi-region cluster and run some latency tests etc.
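For a first approximation of those latency tests, pinging the remote nodes from an attached IEx shell is enough to see the inter-node round trips (assuming the nodes are already clustered; note the first ping to a node also pays the connection-setup cost):

```elixir
for node <- Node.list() do
  {micros, result} = :timer.tc(fn -> :net_adm.ping(node) end)
  IO.puts("#{node}: #{result} in #{div(micros, 1000)} ms")
end
```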

TBH this is why I went with GCP: as well as a better Kube product, their networking is really good.

I had been meaning to test this … and I am glad to confirm that it works!

The process which pulls these log messages on page load starts a task on each node to get that node’s recent log events cache, then combines the results and returns them.
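That fan-out is roughly the following shape. `LogCache.recent/0` is a hypothetical stand-in for whatever holds each node’s cache, and the error handling is deliberately blunt so one slow or unreachable node can’t break page load:

```elixir
defmodule Logs.GlobalRecent do
  @timeout 5_000

  # Ask every connected node (plus ourselves) for its cached recent
  # events, then merge and sort them.
  def fetch do
    [Node.self() | Node.list()]
    |> Enum.map(fn node ->
      Task.async(fn ->
        try do
          :erpc.call(node, LogCache, :recent, [], @timeout)
        catch
          # timeouts, down nodes, etc. just contribute nothing
          _kind, _reason -> []
        end
      end)
    end)
    |> Task.await_many(@timeout + 1_000)
    |> List.flatten()
    # assuming each event carries a DateTime :timestamp
    |> Enum.sort_by(& &1.timestamp, {:desc, DateTime})
  end
end
```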

The Germany node is not serving traffic yet. All nodes are talking to the Postgres instance hosted in the US. We do a bunch of queries on node boot and got some Ecto timeouts there, so I’m not sure we’ll be able to serve traffic until we get a replica over there too.
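For what it’s worth, the per-query timeout can be raised while there’s no local replica; these are standard Ecto/DBConnection options, with the module name and numbers purely illustrative:

```elixir
# config/runtime.exs
config :my_app, MyApp.Repo,
  timeout: 60_000,       # per-query timeout; the default is 15_000 ms
  queue_target: 5_000,   # DBConnection checkout queue tuning
  queue_interval: 10_000
```

The same `:timeout` option can also be passed per query, e.g. `Repo.all(query, timeout: 60_000)`, which keeps a few slow boot-time queries from loosening the limit everywhere.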

But with the Google global load balancer, all EU traffic should go directly to this node and all US traffic to the US nodes. And when you load the page up, it collects data from all nodes globally and displays it.

Did you get the database replicating OK? This would make an interesting blog post. Are all writes being sent to the US, with just reads in the EU?

Not yet. I pulled that node down, as we’re replicating our rate-limiting data via PubSub to each node and the data transfer was a bit much.
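For context, the broadcast side of that kind of replication is only a few lines with Phoenix.PubSub, which is also why the traffic adds up: every node receives every other node’s updates. The PubSub name, topic, and payload shape below are assumptions, not the actual Logflare implementation:

```elixir
defmodule RateLimitSync do
  use GenServer

  @topic "rate_limits"

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Called on the node that served the requests; fans the counter
  # delta out to every node in the cluster.
  def broadcast_delta(source_id, count) do
    Phoenix.PubSub.broadcast(Logflare.PubSub, @topic, {:rate_limit, source_id, count})
  end

  def init(nil) do
    :ets.new(:rate_limits, [:named_table, :public])
    Phoenix.PubSub.subscribe(Logflare.PubSub, @topic)
    {:ok, nil}
  end

  def handle_info({:rate_limit, source_id, count}, state) do
    # update_counter/4 creates the row from the default on first sight
    :ets.update_counter(:rate_limits, source_id, count, {source_id, 0})
    {:noreply, state}
  end
end
```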

Hopefully the week after next we’ll roll out more precise caching, which will let us run just one DB node. Then each continent will be a separate cluster to avoid the PubSub transfer costs, with rate limits tracked per cluster. Global load balancing will still route each request to the closest Logflare cluster, which will be great for our EU users.
