Fetch online users's friends for large quantity of users

jaybe78 · February 11, 2025, 10:55am

Hello,

I’m designing a bidirectional friendship system where I store the data in a single dynamodb “Users” table.
For the record I only store one side of the relationship and use a GSI to get the edges.

Each user could have between 1 and 1000 friends or more, and I could have 30_000 or more users joining the app at the same time.

I want to develop a feature where when any user who joins the app, he can quickly sees his friends that are online.

Obviously that involves querying friends for every single users joining and also probably caching them (:ets ?).

Each user joining is tracked using Phoenix.track() against his channel(“user:*”) so that I know whether someone is online or not. Their status is stored in an ETS table.

My initial idea is to fetch users’s friends and filter them with what is stored in the ETS table mentioned just before, to know whether one user is online or not.

Obviously the complexity here is that I have to fetch a lot of data at the same time, it represents a lot of queries, so I thought about using genstage or broadway…

May be I can also get away querying less…
For example if user 1, 2 and 3 are friends and the first query executed is for user 1,
I would already get his relationship with 2 and 3, that I could cache !?

May be store in an ETS table each relationship for each user !?
1 => 2
1 => 3
2 => 1
2 => 3
…

Then before querying friends, for users 2 and 3, I can already notify them that user 1 is online.

If anyone could provide some guidance to set up something scalable.

cheers

Cheers

jhogberg · February 11, 2025, 12:20pm

Have you considered using a relational database instead? They excel at solving this kind of problem.

krasenyp · February 11, 2025, 1:00pm

Either a relational database or a graph database will do the job. DynamoDB is definitely not suitable for modelling relationships.

jaybe78 · February 11, 2025, 6:30pm

Thanks but I’m not asking about designing the relation between users.
I’ve already done that and it works really well.

Now to be more specific, my needs are pretty simple, I just need to return a list of friends for a user.
With DynamoDB you can implement a sort of graph using proper primary and sort keys.
It is extremely fast !
I don’t need all the extra features I would get from a graph DB like fetching friends of friends…

This is not the issue I’m trying to solve here.

Whether it’s a graph db or dynamo db, you still have to query the database to fetch the list of friends and you still have to go over them to see which ones of them are online.

jaybe78 · February 11, 2025, 6:34pm

I might be missing something but to me this issue has nothing to do with the DB used.
DynamoDb is fine at mapping users’s friends with way better read performance than a SQL DB.

My question is about the strategy to fetch all those data for each user joining.

I could simply load a user’s friends on “demand”, basically when he joins.
Would that scale well when I have 50_000 people joining at the same time ?
May be, may be not but long term I think I need something more efficient to handle all that work.

Or I could take advantage of stuff like genstage/broadway to fetch/update those users’ friends with their current status ?

Lucassifoni · February 11, 2025, 10:57pm

I might mis-understand your post, but, are you planning for high scale and lots of users being generally there at the same time , or for some reason, your total user number is reasonable (let’s say 80_000) but your domain has a reason for (almost) all of them joining at the same time, making huge bursts of sudden joins ?

jaybe78 · February 12, 2025, 6:12am

I’d say it’s both, lots of users connected at the same time and “huge” bursts of sudden joins sometimes.
Though I’m more preoccupied by the burst of users joining, because my plan is to cache the user friends in ETS table so I would only fetch them once for each user.
Then as mentioned above I would have to filter them out based on whether they are currently available or not, but that step should be straightforward.

DynamoDB is capable of handling more than 10 trillion daily requests, with the ability to support peaks of over 20 million requests per second.
So my question really is to either fetch each user friends separately on demand, or centralize that in a genstage/broadway.

jhogberg · February 12, 2025, 9:14am

It has everything to do with the database used: the fastest operation is the one that never runs, and with a relational/graph database you can avoid the reads altogether. You ask the database for the friends which are online and get only those in the result, deferring the retrieval of the less-interesting offline friends for later, which may very well be cached end-user-side (e.g. localStorage) on a long-term basis.

The moment that you start doing relational operations over data that you have retrieved from a NoSQL database, you have invented your own half-finished, slow, and bug-ridden relational database.

You will need to adapt the application to make good use of the relational model, but once you have, a sustained “50000 users with 1000 friends each leaving and joining every second” – and likely many of your other problems too – become a non-issue.

Not all your data needs to be in a relational database either, but you may as well put it there: in my experience every clever NoSQL project ends up needing relations everywhere sooner or later anyway.

LostKobrakai · February 12, 2025, 9:19am

That expects that the db knows the online status. I guess this topic is about using Phoenix.Presence to track the online status while not having it stored in a database.

jhogberg · February 12, 2025, 9:29am

Yes: keep it updated, lazily if required. It’ll be less work in aggregate than having to shuffle hundreds or thousands of relation tuples for an application-side join every time a very social user joins.

jaybe78 · February 12, 2025, 12:07pm

No this topic is not about using Presence.

Whether you use Presence or not, you still need to get the list of your friends you want to subscribe to.
The topic is about fetching/updating efficiently user’s relationships and their status.

As mentioned above, Presence will be used but at the very last stage, when friends of a users have been fetched. I’m not concerned by that stage as I plan to use Presence sparingly on a subset of the friends returned

jaybe78 · February 12, 2025, 12:26pm

I agree with you, you don’t want to fetch 1000 friends of a user if only one in there is “online”.

But what do you mean by “online” ?
“Online” in the context of a graph or SQL DB might be very different than in the context of a Phoenix app.

I don’t think we can say a user is online with certainty because there’s been an update a few minutes ago, of a column “last_active” in a table, whose purpose is to approximately say, that a user was “recently” active…

But ok, let’s say each client periodically update a column like “last_active” with current datetime, so that the fetching friends query can filter out the ones that have been inactive for more than x minutes.

It’s true that with DynamoDB you cannot make join so there’s no way to gets friends of users and filter with “last_active” date, because that column would only be updated at the user level and not for each friends relationship.

Though my assumption is that since DynamoDB is way faster than SQL, it would not be a problem to fetch more users, especially because they would only be fetched once, and then cached.

The other things to consider in my mind are:

Filtering users from “last_active” could be a waste of time. In the sense, that this column could be way further than the “truth” than we think. What do I want to know ? Whether an user is available or not right.

In the application I develop, people can quickly go from “available/online” to “busy” status and so with that “last_active” date, that we rely on initially, there will always be a delay.

It will never be as fast/accurate as Presence or Tracker so you could also say, “What’s the point of querying with a join in a SQL DB knowing the data it returns, might be obsolete when it arrives ?”

You could have only 1 friend online at a T moment and 90% of your friends online at T+1, so you would end up, querying that user’s friends multiple times anyway, while you could have fetched the whole thing the first time and cache it.

In the end, what is returned by queries at some time could be far from the “truth”

My current architecture currently makes use of DynamoDb and streams some data to PostgresSQL.

I have considered using Postgres and fetch friends using that “last_active” date, but then, I thought, may be PostgreSQL would not be able to handle the load as good as DynamoDb.

You will need to adapt the application to make good use of the relational model, but once you have, a sustained “50000 users with 1000 friends each leaving and joining every second” – and likely many of your other problems too – become a non-issue.

the leaving and joining of users every second would not be an issue at the app level I guess, because once they are fetched and cached, you can quickly go through them and check their status.
I’m currently tracking my users’s status in an ETS table.
I don’t think it would be a big issue to go through 1000 users in an ETS table to filter those that are away/busy
It would be actually way more complicated for a graph or SQL DB though… even with users periodically updating their status.

That’s tricky decision so that’s why I was asking.

Thanks

jhogberg · February 12, 2025, 2:56pm

Not when you know how to leverage the database. I’ve been in this situation before ^[1] and tried to dissuade you, but you seem to have made up your mind already. Good luck

Swap DynamoDB for MongoDB, but better database throughput wouldn’t have helped. ↩︎

jaybe78 · February 12, 2025, 4:03pm

LOL ! not really no… Otherwise why would I be here asking !?

The other thing you don’t mention is that if I go for SQL solution I’m gonna have to handle config and administration of a SQL DB while I’m already busy on other stuff.

DynamoDB offers me some “free” time… but may be more expensive ^^

Anyway as I said, tricky decision

garrison · February 12, 2025, 7:06pm

You have two predicates.

user has a friend relationship with current_user
user has status=online

Assuming you have two indexes, for each of these predicates, you have two choices:

scan user’s friends and then filter for status=online
scan all online users and then filter for friendship

If you use a relational database it will pick one for you - almost certainly the first, because the set of friends is going to be a lot smaller than the set of all online users.

Of course, there is a third option: you could create an index of all online friends, essentially materializing the predicates.

But the crux of the issue here is: you seem to be storing the online status in ephemeral :ets tables instead of in the database (not necessarily a bad idea). So you are responsible for performing the filtering yourself. Meaning you are back to the same choices:

scan all online users from the ets table and then filter for friends
scan all friends from the database and then filter for online

The third option (indexing on both fields) is no longer available to you because you have no way to atomically update both the database and your ets table, so you risk corrupting the state. Of course maybe this is not a big deal since your online state is so ephemeral anyway.

I think you had the right idea from the beginning here - this is what you should do IMO. Also consider the following: it is common for a “friends list UI” to show both online and offline friends. I would go so far as to say this is the default. So you’re probably not losing much querying the whole friends list anyway - you may as well just display it!

garrison · February 12, 2025, 7:10pm

I fully agree with this BTW, my answer above should be interpreted as “if you’re going to do it that way”. The join inside the database is likely to be faster because it won’t send all of the friends over the wire (that is unless you want to load the full friends list anyway).

And just because the database is probably better-optimized than your application code (ets is pretty fast though).

boxxxie · February 13, 2025, 7:44am

30k users is a very small amount of data. you actually can do something like use an in memory graph, or embeded graph db. when you start your app read in the whole DB and then do all your work via the graph and keep it in sync with the DB on writes. you can seed your DB and benchmark the effect of solving the problem in this way. if you are able to do things in memory on your app server you eliminate a lot of complexity.

jaybe78 · February 13, 2025, 10:29am

Thanks for your answer Mr Garrison.

yes I thought about displaying all the users and page the result but for what I want to do, it’s not really ideal.
I don’t want a user to scroll through a hundreds of his friends in search of someone online to play with.
That would be a waste of time… especially for someone with 1000 or more friends whose only friend online is at the very bottom of the list

Regarding scanning table, it’s really not an option for me, would be a waste of time, performance and money…

Let’s say I go for the SQL option:

So basically I’d have 2 tables Users(user_id, last_active_date, status) and Friends(user_id, friend_id)

On the Users table I can create a composite index on “last_active_date” and “status” (away, offline, online)

last_active_date will be updated in batch at the app level to optimize server round trip.

Regarding the Friends table I would store only one side of the relationship

To fetch the online friends of a user joining I could do something like:

select user_id, friend_id from Friends f
inner joins Users u on (f.user_id = u.user_id or f.friend_id=u.user_id)
where u.user_id = :user_id
limit 100

or may be I could even do that in batch rather for a single user

where u.user_id in (:user_ids)

Is there anything else I can look at to improve performance, bearing in mind, that table will be massive at some point ? Partitioning ? Materialized views ?

I heard Materialized views do not scale very well, and about partitioning the Friends table ? would that really benefit in anything ?

At the app level to improve even more performance, once the first users joining start fetching their friends I can store in memory their relationship for each of them:

Basically if users 1 ans 2 join and both have friendship with user 3, when user 3 joins, I can first check the ets tables where already fetched relationships are stored, rather than directly querying the DB.
The idea is to constantly take advantage of the relationship already fetched by other users, so that I can display result faster and have a kind of an in memory “online” representation of users friendship.

the other thing is would broadway/genstage benefits in that scenario to not overload the SQL DB?

Is that too naïve kind of approach ?

jaybe78 · February 13, 2025, 10:34am

30k users multiplied by up to 1000 friends for each user.
Yes not that much for now but will get bigger so looking for some optimised solution.

I have not found in Elixir any in memory graph DB, that’s why I though about storing relationship in multiple ets tables across a cluster of nodes.

boxxxie · February 13, 2025, 11:48am

sorry, i am unsure of the scope of the problem. do you have data and usage patterns, expected active users per second? you could run a graphDB on your server, i think it’s probably overkill. if you think that your data is going to be too much for ram (i suspect very unlikely) then you can do everything in the DB by making a table per friends list. this could be a partition on a relationship table, or this could be something like a materialized view that gets rebuilt on write/friend-added (they are small tables). i think another thing you can do is predict the likelyhood that someone will appear online, and then you can remove those nodes from your graph. how often is someone going to need to know about the 1000 friends statuses? for example you can have a table that records the last time someone logged on/off the site, then use a window of that data to determine what are your high priority nodes. that subset of your total nodes will likely always be small. instead of 30k users and 1000k friends each, you have 1k likely active users with 10 likely active friends. even if you aren’t doing predictions you only need active users in your graph + friend nodes, unless i misunderstand the problem.