Follow users

How should I build my schema to allow a user to follow another one, like Twitter?

1 Like

I would create something as:

ā€œusersā€: id, email ā€¦
ā€œfollowingsā€: id, user_id, follower_id

unique constraing on user_id and follower_id. Both user_id and follower_id are referencing users table. Just that the one is user being followed, the other one is user who is following that user.

5 Likes

@hubertlepicki answer is exactly right.

In a more general sense, what is being suggested is a join table. A join table is used when we need to express has and belongs to many relationships.

In your specific case, the join table is expressing the notion that a user has many followers, and can also belong to many followers.

1 Like

This is good usage for some graph db like https://neo4j.com/
https://neo4j.com/developer/guide-importing-data-and-etl/

I would say it highly depends on particular use case, while in theory graph dbs are supposed to be faster for working with graphs in reality they are often not and you have to maintain an extra dependency while losing the ability to do transaction for part of your data and a ton of other inconveniences.

1 Like

(How do you reply to a post as a new topic?)

Iā€™ve always written a join table like followings as:

followings: user_id follower_id

With no base ā€˜idā€™ column as Iā€™ve found them useless on join tables, I then make user_id and follower_id a combined primary key with indexes on user_id or follower_id depending on which direction (or both) I expect to be accessing it from. Is there actually a good reason to put an ID column on a pure many-to-many join table I am curious?

2 Likes

How do you link the posts with the user and make the ā€œtimelineā€ query?

In twitter, Iā€™m able to see all posts of everybody Iā€™m following. So, if I use an inter-table as mentioned then I need to join it with the Posts table. That join will probably not perform very well if Iā€™m following too many users since the key cardinality will be big (because I will use the composite key (user_id + following_id) joined with ā€œowner_idā€ from posts table.

Do you think is there any other solution for displaying the timeline? (timeline = posts from people I follow). I know some hierarchical solution in NoSQL databases, but I never implemented that in Relational ones.

at small scale it might be OK
at larger scale you might be caching the ā€œfollowingsā€ table
and posts in something like Redis or replicating everything to ElasticSearch
and running queries against it.

1 Like

yes in theory you donā€™t need it. In practice you often want to store some extra fields in the table, simple example is timestamps inserted_at. I donā€™t think Ecto will create/update those unless itā€™s a full model behind it, so, having an id and model is necessity (but please do double check my words here, I assume this is how it works but never actually tried it without).

2nd thing is that if you are following the REST principles in your API urls, you can have actions such as DELETE /followings/:id if you have ID of the resource this way.

@hubertlepicki @StevenXL @OvermindDL1 thank you for your replies.

This is how I built the relationships.

schemas

schema ā€œfollowingsā€ do
belongs_to :user, User
belongs_to :follower, Follower

schema ā€œusersā€ do
has_many :followings, Following

migration

create table(:followings) do
add :user_id, references(:users, on_delete: :nothing)
add :follower_id, references(:users, on_delete: :nothing)

indexes

create index(:followings, [:user_id])
create index(:followings, [:follower_id])
create index(:followings, [:user_id, :follower_id], unique: true)

Just a nitpick Iā€™m sure,

To me personally this reads as if the people (followers) following the user are being modeled, not the people that the user is following. (Is that the intent?)

Now apparently the term ā€œfolloweeā€ is actually a thing but itā€™s not universally liked. The suggested term ā€œsubjectā€ seems more appropriate and less controversial for identifying those who are being followed.

An alternate term for join table is association table (or associative entity) - good to know for googling additional information. In fact JosĆ© Valim has a blog entry ā€œWorking with Ecto associations and embedsā€ discussing the technical aspects of using associations in Ecto.

2 Likes

Youā€™re right, the ā€œfollowerā€ word does not show our intention. I used @hubertlepicki 's terms to avoid confusions, in case someone else also needs something like this.

I often do have extra data, but I still do not use an ID on those as those association tables require their 1/2/3 linked tables, so I have those foreign keys as a combined primary key as they ā€˜areā€™ the primary key lookup of these tables. Actually very few of my current models have IDā€™s at all (interfacing between different DB types makes things odd).

I always encode the data in the parameters, so if I wanted to, say, delete all Followings of a person then Iā€™d have something like DELETE /user/:uid/followings, or to delete a specific following Iā€™d do DELETE /user/:uid/followings/:following_id and that would look up the user, the user they are following, and delete the join row between them. (or ignore looking them up for validity and just delete a join column with those two IDā€™s if it exists at all).

2 Likes

Having and ID is often useful if you have a more complex stack (have a caching layer) and itā€™s nicer to have a simple ID vs some IDs being compound and some not.

1 Like

Would you have cases where you would access such a join table without knowing one or both of the linked tables? Iā€™ve not had that yet myself.

1 Like

I will have to agree with you after giving it some more thought. The ID on the join table is because Iā€™ve been taught to do so by ActiveRecord. There is nothing on the database level thatā€™d require it

1 Like

nope but had cases where I had other tables referencing it like ā€œfilterā€ it becomes a bit cumbersome, plus many ORMs donā€™t handle compound PKs (I dislike ORMs but itā€™s not always up to us).

1 Like

Iā€™ve often used actual SQL, I would here too except I wanted to give Ecto a try and I seem to quite like it, though I do have to fragment stuff up a lot due to Ecto limitations, so Iā€™ve not seen that it was required by poorly made ORMā€™s (thankfully Ecto is comparatively not poorly made ^.^), Iā€™ve tried to follow SQL tenents, even with my Ecto work (which is why almost none of my models have an ā€˜idā€™ field, well they often do, but not as Ecto would expect it to be so I have to @primacy_key false quite a whole lot).

3 Likes

Thanks for putting up the alternative name for this concept.

1 Like

As explained here (properly implemented) surrogate keys are motivated by the fact that changing domain requirements should never force a change on (the type of) a surrogate key. For example a unique customer number could be used as a natural key - but if the type of that natural key needs to change from numeric to alphanumeric, all the tables referencing that natural key need to be altered.

Ironically some current practices undermine the value of surrogate keys. One recommendation is:

Donā€™t naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advantages of surrogate keys.

In particular the use of URI-templates like

/user/:uid/followings/:following_id

is acceptable if :uid is the identifier for the lifetime of the user but often :uid is actually the surrogate key for the record the userā€™s information is held in. By exposing it in this manner the surrogate key becomes part of the userā€™s information - i.e. for all intents and purposes it becomes a natural key (or at least the next data migration may have to treat it as such).

Maybe Joe Armstrong had a point with ā€œURIs are badā€ :slight_smile:.

1 Like