Follow users

max4all · July 18, 2016, 10:53am

How should I build my schema to allow a user to follow another one, like Twitter?

hubertlepicki · July 18, 2016, 11:12am

I would create something as:

“users”: id, email …
“followings”: id, user_id, follower_id

unique constraing on user_id and follower_id. Both user_id and follower_id are referencing users table. Just that the one is user being followed, the other one is user who is following that user.

StevenXL · July 18, 2016, 1:45pm

@hubertlepicki answer is exactly right.

In a more general sense, what is being suggested is a join table. A join table is used when we need to express has and belongs to many relationships.

In your specific case, the join table is expressing the notion that a user has many followers, and can also belong to many followers.

mkunikow · July 18, 2016, 2:20pm

This is good usage for some graph db like https://neo4j.com/
https://neo4j.com/developer/guide-importing-data-and-etl/

andre1sk · July 18, 2016, 2:51pm

I would say it highly depends on particular use case, while in theory graph dbs are supposed to be faster for working with graphs in reality they are often not and you have to maintain an extra dependency while losing the ability to do transaction for part of your data and a ton of other inconveniences.

OvermindDL1 · July 18, 2016, 3:07pm

(How do you reply to a post as a new topic?)

I’ve always written a join table like followings as:

followings: user_id follower_id

With no base ‘id’ column as I’ve found them useless on join tables, I then make user_id and follower_id a combined primary key with indexes on user_id or follower_id depending on which direction (or both) I expect to be accessing it from. Is there actually a good reason to put an ID column on a pure many-to-many join table I am curious?

douglascorrea · July 19, 2016, 2:13am

How do you link the posts with the user and make the “timeline” query?

In twitter, I’m able to see all posts of everybody I’m following. So, if I use an inter-table as mentioned then I need to join it with the Posts table. That join will probably not perform very well if I’m following too many users since the key cardinality will be big (because I will use the composite key (user_id + following_id) joined with “owner_id” from posts table.

Do you think is there any other solution for displaying the timeline? (timeline = posts from people I follow). I know some hierarchical solution in NoSQL databases, but I never implemented that in Relational ones.

andre1sk · July 19, 2016, 2:45am

at small scale it might be OK
at larger scale you might be caching the “followings” table
and posts in something like Redis or replicating everything to ElasticSearch
and running queries against it.

hubertlepicki · July 19, 2016, 8:34am

yes in theory you don’t need it. In practice you often want to store some extra fields in the table, simple example is timestamps inserted_at. I don’t think Ecto will create/update those unless it’s a full model behind it, so, having an id and model is necessity (but please do double check my words here, I assume this is how it works but never actually tried it without).

2nd thing is that if you are following the REST principles in your API urls, you can have actions such as DELETE /followings/:id if you have ID of the resource this way.

max4all · July 19, 2016, 9:09am

@hubertlepicki @StevenXL @OvermindDL1 thank you for your replies.

This is how I built the relationships.

schemas

schema “followings” do
belongs_to :user, User
belongs_to :follower, Follower

schema “users” do
has_many :followings, Following

migration

create table(:followings) do
add :user_id, references(:users, on_delete: :nothing)
add :follower_id, references(:users, on_delete: :nothing)

indexes

create index(:followings, [:user_id])
create index(:followings, [:follower_id])
create index(:followings, [:user_id, :follower_id], unique: true)

peerreynders · July 19, 2016, 10:26am

Just a nitpick I’m sure,

To me personally this reads as if the people (followers) following the user are being modeled, not the people that the user is following. (Is that the intent?)

Now apparently the term “followee” is actually a thing but it’s not universally liked. The suggested term “subject” seems more appropriate and less controversial for identifying those who are being followed.

An alternate term for join table is association table (or associative entity) - good to know for googling additional information. In fact José Valim has a blog entry “Working with Ecto associations and embeds” discussing the technical aspects of using associations in Ecto.

max4all · July 19, 2016, 10:34am

You’re right, the “follower” word does not show our intention. I used @hubertlepicki 's terms to avoid confusions, in case someone else also needs something like this.

OvermindDL1 · July 19, 2016, 2:29pm

I often do have extra data, but I still do not use an ID on those as those association tables require their 1/2/3 linked tables, so I have those foreign keys as a combined primary key as they ‘are’ the primary key lookup of these tables. Actually very few of my current models have ID’s at all (interfacing between different DB types makes things odd).

I always encode the data in the parameters, so if I wanted to, say, delete all Followings of a person then I’d have something like DELETE /user/:uid/followings, or to delete a specific following I’d do DELETE /user/:uid/followings/:following_id and that would look up the user, the user they are following, and delete the join row between them. (or ignore looking them up for validity and just delete a join column with those two ID’s if it exists at all).

andre1sk · July 19, 2016, 2:59pm

Having and ID is often useful if you have a more complex stack (have a caching layer) and it’s nicer to have a simple ID vs some IDs being compound and some not.

OvermindDL1 · July 19, 2016, 3:16pm

Would you have cases where you would access such a join table without knowing one or both of the linked tables? I’ve not had that yet myself.

hubertlepicki · July 19, 2016, 3:28pm

I will have to agree with you after giving it some more thought. The ID on the join table is because I’ve been taught to do so by ActiveRecord. There is nothing on the database level that’d require it

andre1sk · July 19, 2016, 3:35pm

nope but had cases where I had other tables referencing it like “filter” it becomes a bit cumbersome, plus many ORMs don’t handle compound PKs (I dislike ORMs but it’s not always up to us).

OvermindDL1 · July 19, 2016, 4:11pm

I’ve often used actual SQL, I would here too except I wanted to give Ecto a try and I seem to quite like it, though I do have to fragment stuff up a lot due to Ecto limitations, so I’ve not seen that it was required by poorly made ORM’s (thankfully Ecto is comparatively not poorly made ^.^), I’ve tried to follow SQL tenents, even with my Ecto work (which is why almost none of my models have an ‘id’ field, well they often do, but not as Ecto would expect it to be so I have to @primacy_key false quite a whole lot).

StevenXL · July 19, 2016, 7:36pm

Thanks for putting up the alternative name for this concept.

peerreynders · July 19, 2016, 8:43pm

As explained here (properly implemented) surrogate keys are motivated by the fact that changing domain requirements should never force a change on (the type of) a surrogate key. For example a unique customer number could be used as a natural key - but if the type of that natural key needs to change from numeric to alphanumeric, all the tables referencing that natural key need to be altered.

Ironically some current practices undermine the value of surrogate keys. One recommendation is:

Don’t naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advantages of surrogate keys.

In particular the use of URI-templates like

/user/:uid/followings/:following_id

is acceptable if :uid is the identifier for the lifetime of the user but often :uid is actually the surrogate key for the record the user’s information is held in. By exposing it in this manner the surrogate key becomes part of the user’s information - i.e. for all intents and purposes it becomes a natural key (or at least the next data migration may have to treat it as such).

Maybe Joe Armstrong had a point with “URIs are bad” .