How should I build my schema to allow a user to follow another one, like Twitter?
I would create something as:
āusersā: id, email ā¦
āfollowingsā: id, user_id, follower_id
unique constraing on user_id and follower_id. Both user_id
and follower_id
are referencing users
table. Just that the one is user being followed, the other one is user who is following that user.
@hubertlepicki answer is exactly right.
In a more general sense, what is being suggested is a join table. A join table is used when we need to express has and belongs to many relationships.
In your specific case, the join table is expressing the notion that a user has many followers, and can also belong to many followers.
This is good usage for some graph db like https://neo4j.com/
https://neo4j.com/developer/guide-importing-data-and-etl/
I would say it highly depends on particular use case, while in theory graph dbs are supposed to be faster for working with graphs in reality they are often not and you have to maintain an extra dependency while losing the ability to do transaction for part of your data and a ton of other inconveniences.
(How do you reply to a post as a new topic?)
Iāve always written a join table like followings as:
followings: user_id follower_id
With no base āidā column as Iāve found them useless on join tables, I then make user_id
and follower_id
a combined primary key with indexes on user_id
or follower_id
depending on which direction (or both) I expect to be accessing it from. Is there actually a good reason to put an ID column on a pure many-to-many join table I am curious?
How do you link the posts with the user and make the ātimelineā query?
In twitter, Iām able to see all posts of everybody Iām following. So, if I use an inter-table as mentioned then I need to join it with the Posts table. That join will probably not perform very well if Iām following too many users since the key cardinality will be big (because I will use the composite key (user_id + following_id) joined with āowner_idā from posts table.
Do you think is there any other solution for displaying the timeline? (timeline = posts from people I follow). I know some hierarchical solution in NoSQL databases, but I never implemented that in Relational ones.
at small scale it might be OK
at larger scale you might be caching the āfollowingsā table
and posts in something like Redis or replicating everything to ElasticSearch
and running queries against it.
yes in theory you donāt need it. In practice you often want to store some extra fields in the table, simple example is timestamps inserted_at
. I donāt think Ecto will create/update those unless itās a full model behind it, so, having an id
and model is necessity (but please do double check my words here, I assume this is how it works but never actually tried it without).
2nd thing is that if you are following the REST principles in your API urls, you can have actions such as DELETE /followings/:id
if you have ID of the resource this way.
@hubertlepicki @StevenXL @OvermindDL1 thank you for your replies.
This is how I built the relationships.
schemas
schema āfollowingsā do
belongs_to :user, User
belongs_to :follower, Follower
schema āusersā do
has_many :followings, Following
migration
create table(:followings) do
add :user_id, references(:users, on_delete: :nothing)
add :follower_id, references(:users, on_delete: :nothing)
indexes
create index(:followings, [:user_id])
create index(:followings, [:follower_id])
create index(:followings, [:user_id, :follower_id], unique: true)
Just a nitpick Iām sure,
To me personally this reads as if the people (followers) following the user are being modeled, not the people that the user is following. (Is that the intent?)
Now apparently the term āfolloweeā is actually a thing but itās not universally liked. The suggested term āsubjectā seems more appropriate and less controversial for identifying those who are being followed.
An alternate term for join table is association table (or associative entity) - good to know for googling additional information. In fact JosĆ© Valim has a blog entry āWorking with Ecto associations and embedsā discussing the technical aspects of using associations in Ecto.
Youāre right, the āfollowerā word does not show our intention. I used @hubertlepicki 's terms to avoid confusions, in case someone else also needs something like this.
I often do have extra data, but I still do not use an ID on those as those association tables require their 1/2/3 linked tables, so I have those foreign keys as a combined primary key as they āareā the primary key lookup of these tables. Actually very few of my current models have IDās at all (interfacing between different DB types makes things odd).
I always encode the data in the parameters, so if I wanted to, say, delete all Followings of a person then Iād have something like DELETE /user/:uid/followings
, or to delete a specific following Iād do DELETE /user/:uid/followings/:following_id
and that would look up the user, the user they are following, and delete the join row between them. (or ignore looking them up for validity and just delete a join column with those two IDās if it exists at all).
Having and ID is often useful if you have a more complex stack (have a caching layer) and itās nicer to have a simple ID vs some IDs being compound and some not.
Would you have cases where you would access such a join table without knowing one or both of the linked tables? Iāve not had that yet myself.
I will have to agree with you after giving it some more thought. The ID on the join table is because Iāve been taught to do so by ActiveRecord. There is nothing on the database level thatād require it
nope but had cases where I had other tables referencing it like āfilterā it becomes a bit cumbersome, plus many ORMs donāt handle compound PKs (I dislike ORMs but itās not always up to us).
Iāve often used actual SQL, I would here too except I wanted to give Ecto a try and I seem to quite like it, though I do have to fragment
stuff up a lot due to Ecto limitations, so Iāve not seen that it was required by poorly made ORMās (thankfully Ecto is comparatively not poorly made ^.^), Iāve tried to follow SQL tenents, even with my Ecto work (which is why almost none of my models have an āidā field, well they often do, but not as Ecto would expect it to be so I have to @primacy_key false
quite a whole lot).
Thanks for putting up the alternative name for this concept.
As explained here (properly implemented) surrogate keys are motivated by the fact that changing domain requirements should never force a change on (the type of) a surrogate key. For example a unique customer number could be used as a natural key - but if the type of that natural key needs to change from numeric to alphanumeric, all the tables referencing that natural key need to be altered.
Ironically some current practices undermine the value of surrogate keys. One recommendation is:
Donāt naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advantages of surrogate keys.
In particular the use of URI-templates like
/user/:uid/followings/:following_id
is acceptable if :uid
is the identifier for the lifetime of the user but often :uid
is actually the surrogate key for the record the userās information is held in. By exposing it in this manner the surrogate key becomes part of the userās information - i.e. for all intents and purposes it becomes a natural key (or at least the next data migration may have to treat it as such).
Maybe Joe Armstrong had a point with āURIs are badā .