Why does `phx_gen_auth` create a session token? (and other questions)

Hello :wave:

I have a few questions about the authentication system generator for Phoenix.

  1. Does the generated code cover authentication for SPA clients? Which part would need to be adapted?

  2. Why is there a session token generated in the user codebase?
    When using Plug.Conn.put_session/3, Phoenix already gives you a cookie-based session (a signed Plug session cookie, if I'm not wrong), where one can, for example, store the user ID. The cookie is signed and cannot be altered. Subsequent requests can then retrieve the user ID from the cookie and fetch the user data from the DB.
    However, instead of storing just the user ID in the session with put_session/3, the generator stores a separate token in the session itself (and then retrieves the user from the DB based on that token; see the sketch below). Isn't that redundant? Why do we need to create a session token in the user codebase if put_session/3/get_session/2 already handle that?
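To illustrate what I mean, a rough sketch; `MyApp` and the function names are just stand-ins for what the generator appears to do:

```elixir
defmodule MyAppWeb.UserAuth do
  import Plug.Conn

  def log_in_user(conn, user) do
    # A plain signed-cookie session could simply do:
    #   put_session(conn, :user_id, user.id)
    #
    # The generated code instead mints a random token, persists it in the DB,
    # and stores *that* in the signed cookie, so the row can later be revoked
    # for this one session only.
    token = MyApp.Accounts.generate_user_session_token(user)

    conn
    |> configure_session(renew: true)
    |> put_session(:user_token, token)
  end
end
```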

I'll start with these two questions :) Thank you for any help.

4 Likes

Only with a session token in the db can you invalidate a single compromised session. With stateless session management, a compromised session would mean you need to either change your signing secret (and therefore invalidate all your active sessions) or wait for the compromised session to time out, potentially doing more damage in the meantime.
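Concretely, revoking a single compromised session then just means deleting that token row; a minimal sketch, assuming a `UserToken` schema with `token` and `context` columns similar to what the generator creates:

```elixir
defmodule MyApp.Accounts do
  # Only the revocation part; module and schema names are illustrative.
  import Ecto.Query

  alias MyApp.{Repo, Accounts.UserToken}

  # Delete exactly one session. The signed cookie the attacker holds still
  # decodes fine, but the token it carries no longer exists server side, so
  # the next request finds no user and the session is effectively dead.
  def delete_session_token(token) do
    Repo.delete_all(from t in UserToken, where: t.token == ^token and t.context == "session")
    :ok
  end
end
```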

You could remove access at another level, e.g. deactivate the account behind the compromised session, but this would lock the account even for legitimate use in other sessions.

13 Likes

Could you please elaborate a bit, preferably with an example or two? What specific scenarios does this help address? The one I can think of is when a user leaves without logging out, leaving an open session behind for someone else to take over. Or am I on the wrong track?

Exactly. Someone could easily expand the current feature set to do something like Gmail does, listing all of your open sessions with the location and device that started them. It can also be useful simply for history and for keeping track of devices.

2 Likes

In a less common but not impossible case, maybe you gave your old computer to a relative or friend but didn't format the machine and forgot to log out of an important site. To technical users this kind of thing seems nearly impossible, but not everyone is hyper-diligent about their privacy.

The above case is also, IMO, a good example of why adding an "active" attribute to control access isn't enough: as LostKobrakai stated, that would also lock out the real account owner.

As soon as whatever secret you supply to the user leaves your server, it's no longer under your control. There are technical measures like HTTPS or browser sandboxing to prevent third parties from accessing that secret, but all of those might fail, have bugs, or whatnot. If they do, a third party might learn the secret, with or without the actual user noticing. Therefore you should have control over sessions on the server side, so you have countermeasures for those cases. The longer a secret is valid on its own, the more important this becomes.

When the generator ships with 1.6 and has docs, I wonder if it's worth writing a couple of paragraphs around that, and also giving a number of real-life examples where an "active" field can't be used as a replacement for session-based access.

For something as important as this, the more explicit you are the better. If you don't spend a lot of time thinking it through, it can be easy to talk yourself into believing an "active" field could do the job, and then decide to skip saving tokens server side.

Edit: To add another potential real-life use case, maybe you accidentally left your tablet somewhere and you're not in a position to get it back immediately (left it on a train, etc.), but you want to sign yourself out of a bunch of sites to avoid a potential future headache.

Docs have been updated here: Add more docs to phx.gen.auth tokens by josevalim · Pull Request #4405 · phoenixframework/phoenix · GitHub

11 Likes

When people ask why I like Elixir and Phoenix, I point to stuff like this. The team, Chris, and Jose really care about developer UX and happiness.

4 Likes

Thank you guys for the quick and highly constructive responses! I could already imagine and understand the advantages from @LostKobrakai's first response. Still, I find the extended, more explicit documentation a highly valuable outcome of the discussion.

I came to this thread because I compared the "home-baked" approach I've been using in multiple projects, carried over from Rails to Phoenix, and the main difference was exactly this not-so-tiny detail. In my old approach I also removed the "remember me" part in some projects and used an encrypted session with a maximum validity of one day (shorter in most projects, which matters for financial stuff, for example) and found it "good enough". Now I am still a little concerned about the potential performance implications of the DB-stored token approach in high-traffic situations where every DB round trip counts. Please correct me if I have no reason to be :wink:

Honestly, I doubt this is a concern. You need to look up the user in the first place; the difference is that, instead of doing so by ID, we are doing it by token, which is also indexed and would not be much different from a UUID lookup. The number of round trips is the same.
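In other words, the per-request lookup stays a single query; roughly the shape of it (module and field names approximate, not the exact generated code):

```elixir
defmodule MyApp.Accounts do
  import Ecto.Query

  alias MyApp.{Repo, Accounts.UserToken}

  # One round trip: find the session token row and return its user in the
  # same query. The unique index on (context, token) keeps the lookup cheap,
  # and tokens older than the validity window are ignored.
  def get_user_by_session_token(token) do
    Repo.one(
      from t in UserToken,
        join: u in assoc(t, :user),
        where: t.token == ^token and t.context == "session",
        where: t.inserted_at > ago(60, "day"),
        select: u
    )
  end
end
```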

2 Likes

Thanks for your response. Indexing by a UUID-like value generally has to be more costly than by an int, I believe, and then to retrieve the user I need either two trips or a join. Two trips is a no-go. A join is OK, surely the "less bad" of the two, yet still not free. But yeah, I'd need to do some proper benchmarking to stop guessing and see what the difference actually is. It might in fact be negligible even in large-DB, high-traffic applications.
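If I get to the benchmarking, the comparison itself should be simple enough; a rough sketch using Benchee (assuming it is added as a dev dependency, and reusing the illustrative lookup functions from above):

```elixir
# bench/session_lookup.exs (run with: mix run bench/session_lookup.exs)
user = MyApp.Repo.get_by!(MyApp.Accounts.User, email: "bench@example.com")
token = MyApp.Accounts.generate_user_session_token(user)

Benchee.run(%{
  "get user by primary key" => fn -> MyApp.Repo.get!(MyApp.Accounts.User, user.id) end,
  "get user by session token (join)" => fn -> MyApp.Accounts.get_user_by_session_token(token) end
})
```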

Not necessarily. PostgreSQL stores the UUID as a plain byte array (128 bits / 16 bytes) and searches on that, and it is just 2x bigger than a normal numeric ID (64 bits).

I doubt this will make a significant difference unless you're doing 10k+ queries per second. And I've seen average-tier Amazon RDS databases handle 50k+ reqs/sec easily (never going above 40% CPU).

Will it do the same optimization on the `@primary_key {:id, :binary_id, autogenerate: true}` and `field :token, :binary` fields used in the generator?

Can't claim it with 100% certainty; I simply looked at how PostgreSQL does things (a while ago), so IMO the abstraction layer is not important. I hope I am not egregiously wrong.

You can see here how Ecto native types map to Postgres-specific types:
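For the fields in question, my reading of the Postgres adapter's mapping, folded into a schema sketch (illustrative, not the exact generated file):

```elixir
defmodule MyApp.Accounts.UserToken do
  use Ecto.Schema

  # With the Postgres adapter these types roughly map to:
  #   :binary_id -> uuid  (16 bytes)
  #   :binary    -> bytea (variable-length byte array)
  #   :string    -> varchar
  @primary_key {:id, :binary_id, autogenerate: true}
  @foreign_key_type :binary_id
  schema "users_tokens" do
    field :token, :binary
    field :context, :string
    belongs_to :user, MyApp.Accounts.User

    timestamps(updated_at: false)
  end
end
```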

5 Likes

Thanks, good call. So the PK ends up being a uuid and the token field is a byte array.

There are two things about indexing. One is the lookup performance related to the index structure and column(s); the other is the index size and related resource usage. For small and medium-sized datasets this isn't going to be an issue on modern hardware. OTOH, I relatively recently worked on a set of applications backed by large databases, and when one gets into nine-figure ranges of interrelated records, things look different. We had to be really careful about how and by what we indexed things and which queries we executed. Things that worked nicely on a dev machine could easily spell disaster on the prod DB. Now, I agree that such apps may not be common; it's simply that I am mentally still in that "high alert" mode. For the current Phoenix-based project I don't expect problems, but I plan to do some benchmarking anyway :wink:
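For what it's worth, what I plan to benchmark is essentially the indexes on the tokens table; a sketch of a migration along the lines of what the generator creates (names approximate):

```elixir
defmodule MyApp.Repo.Migrations.CreateUsersTokens do
  use Ecto.Migration

  def change do
    create table(:users_tokens, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :user_id, references(:users, type: :binary_id, on_delete: :delete_all), null: false
      add :token, :binary, null: false
      add :context, :string, null: false

      timestamps(updated_at: false)
    end

    # The per-request lookup hits the unique (context, token) index; the
    # user_id index backs "revoke all sessions for this user" deletes.
    create index(:users_tokens, [:user_id])
    create unique_index(:users_tokens, [:context, :token])
  end
end
```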

Are there any simple rules/parameters based on which I should invalidate a session token and force the user to log back in?
For example:

  • user agent changed: won't work for most apps because nowadays users access them from multiple devices;
  • IP changed: same problem as above;
  • location changed: requires a third-party service to resolve an IP to a location.

Correct me if I'm wrong, but storing the session token in the DB will not be useful until such an automatic invalidation system is implemented. I don't see any other way to invalidate session tokens in the DB.
(Not saying it shouldn't be stored, though; it's a must, since the above system should be implemented some day.)

Your first two points are exactly why you would want a session token instead of just an auth or bearer token. If the auth or bearer token were to get compromised, even if you forced logout for any connected sessions with that token (say the user logs out in one tab), the other device with that same auth/bearer token would immediately be re-authenticated because its token is still valid. By storing one token per "session" (connected user), you can individually control logging each tab/browser/device out.

I recently ran into the same problem with a legacy app that only used auth tokens and did not store them in the db. As a result, if you asked the auth service whether a token was valid, it always was. Logging out was only managed by deleting the token from the session so the browser no longer knew what it was. However, this means that if another browser is using the same token, it doesn't get logged out when you log out of your tab. Having a unique token per connection gives you finer control over which sessions get terminated and when. And there is no extra load on the system, because you still have to look up at least one token anyhow.
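To make that concrete: with DB-backed tokens, logging out means deleting that one row and then dropping the cookie session, so only the current browser is affected; a rough sketch in the spirit of the generated plug module (function and module names illustrative):

```elixir
defmodule MyAppWeb.UserAuth do
  import Plug.Conn
  import Phoenix.Controller, only: [redirect: 2]

  # Log out only the current session: delete its token row, then drop the
  # cookie session. Other devices keep their own, still-valid tokens.
  def log_out_user(conn) do
    token = get_session(conn, :user_token)
    if token, do: MyApp.Accounts.delete_session_token(token)

    conn
    |> configure_session(renew: true)
    |> clear_session()
    |> redirect(to: "/")
  end
end
```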

Additionally, you could now show the user all of the sessions using their account and allow them to invalidate any they don't think are legitimate. The person holding a compromised session would then need to re-authenticate, and if all they had was that session token, they wouldn't be able to unless they also knew the username/password.
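A sketch of what that listing and per-session revocation could look like, assuming the same illustrative `UserToken` schema as above:

```elixir
defmodule MyApp.Accounts do
  import Ecto.Query

  alias MyApp.{Repo, Accounts.UserToken}

  # Everything needed to render a "your active sessions" page.
  def list_user_session_tokens(user) do
    Repo.all(
      from t in UserToken,
        where: t.user_id == ^user.id and t.context == "session",
        order_by: [desc: t.inserted_at]
    )
  end

  # Revoke every session except the one the user is currently on.
  def delete_other_session_tokens(user, current_token) do
    Repo.delete_all(
      from t in UserToken,
        where: t.user_id == ^user.id and t.context == "session" and t.token != ^current_token
    )
  end
end
```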

I'm actually in the process of adding the exact mechanism phx_gen_auth uses to my legacy app right now, for these very reasons.