Ecto - PostgreSQL wiki: Don't Do This

ecto
ecto_sql
#1

Hi there,
I think Ecto development should consider these recommendations:
https://wiki.postgresql.org/wiki/Don't_Do_This

5 Likes
#2

I am curious: which things from that list do you think Ecto is doing?

#3

At least 4.1, but there are valid reasons to not follow the advice.

NaiveDateTime already doesn’t involve time zones, and DateTime is enforced to have UTC as its time zone at insertion time (at runtime). So timestamp is the better default over timestamptz, because you can be sure the database won’t ever try to convert the timestamp. Basically, Ecto is already doing what timestamptz is supposed to do over timestamp, but it does so under its own control.
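To illustrate the distinction at the schema level, here is a minimal sketch (the `Event` schema and field names are hypothetical, not from this thread):

```elixir
defmodule MyApp.Event do
  use Ecto.Schema

  schema "events" do
    # :utc_datetime maps to a DateTime that Ecto requires to be in UTC,
    # so the value written to a `timestamp` column is always UTC.
    field :happened_at, :utc_datetime

    # :naive_datetime carries no time zone information at all.
    field :noted_at, :naive_datetime
  end
end
```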

Also I’d imagine the ecto team is well aware of what they’re doing.
See this for a response on the above: https://github.com/elixir-ecto/ecto/issues/1868#issuecomment-268169955

4 Likes
#4

The “don’t use table inheritance…never…use foreign keys instead” advice is relevant to a discussion I was having last week. That said, we came to the foreign key conclusion ourselves, and table inheritance isn’t suggested by the Ecto docs anyway.

I think Ecto keeps to those rules and hides away a lot of the potential pitfalls on that list (money type, uppercase column names, etc.)

1 Like
#5

Ecto is doing it because, over the binary protocol that Postgrex uses, timestamp and timestamptz are precisely the same thing.

Most PostgreSQL connectors in other languages use the text-based protocol. Postgrex uses the binary protocol, and it doesn’t understand the client’s time zone.

I do think, however, that timestamptz is a safer choice, especially when other clients can connect to the database. For example, if you ever connect using psql, you could see dates & times in a different time zone because of your client configuration, and this can even lead to mistakes. I guess it’s OK if you are aware of the issue. But for consistency and possible error avoidance, I like to use timestamptz everywhere, which forces the dates to be displayed in UTC no matter what your client is.

I have written about it in more detail here: https://www.amberbit.com/blog/2017/8/3/time-zones-in-postgresql-elixir-and-phoenix/

3 Likes
#6

http://www.postgresqltutorial.com/postgresql-timestamp/:

When you query timestamptz from the database, PostgreSQL converts the UTC value back to the time value of the timezone set by the database server.

So the database server establishes the context for your time data?

  • Not helpful when the database stores event data scattered throughout various timezones - i.e. the data defines the context, not whatever timezone is set on the database server.
  • Not helpful if it makes sense to convert the time to the timezone where it is being consumed (halfway around the globe).

So recording all time data as UTC (for ease of point-in-time comparison without interference from DST) and recording the event time zone separately universally makes more sense (there are always exceptions).

Accepting the database server timezone as the centre of the universe seems myopic.

I think the real issue is that clients assume that time without a timezone reflects their own timezone. The ambiguity is gone if you establish upfront that any time without a timezone is always UTC (however inconvenient that may be).

A phone call today between SĂŁo Paulo and Warsaw could have started at 14:00 UTC but would have been observed by the parties involved at 11:00 (-03:00) and 16:00 (+02:00 (DST)) respectively.


To be clear: timestamptz seems to want to do the right thing, i.e. store UTC times.

However the type timestamptz seems to signal that the DB server/driver boundary is the right place to do the conversion. I’m not convinced that in increasingly distributed systems that is necessarily the right place to do it.

2 Likes
#7

The most important thing on this list that is not possible in Ecto right now is usage of existential queries. Right now the only option is to use anti-joins instead.
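As a sketch of the anti-join workaround (the `Post` and `Comment` schemas are hypothetical), "posts that have no comments" would look roughly like this, instead of a `WHERE NOT EXISTS (SELECT ...)` subquery:

```elixir
import Ecto.Query

# Anti-join: left-join the related table and keep only the rows where
# the join found no match, in place of an existential subquery.
posts_without_comments =
  from p in Post,
    left_join: c in Comment,
    on: c.post_id == p.id,
    where: is_nil(c.id),
    select: p
```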

3 Likes
#8

As you point out, that would be terribly inconvenient. However, that description is misleading for two reasons:

  1. The timezone of the database server is merely a default for the timezone of the connection, which you can set to anything you want.

  2. For my local time zone it doesn’t convert 2019-05-10 07:30:00 to 2019-05-10 00:30:00, but rather to 2019-05-10 00:30:00-07. So, convenient in the default case, but trivial to parse accurately as UTC and display any way you want… (That’s for the text-based protocol; I don’t know the details of the binary one, but I’d bet that you get UTC without an offset, because it makes no sense to adjust an 8-byte integer for easier reading by humans.)

Likewise for sending timestamps to the server, the “assume it’s server/connection time zone” only applies to timestamps which do not specify an offset.

You are right that that would be a lousy way to do it, but it’s not really what PG does. Since the offset gets appended, it’s not a conversion of the underlying value at all, but rather a guess at a convenient display format.

(Also “timezone of the server” can be specified in the PG config, and it’s common practice in distributed system to set it to UTC, rather than letting the system try to infer the actual time zone.)

#9

Yeah, I’ve had to help someone who had a nil in a `blah not in ^thing`; Ecto really needs to stop using NOT IN in SQL…
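For anyone who hasn’t hit this pitfall: in SQL, `NOT IN` over a list containing NULL can never evaluate to true, so the query silently returns no rows. A hedged sketch of the failure mode (the `User` schema is hypothetical):

```elixir
import Ecto.Query

ids = [1, 2, nil]

# Generates roughly: WHERE u0."id" NOT IN (1, 2, NULL).
# SQL evaluates that as id <> 1 AND id <> 2 AND id <> NULL, and
# `id <> NULL` is NULL, so the whole predicate is never true:
# the query returns zero rows, with no error.
broken = from u in User, where: u.id not in ^ids

# Safer: strip nils first (and handle the nil case explicitly if needed).
safe_ids = Enum.reject(ids, &is_nil/1)
working = from u in User, where: u.id not in ^safe_ids
```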

1 Like
#10

This probably reflects some of my confusion at the time (hence my addendum to the post), but I’m still unconvinced that timestamptz is the cure-all that the wiki promotes it to be.

It feels like a technological quick fix for a conceptual misunderstanding.

With timestamp without time zone, the time zone is portrayed as missing, which is entirely true when looking at the type and perhaps even the record in which a value of that type exists. But in truth the time zone is usually implied. The problem is that, from an individual point of view, it is local time that is perceived and recorded. And in legacy systems that was often the frame of reference.

However from a global point of view you either need an explicit timezone as a frame of reference or you need a fixed frame of reference, i.e. UTC.

So frankly any time represented/stored without a timezone should be UTC in order to avoid confusion. So storing UTC is the part that timestamptz gets right.

it’s not a conversion of the underlying value at all, but rather a guess at a convenient display format.

So the good news is that the UTC value is sent to the querying system. The bad news is that the value displayed in an admin tool like psql won’t match it unless you set timezone to 'UTC'; - I can see that generating a lot of confusion.

For my local time zone it doesn’t convert 2019-05-10 07:30:00 to 2019-05-10 00:30:00, but rather 2019-05-10 00:30:00-07.

I’m assuming that 2019-05-10 07:30:00 refers to the already stored value, because:

=# set timezone to '-7';
SET
=# select '2019-05-10 07:30:00'::timestamptz;
      timestamptz       
------------------------
 2019-05-10 07:30:00-07
(1 row)

=# select '2019-05-10 07:30:00 UTC'::timestamptz;
      timestamptz       
------------------------
 2019-05-10 00:30:00-07
(1 row)

i.e. a string without a time zone, parsed inside the server, is going to assume the set timezone, or the timezone from postgresql.conf. But

it’s common practice in distributed system to set it to UTC

guards against that.

It seems “don’t put anything but UTC into timestamp” is a simpler solution to the problem (which seems to be the Ecto approach - and the “convenience” of timestamptz isn’t needed anyway).


Another problem that timestamptz doesn’t fix: ambiguity at the local end of daylight saving time (unless the server/connection is set to UTC or to a zone that ignores DST).

=# SELECT '2019-11-03 01:50'::timestamptz AT TIME ZONE 'UTC';
      timezone       
---------------------
 2019-11-03 06:50:00
(1 row)

=# SELECT '2019-11-03 05:50 UTC'::timestamptz;
      timestamptz       
------------------------
 2019-11-03 01:50:00-04
(1 row)

=# SELECT '2019-11-03 06:50 UTC'::timestamptz;
      timestamptz       
------------------------
 2019-11-03 01:50:00-05
(1 row)
1 Like
#11

There’s also 5.1:

varchar(n) is a variable width text field that will throw an error if you try and insert a string longer than n characters (not bytes) into it.

Ecto does this, as far as I can tell: using the :string type without specifying a size produces a varchar limited to 255 characters, and it will indeed throw a database-level error on an attempt to insert a longer value. I’ve hit this error once or twice in the past and had to go back and explicitly validate / cast each string value.
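A sketch of how this shows up at the migration level (table and column names are hypothetical):

```elixir
defmodule MyApp.Repo.Migrations.CreateDocuments do
  use Ecto.Migration

  def change do
    create table(:documents) do
      add :title, :string            # on Postgres: varchar(255) by default
      add :slug, :string, size: 80   # varchar(80)
      add :body, :text               # unbounded text, no hidden 255 limit
    end
  end
end
```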

This is probably because Ecto must support other databases, including older versions of MySQL, but it’s still a legitimate point, with the consequences mentioned.

2 Likes
#12

That is why I almost always use :text when I want to store a string in the DB.

3 Likes
#13

This is the way to go when using Postgres; there is no advantage in using varchar for shorter text. But with Ecto lingo (string is for strings, right?) and switching between databases / ecosystems, I still don’t do it, at least not always :slight_smile:

1 Like
#14

There is no advantage, and yet there is one. Personally I am in the “text for everything” camp, but I have seen people argue that varchar with a limit prevents users, to some extent, from abusing your web site with huge chunks of text - something that could in theory happen if you missed your validations.

To be fair, I have never seen that be a problem in my entire career, but I kind of get the point they are making.

1 Like
#15

If you validate the data against a max length anyway, then what’s the point of using text instead of varchar(max length)? I personally use text only when I cannot decide on a max length, and in such cases there is indeed the risk of people inserting looong text in there.

#16

If you check in 2 places, you have to keep them in sync when, invariably, a limit turns out to be too low. If you check in your validations, you can typically provide better feedback to users.

#17

My 2 cents:
The Postgres folks who put together these recommendations are RDBMS titans. Change is hard and I always expect resistance, but following others’ experience is always the right road to progress.
I personally think that, at the end of the day, we should try to go that way regardless of minor problems.

1 Like
#18

Another problem is syncing the way both implementations count characters. Are you sure that your DB counts them the same way your application does? Additionally, in most cases I have encountered, the string has undefined length, so handling the few cases where there are limits differently isn’t worth the effort and the possible bugs.
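A concrete case where the counts diverge: Elixir’s `String.length/1` counts grapheme clusters, while Postgres’s `char_length` counts code points, so a combining accent is one “character” to the app and two to the DB. (The Postgres side is shown as a comment; the Elixir side you can check in iex.)

```elixir
# "é" written as the letter e followed by a combining acute accent (U+0301).
s = "e\u0301"

String.length(s)  # => 1 (one grapheme cluster, what a user perceives)
byte_size(s)      # => 3 (UTF-8 bytes)

# In Postgres: SELECT char_length(E'e\u0301'); returns 2 (two code points),
# so a varchar(n) limit and a validate_length/3 check can disagree
# near the boundary.
```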

1 Like
#19

Keeping them in sync is no more difficult than setting the size as a constant in the schema (plus a macro for access from outside) and using it both in the validation function and in the migration. Validation is indeed necessary for better feedback, but since it doesn’t make things harder to maintain, I don’t think it’s a good habit not to limit the size in the DB as well.
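One possible shape of that approach, as a sketch (module and field names are hypothetical): the limit lives in one place and feeds both the changeset and the migration.

```elixir
defmodule MyApp.Article do
  use Ecto.Schema
  import Ecto.Changeset

  # Single source of truth for the limit; exposed as a macro so the
  # migration (a separate module) can read it at compile time.
  @title_max_length 120
  defmacro title_max_length, do: @title_max_length

  schema "articles" do
    field :title, :string
  end

  def changeset(article, attrs) do
    article
    |> cast(attrs, [:title])
    |> validate_length(:title, max: @title_max_length)
  end
end

defmodule MyApp.Repo.Migrations.CreateArticles do
  use Ecto.Migration
  require MyApp.Article

  def change do
    create table(:articles) do
      # Same limit enforced at the DB level.
      add :title, :string, size: MyApp.Article.title_max_length()
    end
  end
end
```

Note that referencing application modules from migrations has its own trade-offs (migrations are usually expected to be immune to later code changes), so some teams prefer to duplicate the literal in the migration instead.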

Honestly, I would bin whichever one doesn’t follow the spec for the selected character set (usually UTF-8 for me).
I can see the charm of using only text, as we never have to pay attention, but if you regret the decision later (e.g. for performance reasons), it’s difficult to impose a size afterwards (whereas growing the size of a string is bug-proof and thus almost painless).

What do you do when a DBA edits a text field with a sentence longer than what your validations agree with? Or if another app using the same DB does the same because it didn’t implement the same validations? You can end up with many bugs that could have been avoided with stronger in-DB validations.

#20

Doesn’t PostgreSQL just do a text + constraint when a varchar(N) is used?

1 Like