Tz - time zone support for Elixir (alternative to Tzdata that comes with a lot of bugfixes)

Hello :wave:

Allow me to introduce you to Tz, an alternative to Tzdata for time zone database support.

Why another library?

First and foremost, it comes with a lot of bugfixes. In its current state, Tzdata has many bugs, some of which were reported some time ago but remain unfixed for the moment.

The Tz library has been tested against nearly 10 million past dates, which covers most imaginable edge cases. Below you will find 10 random examples of Tzdata bugs that my tests detect.

Time zone periods are computed and made available in Elixir maps at compile time (a “period” here means a span of time during which a certain offset is observed; for example, from March 31 until October 27 2019, clocks were set forward by 1 hour in Belgium).

I would like to reduce the compilation time, as it currently takes over 15 seconds. Tzdata, for example, ships with a dump of the computed periods in an ETS file; consequently the periods no longer have to be computed when compiling the dependency. I do not want to use ETS for querying, but the idea of a dump that avoids all the period computations is interesting.
As I’m writing this post, I realize that the compilation time just got much faster. I have no idea what happened, and honestly I do not understand what really takes time during compilation. I reduced the data inside the maps that represent the periods; could it be that? If so, it might not be the computations that take the most time, but rather something related to the size of the maps.

The period lookup can be optimized. Note however that, without any optimization written for Tz yet, querying the periods and writing the results to a file for 9,866,112 dates takes around 6.5 minutes, whereas Tzdata takes almost 10 minutes (the same code is used for querying, calling DateTime.from_naive/3 and DateTime.shift_zone/3 once with Tz.TimeZoneDatabase and once with Tzdata.TimeZoneDatabase). Note that in Java it takes less than 15 seconds to generate and write these nearly 10 million dates to a file with similar code logic… how do you think that’s manageable? :sweat_smile:
Tzdata does come with dynamic tz data updates, however; I have no plans to integrate that into Tz (for every IANA tz database update, the Tz dependency will have to be updated). To keep your time zone database up to date, you can “watch” the project on GitHub for releases, and I also plan to provide a small optional utility that logs on your server when a new IANA tz database update is detected.

If you happen to be part of a profitable company relying on time zone support in Elixir, and the company wouldn’t mind giving a little support for continued work, I have set up GitHub Sponsors for this project, as I have been working on it for a long time, full-time, without a source of income; it may well have saved you months of work. And there’s still a lot of work to be done:

  • the library is tested against nearly 10 million dates; this code for testing is currently in a private separate package that needs to be reworked and open-sourced;
  • the library lacks code documentation for now;
  • I’d like to do some continuous refactoring and renaming;
  • decrease the compilation time;
  • decrease the tz periods lookup time;
  • provide different utilities (in separate packages; I want to keep Tz minimal to provide the time zone support for Elixir’s DateTime module) to extract other useful data from the iana tz database, watch for iana tz database updates, etc.
  • …

Bugfixes

Here are 10 example bugs with Tzdata that my testing code detects:

Example bug 1:

DateTime.from_naive(~N[1912-01-01 00:00:00], "Africa/Abidjan", Tzdata.TimeZoneDatabase)

** (UndefinedFunctionError) function nil.utc_off/0 is undefined

Bug has been reported here: https://github.com/lau/tzdata/issues/90

Using Tz:

DateTime.from_naive(~N[1912-01-01 00:00:00], "Africa/Abidjan", Tz.TimeZoneDatabase)
{:gap, #DateTime<1911-12-31 23:59:59.999999-00:16 LMT Africa/Abidjan>,
#DateTime<1912-01-01 00:16:08+00:00 GMT Africa/Abidjan>}

Example bug 2:

DateTime.from_naive(~N[1920-09-01 00:00:00], "Africa/Accra", Tzdata.TimeZoneDatabase)
{:gap, #DateTime<1917-12-31 23:59:59.999999-00:00 LMT Africa/Accra>,
 #DateTime<1920-09-01 00:20:00+00:20 +0020 Africa/Accra>}

The documentation says

When there is a gap in wall time - for instance in spring when the clocks are turned forward - the latest valid datetime just before the gap and the first valid datetime just after the gap.

But the first date returned by Tzdata happens nearly 3 years earlier, that’s definitely not the “latest valid datetime just before the gap”.

Using Tz:

DateTime.from_naive(~N[1920-09-01 00:00:00], "Africa/Accra", Tz.TimeZoneDatabase)
{:gap, #DateTime<1920-08-31 23:59:59.999999+00:00 GMT Africa/Accra>,
 #DateTime<1920-09-01 00:20:00+00:20 +0020 Africa/Accra>}
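Since callers have to deal with all of these result shapes, here is a minimal sketch of how the return values of DateTime.from_naive/3 could be collapsed into a single datetime. The module name ResolveExample and the policy it applies (earlier instant for ambiguous times, first valid instant after a gap) are my own assumptions for illustration, not something Tz or Tzdata prescribes.

```elixir
defmodule ResolveExample do
  # Collapses the possible results of DateTime.from_naive/3 into one
  # datetime (or an error), using a policy chosen only for illustration.
  def resolve({:ok, datetime}), do: {:ok, datetime}

  # The wall time occurred twice (clocks went back): keep the earlier instant.
  def resolve({:ambiguous, first, _second}), do: {:ok, first}

  # The wall time never occurred (clocks went forward): keep the first
  # valid datetime after the gap.
  def resolve({:gap, _just_before, just_after}), do: {:ok, just_after}

  def resolve({:error, reason}), do: {:error, reason}
end

# "Etc/UTC" works with Elixir's default database, so no dependency is needed
# to try this out.
~N[2019-03-31 02:30:00]
|> DateTime.from_naive("Etc/UTC")
|> ResolveExample.resolve()
```

The same function works unchanged when a third argument such as Tz.TimeZoneDatabase is passed to DateTime.from_naive/3.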

Example bug 3:

DateTime.from_naive(~N[1891-03-15 00:00:00], "Africa/Algiers", Tzdata.TimeZoneDatabase)
{:ambiguous, #DateTime<1891-03-15 00:00:00+00:09 PMT Africa/Algiers>,
 #DateTime<1891-03-15 00:00:00+00:12 LMT Africa/Algiers>}

According to Tzdata, “1891-03-15 00:00:00+00:09 PMT” happens first, but that is wrong;
“1891-03-15 00:00:00+00:12 LMT” happens first, then the clock went backwards and the time zone abbreviation changed from LMT to PMT.

Using Tz:

DateTime.from_naive(~N[1891-03-15 00:00:00], "Africa/Algiers", Tz.TimeZoneDatabase)
{:ambiguous, #DateTime<1891-03-15 00:00:00+00:12 LMT Africa/Algiers>,
 #DateTime<1891-03-15 00:00:00+00:09 PMT Africa/Algiers>}

Example bug 4:

DateTime.from_naive(~N[1977-05-06 01:00:00], "Africa/Algiers", Tzdata.TimeZoneDatabase)
{:ambiguous, #DateTime<1977-05-06 01:00:00+02:00 CEST Africa/Algiers>,
 #DateTime<1977-05-06 01:00:00+01:00 WEST Africa/Algiers>}

For Tzdata, 1977-05-06 01:00:00 in Africa/Algiers is ambiguous;
however, a DST change happened at 1977-05-06 00:00:00, where the clock jumped forward by 1 hour (a gap between 00:00 and 01:00); so from 01:00 onwards there are no ambiguous dates or gaps.

Using Tz:

DateTime.from_naive(~N[1977-05-06 01:00:00], "Africa/Algiers", Tz.TimeZoneDatabase)
{:ok, #DateTime<1977-05-06 01:00:00+01:00 WEST Africa/Algiers>}

Example bug 5:

DateTime.from_naive(~N[2062-01-07 00:00:00], "Africa/Casablanca", Tzdata.TimeZoneDatabase)

** (RuntimeError) dynamic periods assume 2 rules per year

Using Tz:

DateTime.from_naive(~N[2062-01-07 00:00:00], "Africa/Casablanca", Tz.TimeZoneDatabase)
{:ok, #DateTime<2062-01-07 00:00:00+01:00 +01 Africa/Casablanca>}

Example bug 6:

DateTime.from_naive(~N[2064-01-20 02:00:00], "Africa/Casablanca", Tzdata.TimeZoneDatabase)

** (MatchError) no match of right hand side value: :min

Using Tz:

DateTime.from_naive(~N[2064-01-20 02:00:00], "Africa/Casablanca", Tz.TimeZoneDatabase)
{:gap, #DateTime<2064-01-20 01:59:59.999999+00:00 +00 Africa/Casablanca>,
 #DateTime<2064-01-20 03:00:00+01:00 +01 Africa/Casablanca>}

Example bug 7:

DateTime.from_naive(~N[2013-10-25 01:00:00], "Africa/Tripoli", Tzdata.TimeZoneDatabase)
{:ambiguous, #DateTime<2013-10-25 01:00:00+02:00 CEST Africa/Tripoli>,
 #DateTime<2013-10-25 01:00:00+01:00 CET Africa/Tripoli>}

There was no DST change on 2013-10-25 01:00:00 at Tripoli.

Using Tz:

DateTime.from_naive(~N[2013-10-25T01:00:00], "Africa/Tripoli", Tz.TimeZoneDatabase)
{:ok, #DateTime<2013-10-25 01:00:00+02:00 CEST Africa/Tripoli>}

Example bug 8:

DateTime.from_naive(~N[2013-10-25 02:00:00], "Africa/Tripoli", Tzdata.TimeZoneDatabase)
{:gap, #DateTime<2013-03-29 00:59:59.999999+01:00 CET Africa/Tripoli>,
 #DateTime<2013-10-25 03:00:00+02:00 EET Africa/Tripoli>}

This one is tricky. There was a DST change according to the following iana rule:

Rule | Libya | 2013 | only | - | Oct | lastFri | 2:00 | 0 | -

the local offset from standard time changed from 1 hour to 0.

However, the standard offset from UTC time changed as well: one hour was added.
That leads to a total offset difference of 0. Hence, there is no gap.
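The arithmetic can be checked with a tiny sketch. The maps below mirror the utc_offset / std_offset fields that Elixir's DateTime uses (in seconds); the values are read off the outputs in this example, and the module name OffsetSum is made up for illustration.

```elixir
defmodule OffsetSum do
  # Total offset from UTC in seconds: the zone's standard offset plus the
  # DST ("local") offset on top of it.
  def total(%{utc_offset: utc, std_offset: std}), do: utc + std
end

# Before the change: +01:00 standard time plus 1 hour of DST (CEST).
before_change = %{utc_offset: 3600, std_offset: 3600, zone_abbr: "CEST"}

# After the change: +02:00 standard time and no DST (EET).
after_change = %{utc_offset: 7200, std_offset: 0, zone_abbr: "EET"}

# Both totals are 7200 seconds, so the wall clock does not jump.
OffsetSum.total(before_change) == OffsetSum.total(after_change)
```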

Using Tz:

DateTime.from_naive(~N[2013-10-25 02:00:00], "Africa/Tripoli", Tz.TimeZoneDatabase)
{:ok, #DateTime<2013-10-25 02:00:00+02:00 EET Africa/Tripoli>}

Example bug 9:

DateTime.from_naive(~N[1941-04-18 23:00:00], "Europe/Belgrade", Tzdata.TimeZoneDatabase)
{:ok, #DateTime<1941-04-18 23:00:00+01:00 CET Europe/Belgrade>}

There is a gap at that time. The local offset from standard time moved from 0 to 1 hour.

Using Tz:

DateTime.from_naive(~N[1941-04-18 23:00:00], "Europe/Belgrade", Tz.TimeZoneDatabase)
{:gap, #DateTime<1941-04-18 22:59:59.999999+01:00 CET Europe/Belgrade>,
 #DateTime<1941-04-19 00:00:00+02:00 CEST Europe/Belgrade>}

Example bug 10:

DateTime.shift_zone(~U[2010-03-27 14:00:00Z], "Asia/Kamchatka", Tzdata.TimeZoneDatabase)
{:ok, #DateTime<2010-03-28 01:00:00+11:00 +11 Asia/Kamchatka>}

According to IANA’s records, the standard offset from UTC changed from 12 hours to 11 hours at
2010-03-28 02:00 standard time, which was 2010-03-27 14:00 UTC. That’s why Tzdata shows +11 above.
However, there is another rule that says that at 2010-03-28 02:00 standard time, the local offset from standard time changed from 0 to 1 hour. So all in all, the offset is not +11 but stays at +12.

Using Tz:

DateTime.shift_zone(~U[2010-03-27 14:00:00Z], "Asia/Kamchatka", Tz.TimeZoneDatabase)
{:ok, #DateTime<2010-03-28 02:00:00+12:00 +12 Asia/Kamchatka>}
38 Likes

Correct! For dynamically generated modules, the amount of data and clauses is the main reason for high compilation times. However, I have tried the tz library locally and the compilation times do not seem so bad. If compilation times get higher in the future, you can consider partitioning it. For example, you can compile a database especially for “Asia” or “America”. And if the lookup starts with “Asia/”, you forward the call to Tz.TimeZoneDatabase.Asia.some_fun. Gettext, for instance, generates one module per locale.
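A minimal sketch of that forwarding idea, assuming one module per area. The module names, the periods/1 function and the stub data are hypothetical; in a real build each area module would be generated at compile time from the IANA data.

```elixir
defmodule AreaDispatchSketch do
  # Hypothetical per-area modules holding stub data for illustration.
  defmodule Asia do
    def periods("Asia/Tokyo"), do: [%{utc_offset: 32_400, std_offset: 0, zone_abbr: "JST"}]
    def periods(_other), do: []
  end

  defmodule Europe do
    def periods(_any), do: []
  end

  # Route on the area prefix of the time zone name, so each area's data
  # lives in (and is compiled as) its own module.
  def periods("Asia/" <> _ = time_zone), do: Asia.periods(time_zone)
  def periods("Europe/" <> _ = time_zone), do: Europe.periods(time_zone)
  def periods(_unknown), do: {:error, :time_zone_not_found}
end

AreaDispatchSketch.periods("Asia/Tokyo")
```

Because the area modules are independent, they can also be compiled in parallel.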

Btw, good job on the library and on the bug fixes! I would recommend you to add a small section on the README explaining why it is different from tzdata - pretty much what you posted here - as I am planning to also link to tz from Elixir’s DateTime docs. :slight_smile:

I know for certain that having no built-in updates is a feature for some, especially for Nerves folks (IIRC). They would most likely benefit from compilation pruning too. For example, reject all time zone rules from before 2010, which is much easier to do when not relying on ETS. The module compilation approach may also provide faster lookups. It may be worth benchmarking.

14 Likes

Does this mean bugfixes will be locked to the current IANA db, and users will be forced to switch tz database if it changed between installation and a bugfix?

3 Likes

Well, just my 2c here: very good job on the library! But I like having dynamic updates, at least as an option.

Nowadays many systems are connected and can fetch updates without problems. And when your service is not a “centralized service” which can be easily updated, but the very same software installed in hundreds of locations, well having to update them for just new tz data is simply too much work, resulting in systems with outdated tz data.

3 Likes

Really good work!

My own feedback also includes the ability to choose between updating the db at runtime and having a mix task update the db locally (and recompile).

2 Likes

FYI you can disable auto-updating in tzdata: https://github.com/lau/tzdata#automatic-data-updates

4 Likes

As a quick note, you may want to include instructions for configuring Elixir to use your library for TZ related operations eg: https://github.com/lau/tzdata#getting-started
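For reference, Elixir itself lets you choose the default time zone database through the :time_zone_database key of the :elixir application, so a setup along these lines should be all that is needed (a sketch; it assumes the database module is Tz.TimeZoneDatabase, as used in the examples earlier in this thread):

```elixir
# In config/config.exs: make Tz the default database for DateTime functions.
config :elixir, :time_zone_database, Tz.TimeZoneDatabase

# Or set it at run time instead, for example when the application starts:
# Calendar.put_time_zone_database(Tz.TimeZoneDatabase)
```

Without either of these, the database can still be passed explicitly as the last argument, as in DateTime.from_naive/3 and DateTime.shift_zone/3.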

3 Likes

Done. Anyone, feel free to update the text if needed.

As every IANA file can contain time zone information from any continent (e.g. the file “europe” contains zone info for “America/Thule”, “Asia/Omsk”, etc.), I suppose you are suggesting that I create those modules dynamically with the “code & macros”-related APIs? It would have been so much easier if every time zone in the “europe” file were a “Europe/” time zone.

Just to make sure: the goal here is to reduce memory usage? Because that won’t decrease lookup time as time zone periods are queried from the most recent periods. It can also potentially decrease compilation time but it is annoying to integrate (i.e. I have to add some code to skip smartly). So not sure what is important for Nerves developers; memory?

What do you think of this configuration:
config :tz, :skip_time_zone_periods_before_year, 2010
Please suggest other naming if you think of a better one.

Do you mean they would be forced to upgrade the iana tz database if a bugfix comes after the installation of a new iana tz database update? How could that ever be a bad thing? I think that anyone always wants the latest tz data.

I did not consider such large decentralized systems. Let’s try to add dynamic updates then:)

Done.

3 Likes

In theory yes, in practice not so much. Say IANA fixes a certain incorrect tz definition, which an app currently depends on to work correctly. Your library, though, has fixed a far more wide-reaching bug. There might not be time yet to fix the issue with the time zone definition, but updating your library is crucial. As you correctly noted, you’d force the fix of the less important bug before the important one can be fixed.

Decoupling your library from IANA releases would allow the pull of the bugfix while staying on an old tz db version and also allow people to update to new versions of the tz database without any additional releases from your side.

4 Likes

Out of interest: have you tried using persistent_term instead of compiling modules? I know that changes there will be expensive (almost as expensive as runtime recompilation), but on the other hand the tzdb does not change that often.
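To make the suggestion concrete, here is a minimal sketch of storing and reading period data with :persistent_term (available since OTP 21.2). The key shape and the period map are hypothetical, not how tz stores its data.

```elixir
# Hypothetical key shape and stub period data, for illustration only.
key = {:tz_periods_sketch, "Europe/Brussels"}

periods = [
  %{from: 1_553_994_000, to: 1_572_138_000, utc_offset: 3600, std_offset: 3600, zone_abbr: "CEST"}
]

# Writes are the expensive part: replacing an existing key triggers a
# global garbage collection pass, so this suits rarely changing data.
:persistent_term.put(key, periods)

# Reads are constant-time and do not copy the term onto the caller's heap.
:persistent_term.get(key)
```

Unlike a compiled module, the data has to be put again on every VM start.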

3 Likes

On the other hand not enabling runtime updates does fix issues such as:

But I think I am in agreement with most here. It is good not to update the time zone databases at runtime by default, but it would be good to provide an optional way to do so.

@mathieuprog does tz work without access to a local directory? Since the time zone modules are compiled I would think so. I recently ran into this tzdata issue (that is closed but not fixed) since I am running from an escript (which does not have a priv directory):

And thanks for creating tz!

1 Like

What possible advantages do you see using persistent_term over compiling into maps and recompilation?

Yes. At runtime it does not access the file system (only during compilation). And thank you for sharing that limitation of escript, let’s always take that into account when new ideas come up.

1 Like

Well, simplified persistent term does exactly this. Recompile a module. But using pattern matching on function heads, rather than using a function that select from a map.

So, as I’m not aware of what your current implementation looks like, changing to persistent term might increase lookup performance while still being easily updatable at runtime.

There is one disadvantage though: it needs to be done on each startup, while a precompiled module can be, well, precompiled :wink:

Oh, I thought we were talking about a runtime-recompiled module…

The point is that you can do both on the BEAM: have a precompiled module and then recompile it when needed. So it was more out of interest than something useful for the implementation. However, recompiling the module with the tz data so that it uses pattern matching instead of a map lookup should be the nicest solution out there.

1 Like

Can you develop that idea? :grin: Currently the lookup is done by traversing the list of time zone periods (a period is a map with :from and :to entries) and checking whether the timestamp (the given date for which we want to find the period) falls within the period; it’s linear, starting from the most recent period. I can optimize this by adding guards to the generated functions that return the period maps, in order to divide up the periods; instead of

def periods("Europe/Brussels")

have

def periods("Europe/Brussels", timestamp) when timestamp > A and timestamp < B
def periods("Europe/Brussels", timestamp) when timestamp > C and timestamp < D
# and so on

Note that I think this is an optimization only for the lookup of very old dates; I wonder sometimes, who really needs that.

That will increase the function clauses though. Then I can also generate different function names per time zone and/or have a module per continent. There are a lot of parameters and possible optimizations.

However not sure what you meant by pattern matching instead of map lookup.
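For readers following along, the linear lookup described above can be sketched roughly like this. The module name and the stub data are made up for illustration; only the :from and :to keys come from the description, and the real period maps hold more fields.

```elixir
defmodule LinearLookupSketch do
  # Stub periods, most recent first; :from and :to are Unix seconds (UTC).
  # A real table would have many more entries per time zone.
  @periods [
    %{from: 1_572_138_000, to: 1_585_443_600, zone_abbr: "CET"},
    %{from: 1_553_994_000, to: 1_572_138_000, zone_abbr: "CEST"},
    %{from: 1_540_688_400, to: 1_553_994_000, zone_abbr: "CET"}
  ]

  # Walk the list from the most recent period and return the first one
  # containing the timestamp, or nil when none matches.
  def period_for(timestamp) do
    Enum.find(@periods, fn %{from: from, to: to} ->
      timestamp >= from and timestamp < to
    end)
  end
end

LinearLookupSketch.period_for(1_560_000_000)
```

The cost grows with how far back the date lies, which is why only lookups of old dates would benefit from splitting the list.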

1 Like

Here are some major updates for tz:

1. Reduced lookup time

Testing nearly 10 million past dates on my machine took around 430 seconds. I could reduce it to around 260 seconds. In the previous version, periods were looked up linearly as they were simply stored in a list. In the current version, this list of periods has been divided into a map, grouped by year. The data grew larger, but I could keep the compilation time relatively low by compiling the periods into modules per area, as @josevalim suggested (one module for “Europe/” time zones, one for “Asia/” time zones, etc.).
(Note that I tried to work with function clauses instead, but either compilation was too slow, or after decreasing clauses, lookup was slower, or it led to errors due to too many quoted blocks, …; changing the data structure seemed most efficient in terms of lookup/compilation time balance).
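Roughly, grouping by year could look like the sketch below. This is a simplified illustration under my own assumptions (module name, stub data, Unix-second bounds), not the actual data layout used in tz.

```elixir
defmodule YearIndexSketch do
  # Stub periods with Unix-second bounds (UTC); illustrative data only.
  @periods [
    %{from: 1_572_138_000, to: 1_585_443_600, zone_abbr: "CET"},
    %{from: 1_553_994_000, to: 1_572_138_000, zone_abbr: "CEST"},
    %{from: 1_540_688_400, to: 1_553_994_000, zone_abbr: "CET"}
  ]

  # Build %{year => [periods overlapping that year]} once, at compile time.
  # A period spanning several years is listed under each of them, which is
  # why the data grows compared to a flat list.
  by_year =
    @periods
    |> Enum.flat_map(fn period ->
      first = DateTime.from_unix!(period.from).year
      last = DateTime.from_unix!(period.to - 1).year
      for year <- first..last, do: {year, period}
    end)
    |> Enum.group_by(fn {year, _period} -> year end, fn {_year, period} -> period end)

  @by_year by_year

  # Jump straight to the timestamp's year, then scan only that short list.
  def period_for(timestamp) do
    year = DateTime.from_unix!(timestamp).year

    @by_year
    |> Map.get(year, [])
    |> Enum.find(fn %{from: from, to: to} -> timestamp >= from and timestamp < to end)
  end
end

YearIndexSketch.period_for(1_560_000_000)
```

The lookup cost no longer depends on how many periods exist in total, only on how many overlap a single year.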

2. Performance tweaks

Two environment options may be given to tz to tweak performance.

-> Decrease period lookup time for dates in the future:

config :tz, build_periods_with_ongoing_dst_changes_until_year: 20 + NaiveDateTime.utc_now().year

Looking up periods for far-future dates for a time zone that has DST changes is expensive, as tz has to dynamically compute those periods. It is now possible to specify until what year periods have to be pre-compiled. This does not affect period lookup time for past periods :+1:, as periods are now stored into a map grouped by year; in the previous version, this option would drastically increase lookup time for past dates, as the search was linear (i.e. the more data you added, the slower it was).

-> Decrease compilation time by rejecting old periods that are never going to be looked up:

config :tz, reject_time_zone_periods_before_year: 2010

These options have been documented in the readme file with their defaults.
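To illustrate what rejecting old periods amounts to, here is a rough sketch with made-up names and data; the real filtering happens inside tz at compile time and may work differently.

```elixir
defmodule PruneSketch do
  # Drop every period that ended before January 1st (UTC) of the given
  # year. Periods are assumed to carry a :to bound in Unix seconds.
  def reject_before_year(periods, year) do
    {:ok, cutoff, _offset} = DateTime.from_iso8601("#{year}-01-01T00:00:00Z")
    cutoff_unix = DateTime.to_unix(cutoff)

    Enum.reject(periods, fn %{to: to} -> to <= cutoff_unix end)
  end
end

periods = [
  %{to: 1_000_000_000, zone_abbr: "OLD"},
  %{to: 1_572_138_000, zone_abbr: "CEST"}
]

PruneSketch.reject_before_year(periods, 2010)
```

Fewer periods means less data to embed in the compiled modules, at the price of not being able to resolve dates before the cutoff.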

7 Likes

You want automatic updates. I listened.

tz comes now with optional automatic updates.

To enable automatic updates, add Tz.UpdatePeriodically as a child in your supervisor:

{Tz.UpdatePeriodically, []}

If you do not wish to update automatically, but still wish to be alerted about new IANA updates, add Tz.WatchPeriodically as a child in your supervisor:

{Tz.WatchPeriodically, []}

This will simply log to your server if any update is found.
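In a typical application this would go into the supervision tree, for instance as sketched below. MyApp.Application and MyApp.Supervisor are placeholder names; only the child specs come from the instructions above.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Recompiles the time zone modules when a new IANA release is found.
      # Swap in {Tz.WatchPeriodically, []} to only log new releases.
      {Tz.UpdatePeriodically, []}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```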

mint, a pure Elixir HTTP client, and castore need to be added to the dependency list.

This addition to the library does not slow down compilation time nor lookup time.

How does it work? (Also, need your feedback)

When a new available IANA tz database is found, the modules are recompiled with the new time zone data.

I might need some review for this piece of code:

case :erlang.get(:elixir_compiler_pid) do
  :undefined -> &Task.async/1
  _ -> &Kernel.ParallelCompiler.async/1
end

I use :erlang.get(:elixir_compiler_pid) to know whether we are in compile-time or run-time. If compile-time, modules (per time zone area) are compiled in parallel using the function Kernel.ParallelCompiler.async/1. If run-time, modules are compiled in parallel using the function Task.async/1.

Modules are purged after recompilation:

Module.create(module, contents, Macro.Env.location(__ENV__))
:code.purge(module)

Does this seem alright? Updates work fine for me this way.
Thank you for your feedback!

4 Likes

This seems generally sound, although on the HTTP front I think your best bet is to create a Tz.HTTPClient behaviour and make the dependency on :mint and :castore optional. This way people can plug in their own HTTP client if they’re already using one in particular. I can’t comment on the recompilation strategy, sorry.
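Something along these lines, as a rough sketch. The callback name and shape, the :tz_sketch application name and the :http_client key are all made up for illustration, and the stub performs no network I/O; a real implementation would wrap whichever HTTP client the user already depends on.

```elixir
defmodule HTTPClientSketch do
  # Hypothetical behaviour: a single callback that fetches the body at a URL.
  @callback get(url :: String.t()) :: {:ok, binary()} | {:error, term()}

  # The implementation is looked up at run time, falling back to the stub.
  def client, do: Application.get_env(:tz_sketch, :http_client, HTTPClientSketch.Stub)

  def fetch(url), do: client().get(url)
end

defmodule HTTPClientSketch.Stub do
  @behaviour HTTPClientSketch

  # Stand-in implementation that returns a canned body.
  @impl true
  def get(_url), do: {:ok, "stub body"}
end

HTTPClientSketch.fetch("https://example.com/tzdata")
```

With this shape, the library only needs mint and castore when the default client is used.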

6 Likes