Call for proposals: time zone support in Elixir

TL;DR: The Elixir Core team is announcing a call for proposals to extend support for time zones in Elixir’s standard library.

The reasoning

Elixir’s relationship with dates, times and calendars used to be bumpy. Fortunately, in the 1.3 release we got the built-in calendar types (DateTime, NaiveDateTime, Date and Time) with a thought-through and hopefully future-proof structure and interface. In the following releases, the number of defined APIs grew and the support for various features and algorithms increased.

Today, all the libraries (timex, calendar, ecto and more) use the built-in structs and define various algorithms using them, which means they are fully interoperable and there’s no longer the issue of incompatible libraries. The transition is not yet complete, though.

Even though the DateTime struct semantically defines a point in time in a concrete time zone, the Elixir standard library does not ship with a time zone database. This means that the functions in the DateTime module can only operate on structs in the Etc/UTC time zone, requiring most libraries to include some third-party solution to deal with datetimes in other time zones.

There are many reasons not to ship with a time zone database. The two primary ones are the increase in size of the standard Elixir release and the fact that the time zone database is updated often, which would tie the Elixir release schedule to the time zone database update schedule.

The solution

Nonetheless, support for time zones is something we, the Elixir Core team, would like to include in Elixir’s standard library. Unfortunately, we lack the required expertise and, above all, the required time to properly design and implement this feature. That’s why we’ve decided to reach out to you, the community, and ask for help. We’re asking you to propose possible solutions within the following design constraints and help us push Elixir forward.

Design constraints

Because we still don’t want to ship with a time zone database in Elixir itself, we want the time zone library to be pluggable. To achieve this, the solution needs two components: an interface for such a library defined in Elixir itself (most probably a behaviour with one or more @callbacks) and a library implementing the behaviour and providing the time zone database to the standard library.
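To make the two components concrete, here is a minimal sketch of what "a behaviour plus an implementing library" could look like. Every name in it (TimeZoneDatabaseSketch, period_for_utc/2, UtcOnlyDatabase, the shape of the period map) is made up for illustration and is not part of Elixir or of any existing package.

```elixir
defmodule TimeZoneDatabaseSketch do
  @moduledoc "Hypothetical interface that would live in Elixir itself."

  # Illustrative shape only; a real proposal would have to define this.
  @type period :: %{utc_offset: integer, std_offset: integer, zone_abbr: String.t()}

  # One narrow question, one concrete answer: which period applies to
  # this zone at this UTC instant (given in gregorian seconds)?
  @callback period_for_utc(time_zone :: String.t(), gregorian_seconds :: non_neg_integer) ::
              {:ok, period} | {:error, :time_zone_not_found}
end

defmodule UtcOnlyDatabase do
  @moduledoc "Hypothetical provider package; it only knows Etc/UTC."
  @behaviour TimeZoneDatabaseSketch

  @impl true
  def period_for_utc("Etc/UTC", _gregorian_seconds) do
    {:ok, %{utc_offset: 0, std_offset: 0, zone_abbr: "UTC"}}
  end

  def period_for_utc(_time_zone, _gregorian_seconds) do
    {:error, :time_zone_not_found}
  end
end
```

The standard library would only ever call the callback, so a provider backed by compiled modules, ETS or the operating system could be swapped in without touching calling code.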

We don’t want the interface to be based on the current shape of the tzdata library. It has broad interfaces that often return collections of data you need to further process to retrieve the desired information. This means that for many operations a lot of data needs to be copied from the supporting ETS tables, making the library slower than it should be. The interface of the proposed library should be focused on speed and offer focused functions that give concrete answers to very specific queries.

When it comes to concrete implementations of the time zone database library there are many possibilities:

  • compile the database into a module using macros;
  • store in ets tables and update dynamically;
  • call some C utility;
  • and possibly more.

Importantly, we don’t want to focus on those yet. We want to look at what functions to define in the standard library to take advantage of the time zone database and what interface we need from a time zone database provider. The time for concrete implementations will come later.

Proposal

The proposal should include 2 things:

  • new functions in the Elixir standard library leveraging the time zone database
  • a way to provide the time zone database to the Elixir standard library through a package.

The proposals should be RFC-style. This means they should be actionable and present concrete ideas and APIs, not just talk about principles and possibilities. A proposal should include signatures, typespecs and documentation drafts for all the new modules, functions and callbacks.

We intend Elixir 1.8 (estimated January 2019) to ship with those extensions. Thank you!

Another great example of how the core team actively promotes community involvement

…and can I just say, congrats on joining the Elixir Core Team Michał :tada:

First and foremost, I’d like to ask: What is the Elixir Core team’s current view on handling leap seconds? Would leap second handling be expected to be part of this new system, would it be ‘opt in’ based on what timezone functionality would be created, or would it explicitly not be something we’d want to consider at this time?

And, related: What functionality should be kept in Calendar and what should be part of timezones? Would timezones play a role when using a calendar other than Calendar.ISO?

And also from me, congrats on joining the Core team, Michał! :cake:

I would expect timezone support in calendars other than Calendar.ISO. I am nearing completion of a lib supporting other calendars (ISO Week, 445/454/544, various business calendars) that are based upon the proleptic Gregorian calendar but are different. The concept of time, for these calendars, is the same as Calendar.ISO.

I believe those questions are exactly what the proposals should answer. :slight_smile: Although it feels like leap seconds are a bit orthogonal to the timezone concerns? In any case it is unclear if it will be up to the calendar to perform the timezone lookup or if it will be handled separately.

This is not yet a proposal, but more a list of known behaviour that timezones exhibit, and which will probably have to be modeled in one way or another to be able to make a proposed Elixir time zone support system useful:

  • In essence, timezones are an offset from a given ‘base’ time. Usually we talk about these offsets relative to UTC time itself (but this of course only makes sense w.r.t the ISO8601 calendar)
  • However, how far timezones are offset might change, meaning that local wall clock time might have times that ‘occur twice’, as well as sections of time that ‘do not appear at all’.
  • When timezones change their offset also varies: The most common form of this is known as daylight saving time (which some locations start or end at midnight, and others at two in the morning, for instance; also, Europe and the US perform their DST shift a couple of weeks apart). However, there are other forms, as well as political reforms that alter the offset permanently for a given geographical region.
  • Above rules mean that when someone creates a datastructure representing a future datetime, they might:
    1. keep it as-is, which will mean it is influenced by (political) changes in timezone rules between ‘now’ and the specified datetime.
    2. transform it to an absolute (i.e. UTC) datetime, which means it will not be influenced by these.
      Whichever one of these is more reasonable depends on the context at hand, so this choice needs to be open to the user (the application programmer).

Please tell me if you see an oversight, mistake or omission in here and I’ll update the list.

Yes, to be more precise about the first two points: any DateTime has enough information to convert back to UTC. However, we are unable to receive a NaiveDateTime and a timezone and convert it to a DateTime, exactly because we lack the timezone information that would tell us if a wall clock time occurs once, twice or not at all in that timezone. This is the smallest proposed change I can see being done to the stdlib.
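To illustrate the "once, twice or not at all" cases, here is a small sketch. It uses hand-written periods for Europe/Copenhagen in 2018 (DST began at 01:00 UTC on 25 March and ended at 01:00 UTC on 28 October); the module, its function and the period map shape are hypothetical, and a real provider would derive the data from the IANA database.

```elixir
defmodule WallClockSketch do
  # Transition instants in UTC, as gregorian seconds.
  @spring :calendar.datetime_to_gregorian_seconds({{2018, 3, 25}, {1, 0, 0}})
  @autumn :calendar.datetime_to_gregorian_seconds({{2018, 10, 28}, {1, 0, 0}})

  # Illustrative data: only the two 2018 transitions are real. The outer
  # bounds (0 and :max) are placeholders, not historical facts.
  @periods [
    %{from_utc: 0, until_utc: @spring, offset: 3600, zone_abbr: "CET"},
    %{from_utc: @spring, until_utc: @autumn, offset: 7200, zone_abbr: "CEST"},
    %{from_utc: @autumn, until_utc: :max, offset: 3600, zone_abbr: "CET"}
  ]

  # Returns every period whose wall-clock range contains the naive datetime.
  def periods_for_wall(%NaiveDateTime{} = naive) do
    wall = naive |> NaiveDateTime.to_erl() |> :calendar.datetime_to_gregorian_seconds()

    Enum.filter(@periods, fn period ->
      wall >= period.from_utc + period.offset and before_end?(wall, period)
    end)
  end

  defp before_end?(_wall, %{until_utc: :max}), do: true
  defp before_end?(wall, %{until_utc: until, offset: offset}), do: wall < until + offset
end
```

With this data, ~N[2018-01-01 00:00:00] matches one period (CET), ~N[2018-03-25 02:30:00] matches none because it falls in the spring-forward gap, and ~N[2018-10-28 02:30:00] matches two (CEST, then CET).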

A post was split to a new topic: Elixir and RFCs

Last post before bed-time, I hope I am still awake to make sense:

I currently think we need the following functionality:

First and foremost, to ensure that the timezones can be used with any calendar, they will work on the Calendar.iso_days type:

@type day_fraction :: {parts_in_day :: non_neg_integer, parts_per_day :: pos_integer}
@type iso_days :: {days :: integer, day_fraction}

  1. Given the current iso_days in the ‘zero’ time (the timezoneless, non-offset time; for ISO8601 this would be UTC), return the day_fraction this timezone is offset from the ‘zero’ time.
  2. Given an iso_days in the current timezone, return a list of zero, one or two (is more possible?) day_fraction elements that are offsets from this current timezone iso_days to create possible iso_days in the ‘zero’ time.
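As a rough sketch of the arithmetic this implies, the following hypothetical helper shifts an iso_days value by an offset and normalises the result. It takes the offset as signed seconds, since day_fraction as typed above cannot represent a negative offset, and it assumes a day of 86400 seconds; the module and function names are made up.

```elixir
defmodule IsoDaysOffsetSketch do
  # Assumption: offsets are expressed against an 86400-second day.
  @seconds_per_day 86_400

  # Shifts {days, {parts, parts_per_day}} by a signed number of seconds.
  def shift({days, {parts, parts_per_day}}, offset_seconds) do
    # Bring both values onto a common denominator.
    unit = parts_per_day * @seconds_per_day
    total = parts * @seconds_per_day + offset_seconds * parts_per_day

    # Carry whole days and keep the fraction non-negative.
    day_carry = Integer.floor_div(total, unit)
    remainder = Integer.mod(total, unit)

    # Reduce the fraction so results stay small and comparable.
    divisor = Integer.gcd(remainder, unit)
    {days + day_carry, {div(remainder, divisor), div(unit, divisor)}}
  end
end
```

For an arbitrary day number, IsoDaysOffsetSketch.shift({737060, {0, 86400}}, 3600) gives {737060, {1, 24}}, and shifting {737060, {0, 1}} by -3600 seconds rolls back to {737059, {23, 24}}.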

Besides these two pieces of core functionality, we might want to standardize some metadata-fields that timezone-implementations should specify, like:

  • name
  • geographical region
  • date range during which this timezone is known to be valid (TZdata, for instance, only keeps current time zone info and therefore might give wrong results for historical dates; we might want to generate warnings when a timezone is used outside of its range.)

Exactly from where these calls should be initiated, what modules are in control of the flow, and how it should be wrapped is, I presume, part of the concrete implementation details that are to be considered later.

But from user-land, I think these might be exposed as follows
(Here goes the initial attempt at naming things; be gentle :stuck_out_tongue_winking_eye: ):

DateTime.change_timezone(date_time :: DateTime.t, timezone_name :: String.t) :: {:ok, DateTime.t} | {:error, error}

(This call at least fails if the specified string does not identify any known timezone. Are there other potential sources of failure?)

NaiveDateTime.suitable_datetimes_for_timezone(naive_date_time :: NaiveDateTime.t, timezone_name :: String.t) :: {:ok, [DateTime.t]} | {:error, error}

(This call at least fails if the specified string does not identify any known timezone. If the timezone is found but the NaiveDateTime does not indicate any valid time, the result list will be empty. Are there other potential edge cases?)

That’s all for now, more tomorrow. I look forward to your critiques!

4 posts were split to a new topic: Should Calendar types and timezones be part of Elixir standard library?

Primary resources for leap seconds:

I think having an API to check out how the operating system handles the leap seconds will be useful, such as accessing the tzdata or timezone database (and the current timezone environment).
OTOH, having something else than the operating system definition might be implementing the same things twice with different interpretation and might be troublesome.

Also, UTC is affected by the leap seconds, and not absolute. If you want a time sequence of monotonic increase in the same pace at all times without any human society intervention, TAI should be a choice. OTOH, the operating system clocks of most computers in the world are synchronized with UTC.

On FreeBSD, tzdata should be recompiled if the operating system configuration is changed between handling and not handling leap seconds.
So leap second handling is not orthogonal to the timezone database; rather, tzdata depends on the leap second handling defined in the operating system.

This is interesting. However, we already keep the offsets in the DateTime struct in ISO seconds and we wouldn’t be able to change it due to backwards compatibility. Thoughts?

I would say that this will be up to the concrete time zone info provider library. In this way it’s outside the scope of the proposal. One possible implementation I can imagine is reading the time zone info from the operating system.
That said, I agree that the leap second issue is parallel, but closely related to time zones, so the proposals could try to address both of them.

I am looking forward to having this behaviour in the Elixir standard library - something that has been planned for a long time, and I think there was even talk about having it in Elixir version 1.4 or 1.5.

This sounds like it is based on a misunderstanding.

The Tzdata library has a bunch of different public functions. However they are not all needed to simply do timezone calculations. For instance you can get a list of all time zones, which is useful in some situations. But for the core calculations you mostly need one function.

Example:

gregorian_seconds = {{2018,01,01},{00,00,00}} |> :calendar.datetime_to_gregorian_seconds()
periods = Tzdata.periods_for_time("Europe/Copenhagen", gregorian_seconds, :wall)

Since there is no ambiguity for that wall time this returns just one period which has the abbreviation, standard offset and UTC offset needed to make a DateTime struct.
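As a sketch of that last step, the following shows how a single period could be combined with a naive datetime into a DateTime struct. The period map is written by hand in the shape of the behaviour's time_zone_period (only utc_off, std_off and zone_abbr are used) rather than fetched from Tzdata, and the module and function names are hypothetical.

```elixir
defmodule PeriodToDateTimeSketch do
  # Combines a wall-clock NaiveDateTime with the offsets of one period.
  def build(%NaiveDateTime{} = naive, time_zone, %{
        utc_off: utc_off,
        std_off: std_off,
        zone_abbr: zone_abbr
      }) do
    %DateTime{
      year: naive.year,
      month: naive.month,
      day: naive.day,
      hour: naive.hour,
      minute: naive.minute,
      second: naive.second,
      microsecond: naive.microsecond,
      calendar: naive.calendar,
      time_zone: time_zone,
      zone_abbr: zone_abbr,
      utc_offset: utc_off,
      std_offset: std_off
    }
  end
end

# Hand-written stand-in for the single period returned for this wall time.
period = %{utc_off: 3600, std_off: 0, zone_abbr: "CET"}
date_time = PeriodToDateTimeSketch.build(~N[2018-01-01 00:00:00], "Europe/Copenhagen", period)
```

DateTime.to_iso8601(date_time) then yields "2018-01-01T00:00:00+01:00". When the lookup returns zero or two periods, the caller has to decide how to resolve the gap or the ambiguity before building the struct.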

Here is a behaviour that is not released yet, but one that the Tzdata library implements:

  @type gregorian_seconds :: non_neg_integer()
  @type time_zone_period_limit :: gregorian_seconds() | :min | :max
  @type time_zone_period :: %{
          utc_off: Calendar.utc_offset(),
          std_off: Calendar.standard_offset(),
          zone_abbr: Calendar.zone_abbr(),
          from: %{
            standard: time_zone_period_limit,
            utc: time_zone_period_limit,
            wall: time_zone_period_limit
          },
          until: %{
            standard: time_zone_period_limit,
            utc: time_zone_period_limit,
            wall: time_zone_period_limit
          }
        }

  @callback periods_for_time(Calendar.time_zone(), gregorian_seconds, :wall | :utc) :: [
              time_zone_period
            ] | {:error, :not_found}

Both Tzdata 0.1.x and 0.5.x versions implement that behaviour. So you can either have the newer version that can update automatically or the 0.1.x versions that require manual updates.

The “from” and “until” part are useful when there is a gap in wall time during e.g. “spring forward”. The “standard” part is not needed for most calculations, and I would lean towards excluding the “standard” part of from/until from a native Elixir behaviour.

The 0.1.x versions of Tzdata compile the data into modules using macros. Once compiled and loaded, it is quite fast. The negative part of that is memory usage during compilation, but more importantly it requires compiling again whenever new data is needed. That is akin to recompiling Postgrex every time someone writes to a Postgres database.

About leap seconds: I also think there should be an interface/behaviour for that, either as part of the same behaviour or as one separate from the one for time zones. Leap seconds are needed in order to verify UTC datetimes. So in order to fully support ISO8601 and UTC they are needed. And the tz database from IANA has leap second information included, so if we want to support both UTC and timezones, it makes sense to not ignore the leap second information provided.

@Lau thank you for that information!

Does anyone know what the story is with regard to checking if the OS clock already compensates for leap seconds (and how)?

@josevalim The fact that current datetimes already store the offset in an ISO8601 format indeed means that it is a lot harder to write backwards-compatible extensions that are also calendar-agnostic. I will give this matter some thought, but it will probably be difficult to keep changes general, compatible and clean all at once.

Well, to be fair, we could assume that they are always part of 86400. We just need to keep it in mind when doing computations to other timezones.

Yes. To be exact, this is already part of the implicit contract of DateTime. I remember that we talked about the notion of day-subdivision systems in the calendar discussion topic, where multiple subdivisions (like the Hebrew and Arabic systems) are just not compatible with 86400 seconds. But since these systems both see limited use, and I presume that especially for timezones this is fairly uncommon (because AFAIK – do prove me wrong – all common timezones are offset from ISO8601/UTC), it might not at all be unreasonable to keep the Elixir standard library implementation limited to ISO8601-‘ish’ timezones.

Side Note: The only backward-compatible way to alter the structure that is represented that I can currently think of, would be to introduce a new struct with a new name (like ‘ProperDateTime’) that therefore no library could have written code against, which means that its implementation might be very different than the current DateTime struct. Immediate drawback: All current libraries do not work at all with this new structure.

I ran some benchmarks on the compiled version of tzdata (version 0.1.201805)

Operating System: macOS
CPU Information: Intel® Core™ i7-7920HQ CPU @ 3.10GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.6.4
Erlang 20.3.4

Name                                        ips        average  deviation         median         99th %
periods_for_time_for_reykjavik_tz       12.51 M      0.0800 μs   ±357.97%      0.0700 μs        0.21 μs
periods_for_time_by_utc_london           5.85 M       0.171 μs  ±2071.47%        0.20 μs        0.30 μs
periods_for_time_by_utc_cph              4.20 M        0.24 μs  ±1601.73%           0 μs           1 μs
periods_for_time_by_wall_cph             3.66 M        0.27 μs  ±1575.91%           0 μs           1 μs

The benchmarking tool says a median of 0 μs for some of them :thinking:

About the system clock and leap seconds: Different systems handle them differently. Unix time does not really support UTC. So there are different strategies to handle them: skip back a second to essentially repeat the 60th second (xx:59:59 happening twice). Or make each second slightly longer on a leap second day to compensate for the extra second.

I have read about some systems running on TAI instead of UTC. But from an Erlang point of view I haven’t seen anything about it handling TAI - it also uses UTC. I’m not sure but maybe if you set your system clock to run TAI instead of UTC, Erlang would think that it was UTC.
