Calendars & Calendar Conversions

As the talk about calendars, differences between calendars and the difficulties of converting dates and times from one calendar to the other got a little bit out of hand in the topic of the Jalaali calendar library, this discussion can continue here.

The problems with calendars

(a short summary of the discussion so far)

  • Calendars are often irregular in an unpredictable way: Leap days, leap seconds, next month only starting if the moon is visible, etc.
  • Different calendars use different measures (such as different times at which one day ends and the next one starts: midnight, noon, sunrise, sunset?)
  • The OS clock in our computers ‘counts seconds’ but does not handle leap seconds, which complicates the conversion between OS (POSIX) time and datetimes in other calendar systems.

Now, of course: How would it be possible to overcome these problems as well as possible?

  • We can look at already-existing calendar implementations such as Joda Time.
  • As @kip noted, Dershowitz and Reingold described conversions between 23 calendars by using a fixed date representation. (He currently has an open pull request to add an integer calendar representation to the core.)
  • This representation might be enhanced by choosing a smaller denomination than integer ‘days’ to make the results truly unambiguous and allow conversions between calendars whose days start at different times.
  • One such intermediate representation might be TAI (International Atomic Time), as it is one of the few calendars that is monotonically increasing and computationally simple.
  • However, converting OS time to TAI again requires knowledge of leap seconds, which means that results will be inaccurate for dates more than about six months from now, as leap seconds are only announced roughly that far ahead.
  • Also, precision will diverge when we go too far in the past or future in any case, due to small irregularities or inaccuracies between calendars.

Calendars are hard (but interesting!). Let the discussion on how an accurate calendarium (for Elixir?) might be built continue!

@Qqwy, thanks for starting this thread. And @alisinabh, apologies for hijacking your thread. From the earlier conversations and @qqwy’s summary above, I believe the consensus so far is:

  1. A calendar conversion mechanism in the Calendar behaviour is a good idea
  2. That the conversion mechanism should cater for dates and times (i.e. conversion of Date, NaiveDateTime and DateTime)
  3. Desirably the mechanism preserves monotonicity of time

The candidates I have seen in discussion so far are:

Julian Date (JD)
This is well understood, especially in astronomy circles. It incorporates an integral number of days since an epoch. The epoch depends on which version of JD is chosen, but in the original version Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC in the proleptic Julian calendar (November 24, 4714 BC in the proleptic Gregorian calendar). The fractional part represents the fraction of a day since noon.

Plus: well understood, incorporates date and time, conversions have good precision
Minus: float format means time conversions are approximate and therefore whilst quite precise, cannot be assumed to be accurate in the way formal time systems are accurate.

Rata Die
This is the mechanism used by Dershowitz and Reingold. This is essentially the same mechanism as Julian Day but with an epoch of 1 January 0001. In their book Calendrical Calculations, RD is sometimes used as an integer meaning the number of days since the epoch - and this is what I originally proposed in order to avoid the issues of time conversion. An alternative form uses the same approach as JD, using the fractional part of a float to denote the fraction of a day elapsed (counted from midnight rather than from noon).

Plus: Mostly the same as JD. According to Wikipedia this is used by Go, Rexx and .Net. Another benefit is that D&R have described and tested the algorithms for 23 calendar types.
Minus: Same as JD, and also it seems to be less familiar than JD
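
To make the relation between the two day counts concrete, here is a minimal sketch for Gregorian dates only. It leans on Erlang's :calendar.date_to_gregorian_days/3, which counts from 0000-01-01 as day 0. The module name FixedDay and both function names are made up for illustration; this is not a proposed API.

```elixir
defmodule FixedDay do
  # :calendar.date_to_gregorian_days/3 treats 0000-01-01 (proleptic Gregorian)
  # as day 0, while Rata Die treats 0001-01-01 as day 1. Year 0 has 366 days.
  @rd_offset 365
  # Julian Day Number of the noon that falls within RD day 0.
  @jdn_offset 1_721_425

  # Integer Rata Die for a proleptic Gregorian date.
  def rata_die(year, month, day) do
    :calendar.date_to_gregorian_days(year, month, day) - @rd_offset
  end

  # Integer Julian Day Number (the JD value at noon of the given date).
  def julian_day_number(year, month, day) do
    rata_die(year, month, day) + @jdn_offset
  end
end
```

With this, FixedDay.rata_die(1970, 1, 1) gives 719_163 and FixedDay.julian_day_number(2000, 1, 1) gives 2_451_545, so either count can be derived from the other by a constant offset.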

Unix timestamp
A common timestamp format, so well understood. As long as it is stored in at least 64 bits (Elixir integers are arbitrary-precision), the range is enough to cater for all practical date/time ranges.

Plus: well understood epoch of January 1, 1970 Gregorian; many other library functions that can work directly on Unix timestamps. Precise in that only integer arithmetic is required whereas JD and Rata Die require float arithmetic.

Minus: Not monotonic due to leap second issues

TAI (Temps Atomique International)
Monotonic clock which underpins the UTC time standard, which means the format is standardised (several formats are available, with accuracy from seconds down to nanoseconds depending on needs).

Plus: Monotonic
Minus: Difficult to compute, requires calculation of drift from UTC and a leap second table. Must adjust each time a leap second is declared.

This is clearly not a definitive list, nor a complete explanation. Given that the primary discussion here is about calendar conversion, I would propose JD as the basis of conversion since that is familiar to calendarists, allows calculation of both arithmetic and astronomical calendar conversions and is computationally quite easy to calculate. Clearly I’m also ok with Rata Die (using the float form to represent time) since there is a solid body of work to be leveraged, but that is still true using JD. I think that, despite the benefits, TAI is too complicated to calculate and to maintain.

Could we use JD or RD but represent it as a tuple where the first element is the number of days and the second element is the number of microseconds elapsed in that day?
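
A rough sketch of what such a tuple could look like when built from a POSIX timestamp, assuming every day has exactly 86_400_000_000 microseconds (so leap seconds are ignored, as POSIX time does). The module name DayTuple and its functions are hypothetical.

```elixir
defmodule DayTuple do
  @microseconds_per_day 86_400_000_000
  # Rata Die of 1970-01-01, the Unix epoch.
  @unix_epoch_rd 719_163

  # Integer.floor_div/2 and Integer.mod/2 round towards negative infinity, so
  # timestamps before 1970 still get a non-negative in-day part.
  def from_unix_microseconds(unix_us) do
    {@unix_epoch_rd + Integer.floor_div(unix_us, @microseconds_per_day),
     Integer.mod(unix_us, @microseconds_per_day)}
  end

  # Inverse of from_unix_microseconds/1.
  def to_unix_microseconds({rd, us_in_day}) do
    (rd - @unix_epoch_rd) * @microseconds_per_day + us_in_day
  end
end
```

Only integer arithmetic is involved, so the round trip is exact: DayTuple.from_unix_microseconds(-1) gives {719_162, 86_399_999_999}, and converting that back gives -1 again.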

I think we could, as long as we define a very clear translation of day fraction ↔ microseconds. This might be as simple as saying ‘there are always 86_400_000_000 microseconds in a JD (or RD)’. (86_400_000_000 == 24 * 60 * 60 * 1_000_000)
This does mean that extra care needs to be taken when converting to/from UTC datetimes because of the afore-discussed leap second troubles:
On days at which UTC has a leap second, the conversion of a UTC microsecond is not 1:1 to a JD/RD microsecond, but rather 86401 : 86400 (a UTC day with a leap second contains 24 * 60 * 60 + 1 seconds). Most timestamps with microsecond integer precision will have to be rounded when doing this conversion.

As the gcd of 86400 and 86401 is 1, if we want to use an integer base in which no rounding would occur, we need to multiply it with the common base (86400 * 86401) == 7465046400 first.

So, if we want to have an exact (monotonically increasing) result, we’d need to store microseconds * 7465046400 in the second tuple field.
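
The arithmetic above can be checked with a small sketch. It assumes the in-day amount is stored as 'microseconds * 7465046400' units, so a full day is always 86_400_000_000 * 7_465_046_400 units, whatever the number of UTC seconds in it. The module name LeapSafeFraction and its functions are invented for this illustration.

```elixir
defmodule LeapSafeFraction do
  @common_base 86_400 * 86_401
  @units_per_day 86_400_000_000 * @common_base

  def units_per_day, do: @units_per_day

  # Size of one UTC microsecond in day-fraction units, for a UTC day that
  # contains `seconds_in_day` seconds. The division has no remainder for
  # either allowed value, which is the whole point of the common base.
  def units_per_microsecond(seconds_in_day) when seconds_in_day in [86_400, 86_401] do
    div(@units_per_day, seconds_in_day * 1_000_000)
  end

  # Converts a microsecond-of-day on a UTC day into day-fraction units.
  def from_utc_microsecond(us_in_day, seconds_in_day) do
    us_in_day * units_per_microsecond(seconds_in_day)
  end
end
```

On a normal day one microsecond is 7_465_046_400 units, on a leap-second day it is 7_464_960_000 units, and in both cases the last microsecond of the day ends exactly at units_per_day, so no rounding is needed.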

This does sound like a bit of hard work, but the nice thing about using a {day, in_day_amount} tuple is that the answers will always be exact and fast to convert to- and from UTC on any non-leap second days (Using for instance TAI as intermediate standard we would not be able to do this as in that case, every datetimestamp past a leap second needs to keep track of that leap second).
I believe we’d be able to claim microsecond exactness for all times on days except the June 30ths and December 31sts that are more than 6 months in the future; during those days, this calculation might at most be (amount_of_years_more_than_six_months_in_the_future * 2 * 86401) / 86400 seconds off.

(note: it theoretically is also possible that there are leap second deletions, in which case a UTC day might only have 86399 seconds. This has never happened during the past 45+ years that leap seconds have been part of the internationally used calendar, but if we’d want to support it, we’d need 86399 * 86400 * 86401 as base.)

This kind of intra-day amount would also simplify conversions to and from calendars that have a different (but well-defined) starting point for their days, as these would convert to static offsets in the intra-day amount.
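
As a sketch of such a static offset, here is the midnight-based RD to noon-based JD case, using plain microseconds as the in-day unit for brevity. The names DayStart, rebase/3 and rd_to_jd/1 are hypothetical, and the only fact relied on is JD = RD + 1_721_424.5.

```elixir
defmodule DayStart do
  @microseconds_per_day 86_400_000_000

  # Re-expresses a {day, microsecond} tuple under a different day numbering
  # and day start: `day_offset` shifts the day count, `start_offset_us` shifts
  # the in-day amount. Any overflow carries into the day count.
  def rebase({day, us_in_day}, day_offset, start_offset_us) do
    total = us_in_day + start_offset_us

    {day + day_offset + Integer.floor_div(total, @microseconds_per_day),
     Integer.mod(total, @microseconds_per_day)}
  end

  # Midnight-based Rata Die to noon-based Julian Date: half a day is
  # 43_200_000_000 microseconds.
  def rd_to_jd(rd_tuple), do: rebase(rd_tuple, 1_721_424, 43_200_000_000)
end
```

Midnight at the start of 2000-01-01, {730_120, 0}, becomes {2_451_544, 43_200_000_000} (JD 2451544.5), and noon on that day becomes {2_451_545, 0}. A calendar whose days start at a fixed clock time would only need a different pair of constants.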

Thanks for the reply. I guess that raises a concern: even if we account for leap seconds in the datetime representation, the only way to calculate the difference between two dates is by having in hand all of the leap seconds that have occurred between the two dates. And if we are going to keep this information in memory, we will also need a way to update it.

Do all calendars have a fixed number of hours in a day, except for the leap second case, or do some calendars have shorter and longer days?

Yes and no. As we count in days and ‘day fractions’ (my previous post was an explanation of how such a day fraction could be stored in an integer format that would be precise enough to handle leap second conversions), calculating the difference between two dates (in days + day fractions) does not need to handle leap seconds.

In pseudocode:

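defmodule DayFraction do
  # Common base, so that 1/86399, 1/86400 and 1/86401 of a day are all integers.
  @common_base 86_399 * 86_400 * 86_401
  # Number of (microsecond * common base) units in one day.
  @units_per_day 24 * 60 * 60 * 1_000_000 * @common_base

  # Difference between two {day, fraction} tuples, borrowing one day when the
  # fraction part would otherwise become negative.
  def difference({rdf1_day, rdf1_fract}, {rdf2_day, rdf2_fract}) do
    resulting_fract = rdf1_fract - rdf2_fract

    if resulting_fract < 0 do
      {rdf1_day - rdf2_day - 1, @units_per_day + resulting_fract}
    else
      {rdf1_day - rdf2_day, resulting_fract}
    end
  end
end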

When working with a POSIX time clock such as OS time, which ‘forgets’ that a leap second happened, conversions are accurate except during a leap second (during which different clock implementations (strict POSIX, NTP, et al.) will do different things such as repeating a second or ‘slowing down’ around it, neither of which can be recognized/handled at the level of our calendar implementation.) Lucky for us, Erlang already has done some work in this regard, having multiple different kinds of clocks (Erlang time documentation).

When converting to or from a UTC datetime, we’ll need to consult the list of leap seconds for dates that are the 31st of December or the 30th of June, to check if that particular date is in there (so this is only necessary for those two days of the year!).
When converting to- or from TAI, you always need the list of leap seconds, as you’ll need to add/subtract all leap seconds that have happened before the timestamp under consideration.
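
The difference between the two lookups might look roughly like this. The table below is only an excerpt (the three leap seconds inserted since 2009) and the starting offset of 34 seconds is the TAI - UTC value that applied from 2009-01-01; a real table would need every entry since 1972. The module LeapTable and its functions are hypothetical, and dates are plain {year, month, day} tuples.

```elixir
defmodule LeapTable do
  # Excerpt only: UTC days since 2009 that ended with an inserted leap second.
  @leap_days [{2012, 6, 30}, {2015, 6, 30}, {2016, 12, 31}]
  # TAI - UTC in seconds from 2009-01-01 until the first entry above.
  @tai_minus_utc_2009 34

  # UTC conversion: a membership check for the date itself is enough.
  def seconds_in_utc_day(date) do
    if date in @leap_days, do: 86_401, else: 86_400
  end

  # TAI conversion: every leap second before the date has to be counted.
  # With this excerpt the result is only meaningful for dates from 2009 on.
  def tai_minus_utc(date) do
    @tai_minus_utc_2009 + Enum.count(@leap_days, fn leap_day -> leap_day < date end)
  end
end
```

So LeapTable.seconds_in_utc_day({2016, 12, 31}) is 86_401, while LeapTable.tai_minus_utc({2017, 1, 1}) is 37, the sum of everything that came before.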

Such a list therefore seems necessary. As to how to handle updating the list: When working with POSIX times, the Network Time Protocol is often used (Erlang uses it) to keep the OS clock in sync and notify it about leap seconds. However, NTP only notifies you on the day itself that this is a day with a leap second. It therefore is unusable for working with dates in the future.

A simple solution would be to release new versions of the Calendar library each time a leap second is announced. I am not sure how easy it would be for systems that rely on hot-code reloading to update such a dependency while running.

In any case, it would seem fair to me to say that maintainers of applications that absolutely need the sub-second leap second precision should be considered responsible themselves to update their calendar library once every six months.

I believe the length of a day is the same everywhere on Earth; even if the length of daytime and nighttime differ depending on whether you are on the poles or at the equator, the length of (daytime + nighttime) is the same*, which means that all calendars that have been made through the ages at the different parts of the globe have this same ‘day’ unit.

As for the subdivision of a day (in e.g. hours), this is something that calendars do differently.

(* unless we’d want to correct for relativistic time dilation differences, but that is definitely a precision and complexity we do not need for a general-purpose calendar library).

But to calculate the difference between two dates in seconds, we would need to know which days had leap seconds, correct?

Yes. This would be the same as converting both times to TAI timestamps first, and subtracting these.
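
For whole UTC days this boils down to 'days * 86400 plus the leap seconds in between'. A minimal sketch, again with a table excerpt rather than the full list (so it is only meaningful for ranges after mid-2012), and with made-up names:

```elixir
defmodule UtcDifference do
  # Excerpt only: UTC days after mid-2012 that ended with an inserted leap
  # second. Dates are {year, month, day} tuples.
  @leap_days [{2015, 6, 30}, {2016, 12, 31}]

  # SI seconds elapsed between 00:00:00 UTC on `from` and 00:00:00 UTC on `to`.
  def seconds_between_midnights(from, to) when from <= to do
    days = gregorian_days(to) - gregorian_days(from)
    # A leap second at the end of day d lies inside the interval when
    # from <= d < to.
    leap = Enum.count(@leap_days, fn d -> d >= from and d < to end)
    days * 86_400 + leap
  end

  defp gregorian_days({y, m, d}), do: :calendar.date_to_gregorian_days(y, m, d)
end
```

From {2016, 12, 31} to {2017, 1, 1} this gives 86_401 seconds, whereas a calculation on the day count alone would say 86_400.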

Hi everyone.

Sorry if I’m bugging you, I just have to understand the problem with leap seconds. If we use UNIX time, since we are only dealing with dates and NOT times, then even assuming every year from now on has a leap second, it is only in year 86336 that we will have a day misplaced, and that is why we want to use JD: to support 84K+ years ahead (again, assuming every year will have a leap second). Correct?

It’s not always equal to one second per year. For more info see the Wikipedia article on leap seconds. How about creating a simple API for Elixir to fetch the leap seconds?

elixir-lang.org/api/leap_seconds

We should know about a newly added second several months in advance, so updating it is not a big problem :smile:.

To whom are you replying here? In any case, the point about UT1 and UTC not being equal (and TAI ↔ POSIX not being equal) because of the difference in leap seconds has been made in this topic before.

This will not work for the following reasons:

  • Some computers are not connected to the internet, and can therefore not use a service like this.
  • It is a centralized single-point-of-failure, and therefore insecure.

Not a problem! Calendars are hard, and it is important to ensure that we do it properly and that everyone understands what is going on.

We want to use Julian Date or Rata Die because they are really simple, unambiguous representations of the single common format that all calendars (created on our Earth) share: The period of a day.

How this day is subdivided (hours/minutes/seconds; the Hebrew calendar for instance uses 24 hours, each subdivided into 1080 ‘parts’), superdivided (how/if weeks/months/years are counted) and finally where the border between two days lies (noon, midnight, sunset) is something that greatly varies per calendar, however.

The JD/RD formats are able to store this information well, because they use days as unit of measurement.

The calendar library will be forced to deal both with dates and times, as when the next day starts (noon, midnight, sunset) varies per calendar, so if you do not specify a subdivision in the current day, there might be multiple possible dates in the target calendar.

Leap days (Julian/Gregorian and Islamic), Leap months (Hebrew) and Leap Seconds (UTC) were only added after the fact, when people found out that the way they subdivided/superdivided days in their calendars turned out to not totally conform to what happened in nature.

Leap days/Leap months are artefacts to ensure that the seasons do not drift across the dates in the calendar too much, as a (solar) year does not take e.g. 365 days but actually (approx.) 365.24219 days.
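
For the Gregorian case the effect of the leap day rule can be checked with Erlang's :calendar module. This is just an illustrative sketch; the module name MeanYear is made up.

```elixir
defmodule MeanYear do
  # Days in one full 400-year Gregorian cycle, as counted by :calendar.
  def days_per_400_years do
    :calendar.date_to_gregorian_days(2400, 1, 1) -
      :calendar.date_to_gregorian_days(2000, 1, 1)
  end

  # Number of leap years within the same cycle.
  def leap_years_per_400_years do
    Enum.count(2000..2399, &:calendar.is_leap_year/1)
  end
end
```

The cycle has 146_097 days with 97 leap years, so the mean Gregorian year is 146_097 / 400 = 365.2425 days, close to (but still not exactly) the 365.24219 days mentioned above.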

The leap second that was added to UTC was introduced because it turned out that the length of a day is also not exactly 86400 SI-defined seconds long, but rather dependent on the exact speed of Earth’s rotation, which is not constant but fluctuating slightly. Leap seconds are peculiar in that they cannot be predicted but are artificial agreed-upon corrections of UTC towards the actual ‘natural’ day length, when expressed in SI-seconds.

It is possible for zero, one or two leap seconds to be added to UTC per UTC year (and theoretically also for one or two leap seconds to be removed, although this has not happened yet during the ±45 years that leap seconds have been in use). This means that, if we stop applying leap seconds starting now, it will take approx. 43227 UTC years to be off by more than a day.

So, to reiterate:
JD or RD are useful because they count in days, which is nice because the length of a day is the same in all calendars, regardless of if or when they use leap months/days/seconds or not. When using JD or RD as ‘base’ representation, these leap artefacts only need to be considered during conversion from the respective calendars to JD/RD or vice-versa, and not during any intermediate calculations.

I like José’s proposal for {RD/JD, milliseconds} a lot.

I wonder if we can cater for the leap second issue on conversion to this intermediate format. It seems easier and more intuitive to convert to milliseconds elapsed, including the leap second offset, at this stage. Then the calculation of date/time difference is simpler - albeit we still have to watch out for a negative offset and adjust accordingly.

As far as I can tell we just need the leap_seconds to apply the relevant offset. It would also make for easier pattern matching for far forward and far past dates since leap seconds have a short history (and maybe a short future!).

If this becomes the agreed approach then is it better to separate out Calendar from core so that it can be updated in cycle with leap second announcements and out of cycle with Elixir core?

As for intra-day conversions - my understanding is that the leap second is always inserted/removed at 23:59:59 UTC. Therefore there still remains a challenge of conversion for non-UTC times, and therefore the time difference for non-UTC times/datetimes is somewhat problematic?

On the updating topic - given the intent of BEAM apps to be long-lived, perhaps an API to update via an external service is a good idea in addition to the leap_seconds table? And if that’s a good idea, perhaps that makes a stronger case for Calendar being decoupled from core?

Sorry, I was not clear. I wrote a small reply (from my Nexus 5) to @alisinabh, and I meant that not every year has a leap second count equal to 1. I also linked a Wikipedia article where he can see a table of the leap seconds added per year. We can’t assume that every year has exactly one leap second. Moreover, it’s also possible (though it has not happened yet) to have a -1 leap second.

@Eiji I said that, assuming this anomaly occurs every year, there will be no problem with dates until year 86336.

@Qqwy thank you for explaining :blush: I have closed my PR on conversion using UNIX.

I have the feeling that some of my posts might have been too long, so to reformulate:

@kip, @josevalim I propose we use a {RD/JD, fraction} approach, as this would mean that calendars do not have to care about their internal details: For instance, when converting from the Islamic to the Hebrew calendar, leap seconds would not need to be considered at all!

The fraction part could either be an arbitrary-precision Decimal, a Rational number, or a fixed-precision integer representation that is precise enough to handle very small differences between day subdivisions. Using floats is of course out because of their rounding errors. Using decimals or rationals is out if we want to include the Calendarium in the core (or minimize dependencies for another reason).

If we therefore want to use a fixed-precision representation, we need to choose something that is precise enough to handle very small differences. The leap second differences in the UTC calendar are the smallest ones that we will need to handle for the foreseeable future.
We know that we want to represent all of 1 / 86400 (one SI-second on a normal day), 1 / 86401 (one SI-second on a leap-second-day) and 1 / 86399 (one SI-second on a leap-second-removal-day). Milliseconds are not precise enough. Attoseconds are.

To whomever is confused by the concept of leap seconds:

Consider a straw.
This straw has a certain length.
I can put my ruler against it, and measure that it is ‘10.25’ centimeter.
I can also put a ruler in inches against it, and measure that it is ‘4.035’ inches.

Neither observation has altered the length of the straw.

If we want to count multiples of straw lengths in fixed, centimeter-size steps, we could say ‘well… let’s assume they are exactly 10 centimeters long’.
But now here’s the problem: After counting four straws, we end up with 40 centimeters. But in reality, four straws are 41 centimeters. Oh snap! The more straws we count, the more our calculation is off.
The solution? Proclaim that every fourth straw is a leap straw which is not 10, but 11 centimeters long!

These proclamations have not altered the length of the actual straw in any way either!

The straw in this analogy is of course the day, whose length is the same across all calendars, because that’s simply how the earth rotates.

When we convert from Super Strawcounting With Leapcentimeters to Inches or vice-versa, it is useful to choose an intermediate representation that is based upon the length of the straw itself: This representation will never need leap straws, as ‘a quarter straw’ is always exactly a quarter of a straw, regardless of with what ruler it is measured.

And therefore, if IERS decides that another leap second is needed, this does not alter that ‘a day is a day’ (one full rotation of the earth remains one full rotation of the earth), but only how many SI-seconds fit in that given day. And therefore, counting with fractions of a day is more suited to calendrical conversions than choosing e.g. milliseconds as in-day unit.

@Qqwy I think there is still the need to take leap seconds into account if we are talking about a diff function for datetimes? There is such a function now for NaiveDateTime, which doesn’t consider leap seconds, and I think that probably needs an update.

For pure conversion I can see it’s not a consideration.