Proposal: Allow calendar to be specified in ~D, ~U, ~N and ~T sigils

Elixir 1.9 introduces the ~U sigil, thereby completing the full set of sigils for Date, Time, DateTime and NaiveDateTime.

Each of these sigils creates dates, times, datetimes and naive datetimes in the Calendar.ISO calendar.

Proposal overview

This proposal suggests introducing support for alternative calendars. Examples would be:

iex> ~D[2019-01-01 MySpecialCalendar]
iex> ~T[11:59:00 MyOtherCalendar]

In Elixir 1.10, this PR introduces calendar-specific support for the Inspect protocol, which is the counterpart of the calendar sigils. Allowing a calendar to be specified in a sigil would therefore enable easy sigil -> inspect -> sigil round trips.
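
As a small illustration of the round trip for the default calendar, here is a minimal sketch. It relies only on the inspected form of a Calendar.ISO date being a valid ~D sigil, and uses Code.eval_string/1 purely for demonstration:

```elixir
date = ~D[2019-01-01]

# For Calendar.ISO dates, inspect/1 produces the sigil form.
printed = inspect(date)

# Evaluating the printed form yields the original struct again.
{evaluated, _binding} = Code.eval_string(printed)

IO.inspect(printed == "~D[2019-01-01]")
IO.inspect(evaluated == date)
```

The proposal aims to make the same round trip possible for dates in other calendars.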

Proposal detail

  • A calendar may be specified in the ~D, ~T, ~U and ~N sigils
  • The calendar is the last element of a list produced by String.split/2 on the binary part of the sigil.
  • Similar to the approach for *_to_string/1 and inspect_*, four callbacks called sigil_*/2 are introduced, each parsing the sigil and returning the relevant struct or an error
  • The four functions are to be implemented for Calendar.ISO which remains the default calendar and whose functions are 100% compatible with the current implementation.
  • Each calendar implementation is responsible for ensuring that the output produced from the Inspect protocol implementation can also be parsed with the sigil_*/2 implementations.

Example for ~D

Current implementation:

  defmacro sigil_D({:<<>>, _, [string]}, []) do
    Macro.escape(Date.from_iso8601!(string))
  end

Proposed implementation approach:

  defmacro sigil_D({:<<>>, _, [string]}, []) do
    case String.split(string, " ") do
      [date_string] -> Calendar.ISO.sigil_date(date_string)
      [date_string, calendar] -> calendar.sigil_date(date_string)
    end
    |> Macro.escape
  end

The biggest problem is dispatching to the appropriate calendar module:

case String.split(string, " ") do
  [date_string, calendar] -> calendar.sigil_date(date_string)
end

calendar here is a string so we’d have to convert it somehow to an atom.

Unfortunately, I believe the only way is to rely on global state. For example, there could be Calendar.register(module) and Calendar.registered(string) :: {:ok, module} | :error functions, with the registered calendars stored in the elixir app env.
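
A minimal sketch of what such a registry might look like follows. The module name CalendarRegistry, the :registered_calendars key and the two-argument register/2 are assumptions made for illustration; none of them exist in Elixir:

```elixir
defmodule CalendarRegistry do
  # Hypothetical registry; the post above imagines these functions
  # living on Calendar itself.
  @app :elixir
  @key :registered_calendars

  # Stores the name => module mapping in the application environment.
  def register(name, module) when is_binary(name) and is_atom(module) do
    calendars = Application.get_env(@app, @key, %{})
    Application.put_env(@app, @key, Map.put(calendars, name, module))
  end

  # Returns {:ok, module} for a registered name and :error otherwise.
  def registered(name) when is_binary(name) do
    @app
    |> Application.get_env(@key, %{})
    |> Map.fetch(name)
  end
end
```

Because sigils are expanded at compile time, the registration would have to run before any module using the sigil is compiled.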

I am all for this, but the suggested proposal relies on no calendar having spaces in its ‘sigil date’ representation. I do not think it would be wise to rely on this: in the best case it restricts the calendars we can represent, and in the worst case people will end up with incomprehensible error messages like "UndefinedFunctionError 10.sigil_date/1 is undefined".

What if the calendar name were written before the date part of the sigil? For example:

~D[MySpecialCalendar 2019-01-01]

Good point. There are further complications with splitting on " ": datetimes can be in formats such as 2019-01-01 09:00:00Z or 2019-01-01 09:00:00 Europe/Warsaw.

Understood on both points.

As to the splitting on " " of course there may be better alternatives. However my proposal says:

  • The calendar is the last element of a list produced by String.split/2 on the binary part of the sigil.

All sigils (that I’m aware of) have a fixed format. The intent of having the calendar as the last element is to ensure compatibility with existing code, there being no (valid) possibility of a calendar name appearing before the date/time.

Not my intention to suggest that. Yes, the proposal would require reassembling the all-but-last elements of the list into a string again to be passed to the sigil parser, and that’s not very beautiful. But having the calendar name as the first element is more likely to introduce compatibility issues.
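
The "all-but-last" idea could be sketched roughly as below. The SigilSplit module and the rule that a calendar must look like a module name (capitalised, dot-separated segments) are assumptions for illustration only; the rule is a heuristic, chosen so that trailing tokens such as 09:00:00Z or Europe/Warsaw are not mistaken for a calendar:

```elixir
defmodule SigilSplit do
  # Hypothetical helper, not part of Elixir.
  # Returns {date_time_string, calendar_name}.
  def split_calendar(string) when is_binary(string) do
    parts = String.split(string, " ")
    {last, rest} = List.pop_at(parts, -1)

    if rest != [] and module_name?(last) do
      # Reassemble everything except the trailing calendar name.
      {Enum.join(rest, " "), last}
    else
      # No calendar given: fall back to the default calendar.
      {string, "Calendar.ISO"}
    end
  end

  # Heuristic: "Foo" or "Foo.Bar.Baz", but not "09:00:00Z" or "Europe/Warsaw".
  defp module_name?(token) do
    Regex.match?(~r/\A[A-Z][A-Za-z0-9_]*(\.[A-Z][A-Za-z0-9_]*)*\z/, token)
  end
end
```

A bare zone abbreviation without a slash would still be ambiguous under this heuristic, which is one reason the position and format of the calendar name matter.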

Yes, that’s definitely an issue. In my proof-of-concept I am using

String.to_existing_atom("Elixir." <> calendar_name)

But that also has the limitation that, at compile time when the sigil is being processed, it would not force the compilation and loading of the calendar module. Module.concat(calendar_name) would work but introduces an attack vector, so it’s a no-go. Perhaps, as @wojtekmach suggests, this by itself makes the proposal untenable.

Since sigils are a compile-time construct and are literals (there are no modifiers and no interpolation), perhaps this isn’t really an attack vector? If so, Module.concat could be used to convert the string to a module name.

Agreed, it’s not a security concern because it’s fully compile time. Fortunately there are no ~d etc. sigils that would allow interpolation.
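
Under that assumption, resolving the name might look like the sketch below. CalendarResolver is a hypothetical helper; Module.concat/1 and Code.ensure_compiled/1 are existing functions, and checking for date_to_string/3 is only a rough stand-in for "implements the Calendar behaviour":

```elixir
defmodule CalendarResolver do
  # Hypothetical helper, not part of Elixir.
  def resolve(name) when is_binary(name) do
    module = Module.concat([name])

    # Unlike String.to_existing_atom/1, this asks for the module to be
    # compiled and loaded before it is used.
    case Code.ensure_compiled(module) do
      {:module, ^module} ->
        if function_exported?(module, :date_to_string, 3) do
          {:ok, module}
        else
          {:error, :not_a_calendar}
        end

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Returning an error tuple here would let the sigil macro raise a readable compile-time error instead of an UndefinedFunctionError.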

If calendars are registered in some kind of global state at compile time, that should mean the registration needs to happen before the usage, if I understand that correctly. This can only be ensured by requiring something which does that registration. If I need to do require CalendarRegistration or even require MySpecialCalendar just to use ~D[2019-01-01 MySpecialCalendar], I’d prefer require MySpecialCalendar and MySpecialCalendar.datetime("2019-01-01"), which to me is way cleaner to read. It could be a macro or a function call. A macro could validate the input at compile time just like a sigil.

The other option of “use the name as written” is imho also not very useful. I can see the appeal of having sigils be able to use different calendars, but given that modules in my applications are hardly ever as short as Calendar.ISO, and that we cannot use aliases because interpolation is not allowed, this is bound to be at least as long as ~D[2019-01-01 MyApp.SomeDomain.Calendar.CiscoWeek], which to me does not really look like a useful sigil. Again, I think MyApp.SomeDomain.Calendar.CiscoWeek.datetime("2019-01-01") would be better, especially as this could be aliased.
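
The macro alternative could be sketched as follows. MySpecialCalendar, its date/1 macro and NewYearExample are hypothetical names, and the sketch parses with the ISO parser only because a real calendar would supply its own:

```elixir
defmodule MySpecialCalendar do
  # Hypothetical macro: parses the literal at compile time, so an
  # invalid string fails compilation just as an invalid sigil would.
  defmacro date(string) when is_binary(string) do
    string
    |> Date.from_iso8601!()
    |> Macro.escape()
  end
end

defmodule NewYearExample do
  # The macro has to be required before use.
  require MySpecialCalendar

  def new_year, do: MySpecialCalendar.date("2019-01-01")
end

IO.inspect(NewYearExample.new_year())
```

Because the module is referenced through ordinary code rather than inside a sigil body, alias works here as usual.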

Good and fair points @LostKobrakai.

Part of my reasoning is that inspecting a date/time is a whole lot easier to comprehend when it’s ~D[2019-01-01], even when it’s ~D[2019-01-01 MyReally.Long.Calendar.SpecialName], compared to the raw map inspect output. But if that’s the inspect format (which I think is reasonable given current practice) then that should also be allowable as a sigil for input.

If it’s not about me writing the sigil in my code I can see it being useful. And really it could be implemented just for the reasoning you mentioned and I can still decide it’s too much work for me to use it in my code.

Sound logic.

I haven’t implemented date parsing in ex_cldr_dates_times (as you know) and frankly I’d prefer not to. It’s a pain. But if that’s a requirement …

If we put the registration call into config/config.exs which is evaluated at compile-time then this would work.

~D[2019-01-01 MyReally.Long.Calendar.SpecialName]

I think registration could alleviate it, MyApp.Holocene could register itself under "Holocene" string.

Given challenges with parsing and registration, I think that’s the way to go :+1:

Depending on your readability preferences you can get pretty far by un-importing the default sigils

import Kernel, except: [sigil_D: 2, sigil_N: 2, sigil_U: 2, sigil_T: 2]
import MySpecialCalendar, only: [sigil_D: 2, sigil_N: 2, sigil_U: 2, sigil_T: 2]

Throw that behind a use MySpecialCalendar if you want. Obviously this doesn’t help if you’re mixing and matching calendars. In that case I think the MySpecialCalendar.datetime("2019-01-01") is probably the way to go.

I thought about that as well, but for me the place where I use the most date/time/datetime sigils is in tests, and there it’s very likely I also need different calendars. It’s also not a solution to the problem of inspecting structs with a non-Calendar.ISO calendar, which @kip mentioned.

Didn’t think about that one. When we can register calendar modules with string aliases then I think this can be useful even for writing those sigils (and not just inspection).

config :elixir, :calendar_aliases,
  "Holocene": MyApp.Calendars.Holocene

iex> ~N[2019-01-01 00:00:00 Holocene]
~N[2019-01-01 00:00:00 MyApp.Calendars.Holocene]
iex> ~N[2019-01-01 00:00:00 MyApp.Calendars.Holocene]
~N[2019-01-01 00:00:00 MyApp.Calendars.Holocene]

If there’s a need to register the calendars beforehand, I wonder if it makes sense to use sigil modifiers instead. So:

# instead of
~N[2019-01-01 00:00:00 Holocene]

# it becomes something like
Calendar.register(Holocene, :h)
~N[2019-01-01 00:00:00]h
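
A rough sketch of dispatching on a modifier is shown below. ModifierSigils, ModifierSigilsUsage and the compile-time @calendars map are assumptions for illustration; the map stands in for whatever Calendar.register would populate, and it points at Calendar.ISO only so that the sketch runs without a second calendar implementation:

```elixir
defmodule ModifierSigils do
  # Hypothetical replacement for Kernel's ~D that accepts one modifier.
  import Kernel, except: [sigil_D: 2]

  # Modifier character => calendar module (illustrative mapping).
  @calendars %{?h => Calendar.ISO}

  defmacro sigil_D({:<<>>, _, [string]}, []) do
    string |> Date.from_iso8601!() |> Macro.escape()
  end

  # Modifiers arrive as a charlist, so ~D[...]h passes [?h].
  defmacro sigil_D({:<<>>, _, [string]}, [modifier]) do
    calendar = Map.fetch!(@calendars, modifier)
    string |> Date.from_iso8601!(calendar) |> Macro.escape()
  end
end

defmodule ModifierSigilsUsage do
  import Kernel, except: [sigil_D: 2]
  import ModifierSigils, only: [sigil_D: 2]

  def plain, do: ~D[2019-01-01]
  def with_modifier, do: ~D[2019-01-01]h
end
```

An unknown modifier raises at compile time through Map.fetch!/2, which mirrors how an invalid date literal is rejected.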

You are correct about the ‘all-but-last element’ approach, which might work, although I do think that e.g. timezone indicators might be a problem, as @wojtekmach pointed out. It is something that I missed in your proposal, probably because the example implementation did not do this.
I wonder what the compatibility issues would be if we had the optional calendar name at the front. Do you have examples?


Instead of having a special ‘calendar module registration’, what about just using the pre-existing alias functionality of Elixir?
It would seem very clear to me what happens if we have a ~N[2019-01-01 00:00:00 HC] when there is an alias SomeLibrary.Calendars.Holocene, as: HC.

This doesn’t work if one of the goals is to support inspection. Because it means the inspected value can’t be evaluated everywhere and have the same meaning in the same code base.

I went ahead and created a proof-of-concept. The twist here is that the calendar specifies the name it registers itself under (so that its inspect implementation matches the sigil):

See: https://github.com/wojtekmach/elixir/tree/wm-sigil-calendar

$ iex -r lib/elixir/test/elixir/calendar/holocene.exs
iex(1)> Calendar.register(Calendar.Holocene)
:ok

iex> ~D[2019-09-15] |> Date.convert!(Calendar.Holocene)
~D[12019-09-15]HE

iex> ~D[12019-09-15]HE |> Date.convert!(Calendar.ISO)
~D[2019-09-15]

What do you think?

The one problem I see with that approach is that now the calendar implementation needs to be changed to disambiguate if two calendars try to register for the same name. I’d at least like to see a way for users of the calendars to apply their own naming, if the default name isn’t unique.

I reread the thread (which is amusing since I started it) and I can’t recollect now why we wouldn’t just use the module name of the calendar since:

  1. It’s already guaranteed to be unique
  2. Sigils are a developer only construct so clarity and consistency are likely higher objectives
  3. Having the full module name removes any “magic” which a registration process risks
  4. Absence of a calendar name defaults to Calendar.ISO as usual so no backwards incompatibility

Whether the calendar name goes before the date/time/datetime, after it, or in the “flags” location is secondary, I think. But using the full module name still makes more sense to me.