Proposal: strftime-based calendar/datetime formatting

NOTE: this is a focused thread, so we would appreciate it if everybody stayed on topic. Feel free to comment on anything regarding calendar formatting, but avoid off-topic or loosely related topics. For example, if you would like to discuss or propose other Calendar/DateTime features, please use a separate thread.

Hi everyone,

This is take two for calendar/datetime formatting in Elixir. This time, we are exploring a strftime-based syntax, which is much simpler in scope than the Unicode Locale Data Markup Language (LDML) discussed previously.

Here is how the API will look:

Calendar.format(date_or_time_or_datetime, "%Y-%m-%d %H:%M:%S.%f")
#=> {:ok, "2018-11-29 13:19:41.032412"}

The Calendar.format/2 entry point accepts any calendar type, using structural typing. This means we will be able to format any map that has the fields being formatted. If a field is missing from the map, an error with an appropriate message will be raised.
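To make the structural-typing point concrete, here is a minimal sketch. The StructuralFormat module is hypothetical and only understands %Y, %m and %d; it reads fields with Map.fetch!/2, so a Date, a NaiveDateTime or a plain map with the right keys all work, and a missing key raises a KeyError naming that key.

```elixir
defmodule StructuralFormat do
  # Hypothetical sketch: only %Y, %m, %d and literal characters are handled.
  # Fields are read with Map.fetch!/2, so any map with :year/:month/:day works
  # (structs are maps too) and a missing field raises KeyError for that key.
  def format(map, pattern) when is_map(map) and is_binary(pattern) do
    {:ok, do_format(pattern, map, "")}
  end

  defp do_format("%Y" <> rest, map, acc),
    do: do_format(rest, map, acc <> pad(Map.fetch!(map, :year), 4))

  defp do_format("%m" <> rest, map, acc),
    do: do_format(rest, map, acc <> pad(Map.fetch!(map, :month), 2))

  defp do_format("%d" <> rest, map, acc),
    do: do_format(rest, map, acc <> pad(Map.fetch!(map, :day), 2))

  # Anything else is copied through unchanged.
  defp do_format(<<char::utf8, rest::binary>>, map, acc),
    do: do_format(rest, map, acc <> <<char::utf8>>)

  defp do_format("", _map, acc), do: acc

  # Zero padding for non-negative integers only in this sketch.
  defp pad(int, width), do: int |> Integer.to_string() |> String.pad_leading(width, "0")
end
```

For example, StructuralFormat.format(%{year: 2018, month: 11, day: 29}, "%d/%m/%Y") would return {:ok, "29/11/2018"} without any calendar struct involved.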

The formatting function will also support multiple options to customize different aspects of formatting. Let’s take a look at them:

Options

The options can be broken into 2 distinct categories.

The first one is about localization:

  • :preferred_date - configures the default date
  • :preferred_time - configures the default time
  • :preferred_datetime - configures the default datetime
  • :hours_in_am_pm - a function that receives the hour, minute and second and returns the hours_in_am_pm tuple (as seen in c:Calendar.hours_in_am_pm/3)

Then we have options that control translations:

  • :am_pm_names - a function that receives :am or :pm and returns the relevant “am”/“pm” string
  • :month_names - a function that receives the month as an integer and returns the month name as a string. For example, fn index -> {"January", "February", ...} |> elem(index - 1) end
  • :abbreviated_month_names - a function that receives the month as an integer and returns the abbreviated month name as a string. For example, fn index -> {"Jan", "Feb", ...} |> elem(index - 1) end
  • :day_of_week_names - a function that receives the day of the week as an integer and returns the day of the week as a string. For example, fn index -> {"Monday", "Tuesday", ...} |> elem(index - 1) end
  • :abbreviated_day_of_week_names - a function that receives the day of the week as an integer and returns the abbreviated day of the week as a string. For example, fn index -> {"Mon", "Tue", ...} |> elem(index - 1) end

The default values of all options will be returned by the calendar, which should implement a formatter_config callback.
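A sketch of how call-time options might be layered over calendar defaults, assuming the defaults come back as a keyword list. FormatterOptions.defaults/0 is a hypothetical stand-in for the proposed formatter_config callback and covers only the two month-name options; options given by the user win through Keyword.merge/2.

```elixir
defmodule FormatterOptions do
  # Hypothetical defaults, standing in for what a calendar's
  # formatter_config callback might return. Only two options are shown.
  @month_names {"January", "February", "March", "April", "May", "June", "July",
                "August", "September", "October", "November", "December"}

  def defaults do
    [
      month_names: fn index -> elem(@month_names, index - 1) end,
      abbreviated_month_names: fn index ->
        @month_names |> elem(index - 1) |> String.slice(0, 3)
      end
    ]
  end

  # Options passed at call time override the calendar defaults key by key.
  def resolve(user_opts) when is_list(user_opts) do
    Keyword.merge(defaults(), user_opts)
  end
end
```

With this layering, FormatterOptions.resolve(month_names: pt_br_names) would translate only the full month names while keeping the default abbreviations.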

With the options out of the way, let’s talk about the formatting syntax.

strftime syntax

strftime has a simpler notation while still covering a wide range of use cases. This leaves it open for the community to support more complex formats such as ICU/Unicode/CLDR if desired.

The proposed syntax is an extension of strftime that also allows the padding width to be given as an argument:

%<flag>?<width>?<format>

The flag is limited to certain characters, the width is a positive integer without leading zeros and the format is always a letter. Examples are %d, %-d, %4d and %_4d.
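One way to read a single directive, sketched with binary pattern matching. DirectiveParser.parse/1 is hypothetical and receives the text right after a %. Since widths may not have leading zeros, a leading 0 can always be treated as the flag, which keeps the grammar unambiguous.

```elixir
defmodule DirectiveParser do
  # Parses the text right after a "%" following %<flag>?<width>?<format>.
  # Returns {:ok, {flag, width, format, rest}} with nil for absent parts,
  # or {:error, :invalid_format}.
  @flags [?_, ?-, ?0]

  def parse(<<flag, rest::binary>>) when flag in @flags, do: parse_width(rest, <<flag>>)
  def parse(rest) when is_binary(rest), do: parse_width(rest, nil)

  # Widths have no leading zeros, so they must start with 1..9.
  defp parse_width(<<digit, _::binary>> = bin, flag) when digit in ?1..?9 do
    {width, rest} = Integer.parse(bin)
    parse_format(rest, flag, width)
  end

  defp parse_width(bin, flag), do: parse_format(bin, flag, nil)

  # The format is a single ASCII letter (or "%" for a literal percent sign).
  defp parse_format(<<format, rest::binary>>, flag, width)
       when format in ?a..?z or format in ?A..?Z or format == ?% do
    {:ok, {flag, width, <<format>>, rest}}
  end

  defp parse_format(_bin, _flag, _width), do: {:error, :invalid_format}
end
```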

Format | Description | Example (in ISO) | Source
%a | Abbreviated name of day | Mon | Calendar.day_of_week + :abbreviated_day_of_week_names
%A | Full name of day | Monday | Calendar.day_of_week + :day_of_week_names
%b | Abbreviated month name | Jan | struct.month + :abbreviated_month_names
%B | Full month name | January | struct.month + :month_names
%c | Preferred date+time representation | 2018-10-17 12:34:56 | :preferred_datetime
%d | Day of the month | 01, 12 | struct.day
%f | Microseconds | 000000, 999999, 0123 | struct.microsecond
%H | Hour using a 24-hour clock | 00, 23 | struct.hour
%I | Hour using a 12-hour clock | 01, 12 | struct.hour
%j | Day of the year | 001, 366 | Calendar.day_of_year
%m | Month | 01, 12 | struct.month
%M | Minute | 00, 59 | struct.minute
%p | “AM” or “PM” (noon is “PM”, midnight is “AM”) | AM, PM | Calendar.hours_in_am_pm + :am_pm_names
%P | “am” or “pm” (noon is “pm”, midnight is “am”) | am, pm | Calendar.hours_in_am_pm + :am_pm_names
%q | Quarter | 1, 2, 3, 4 | Calendar.quarter_of_year
%S | Second | 00, 59, 60 | struct.second
%u | Day of the week | 01 (monday), 07 (sunday) | Calendar.day_of_week
%x | Preferred date (without time) representation | 2018-10-17 | :preferred_date
%X | Preferred time (without date) representation | 12:34:56 | :preferred_time
%y | Year as 2 digits | 01, 86, 18 | struct.year
%Y | Year | -0001, 0001, 1986 | struct.year
%z | +hhmm/-hhmm time zone offset from UTC (empty string if naive) | +0300, -0530 | struct.utc_offset + struct.std_offset
%Z | Time zone abbreviation (empty string if naive) | CET, BRST | struct.zone_abbr
%% | Literal “%” character | % | n/a

The source column is used as a reference for the implementation and it won’t be present in the final documentation.
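As an example of the table's Source column, a sketch of the ISO data behind %I, %p and %P. The TwelveHourClock module is hypothetical, takes only the hour (the proposed hours_in_am_pm also receives minute and second), and follows the table's rule that noon is PM and midnight is AM.

```elixir
defmodule TwelveHourClock do
  # Hypothetical ISO rule: hour 0 (midnight) -> {12, :am}, hour 12 (noon) -> {12, :pm}.
  def hours_in_am_pm(hour) when hour in 0..23 do
    period = if hour < 12, do: :am, else: :pm

    twelve =
      case rem(hour, 12) do
        0 -> 12
        h -> h
      end

    {twelve, period}
  end

  # %I: zero-padded hour on a 12-hour clock.
  def twelve_hour(hour) do
    {h, _period} = hours_in_am_pm(hour)
    h |> Integer.to_string() |> String.pad_leading(2, "0")
  end

  # %p: the period translated through an :am_pm_names-style function.
  def am_pm(hour, am_pm_names \\ &default_am_pm_names/1) do
    {_h, period} = hours_in_am_pm(hour)
    am_pm_names.(period)
  end

  # %P: the lowercase variant.
  def am_pm_lower(hour), do: hour |> am_pm() |> String.downcase()

  defp default_am_pm_names(:am), do: "AM"
  defp default_am_pm_names(:pm), do: "PM"
end
```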

Flags

By default the modifiers above are all padded with zeros according to the ISO standard. The user can disable padding or use spaces with the flags below:

  • _ (underscore) - pad a result with spaces, such as %_d
  • - (dash) - do not pad a result, such as %-d
  • 0 (zero) - pad with zeros, such as %0d
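A sketch of how the flags and the optional width could combine, assuming each directive has a default width (2 for %d, 4 for %Y, and so on). Padding.pad/4 is hypothetical; it puts the sign in front of zero padding so that year -1 with width 4 renders as -0001, matching the %Y row of the table.

```elixir
defmodule Padding do
  # Hypothetical sketch: apply a flag and an optional width to an integer.
  #   "_"        -> pad with spaces
  #   "-"        -> no padding
  #   "0" or nil -> pad with zeros (the default)
  # default_width stands in for the per-directive width (2 for %d, 4 for %Y).
  def pad(int, flag, width, default_width) when is_integer(int) do
    digits = Integer.to_string(abs(int))
    sign = if int < 0, do: "-", else: ""
    width = width || default_width

    case flag do
      "-" -> sign <> digits
      "_" -> String.pad_leading(sign <> digits, width, " ")
      zero when zero in [nil, "0"] -> sign <> String.pad_leading(digits, width, "0")
    end
  end
end
```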

Rationale

Last but not least, it is worth discussing the rationale for date/time formatting. If you have an application that works with calendar types, it is likely that you have to format them at some point. If your application mostly interfaces with other systems, there is a chance the built-in ISO format is enough, but not always. For example, some HTTP headers use a different format than the recommended ISO one. Therefore, adding formatting to the standard library feels like a natural next step for the existing functionality. Furthermore, by choosing to support strftime, we guarantee that the implementation will have a tiny footprint compared to larger standards.
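To illustrate the HTTP header case, here is a hand-written sketch of the IMF-fixdate format from RFC 7231 (e.g. Sun, 06 Nov 1994 08:49:37 GMT), which a pattern such as "%a, %d %b %Y %H:%M:%S GMT" would express in one line. HTTPDate is hypothetical, handles only UTC DateTimes, and hardcodes the English names that HTTP requires.

```elixir
defmodule HTTPDate do
  # Hypothetical sketch: build an IMF-fixdate by hand from a UTC DateTime.
  @days {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}
  @months {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

  def format(%DateTime{time_zone: "Etc/UTC"} = dt) do
    # Date.day_of_week/1 returns 1 (Monday) through 7 (Sunday).
    day_name = elem(@days, Date.day_of_week(dt) - 1)
    month_name = elem(@months, dt.month - 1)

    "#{day_name}, #{pad(dt.day)} #{month_name} #{dt.year} " <>
      "#{pad(dt.hour)}:#{pad(dt.minute)}:#{pad(dt.second)} GMT"
  end

  defp pad(int), do: int |> Integer.to_string() |> String.pad_leading(2, "0")
end
```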

Another discussion, which may or may not impact this one, is about parsing. The parsing specification is often the same as the formatting specification, but we have explicitly decided not to support parsing in Elixir. First of all, it is really hard to support a general but efficient runtime date/time parsing strategy. If you expect certain formats, it is almost always better to define functions that parse specifically those formats. Things get trickier when we consider that we need to support internationalization, which is trivial for formatting but quite expensive for parsing. In other words, while we can provide a general and efficient implementation for formatting, we can’t do so for parsing. Since different trade-offs can be made here, ranging from performance to flexibility, we are not comfortable picking one over the others.
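As an illustration of parsing "specifically those formats", a sketch of a parser for exactly DD/MM/YYYY. FixedParser is hypothetical; the regex only checks the shape of the input, and calendar validity (month 13, February 30) is left to Date.new/3.

```elixir
defmodule FixedParser do
  # Hypothetical sketch: accepts exactly "DD/MM/YYYY" with ASCII digits.
  def parse_dmy(string) when is_binary(string) do
    case Regex.run(~r/\A(\d{2})\/(\d{2})\/(\d{4})\z/, string, capture: :all_but_first) do
      [d, m, y] ->
        # Date.new/3 returns {:ok, date} or {:error, :invalid_date}.
        Date.new(String.to_integer(y), String.to_integer(m), String.to_integer(d))

      nil ->
        {:error, :invalid_format}
    end
  end
end
```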

Roadmap

We don’t plan to add this functionality directly to Elixir. Instead we will develop it as a library and collect feedback. The complexity of the implementation will also dictate if this will become part of core or not, but we believe the implementation will be relatively simple.

Log

Log of changes done to the proposal.

  • 2018/12/14 - proposal submitted
  • 2018/12/15 - removed the Formatter callback from the proposal in favor of an option/config based API
  • 2018/12/17 - removed week_of_year to align with current Elixir master
  • 2018/12/18 - added width and %q
  • 2018/12/19 - removed calendar extensions section

Feedback

Your turn.


I definitely like this more than Unicode’s Locale Data Markup Language, and I :heart: formatter module dependency injection.

AFAICT, there are no divisions of the day anywhere in the world other than :am | :pm.


AFAIK, AM/PM, which splits the day into two halves, is mainly used with Gregorian/Julian-style calendars. Calendars that subdivide the day in different ways (the decimal calendar, some Hindu calendars) split it into many more chunks that are often more analogous to hours. In any case, because supporting this would greatly complicate the logic for everyone using the API, and would matter only to the small group of people using the handful of calendars that follow wildly different formatting rules (who could use a dedicated custom formatting module instead), I am for keeping the API simple and working with :am | :pm.


Do you plan on adding English ordinal day suffixes, i.e. st/nd/rd/th?

I’m not quite sure about parsing being left out, even though I can appreciate the technical arguments. If the aim is to make Elixir seem more “complete” to users, since some are apparently surprised by the lack of formatting/parsing/TZ support in core, then this will just move the surprise to parsing being missing. Many users will still have to reach for a calendar library if they want parsing. Add to this the TZ support being added to core, and the UX will soon be that Elixir contains roughly half of a datetime library, with users having to get a package to fill in the rest. This is just a first impression though, so don’t take it too strongly. :slight_smile:


I don’t think so, no. Is it part of any strftime implementation?

Plus, this is something that may change with time. For example, if we adopt strftime, then parsing becomes simpler, because the syntax is not as complicated as in the previous proposal, but I still think it would require some breakthroughs. We will see.

I like the reduced scope! I think it will cover the majority of use-cases very well.

Is there a chance of providing a means for developers to flexibly provide their own custom formatting? It could be used to provide the ordinal formatting that @Nicd is requesting.

Here’s one example syntax:

iex> Calendar.format(now, "%B %{d_ord}", MyApp.OrdinalFormatter)
December 14th

Another syntax might be more like %B %Cd where the C indicates that the next character represents a custom formatting directive.

Of course a setup like this would mean that you’d always have to pass in your formatter when formatting your date strings, but you could relatively easily create a MyApp.calendar_format/2 that would bake it in.


There are other concepts for parts of the day which CLDR supports but which would not seem relevant for this proposal.


@josevalim very practical proposal and probably easier to consume in most use cases than the CLDR encoding I’d agree.

No surprise that I look at this and consider how it could be used in a locale-specific way. Injecting a formatter is great. But I think it would be even better if there were an option to inject an ma tuple ({module, args}) instead of just the module. That way a locale (or any other parameter) could also be injected. Otherwise, for a locale-aware application one would need to either:

  1. Set a locale prior to calling strftime, which seems very brittle and not very clear.
  2. Or, as your original example illustrates, there would need to be one module per locale and a lookup table to translate a locale into a module name which is a lot of scaffolding just to inject the right formatter for a locale.

An example of the ma approach would be:

 Calendar.format(date_or_time_or_datetime, "%Y-%m-%d %H:%M:%S.%f", {MyApp, ["pt-BR"]})

This would make the intent clearer and be easier to adapt for Gettext and Cldr and any other locale-specific library. Implementation is just another function head that can easily be pattern matched. The actual date being formatted would be prepended to the other tuple arguments.
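A sketch of the {module, args} dispatch described above, where the value being formatted is prepended to the tuple's arguments. Both MyApp.LocaleFormatter and MADispatch are hypothetical, and only a month_name callback is shown.

```elixir
defmodule MyApp.LocaleFormatter do
  # Hypothetical formatter: each callback receives the value being formatted
  # first, followed by the extra arguments from the {module, args} tuple.
  def month_name(%{month: month}, locale), do: locale |> month_names() |> elem(month - 1)

  defp month_names("en"),
    do:
      {"January", "February", "March", "April", "May", "June", "July", "August",
       "September", "October", "November", "December"}

  defp month_names("pt-BR"),
    do:
      {"janeiro", "fevereiro", "março", "abril", "maio", "junho", "julho", "agosto",
       "setembro", "outubro", "novembro", "dezembro"}
end

defmodule MADispatch do
  # Prepend the value to the tuple's arguments and invoke the callback.
  def call({module, args}, fun, value) when is_atom(module) and is_list(args) and is_atom(fun) do
    apply(module, fun, [value | args])
  end
end
```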


Can you use string interpolation in those cases?

iex> ordinal = compute_ordinal(now)
iex> Calendar.format(now, "%B #{ordinal}")
December 14th

And if you want this to be extensible too you can have your own behaviour that builds on top of the existing one. The only trouble with interpolation is if the returned string also contains a %, but that is rare and you can always escape it. Or you can call format twice and inject your contents between them.
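A sketch of the interpolation route, including the % escaping mentioned above. Ordinal is a hypothetical stand-in for compute_ordinal/1 and only covers English day-of-month ordinals.

```elixir
defmodule Ordinal do
  # Hypothetical helper: English ordinal for the day of the month,
  # e.g. 1 -> "1st", 12 -> "12th", 22 -> "22nd".
  def compute_ordinal(%{day: day}), do: Integer.to_string(day) <> suffix(day)

  # 11, 12 and 13 are irregular and always take "th".
  defp suffix(day) when rem(day, 100) in 11..13, do: "th"

  defp suffix(day) do
    case rem(day, 10) do
      1 -> "st"
      2 -> "nd"
      3 -> "rd"
      _ -> "th"
    end
  end

  # Escape "%" so interpolated text is not read as a directive.
  def escape(text), do: String.replace(text, "%", "%%")
end
```

Building the pattern as "%B " <> Ordinal.escape(Ordinal.compute_ordinal(now)) would give "%B 14th" for December 14th.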

This is exactly what I was looking for, thank you. One of the issues mentioned there though is that in Japan they consider the AM/PM boundaries to differ from western ones. So I am wondering if we should instead pass the whole hour, minute and second to the formatter and make the AM/PM completely a formatter concern. I would say this is preferable because having to build a calendar only for localization is quite annoying.

The other place where you may need a custom calendar for localization is day_of_week (many locales start the week on Sunday while ISO starts on Monday) and week_of_year, but we could move these to the formatter too in the future if we want to.

This looks good to me. Although I would make it a regular tuple where the second element is always passed last to the formatter functions. Since this is a behaviour, we want the definitions to be static, so all of the formatter callbacks will receive a config which may be given as input. If they are not available in the input, we pass a default value.

The other option is to support a fourth argument, but I think that’s worse. Thoughts, @michalmuskala?


In the previous discussion about the additions to Calendar, your implementation elected not to include arguments that would allow customisation of the day_of_week, or indeed min_days and first_day for variant Gregorian calendars. I think you were right: that’s a characteristic of the calendar and an implementation detail there.

I haven’t yet worked out the optimal way to generate calendar modules at runtime (required in order to define user calendars that might have these variants), but that’s not a concern here.

That makes sense to me. In fact, I assumed that the entire date_or_time_or_datetime struct would be passed to the formatter, since all of that information might be required depending on the calendar.


That’s the part I am struggling with. The flow I have in mind is this: we have the struct, we pass the information to the calendar, and then the formatter receives the calendar result.

However, this often ties the formatter to a given calendar. For example, the Formatter.pads function could say that the year has 4-digit padding, but that is actually specific to ISO. Even for things like month_name, if you assume that 1 is January, then that information is specific to certain calendars.

The opposite is also true, the calendar is bound to some localization rules. As you said, day_of_week and week_of_year can be localized.

It may be that the best way forward is to actually move names to the calendar, because we need a canonical source. For example, we shouldn’t pass 1 representing a Monday to the formatter; we should pass the atom :monday. Then the formatter can truly care only about localization and translation and also be calendar independent.

However, that still leaves week_of_year in a very awkward position. Maybe it is best to remove it from Elixir for now (or keep it formatter specific) until we are able to revisit this. In any case, @kip, to answer you more precisely, I don’t think you should create different calendars if the difference is the day of week or week of year. It is rather something to build on top.

We’d need to pass the “parts per day” thingy into the formatter - the formatter should be calendar-agnostic and hours, minutes, etc are a calendar concern.

I think the simplest and most pragmatic approach would be to specify that regardless of the formatter the week_of_year always uses the ISO week, in that it always starts on Mondays. On the other hand for the day_of_week we can easily specify that they are 1-based and start on Monday and the formatter may translate between different conventions easily.
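A sketch of the translation a formatter would do under that convention. WeekStart is hypothetical; it relies on Date.day_of_week/1, which returns 1 for Monday through 7 for Sunday, and maps that onto a Sunday-first numbering.

```elixir
defmodule WeekStart do
  # ISO day of week: 1 = Monday ... 7 = Sunday (what Date.day_of_week/1 returns).
  # Sunday-first numbering: 1 = Sunday ... 7 = Saturday.
  def sunday_first(iso_day) when iso_day in 1..7, do: rem(iso_day, 7) + 1

  # A :day_of_week_names-style function keeps receiving the ISO index,
  # so its lookup is unaffected by where a locale starts the week.
  def day_name(iso_day) when iso_day in 1..7 do
    {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"}
    |> elem(iso_day - 1)
  end
end
```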

I’m curious why only week_of_year would be interpreted in the context of the ISO calendar when all the other elements would be interpreted in a calendar-specific way? That’s the point of adding week_of_year to the Calendar behaviour, no?

The conflict comes from the fact that, even though it is calendar-based, it is one of the aspects the western world disagrees on the most, so it makes the problem more apparent. There are probably other fields in Calendar.ISO that conflict too, but not as much?


Ok, after a day of thinking, I believe I have a better understanding of the issues we are discussing. The trouble here is that we have four different concerns between the formatter and the calendar:

  • localization - default date, default time and default datetime
  • week_of_year - what is the initial day of the week (Monday or Sunday) and on which week the year starts (it is also part of localization)
  • translation, so that weekday 1 is Monday (en) / segunda-feira (pt-br) and month 2 is February (en) / fevereiro (pt-br)
  • padding - which is how to pad each entry, which is calendar specific

The problem is that we are trying to fit those concerns into either the formatter or the calendar and having trouble doing so. Here is a new proposal: we get rid of the formatter and instead we pass multiple options. The options are:

  • default_date - configures the default date
  • default_time - configures the default time
  • default_datetime - configures the default datetime
  • week_of_year - a function that receives calendar, year, month, day and returns the week_of_year tuple
  • month_names - a function that converts integers to month names: fn 1 -> "January"; 2 -> "February"; ... end
  • abbreviated_month_names - a function that converts integers to abbreviated month names: fn 1 -> "Jan"; 2 -> "Feb"; ... end
  • day_of_week_names - a function that converts integers to day of week names: fn 1 -> "Monday"; 2 -> ... end
  • abbreviated_day_of_week_names - a function that converts integers to abbreviated day of week names: fn 1 -> "Mon"; 2 -> ... end
  • default_padding - (:zero, :space, :none)
  • year_padding - 4
  • month_padding - 2
  • day_padding - 2
  • hour_padding - 2
  • minute_padding - 2
  • second_padding - 2
  • day_of_week_padding - 2
  • day_of_year_padding - 3
  • week_of_year_padding - 2

The calendar should return a map with defaults for all of those options which can be customized when the format function is invoked.

AM/PM

For AM/PM, we will add a new function to Calendar that returns :am, :pm, :noon or :midnight. We need noon and midnight because some languages treat them as distinct from am/pm. Therefore, we will also add an option to the formatter like this:

  • am_pm - a function that converts the am/pm information to a string: fn :am -> "am"; :midnight -> "am"; :pm -> "pm"; :noon -> "pm" end
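A sketch of the two pieces described above, assuming the Calendar function receives hour, minute and second. DayPeriod.day_period/3 is a hypothetical name for the proposed function, and default_am_pm/1 mirrors the am_pm option from the bullet; a locale with its own word for noon would only need to change the :noon clause.

```elixir
defmodule DayPeriod do
  # Hypothetical calendar side: exact midnight and exact noon get their
  # own atoms, everything else is :am or :pm.
  def day_period(0, 0, 0), do: :midnight
  def day_period(12, 0, 0), do: :noon
  def day_period(hour, _minute, _second) when hour in 0..11, do: :am
  def day_period(hour, _minute, _second) when hour in 12..23, do: :pm

  # Formatter side: the am_pm option mapping all four atoms to text.
  def default_am_pm(:am), do: "am"
  def default_am_pm(:midnight), do: "am"
  def default_am_pm(:pm), do: "pm"
  def default_am_pm(:noon), do: "pm"
end
```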

I think that an important part of formatting is returning good errors when the input data is not available for formatting. For instance, if you call Calendar.format(nil, "%Y-%m-%d %H:%M:%S.%f") it should return an error, because nil does not have a year, an hour, a minute and so on.

Likewise if you have a simple Date such as ~D[2018-12-15] and then call format with “%H:%M” (hour and minute) an error should be returned. Just like nil does not have any hour or minute, neither does ~D[2018-12-15].
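A sketch of checking fields up front so that nil, or a Date formatted with %H, yields an error tuple instead of partial output. FieldCheck is hypothetical, knows only a subset of the directive table, and skips directives outside that subset.

```elixir
defmodule FieldCheck do
  # Hypothetical subset of the directive table: letter => required map key.
  @required %{
    "Y" => :year,
    "m" => :month,
    "d" => :day,
    "H" => :hour,
    "M" => :minute,
    "S" => :second,
    "f" => :microsecond
  }

  # Returns :ok when every known directive has its field in value, otherwise
  # {:error, {:missing_field, field, directive}} for the first missing one.
  def check(value, pattern) when is_binary(pattern) do
    # Drop literal "%%" first so it is not mistaken for a directive.
    stripped = String.replace(pattern, "%%", "")

    directives =
      ~r/%[_\-0]?\d*([a-zA-Z])/
      |> Regex.scan(stripped, capture: :all_but_first)
      |> List.flatten()

    Enum.find_value(directives, :ok, fn letter ->
      field = Map.get(@required, letter)

      if field && not (is_map(value) and Map.has_key?(value, field)) do
        {:error, {:missing_field, field, "%" <> letter}}
      end
    end)
  end
end
```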


Yeah we could for simple cases. For more complex cases you could split the string when you have any custom interpolations, format the results separately, and then join all three pieces (recursively). :+1:

I have updated the proposal with the developments mentioned in this thread. The custom formatter is gone and now we have a generally more flexible option-based API.

@kip and @Qqwy, can you please review the new sections “Options” and “Calendar extensions”? I would love your thoughts generally but also in particular on this bullet:

Thanks!


Very good!

As a whole, I think the new variant of the proposal with separate functions for every part is a much better way to split up the concerns. Great thinking! :+1:

The one thing that might need more clarification, is what should happen on failure, such as:

  • The cases that @Lau described in his post above.
  • What if we have a formatter ‘day-of-week’ function but we have a week of 10 days? This applies to :month_names, :abbreviated_month_names, :day_of_week_names, :abbreviated_day_of_week_names. Maybe we should standardize an ‘error’-result they should return, which Calendar.format then turns into a formatting-exception?
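One possible shape for that standardized error result, sketched under the assumption that a name function returns either a string or :error. SafeNames.fetch_name!/3 is hypothetical; it turns :error (or any non-string result) into an ArgumentError that names the offending option.

```elixir
defmodule SafeNames do
  # Hypothetical convention: a name function returns a string, or :error
  # when it has no name for the index (e.g. day 10 of a 7-day week).
  def fetch_name!(option, fun, index) when is_atom(option) and is_function(fun, 1) do
    case fun.(index) do
      name when is_binary(name) ->
        name

      :error ->
        raise ArgumentError, "#{inspect(option)} has no name for #{inspect(index)}"

      other ->
        raise ArgumentError,
              "#{inspect(option)} must return a string or :error, got: #{inspect(other)}"
    end
  end
end
```

For example, an :abbreviated_day_of_week_names function written for a seven-day week, called with index 10 by a ten-day calendar, would surface as an ArgumentError rather than a confusing FunctionClauseError deep inside the formatter.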