Inspect/2 output for structs: executable code or structure?

kip · July 3, 2024, 12:27am

TL;DR

What should I emit for inspect/2 output of library structs? Executable code, string representation or default (outputs the struct fields)?

Summary

In April, I changed the output of inspect/2 for Cldr.LanguageTag.t to be executable code rather than the default structural output. I did this to align with what I understood is the emerging pattern for Elixir structs but not everyone agrees.

I decided to review the inspect/2 output for all the Elixir core structs (summary below).

It appears, as I had understood, that most structs emit executable code these days: a sigil if one is part of Kernel, and a function call if not.

The ones that do not emit executable code appear to do so because there is no mapping from the struct to code - perhaps because the struct holds some state like File.Stream. And it makes sense for a PID to be represented as a string because the struct is opaque. I think URI might be an outlier - it could be URI.parse!(uri_string) I think, without losing any information.

I didn’t see anything in the Elixir Antipatterns guide.

What is the core team’s guidance and what are community expectations for inspect/2 output for a struct?

Output executable code

Date
Time
DateTime
Date.Range
NaiveDateTime
Regex
MapSet
Range
Exceptions (emit their name only, which I count as code)

Output structure

URI
File.Stream
Version

Output string representation

PID

Marcus · July 3, 2024, 5:51am

I expect executable code as a result of inspect. In my projects, I also change it to this effect.
If I want to see the structure I use:

iex> Date.utc_today()
~D[2024-07-03]
iex> Date.utc_today() |> inspect(structs: false)
"%{calendar: Calendar.ISO, month: 7, __struct__: Date, day: 3, year: 2024}"

LostKobrakai · July 3, 2024, 7:35am

To me the “executable code” inspection format is a (better) alternative to the previous convention of having the #Struct.Name<…> format. But I don’t expect it in places where no custom inspect implementation is needed and/or makes sense, so I don’t consider that new pattern a general one to be applied everywhere.

josevalim · July 3, 2024, 9:48am

The standard library is not a great example because Date, Time, Regex, Range, etc. all have built-in sigils and operators, imported by default, which are compact and justify using the same notation for printing them. If we remove those from the equation, we end up with:

URI, File.Stream, Version - output the struct
MapSet - uses MapSet.new
PID, Ref, Port - output the string representation

Given that providing built-in sigils/operators is not practical outside of stdlib, here is the main chart:

Can I expose the struct representation without leaking implementation details?
- If yes, output the struct (such as URI, File.Stream, etc)
- If no, is there an executable version?
  - If yes, use it (such as MapSet, Decimal, etc)
  - If no, print a string version (#PID<...>)
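
As an illustration of the "executable version" branch of this chart, here is a minimal sketch of a custom Inspect implementation. Example.Tag, its source field, and Example.Tag.new/1 are hypothetical names made up for this example and do not belong to any library mentioned in the thread; only defimpl Inspect, Inspect.Algebra.concat/1 and Inspect.Algebra.to_doc/2 come from Elixir itself.

```elixir
defmodule Example.Tag do
  # Hypothetical opaque struct: callers are expected to go through new/1
  # rather than read or build the fields directly.
  defstruct [:source]

  def new(source) when is_binary(source), do: %__MODULE__{source: source}
end

defimpl Inspect, for: Example.Tag do
  import Inspect.Algebra

  # Emit a function call that rebuilds the value instead of the struct fields.
  def inspect(%Example.Tag{source: source}, opts) do
    concat(["Example.Tag.new(", to_doc(source, opts), ")"])
  end
end

tag = Example.Tag.new("en-AU")

IO.puts(inspect(tag))
# prints: Example.Tag.new("en-AU")

# The raw fields stay reachable for the uncommon case.
IO.puts(inspect(tag, structs: false))
```

Because the emitted text is an ordinary function call, it can be pasted back into iex without exposing how the struct is laid out internally.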

acalejos · July 3, 2024, 9:25pm

@josevalim This is a very useful breakdown and I’m glad the original question was asked.

Would this make sense in the documentation somewhere? Maybe under Library Guidelines?

josevalim · July 4, 2024, 6:20am

Perhaps we could document it under Inspect itself?

acalejos · July 4, 2024, 2:06pm

I think that’d be appropriate

kip · July 4, 2024, 9:26pm

José, thank you, I think these are a good addition to the inspect/2 docs.

I’m probably overthinking it again but I’m not able to reconcile the “If yes (expose the struct without leaking implementation details), then output the struct” with the Date/Time/DateTime structs.

Those don’t appear to include implementation details and yet they output executable code - albeit in sigil form. (I agree that sigil form is only really appropriate for stdlib since those sigils are included by default).

tfwright · July 4, 2024, 11:31pm

This seems to be what makes them exceptions, but also aren’t structs just as executable as sigils? I’m a bit confused by that category. I can definitely see the advantage of emitting executable code over a string representation of the code, which is why I generally prefer the actual struct vs a “pseudo code” string representation, but I don’t see the downsides to emitting structs when they are public. Even if they leak implementation details I think I would prefer that as long as there is an additional convention around signalling that.

On the other hand it’s not clear to me why MapSet returns what it does so it’s likely I’m missing something.

kip · July 4, 2024, 11:36pm

I don’t see the downsides to emitting structs when they are public

That’s definitely part of my consideration here. There is an argument (going on in my head) about whether Cldr.LanguageTag.t should be opaque. In which case the inspect output would not be the structure.

I don’t see the downsides to emitting structs when they are public

When the struct is really large then, at least in iex, the output isn’t always clarifying. At least that’s been part of my thinking.

LostKobrakai · July 5, 2024, 7:10am

MapSet is an opaque type. It means the internal representation might change at any time – even has in the past, so not just a theoretical thing. Therefore it cannot show you the internals, because then you start depending on it (like pasting from one elixir version to the next might break), which is something to be avoided.

Previously it did inspect as #MapSet<[…]>, which made the mapset a comment in elixir syntax, meaning you couldn’t paste the inspected value into a shell or file and have elixir turn it back into an actual MapSet.

To help with that downside many of such structs were switched to an “executable code” representation, which is still valid elixir code, which happens to evaluate to the inspected value without needing to know the internals. That’s MapSet.new([…]).

Where it gets more into the space of “tradeoffs” is when it’s not clearly an opaque type, like with Decimal. It is a public and documented struct, but who wants to see %Decimal{exp: -4, sign: 1, coef: 152345} vs. Decimal.new("15.2345") in the common case – for the uncommon case there’s always inspect(…, structs: false).

I think this works well because this is lossless. The information and value behind the evaluated data is the same in both options. There is a 1:1 mapping between all possible values in both forms, and the conversion between them could be implemented as a pure function.
Even with decimal not being an opaque type you’re generally not expected to dig in the individual fields of that struct. The expectation is that you use the Decimal api for any manipulation on a decimal.
That kinda boils down to “decimal” being considered a self-sufficient type, a value that’s (usually) not to be subdivided – similar to e.g. MapSet
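
One way to check the lossless property described above is to evaluate the inspected text and compare the result with the original value. The following is only a sketch: round_trips? is a hypothetical helper name, and the MapSet line assumes a recent Elixir version in which MapSet inspects as MapSet.new([…]) rather than #MapSet<[…]>.

```elixir
# Evaluate the inspected text and check that it rebuilds an equal value.
round_trips? = fn value ->
  {rebuilt, _bindings} = value |> inspect() |> Code.eval_string()
  rebuilt == value
end

# Sigil and operator notations from the standard library.
IO.inspect(round_trips?.(~D[2024-07-03]), label: "Date")
IO.inspect(round_trips?.(1..10//2), label: "Range")

# Assumes a recent Elixir where this inspects as MapSet.new([1, 2, 3]).
IO.inspect(round_trips?.(MapSet.new([1, 2, 3])), label: "MapSet")

# "#PID<...>" parses as a comment, so evaluating it yields nil, not the pid.
IO.inspect(round_trips?.(self()), label: "PID")
```

Note that inspect/2 truncates long collections by default, so a check like this on large values would need limit: :infinity.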

From the issue @kip linked it seems those bullet points might not apply to how one would interact with a language tag in cldr.

kip · July 5, 2024, 8:05am

Actually I think those bullet points align quite closely to how I see Cldr.LanguageTag.t.

The information conveyed is lossless in the “code” output - and there are 100s of validation tests to ensure that’s guaranteed.
I don’t really expect people to be digging into the fields - and the structure has definitely changed over time, but slowly and not frequently
It’s a type that is self-sufficient and not expected to be subdivided.

Therefore it’s more closely aligned to Decimal.t than MapSet.t in my mind.

LostKobrakai · July 5, 2024, 8:11am

Not saying this might not be true, but those are the arguments brought up on the github issue. So it might be worth digging into where the mismatch in understanding comes from in this specific case.

tfwright · July 5, 2024, 8:04pm

Thanks for the additional context. In terms of tradeoffs I definitely lean strongly toward including more information, even the example of Decimal I think I would prefer seeing the struct itself over Decimal.new(...).

I haven’t seen as many libraries using opaque types and I think that’s probably a good thing. Using them or not seems to be entirely the discretion of the library’s author, but it seems like the best thing from the user’s point of view is to make public structs as transparent as possible and try to keep implementation details deemed inessential as separate as possible.

josevalim · July 7, 2024, 11:42pm

Date/Time/DateTime have their own rules because they have built-in sigils already imported. If you think about it, stdlib is full of exceptions like that: lists, maps, tuples, calendar types. They all have their own notation: some because they are implemented by the VM, some by sigils, but to the end user, they have their own rules.

So yeah, I’d remove them from the equation.

Usually data structures that are implementing new data types (like Decimal, MapSet, Queue, etc) are opaque, as their goal is to provide the best implementation with certain properties and you are not meant to care or access exactly how that’s done behind the scenes.