Community Context Exercise/Learning Discussion

steve · June 29, 2017, 7:09am

I want to have a discussion about contexts using an example that’s a bit more complicated than what I’ve seen on the forums so far. I’m still trying to wrap my head around contexts; specifically when to include functionality within a given context or when to break functionality into its own context. I’m hoping that this will serve as an exercise for the community to help people (like myself) better understand contexts (and possibly turn into a guide if this turns out well). Since I assume there’s always more than one way to structure contexts within a project, I’m looking forward to seeing different solutions.

I want to keep the conversation more high level, so I won’t be doing too much (if any) code. I also want to create a fairly complex example to give us more room to play with contexts. Here’s a summary of our example application:

The website is a social networking platform that allows users to sign up with an e-mail address and password. Users can follow other users, make posts to their own page, make posts to another user’s page, comment on a post, and react to a post. A user’s profile page shows basic info and a list of recent posts made by themselves or by guests. A user can see all the most recent posts of the users they follow on their home page. There can be “pages” that are moderated by users (such as a page representing a company) that have the same posting/following/reacting functionality that a user has. A user must have permission to act on that page’s behalf. All other site functionality is standard, such as a settings page to edit their account or profile.

I’ll take a stab at creating a context within this application. I’m also going to ask questions, show my thought processes, and provide multiple solutions for the same functionality. Feel free to answer, comment, critique, or correct anything that I do. Again, I’m still learning contexts, so I’m sure my solution is far from optimal. If you want to show how you’d model the same context I present or want to show how you’d model a different context, please do so. I want this to be an educational discussion on contexts and how to build them in a more realistic application.

To keep this post short, I’m going to focus on a small subset of the application. Specifically, I believe we could split the user information into two contexts, Accounts and Users. The Accounts context would be for “private” or application-centric information while the Users context would be for “public” information (that is, information shown on a user’s profile). I’ll be talking about the Accounts context specifically.

The Accounts context would be responsible for authentication, email verifications, password resets, permissions, registration, and sessions. The folder structure might look like:

/accounts
    /authentication
        log_in.ex # embedded schema
    /password_reset
        request_reset.ex # embedded schema
        reset_password.ex # embedded schema
    accounts.ex
    authentication.ex
    email_verification.ex
    password_reset.ex
    permissions.ex
    registration.ex # embedded schema
    session.ex
    update_email.ex # embedded schema
    update_password.ex # embedded schema

The embedded schemas represent a “form” that the user must fill out in order to perform certain actions on the website. These forms don’t necessarily map one-to-one to the underlying database implementation. For example, the log_in.ex embedded schema would require e-mail and password, the request_reset.ex embedded schema under /password_reset would require a verified e-mail address, the reset_password.ex would require a password and its confirmation, and the registration.ex embedded schema would require the user’s name, e-mail, and password. All of these fields could be stored across one or multiple databases. These files deal more with the interface between the web application and our business logic, making sure the website doesn’t dictate how we structure data in the database.

Now, I don’t know if it’s considered correct to structure the embedded schemas the way that I have. I could’ve easily just included them all in the root of the /accounts folder with more descriptive file names rather than creating a nested folder structure. For example, /accounts/password_reset/request_reset.ex could’ve simply been /accounts/request_password_reset.ex. I could’ve also just created a nested folder for all embedded schemas instead of spreading them out across multiple folders. For example, the folder could’ve been named /accounts/schemas, /accounts/forms, or /accounts/actions and held all embedded schema files within it. How would you structure this context?

Each file would be responsible for its own functionality. For example, email_verification.ex would contain all the logic necessary to verify an account’s e-mail address.

Now, the Accounts context could expose all of its functionality through the main module named Accounts. In this case, all functions in separate files, such as email_verification.ex, would be referenced in accounts.ex in some manner, such as defdelegate. A short, incomplete list of functions might look like:

Accounts.log_in
Accounts.has_permission?
Accounts.create_session
Accounts.mark_email_verified
Accounts.register_account
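A minimal sketch of the defdelegate approach might look like the following. The two internal modules and their return values are assumptions made up for illustration; only the public function names come from the list above.

```elixir
defmodule Accounts.EmailVerification do
  # Hypothetical internal module; a real one would persist the change.
  def mark_verified(account), do: {:ok, Map.put(account, :email_verified, true)}
end

defmodule Accounts.Session do
  # Hypothetical internal module; the session shape is an assumption.
  def create(account) do
    {:ok, %{account_id: account.id, session_id: System.unique_integer([:positive])}}
  end
end

defmodule Accounts do
  # The facade: callers only ever see Accounts.*, so the files behind it
  # can be reorganized without touching any call site.
  defdelegate mark_email_verified(account), to: Accounts.EmailVerification, as: :mark_verified
  defdelegate create_session(account), to: Accounts.Session, as: :create
end
```

With this shape, a controller would call Accounts.create_session(account) and never mention Accounts.Session directly.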

Another option would be to namespace the functionality. Using this method, there would be no need to reference functions within the accounts.ex file. A short, incomplete list of functions might look like:

Accounts.Session.create
Accounts.PasswordReset.reset_password
Accounts.EmailVerification.mark_verified

In @chrismccord’s recent talk on Phoenix 1.3, he mentions that you shouldn’t namespace a module if it’s leaking implementation details, such as using Amazon’s SQS service. However, these are all more general purpose namespaces that have more to do with sectioning off functionality than revealing any implementation details. Because of that, it makes me wonder if such functionality should be (or could eventually in the future be) split off into its own context. Technically speaking, we could make several contexts out of this, including but not limited to Authentication, EmailVerification, PasswordReset, Permissions, and Sessions. Would it be best to keep any, some, or all of these within the Accounts context? Would you split them apart?

Well, this post is already getting quite long, so I think I’ll stop here. Does this look like a reasonable use of context given our example application? If I did absolutely everything wrong, feel free to say so. Would anyone like to take a stab at creating a context for the application? It could be any part of the application, including the Accounts portion.

Also, I know I didn’t discuss it above, but I was assuming that the application only uses a single database managed via a library like Ecto. I know there’ve been questions about how to manage the database across contexts. Perhaps someone might be able to say how they’d approach that aspect of the application. For example, I’ve read that a few people on this forum have suggested a “shared” folder for the base DB schemas that all the contexts can use.

Anyway, I’m hoping this will be a great collaborative community effort to help everyone come to a better understanding of contexts.

steve · July 5, 2017, 5:44pm

I’ve been toying around with the idea of splitting the Accounts context into smaller contexts and when it’s appropriate to do so. There are essentially the three options I listed above:

Everything is in the Accounts context and is exposed through functions on the main module (e.g. - Accounts.verify_password).
Everything is in the Accounts context and is exposed through namespaces (e.g. - Accounts.Authentication.verify_password).
Everything is split into separate contexts (e.g. - Authentication.verify_password).

Now if contexts were to have namespaces, it almost seems natural to me to want to split it out into its own context (depending on the use case). While the smaller contexts may not be responsible for much, it would be very clear what their responsibility is. For example, the PasswordReset would solely be responsible for all the password reset functionality on the website. The folder structure might look like:

/password_reset
    account.ex # schema
    password_reset.ex # context
    request_reset.ex # embedded schema
    reset_password.ex # embedded schema

The public API for the context would be:

PasswordReset.request_reset(email)
PasswordReset.reset_password(request_id, args)

Then for the public changeset functions:

PasswordReset.request_reset_changeset(args)
PasswordReset.reset_password_changeset(args)

OR

PasswordReset.changeset(:request_reset, args)
PasswordReset.changeset(:reset_password, args)

All the other functionality, such as fetching and deleting the password reset requests would be private.

Typically, password resets use an email to send a password reset link to the user. Now, I’m not sure quite how to handle this. There are three possible solutions:

The PasswordReset context is responsible for sending its own emails as well as templating the email’s body.
There is a Mailer context that is responsible for sending all the emails within the application. It also manages how and what the body of the email looks like.
The Mailer context is responsible for simply sending the email while the PasswordReset context is responsible for the body of the email, passing it in as an argument to the mailer.

All three solutions seem simple enough to implement. The final two solutions mean that the PasswordReset context would depend on the Mailer context, which makes sense as a dependency. If I were to choose one of them, I would probably go with the third solution.
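As a rough sketch of that third solution, assuming a hypothetical Mailer.deliver/3 that only handles delivery and a hypothetical PasswordReset.send_reset_email/2 that owns the wording (neither name comes from a real library):

```elixir
defmodule Mailer do
  # Hypothetical delivery-only context. A real one would hand the message to
  # an email library; this stub just returns what it would have sent.
  def deliver(to, subject, body) when is_binary(to) and is_binary(body) do
    {:ok, %{to: to, subject: subject, body: body}}
  end
end

defmodule PasswordReset do
  # Owns the wording of its own email, but not the delivery mechanism.
  def send_reset_email(email, reset_url) do
    body = "Someone asked to reset your password.\n\nUse this link: #{reset_url}\n"
    Mailer.deliver(email, "Reset your password", body)
  end
end
```

The dependency points one way only: PasswordReset knows about Mailer, while Mailer knows nothing about password resets.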

Like I said earlier, this could all be included in the Accounts context since password reset is a concern of the account. However, I just wanted to show that it’s trivial to split off functionality into its own context. I’m not entirely sure if it’s better to keep it within Accounts or to have split it off into its own context in this situation. Anyone able to help out with that?

In my first post, I listed several “internal” functionalities that could be split off into new contexts. If it makes sense to do so, or if someone wants to see me do so, I could make more posts here (like this one) showing how I would do each. I know that Authentication could easily be split out into its own context, especially if we plan on adding multiple methods of authenticating, such as OAuth.

Any feedback on what I’ve been doing so far?

peerreynders · July 5, 2017, 8:33pm

I’d argue that composing the email content isn’t the responsibility of either the Account or the Mailer context. While the Account context is certainly a key capability when it comes to resetting passwords, I think its responsibility should end when it releases the perishable URL for the password reset page (that page is where its responsibility continues) after a password reset has been initiated. There may even be a third context that is entirely responsible for generating all sorts of email content because it manages the requisite templates. So another (fourth) party may very well initiate the password reset with the Accounts context, use the URL to generate the content with a “document templates” context and then hand over the finished document to a Document Dispatch context which could email or text it.

Looking at this example (blog/lib/blog.ex at 49e506992bb311ff0ca2b8a75c696c795ab3ef7f in learnphoenixtv/blog on GitHub) I’m a bit dismayed - in my personal opinion a context should not be leaking changesets anywhere - while I’m sure this is done for reasons of convenience, for me it goes against a core context design principle:

“Ask the context that holds the data to do the work for you” aka “Ask for help, not information”

and changesets are all about simply handing out information. So as far as I’m concerned --no-schema would be the “goto” option. Also the generator using Ecto schemas is likely giving people the wrong idea - that contexts emerge from the schema - it’s quite the opposite:

These capabilities may require the interchange of information — shared models — but I have seen too often that thinking about data leads to anemic, CRUD-based (create, read, update, delete) services. So ask first “What does this context do?”, and then "So what data does it need to do that?"

Sam Newman, Building Microservices (2015)

So ideally the boundaries have already been laid out before the schema is laid down. The generator simply completes the round trip from the schema if there isn’t any code already. If it turns out that the boundaries need adjusting then the schema will need adjusting. Context drives the schema - not the other way around.

steve · July 5, 2017, 9:46pm

peerreynders:

I’d argue that composing the email content isn’t the responsibility of either the Account or the Mailer context. While the Account context is certainly a key capability when it comes to resetting passwords, I think its responsibility should end when it releases the perishable URL for the password reset page (that page is where its responsibility continues) after a password reset has been initiated. There may even be a third context that is entirely responsible for generating all sorts of email content because it manages the requisite templates. So another (fourth) party may very well initiate the password reset with the Accounts context, use the URL to generate the content with a “document templates” context and then hand over the finished document to a Document Dispatch context which could email or text it.

Could you go into a bit more detail about how that might look overall? If I’m reading this correctly, there would be four different contexts for handling everything involved, right? My gut reaction is to say that’s a lot of moving parts. Could you explain the benefits to spreading out the functionality of the password reset across so many contexts? My first thought is that the application is going to be confusing if we split up functionality into such granular components. Again, I’m an amateur when it comes to architecture, so I’m genuinely asking these questions so that I can learn to better my own practices.

Correct me if I’m wrong, but I don’t believe the goal of contexts is to be as isolated as something such as microservices. I’d imagine that a non-negligible number of Elixir applications using Phoenix will end up using a single database that’s shared across all contexts, for example. There’s obviously going to be a bit of a sacrifice to “pure” isolation in these projects. I would ask the question of whether or not there’s a diminishing return on the amount of productivity and maintainability gained when it comes to separating these concerns and tools so finely. At what point are you getting the most bang for your buck?

That’s why I picked the Accounts context as one of the first to examine. While all of the namespaces in the original post could all be part of a larger context (Accounts in this case), I was wondering at what point do you break them off into new ones? If I remember correctly, it’s suggested that you keep everything within that original context until you feel a part needs to be split off. This would be akin to a “monolith” context, I suppose.

Basically, where’s the sweet spot when it comes to productivity and maintainability? At what point are we going too large or too small with our contexts? If I were to guess, it’s more harmful to a project to start off too small than it is to start off too large when it comes to contexts.

If we’re going to assume this project is entirely Elixir, how would you handle returning error information otherwise? As far as I’m aware, a changeset is nothing more than a struct. If I were to return an error response as a map or tuple, for example, how is that any different? In the end, there’s going to be an error structure that’s going to be used. Otherwise, how is the API consumer ever going to receive meaningful information about what went wrong? In this case, it makes sense to me to use Changeset because it’s a struct that’s well understood by other tools within the Elixir toolset and can be converted to other formats, such as JSON, if it needs to be used outside of Elixir. By eliminating Changesets entirely in this case, I almost feel as though you’re optimizing for a use case that’s not even on the radar.

kokolegorille · July 5, 2017, 11:02pm

It looks like you are trying to implement devise for phoenix.

While I understand such a need, I would mention that both Elixir and Devise are coming from plataformatec. So it should be very natural to have an easy port… but it’s not the case.

Porting any library from OOP to FP requires changing paradigm. And that is hard, hard as porting ruby gems to elixir hex.

steve · July 5, 2017, 11:27pm

Was this meant for a different thread? I’m simply talking about higher level concepts, not necessarily reimplementing any library. The APIs and boundaries are all hypothetical. If you’d like to take a stab at a different portion of the project unrelated to accounts, such as posts, I’d encourage you to do so.

peerreynders · July 6, 2017, 1:18am

In that particular case I was purely focused on determining where the responsibility for “composing the email” was - I wasn’t trying to determine a full picture of all the contexts involved - as a matter of fact the more I think about it - even that URL/web page shouldn’t be the Account context’s responsibility - because that’s about “the Web” - not Accounts; contexts collect cohesive capabilities. You’re probably familiar with this quote:

Every block of stone has a statue inside it and it is the task of the sculptor to discover it. Michelangelo

Contexts within a domain are typically not discovered by some deterministic process - usually it takes some poking and prodding to find out what needs to stay together (high cohesion) and what should be separated (low coupling).

@gregvaughn

The “how would I organize this if I had to operate and sequence this functionality on the command line” is an incredibly useful thought experiment - trying to define small highly focused commands that can perform work based on their own autonomous data has the tendency to shake loose things that can be separated while highlighting what needs to stay together.

The “password reset” (actually I was more thinking of “forgot password” to be honest) I’m talking about is a user-scenario or use-case. It starts with someone clicking a button on a web page, resulting in a message in the user’s email, which contains a link to a web page that lets the user specify a new password. The Account context supports this scenario but it is only really interested in

Issuing some sort of perishable correlation ID that needs to be associated with the new password
Consuming the correlation ID together with the new password in order to verify that the correlation ID is valid and hasn’t expired - in which case it accepts the new password.
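Those two responsibilities could be sketched roughly as below. The module name, the one-hour lifetime, and the plain map used as storage are all assumptions for illustration; a real context would keep the IDs in persistent storage and would update the password in the same step.

```elixir
defmodule Accounts.ResetTokens do
  # Assumed lifetime of a correlation ID, in seconds.
  @ttl_seconds 3600

  # Issues a random, URL-safe correlation ID and records when it expires.
  # Returns the ID together with the updated store.
  def issue(store, account_id, now \\ System.system_time(:second)) do
    id = Base.url_encode64(:crypto.strong_rand_bytes(32), padding: false)
    {id, Map.put(store, id, %{account_id: account_id, expires_at: now + @ttl_seconds})}
  end

  # Consumes the ID. It is removed from the store in every case, so it can
  # never be used twice. On success the caller learns which account may
  # receive the new password.
  def consume(store, id, now \\ System.system_time(:second)) do
    case Map.pop(store, id) do
      {nil, store} -> {{:error, :invalid}, store}
      {%{expires_at: expires_at}, store} when expires_at <= now -> {{:error, :expired}, store}
      {%{account_id: account_id}, store} -> {{:ok, account_id}, store}
    end
  end
end
```

Nothing here knows about URLs, web pages, or email; whoever calls issue/3 decides how the ID reaches the user.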

However there are many other parts of the use-case that need to be implemented - so a use case crosses many contexts. Setting up web pages with dynamically generated URLs, composing emails, sending emails aren’t in “Account’s” job description.

Ideally, at any point you should be able to change how the context works internally and provided you haven’t changed the interface its clients should not care. So “when something else” takes over “Account’s” responsibilities you should be able to implement the pre-existing interface with a wrapper implementation and keep going - if changesets are part of your interface that isn’t going to happen because it is unlikely that the replacement supports changesets - as that technology is tied to Ecto.

I disagree - identifying areas of high cohesion and low coupling has always been a design goal even in monoliths to enhance maintainability (partially through replaceability). In fact it’s a strategy used when moving to microservices - first identify the boundaries within the monolith then refactor to reflect the boundaries before finally splitting off the microservice. The advantage in a microservice is that the boundary is physically enforced - in a monolith the boundary is largely conceptual and only maintained through developer discipline - violating context boundaries in a monolith is incredibly easy and often tempting (undoing the benefit of keeping it clean in the first place).

If you define a context per table you essentially just end up with Table Modules, which is hardly an improvement over Rails’ Active Record.

The point is that there is always an upfront cost to loose coupling.
If loose coupling is applied in the wrong place (i.e. not at a “natural” boundary) then it’s going to keep costing without ever generating any return.
However loose coupling in the right place pays huge dividends in terms of maintainability. Typically it manifests itself in terms of replaceability - especially when multiple changes hit the same context - while the interface to the context manages to isolate the clients of the context from those changes.
Unbridled tight coupling (never-ending shortcuts) will leave you with the proverbial big ball of mud.

The wrong boundaries are just as bad as no (or too few, too large) boundaries. Finding the optimal boundaries is rarely a picnic - typically it requires that you understand the domain quite well - often better than the stakeholder, service owner, project/product champion - it is far from a cookie cutter affair.

A map is a very generic Elixir data structure - changeset is not - compare the functions that support Map vs Changeset - so a map is preferable - a struct is good as long as it reflects a domain concept - Changeset is all about functionality that Ecto supports but nothing directly domain related. However within the context you can use Changeset (as arguments/return values of private functions that support the public functions) as it is likely that the implementation of a context would be replaced wholesale anyway.

Yes, but an error tuple is idiomatic in Erlang and therefore Elixir - so it is the context implementation’s responsibility to extract the error message from the Changeset and to wrap it in an error tuple before returning it to a client
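A minimal sketch of that idea follows. Sketch.Changeset is a stand-in struct so the example runs without Ecto; its errors field copies the `[{field, {message, opts}}]` shape that Ecto.Changeset uses. The Accounts.Registration module and its validation rule are hypothetical. With real Ecto, Ecto.Changeset.traverse_errors/2 can do a similar extraction.

```elixir
defmodule Sketch.Changeset do
  # Stand-in for %Ecto.Changeset{}; only the two fields used below.
  defstruct valid?: false, errors: []
end

defmodule Accounts.Registration do
  alias Sketch.Changeset

  # Public function: clients see plain tuples and maps, never the changeset.
  def register(params) do
    case validate(params) do
      %Changeset{valid?: true} -> {:ok, %{email: params["email"]}}
      %Changeset{errors: errors} -> {:error, to_error_map(errors)}
    end
  end

  # Private functions are free to use the changeset-like struct internally.
  defp validate(%{"email" => email}) when is_binary(email) and email != "" do
    %Changeset{valid?: true}
  end

  defp validate(_params) do
    %Changeset{errors: [email: {"can't be blank", [validation: :required]}]}
  end

  # Turns [{field, {message, opts}}] into %{field => [message, ...]}.
  defp to_error_map(errors) do
    Enum.group_by(errors, fn {field, _} -> field end, fn {_, {message, _opts}} -> message end)
  end
end
```

The client of register/1 only needs to pattern match on {:ok, account} or {:error, errors}.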

My argument is that if it’s necessary to expose Changeset you’ve either

chosen the wrong boundary or
implemented a poor interface along the boundary.

steve · July 7, 2017, 8:21am

I’m still mulling over the rest of your post, so I won’t be commenting on that until I feel I have a better understanding.

However, I’m not sure I’m convinced by your arguments concerning changesets. Let’s assume the following is returned by a module when a validation error occurs: {:error, changeset}. While it’s true that the Ecto.Changeset struct is part of the Ecto suite, it doesn’t change the fact that it is simply a bare map underneath, meaning all of the functions in the Map module can be used on a changeset. You could ignore the __struct__ field entirely or even convert it using Map.from_struct() before returning it. I’d argue that the data returned via a changeset is valuable regardless of whether or not Ecto is being used within a project. Any error structure you come up with in your own application is very likely to mimic or copy the information already being returned by a changeset. You could simply extract the errors field and return that in your error tuple instead, but then you’d be missing out on a plethora of other useful information.
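The “bare map underneath” point can be checked with any struct. The sketch below uses a made-up Sketch.Result struct in place of a changeset so that it runs without Ecto; the field names are assumptions.

```elixir
defmodule Sketch.Result do
  # Hypothetical stand-in for a library struct such as a changeset.
  defstruct valid?: false, errors: []
end

result = %Sketch.Result{errors: [email: {"can't be blank", []}]}

# A struct is a map with an extra __struct__ key, so Map functions accept it.
errors = Map.get(result, :errors)

# Map.from_struct/1 drops the __struct__ key and leaves a plain map.
plain = Map.from_struct(result)
```

Note that this shows only that the data is reachable with Map functions; whether the keys of that map should cross a context boundary is the point under debate.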

peerreynders · July 7, 2017, 7:45pm

Ecto.Changeset is part of Ecto. No Ecto - no Ecto.Changeset.
Ecto is an implementation detail (related to the particular persistent storage technology that you are currently using), %Ecto.Changeset{} exposes a dependency to that implementation detail.
Dependencies on implementation details are a form of tight coupling.
Tight coupling between capabilities within the same context is OK.
Tight coupling across context boundaries compromises context integrity and autonomy as its clients will become coupled to its implementation details, creating very real obstacles for the internals of the context to evolve in the future because most changes will ripple to its clients (rendering the existence of the context kind of pointless).

The other issue is: valuable to who, exactly?

From Ecto github repository

  # If a new field is added here, def merge must be adapted
  defstruct valid?: false, data: nil, params: nil, changes: %{}, repo: nil,
            errors: [], validations: [], required: [], prepare: [],
            constraints: [], filters: %{}, action: nil, types: nil,
            empty_values: @empty_values

  @type t :: %Changeset{valid?: boolean(),
                        repo: atom | nil,
                        data: Ecto.Schema.t | map | nil,
                        params: %{String.t => term} | nil,
                        changes: %{atom => term},
                        required: [atom],
                        prepare: [(t -> t)],
                        errors: [{atom, error}],
                        constraints: [constraint],
                        validations: Keyword.t,
                        filters: %{atom => term},
                        action: action,
                        types: nil | %{atom => Ecto.Type.t}}

Now for most of this discussion I’ve been assuming that we are talking about a domain context. Very little of the above information is of any interest to domain logic. Domain logic cares about domain types. Domain types are supposed to strive towards Persistence Ignorance (PI). This typically translates to going to considerable lengths to hide the persistence mechanism from the client of the context. This could mean:

Transferring the data contained in an Ecto.Changeset to “Plain-Old-Data” (from POJO and POCO; “Plain-Old-Java-Object” and “Plain-Old-CLR-Object” respectively) before returning it to a client. So if you return a map each key would relate to a domain concept and each value would either be a basic Elixir data type representing a domain quantity or an instance of a domain type. Essentially the Ecto schema structs act as mere “data-transfer-structures” (DTO).
Burying the N-PI (non-persistence ignorance) part of the data in a deep, dark, opaque corner of the domain type and banning the client from any direct access to the type’s data, requiring that all accesses go through domain module functions which of course know how to navigate the twisted internals of that particular type. It should be obvious that this approach is only a last resort; for pure results the former approach is preferred. Other than that “Ask for help, not information” is the module/interface design mantra that is used to try to avoid exposing types where it is necessary to track a dirty state (again “state” is the complexity culprit).
Ideally whoever “owns” the Changeset could hold on to it until (or re-retrieve it when) the “new versions” of the domain type instances “come back” and then use some kind of diff-ing mechanism to generate the Changeset needed for Ecto. I’m fully aware of how redundant that sounds but to achieve a high level of PI, Ecto doesn’t go far enough for decoupling because it assumes that the logic that makes the changes also has the Changeset. However the domain logic is only concerned with creating the new, updated instance of the domain type, not knowing how to deal with Changesets - that would be considered accidental complexity that obscures the domain logic.
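The diff-ing mechanism mentioned in the last option could be as small as the sketch below, assuming the domain type is represented as a plain map. Sketch.Diff is a hypothetical helper, not part of Ecto; the map it returns is the kind of value a repository could pass on to something like Ecto.Changeset.change/2.

```elixir
defmodule Sketch.Diff do
  # Returns only the entries of `new` whose values differ from `old`, i.e.
  # what the domain logic changed between load and save.
  def changes(old, new) when is_map(old) and is_map(new) do
    new
    |> Enum.filter(fn {key, value} -> Map.get(old, key) != value end)
    |> Map.new()
  end
end
```

The domain logic only ever produces the new map; the owner of the persistence concern compares it with the version it loaded.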

In the other post I stated this

However within the context you can use Changeset

In terms of DDD I was actually taking a lenient position - DDD actually wouldn’t even let something like a Changeset exist in a domain context. Within a subdomain a repository has the responsibility of dealing with persistent storage concerns - which includes all the data query language assets - but the repository is also only allowed to serve (and accept) “plain-old-data” and domain types to its clients. Which makes sense as that, for example, keeps the burden of handling the Ecto-related concerns solely with the repository so that the capabilities within the domain context are only dealing with domain related concerns - however if the context is small enough, the repository could be overkill.

And while on the topic of DDD - the Mailer wouldn’t be considered a context but a service:

A good SERVICE has three characteristics.

The operation relates to a domain concept that is not a natural part of an ENTITY or VALUE OBJECT.
The interface is defined in terms of other elements of the domain model.
The operation is stateless.

Now I’m not here to serve out the DDD kool-aid because there are legitimate criticisms, for example:
Jim Coplien — Symmetry in Design
I think one of his implied concerns is that the third “D” is entirely focused on design, and that Domain Analysis (i.e. understanding the domain) seems to happen much too late in the process. Boundaries are often not chosen but discovered.

Patterns, Principles, and Practices of Domain-Driven Design, p.82

It’s important to be explicit about what context you’re using when talking with domain experts, because terminology can have different meanings in different contexts. As repeated throughout this chapter, multiple models will be at play in your domain. You need to enforce linguistic boundaries to protect the validity of a domain term. Therefore, linguistic boundaries are bounded context boundaries. If the concept of a product has multiple meanings inside the same model, then the model should be split into at least two bounded contexts, each having a single definition of the product concept.

Furthermore, I doubt that the Phoenix team was intending to push DDD as such. A Phoenix context:

Is about collecting capabilities exhibiting high cohesion in a single place (where tight coupling is OK)
Sets a boundary around these capabilities. The boundary is about loose coupling (the opposite of tight coupling) towards the context’s clients - largely to keep clients isolated from any changes internal to the context - up to and including the wholesale replacement of the context’s internals.

To reiterate:

Ecto and any associated data types are an implementation detail.
Letting implementation details cross the boundary leads to tight coupling and severely weakens the benefit of maintaining a context (and its boundary).

michalmuskala · July 7, 2017, 8:11pm

In any real-world application, you need a way to return validation errors to the users. Do you create a separate structure for each entity you have? This might be “correct” and “pure”, but it is definitely not practical.

While a changeset implies there’s some data storage behind it, it carries no information at all about what that data storage is. It might as well be an in-memory adapter. How deep does one go pruning structures from libraries? At some point, it’s necessary to decide something is a “core” library that is “safe” to use. Otherwise, it’s the same as falling deep into the NIH syndrome. Would anything change if the changeset code was copied into the application and called “MyApp.Changeset”? Would that mean separating from the database, would that be “correct”? For me, it wouldn’t change much.

In much of the DDD code I saw, validation is handled using exceptions. I generally find it a horrible choice to use exceptions (or any non-local code construct) for control flow. I’d take a plain-data changeset any time of day.

Phoenix contexts and DDD are related. They have similar goals - make you think about your domain first. But they are definitely not one and the same. One is purely a code organisation construct, another one a whole philosophy of doing software development.

peerreynders · July 7, 2017, 10:28pm

Boundaries are about isolation with the intent of managing dependencies - so it’s always about having as few dependencies as possible and you want to be especially independent of things that you may want to change in the future. You also want to prune dependencies that may impose changes on you.

You are also supposed to depend only on the part that you actually use - which usually involves writing a thin wrapper around it that exposes the functionality in the way you need to use it.

Got nothing to do with “Not-Invented-Here” - everything with maintaining architectural choice and options.

The most severe case is needing to use a different mapper. If that new mapper is EctoX then maybe I might be able to still use Ecto.Changeset while taking some performance hit or being denied some new features until I change over to EctoX.Changeset. But what if I want to switch to UnectoV? They’ll have UnectoV.Diffset which is incompatible with Ecto.Changeset. So I’m either stuck with Ecto even though I want to switch or I have to upgrade the entire application because, silly me, I let Ecto.Changeset leak and bleed all over the place. At least with a DDD repository approach the changes are contained to the replacement of the DDD repositories and my domain logic and non-persistent storage infrastructure services aren’t affected.

The idea is to stop conflating unrelated concerns (persistent storage and domain logic) and to avoid (vendor) lock-in. Meanwhile I can use all the Ecto specific features I want in the DDD repository because all of that will have to be replaced anyway. And ultimately it’s about daring to judiciously invest in “pay now, save later” (though there are no guarantees in life) rather than the always easy “save now, pay tons more later, but who cares, I’m not going to be around anyway” approach.

No, because MyApp.Changeset still serves the needs of the persistence technology (on Ecto’s terms) - not the domain proper. Renaming things does not reduce coupling.

For the domain an error tuple is quite sufficient. Furthermore if the error is related to Ecto or storage, log the details; the domain doesn’t need to know the gory details - just like you don’t spill the goods to the user on a “500 Internal Server Error”.
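As a rough sketch of that idea (not code from this thread), a context function can log the storage-specific detail and hand its caller only a domain-level reason. The module name `Accounts`, the reasons `:email_taken` and `:unavailable`, and the private `insert_user/1` stand-in for a real repository call are all assumptions made for illustration.

```elixir
defmodule Accounts do
  require Logger

  # Hypothetical stand-in for the persistence layer; a real version would call
  # whatever mapper or driver the context uses internally.
  defp insert_user(%{email: "taken@example.com"}),
    do: {:error, {:unique_violation, "users_email_index"}}

  defp insert_user(%{email: "down@example.com"}), do: {:error, :connection_refused}
  defp insert_user(attrs), do: {:ok, Map.put(attrs, :id, 1)}

  @doc "Registers a user. Callers only ever see a domain-level reason."
  def register(attrs) do
    case insert_user(attrs) do
      {:ok, user} ->
        {:ok, user}

      {:error, {:unique_violation, _index} = detail} ->
        # The storage detail goes to the log, not across the boundary.
        Logger.error("user insert failed: #{inspect(detail)}")
        {:error, :email_taken}

      {:error, detail} ->
        Logger.error("user insert failed: #{inspect(detail)}")
        {:error, :unavailable}
    end
  end
end
```

A client would then match on `{:error, :email_taken}` or `{:error, :unavailable}` without knowing which storage technology produced the failure.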

Just to be clear - it was never my intention to rip on Ecto - though sometimes I may not agree with how it’s employed. But I was stating that in my view there is a fundamental design tension between wanting to establish a boundary around a context while at the same time passing implementation revealing types like Ecto.Changeset through that boundary - and that doing so ultimately compromises the intent behind establishing the context boundary in the first place.

From that point of view stating that not passing Ecto.Changeset through that boundary “is definitely not practical” is to me tantamount to pouring gasoline on the fire of the “contexts are useless” camp.

steve · July 7, 2017, 10:48pm

How would you choose to implement error handling for validations? While you’ve made your criticisms of passing changesets outside of a context known, I think seeing an actual example of what you’re proposing would benefit the discussion.

peerreynders · July 8, 2017, 3:18am

To get right to the point - if I’m investing in domain types I can’t use the validation functionality provided by Ecto.Changeset.

A major point of domain types is that instances will only contain information that is known to be valid in reference to the domain. Validation has to happen on data before it becomes a domain type instance. Data entering persistent storage should already be in the form of a domain type instance - i.e. it is already known to be valid. Data coming out of persistent storage should already be valid because it entered it as a domain type instance.
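One way to picture "validation happens before the data becomes a domain type instance" is a constructor function that is the intended way to build the struct. This is only a sketch: the `EmailAddress` module and its two rules (must contain "@", at most 254 characters) are assumptions chosen for illustration, and nothing in Elixir prevents code from building the struct directly.

```elixir
defmodule EmailAddress do
  @enforce_keys [:value]
  defstruct [:value]

  @max_length 254

  @doc "Returns {:ok, %EmailAddress{}} only for input that passes every rule."
  def new(raw) when is_binary(raw) do
    cond do
      String.length(raw) > @max_length -> {:error, :too_long}
      not String.contains?(raw, "@") -> {:error, :missing_at}
      true -> {:ok, %__MODULE__{value: raw}}
    end
  end

  def new(_other), do: {:error, :not_a_string}
end
```

Code that receives an `%EmailAddress{}` built through `new/1` can then skip re-checking it.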

So I would say that Ecto.Changeset conflates the concerns of data mapping and data validation. Now that conflation is convenient for a “Phoenix is your application” development style - which is essentially what Eric Evans calls a “Smart UI”. In a Smart UI you don’t use domain types because the logic isn’t all that complex to begin with, so the overhead of developing a domain model isn’t warranted. The benefit of a Smart UI is that it is relatively easy to put together - even by a team who doesn’t necessarily have deep domain knowledge as there isn’t any complex domain logic to contend with - your typical (close to) CRUD style application.

The problem with a Smart UI is that there is very little margin for growth and development because any additional complexity will quickly push it towards a big ball of mud. Now if in a Smart UI there is a cause to create a context, I guess it would make sense to pass Ecto.Changeset through the boundary - because there aren’t any domain types in the application to begin with - the Ecto schema structs are the “acting domain types”. But if I found a justifiable reason to create a context within a Smart UI, I would get very, very deeply concerned because that would be a potential indicator that I chose the wrong style of application.

The “Phoenix is not your Application” development style is the game changer. “Functional Web Development with Elixir, OTP, and Phoenix” has been making headlines lately - let’s look at that:

/_build/dev/lib/islands_engine/ebin/islands_engine.app

{application,islands_engine,
             [{registered,[]},
              {description,"islands_engine"},
              {vsn,"0.0.1"},
              {modules,['Elixir.IslandsEngine','Elixir.IslandsEngine.Board',
                        'Elixir.IslandsEngine.Coordinate',
                        'Elixir.IslandsEngine.Game',
                        'Elixir.IslandsEngine.GameSupervisor',
                        'Elixir.IslandsEngine.Island',
                        'Elixir.IslandsEngine.IslandSet',
                        'Elixir.IslandsEngine.Player',
                        'Elixir.IslandsEngine.Rules']},
              {applications,[kernel,stdlib,elixir,logger]},
              {mod,{'Elixir.IslandsEngine',[]}}]}

With the exception of GameSupervisor all the OTP application modules Board, Coordinate, Game, Island, IslandSet, Player, and Rules deal with domain types and capabilities. No Changeset in sight.

“Foul” you cry - “that application doesn’t even store state in persistent storage - all state is stored in process state!”

So???

What business of the client (to the application) is it, how state is stored within the application?
Would the domain types or capabilities exposed change if some of the state was stored in ETS tables?
What if some of the state was stored in mnesia/amnesia?
What if we used plain-old Postgrex (i.e. no Ecto)?

So why would I start leaking Changesets the moment I use Ecto?

The Smart UI “Anti-Pattern”

. . . That sums up the widely accepted Layered Architecture pattern for object applications. But this separation of UI, application, and domain is so often attempted and so seldom accomplished that its negation deserves a discussion in its own right.

Many software projects do take and should continue to take a much less sophisticated design approach that I call the Smart UI. But Smart UI is an alternate, mutually exclusive fork in the road, incompatible with the approach of domain-driven design. If that road is taken, most of what is in this book is not applicable. My interest is in the situations where the Smart UI does not apply, which is why I call it, with tongue in cheek, an “anti-pattern.” Discussing it here provides a useful contrast and will help clarify the circumstances that justify the more difficult path taken in the rest of the book.

❊ ❊ ❊

A project needs to deliver simple functionality, dominated by data entry and display, with few business rules. Staff is not composed of advanced object modelers.

If an unsophisticated team with a simple project decides to try a Model-Driven Design with Layered Architecture, it will face a difficult learning curve. Team members will have to master complex new technologies and stumble through the process of learning object modeling (which is challenging, even with the help of this book!). The overhead of managing infrastructure and layers makes very simple tasks take longer. Simple projects come with short time lines and modest expectations. Long before the team completes the assigned task, much less demonstrates the exciting possibilities of its approach, the project will have been canceled.

Even if the team is given more time, the team members are likely to fail to master the techniques without expert help. And in the end, if they do surmount these challenges, they will have produced a simple system. Rich capabilities were never requested.

A more experienced team would not face the same trade-offs. Seasoned developers could flatten the learning curve and compress the time needed to manage the layers. Domain-driven design pays off best for ambitious projects, and it does require strong skills. Not all projects are ambitious. Not all project teams can muster those skills.

Therefore, when circumstances warrant:

Put all the business logic into the user interface. Chop the application into small functions and implement them as separate user interfaces, embedding the business rules into them. Use a relational database as a shared repository of the data. Use the most automated UI building and visual programming tools available.

Heresy! The gospel (as advocated everywhere, including elsewhere in this book) is that domain and UI should be separate. In fact, it is difficult to apply any of the methods discussed later in this book without that separation, and so this Smart UI can be considered an “anti-pattern” in the context of domain-driven design. Yet it is a legitimate pattern in some other contexts. In truth, there are advantages to the Smart UI, and there are situations where it works best—which partially accounts for why it is so common. Considering it here helps us understand why we need to separate application from domain and, importantly, when we might not want to.

Advantages

Productivity is high and immediate for simple applications.
Less capable developers can work this way with little training.
Even deficiencies in requirements analysis can be overcome by releasing a prototype to users and then quickly changing the product to fit their requests.
Applications are decoupled from each other, so that delivery schedules of small modules can be planned relatively accurately.
Expanding the system with additional, simple behavior can be easy.
Relational databases work well and provide integration at the data level.
4GL tools work well.
When applications are handed off, maintenance programmers will be able to quickly redo portions they can’t figure out, because the effects of the changes should be localized to each particular UI.

Disadvantages

Integration of applications is difficult except through the database.
There is no reuse of behavior and no abstraction of the business problem. Business rules have to be duplicated in each operation to which they apply.
Rapid prototyping and iteration reach a natural limit because the lack of abstraction limits refactoring options.
Complexity buries you quickly, so the growth path is strictly toward additional simple applications. There is no graceful path to richer behavior.

If this pattern is applied consciously, a team can avoid taking on a great deal of overhead required by other approaches. It is a common mistake to undertake a sophisticated design approach that the team isn’t committed to carrying all the way through. Another common, costly mistake is to build a complex infrastructure and use industrial strength tools for a project that doesn’t need them.

Most flexible languages (such as Java) are overkill for these applications and will cost dearly. A 4GL-style tool is the way to go.

Remember, one of the consequences of this pattern is that you can’t migrate to another design approach except by replacing entire applications. Just using a general-purpose language such as Java won’t really put you in a position to later abandon the Smart UI, so if you’ve chosen that path, you should choose development tools geared to it. Don’t bother hedging your bet. Just using a flexible language doesn’t create a flexible system, but it may well produce an expensive one.

By the same token, a team committed to a Model-Driven Design needs to design that way from the outset. Of course, even experienced project teams with big ambitions have to start with simple functionality and work their way up through successive iterations. But those first tentative steps will be Model-Driven with an isolated domain layer, or the project will most likely be stuck with a Smart UI.

❊ ❊ ❊

The Smart UI is discussed only to clarify why and when a pattern such as Layered Architecture is needed in order to isolate a domain layer.
There are other solutions in between Smart UI and Layered Architecture. For example, Fowler (2003) describes the Transaction Script, which separates UI from application but does not provide for an object model. The bottom line is this: If the architecture isolates the domain-related code in a way that allows a cohesive domain design loosely coupled to the rest of the system, then that architecture can probably support domain-driven design.

Other development styles have their place, but you must accept varying limits on complexity and flexibility. Failing to decouple the domain design can really be disastrous in certain settings. If you have a complex application and are committing to Model-Driven Design, bite the bullet, get the necessary experts, and avoid the Smart UI.

Eric Evans (2003). Domain Driven Design: Tackling Complexity in the Heart of Software (pp.76-79). Boston, MA: Addison-Wesley.

steve · July 8, 2017, 4:16am

I’m not sure that answers my question. I understand that you don’t want to have changesets involved at this point. What I’m asking is what a realistic alternative for validation errors would look like. Let’s assume that you’re given a basic registration form containing name, email, and password. Then let’s assume you have the following validation rules:

name, email, and password are required
name cannot be more than 100 characters
email must contain “@”
email cannot be more than 254 characters
password must be at least 8 characters and no more than 100 characters

If a single validation rule is violated, what would your error response look like? If multiple validation rules are violated, what would your response look like?
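To make the question concrete, here is one possible shape for such a response, offered only as a sketch and not as anyone's answer in this thread: a plain `{:error, errors}` tuple where `errors` is a list of `{field, reason}` pairs, so one violation and several violations look the same to the caller. The `Registration` module, the reason atoms, and the assumption that present values are strings are all illustrative.

```elixir
defmodule Registration do
  @fields [:name, :email, :password]

  @doc "Returns {:ok, params} or {:error, [{field, reason}]} listing every violation."
  def validate(params) when is_map(params) do
    case Enum.flat_map(@fields, &check(&1, Map.get(params, &1))) do
      [] -> {:ok, Map.take(params, @fields)}
      errors -> {:error, errors}
    end
  end

  # Each check/2 clause returns a (possibly empty) list of {field, reason} tuples.
  defp check(field, value) when value in [nil, ""], do: [{field, :required}]

  defp check(:name, name), do: max_length(:name, name, 100)

  defp check(:email, email) do
    missing_at = if String.contains?(email, "@"), do: [], else: [{:email, :missing_at}]
    missing_at ++ max_length(:email, email, 254)
  end

  defp check(:password, password) do
    too_short = if String.length(password) < 8, do: [{:password, {:too_short, 8}}], else: []
    too_short ++ max_length(:password, password, 100)
  end

  defp max_length(field, value, max) do
    if String.length(value) > max, do: [{field, {:too_long, max}}], else: []
  end
end
```

With this shape a single violation comes back as a one-element list, for example `{:error, [email: :missing_at]}`, and the web layer decides how to turn each reason into a message.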

peerreynders · July 8, 2017, 5:03am

I really don’t know how this adds anything to the discussion at hand.

The lowest common denominator is to use regular expressions to validate the various entries. As to whether or not to just report the first or all violations is probably dependent on the circumstances. Also if an SPA is used, in-browser based validation is necessary for user experience but the server side functionality would never rely on that and it would always do its own duplicated validation - yes, that type of duplication is annoying and tedious but in the end, it’s a necessary evil unless you are using something like Clojure on the server and ClojureScript in the browser so that you can use the same validation on both ends.

However a quick google led me to Vex so I would see if I could leverage that within the application on the server side - it seems to focus on doing one thing well - which I prefer over this:
just because you can doesn't mean you should

steve · July 8, 2017, 5:35am

It adds to the discussion because I’m trying to understand your position beyond the criticism. You’ve made it clear that you don’t believe we should be leaking changesets. However, I would like to see what you would consider a reasonable alternative. I’m not asking how you would validate the data. I’m asking what you would return from a function assuming there was a validation error. As it stands, I still agree with @michalmuskala when it comes to using changesets as a simple data structure to describe validation errors.

The point of this thread was to bounce ideas back and forth about how to structure contexts with high-level examples. Examples are key for people like myself to understand concepts completely. I feel that this thread is on the verge of getting derailed. If you feel as though contexts should never return a changeset, then I could see it being relevant to this discussion, but I’d ask that you’d offer a reasonable alternative via an example rather than describing architecture design practices. I don’t think I’m asking anything unreasonable.

To bring the thread back on topic, would you care to share how you could structure any context in this application? It doesn’t have to be a large one, but you seem to be well versed in the design pattern you’re proposing and I’m curious to see how you would approach contexts.

michalmuskala · July 8, 2017, 8:00am

It is fair to say that changesets are part of Ecto and that they are controlled by Ecto. On the other hand, recently, when explaining what Ecto is I tend to say it’s a “data modelling and database library”. The truth is, Ecto could be two separate libraries - schemas & changesets in one and repo & all the database stuff in the other. This separation is one of the things I’m actively exploring. I frequently use changesets even for things that have no database at all - they are a convenient abstraction for data processing even without a database.

One thing I’m missing in many DDD discussions is the value of common abstractions. Sure, it’s possible for each context to build its own way of returning errors. It might be fine if that’s formalised. It’s much worse if it isn’t. If they return some haphazardly created data structures that have no “formal” structure and can (and will) change - from my point of view that’s exactly what will be the effect of using “raw” tuples or maps. Using a changeset, even though it comes from a library, gives some common structure and formalisation of the return value. This is very valuable. Using libraries imposes constraints and that is sometimes a good thing.

And it’s true an experienced and disciplined team can create their own abstractions. They can build a protocol for extracting errors or they can build an in-application library for common error handling. It will be perfect and work exactly like they need it to. There are obvious tradeoffs related to this, though.
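A minimal sketch of the "protocol for extracting errors" option could look like the following. The protocol name `ErrorDetails` and the two error structs are hypothetical; the point is only that each context keeps its own error type while clients read them through one formalised function.

```elixir
defprotocol ErrorDetails do
  @doc "Returns a list of {field, message} tuples for any supported error value."
  def errors(term)
end

# Two contexts, each with its own (hypothetical) error struct.
defmodule Billing.InvalidCard do
  defstruct [:reason]
end

defmodule Accounts.InvalidSignup do
  defstruct fields: []
end

defimpl ErrorDetails, for: Billing.InvalidCard do
  def errors(%{reason: reason}), do: [{:card, to_string(reason)}]
end

defimpl ErrorDetails, for: Accounts.InvalidSignup do
  def errors(%{fields: fields}), do: fields
end
```

A controller could then call `ErrorDetails.errors/1` on whatever a context returns in its error tuple; the tradeoff is that the team now owns and maintains that abstraction.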

peerreynders · July 10, 2017, 4:58pm

I was trying to more clearly understand your position on the need for structured error data - so I came across this: Error Handling in Elixir Libraries

One thing that occurred to me in the category of “actionable errors” is that quite frequently the effective action in response to the error is entirely independent of the error reason.

When I have just about enough time to drive to a meeting and the car fails to start then an error report as to whether the battery died due to the overnight frost or because a raccoon decided to make a snack of the wiring is entirely unhelpful in my objective of reaching the meeting on time - I have to face the fact that the car won’t get me there and quickly find an alternate mode of transportation in order to meet my immediate objective.

Detailed error information is important for error logging and consistent formatting is extremely helpful for mining the logs. Detailed and specific error messages are also essential in the UI in order to quickly and effectively direct the user towards corrective action (though in SPAs that logic is typically entirely contained within the browser). In most other cases I would expect that using ok/error tuples (with a reason string) in the role of an Either type is quite sufficient especially if the message is not intended for the user of the system (because then i18n could become an issue).

However I would likely augment their use with the approach described here in order to bypass the typical awkwardness associated with ok/error tuples.
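The linked approach is not reproduced here. As an assumption about what such an augmentation might look like, one common way to keep ok/error tuples from turning into nested `case` expressions is Elixir's `with` special form, where the first non-matching result falls through unchanged. The `Signup` module and its messages are made up for this sketch.

```elixir
defmodule Signup do
  @doc "Runs each step in order; the first {:error, reason} is returned as-is."
  def run(params) do
    with {:ok, email} <- fetch_email(params),
         :ok <- check_format(email) do
      {:ok, %{email: email}}
    end
  end

  defp fetch_email(%{"email" => email}) when is_binary(email), do: {:ok, email}
  defp fetch_email(_params), do: {:error, "email is missing"}

  defp check_format(email) do
    if String.contains?(email, "@"), do: :ok, else: {:error, "email must contain @"}
  end
end
```

The caller still sees a plain Either-style result: `{:ok, value}` or `{:error, reason_string}`.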

michalmuskala · July 10, 2017, 6:37pm

The problem with using plain strings for errors (or atoms, tuples, etc) is that they are not extensible. You can see this problem with Erlang - it’s widely known for poor error messages. Most functions in :ets or :crypto will raise :badarg as error. Because this is their interface, they can never make the errors more helpful without breaking backwards compatibility. Given how fundamental these modules are, it’s probably never going to happen.

On the other hand, if you look at Elixir errors which are structs, you can very easily add a field and improve the error message without breaking any contracts.
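As an illustration of that extensibility (the exception name and fields are invented for this sketch), suppose an exception started out with only a `sku` field and later gained `available` with a default value. Code that matches on the struct name or on `sku` keeps working, while the message becomes more helpful.

```elixir
defmodule Inventory.OutOfStockError do
  # `sku` is the original field; `available` stands for a field added later.
  # Its default keeps code that builds the exception without it valid.
  defexception [:sku, available: 0]

  @impl true
  def message(%{sku: sku, available: available}) do
    "#{sku} is out of stock (#{available} available)"
  end
end
```

A caller that only ever pattern matched on `%Inventory.OutOfStockError{sku: sku}` does not need to change when the new field appears.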

peerreynders · July 10, 2017, 7:25pm

I’m sensing an assumption that contexts are a result of design activities - I would challenge that assumption and counter that in fact they are usually a result of domain analysis activities, especially when it comes to the DDD bounded contexts.

This does not preclude the possibility of boundaries being only discovered when coding is already underway - sometimes the facts that influence such a discovery are buried so deep that an effort would be cancelled due to Analysis Paralysis long before these details are unearthed.

One thing from DDD that doesn’t get enough attention is the practice of ubiquitous language:

A language structured around the domain model and used by all team members to connect all the activities of the team with the software.

Developers and domain experts have to share the identical terminology and mental model - to the point that they have to agree on a ubiquitous language domain glossary which includes all the terms and the detailed explanations.

I was once in a situation where I was continuously compiling and updating a unified domain glossary after every meeting for some time after project inception in order to unify the communications between the various stakeholders and developers who previously were using divergent terminology for similar concepts and similar terminology for divergent concepts. It was during the act of talking to the domain experts and formulating this ubiquitous language that most of the boundaries started to reveal themselves.

So while, for example, a concept like product seems straightforward enough, the relevant information about it can vary considerably among the various contexts like:

Procurement
Inventory
Pricing
Fulfillment
Sales
Marketing

which can lead to each context having its own, different product type, all of which are correlated via a correlation ID. Furthermore the product type used internally inside the context for collaboration between the context’s capabilities will likely be much more rich and detailed than the one it shares with its clients in order to stop the clients from becoming coupled to the context’s implementation details.
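A small sketch of what per-context product types correlated by an ID might look like follows. The `Shop.*` module names and every field except the shared `product_id` are assumptions; only two of the contexts listed above are shown.

```elixir
defmodule Shop.Inventory.Product do
  # What Inventory cares about: where the item is and how many are on hand.
  @enforce_keys [:product_id]
  defstruct [:product_id, :bin_location, quantity_on_hand: 0]
end

defmodule Shop.Pricing.Product do
  # What Pricing cares about: what the item costs.
  @enforce_keys [:product_id]
  defstruct [:product_id, :list_price_cents, currency: "USD"]
end

defmodule Shop.Catalog do
  @doc "Combines the two views; reusing `id` in both patterns requires equal IDs."
  def summary(
        %Shop.Inventory.Product{product_id: id} = stock,
        %Shop.Pricing.Product{product_id: id} = price
      ) do
    {:ok,
     %{
       product_id: id,
       in_stock: stock.quantity_on_hand > 0,
       price_cents: price.list_price_cents
     }}
  end

  def summary(_stock, _price), do: {:error, :product_mismatch}
end
```

Neither context needs to know the other's fields; the correlation ID is the only thing they agree on.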

Now when it comes to fine-grained contexts it’s still about low coupling and high cohesion.

|                     | Tight Coupling                             | Loose Coupling                       |
===========================================================================================================
| Referencing         | Point-to-Point                             | Via Mediator                         |
| Communication style | Synchronous                                | Asynchronous                         |
| Data model          | Common complex types                       | Simple common types only             |
| Type System         | Strong                                     | Weak                                 |
| Interaction Pattern | Navigate through complex nested structures | Data-centric, self-contained message |
| Control of Process  | Central control                            | Distributed control                  |
| Binding             | Statically                                 | Dynamically                          |
| Platform            | Strong Platform dependencies               | Platform independent                 |
| Transactionality    | 2PC (two-phase commit)                     | Compensation                         |
| Deployment          | Simultaneous                               | At different times                   |
| Versioning          | Explicit upgrades                          | Implicit upgrades                    |

Asynchronous communication typically refers to events. So loose coupling rarely comes easily or cheaply.

But at times reversing the direction of dependencies is good enough: Dependency Inversion Principle in the Wild.
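In Elixir, one way to reverse a dependency is for the higher-level module to define a behaviour that the lower-level module implements. The sketch below is an assumption-laden illustration, not the linked article's code: `Timeline`, `Timeline.PostSource` and `InMemoryPosts` are invented names, loosely based on the home page feed from the example application.

```elixir
defmodule Timeline.PostSource do
  # The Timeline context owns this contract; storage code depends on it,
  # not the other way around.
  @callback recent_posts(user_id :: term) :: [map]
end

defmodule Timeline do
  @doc "Builds a home feed from any module implementing Timeline.PostSource."
  def home_feed(user_id, source) do
    user_id
    |> source.recent_posts()
    |> Enum.sort_by(& &1.posted_at, :desc)
  end
end

defmodule InMemoryPosts do
  # A stand-in implementation; a database-backed one would expose the same function.
  @behaviour Timeline.PostSource

  @impl true
  def recent_posts(_user_id) do
    [%{body: "older", posted_at: 1}, %{body: "newer", posted_at: 2}]
  end
end
```

Swapping `InMemoryPosts` for another implementation changes nothing in `Timeline`, which is the limited kind of decoupling this approach buys.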