Elixact - schema definition and validation (think Pydantic in Elixir)

Hey,

I have used a bunch of schema libraries in Elixir and didn’t find one that satisfies all my needs. At some point, I kinda missed Python for having Pydantic. So recently I started to make one for myself.

Elixact

Elixact is a schema definition and validation library, inspired by Python’s Pydantic.

Features

  • :dart: Intuitive Schema DSL - Similar to Ecto.Schema syntax
  • :mag: Strong Type Validation - Comprehensive validation for basic and complex types
  • :bar_chart: JSON Schema Support - Automatic generation of JSON Schema from your Elixir schemas
  • :jigsaw: Custom Types - Easily define reusable custom types
  • :christmas_tree: Nested Schemas - Support for deeply nested data structures
  • :chains: Field Constraints - Rich set of built-in constraints
  • :rotating_light: Structured Errors - Clear and actionable error messages

See the doc for more details: elixact v0.1.1 — Documentation

Github: GitHub - LiboShen/elixact: schema definition and validation library for Elixir
Hex: elixact | Hex

Credits / Prior Art

As I mentioned above, I have tried many libraries to get strong data validation. I took inspiration from all of them:

Ecto is the obvious choice when you need a simple schema.

Typestruct is a lightweight improvement over the built-in struct.

Drops is the closest to what I need, but I failed to make it work with nested or complex schemas.

I’m using it in several of my projects now. Let me know if you find it useful. I’m open to collaboration to make it better. Thanks.

7 Likes

I have yet to use one of these libraries—I just keep using Ecto for now even though it gets clunky for general use—but I really like this DSL. Nice work, I’ll keep it in mind. Also, good name :sweat_smile:

4 Likes

Glad to see that you spotted the pun :joy:

Here are my two cents:

gteq and lteq are not common names. I recommend ge and le, respectively.


gt and the others look good as short keys in an options keyword list, but they are too short for a DSL. I would recommend adding between, count_between (for arrays), and length_between (for strings), and maybe even changing the naming, for example from gt to greater_than. If you don’t like that, you can alternatively create a constraints macro:

defmodule Example do
  use Elixact

  schema "name" do
    field :age, :integer do
      constraints gt: 0, lt: 10
    end
  end
end

    field :settings, {:map, {:string, {:union, [:string, :boolean, :integer]}}} do

This is not bad for contributors to your project, but it doesn’t look its best as a DSL. You could consider using @spec notation and converting its AST into such a data structure within your library.

    field :settings, %{String.t() => String.t() | boolean | integer} do

looks much clearer.
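
If you go that route, here is a rough sketch of the AST translation for exactly this example; SpecToTuple is a hypothetical helper, not Elixact code, and it covers only the shapes shown above:

    defmodule SpecToTuple do
      @moduledoc false
      # Hypothetical helper: translates a tiny subset of @spec-style AST
      # into the tuple notation. Covers only the shapes from the example.

      def convert({:%{}, _, [{key, value}]}),
        do: {:map, {convert(key), convert(value)}}

      # a | b | c nests to the right: {:|, _, [a, {:|, _, [b, c]}]}
      def convert({:|, _, [left, right]}),
        do: {:union, [convert(left) | members(convert(right))]}

      def convert({{:., _, [{:__aliases__, _, [:String]}, :t]}, _, []}),
        do: :string

      def convert({name, _, _}) when name in [:boolean, :integer],
        do: name

      defp members({:union, list}), do: list
      defp members(other), do: [other]
    end

    # quote(do: %{String.t() => String.t() | boolean | integer})
    # |> SpecToTuple.convert()
    # #=> {:map, {:string, {:union, [:string, :boolean, :integer]}}}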


    {block, _opts} =
      Keyword.pop(
        opts,
        :do,
        quote do
        end
      )

You can change the default to {:__block__, [], []}, so it would look like:

    {block, _opts} = Keyword.pop(opts, :do, {:__block__, [], []})

Since you don’t use _opts, you can change it to:

    block = opts[:do] || {:__block__, [], []}

and since you are not using opts anywhere else, you can also change the function heading to:

  defmacro field(name, type, opts \\ [do: {:__block__, [], []}])
  defmacro field(name, type, do: block) do

That’s -7 lines of code (LOC) without losing any feature. :+1:


      var!(field_meta) =

As you may know, the variable field_meta would not be hygienic there. I would recommend storing it some other way. A common practice is to use module attributes, but those, just like variables, may conflict with user code. What I personally like is storing the data in a process. People don’t use Process often, especially at compile time. If you use your own Agent (with its moduledoc set to false) that starts only at compile time, it would be even better, as there is no chance that some developer would touch such data.
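
To make the idea concrete, here is a minimal sketch of such a compile-time Agent; module and function names are hypothetical, not actual Elixact code:

    defmodule Elixact.CompileState do
      @moduledoc false

      # Started lazily from macro-expanded code, so the process only
      # exists while a schema module is being compiled.
      defp ensure_started do
        case Agent.start(fn -> %{} end, name: __MODULE__) do
          {:ok, _pid} -> :ok
          {:error, {:already_started, _pid}} -> :ok
        end
      end

      def put_field(schema, name, meta) do
        ensure_started()

        Agent.update(__MODULE__, fn state ->
          Map.update(state, schema, [{name, meta}], &[{name, meta} | &1])
        end)
      end

      # Collects (and clears) everything accumulated for a module,
      # e.g. from the schema macro's final expansion step.
      def pop_fields(schema) do
        ensure_started()
        Agent.get_and_update(__MODULE__, &Map.pop(&1, schema, []))
      end
    end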


Many modules are not documented. Consider adding documentation, or use @moduledoc false in case a module is not part of the public API. To avoid such things in the future, I recommend giving the credo tool a try.


Elixact.Application starts without any children. If you don’t plan to use it, you don’t even have to create such a file. All you have to do is remove mod: {Elixact.Application, []} from the application function inside Elixact.MixProject.
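
For reference, the trimmed application function in mix.exs could then look like this (a sketch, assuming nothing else needs to start at boot):

    def application do
      [
        # mod: {Elixact.Application, []} removed - nothing to supervise
        extra_applications: [:logger]
      ]
    end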


The file lib/elixact/config.ex is empty. If you don’t use it, you should remove it.

5 Likes

Hey, these tips are super helpful.

  1. I like the constraints macro approach. It provides a natural grouping for value constraints and other field metadata.
  2. Using @spec notation is nicer, I agree. Never thought about it before. I’d like to look into it.
  3. And thanks for the compile time Agent tip. It’s kinda eye-opening TBH.
  4. I’ll do housekeeping for the silly leftovers.
4 Likes

I had the same thoughts re: spec syntax, as well as between (though I was thinking range, since with between I’m never sure whether it’s inclusive or not) instead of gt etc. Otherwise, I always appreciate the fully typed-out greater_than over gt. Just IMO, of course.

3 Likes

Last week I shamelessly dropped a link to my estructura library for the second time in a row, but still: @spec notation would oblige one to implement an ad hoc, informally specified, bug-ridden, slow implementation of half of dialyzer. One cannot allow spec notation and then restrict it to some types. Working with remote complex types is a pain and a ton of code.

That’s why I went with StreamData types, which literally granted me a no-code implementation of data generation for property-based testing.

3 Likes

That explains why I have never seen a library use @spec notation for type definitions, except dialyzer. It matches my initial intuition: it might be either hard to implement or limited in expressiveness.

From a pure readability standpoint, my ranking of the notations is:

  1. field :settings, %{String.t() => String.t() | boolean | integer} do
  2. field :settings, map(string(), union(binary(), boolean(), integer())) do
  3. field :settings, {:map, {:string, {:union, [:string, :boolean, :integer]}}} do

Fine, but who said you have to support user-defined types? As long as you support everything in core, the user would be able to use %SomeStruct{}, as that’s a literal (not a user-defined type). In this case the most complicated specs are map and said struct notation, as you have to support nested keys and values (which you already do in the {:map, …} case). :thinking:

From that point you could even add support for user-defined types later, as all you would have to do is fetch the nested types: in the case of a map or struct you would simply iterate over keys and values, check if any of them is user-defined, and if so fetch it and continue the nested work on the fetched data. :recycle:

As long as you support everything that is not user-defined and you have no problem with recursive calls, supporting even the whole @spec notation should not be as big a pain as it looks. So why is almost nobody using it? Well … it may be that they simply support maps and structs in the nested DSL. :sweat_smile:

The %{key => value} notation is usually supported for flat maps with dynamic keys. In the example above I suggested the @spec-like syntax for unions and flat maps - not for nested maps and structs, since if we know the nested structure then we can simply use the DSL for that. Otherwise, let the value be any map. :see_no_evil:

It seems there is a huge wave of type validation libraries lately, and all of them try to define their own DSL or data structure for validation.

Folks who have been doing these kinds of validations for years at the edge of the system use Ecto + embedded schemas. Ecto strikes the perfect balance between compile-time definition of base types and runtime validation with changesets.
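
For anyone unfamiliar with the pattern, here is a minimal sketch of such edge validation with an embedded schema (the module and field names are made up for illustration):

    defmodule SignupParams do
      use Ecto.Schema
      import Ecto.Changeset

      # No database table behind this schema; it exists purely to
      # validate data at the edge of the system.
      @primary_key false
      embedded_schema do
        field :email, :string
        field :age, :integer
      end

      def validate(params) do
        %__MODULE__{}
        |> cast(params, [:email, :age])
        |> validate_required([:email])
        |> validate_number(:age, greater_than: 0)
        |> apply_action(:insert)
      end
    end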

It has some shortcomings, including:

  1. Too tied to the database - type validation code and custom types are very tied to the database, and it inherently has some bugs that could be reworked entirely;
  2. No composition, only associations - this can be solved easily even in base Ecto with some metaprogramming, but it would be nice to have native support;
  3. Schemas are tied to a module - this one is hard to decide on; however, I would prefer if schemas were not tied to a module.

Solve all of these problems while keeping the Ecto way of doing things, and you will not only have a library that works well and is very flexible, but everyone who already uses Ecto for non-database validations will happily migrate over, me included.

3 Likes

The lack of type flexibility has always been my biggest problem using Ecto. The lack of union types is probably the biggest thing, and yeah, custom types are too tied to the database.

This is an absolutely great point to address too, even though it falls into the extremely complex domain of types. For the first releases I would be happy even with a few improvements over Ecto.

Since it seems more complex than everybody initially thought, I made this issue on GitHub to track the discussion. Feel free to add comments.

Well, I meant stuff like Supervisor.child_spec/0, which is “core” by all means. To support it, one should compile everything with docs chunks, load all these chunks, store them somewhere…

I went down this road to the very end in tyyppi, and once I did, I immediately and inevitably abandoned the library, even though validations (including but not limited to unions) worked at runtime (see this test).

The matchers are proven to work and might be borrowed from there as-is.

Oh, for sure that’s a Remote type in Elixir core. :sweat_smile:

For initial support I wanted to suggest the basic types from the Typespecs reference in the Elixir documentation, with a few remote types like ModuleName.t() in core modules (Range, String), but without stuff like Supervisor.child_spec/0.

Well … there is no need to store them, right? I took a look at what the iex helpers are doing, and they simply call Code.fetch_docs/1 and deal with the nested structures. It would not be a few lines of code, but rather a small module, so supporting it doesn’t look very hard.
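
For illustration, this is roughly what that call gives you: the docs chunk is read straight from the compiled .beam file, and it also carries type entries (assuming the chunk was not stripped):

    {:docs_v1, _anno, :elixir, _format, _module_doc, _meta, entries} =
      Code.fetch_docs(Supervisor)

    # Each entry is {{kind, name, arity}, anno, signature, doc, metadata};
    # type entries are what matter for remote types like child_spec/0.
    for {{:type, name, arity}, _anno, _sig, _doc, _meta} <- entries do
      {name, arity}
    end
    #=> includes {:child_spec, 0}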

However, I agree that not every project should have to do that itself. I believe we need a hex package to:

  1. Fetch the result typespec for functions, so we would be able to write @spec func_name(…) :: unquote(same_as(Mod.fun/arity)) - a rough sketch of this idea follows after the list.

  2. A function to “flatten” a typespec, which would fetch remote and user-defined types and translate them into the same nested structure but using only basic, literal, and built-in types, so it would be easier to parse them.
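
As for point 1, here is a rough sketch of the same_as idea built on the public Code.Typespec API; it handles only simple specs without when guards, and the module and function names are hypothetical:

    defmodule SameAs do
      @moduledoc false

      # Returns the quoted return type of the first spec clause of
      # module.fun/arity, suitable for unquoting inside a @spec.
      def return_type(module, fun, arity) do
        {:ok, specs} = Code.Typespec.fetch_specs(module)
        {{^fun, ^arity}, [spec | _]} = List.keyfind(specs, {fun, arity}, 0)

        # spec_to_quoted/2 yields the AST of `fun(args) :: return`.
        {:"::", _, [_call, return]} = Code.Typespec.spec_to_quoted(fun, spec)
        return
      end
    end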

Did I miss something else?

In :prod we don’t have docs chunks; they are stripped out. That’s why I said “To support it, one should compile everything with docs chunks, load all these chunks, store them somewhere…”

1 Like

I see, so it’s only about the prod runtime, since docs could also be stripped when compiling code? Hmm … :thinking:

So … we would need to fetch docs, and if that fails, load them from a cache … That cache would need to be added to the priv directory; let’s call it priv//cache for now … That’s really a problem … Even if we cache results, we may not have them when compiling code for prod, so even a cache may be a bad idea … The only way would be to fetch the sources and get the docs from there, but that’s too tricky … :confused:

Yeah, there is no good way to deal with it. I’ll remember it, thanks. :+1: