Elixact - schema definition and validation (think Pydantic in Elixir)

Hey,

I have used a bunch of schema libraries in Elixir and didn’t find one that satisfies all my needs. At some point, I kinda missed Python for having Pydantic. So recently I started to make one for myself.

Elixact

Elixact is a schema definition and validation library, inspired by Python’s Pydantic.

Features

  • :dart: Intuitive Schema DSL - Similar to Ecto.Schema syntax
  • :mag: Strong Type Validation - Comprehensive validation for basic and complex types
  • :bar_chart: JSON Schema Support - Automatic generation of JSON Schema from your Elixir schemas
  • :jigsaw: Custom Types - Easily define reusable custom types
  • :christmas_tree: Nested Schemas - Support for deeply nested data structures
  • :chains: Field Constraints - Rich set of built-in constraints
  • :rotating_light: Structured Errors - Clear and actionable error messages

See the doc for more details: elixact v0.1.1 — Documentation

Github: GitHub - LiboShen/elixact: schema definition and validation library for Elixir
Hex: elixact | Hex

Credits / Prior Art

As I mentioned above, I have tried many libraries to get strong data validation. I took inspiration from all of them:

Ecto is the obvious choice when you need a simple schema.

Typestruct is a lightweight improvement over the built-in struct.

Drops is the closest to what I need, but I failed to make it work with nested or complex schemas.

I’m using it in several of my projects now. Let me know if you find it useful. I’m open to collaboration to make it better. Thanks.

7 Likes

I have yet to use one of these libraries—I just keep using Ecto for now even though it gets clunky for general use—but I really like this DSL. Nice work, I’ll keep it in mind. Also, good name :sweat_smile:

4 Likes

Glad to see that you spotted the pun :joy:

Here are my two cents:

gteq and lteq are not common names. I recommend ge and le, respectively.


gt and the others look good as short keys in an options keyword list, but they are too short for a DSL. I would recommend adding between, count_between (for arrays), and length_between (for strings), and maybe even changing the naming, for example from gt to greater_than. If you don’t like that, you can alternatively create a constraints macro:

defmodule Example do
  use Elixact

  schema "name" do
    field :age, :integer do
      constraints gt: 0, lt: 10
    end
  end
end

    field :settings, {:map, {:string, {:union, [:string, :boolean, :integer]}}} do

This is not bad for contributors to your project, but it doesn’t look its best as a DSL. You could consider using @spec notation and converting its AST into such a data structure within your library.

    field :settings, %{String.t() => String.t() | boolean | integer} do

looks much clearer.
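
If you go that route, here is a rough sketch of the AST translation for exactly this example; SpecToTuple is a hypothetical helper, not Elixact code, and it covers only the shapes shown above:

    defmodule SpecToTuple do
      @moduledoc false
      # Hypothetical helper: translates a tiny subset of @spec-style AST
      # into the tuple notation. Covers only the shapes from the example.

      def convert({:%{}, _, [{key, value}]}),
        do: {:map, {convert(key), convert(value)}}

      # a | b | c nests to the right: {:|, _, [a, {:|, _, [b, c]}]}
      def convert({:|, _, [left, right]}),
        do: {:union, [convert(left) | members(convert(right))]}

      def convert({{:., _, [{:__aliases__, _, [:String]}, :t]}, _, []}),
        do: :string

      def convert({name, _, _}) when name in [:boolean, :integer],
        do: name

      defp members({:union, list}), do: list
      defp members(other), do: [other]
    end

    # quote(do: %{String.t() => String.t() | boolean | integer})
    # |> SpecToTuple.convert()
    # #=> {:map, {:string, {:union, [:string, :boolean, :integer]}}}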


    {block, _opts} =
      Keyword.pop(
        opts,
        :do,
        quote do
        end
      )

You can change the default to {:__block__, [], []}, so it would look like:

    {block, _opts} = Keyword.pop(opts, :do, {:__block__, [], []})

Since you don’t use _opts, you can change it to:

    block = opts[:do] || {:__block__, [], []}

and since you are not using opts anywhere else, you can also change the function heading to:

  defmacro field(name, type, opts \\ [do: {:__block__, [], []}])
  defmacro field(name, type, do: block) do

That’s -7 lines of code (LOC) without losing any feature. :+1:


      var!(field_meta) =

As you may know, the variable field_meta would not be hygienic there. I would recommend storing it some other way. A common practice is to use module attributes, but those, just like variables, may conflict with user code. What I personally like is storing the data in a process. People don’t use Process often, especially at compile time. If you use your own Agent (with its moduledoc set to false) that starts only at compile time, it would be even better, as there is no chance that some developer would touch such data.
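
To make the idea concrete, here is a minimal sketch of such a compile-time Agent; module and function names are hypothetical, not actual Elixact code:

    defmodule Elixact.CompileState do
      @moduledoc false

      # Started lazily from macro-expanded code, so the process only
      # exists while a schema module is being compiled.
      defp ensure_started do
        case Agent.start(fn -> %{} end, name: __MODULE__) do
          {:ok, _pid} -> :ok
          {:error, {:already_started, _pid}} -> :ok
        end
      end

      def put_field(schema, name, meta) do
        ensure_started()

        Agent.update(__MODULE__, fn state ->
          Map.update(state, schema, [{name, meta}], &[{name, meta} | &1])
        end)
      end

      # Collects (and clears) everything accumulated for a module,
      # e.g. from the schema macro's final expansion step.
      def pop_fields(schema) do
        ensure_started()
        Agent.get_and_update(__MODULE__, &Map.pop(&1, schema, []))
      end
    end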


Many modules are not documented. Consider adding documentation, or use @moduledoc false in case a module is not part of the public API. To avoid such things in the future, I recommend giving the credo tool a try.


Elixact.Application starts without any children. If you don’t plan to use it, you don’t even have to create such a file. All you have to do is remove mod: {Elixact.Application, []} from the application function inside Elixact.MixProject.
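
For reference, the trimmed application function in mix.exs could then look like this (a sketch, assuming nothing else needs to start at boot):

    def application do
      [
        # mod: {Elixact.Application, []} removed - nothing to supervise
        extra_applications: [:logger]
      ]
    end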


The file lib/elixact/config.ex is empty. If you don’t use it, you should remove it.

5 Likes

Hey, these tips are super helpful.

  1. I like the constraints macro approach. It provides a natural grouping for value constraints and other field metadata.
  2. Using @spec notation is nicer, I agree. Never thought about it before. I’d like to look into it.
  3. And thanks for the compile time Agent tip. It’s kinda eye-opening TBH.
  4. I’ll do housekeeping for the silly leftovers.
4 Likes

I had the same thoughts re: spec syntax, as well as between (though I was thinking range, since with between I’m never sure whether it’s inclusive or not) instead of gt etc. Otherwise, I always appreciate the fully typed-out greater_than over gt. Just IMO, of course.

3 Likes

Last week I shamelessly dropped a link to my estructura library for the second time in a row, but still: @spec notation would oblige one to implement an ad hoc, informally specified, bug-ridden, slow implementation of half of dialyzer. One cannot allow spec notation and then restrict it to some types. Working with remote complex types is a pain and a ton of code.

That’s why I went with StreamData types, which literally granted me a no-code implementation of data generation for property-based testing.

3 Likes

That explains why I have never seen a library use @spec notation for type definitions, except dialyzer. It matches my initial intuition: it might be either hard to implement or limited in expressiveness.

From a pure readability standpoint, my ranking of the notations is:

  1. field :settings, %{String.t() => String.t() | boolean | integer} do
  2. field :settings, map(string(), union(binary(), boolean(), integer())) do
  3. field :settings, {:map, {:string, {:union, [:string, :boolean, :integer]}}} do

Fine, but who said you have to support user-defined types? As long as you support everything in core, the user would be able to use %SomeStruct{}, as that’s a literal (not a user-defined type). In this case the most complicated specs are map and said struct notation, as you have to support nested keys and values (which you already do in the {:map, …} case). :thinking:

From that point you could even add support for user-defined types later, as all you would have to do is fetch the nested types: in the case of a map or struct you would simply iterate over keys and values, check if any of them is user-defined, and if so fetch it and continue the nested work on the fetched data. :recycle:

As long as you support everything that is not user-defined and you have no problem with recursive calls, supporting even the whole @spec notation should not be as big a pain as it looks. So why is almost nobody using it? Well … it may be that they simply support maps and structs in the nested DSL. :sweat_smile:

The %{key => value} notation is usually supported for flat maps with dynamic keys. In the example above I suggested the @spec-like syntax for unions and flat maps - not for nested maps and structs, since if we know the nested structure then we can simply use the DSL for that. Otherwise, let the value be any map. :see_no_evil:

It seems there is a huge wave of type validation libraries lately, and all of them try to define their own DSL or data structure for validation.

Folks who have been doing these kinds of validations for years at the edge of the system use Ecto + embedded schemas. Ecto strikes the perfect balance between compile-time definition of base types and runtime validation with changesets.
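
For anyone unfamiliar with the pattern, here is a minimal sketch of such edge validation with an embedded schema (the module and field names are made up for illustration):

    defmodule SignupParams do
      use Ecto.Schema
      import Ecto.Changeset

      # No database table behind this schema; it exists purely to
      # validate data at the edge of the system.
      @primary_key false
      embedded_schema do
        field :email, :string
        field :age, :integer
      end

      def validate(params) do
        %__MODULE__{}
        |> cast(params, [:email, :age])
        |> validate_required([:email])
        |> validate_number(:age, greater_than: 0)
        |> apply_action(:insert)
      end
    end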

It has some shortcomings, including:

  1. Too tied to the database - type validation code and custom types are very tied to the database, and it inherently has some bugs that could be reworked entirely;
  2. No composition, only associations - this can be solved easily even in base Ecto with some metaprogramming, but it would be nice to have native support;
  3. Schemas are tied to a module - this one is hard to decide on; however, I would prefer if schemas were not tied to a module.

Solve all of these problems while keeping the Ecto way of doing things, and you will not only have a library that works well and is very flexible, but everyone who already uses Ecto for non-database validations will happily migrate over, me included.

3 Likes

The lack of type flexibility has always been my biggest problem using Ecto. The lack of union types is probably the biggest thing, and yeah, custom types are too tied to the database.

This is an absolutely great point to address too, even though it falls into the extremely complex domain of types. For the first releases I would be happy even with a few improvements over Ecto.

Since it seems more complex than everybody initially thought, I made this issue on GitHub to track the discussion. Feel free to add comments.

Well, I meant stuff like Supervisor.child_spec/0, which is “core” by all means. To support it, one should compile everything with docs chunks, load all these chunks, store them somewhere…

I went down this road to the very end in tyyppi, and once I did, I immediately and inevitably abandoned the library, even though validations (including but not limited to unions) worked at runtime (see this test).

The matchers are proven to work and might be borrowed from there as-is.

Oh, for sure that’s a Remote type in Elixir core. :sweat_smile:

For initial support I wanted to suggest the basic types from the Typespecs reference in the Elixir documentation, with a few remote types like ModuleName.t() in core modules (Range, String), but without stuff like Supervisor.child_spec/0.

Well … there is no need to store them, right? I took a look at what the iex helpers are doing, and they simply call Code.fetch_docs/1 and deal with the nested structures. It would not be a few lines of code, but rather a small module, so supporting it doesn’t look very hard.
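
For illustration, this is roughly what that call gives you: the docs chunk is read straight from the compiled .beam file, and it also carries type entries (assuming the chunk was not stripped):

    {:docs_v1, _anno, :elixir, _format, _module_doc, _meta, entries} =
      Code.fetch_docs(Supervisor)

    # Each entry is {{kind, name, arity}, anno, signature, doc, metadata};
    # type entries are what matter for remote types like child_spec/0.
    for {{:type, name, arity}, _anno, _sig, _doc, _meta} <- entries do
      {name, arity}
    end
    #=> includes {:child_spec, 0}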

However, I agree that not every project should have to do that itself. I believe we need a hex package to:

  1. Fetch the result typespec for functions, so we would be able to write @spec func_name(…) :: unquote(same_as(Mod.fun/arity)) - a rough sketch of this idea follows after the list.

  2. A function to “flatten” a typespec, which would fetch remote and user-defined types and translate them into the same nested structure but using only basic, literal, and built-in types, so it would be easier to parse them.
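
As for point 1, here is a rough sketch of the same_as idea built on the public Code.Typespec API; it handles only simple specs without when guards, and the module and function names are hypothetical:

    defmodule SameAs do
      @moduledoc false

      # Returns the quoted return type of the first spec clause of
      # module.fun/arity, suitable for unquoting inside a @spec.
      def return_type(module, fun, arity) do
        {:ok, specs} = Code.Typespec.fetch_specs(module)
        {{^fun, ^arity}, [spec | _]} = List.keyfind(specs, {fun, arity}, 0)

        # spec_to_quoted/2 yields the AST of `fun(args) :: return`.
        {:"::", _, [_call, return]} = Code.Typespec.spec_to_quoted(fun, spec)
        return
      end
    end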

Did I miss something else?

In :prod we don’t have docs chunks; they are stripped out. That’s why I said “To support it, one should compile everything with docs chunks, load all these chunks, store them somewhere…”

1 Like

I see, so it’s only about the prod runtime, since docs could also be stripped when compiling code? Hmm … :thinking:

So … we would need to fetch docs, and if that fails, load them from a cache … That cache would need to be added to the priv directory; let’s call it priv//cache for now … That’s really a problem … Even if we cache results, we may not have them when compiling code for prod, so even a cache may be a bad idea … The only way would be to fetch the sources and get the docs from there, but that’s too tricky … :confused:

Yeah, there is no good way to deal with it. I’ll remember it, thanks. :+1: