CI/CD as a library

talentdeficit · January 15, 2021, 6:54pm

dhall is fantastic. it’s exactly the right layer of abstraction for all these ‘declarative but really wish i could use some loops’ tasks like configuration yaml and terraform modules

my concern with ci/cd in a language like elixir is that there are way too many traps you can inadvertently fall into and end up with nondeterministic builds (more so than the level of nondeterminism we already put up with)

sasajuric · January 15, 2021, 7:11pm

Could you elaborate on this, maybe with some more concrete examples?

karolsluszniak · January 15, 2021, 7:29pm

Hello @sasajuric. I’m very excited to see another project from you. I really appreciate well-designed, reusable abstractions - top grade stuff. Also, like boundary, it tackles a non-obvious Elixir development concern that coincidentally I can highly relate to.

I’m thrilled that in this case it’s even more so as I’m the developer behind ex_check, a tool that seems to fulfill most of the immediate benefits from ci that you’ve described above (Elixir-driven, locally runnable, parallel + grab all issues in one go). I’ve spent a fair share of time designing it and it’s currently my top OS project, my true love child of Elixir and pragmatism.

The main difference is that ex_check is consciously highly focused on a very specific task - checking Elixir code with all the tools that the ecosystem has to offer + others that power users may add. For that sake, a declarative approach - following in the footsteps of formatter, credo or most CIs - seemed like a fitting design choice. This means ex_check gives less room to freely assemble building blocks - it actually implements just the workflow that you’ve posted as an example - “compile && run remaining tools in parallel”. OTOH this is what allows it to be a one line drop in solution to empower new & existing projects.
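The “compile && run remaining tools in parallel” workflow can be sketched in a few lines. This is only an illustration in Python, not how ex_check or ci implement it; `run_workflow` and the `Check` type are hypothetical names, and each tool is modelled as a plain callable that returns True on success.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A check is any zero-argument callable returning True on success (assumption).
Check = Callable[[], bool]


def run_workflow(compile_step: Check, tools: Dict[str, Check]) -> Dict[str, bool]:
    """Run the compile step first, then all remaining tools in parallel."""
    if not compile_step():
        # Nothing else is worth running if the code does not compile.
        return {"compile": False}

    results = {"compile": True}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(check) for name, check in tools.items()}
        # Wait for every tool so that all issues are reported in one go.
        for name, future in futures.items():
            results[name] = future.result()
    return results
```

A failing tool does not stop the others; the caller gets the full picture and can decide the exit status from the collected results.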

I must admit that your imperative approach looks very nice & inspiring and I can’t help but wonder if it’d work for ex_check as well i.e. if it’d be beneficial for what it is. It could still come with a default workflow but with ability to replace it with custom one or even re-compose it by using its building blocks - eg. curated tool definitions - via simple module/function composition. What do you think?

Anyway, such a high level of speciality has (obviously) allowed ex_check to come with some extra perks for the use case that it focuses on (again, at the expense of flexibility that ci seems to aim for), for example:

bundling config for all core & community tools that work as sensible defaults for most projects
very specific way to stream output from tools executed in parallel, friendly for progress dots like those output from ex_unit
parallel execution of single tool in multiple umbrella apps with no extra effort

I completely agree with you on the subject of operational complexity and the benefits of staying frictionless. That was one of the initial “excuses” for me to do ex_check even though there are so many tools in other languages, including earthly - a cool new kid in town mentioned above. As much as I like docker for what it is and how it has streamlined many aspects of ops, I wouldn’t replace ex_check with earthly as it does its job perfectly for all my Elixir/Phoenix projects without forcing docker on me in development.

Oh, and cool use of telemetry indeed. ex_check also has a Command module akin to your OsCmd, but it uses traditional callbacks for printing purposes. And as the lib is declarative, the building blocks - even if they could be reused - are hidden. That’s one point for imperative, as I can imagine cases when I’d reuse that module if it was properly exposed.

I’ll be grateful for all the comments and insights on how you @sasajuric & everyone else in here see ex_check when compared to ci, how ex_check could become better learning from ci or perhaps if it’s fine the way it is.

talentdeficit · January 15, 2021, 7:58pm

one of the advantages of using a declarative “non executable” format like yaml for configuring ci (and other things, like k8s definitions, for example) is that every execution using that declarative artifact should be deterministic. all i as a user need to do is read that file. obviously you can violate this in various ways by introducing directives that read other files or env vars or via tools that use directives to do inherently non-deterministic things (like reading from external state store) but the potential misuse is limited in scope to what the tool and the “magic” directives introduce

by going to a fully imperative, unconstrained language for ci like elixir, now the opportunity to do misguided things is ramped way up. we saw this in mix configs where users introduced problems by making calls in their configs that caused issues in different execution environments. for example, reading from an env var that was present in dev but not prod or by trying to read from files not present in the build artifact. often this was done not out of a conscious decision to introduce those issues but out of ignorance or – worse – out of an increase in complexity that was not really tractable to understand. using a first class programming language for ci is going to lead to many of the same problems
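the env var trap is easy to reproduce in any general-purpose language. a minimal sketch in python (not elixir or mix config; `DB_HOST` and both function names are made up for illustration), with the environment passed in as an argument so the difference between hosts is visible:

```python
from typing import Mapping


def load_config_implicit(env: Mapping[str, str]) -> dict:
    # Silently falls back when DB_HOST is missing, so the same config code
    # yields different results on different hosts without any warning.
    return {"db_host": env.get("DB_HOST", "localhost")}


def load_config_explicit(env: Mapping[str, str]) -> dict:
    # Fails loudly instead, which at least makes the hidden input visible.
    if "DB_HOST" not in env:
        raise KeyError("DB_HOST must be set")
    return {"db_host": env["DB_HOST"]}
```

in real use the argument would be something like `os.environ`, which is exactly the hidden input a reader of the config file cannot see.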

contrast with something like dhall. dhall is a language for generating declarative configurations. it offers most of what you expect from a programming language (functions, variables, libraries) but introduces the constraint that you can’t do any i/o (outside of very constrained things, like including other files in a static manner). this means that every execution is deterministic with regard to its initial inputs (the set of files it reads). the same files on any host will produce the same outputs

i get the desire to not introduce new concepts and new tooling and i am 100% in agreement that a lot of the “yaml as program” pattern is terrible, but i think embracing programs as configuration is the wrong direction

derek-zhou · January 15, 2021, 10:28pm

Depends on how you would define configuration. If by configuration you mean some static key/value pairs then you are obviously correct; however, anything beyond that could be considered programs. What do you think about a .emacs file? Hell, a .bashrc file?

sasajuric · January 16, 2021, 12:10am

Hey,

I haven’t encountered ex_check so far, it looks pretty cool! The video drives the point very clearly, good job!

I’m not really sure to what extent ex_check would benefit from the imperative approach. Based on what I’ve seen, my understanding is that it’s a plug and play tool, and in this case a declarative config makes more sense, because the user really needs to provide facts (e.g. run this, don’t run that), and not the flow. But TBH I didn’t have the time to look at docs or the code more carefully, so perhaps I’m missing some opportunities. As the author and someone who’s so deep in, you’re definitely in a better position to assess this.

There is a bit of overlap, but I can see both libs having their own worth. When it comes to ex_check, it’s cool that people can quickly get some standard checks running, and I quite like how the tool gently informs the developer that there are some other possible checks worth considering.

The tools can also complement each other. For example, one could easily use ex_check from the ci by just running mix check, although it would be interesting to see if a tighter integration could be obtained, such that ex_check can provide the list of commands, which would then be executed by the ci engine.

I’m not sure what ex_check can get from ci (but hopefully there is something), but I think that ci can learn something from ex_check. The present generator is pretty basic and naive, generating just the few basic checks. It could take a similar approach to ex_check, figuring out from the dependencies which steps can be added (e.g. db setup if ecto is a dependency, dialyzer if dialyxir is a dep, etc.).
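To make the generator idea concrete, here is a rough sketch of deriving steps from the dependency list. It is written in Python and is not the ci generator or ex_check's logic; `suggest_steps` is a hypothetical name and the chosen commands are only an assumption about what a project might want.

```python
from typing import Iterable, List


def suggest_steps(deps: Iterable[str]) -> List[str]:
    """Return an ordered list of CI commands based on project dependencies."""
    dep_set = set(deps)
    steps = ["mix compile --warnings-as-errors"]
    if "ecto" in dep_set:
        # Database setup has to happen before the tests run.
        steps.append("mix ecto.create")
    steps.append("mix test")
    if "dialyxir" in dep_set:
        steps.append("mix dialyzer")
    return steps
```

For a project depending on both ecto and dialyxir this yields four steps, while a project with neither gets only compile and test.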

sasajuric · January 16, 2021, 2:54pm

But IMO CI flow is not configuration. It’s an imperative program. Of course, you can always treat a program as a collection of facts (it’s all 0s and 1s after all), but IMO most of the time that’s not very intuitive, and it’s also not flexible. As I’ve said previously in this thread, using a declarative approach to represent an imperative flow may sometimes be useful, but IMO it’s not a sensible default approach.

Yeah, the problem here is that config scripts are too “free-form”, whereas dhall seems to be more constrained. But ultimately, I think that in the context of CI, one major drawback of dhall is that it’s evaluated at “compile-time”, and so, despite its very interesting design, it’s just more of the same from the “modeling imperative as declarative” school of thought.

What if I need to feed the output of one statement to others? Or what if I need to conditionally execute some step, depending on various circumstances (e.g. the outcome of the previous statement, or the branch being tested, or on whether the PR has been approved)? From what I can tell, you can’t make such decisions in dhall (because it transforms imperative to declarative). Of course, one can always do some trickery to make that happen, like pushing imperative logic into the command itself, or using :if properties of the CI engine, piping cmd output to file, …, but this all seems quite clumsy to me.
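As an illustration of the kind of runtime decisions meant here, consider the following sketch. It is Python rather than Elixir and does not use the ci library's API; `ci_flow`, the injected `run` function, and `./deploy.sh` are all hypothetical.

```python
from typing import Callable, List, Tuple

# run(command) -> (succeeded, captured_output); injected so the flow is testable.
Run = Callable[[str], Tuple[bool, str]]


def ci_flow(run: Run, branch: str, pr_approved: bool) -> List[str]:
    """Run a small CI flow and return the commands that were executed."""
    executed: List[str] = []

    def step(cmd: str) -> Tuple[bool, str]:
        executed.append(cmd)
        return run(cmd)

    ok, version = step("git describe --tags")
    if not ok:
        return executed

    ok, _ = step("mix test")
    if not ok:
        # The outcome of the previous step decides whether we continue.
        return executed

    if branch == "main" and pr_approved:
        # The output of the first step is fed into a later one.
        step(f"./deploy.sh {version.strip()}")
    return executed
```

Each of the three decisions (stop on failure, branch check, reuse of earlier output) is an ordinary `if` or variable, with no special directives needed.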

All this being said, I agree with the following observation:

As usual, we don’t get something for nothing. By using a “full-blown” language (i.e. a Turing complete language with a rich std library & ecosystem) at runtime, we’ve obtained some possibly dangerous power. We can do all sorts of things like format the disk, try to steal secrets, issue a DoS attack, etc. Now, in pretty much every real-life case I’ve experienced, such problems were more theoretical than practical, since the team was always very small and every team member wore all the hats (frontend, backend, devops, …). Obviously this won’t scale with respect to the CI user base, so taking this approach in e.g. CI as a service, or in larger companies is not something I’d recommend.

But there are many teams which are fairly small and I believe that in such cases, a full-blown imperative approach using the same language that is used for the implementation of the main product makes more sense. It’s an option which is simpler and more powerful, at the expense of being less secure.

In the cases where Elixir is not a viable option, you can consider other alternatives. One option is to go full declarative, maybe using dhall to generate the final specification. Note that you can still do this with the ci library. You basically need to transform the program into a declarative collection of facts, and then feed that to the engine that runs OS commands inside a Job. All of this can be done as a part of the mix my_app.ci task, invoked directly on the CI machine. The generated config file never needs to be stored on disk, or committed to the repo.
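The “collection of facts fed to an engine” idea could look roughly like this. The sketch is in Python and is not the ci library's Job API; `build_spec`, `run_spec`, and the job layout are assumptions made for illustration. The spec is plain data that lives only in memory.

```python
from typing import Callable, Dict, List

# run(command, env) -> True on success; injected so no real OS command is needed.
Run = Callable[[str, Dict[str, str]], bool]


def build_spec(elixir_versions: List[str]) -> List[dict]:
    """Generate the declarative spec: one job per Elixir version."""
    return [
        {
            "name": f"test-{version}",
            "env": {"ELIXIR_VERSION": version},
            "commands": ["mix deps.get", "mix test"],
        }
        for version in elixir_versions
    ]


def run_spec(spec: List[dict], run: Run) -> List[str]:
    """Interpret the spec and return the names of the jobs that failed."""
    failed = []
    for job in spec:
        for cmd in job["commands"]:
            if not run(cmd, job["env"]):
                failed.append(job["name"])
                break  # skip the remaining commands of a failed job
    return failed
```

Because the spec is ordinary lists and dicts, it can be printed or serialized for inspection, but nothing requires it to be written to disk or committed.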

Another option is to use a sandboxed embeddable language, e.g. Lua. This would allow you to use a TC language at runtime, while still being able to control what can be invoked. Again, this can be combined with the ci library, by importing a set of custom Lua functions which are under the hood using abstractions from the ci to run particular actions.

The ci library is marketed as a CI toolkit, which means that it’s not particularly “opinionated”, or the way I prefer to think about it, it’s not rigid. The library ships with various independent abstractions which you can use however you please. Take the parts that fit you, ignore the things that don’t, wrap any abstraction as you please. Again, you can most certainly build a declarative engine on top of this runtime TC imperative core, but it usually doesn’t work the other way around. If the core is declarative, adding runtime Turing completeness is going to be either impossible, or at the very least difficult and clumsy.

webuhu · January 17, 2021, 3:54pm

I really like the idea. Certainly need to test it in a fitting project.
Things I’m welcoming with a very warm heart:

easy CI setup, as running it locally is no longer impossible
testing the CI, to not break it later again

My thoughts on further development:

Within CI, parallelization was never the greatest win in my projects.
It definitely always was caching, which I can achieve with the help of Nix.
Docker layered builds can nowadays also do caching a bit better,
but I still don’t really like to write this Dockerfile stuff…

As Dhall was mentioned, I’m looking forward to yet another language - Nickel:

hauleth · January 17, 2021, 8:42pm

From a quick glimpse these two aren’t really comparable, as one of Dhall’s main goals is to NOT be Turing complete, while Nickel seems not to bother with that at all.