Is Elixir a good candidate for fault-tolerant database scripts?

cgodgers · January 14, 2021, 5:17pm

Hello there!

I’m looking for opinions to decide whether Elixir makes a good case for my company.

We are an extremely small startup, basically 2 developers. We maintain a fairly large database and many algorithms that run as cron jobs on multiple EC2 instances. Our problems normally involve several services (sending mail, querying the database, making HTTP requests), so we need a good concurrency model.

To overcome PHP limitations, I introduced Node.js to my colleague, which turned out great (vastly improved algorithm speed). However, after some time I’ve seen that error handling in JS is not really a pleasure. We can’t tolerate runtime errors, as the failure of one algorithm means the failure of other big dependent updates on other database tables. It’s not the end of the world, but those types of errors mean hours spent manually updating data, which is time-consuming and not fun at all.

Therefore, I’m looking for a safe functional alternative which is easy to learn, scalable, excellent at concurrency and, most importantly, fun to program with, so my search brought me here. Basically I’m looking for a more maintainable, functional Node.js (functional programming in TypeScript is quite horrible, so deno | compiled js is also discarded). I also value deploy simplicity, as with node it is just a git pull and an occasional npm install.

However, the following talk made me wonder a bit:

For web requests and web servers the let it crash mantra is okay, and I’m considering Elixir to replace our current node js express stack. But it is very important that both of us can produce maintainable and fault tolerant code, with little to no refactors. Something like “if it compiles, it works”, like rust.

So, knowing this, is Elixir a good fit for running those scripts? Take into consideration that I am completely new to it (although I’ve experimented with Haskell and PureScript a bit), and after introducing this language I’ll have to teach it to my coworker.

al2o3cr · January 14, 2021, 6:38pm

To me, this looks like your problem, not the choice of language.

Elixir provides great tools to build reliable, concurrent software - but it’s also possible to build flaky, single-threaded software with it by making the wrong choices (like the client who had most of their system’s state in a single unsupervised GenServer )

In particular, you mention algorithms as cron jobs - when these fail, does it cause a problem because the database is left in an invalid state or because there’s no retry mechanism in cron?

In the first case, Elixir won’t necessarily help.

In the second, tools like Oban could help by taking care of the retry bookkeeping.
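
To illustrate what "retry bookkeeping" means, here is a minimal sketch in plain Elixir. It is not Oban's API: `RetrySketch` and its `run/3` function are hypothetical names, and a job queue such as Oban additionally persists jobs and their attempts in the database, which this sketch does not do.

```elixir
defmodule RetrySketch do
  # Hypothetical helper, not part of Oban or the standard library.
  # Runs `fun` up to `max_attempts` times. `fun` must return {:ok, value}
  # or {:error, reason}; anything it raises is turned into {:error, exception}.
  def run(fun, max_attempts, delay_ms \\ 0)
      when is_function(fun, 0) and is_integer(max_attempts) and max_attempts > 0 do
    attempt(fun, 1, max_attempts, delay_ms)
  end

  defp attempt(fun, n, max, delay_ms) do
    case safe_call(fun) do
      {:ok, value} ->
        {:ok, value, n}

      {:error, _reason} when n < max ->
        # Linear backoff: wait a little longer after each failed attempt.
        Process.sleep(delay_ms * n)
        attempt(fun, n + 1, max, delay_ms)

      {:error, reason} ->
        {:error, reason, n}
    end
  end

  defp safe_call(fun) do
    case fun.() do
      {:ok, value} -> {:ok, value}
      {:error, reason} -> {:error, reason}
    end
  rescue
    exception -> {:error, exception}
  end
end

# A job that fails twice and then succeeds (the counter lives in an Agent).
{:ok, counter} = Agent.start_link(fn -> 0 end)

flaky = fn ->
  n = Agent.get_and_update(counter, fn c -> {c + 1, c + 1} end)
  if n < 3, do: {:error, :timeout}, else: {:ok, :done}
end

RetrySketch.run(flaky, 5)
# => {:ok, :done, 3}
```

The third element of the result is the number of attempts used, which is the kind of information a cron job on its own does not record.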

The recommended approach to deploying (releases) is more complicated than this, you may not be a fan.

Exadra37 · January 14, 2021, 6:56pm

That video may have failed to convey the benefits of the let it crash philosophy in the BEAM. The BEAM was developed by Ericsson with this let it crash mantra at its core, for running a telephony system that could never go down. So, no web requests, just backend stuff.

Please bear in mind that just watching talks can lead you to misjudge what a language is a good fit for, because a talk may not give you the full context when you don’t yet know the language. I recommend that you also read some books or take some video courses from reputable sources.

So, I strongly recommend that you first read this book or take this video course by @pragdave, as I recommend here:

After doing it you will have a better understanding about the let it crash philosophy and how it can help you isolate errors on your system in order to keep it running without downtime.

Be aware that the BEAM and Elixir don’t work miracles or fix a developer’s bad architectural decisions.

As a former PHP developer who also used cron jobs and queues to keep the system working and in sync, I can tell you that Elixir is an excellent fit for your use case.

dimitarvp · January 15, 2021, 12:43am

What you are describing seems like it could be a good candidate for transactional SQL stored procedures. However, that might turn out to be extremely difficult to code and test, so I’d try a lot of other tools before reaching for those.

IMO Elixir can help you a lot because you can utilize tools like Ecto.Multi for your logic. If one step fails, everything stops there and nothing further is executed. But if this is across several databases then it might be more difficult.
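
The "one step fails, everything stops" behaviour can be sketched in plain Elixir as follows. This is only an illustration of the control flow, not Ecto.Multi itself: `StepsSketch` and the step names are hypothetical, and nothing here is rolled back, because a real rollback comes from the database transaction that Ecto.Multi runs in. The error tuple is shaped like the one `Repo.transaction/1` returns for a failed multi (failed step, reason, changes so far).

```elixir
defmodule StepsSketch do
  # Hypothetical helper. Runs named steps in order; each step receives the
  # results of the previous steps and returns {:ok, value} or {:error, reason}.
  # Stops at the first failure and reports which step failed.
  def run(steps) do
    Enum.reduce_while(steps, {:ok, %{}}, fn {name, fun}, {:ok, done} ->
      case fun.(done) do
        {:ok, value} -> {:cont, {:ok, Map.put(done, name, value)}}
        {:error, reason} -> {:halt, {:error, name, reason, done}}
      end
    end)
  end
end

steps = [
  fetch: fn _done -> {:ok, [1, 2, 3]} end,
  update: fn %{fetch: rows} -> {:ok, length(rows)} end,
  fix: fn _done -> {:error, :inconsistent} end,
  # Never runs, because :fix failed before it.
  report: fn _done -> {:ok, :sent} end
]

StepsSketch.run(steps)
# => {:error, :fix, :inconsistent, %{fetch: [1, 2, 3], update: 3}}
```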

“Let it crash” is aimed at 3rd party APIs or non-app-breaking DB disconnections. It doesn’t apply to everything.

I suggest you give us a simplified scenario of what you’re doing and we might be able to suggest Elixir code to make it work – or recommend another language or tool entirely.

ityonemo · January 15, 2021, 5:41am

“if it compiles, it works” like rust.

That kind of belief is going to come back to burn you, big time, in a distributed system. Rust’s compiler cannot think ahead to a network disconnect caused by a backhoe near the data center or a junior engineer at GCP pushing some bad code that brings down the eastern region.

Note that database + backend is already a distributed system.

The stuff you are talking about (database inconsistency) is a very, very difficult problem, and although some tools have been built to help (as @dimitarvp says, transactions), there is absolutely nothing you can do to prevent it altogether. Interestingly, though, Ecto gives you Ecto.Multi, which lets you wrap several operations into a single transaction that gets rolled back if anything (database or code between DB calls) fails.

cgodgers · January 15, 2021, 12:45pm

Thanks a lot for your replies! I’ll try to explain myself a bit better, yesterday I was quite exhausted.

To me, this looks like your problem, not the choice of language.

I’m referring to, for example, calls to undefined methods, calling functions with the wrong data types, unpredictable error handling, that kind of stuff. To me a language doesn’t seem fit for my use case if that kind of error cannot be discovered at compile time and can sneakily be deployed into production. Note that I’m talking about Elixir from complete ignorance; I only learned about dialyzer yesterday.

Elixir provides great tools to build reliable, concurrent software - but it’s also possible to build flaky, single-threaded software with it by making the wrong choices (like the client who had most of their system’s state in a single unsupervised GenServer )

I’ll take that into consideration.

In particular, you mention algorithms as cron jobs - when these fail, does it cause a problem because the database is left in an invalid state or because there’s no retry mechanism in cron?

Not particularly, but we do download a lot of json data daily. It consists of various steps, such as inserting data, correcting data inconsistencies, creating summary tables, etc. They have to run synchronously.

The recommended approach to deploying (releases) is more complicated than this, you may not be a fan.

Sad to hear. As I plan to bring Elixir in for a small use case, maybe I can achieve approximately the same behaviour as with node scripts just by installing the Elixir ecosystem, installing dependencies and running .exs files.

I suggest you give us a simplified scenario of what you’re doing and we might be able to suggest Elixir code to make it work – or recommend another language or tool entirely.

Basically we download big amounts of time series data from various countries. That is then reduced for analytics that we serve to our clients. For example:

1. Fetch user data (millions of rows) into a temporary table. The script that downloads this must correctly finish, a runtime error means the following algorithms fail.
2. Update the main big users time series data with the daily temporary table.
3. Sometimes some users have old time series data that needs to be updated, so correcting algorithms are run to prevent data inconsistency.
4. Report to Sentry the results (exec time, total users affected, …)
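
The steps above could be sketched in Elixir roughly like this. It is a sketch under assumptions, not a recommendation from the thread: `DailyImportSketch` and every step function in it are hypothetical stand-ins (no database or Sentry call is made), and the country codes are made up. The `with` expression runs the steps in order and stops at the first `{:error, reason}`, while a failure in one country does not prevent the next country from running.

```elixir
defmodule DailyImportSketch do
  # Runs the steps for one country in order; stops at the first failure.
  def run(country) do
    with {:ok, rows} <- fetch_into_temp_table(country),
         {:ok, updated} <- update_main_table(country, rows),
         {:ok, corrected} <- correct_stale_series(country) do
      {:ok, %{country: country, updated: updated, corrected: corrected}}
    else
      {:error, reason} -> {:error, %{country: country, reason: reason}}
    end
  end

  # Runs every country one after another and reports each result with its
  # execution time, whether it succeeded or not.
  def run_all(countries) do
    Enum.map(countries, fn country ->
      started = System.monotonic_time(:millisecond)
      result = run(country)
      report(result, System.monotonic_time(:millisecond) - started)
      result
    end)
  end

  # Hypothetical stand-ins for the real work.
  defp fetch_into_temp_table("xx"), do: {:error, :download_failed}
  defp fetch_into_temp_table(_country), do: {:ok, 3}
  defp update_main_table(_country, rows), do: {:ok, rows}
  defp correct_stale_series(_country), do: {:ok, 0}

  # Stand-in for the Sentry report: just prints the result and the timing.
  defp report(result, ms), do: IO.inspect({result, ms}, label: "report")
end

DailyImportSketch.run_all(["es", "xx"])
```

Here the second country fails at the download step, so its update and correction steps never run, but the first country's result is unaffected.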

And repeat that for 5 different countries. So maintaining that is not an easy task, and I’m afraid that most of the code is still written in copy-pasted, untested PHP.
That’s why I considered Haskell, as its type system is quite safe as far as I have read. However, introducing it would take a looot of time and energy that we don’t have at the moment. We are looking for a painless, easy and safe transition, so Elixir looks good to me. Also, it would be amazing to have a uniform, scalable ecosystem for our backend.

IMO Elixir can help you a lot because you can utilize tools like Ecto.Multi for your logic

Mmm, looks interesting! I’ll keep that in mind.

Also, coming from Node.js and its environment, I tend to dislike its testing tools quite a bit. It just feels like I have to test almost everything to be sure that a new feature won’t break things. Also, filling all my code with dependency injection solely for testing doesn’t feel right.

Btw, what does unit testing look like in Elixir? Would you say it is more or less painful than in other environments such as Node.js, C# or others?

Edit: I know this is a totally different language, but I’m looking for something like Elm for the backend. I love that I can just pattern match all the possible scenarios in a function, and having the compiler tell me if I’m missing a clause is really appreciated. For those who have programmed in Elm: from a subjective point of view, does it feel the same when programming in Elixir?

dimitarvp · January 15, 2021, 4:07pm

Using Elixir for that is, shall we say, 75% likely to be a good fit.

However, if you want a static and strong typing system to catch some of the bugs beforehand then I can’t recommend Rust enough. (I did something almost the same as what you described several months ago in Rust; was surprised that even without the BEAM’s guarantees it was still rock-solid although it did require a bit more defensive coding at places so maybe it’s my Elixir training that made the code rock-solid; I am not sure but was very pleasantly surprised by the robustness of the final Rust code).

But I think you can be just fine with Elixir. It’s very well suited for classic ETL workflows such as yours. You can also check out Flow and Broadway. I’ve done a lot of successful ETL with Flow alone.

(RE: your Elm question, Rust / Haskell / OCaml mandate exhaustive pattern matching while Elixir doesn’t – it can’t due to its dynamic typing nature.)
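
To make the difference from Elm concrete, here is a minimal sketch using a hypothetical `StatusSketch` module. The `case` below has no clause for `:failed`; Elixir compiles it anyway, and the gap only shows up as a `CaseClauseError` when that value arrives at runtime.

```elixir
defmodule StatusSketch do
  # Hypothetical example: there is deliberately no clause for :failed.
  def describe(status) do
    case status do
      :pending -> "waiting"
      :done -> "finished"
    end
  end
end

IO.puts(StatusSketch.describe(:done))

# The missing clause is only discovered when :failed is actually passed in.
try do
  StatusSketch.describe(:failed)
rescue
  e in CaseClauseError -> IO.puts("runtime error: " <> Exception.message(e))
end
```

A common mitigation is an explicit catch-all clause (`_ -> ...`) or letting the process crash under a supervisor, but neither is a compile-time guarantee.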

ericgray · January 15, 2021, 11:32pm

Elixir is really good at catching compile time errors. For example you’ll get an error when functions are being called with the wrong number of arguments or function arity. Or trying to call functions that don’t exist.

Elixir also has pattern matching that can guard against calling functions with the wrong data type. You can do pretty good with catching compile time errors even without using dialyzer.
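
A small sketch of a guard acting as a type check, using a hypothetical `GuardSketch` module. One caveat worth keeping in mind: guards and pattern matches are evaluated when the function is called, so a wrong data type is rejected with a `FunctionClauseError` at runtime and is not guaranteed to be reported by the compiler.

```elixir
defmodule GuardSketch do
  # Hypothetical example: only non-negative integers are accepted.
  def total_cents(amount) when is_integer(amount) and amount >= 0 do
    amount * 100
  end
end

GuardSketch.total_cents(5)
# => 500

result =
  try do
    # A string does not satisfy the guard, so no clause matches.
    GuardSketch.total_cents("5")
  rescue
    e in FunctionClauseError -> {:rejected, e.function, e.arity}
  end

IO.inspect(result)
# => {:rejected, :total_cents, 1}
```

The benefit is that bad input fails loudly at the function boundary instead of flowing further into the program.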

The compilation phase is great because most of the kind of bugs you mentioned will be caught.

we download big amounts of time series data from various countries.

I agree with @dimitarvp: Broadway and Flow would probably be a good fit. I would check out these videos to get an idea of what Broadway can do.

Build Efficient Data Processing Pipelines
Batch Operations with Broadway

Exadra37 · January 15, 2021, 11:41pm

and Elixir also has guard clauses, which I like to use heavily in my code:

defp _todo_hash(
       %{
         title: title, # pattern matching
         user_uid: user_uid,
         date: date
       } = _attrs,
       action
     )
     when is_binary(title) # guard clauses
     and byte_size(title) > 0
     and is_binary(user_uid)
     and byte_size(user_uid) === 64
     and is_binary(date)
     and byte_size(date) === 10 do
  # your code here
end

On top of this I also use types:

defmodule Tasks.Todos.Types.Event do

  # @link https://hexdocs.pm/domo/Domo.html
  use Domo

  typedstruct do
    field :type, :todo | :backlog
    field :target, :todo | :backlog | :all
    field :action, :add | :update | :move | :duplicate | :delete
    field :origin, atom()
    field :broadcast_topics, list(), default: []
    field :context, map(), default: %{}
  end

end

that are then used like this:

def broadcast_change(
  {:ok, data} = result,
  %Tasks.Todos.Types.Event{} = event
) do
  # your code here
end

Exadra37 · January 15, 2021, 11:53pm

It’s been built into the language from the beginning, with ExUnit:

https://hexdocs.pm/ex_unit/ExUnit.html
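
For a feel of what that looks like, here is a minimal sketch of an ExUnit test in a single script file. `SummarySketch` and its `totals/1` function are hypothetical, made up for this example; only `ExUnit.start/0`, `use ExUnit.Case`, `test` and `assert` come from ExUnit itself.

```elixir
ExUnit.start()

defmodule SummarySketch do
  # Hypothetical function under test: sums the values per user id.
  def totals(rows) do
    Enum.reduce(rows, %{}, fn {user_id, value}, acc ->
      Map.update(acc, user_id, value, &(&1 + value))
    end)
  end
end

defmodule SummarySketchTest do
  use ExUnit.Case, async: true

  test "sums values per user" do
    rows = [{1, 10}, {2, 5}, {1, 7}]
    assert SummarySketch.totals(rows) == %{1 => 17, 2 => 5}
  end

  test "returns an empty map for no rows" do
    assert SummarySketch.totals([]) == %{}
  end
end
```

Saved as, say, `summary_test.exs`, this can be run with `elixir summary_test.exs`; in a Mix project the tests would normally live under `test/` and run with `mix test`. Because the function under test is pure (data in, data out), no dependency injection is needed to test it.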

And then you have excellent resources to leverage it:

I also recommend that you use property-based testing, because it will find bugs in your code that you never dreamed of:

Property-Based Testing with PropEr, Erlang, and Elixir (PragProg)

Property-based testing helps you create better, more solid tests with little code. Use the PropEr framework in both Erlang and Elixir, to automatically generate test cases, test stateful programs, and change your software designs for more reliable...

You even have libraries to help writing them:

https://hexdocs.pm/propcheck/readme.html

cgodgers · January 17, 2021, 9:24am

@Exadra37 @ericgray @dimitarvp (and others) thanks! I may not know much about Elixir, but the community support is surely awesome.