Discussion about uses for Agent Processes

What I mean is that the default interface of an Agent expects a lambda to be passed in at runtime to tell it what to do with its state. Most of the time that is encapsulated inside another module anyway, but it is different from a GenServer, which requires you to write a new module, one that would have to specifically accept and execute lambdas if you explicitly wanted that behaviour for some reason.
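For illustration, a minimal sketch of a GenServer that accepts lambdas the way an Agent does might look something like this (module and function names are made up):

defmodule FunServer do
  use GenServer

  def start_link(initial_state) do
    GenServer.start_link(__MODULE__, initial_state)
  end

  # Accepts an arbitrary one-argument function and applies it to the state,
  # which is essentially what Agent gives you out of the box.
  def update(pid, fun) when is_function(fun, 1) do
    GenServer.call(pid, {:update, fun})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:update, fun}, _from, state) do
    {:reply, :ok, fun.(state)}
  end
end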

I get your points here. The key thing is that the processing/importing is done in parallel/concurrently over several processes. The agent acts as a cache as well as a point of serialisation. The situation is that I have, e.g., “Focus Areas” stored as strings inside the source data, but I am normalising them into a database table. Instead of executing an atomic “insert if not exists and get me the ID” query against the DB for every single record, I serialise this through this process, which keeps track of which ones have already been inserted.
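For context, the shape of that process is roughly the following; module, function, and field names here are made up, and the real insert would go through whatever DB layer the project uses:

defmodule FocusAreaCache do
  use Agent

  def start_link(_opts) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Returns the DB id for a focus area name, inserting it only the first
  # time it is seen. Because every caller goes through this one process,
  # the "insert if not exists and get me the ID" race disappears.
  def get_or_insert_id(name) do
    Agent.get_and_update(__MODULE__, fn cache ->
      case Map.fetch(cache, name) do
        {:ok, id} ->
          {id, cache}

        :error ->
          id = insert_focus_area!(name)
          {id, Map.put(cache, name, id)}
      end
    end)
  end

  # Placeholder for the real database insert returning the new row's id.
  defp insert_focus_area!(_name), do: :rand.uniform(1_000_000)
end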

Admittedly there’s no particular advantage to this being an Agent instead of a GenServer, I think. Perhaps Agents were an initial backlash against the “boilerplate” of GenServer; initially I also thought that the triplication of functions “just to do one thing” was annoying, but then I got used to it.

Maybe it would be better to require people to understand what’s going on inside the GenServer before they use it, instead of allowing the “quick win” of a simple interface which then leads to misuse.

2 Likes

Is ETS definitely faster than storing and retrieving something from an agent? I’d been wondering about that but hadn’t seen any stats.

Oh, in that case, Agent works perfectly fine here, sorry for wrongly accusing you :slight_smile:

I agree. Many people here said they don’t use agents all that often. And yet, in the official Mix & OTP guide, agents are the second lesson. I’m not sure they deserve so much spotlight, since it seems they are not used all that often in practice.

In terms of speed, ETS should always be faster, because access doesn’t go through the scheduler, and it also doesn’t put any pressure on the garbage collector. In terms of memory overhead, it’s roughly twice as much as keeping the same data in a process, so you want to be a bit careful not to create too many tables. The default limit is 1400 ETS tables (it can be increased via the command line).

Given its speed and the fact that it can support concurrent reads/writes, for a simple shared k-v structure used by multiple processes, I don’t even bother with agents/GenServer, and opt immediately for ETS tables.

On the flip side, in terms of atomic operations support, ETS is much more limited than standard processes. You can do some basic things such as atomically set a value or increment a counter. So there are many cases where an ETS table will not suffice, and in such cases a process-based solution (GenServer, Agent, or :gen_statem) is your only option.
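For reference, the kind of atomic ETS operations mentioned above look roughly like this (table and key names are made up):

table = :ets.new(:counters, [:set, :public])

# Atomically set a value (overwrites any existing entry for the key).
:ets.insert(table, {:page_views, 0})

# Atomically increment the counter stored under the key.
:ets.update_counter(table, :page_views, 1)

# Atomically insert only if the key is not already present.
:ets.insert_new(table, {:page_views, 0})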

Well, I would like to say that, at least in my personal opinion, grasping how Agents work and how to create them has a much lower learning curve than grasping how GenServer and related behaviours work and should be used. Of course, this might also be considered a bad thing if it leads to newcomers over-using Agents because they understand those.

As for the argument of using ETS over an Agent: do not forget that you should first make it correct, then readable, and only finally fast. If using an Agent is fast enough for whatever purpose it has in your application, the added complexity of an ETS-based system might not be advantageous.

1 Like

Yes.

Just flat-out yes.

Sometimes faster and sometimes slower depending on the options used to set up the ETS table, but regardless, ‘Yes’.

Except in very, very specific situations that you are likely to never run into, and even then only barely…


I’d still use ETS.

Overall though:

If I’m wanting to store/set data to be accessed by different processes, I’d use ETS.

If I’m wanting to store/set data to be accessed by different processes transactionally, I’d use Mnesia (see the sketch below).

If I’m wanting to store/set data in a single process, I’d pass a value (map/tuple/record/whatever) through it.

If I’m wanting to bind some ‘actions’ to operating on data, I’d use GenServer straight.

I cannot really see any reason that I would ever use an Agent at all; there are better/faster methods in all cases that are more functional, both in syntax/code and in functionality.
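As a sketch of the transactional Mnesia case mentioned above (the table name and record shape are made up, and error handling is omitted):

:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name])

# Reads and writes inside the fun run atomically; on conflict the
# transaction is retried or aborted as a whole.
:mnesia.transaction(fn ->
  :mnesia.write({:user, 1, "alice"})
  :mnesia.read({:user, 1})
end)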

5 Likes

That’s the big problem in my impression. Another problem is that IMO Agents don’t help much for learning GenServer. So while I agree that it’s easier to grasp Agents, I’m not sure that the gain is worth it.

Especially since basic GenServer (init, call, and cast) is not all that complicated (though admittedly still more than Agent).

I agree with this as long as there is some notable difference in complexity between correct and fast. For a k-v, here’s how it looks with Agents:

{:ok, pid} = Agent.start_link(fn -> %{} end)
Agent.update(pid, &Map.put(&1, :foo, :bar))
Agent.get(pid, &Map.get(&1, :foo))

and this is how it looks with ETS:

ets = :ets.new(:table_name, [])
:ets.insert(ets, {:foo, :bar})
:ets.lookup(ets, :foo)

In terms of code complexity these are mostly similar. In terms of scalability, ETS will be much better than a single process at handling a larger number of items, as well as a larger number of clients. So in this situation, the choice is pretty clear to me :slight_smile:

Otherwise, I mostly agree with what @OvermindDL1 wrote above.

3 Likes

I guess I feel/felt like for something which exists in memory for maybe a few minutes, making an ETS or Mnesia table feels really “heavyweight”, especially since with current scale it’s unlikely to make any measurable difference whether a process is used or ETS/Mnesia directly.

Maybe my inner sense of how “heavy” using those is needs to be recalibrated however. Maybe I incorrectly assume they’re this heavy extra thing because they are rarely mentioned in any educational material around Elixir. Up until now I would have only thought to reach for them when storing data for a longer time—e.g. where I would use Redis in another stack—and not to just store a small, transient amount of data.

Not at all, ETS is blazing fast; it technically works ‘outside’ the actor system, so most communication with it is faster than anything else you could set up short of keeping the data in the local process itself. Plus the ETS table is deleted when the process that owns it dies.

That is because Elixir is odd in some of its teaching. ^.^ In my opinion it really should have made ETS more front-and-center, like it is in Erlang; the EVM/BEAM is practically designed around ETS and making it fast.

I read the same thing you mentioned from here:

https://brainlid.org/elixir/2015/12/05/when-do-i-want-a-genserver.html

It’s apparently how JosĂ© Valim described the choice between using Tasks, Agents, and GenServer.

This is largely the reason no one seems to be using Mnesia, as well. It’s odd, because it’s such a great resource to have and use out of the box. People won’t use it even when what they have is generally perfect for it. I think it stems from records being something Elixir programmers generally know nothing about: all they know is “they’re bad”, even though they’ve never had to work with them, so they don’t even know what’s bad about them (the syntax, that’s all). This, coupled with the fact that no one in the core team seems to ever mention Mnesia, means that it’s something people don’t even consider.

When I started working with an Erlang codebase, I’d only worked with Elixir before, so I faced this personally. I was totally unprepared for how great Mnesia is and how easy it is to work with, because the Elixir community does an extremely poor job of highlighting what a great out-of-the-box feature it is to have.

2 Likes

Wait what? Records are bad? Since when? I use them quite heavily! They give the same ‘structural static typing’ that Elixir Structs do, but since they are tuples underneath they are a ‘little bit’ faster until they start getting too large. ^.^

Although in my opinion it is safer to let Structs ‘leak out’ from your module definitions than it is to let Records leak out, for various reasons.
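For anyone who hasn’t seen them, records in Elixir are defined through the Record module and are plain tuples underneath; a minimal sketch (names made up):

defmodule Shapes do
  require Record

  # Defines point/0, point/1, and point/2 macros; the underlying
  # value is just the tuple {:point, x, y}.
  Record.defrecord(:point, x: 0, y: 0)

  def origin?(point(x: 0, y: 0)), do: true
  def origin?(_other), do: false
end

# Usage (e.g. in IEx):
#   require Shapes
#   p = Shapes.point(x: 1, y: 2)   #=> {:point, 1, 2}
#   Shapes.origin?(p)              #=> false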

The reason no one is using :mnesia is that its netsplit story is “deal with it manually”. Sure, there’s uwiger/unsplit (which resolves conflicts in Mnesia after network splits), but you’ve got to pick your functions wisely to avoid data loss. This is hardly a beginner subject, and there’s zero beginner-oriented literature on the matter.

:mnesia needs an Mnesia in Anger book that goes through all the gotchas that can happen, appropriate ways to manage them, and so on. Without that it’s simply too perilous as a beginner to use as a primary data store of anything, and if you don’t need it as a primary data store you can probably just get by with :ets.

2 Likes

This is mostly a non-issue, since as a beginner you have absolutely nothing you could conceivably lose from this. Nothing is going to come for free, and the whole Elixir community sticking its collective head in the sand won’t make people understand when and how to use Mnesia.

No one is (hopefully) creating important production systems in languages and with technologies they don’t know enough about. When you’re not building important production systems, considerations like these are pointless and only serve as procrastination.

If you want to say that Mnesia is simply bad, that’s another story. But the beginner argument is just a non-starter.

1 Like

Those are very different in terms of code complexity. Sure, in terms of characters it is about the same, but reasoning about them is very, very different. If data is shared across multiple processes, who creates the ETS table? How do you reason about failures with your ETS table? What about race conditions?

I completely understand that some would rather introduce GenServer directly and skip Agents, but starting out with ETS is asking for trouble unless you know what you are doing. If people are having trouble with the functional mindset and they think they need to use mutable state, then I would a thousand times prefer them to use an Agent rather than ETS.

It reminds me of Joe Armstrong saying “you shouldn’t use GenServer, you should start from scratch”. That’s great advice from Joe to Robert or Mike, but I certainly won’t be doing it.

For learning purposes, I find the Agent a very natural step to teach developers how to think about processes and data boundaries. What is part of the client and what is part of the server. I also prefer them when reading my code too. In my experience, when teaching them, the Agent clicks much faster than a GenServer.

Maybe they are more likely to be misused exactly because they are much easier to grasp? It is probably a topic that the documentation and guides could reflect on.

4 Likes

This is a caricature of what I’m arguing for. I’m providing reasons why introductory material does not generally present :mnesia as an option, and instead opts for genservers / ets for local node work, and traditional databases (ecto / postgres) for persistent data. I am not arguing that the “whole Elixir community” ought not to look at :mnesia at all, the two arguments are not the same thing.

I’m not necessarily saying that it’s bad at everything, but it does have very distinct limits that make it difficult to use. Distributed databases in general are exceptionally difficult to operate, even by experts in the field (see any blog post involving Jepsen). Mnesia takes one of the most difficult parts of distributed database management, netsplit healing, and as best I can tell basically just throws up its hands and tells you to fix it yourself.

Of course. However, many people come to Elixir looking to learn stuff that, once they become more familiar, they can in fact use in production. I’m arguing that the bar for using mnesia in production is much, much higher than GenServers, :ets, Postgres, and so forth, and so it shouldn’t be much of a surprise that material aimed at getting people up to speed with Elixir doesn’t feature :mnesia prominently.

Don’t get me wrong, I would LOVE to see a well written book on using :mnesia that covers this kind of information, but without a focused and detailed walk through of doing so I’m not really sure what the value of having folks use it in a merely cursory manner is.

I think then, that it’s important to emphasise that one of their main purposes is to be a teaching tool, rather than a core piece of the Elixir infrastructure that you should construct your codebase out of. I do agree that when I was starting out, considering Agents first allowed the actual messaging model to sink in before I had to worry how callbacks and behaviours worked.

I am one of the people at the stage where I feel I have a very good grasp on the “basics” of Elixir / OTP, which is what the literature seems to focus on. I have a couple of simple, non-critical apps in production. I know how processes and supervision trees and releases and applications and clusters work. However I definitely feel frustrated by the lack of literature which covers more advanced topics and how and when you should actually deploy these more advanced tools like ETS or Mnesia when dealing with “real” systems, and the gotchas and caveats that apply.

I did breeze through the introductory material—most likely because I had already been exposed to functional programming as well as traditional “formal” concurrency models like CSP—and I feel like I’ve suddenly hit a wall in terms of further learning; for now the only way forward has been to push forward, make my own mistakes and learn from them (often with the help of the wonderful community here).

I think that if we want to encourage more people without an Erlang background to actually build non-trivial, production apps which take advantage of the power of OTP, and not just treat it as an interesting and powerful language for side projects, this space definitely needs to be developed some more.

6 Likes

Not sure I understand the problem here. The same questions you raise apply to agents as well.

No disagreements there :slight_smile: The same thing IMO also holds for processes, especially for the cases where Agents are misused and a plain in-process data structure would work just fine.

My point about starting out with ETS was about “simple” shared data, such as k-v. The word “simple” here is pretty vague, but mechanically speaking it is about atomic guarantees of ETS. If I need to keep some shared in-memory data (say some kind of a cache), and ETS supports that scenario (e.g. for simple key-based lookups and writes), then I go immediately for ETS.
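Concretely, that kind of shared lookup table is only a few lines; the table and key here are made up, and the table would normally be created by a long-lived process, since the creating process owns it:

# A named, public table that any process can read and write,
# with flags tuned for concurrent access.
:ets.new(:my_cache, [:named_table, :public, :set,
                     read_concurrency: true, write_concurrency: true])

:ets.insert(:my_cache, {{:user, 42}, %{name: "alice"}})

case :ets.lookup(:my_cache, {:user, 42}) do
  [{_key, value}] -> value
  [] -> nil
end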

Misusing ETS to bypass FP is as bad as doing it with Agents. I’m guilty of both sins in the past, so my position is based on my own experience :slight_smile:

There’s no denying that Agents are easy to grasp. I’m not sure though how well they work in explaining process boundaries. Because of their simple usage, the fact that data is being passed across processes is maybe not as clear as with GenServer, where you explicitly need to invoke cast or call, which is perhaps a stronger indication that something is being sent to another process.

Maybe I’m too used to the “old ways” of teaching, but IMO the bottom up approach of explaining plain spawns with message passing, followed by teaching GenServer is much more explicit about what goes on. It takes more time, yes, but I believe it is more geared toward idiomatic ways and deeper understanding.
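For comparison, that bottom-up starting point is something like the following hand-rolled process (names are made up), which GenServer then formalises with call/cast and callbacks:

defmodule PlainCounter do
  # State lives in the argument of the recursive loop; all interaction
  # happens through explicit message passing.
  def start do
    spawn(fn -> loop(0) end)
  end

  defp loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end
end

# Usage:
#   pid = PlainCounter.start()
#   send(pid, :increment)
#   send(pid, {:get, self()})
#   receive do
#     {:count, n} -> n
#   end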

Also, speaking of the learning curve, I personally don’t think spawning, message passing, and GenServer are hard concepts to learn. As someone who came to Erlang with an OO background (Ruby, C#, C++), I had no problem learning those. I struggled more with getting used to FP, understanding how to organize supervision trees, and properly producing OTP releases. Therefore, when it comes to learning advantages, I wonder if Agent is solving a problem which is not so huge in the first place. I’m not saying Agents are not helping, but I don’t think they’re helping that much.

That’s precisely my sentiment! Given that, and what I said above, I’m not really sure that Agent pulls enough weight. In fact, I sometimes wonder if it muddies the water more than it helps. Over the years I’ve repeatedly seen newcomers wondering whether they should use Agent or GenServer. And then there are a bunch of misuse examples, with agents being used where a plain in-process data structure would work.

That would be a good idea, yes.

3 Likes

Except I don’t agree with this. It is not what I said. :slight_smile: As I mentioned in my earlier replies, I find the Agent usage clearer when it fits and I do use it in different applications such as Mix and Ecto.

Sure. But you learn how to reason about processes much earlier than you do with ETS.

I agree with the data copying. By process boundary, I meant to explain “what happens in the client and what happens in the server”. With an Agent it is visually in the same function, like this:

def do_something do
  # client code runs here, in the calling process
  Agent.update(..., fn state ->
    # server code runs here, inside the Agent process
    state
  end)
end

and I think that helps drive home the point that process execution is orthogonal to code layout. With GenServer, you end up with functions and callbacks all at the same level, and that is neither here nor there. It would probably be clearer if we wrote client and server modules instead (but I am not quite proposing that).

Agents and such approaches are probably complementary. :slight_smile:

That’s interesting, because spawning and message passing were quite straightforward for me, but GenServer took me quite some time to master (especially understanding all of the trade-offs regarding call, cast, termination, etc.). So it definitely throws a lot at you upfront. I guess that’s the point though: different people are going to get stuck at different places, and different paths are likely helpful.

Here is something that just came to my mind: if we didn’t have agents, those folks would likely end up just using the process dictionary, because that’s what happens in Erlang to a certain extent. And I am not quite sure if that is better or worse.

2 Likes