Some of the apps where I work frequently suffer outages as a result of our stored procedures, and someone suggested that maybe we could remove some of this load on the database by bringing the data into elixir. Has anyone here seen something like this where they work?
The query-by-id parts would likely get translated into Phoenix topics. What would be a good way to back up the in-memory data in case a process crashes? I’m imagining that Elixir could generate the needed processes based on demand, so that any processes representing objects that haven’t gotten attention in a while could be put to sleep and left in the database, then warmed up the next time someone asks for those objects. Our apps stall when specific records get a lot of attention, so the reads and writes tend to be concentrated around a specific object ID and all the records associated with it.
There’s a discussion on caching to reduce DB load here (particularly for heavy reads that are frequently hit). It is probably a bad idea to roll your own caching system using raw processes - it gets hard quite quickly (e.g. handling timeouts, knowing when to invalidate etc).
I’ve given this some thought, and I actually think what I want to build is not a cache. I think it’s more akin to a buffer for the DB. As such, there’s no need for cache invalidation.
When a request comes in for a record, it’s loaded into an Agent, and the Agent will persist for X minutes since the most recent read/write request. All reads/writes for the associated record will be handled by the Agent while it’s alive, and writes to the DB will be generated by this Agent at a configured time interval (gives devops control of DB load). When the Agent reaches end of life, it writes the final state to the DB.
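The lifecycle described above can be sketched roughly like this. I used a GenServer rather than a literal Agent, since the flush timer and idle check need `handle_info`; the module name, field names, and intervals are all made up for illustration:

```elixir
defmodule RecordBuffer do
  use GenServer

  @flush_ms 5_000   # how often pending writes go to the DB (devops-tunable)
  @idle_ms 60_000   # end-of-life after this long with no reads/writes

  def start_link(record_id), do: GenServer.start_link(__MODULE__, record_id)

  def read(pid), do: GenServer.call(pid, :read)
  def write(pid, changes), do: GenServer.call(pid, {:write, changes})

  @impl true
  def init(record_id) do
    # In the real system this would load the record's rows from the DB.
    Process.send_after(self(), :flush, @flush_ms)
    {:ok, %{id: record_id, data: %{}, dirty: false, last_used: now()}}
  end

  @impl true
  def handle_call(:read, _from, state) do
    {:reply, state.data, %{state | last_used: now()}}
  end

  def handle_call({:write, changes}, _from, state) do
    data = Map.merge(state.data, changes)
    {:reply, :ok, %{state | data: data, dirty: true, last_used: now()}}
  end

  @impl true
  def handle_info(:flush, state) do
    # In the real system: if state.dirty, persist state.data to the DB here.
    if now() - state.last_used > @idle_ms do
      # Idle past end-of-life: final state was just persisted, so stop.
      {:stop, :normal, state}
    else
      Process.send_after(self(), :flush, @flush_ms)
      {:noreply, %{state | dirty: false}}
    end
  end

  defp now, do: System.monotonic_time(:millisecond)
end
```

The DB calls are placeholders; the point is just the shape: reads/writes bump `last_used`, the flush timer batches writes, and expiry triggers the final persist.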
This is exactly the issue. If your database is slow and inefficient, batching data in memory before inserts just adds another layer of failure. It isn’t clear how this would even help with the stored procedure issue, because those will ultimately fire when the rows are inserted anyway.
If you have valuable data, and you can’t tolerate data loss, then you can’t afford to store that data in memory, it must be written to a disk somewhere. If reads are your issue and not writes then you can look into caching.
LOOP
    UPDATE table1 SET col2 = col2 + val
    WHERE id = arg_id AND month = month_of;
    IF found THEN
        RETURN;
    END IF;
    BEGIN
        INSERT INTO table1 (id, month, col1, col2)
        VALUES (arg_id, month_of, 0, val);
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        -- do nothing, and loop to try again
    END;
END LOOP;
To persist the data to disk prior to DB update, maybe DETS could handle it? I’ve tested writing Erlang terms to disk in binary format via :erlang.term_to_binary(), it’s really simple, but I don’t know how fast it is.
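For what it’s worth, the term_to_binary round trip is only a few lines (the path here is arbitrary; for many small records, DETS would avoid rewriting a whole file on every snapshot):

```elixir
# Snapshot an in-memory state to disk and read it back after a restart.
path = Path.join(System.tmp_dir!(), "buffer_snapshot.bin")

state = %{id: 42, col2: 137}
File.write!(path, :erlang.term_to_binary(state))

# After a crash/restart:
recovered = path |> File.read!() |> :erlang.binary_to_term()
# recovered == %{id: 42, col2: 137}
```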
Currently devops kills this SP when our service goes down, so I assume data loss is already happening.
These aren’t multi-node solutions. If you run more than 1 server, how do you deal with conflicts? Your stored procedure is based entirely around a unique constraint violation, you can’t possibly replicate that outside of your database because you don’t have a consistent view of your database from Elixir.
I do think that you could move these stored procedures to Elixir, but by “to elixir” I mean you could craft a database transaction that was perhaps more efficient.
I’m not sure that’s true. If a user submits a request, and then devops kills stuff, then the return to the user is that their request fails. The danger of using an Agent is that the user submits a request, your server says “ok, confirmed”, and then the agent dies and their data is actually gone, even though you said it was confirmed.
Before we can evaluate things further though, we need to understand better what the purpose of that loop is, because at the moment you can’t straight up reimplement it in Elixir at all.
It looks like the purpose of this code is to track an amount that we are summing. Our DB has things, each globally unique, and each has this amount field. Events occur which add to this amount, and we want to update and report that amount accurately. All of this seems doable via topics, and I would think we could spawn a single process for each globally unique thing (using DynamicSupervisor), so we don’t need to worry about conflicts. The number of events we’re talking about is ~100k to ~200k over ~2 hours, and all of those events relate to just one of our globally unique things. I would think this quite manageable.
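A rough sketch of that one-process-per-thing idea, using a Registry so repeated events for the same id always land on the same process (Registry, DynamicSupervisor, and Agent are all standard library; the module names are made up):

```elixir
defmodule Things do
  def start do
    {:ok, _} = Registry.start_link(keys: :unique, name: Things.Registry)
    DynamicSupervisor.start_link(name: Things.Sup, strategy: :one_for_one)
  end

  # Find or start the counter process for this id; increments for the same
  # id serialize through one process, so there are no write conflicts.
  def add(id, amount) do
    name = {:via, Registry, {Things.Registry, id}}

    pid =
      case Registry.lookup(Things.Registry, id) do
        [{pid, _}] ->
          pid

        [] ->
          spec = %{id: id, start: {Agent, :start_link, [fn -> 0 end, [name: name]]}}

          case DynamicSupervisor.start_child(Things.Sup, spec) do
            {:ok, pid} -> pid
            # Another caller won the race to start it; use theirs.
            {:error, {:already_started, pid}} -> pid
          end
      end

    Agent.update(pid, &(&1 + amount))
  end

  def total(id) do
    [{pid, _}] = Registry.lookup(Things.Registry, id)
    Agent.get(pid, & &1)
  end
end
```

Note this only avoids conflicts on a single node; as discussed below, it does not solve distributed state across nodes.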
Ah! I finally get it. Which database are you on? This stored procedure is basically a home-grown version of Postgres’s INSERT ... ON CONFLICT, so if you’re on Postgres you may be able to refactor this to just use that and move on.
Beyond that though, I was referring to EXCEPTION WHEN unique_violation THEN. The whole thing is weird though because if the id is a primary key, indexing on (id, month) shouldn’t actually help because querying on just id should already be indexed. Where does arg_id come from in your stored procedure?
Excellent! So before we do any fancy Elixir things, your current code should really try to use INSERT ON CONFLICT which allows you to try to insert a value and, if one already exists, conditionally update it. Your stored procedure is basically inserting a summation row, but it first checks if one exists, and then adds to it. I feel like the native version is likely to perform much better.
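For reference, here is roughly what that looks like through Ecto, assuming Postgres 9.5+, a hypothetical repo named MyApp.Repo, and the table/column names from the snippet above (a sketch, not tested against your schema):

```elixir
# arg_id, month_of, and val are the stored procedure's inputs.
MyApp.Repo.insert_all(
  "table1",
  [%{id: arg_id, month: month_of, col1: 0, col2: val}],
  # On a duplicate (id, month) row, add val to the existing col2 instead.
  on_conflict: [inc: [col2: val]],
  conflict_target: [:id, :month]
)
```

This compiles down to a single `INSERT ... ON CONFLICT (id, month) DO UPDATE`, so the retry loop disappears entirely.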
200k over 2 hours is ~30 updates a second, which really ought to be doable in the DB.
I don’t want to discourage you from an Elixir solution, but spawning a single process per globally unique thing will work fine only until there is a netsplit. Phoenix’s distributed tooling works great because it handles distributed communication, not distributed state (Presence aside). Distributed state is much, much harder, because you have to deal with split brain, and with things coming apart and coming back together. Presence handles it, and it required some very fancy, cutting-edge work to make that happen.
I still think there are solutions that the Elixir version could help with, depending on whether these sums can be slightly out of date or not, but all of those are going to involve choosing when to do a database update, not trying to do the sums in Elixir.
Cool, so at a minimum, you could replace the per-request summation with an update query that happens on an interval. You could also try to track which entries in your other table that you are trying to sum have been handled before or not.
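The interval idea can be sketched as a single batching process: increments accumulate in memory and one write per id goes out every tick, instead of one per request. Module name and interval are made up, and the flush body is a placeholder for the real upsert:

```elixir
defmodule SumFlusher do
  use GenServer

  @interval_ms 1_000  # how often accumulated increments hit the DB

  def start_link(_), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  def add(id, amount), do: GenServer.cast(__MODULE__, {:add, id, amount})
  def pending, do: GenServer.call(__MODULE__, :pending)

  @impl true
  def init(state) do
    Process.send_after(self(), :flush, @interval_ms)
    {:ok, state}
  end

  @impl true
  def handle_cast({:add, id, amount}, sums) do
    {:noreply, Map.update(sums, id, amount, &(&1 + amount))}
  end

  @impl true
  def handle_call(:pending, _from, sums), do: {:reply, sums, sums}

  @impl true
  def handle_info(:flush, sums) do
    # In the real app, each {id, amount} in sums becomes one upsert here.
    _ = sums
    Process.send_after(self(), :flush, @interval_ms)
    {:noreply, %{}}
  end
end
```

The trade-off is explicit: sums are up to one interval stale, and increments buffered since the last flush are lost if the process dies.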
I suppose my basic point here is this: This feels to me more like a database design problem than a language problem. You are trying to sum rows in a database, and the only way to do so and guarantee zero error is to do the summation in the database. The method y’all are using right now is definitely inefficient, and I’m not surprised there are scaling issues. But before you break out a new language to solve your problem, I’d explore other options with the tools you have.
Elixir is awesome, and it can help with a lot of problems. At the end of the day, though, as long as the problem statement is “I want the values in these rows in table X to represent the totals of rows found in table Y”, your database design and interaction pattern is going to matter far, far more than the language.
EDIT: In response to your edit: Try INSERT ON CONFLICT. In general my point about this being a database design issue remains important, particularly if you use those sums to make further database decisions.