Best pattern for updating Elasticsearch after doing an update for a record in postgres via Ecto?

Hi!

I’m working in a Phoenix application where we use both Elasticsearch and postgres (via Ecto) to store data.

Historically, at the time we call the module that handles CRUD functions for the record (via Ecto), we also call a separate module that enqueues an Oban job to update/create/delete the corresponding document in Elasticsearch (ES). But this has led to a couple of problems:

1. When records are created, updated, or deleted, we need to remember to call the functions for updating ES as well.
2. Because the Oban jobs are async, they can sometimes take a while to complete (during which time postgres and ES are out of sync), and we need to carefully monitor for ES update failures or the data can be out of sync long term.

It seems like we should actually be able to move from the async Oban jobs to doing synchronous updates to ES at the time that the record is created/deleted/updated via Ecto.

But my question is whether there are any best practices for this sort of thing?

I’ve searched through the forum discussions and Google, and I’ve read that Ecto.Multi allows you to run arbitrary functions as part of your transaction via run/3 and run/5. I’ve also been reading about event-sourcing approaches.

But I’m wondering if there are already some established best practices in the community that someone might kindly point me towards?

My use case is specifically about wanting to update Elasticsearch after an Ecto (postgres) db change, and while there are some special considerations for Elasticsearch here (e.g. the possibility of timeouts), I’d very much appreciate more general resources or suggestions too -- anything of the shape: when a record is created/updated/deleted via Ecto, how do you make sure X also happens?

I am working on a similar toy project that runs only locally on my machine (for the moment, I am the only user), so maybe our requirements are different.

In order to combine Postgres and Elasticsearch, I first store a document in Elasticsearch using Snap. Snap makes plain HTTP requests, and for my single-user scenario Oban is not essential. If the indexing request does not return an error, I continue my “pseudo-transaction” by inserting the corresponding record in Postgres.
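For illustration, here is a minimal sketch of that ES-first ordering. The cluster module, index name, `Product` schema, and attribute fields are all made up for the example, and the exact Snap request helper you use to index a document may differ from what is shown, so check the Snap docs:

```elixir
defmodule MyApp.Search.Cluster do
  # Snap cluster pointing at the Elasticsearch instance
  # (URL and credentials configured under :my_app in config.exs)
  use Snap.Cluster, otp_app: :my_app
end

defmodule MyApp.Products do
  alias MyApp.{Repo, Product}
  alias MyApp.Search.Cluster

  # ES-first "pseudo-transaction": index the document first, and only
  # insert the Postgres record if Elasticsearch accepted the write.
  def create_product(attrs) do
    doc = Map.take(attrs, [:name, :description])

    # Cluster.put/2 stands in for whichever Snap request helper indexes
    # a document in your version of the library.
    case Cluster.put("/products/_doc/#{attrs.id}", doc) do
      {:ok, _response} ->
        %Product{}
        |> Product.changeset(attrs)
        |> Repo.insert()

      {:error, reason} ->
        {:error, {:elasticsearch, reason}}
    end
  end
end
```

Note the tradeoff this ordering makes: if the Postgres insert fails after the ES write succeeded, ES is left holding an orphan document, so the failure window moves rather than disappears.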

Maybe this simplified solution is not proper for your scenario, though.

Thanks for sharing, @bdarla! I hadn’t thought about going to Elasticsearch first and then only updating postgres if the ES update succeeds. I’ll have to think more about whether that would make sense for our use case. I appreciate the suggestion!

I’d argue what you have is the only true option. In a distributed system you simply cannot have the systems be perfectly in sync all the time. You can only decide where you make the tradeoffs.

Even if you use a more synchronous approach, you can still easily end up in a state where one database is updated and the other isn’t.


hi @LostKobrakai, thanks for the feedback! That’s definitely a helpful perspective.

In this case, it seems like there wasn’t really a performance need for the calls to update Elasticsearch to be async (it was done that way because it was modeled off of, and shared code with, a different case where async was required because the ES updates were slow). But our application now relies on Elasticsearch (not postgres) for counting and searching in some key places, so the lag between when the user initiates the change (which happens synchronously in postgres) and when the change propagates to ES and shows up in the UI is more problematic. Not to mention that we seem to be failing to always catch and surface ES update failures, because some data has ended up permanently out of sync in ES.

@LostKobrakai Oops. Hit enter before I meant to!

It does feel like there’s a question about tradeoffs here, especially knowing as you said that you simply cannot have systems be perfectly in sync all the time. But given the UI expectations of the new designs and the fact that the “source of truth” is coming from ES (not postgres) for what is returned in the searches and counting, it seemed to me like we could remove some friction and allow for better handling of errors if the request to update ES was more tightly coupled to the postgres update. Does that make sense? Or maybe I’m missing something to consider here?

Thanks again for the thoughts and feedback.

If there isn’t a performance penalty in running the Elasticsearch update synchronously, and you absolutely need consistency between the databases, then I think your first option of using Ecto.Multi.run to roll back the postgres write if the ES write fails is a good choice.
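A minimal sketch of that Ecto.Multi.run approach. The `Product` schema and the `index_document/1` wrapper around the ES client are hypothetical names for illustration; the key point is that `Multi.run/3` callbacks must return `{:ok, term}` or `{:error, term}`, and an error aborts the transaction:

```elixir
defmodule MyApp.Products do
  alias Ecto.Multi
  alias MyApp.{Repo, Product}

  # Update Postgres and Elasticsearch together: if the ES call returns
  # {:error, reason}, the transaction aborts and the Postgres update is
  # rolled back.
  def update_product(%Product{} = product, attrs) do
    Multi.new()
    |> Multi.update(:product, Product.changeset(product, attrs))
    |> Multi.run(:elasticsearch, fn _repo, %{product: updated} ->
      # Hypothetical wrapper around your ES client; must honor the
      # {:ok, _} | {:error, _} contract of Multi.run/3.
      index_document(updated)
    end)
    |> Repo.transaction()
    |> case do
      {:ok, %{product: updated}} ->
        {:ok, updated}

      {:error, failed_step, reason, _changes_so_far} ->
        {:error, {failed_step, reason}}
    end
  end
end
```

One caveat: this closes the "postgres updated, ES not" gap, but the reverse can still happen if the transaction fails to commit after the ES call succeeded. Keeping the ES step last in the Multi and making it idempotent narrows that window but cannot eliminate it.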


You might be helped by Sage (v0.6.3), which implements the saga pattern for exactly this kind of multi-step operation across systems.
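To give a feel for the saga style: each step pairs an action with a compensation that undoes it if a later step fails. This is a rough sketch only; the callback signatures are from memory, so verify them against the Sage docs, and `index_document/1` / `delete_document/1` are hypothetical ES wrappers:

```elixir
defmodule MyApp.Products.Saga do
  import Sage
  alias MyApp.{Repo, Product}

  def create(attrs) do
    new()
    |> run(
      :product,
      # Action: insert the Postgres record.
      fn _effects_so_far, attrs ->
        %Product{} |> Product.changeset(attrs) |> Repo.insert()
      end,
      # Compensation: delete the record if a later step fails.
      fn product, _effects_so_far, _attrs ->
        Repo.delete(product)
        :ok
      end
    )
    |> run(
      :elasticsearch,
      # Action: index the document in ES (hypothetical wrapper).
      fn %{product: product}, _attrs ->
        index_document(product)
      end,
      # Compensation: remove the ES document (hypothetical wrapper).
      fn _doc, %{product: product}, _attrs ->
        delete_document(product)
        :ok
      end
    )
    |> execute(attrs)
  end
end
```

Compared to a plain Ecto.Multi, the saga makes the "undo" for each external side effect explicit, which fits cases where a database rollback alone can't clean up after Elasticsearch.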


Yeah, that makes sense. Thanks for taking the time to read and offer feedback!

Oh, very cool. I’ll definitely look into this more. Thank you!