Using external HTTP API for data storage

Hi all

Context: external system with an inconsistent but feature-full API
We use an external SaaS system for hiring people. There are candidates, jobs, recruiters and so on. We like the data inside, and our recruiters like some of the workflows, but the interface towards candidates and other third parties (e.g. for other people to ask us to post jobs) is not customizable and in many places does not exist at all. Fortunately, there is an API. It is not nice and not consistent at all (e.g. different kinds of auth tokens in different parts of the system, sometimes XML, sometimes JSON), yet functionality-wise it's verified to be good enough: we have built several semi-static micro-sites with it.

Goal: Phoenix with external API as database
So I am thinking about creating a real website in Phoenix, e.g. for recruiters to manage their jobs, candidates to manage their profiles, etc. The main and possibly only source of truth would be that external system. In an ideal world I would use that external HR system instead of a database. I guess at some point some local DB would be needed, but that's not certain and not right now (even login isn't really needed; for now we could use magic tokens stored in the external system).

How would you do it?
My initial idea was that I could make Phoenix and LiveView use the usual Ecto and replace Ecto's DB layer (Repo? Repo adapter?) with something that would go to the API instead of Postgres/MySQL. The main motivation for Ecto is not the powerful querying (it would be limited by what the API provides anyway), but schemas, validations/changesets, and the ease of adding forms with error handling to LiveView. And possibly I could use some magic JSON/XML to/from Ecto schema encoding/decoding.
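For example, something like this is what I'm imagining for the encoding/decoding part (the Candidate schema and fields are made up):

```elixir
defmodule Candidate do
  use Ecto.Schema

  # No database table behind this; the external API is the source of truth.
  embedded_schema do
    field :name, :string
    field :email, :string
  end
end

# Decode an already-parsed JSON payload into a struct...
candidate =
  Ecto.embedded_load(Candidate, %{"name" => "Ada", "email" => "ada@example.com"}, :json)

# ...and dump it back to a plain map for an API request body.
params = Ecto.embedded_dump(candidate, :json)
```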

Unfortunately, after a couple of days of searching I can't find any similar situations, except possibly for this EctoApi attempt from a couple of years ago.

  • Is it something so exotic that nobody really needs anything similar?
  • Or is implementing even a primitive Ecto Repo/adapter way too complex?
  • Shall I just use a context that would use HTTPoison or Tesla instead of asking Ecto? Possibly I could still use schemas/changesets and encoding/decoding to/from JSON/XML (a rough sketch of what I mean is below the list).
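The rough sketch mentioned above, assuming a plain Tesla client (the base URL and paths are invented):

```elixir
defmodule HRClient do
  use Tesla

  plug Tesla.Middleware.BaseUrl, "https://hr.example.com/api"
  plug Tesla.Middleware.JSON

  # Hypothetical endpoints; the real API paths (and auth) would differ.
  def get_job(id), do: get("/jobs/#{id}")
  def create_job(params), do: post("/jobs", params)
end
```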

Okay, maybe having the full source data be external is indeed a bit exotic. Yet there definitely are cases where a lot of the objects you operate with are stored in and retrieved from an external API.

How do you handle such cases? What’s a good mechanism for creating CRUD forms for such external resources?

If it were me, I'd start with the local database and work toward eventual consistency with the external system. There are places where it can be a gatekeeper, like user registration, but that makes less sense if the external system is reliable, just quirky. A local system would be useful during outages, but those could be extremely rare too. It's certainly more complex, because the lines of separation blur easily, but you can keep an eye on being platform agnostic. Companies get acquired, or shutter, or grow to be unicorns, so all of this suggestion could also be a premature optimization.

Ecto and its family seem to map more to database-like concepts. Cloud-type data systems using HTTP APIs certainly exist, but they are likely closer to the primitives Ecto exposes than to something more CRUD-like. I also suffer from limited thinking, so an adapter could very much pave the way for doing this between multiple systems. I feel like a generic adapter could be too ambitious without trying this between 3-5 systems to suss out the common bits.

I also can't adequately evaluate which of these two is more complex. Ecto potentially has a risk of not paying off at all, whereas the eventually consistent approach seems to carry less of that risk; you just may throw more time and frustration at it.

In a perfect situation I'd spike both and leave myself open to a third option or a hybrid. There may be resources that must live locally, with others that can be more remote-first.

You can use Ecto embedded schemas and changesets for this: simply define all your CRUD changesets and validation in Ecto as you normally would.
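For example, a hypothetical Job schema (fields and validations are placeholders):

```elixir
defmodule MyApp.Job do
  use Ecto.Schema
  import Ecto.Changeset

  # Embedded schema: casting and validation without a database table.
  embedded_schema do
    field :title, :string
    field :location, :string
  end

  def changeset(job, attrs) do
    job
    |> cast(attrs, [:title, :location])
    |> validate_required([:title])
  end
end
```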

Your contexts decide what to do with those changesets: how to read, create, update, and delete. Just follow the same pattern of changeset validation, and pipe to an external API instead of Ecto.Repo. Remember that Ecto.Repo interfaces with an external system too; it maintains a connection pool to an external database.
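A sketch of such a context function, where HRClient.create_job/1 stands in for whatever HTTP call you end up with:

```elixir
defmodule MyApp.Jobs do
  alias MyApp.Job

  def create_job(attrs) do
    changeset = Job.changeset(%Job{}, attrs)

    # apply_action/2 runs the validations and plays the role Repo.insert/1
    # would normally play: {:ok, struct} or {:error, changeset}.
    case Ecto.Changeset.apply_action(changeset, :insert) do
      {:ok, job} -> HRClient.create_job(Ecto.embedded_dump(job, :json))
      {:error, changeset} -> {:error, changeset}
    end
  end
end
```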

You will probably want to think about a connection pool for the API. Here is a tutorial that follows a model similar to the one Ecto uses, maintaining a pool with poolboy.
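A rough sketch of that poolboy approach, assuming a hypothetical MyApp.ApiWorker GenServer that wraps an API connection or token:

```elixir
# In your application's supervision tree:
poolboy_config = [
  name: {:local, :api_pool},
  worker_module: MyApp.ApiWorker,  # a GenServer implementing start_link/1
  size: 5,
  max_overflow: 2
]

children = [
  :poolboy.child_spec(:api_pool, poolboy_config)
]

# Later, check a worker out for a single request:
:poolboy.transaction(:api_pool, fn pid ->
  GenServer.call(pid, {:get, "/jobs"})
end)
```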

If you are using an HTTP interface, then Req, which is built on top of Finch/Mint/NimblePool, also has pools integrated. The nice part is that Finch/Mint/NimblePool is already included with Phoenix.
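For example (the base URL, env var, and endpoint are placeholders):

```elixir
# Req reuses Finch/NimblePool connection pooling under the hood.
token = System.fetch_env!("HR_API_TOKEN")
req = Req.new(base_url: "https://hr.example.com/api", auth: {:bearer, token})

{:ok, resp} = Req.get(req, url: "/candidates")
resp.body
```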

If you are using the API at a rudimentary level with your own HTTP request handling, I would create a thin abstraction around it to handle auth token refresh (something Req can help with too, via its retry/error steps), mapping errors to changeset field errors, and so on. This will also make it easier to test.
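For the error mapping, a sketch along these lines could work (the API error payload shape here is invented):

```elixir
defmodule MyApp.ApiErrors do
  import Ecto.Changeset

  # Translate a hypothetical API error payload such as
  #   %{"errors" => %{"title" => ["can't be blank"]}}
  # into changeset errors, so LiveView forms render them as usual.
  def merge_into_changeset(changeset, %{"errors" => errors}) do
    Enum.reduce(errors, changeset, fn {field, messages}, cs ->
      Enum.reduce(messages, cs, fn message, cs ->
        add_error(cs, String.to_existing_atom(field), message)
      end)
    end)
  end
end
```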


I kinda stopped exploring the ideas around EctoApi because I felt the DB approach to an API is very brittle, … I went in another direction and started boto. It's at a very early stage; I had to stop my experiments there because I was dealing with some more urgent projects at work, and I also don't have that much free time to finish the adjustments. I might restart working on boto next year.

I personally think it's very brittle, so it would either be very limited or at least very unpredictable. A sign of that is projects that go the other way around and try to expose SQL functionality over a REST API, like sqlrest, which doesn't get that much traction.

I think that's the best approach if you want to avoid overengineering the problem.
