Best way to wrap Absinthe into Ecto transactions?

I’m scratching my head trying to understand the best way to wrap a GraphQL query (made with Absinthe) in an Ecto transaction.

Since the user may query the entire tree, touching objects that live in different tables, a transaction seems like the way to get a consistent response.

Think about users and posts: what happens if another query mutates the posts while the GraphQL query, having already retrieved the users, is still retrieving the posts? The response will not be consistent.
I’ve seen how to batch queries for associations, but that still isn’t a single query and isn’t applicable to all cases.
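To make it concrete, here’s a minimal sketch (module and schema names are made up) of what I mean by two root fields resolved independently, inside an Absinthe schema:

query do
  field :users, list_of(:user) do
    resolve fn _args, _info ->
      {:ok, MyApp.Repo.all(MyApp.User)}
    end
  end

  field :posts, list_of(:post) do
    # resolved after :users within the same document; a concurrent
    # mutation can commit in between, so the two lists may disagree
    resolve fn _args, _info ->
      {:ok, MyApp.Repo.all(MyApp.Post)}
    end
  end
end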

The same applies to mutations.

I haven’t found any reference to this case, so maybe I’ve overlooked or missed something… I’m still a newbie with GraphQL/Absinthe :slight_smile:


For now I’ve just got my copy of the PragProg book; let’s see if some ideas pop up.

I am not sure about that, because GraphQL can serve as an entry point to many different sources. There would be no transaction spanning info coming from different APIs.

And if it is in the same DB, you would somehow load the parent and associations with Ecto at the same time, and subscribe to get changes.
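For example (a sketch with made-up schemas), a join preload loads the parent and the association in a single query, hence a single snapshot:

import Ecto.Query

MyApp.User
# left join so users without posts are still returned
|> join(:left, [u], p in assoc(u, :posts))
|> preload([u, p], posts: p)
|> MyApp.Repo.all()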

Well, maybe?

If the data is coming from different contexts, I agree that there’s no transaction and probably no need for one.
That said, I may want to isolate different paths of the tree in transactions (separate transactions, of course).

My question was about a “top level” transaction, but it also applies to isolating different paths in different transactions, depending on the business logic.

About the associations: your context data may not be associated in Ecto, because the DB structure differs from the user-facing structure, so the Ecto association tricks may not apply at all.

So basically, if my user-facing data tree presents data that belongs to the same context (let’s say it stays in the same DB) but is not associated in Ecto, I need to find a way to wrap everything in a single transaction to be sure I’m getting a consistent view of the context.

On the other hand, REST APIs are not (normally) transaction safe by their nature, so why bother with GraphQL?
To me it seems great to get a response that’s as consistent as possible, in addition to placing only one API call instead of many.

I’m just thinking out loud; I need to clarify a lot of things, and figure out how (if it makes sense) to achieve them…

Does absinthe-graphql/absinthe_ecto work for you?
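If I remember its README correctly, it gives you batched association resolvers along these lines (repo and schema names are placeholders):

defmodule MyApp.Schema do
  use Absinthe.Schema
  use Absinthe.Ecto, repo: MyApp.Repo

  object :post do
    field :title, :string
  end

  object :user do
    field :name, :string
    # assoc/1 batches the posts lookup across all users in the document
    field :posts, list_of(:post), resolve: assoc(:posts)
  end

  query do
    field :users, list_of(:user) do
      resolve fn _, _ -> {:ok, MyApp.Repo.all(MyApp.User)} end
    end
  end
end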

I think you’re working with a confused notion of “transaction safe”. Postgres does not let you read data from uncommitted transactions. Any time you do a read, you’re reading the state of the database at that time. If you read at some other time, it may have a different state, and that’s true whether or not you’re using transactions.

You could try using transactions to lock all the tables for every request, but that prevents more than one user from writing to the database at a time (or writing while anyone is reading), which would have horrible consequences for your API performance.

Can you articulate exactly what problem you’re running into and want to solve?


Well,
I’m not trying to solve a particular problem (except my ignorance), just trying to understand what is possible and what the best way is.

In a serializable transaction, read concurrency is still possible, and a read-only workload should not hurt API performance much.
And you get a stable view of the database.

For example: I want the total balance across all bank accounts and the total of the per-branch balances, something like:

BEGIN;
SELECT SUM(balance) FROM accounts;
SELECT SUM(branch_balance) FROM branches;
-- check to see that we got the same result
COMMIT;

(taken from https://www.postgresql.org/files/developer/concurrency.pdf)

The two tables are not related and may be mapped to different paths in the GraphQL tree.
With two standard resolver functions (one for accounts, one for branches), nothing protects against an update to the branches table while the accounts table is being read, so the final document may not be consistent.
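In Ecto, I imagine the equivalent could look roughly like this (just a sketch, assuming Postgres and a MyApp.Repo; the SET TRANSACTION has to be the first statement inside the transaction):

{:ok, {accounts_total, branches_total}} =
  MyApp.Repo.transaction(fn ->
    # Postgres-specific: raise this transaction's isolation level;
    # must run before any other statement in the transaction
    MyApp.Repo.query!("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY")

    {
      MyApp.Repo.aggregate("accounts", :sum, :balance),
      MyApp.Repo.aggregate("branches", :sum, :branch_balance)
    }
  end)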

Maybe I’m overthinking and this will not be needed in 90% of situations…

I’ve stumbled upon this post Using Ecto to run a long-running multi-process transaction and while it may not be the best solution, it gives some ideas on how to approach the situation (unless there are more idiomatic ways).

Basically, certain “paths” may be handled by a transaction manager process, passing its pid around using the context or middleware.
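Something like this completely hypothetical sketch, inspired by that thread: one process opens the transaction and executes whatever functions the resolvers send it, so every read shares one snapshot:

defmodule TxnManager do
  # Hypothetical: holds a transaction open and executes functions
  # sent by other processes (e.g. resolvers) inside it.
  def start_link do
    Task.start_link(fn ->
      MyApp.Repo.transaction(fn -> loop() end, timeout: :infinity)
    end)
  end

  # Resolvers call this with the pid taken from the Absinthe context.
  def run(pid, fun) do
    send(pid, {:run, fun, self()})

    receive do
      {:result, result} -> result
    end
  end

  defp loop do
    receive do
      {:run, fun, caller} ->
        # fun.() runs in the manager process, i.e. inside its transaction
        send(caller, {:result, fun.()})
        loop()

      :stop ->
        # returning ends the transaction function, committing it
        :ok
    end
  end
end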

what do you think?

back to the book now :slight_smile:

I think one thing you’re missing here is that running things inside a transaction by default only prevents you from partially writing data, i.e. either all of the changes you made during the transaction are committed, or none are. It does not, by default, prevent other transactions from modifying the database in a way visible to you during the transaction.

In the thread you linked, my issue was that I had a long-running import process occurring over different processes but wanted to keep those inserts in a single transaction so that there would be no partial import persisted in case of error.

Take a look at the Postgres docs for this. In particular, note that at the default isolation level,

In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. […] note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

If you want to ensure that the data returned from two different contexts is consistent, you would have to change the isolation level of your transaction which may have an impact on how well your database handles concurrent load.

A nicer solution, IMO, would be either to create a view in the database which aggregates the data needed for a particular GraphQL endpoint and use that directly, splitting the data into individual structs on the application side, or to explicitly construct a query which grabs both sets of data at once.
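For the bank example above, the single-query version could be as simple as this sketch (two scalar subqueries in one statement share one snapshot, even at the default isolation level):

# a single row comes back containing both sums
%{rows: [[accounts_total, branches_total]]} =
  MyApp.Repo.query!("""
  SELECT (SELECT SUM(balance) FROM accounts) AS accounts_total,
         (SELECT SUM(branch_balance) FROM branches) AS branches_total
  """)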

You may argue that the data is in different contexts and this would break the context separation, but if the sets of data depend on each other to the point where they can become inconsistent over a single GraphQL query, then they likely belong in the same context or at least deserve explicit coupling at the database level. Keeping sets of data consistent in the face of concurrent updates is, after all, exactly what these databases were built for.

It seems that nowadays everyone is concerned about raw performance and less about strong consistency :laughing:

Seriously though, I’m aware of the default isolation level, and I was thinking about the serializable isolation level (read-only); the others don’t apply, for the reasons cited above. The performance hit should not be too bad, and in my use case I don’t need to handle hundreds of API requests per second, so performance is not really a concern.

That said, I think this chat made me think it through better (@mjadczak thanks for the pointers!). Maybe I was over-engineering some points, but it’s now a bit clearer how to handle this need when it comes up.