Combining PostgreSQL and Neo4j

I’m using a Neo4j database to supplement my PostgreSQL database. Postgres is used to store the entities themselves (e.g. users) and should always be the single source of truth. However, I am struggling to figure out how to keep the Neo4j database consistent with my Postgres database. I am using Ecto and Bolt.Sips as my database adapters.

How can I make sure that a user is only created/deleted/changed if the action is successful in both databases? Both Ecto and Bolt.Sips have transaction/rollback functionality, but is there a way to combine them?

This is what I have so far, but it would cause problems if the Postgres transaction fails after the Neo4j transaction has finished.

def delete_user(%User{} = user) do
  Ecto.Multi.new()
  |> Ecto.Multi.delete(:delete, user)
  |> Ecto.Multi.run(:delete_neo4j, fn _repo, _changes ->
    # The callback must return {:ok, value} or {:error, reason}
    Neo4j.Accounts.delete_user(user)
  end)
  |> Repo.transaction()
end

There’s no simple way to have transactional behaviour across two databases. You could look into implementing patterns such as the saga pattern, but this is a lot of work and easy to get wrong.

The more straightforward solution may be to manually write some undo logic for the Neo4j database that runs after a postgresql transaction fails, but this will be error-prone and imperfect.
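
For illustration, a minimal sketch of that undo approach for creating a user, assuming hypothetical Neo4j.Accounts.create_user/1 and delete_user/1 functions that accept the same attributes: write to Neo4j first, then to Postgres, and compensate in Neo4j if the Postgres write fails.

def create_user(attrs) do
  case Neo4j.Accounts.create_user(attrs) do
    {:ok, _node} ->
      case %User{} |> User.changeset(attrs) |> Repo.insert() do
        {:ok, user} ->
          {:ok, user}

        {:error, changeset} ->
          # Best-effort compensation: undo the Neo4j write. This call
          # can itself fail, which is why the approach is error-prone.
          Neo4j.Accounts.delete_user(attrs)
          {:error, changeset}
      end

    {:error, reason} ->
      {:error, reason}
  end
end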

I would be tempted to remove the Neo4j database if possible and move the graph queries into postgresql using recursive queries.

3 Likes

The reason we’re using Neo4j is to handle complex relationships. For example, we have containers of widgets that can be connected to other containers. Containers can also subscribe to widgets of other containers.
Another reason is that access permissions are not necessarily granted directly (as in user has permissions on container), but by checking if the user has permissions somewhere in the hierarchy. E.g. user A is an admin in organization B, which has a group C with a container D. Simply by being an admin in B, user A should have permissions on container D.

Would you still opt to model it in postgres, despite there being a potential for many levels of joins?

I would build the system using the existing postgresql database and recursive queries, and then benchmark the system with generated production-like data to measure performance to see if an alternative approach is needed.
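
To make that concrete, here’s a rough sketch of the permission-hierarchy walk described above as a recursive CTE in Ecto, using recursive_ctes/2 and with_cte/3. The permissions and nodes tables and their columns are invented for illustration.

import Ecto.Query

user_id = 42

# Anchor: nodes the user is granted permissions on directly.
initial =
  from(p in "permissions",
    where: p.user_id == ^user_id,
    select: %{id: p.node_id}
  )

# Recursive step: anything whose parent is already reachable.
recursion =
  from(n in "nodes",
    join: t in "access_tree",
    on: n.parent_id == t.id,
    select: %{id: n.id}
  )

tree = union_all(initial, ^recursion)

# All nodes (organizations, groups, containers, ...) the user can access.
reachable =
  "access_tree"
  |> recursive_ctes(true)
  |> with_cte("access_tree", as: ^tree)
  |> select([t], t.id)
  |> Repo.all()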

If the performance of postgresql is sufficient (in my case it was) I would go that way as (in my opinion) it offers a better operational and development experience while offering better durability and data integrity guarantees. In addition, using a single database means we can use database transactions :slight_smile:

If I were to use both I would likely write to Postgres in the transaction (as the primary datastore and source of truth) and then replicate a subset of that information to Neo4j for faster querying with eventual consistency.
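
A sketch of what that could look like with an outbox-style approach; the Outbox schema and its fields are made up, and a separate worker process is assumed to drain the table into Neo4j.

def delete_user(%User{} = user) do
  Ecto.Multi.new()
  |> Ecto.Multi.delete(:delete, user)
  # Recorded atomically with the delete; if the transaction rolls
  # back, the outbox row disappears with it.
  |> Ecto.Multi.insert(:outbox, %Outbox{
    event: "user_deleted",
    payload: %{uuid: user.uuid}
  })
  |> Repo.transaction()
end

# Elsewhere, a worker polls the outbox table, applies each event to
# Neo4j, and deletes the row once the Cypher query succeeds.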

3 Likes

I ran into this problem before with an app I worked on for a while. As @lpil mentioned, there’s no simple way to combine two transactional behaviours, and depending on the app, using postgres only is the right solution. But I’ve worked with graphs and I know they have many benefits over a relational database in some use cases

In this case I’d recommend you look into other solutions that guarantee consistency within a single database while maintaining the graph data structure

https://github.com/bitnine-oss/agensgraph is a graph database on top of postgres. You have both SQL and graph queries available, and you can combine them as well. The problem is documentation: it’s hard to get started

https://dgraph.io/ is a distributed and ACID-compliant database. It’s supposed to offer the same consistency as postgres while being graph-based and distributed. Though when I was using it the dev team was quite “problematic” and didn’t focus well on what they wanted to do, so I stopped using it. I don’t know the current status of the dev team, but it seems like everything is going fine now

https://github.com/cayleygraph/cayley is more like a plugin for many databases. It works with postgres too, so you can do the same as AgensGraph. Documentation is a problem here as well

In my case I just ended up using https://www.foundationdb.org/ and creating a graph-like layer myself (https://github.com/OkamiIO/Nomure). I have another version that is not released yet; once it’s finished I’ll share it. For the moment, the code you see there is not the best.

There’s another graph layer for FDB, https://github.com/experoinc/janusgraph-foundationdb, though it’s not complete AFAIK. The folks at eBay made a new layer on top of JanusGraph and FDB but they have not released it (yet?). For more info check here: https://forums.foundationdb.org/t/foundationdb-summit-2019-nugraph-graphdb-as-a-cloud-service-built-upon-janusgraph-and-foundationdb/1765

3 Likes

Using only Ecto, it’s not possible to manage a transaction across both postgres and neo4j at the same time.
To do so, use an existing pattern like the saga pattern as @lpil said, or implement your own system: a coordinating transaction on top of the other two, events to do/undo operations, a ‘Temp’ label in Neo4j, etc.
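
For example, the ‘Temp’ label trick might look like this (the Cypher and function names here are mine, not an established API): create nodes with an extra :Temp label, strip it once postgres commits, and periodically delete stale :Temp nodes.

def stage_user_node(conn, uuid) do
  # Staged write: the node exists but is marked provisional.
  Bolt.Sips.query!(conn, "CREATE (u:User:Temp {uuid: $uuid})", %{uuid: uuid})
end

def confirm_user_node(conn, uuid) do
  # Called only after the postgres transaction commits.
  Bolt.Sips.query!(conn, "MATCH (u:User:Temp {uuid: $uuid}) REMOVE u:Temp", %{uuid: uuid})
end

# A cleanup job can MATCH (n:Temp) nodes older than some threshold and
# DETACH DELETE them, covering the case where confirm never ran.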

Well, everyone seems to point the way to “put all your data in postgres”. But why not the opposite? Is having all the data in Neo4j so horrible? (Well, if you plan to do mostly statistics, yes, it’s awful…)
What does postgres offer that couldn’t be done by Neo4j? (except that they are not designed for the same purpose :slight_smile: )
And keep in mind that if you don’t do “heavy” traversal queries, you probably don’t need Neo4j. If all your requests can be expressed with Ecto.Query, then you don’t need Neo4j.

And this:

If I were to use both I would likely write to Postgres in the transaction (as the primary datastore and source of truth) and then replicate a subset of that information to Neo4j for faster querying with eventual consistency.

works very well.
But it depends on whether you need to serve ultra-fresh data or not.
Again, trade-off…

2 Likes

I’m going to be contrarian and suggest that yes, this is possible!

You need to use a two-phase commit via PREPARE TRANSACTION in postgres (https://www.postgresql.org/docs/9.3/sql-prepare-transaction.html)

Basically it goes like this:

postgres: BEGIN TRANSACTION
neo4j: BEGIN TRANSACTION

...

postgres: PREPARE TRANSACTION
if failed, neo4j: ROLLBACK
neo4j: COMMIT
if failed, postgres: ROLLBACK PREPARED
postgres: COMMIT PREPARED

the key being the PREPARE feature in postgres. This allows you to almost-commit a transaction while keeping the possibility of rolling it back. The transaction is pretty much committed at that point: it’s durably recorded, and any database checks such as constraints have passed. Then you can commit neo4j knowing the postgres data is safe, and finally finish up by telling postgres that it’s okay to finalize the prepared transaction.

I’m not sure what the performance implications are but this is the safest way to do it.

Edit: It looks like postgrex doesn’t have native support for this, but I feel like it would probably not be that hard to either:

  1. make a PR to support it
  2. hack it into your codebase in some way
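
For what it’s worth, here is a hand-rolled sketch of option 2, with plenty of caveats: all names are hypothetical, Repo.checkout/1 pins a single connection so the raw BEGIN/PREPARE statements hit one session, Postgrex’s transaction-state tracking may require the transactions: :naive start option, and postgres itself needs max_prepared_transactions > 0 (it defaults to 0, which disables the feature).

def delete_user_2pc(%User{} = user) do
  gid = "delete-user-#{user.id}"

  Repo.checkout(fn ->
    Repo.query!("BEGIN")
    Repo.query!("DELETE FROM users WHERE id = $1", [user.id])

    # Phase 1: durably prepared; constraints have passed, but the
    # transaction can still be rolled back by name. If the process
    # crashes here, the prepared transaction lingers (and holds locks)
    # until resolved manually via pg_prepared_xacts.
    Repo.query!("PREPARE TRANSACTION '#{gid}'")

    case Bolt.Sips.query(
           Bolt.Sips.conn(),
           "MATCH (u:User {uuid: $uuid}) DETACH DELETE u",
           %{uuid: user.uuid}
         ) do
      {:ok, _} ->
        # Phase 2: finalize the prepared postgres transaction.
        Repo.query!("COMMIT PREPARED '#{gid}'")
        :ok

      {:error, reason} ->
        Repo.query!("ROLLBACK PREPARED '#{gid}'")
        {:error, reason}
    end
  end)
end
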
5 Likes

Well, if there were a neo4j FDW plugin for postgresql, then postgresql could do the cross-database transaction about as well as is possible. :slightly_smiling_face:

FDW plugins let you use another database from inside pgsql itself. I use one and it works well. It is a touch slower, but hey, joining across databases makes up for it! :grin:

5 Likes

Storing data in both Neo4J and Postgres isn’t something I recommend. As mentioned earlier in this thread, something should be the source of truth and then you can replicate to another data store for querying.

There is a great talk on this exact situation by Glenn Vanderburg at RailsConf a few years back (this was a project I was initially involved in): https://m.youtube.com/watch?v=Nz-aU3vOFbw

5 Likes

Thanks for a lot of good answers!

From what I’ve gathered, using Neo4j as document storage is not the best idea (it doesn’t scale well). However, a lot of projects use a combination of MongoDB as document storage and Neo4j for metadata. I’m thinking that Postgres should be able to fill the role of document storage just as well, as it can store json documents and keep separate fields for what needs to be queryable. Is using Mongo + Neo4j a good idea, and if so, why is this not also the case for Postgres + Neo4j?

The reason for using Neo4j is to simplify modeling of complex structures such as permission hierarchies, as it can be queried and examined exactly as it’s stored, but also to make queries faster when they get several levels deep. I feel like “don’t do it” is a bit of a rash decision, as there’s no doubt that graph databases excel at both these points. However, I can’t find a single record of someone combining Postgres + Neo4j, so I do feel like I’m doing something wrong :sweat_smile:

I think with Mongo + Neo4j you’ll likely have a lot of the same problems as with Postgres + Neo4j. Mongo vs Postgres is probably a matter of personal preference in many ways, but I would go for Postgres even if I wanted a document database (using Postgres JSONB).

1 Like

The talk I linked is about the experience of running an app that combined Postgres and Neo4J. He details the pain points and the extended process to migrate to Postgres entirely.

3 Likes

You can even build indexes on json fields. ^.^

PGSQL has ‘path’-based support too; I forget the commands offhand, but they’re in the docs. Those might be better for that use case?
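
For example, a GIN index on a jsonb column via an Ecto migration (the table and column names here are made up):

defmodule MyApp.Repo.Migrations.IndexDashboardData do
  use Ecto.Migration

  def change do
    # GIN indexes accelerate jsonb containment (@>) and path queries.
    create index(:dashboards, [:data], using: :gin)
  end
end

For the path lookups, operators like #>> can be reached from Ecto with a fragment, e.g. fragment("? #>> '{owner,uuid}' = ?", d.data, ^uuid).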

1 Like

Hi @sorentwo! I just watched the talk you linked, as I’ve been playing with Neo4j and thinking of using it in my next project… but something Glenn mentions at the beginning of the talk is that the mistake they made was choosing Neo4j because they believed they were building a social graph when they were not… but what he presents as their data and relationships seems very much like a social graph to me… could you expand on that a little?

Thanks :slight_smile:

Glenn (and the recorded talk) will have a more accurate memory of the situation than I do, but I’ll share what I can recall.

Much of the data fit into a heavily interconnected network of connections, at least for storage. The way that data was used for display and analysis didn’t fit so well: the access pattern was relational, rather than “crawling” through a graph.

The biggest issue, IIRC, was the performance and instability of the Neo4J driver in Ruby. It had concurrency issues and performance bottlenecks that couldn’t easily be worked around. It necessitated hacks to correlate PG and Neo4J data and broad caching layers to avoid ever connecting to the database.

1 Like

Yeah, he did mention the issue with drivers, but didn’t expand much on this.
I’m a bit afraid of choosing it due to the current state of its support in Elixir.

Thank you for sharing this, highly useful information :slight_smile:

Hi there, OP here. Since the start of the year, we’re still using a combination of Neo4J and Postgres. It’s working fine for us, but it does require a lot of low-level decisions. In addition, we’ve experienced difficulties when onboarding new employees, as the complexity of running two databases with a custom solution for keeping them in sync is hard to grasp. Furthermore, there’s no unified way of writing “migrations” for Neo4j like there is with Ecto. As such, I had to write a script that runs the Neo4j updates manually. Keeping this updated as the business logic changes is a challenge.
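
For a rough idea of the shape of such a script, a hand-rolled “migration runner” can be as simple as a mix task that replays versioned Cypher files. This is a simplified, hypothetical sketch, not the actual implementation: the paths are illustrative, and a real version would track which files already ran (e.g. in a dedicated :Migration node) instead of replaying everything.

defmodule Mix.Tasks.Neo4j.Migrate do
  use Mix.Task

  @migrations_dir "priv/neo4j/migrations"

  def run(_args) do
    # Start the app so the Bolt.Sips connection pool is available.
    Mix.Task.run("app.start")
    conn = Bolt.Sips.conn()

    @migrations_dir
    |> File.ls!()
    |> Enum.sort()
    |> Enum.each(fn file ->
      cypher = File.read!(Path.join(@migrations_dir, file))
      Bolt.Sips.query!(conn, cypher)
    end)
  end
end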

If I were to rewrite our backend, I would try my best to use Postgres as much as possible. The reason being that we haven’t yet utilized deep, multi-level queries which are the main selling point for using Neo4j. That being said though, writing Neo4j queries is genuinely fun. Adding a new ad-hoc relationship is done in an instant.

However, we have chosen to use the combination for the foreseeable future. Here’s a few details on our implementation in case you’re interested:

  • We use Ecto.Multi to make sure that create/update/delete functions run properly in BOTH databases before committing the Postgres update. As such, the Neo4j query runs last.
  • I created a duplicate schema for the Neo4j module, and use it to determine whether or not the Neo4j query needs to run based on the changeset’s changes (see the sketch after this list).
  • We only add fields to Neo4j that are used for querying. E.g. uuid, username and name for Users.
  • When loading documents themselves, such as our dashboards, we only use Neo4j to check the given user’s permissions and connected users.
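
A simplified sketch of that second point (the field names follow the list above; the module name is illustrative):

defmodule MyApp.Neo4j.User do
  # Only these fields are mirrored into Neo4j.
  @mirrored_fields [:uuid, :username, :name]

  # Run the Neo4j query only when a mirrored field actually changed.
  def needs_sync?(%Ecto.Changeset{changes: changes}) do
    Enum.any?(@mirrored_fields, &Map.has_key?(changes, &1))
  end
end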

Don’t hesitate to reach out if you’re interested in knowing more!

3 Likes

Ohh, tons of food for thought there…! I will begin my experiments this weekend; awesome to know someone around here is battle-testing it :slight_smile:, I will definitely be in contact!

Somewhat related: are you aware of the Seraph project?

I found it recently, and I was wondering if someone like yourself, with experience handling raw Neo4j queries in production, would find it valuable. In particular, the idea of an interface that resembles Ecto but is built around graph database “mechanics”.