Very complex Ecto query design

So I’m porting an old SQL generator that creates rather amazingly complex queries at run-time; let me hop into some code to get the point across quickly:

def query_information(main_id, lookups) when is_list(lookups)

Used like:

lookups = [:name, :first_name, :last_name, :middle_initial, :employee_email, :personal_email] # and more; this lookup list is generated dynamically at runtime and can be quite large
results = query_information(main_id, lookups) |> Repo.one!()
name = results.name # results is a map keyed on the lookups, holding the data

Basically, query_information needs to take a large number of ‘things’ to look up. This would be fairly trivial for Ecto if they were all in one table, but I’m working with a very, very old database: querying even just ‘name’ involves 4 tables, 3 joins, and a lot of where conditions, and getting something like ethnicity is huge, requiring 3 sub-queries, a couple of SQL case statements, and over a dozen joins. It is a bit of hell, yes.

And to top it off, the database is slow. Even a simple query on a single table (like ‘username’, if you already have the main_id, which you usually don’t, so you need a minimum of one join there, usually more) is not what one would call ‘instant’.

Interestingly, actually running the query is not too slow; it is sending the query to the server and getting the data back that costs, so I need to minimize the number of SQL calls as much as possible, hence the previous amalgamation of horror.

The old version, in another language, created a string of SQL. That is actually not too hard to do, because the database has one oddity going for it: every single field in every single table is prefixed with the table name, so the ‘id’ column of table ‘blah’ is blah_id. What the horror query builder did was work out all the joins necessary for the requested things, then emit an utterly massive select statement, with ‘when’s and so much more, all using the field names, with no tables needing to be referenced anywhere.

So… I’ve been tasked with re-creating it in Elixir. I’d prefer not to recreate the potentially unsafe string horror and would prefer Ecto; however, I’m having trouble coming up with a good design.

I’m stuck even on the first step, where I have to set up all the joins based on the requested information. First I iterate the lookups list and have each command return the tables it will need joined at the top level, then I uniq that list, then I need to join them all based on the criteria each table needs to talk to another; in most cases that is the main_id, in some others it means querying other fields.

The main issue I am having is the query syntax. To reference previously joined tables I need to pass the bindings in, but I do not know how many bindings the query has at that point unless I pass a counter around, and even the counter would not be useful, since a binding requires a static name. If these lookups were known at compile time for a given call, this would be trivial with macros. What I seem to need is either for the expression syntax to take no bindings at all (so I could just reference the globally unique field names, since there are no identical ones, except in the few cases where a table joins itself; yes, that happens too), or for bindings to be able to reference a specific table, say by name or something, but I cannot find a way to do this…
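For concreteness, this is roughly the shape I wish I could write (just a sketch: `tables_for/1`, the table names, and the join columns are all made up, and the runtime-named `as: ^name` binding is exactly the part I cannot find a way to do):

```elixir
defmodule InfoQuery do
  import Ecto.Query

  # Hypothetical mapping from a lookup to the tables it needs.
  defp tables_for(:name), do: [:persons]
  defp tables_for(:employee_email), do: [:persons, :emails]
  defp tables_for(_other), do: []

  def build_query(main_id, lookups) do
    base =
      from(b in "base_table",
        as: :base,
        where: b.base_pidm == ^main_id and is_nil(b.base_change_ind),
        # Selecting the actual lookup fields is omitted in this sketch.
        select: %{pidm: b.base_pidm}
      )

    lookups
    |> Enum.flat_map(&tables_for/1)
    |> Enum.uniq()
    |> Enum.reduce(base, fn name, acc ->
      pidm_col = :"#{name}_pidm"

      # The binding is named after the table itself, so later clauses
      # could refer to it without threading a positional counter around.
      join(acc, :inner, [base: b], t in ^Atom.to_string(name),
        as: ^name,
        on: field(t, ^pidm_col) == b.base_pidm
      )
    end)
  end
end
```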

Please help… I’ve been fighting this all day… :frowning:

6 Likes

The pain of working with old databases…
At work I am porting an old “system” that consisted of running 4 abominable queries directly from an Excel sheet; depending on the client, a query could take a whole day to complete.
For each client we have one spreadsheet configured with the DB connection and the tweaked queries, because some are Firebird, some Oracle, and a few MSSQL.

Is it possible to connect to a database at runtime with Ecto?

1 Like

Just curious: how bad would it be to just use postgrex with parameterized queries? You avoid SQL injection, you get to use Elixir, and you can still handle all the super weird, unconventional SQL patterns.
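For example, something like this (a sketch; the table, columns, and the `conn` Postgrex connection are all invented here):

```elixir
# Values travel as bind parameters ($1), never spliced into the SQL
# string, so the statement itself can stay a vetted constant.
sql = """
SELECT accounts_first_name, accounts_last_name
FROM accounts
WHERE accounts_pidm = $1 AND accounts_change_ind IS NULL
"""

{:ok, result} = Postgrex.query(conn, sql, [main_id])
# result.columns and result.rows hold the returned data.
```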

2 Likes

Maybe a thin wrapper on top of postgrex to make some common things easier? What’s the issue with building the SQL? You have a whitelist of valid field names, so that is not a problem, and values will be escaped by postgrex or whatever you are using. Is there a lot of data in the DB?
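For instance, a sketch of that whitelist idea (names invented): identifiers can’t be sent as bind parameters, so they get checked against a fixed set before ever touching the SQL string, while values still travel as parameters:

```elixir
defmodule SafeSQL do
  # Only these column names may ever be interpolated into a statement.
  @valid_fields MapSet.new(~w(accounts_first_name accounts_last_name accounts_email))

  def select_sql(fields) do
    unless Enum.all?(fields, &MapSet.member?(@valid_fields, &1)) do
      raise ArgumentError, "unknown field requested: #{inspect(fields)}"
    end

    # The pidm value itself is still bound as $1 by postgrex, never inlined.
    "SELECT #{Enum.join(fields, ", ")} FROM accounts WHERE accounts_pidm = $1"
  end
end
```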

1 Like

I don’t know this lib, https://github.com/bettyblocks/sql_dust, but maybe it can help or bring some ideas?

2 Likes

condolences :slight_smile:

First I’d try to simplify things: if this is part of moving to a new system, I’d write CLI tools to migrate the whole DB, or parts of it, to a new, more relevant structure that can be sanely queried.

If that’s not an option, I’d probably reimplement the whole legacy raw-SQL-building logic first. Then, with this implementation and a lot of tests in place, I’d try to make it better one step at a time; it will not be perfect, but it will probably improve over time.

It might be totally possible to jump to Ecto right away, but I have no good ideas given the scary description you provided.

1 Like

Yeah, it is an old Oracle database that I’m accessing for this (through PostgreSQL, which oddly adds no measurable overhead compared to the raw connection, and actually ends up faster in some cases).

Unsure about runtime connection with Ecto though…

I’d have to make such a query for every combination of lookups they want, or do a lot of conditional checks inside of it while somehow getting back the data that is wanted; it is quite a pain…

There is about 30 years of data in the database; it was built 9 years ago and the old data was loaded in. It is massive, and a single lookup of a single ‘main_id’ on a single table takes half a second; after a few joins it slows to an utter crawl.

Let me detail how this is set up a bit more. :slight_smile:

  • The database itself is 9 years old, an Oracle setup that has been upgraded a few times but is still quite out of date (the fdw wrapper for PostgreSQL supports it at its older versions).
  • There is no assigned primary key in any table.
  • Every table has a field named <tablename>_change_ind, a single-character field: if it is null, the row is the ‘active’ one; if it holds any character at all, the row is old and should be ignored. Thus every single table you query or join needs the <tablename>_change_ind IS NULL check added (see the sketch after this list), and you must not forget it. Thankfully this field is one of the few that has an index. The value, when present, is always Y, but we don’t care about that, just whether it is null or not.
  • The account records have a global ‘main_id’ called a pidm, named in each table as <tablename>_pidm; it is one of the few integer fields. There are lots of duplicates of it, since records are updated by inserting (no row is ever deleted; any change sets the <tablename>_change_ind field of the old row and inserts a new one), which also balloons the database size. The pidm fields ‘usually’ have an index, but not in all tables.
  • The pidms are not always up to date either: occasionally a user record gets inserted multiple times with a new pidm, and a resolver runs once a day to combine them; the old records are ‘deleted’ (their <tablename>_change_ind is set, so none of their rows remain null/active) and only the new one exists.
  • Accounts also have an ‘id’ field (a string) on one specific table, which is ‘usually’ more stable than the pidm, although it changes on occasion too (the resolver runs over it).
  • The actual name fields on the accounts change on occasion too (when a duplicate is entered and the resolver runs over it).
  • Most joins end up using the id field to look up that table to get the ‘active’ pidm, which then ends up grabbing a dozen other tables. For example, to get ethnicity you first hit one table with the id to get the pidm, join that to another table to grab a column from it, use that column to look up yet another table (with no pidm) to see whether it uses the old or the new ethnicity syntax, and then conditionally join (in SQL the whole time) to another table: if the old style, that table has the data in its columns; if the new style, that table is used to join to 2 further tables whose data is combined into the final value. Then all the values are massaged with a few SQL case statements into the same format for use. This is one of the more common complex queries, although there are worse. The ethnicity SQL expression alone is multiple pages long (and good luck piecing it out of the big function that grabs whatever you want).
  • Most keys everywhere are ‘character’ fields of various lengths; there is the occasional ‘datetime with no timezone’ field, and the pidm is, I think, the only integer field I’ve found (regardless of how much integers should be used elsewhere). Nearly everything else is a variable character field, and there are very few indexes and no unique indexes.
  • Oh, and we only have read-only access, so we cannot create new indexes or anything of the sort; even if we could, the system would wipe the changes at the next bi-yearly update (which takes days, during which everything is down).
  • And this is only the general bit of it all…
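To give the change_ind point above some concrete shape, the check at least factors out into a helper along these lines (a sketch, assuming bindings that can be referenced by a runtime atom and that each join is bound under its own table name):

```elixir
defmodule ActiveRows do
  import Ecto.Query

  # Every table carries <tablename>_change_ind; only rows where it IS
  # NULL are current, so tack that filter onto the given named binding.
  def only_active(query, table) when is_atom(table) do
    col = :"#{table}_change_ind"
    where(query, [{^table, t}], is_nil(field(t, ^col)))
  end
end

# e.g. query |> ActiveRows.only_active(:accounts)
```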

So yes, this system is… interesting… :cry:

Interesting, never ran across that before, will look at it closely soon. :slight_smile:

Yeah this is interesting… ^.^

Nope, it’s a very old system that none of us have any clue how it is still functioning (thankfully there are daily backups at midnight, when it is not in use). Plus read-only access. Plus it changes so often that caching is near impossible…

Yeah, I’ve partially ported over the horror of the original SQL string builder; it is hell and not what I would consider safe, and I really, really do not want to continue down that path…

Yeah, I am at a loss with this problem… Ecto ‘seems’ to be lacking a lot of the features needed for dynamic SQL building (I really want a way to bind to specific indexed or ‘named’ tables… really badly…), but I am hoping someone corrects me. :slight_smile:

1 Like

Wow, sounds amazing; I wonder why it is using Oracle if not even the basic relational features are being used. It seems that, other than maybe building some views on the pg side using the fdw, you don’t have many options to simplify things.

1 Like

Government builds; the competency is to be expected. :wink:

I was thinking of creating lots of views and then subquerying those into a main query as needed as well, but I’m unsure what kind of cost PostgreSQL would incur with that; I’m pretty sure it will do one query per view to the fdw instead of all at once…

1 Like

@OvermindDL1 I had this pain before… What I did (and I had the “green card” to do it) was to move the database to a graph database…

When I used graph databases for the first time and saw what I could do with so little, compared to relational databases with their crazy joins and complex queries, I felt I had been cheated my entire life. Graph databases are awesome and you can model nearly everything (to be honest, I have not come across a single thing I couldn’t model in a graph database).

And yes, some might say that graph databases are only great for really interconnected data. I disagree. They are great for any kind of related data.

1 Like

It absolutely cannot be moved here. The old software will only work with that very specific Oracle setup and it is absolutely required for legal compliance, so no go here. ^.^

And yes, graph databases are awesome, though often slower in many cases, in my experiments. :slight_smile:

1 Like

This is probably a distraction, but I was experimenting with Ecto code generators…

https://github.com/nathanjohnson320/Ecto-Generator

1 Like

Materialized views are, in effect, just another index over the data. I’m unsure about Postgres, but in DB2 you can decide whether a materialized view is maintained continuously or rebuilt only when the view is opened. Sure, you have to bite the bullet at some point, but you usually get to decide when.

1 Like

Well, the issue here is the fdw layer in PostgreSQL talking to the Oracle server; I’m pretty sure each view will cause a sequential scan on every single access, but I need to test to make certain. :-/

2 Likes

Ask them to create the logical views on the Oracle server and give you read access to those; things like that aren’t out of the ordinary for most corporate shops. Government, well, I’ve heard firsthand IRS shop horror stories, but it can’t hurt to ask, considering it might be essential for the project to be performant.

1 Like

Heh, as stated above, we have read-only access only, with no chance at all of any modifications; the database, its schemas, and the access controls are managed by this other Oracle software, which will revert any change you make. We all want to make changes to it, especially adding indexes on a few things that do not have them. :wink:

1 Like

Just curious what you ended up with? :slight_smile:
I’m also struggling with kinda complex (not like in your case, though) Ecto queries for an analytics dashboard.
I ended up with multiple functions in SQL and short parameterized Ecto fragments, but that has its own drawbacks: adding or changing a function is a pita, and they really aren’t that composable.

1 Like

I eventually went with a combinatorial explosion of possibilities via macro generation (better than writing it all by hand!), but the current master branch of Ecto has named bindings, which simplified it all TREMENDOUSLY, if you don’t mind using a master branch. The only thing I want now is a way to ask a query whether a binding name already exists in it (it ‘might’ exist? I need to check), as then I could stop carrying around my own custom state, which would in turn simplify my Repo itself. :slight_smile:
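The shape I’m after is roughly this (a sketch with invented table and column names; it assumes has_named_binding?/2, which Ecto.Query does ship alongside named bindings):

```elixir
defmodule Joins do
  import Ecto.Query

  # Add each table's join at most once, keyed by its binding name, so
  # no counter or external state needs to be threaded through.
  # Assumes the query already has a :base named binding.
  def ensure_addresses(query) do
    if has_named_binding?(query, :addresses) do
      query
    else
      join(query, :inner, [base: b], a in "addresses",
        as: :addresses,
        on: a.addresses_pidm == b.base_pidm and is_nil(a.addresses_change_ind)
      )
    end
  end
end
```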

1 Like

Ah, macros power :slight_smile:

Will definitely check named bindings, thanks :slight_smile:

1 Like