Securely composing dynamic queries with Ecto using values from database

Hello everyone! How are you? (I hope you are all fine despite the current situation)

Well, I already spent a good amount of time researching and testing some solutions and I’d like to see if anyone would like to leave a suggestion on other ways to solve this before I make it final.

I’m creating an application for game servers that can interact with the game’s database and enhance the administration experience for community managers.
So, usually, there’s already a database setup in place for those game servers (MySQL 90% of the times) and my application’s database only stores configurations, logs, table references, etc. I have two repositories configured for my application, one that I have control over the schema and the other which I don’t and need to query based on certain conditions.

One of the problems I’m facing right now is coming up with a good strategy to allow the administrators of the server to configure how queries are done against their database and successfully building those queries.

Currently, I’m storing configuration on a table as key/value pairs, and the user can set, for instance, the name of specific tables to be retrieved, primary keys (for cross-referencing) and which fields of a given table can be selected (like player rankings, etc).

Most of this configuration is saved as ‘string’ and Ecto, as far as my research went, only accepts ‘atom’ fields, so this requires me to at least convert these strings to atoms on the fly (which I discovered is not recommended).

A simple example would be retrieving a player from a table and only returning specific columns:

# Configs retrieved from the database
accounts_table = "accounts"
account_columns = "id, player_name, registration_date"
username_column = "player_name"

# Something to query
username_list = ["alpha", "bravo", "charlie"]

# Convert some of the string values before passing to the query (account_columns, username_column)

    from a in accounts_table,
    where: fragment("? in (?)", field(a, ^username_column), ^username_list),
    select: ^account_columns

One of the things I’m doing to prevent creating a lot of atoms is to limit the number of columns the user can specify, but I’m not completely happy with the solution so far because it is a very poor workaround.
I also tested storing the queries directly to the database and just including filters, but the fragment function does not play nice with only string params (SQL injection reasons).
I also tried running raw queries and setting SQL variables as an extension point but I lose a bit of composability and SQL variables have their own set of limitations.

If you’re looking to provide an interface for reporting-style queries, have you considered systems like Metabase or Elasticsearch?

For instance, the Elasticsearch Query DSL seems like a vastly-evolved version of what you’re sketching.

Hey @al2o3cr! Thanks for the reply.
Unfortunately, MySQL is a hard requirement (I don’t have control over that). Another thing is that I’m not always retrieving data, I might need to insert it (directly) and because of that, I’m dependending on the game’s database.

Maybe you can use String.to_existing_atom/1 and handle ArgumentError accordingly

I thought of that as a safer alternative and I’ll probably use that to increase safety, but those game databases are huge. Think of hundreds of columns for hundreds of tables (I’d prefer if Ecto was not strict about atom fields when you don’t have control over the schema).

Both Metabase and Elasticsearch are secondary indexes - you’d typically configure them so that changes in the “main” database are pushed to them.

Be very wary of this practice - it’s essentially treating the entire hundreds-of-tables game database as a public API. If the game doesn’t provide a real API, I recommend you write a narrowly-focused one that writes to the things you need and use that instead of sprinkling random INSERTs throughout your code.

IMO this friction is a sign that you’re working against the grain of the tool; Ecto is primarily designed to handle building SQL queries that are known at compile time, and that’s the opposite of what you’re building.

I’m curious about the exact sort of users you’re envisioning for this; there’s a very narrow window of skill between “can understand a schema with hundreds of tables well enough to query it” and “can write SQL”, and the abstraction you’ve described doesn’t remove very much complexity.

I know, I know, I’ve worked with Elasticsearch before…
The problem is that this is not a greenfield project, there are very restrictive constraints in place. I can’t for example, impose an Elasticsearch cluster because it’s beyond the budget. Dedicated game servers are already too costly and most of the time the person behind it is not an expert to deal with a lot of ‘moving parts’.

Yes, it’s not ideal, but usually, this is a tradeoff the server administrator is willing to accept for better management because the game source code is not extendable or moddable enough.

It’s not about removing complexity per se, it’s about extending something that can’t be extended by the ordinary means. I’m gonna give you a hypothetical (but realistic) scenario:

Have you heard of a game called Legends of Aria? If not, the thing you should know is that this game allows creators to build their own rules on top of the base game. So I could, for example, host the game with my own set of rules (which I modified from the game tools). This way, any player that bought the game can enter the official server or my custom server.

Most of the time, those kinda games have tools that are closed source (except for some rare exceptions where the game is extendable to the core - looking at your Minecraft).
Because of that, we only have access to the database as an extension point. If I want to have a website that shows player data or an auction house or even sell cosmetic items, and the game tooling does not support that, the only way is to query and insert data directly to the database.
And then, there’s the fact that most of those games are built in completely different ways, so this has to be configurable by whoever is administrating the server. So far I’ve come to realize It’s a very unique problem.

Not really a solution, but ecto dsl mostly are macros, have you tried to replace that with yours?

eg. accepting strings then do some validation?

I am still trying to understand the problem, so I am guessing that even if you don’t have control over the data schema, the data schema doesn’t change, right? There’s no way the games let users add tables or columns, right? So the problem you have is with making it easier to integrate with various schemas. Maybe you could look into providing a database introspection tool that would generate the configuration? Then create ecto schemas on the fly based on that config

edit: Or even not generate schemas, just use schemaless queries but you’d have the information needed to properly handle columns and their types. I have done a similar thing recently where I have general where functions that use the field(t, ^col) trick and I use the column type information to cast the string input to appriopriate types before calling them

Interesting :thinking:.
I’m definitely check this out, thanks for the suggestion @ardhitama!

The schema might be different, and that’s why it has to be configurable. Perhaps there’s a column called ‘username’ in a ‘users’ table instead of a column called ‘login’ in a ‘accounts’ table - it depends on the game or the version if we are talking about the same game, but usually, there are some similarities, the administrator of the platform has to hint this out to the system.

Something like this lib?
One thing is that it has to be configured at runtime, so I’m not sure.

This is what I’m doing right now, hence the problem with field atoms that I’ve mentioned.
I’m also very inclined to start running raw parameterized queries to avoid hitting limitations with Ecto.

Very interesting suggestions to consider so far.