It’s very possible I’m overthinking this! But I also haven’t been able to find anything (via Google) about this specifically.
AFAIK, the only way to create temporary tables via Ecto is using the query/4 function available through the Ecto SQL adapter (or its alias on Ecto repo modules). That’s fine.
But how do temporary tables work with Ecto connection pooling and PostgreSQL ‘sessions’? Do I need to run queries that use a temporary table in a transaction that includes the query/4 call that creates the temporary table?
Are there reasons why I might want to create a regular table instead of a (PostgreSQL) temporary table (and then delete the regular table after it’s no longer needed)?
As you know, temporary tables are deleted at the end of the session or transaction. Since a connection (maintained by Postgrex) is a session, and connections are reused, you would be best served by wrapping the temporary table’s lifecycle in a transaction so the table is deleted at the end of the transaction.
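For example (a sketch only — the repo and table/column names here are assumptions, not from the thread), the whole lifecycle can live inside a single Repo.transaction/1 call, with an explicit DROP at the end so the pooled connection is clean when it’s returned:

```elixir
# Sketch: MyApp.Repo and the table/column names are illustrative assumptions.
MyApp.Repo.transaction(fn ->
  # The temp table is only visible on this connection (session).
  MyApp.Repo.query!("CREATE TEMPORARY TABLE tmp_results (id bigint, score integer)")

  MyApp.Repo.query!("INSERT INTO tmp_results VALUES ($1, $2)", [1, 42])
  MyApp.Repo.query!("SELECT * FROM tmp_results WHERE score > $1", [10])

  # Drop explicitly before the connection goes back to the pool;
  # a plain temp table would otherwise survive until the session ends.
  MyApp.Repo.query!("DROP TABLE tmp_results")
end)
```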
It’s hard to say the best strategy for your use case from the available data. However, using permanent tables transiently means you need to manage unique naming and cleanup in the case of transaction rollback or server crashes.
As you know, temporary tables are deleted at the end of the session or transaction. Since a connection (maintained by Postgrex) is a session, and connections are reused, you would be best served by wrapping the temporary table’s lifecycle in a transaction so the table is deleted at the end of the transaction.
Thanks – I did know the first part, but I hadn’t seen any explicit mention of connections being (equivalent to) sessions. That’s one thing I was worried about.
One potential workaround, to avoid using a transaction, seems to be to use a dynamic repo:
The example has a :pool_size setting of 1, which I’m guessing indicates a ‘pool’ with a single connection.
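The dynamic-repo approach might look something like this sketch (MyApp.Repo and the table name are assumptions); with pool_size: 1, every query goes through the same connection and therefore the same PostgreSQL session:

```elixir
# Sketch: start an anonymous, dynamic instance of the repo with a single
# connection, so all queries share one PostgreSQL session.
{:ok, pid} = MyApp.Repo.start_link(name: nil, pool_size: 1)

# Route this process's repo calls to the dynamic instance.
MyApp.Repo.put_dynamic_repo(pid)

MyApp.Repo.query!("CREATE TEMPORARY TABLE tmp_scratch (n integer)")
MyApp.Repo.query!("SELECT count(*) FROM tmp_scratch")

# All session state (including temp tables) goes away when the repo stops.
Supervisor.stop(pid)
```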
But a transaction should be fine too for my case.
It’s hard to say the best strategy for your use case from the available data. However, using permanent tables transiently means you need to manage unique naming and cleanup in the case of transaction rollback or server crashes.
Of course – but that’s probably easily handled for my use case, i.e. a daily background job.
One thing to remember is that every statement always runs in a transaction: a single SQL statement is wrapped in an implicit transaction, and of course developers and users can declare explicit transactions.
I don’t think transactions are anything to be avoided or worked around since they underpin the ACID properties of a database.
Thanks – those are all good points to make explicit for anyone else that finds this post.
I don’t think transactions are anything to be avoided or worked around since they underpin the ACID properties of a database.
I’ve found it generally hard to reason about ‘transaction semantics’ in systems that weren’t designed with it in mind. After the fact, it’s not usually clear what semantics are ‘correct’ and considerations ‘devolve’ to cost-benefit analysis of the relevant tradeoffs (including development/admin time). Or at least that’s been true of most of my experience!
Interestingly, Microsoft SQL Server seems to explicitly cleanup ‘session data’, e.g. temporary tables, when a database connection in a pool is re-used:
That’s an interesting read, thanks for the link (about SQL Server). As I read it, the SQL Server connection pool calls a stored procedure to clear temp tables at either connection check-in or checkout (I couldn’t tell clearly which).
In Elixir, you can implement very similar (same?) semantics with
Repo.checkout(fn ->
  # Do stuff
  Repo.query!("DISCARD TEMP")
end)
Interestingly, this post from @josevalim notes that using Repo.transaction/1 instead of Repo.checkout/1 will likely have better performance.
I don’t have a common use case for temp tables, but I found this post on Stack Overflow that gives good guidance for using CREATE TEMPORARY TABLE some_table ON COMMIT DROP AS SELECT ..., which also appears to suggest a transaction is required with this form to get the expected results.
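In Ecto terms, that ON COMMIT DROP pattern might look like the sketch below (repo and table names are assumptions). Since ON COMMIT DROP ties the table’s lifetime to the enclosing transaction, the Repo.transaction/1 wrapper is what makes it behave as expected:

```elixir
# Sketch: MyApp.Repo and the table/column names are illustrative assumptions.
MyApp.Repo.transaction(fn ->
  MyApp.Repo.query!("""
  CREATE TEMPORARY TABLE tmp_user_totals ON COMMIT DROP AS
  SELECT user_id, sum(amount) AS total
  FROM payments
  GROUP BY user_id
  """)

  # Unlike a CTE, the temp table can be indexed before the heavy queries run.
  MyApp.Repo.query!("CREATE INDEX ON tmp_user_totals (total)")

  MyApp.Repo.query!("SELECT * FROM tmp_user_totals WHERE total > $1", [1_000])
end)
# The table is gone as soon as the transaction commits (or rolls back).
```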
It turned out that I didn’t need them; or at least not yet!
There are some alternatives, like PostgreSQL materialized views, but I was testing temporary tables and those alternatives because of ‘bad’ performance for a big recurring query.
I was curious about how others were using them with Ecto and PostgreSQL because I hadn’t found any detailed examples, e.g. of ‘checking-out’ a connection, or otherwise isolating a temporary table from other sessions/connections in the same pool.
If that was a recurring query, then a temporary table shouldn’t help much; and if it helped anyway (probably because it was used more than once in the query), then you can achieve exactly the same thing with a CTE.
I don’t think CTEs are generally equivalent to temporary tables, even just performance-wise. One difference being that you can’t index the CTE data. Another being that a CTE can only be used with one query (tho I’d expect PostgreSQL would likely re-use its ‘query plan’ if the same CTE was used for multiple queries).
I really like PostgreSQL’s materialized views as an alternative to temporary tables too. Like tables, they can be indexed. And there’s nice concise syntax for ‘refreshing’ one of those views. The only downside is that it persists, tho that seems like a relatively small cost, especially if it’s going to be used regularly anyways.
You can also build an Ecto.Schema for a view (materialized or otherwise) and treat it like any other table. For example, if you have a system with a Game schema, you could have it has_one(:game_stats, GameStats) and have that schema backed by a view.
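A minimal sketch of such a schema (the game_stats view and its columns here are assumptions, not from the thread):

```elixir
defmodule MyApp.GameStats do
  use Ecto.Schema

  # Backed by the game_stats (materialized) view rather than a real table.
  @primary_key false
  schema "game_stats" do
    belongs_to :game, MyApp.Game
    field :plays, :integer
    field :average_score, :float
  end
end
```

With that in place, reads like Repo.get_by(MyApp.GameStats, game_id: id) or the has_one association work just as they would against a table; the caveat is that writes to the view will fail.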
Refreshing a materialized view could be a nice job for a GenServer. For example, when relevant changes come in, send a message to the GenServer, which checks every N seconds to see if there are any changes since the last time and refreshes the view if there are. It would be harder to implement that kind of rate limiting if you refreshed it using a database trigger, and refreshing it too often could burden the database.
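A sketch of that debounced-refresh GenServer (the module name, view name, and interval are all assumptions): changes mark the server dirty, and a periodic tick refreshes the view at most once per interval.

```elixir
defmodule MyApp.StatsRefresher do
  use GenServer

  @interval :timer.seconds(30)

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Called whenever a relevant change happens; cheap and asynchronous.
  def notify_change, do: GenServer.cast(__MODULE__, :dirty)

  @impl true
  def init(_), do: {:ok, %{dirty?: false}, {:continue, :schedule}}

  @impl true
  def handle_continue(:schedule, state) do
    Process.send_after(self(), :tick, @interval)
    {:noreply, state}
  end

  @impl true
  def handle_cast(:dirty, state), do: {:noreply, %{state | dirty?: true}}

  @impl true
  def handle_info(:tick, %{dirty?: dirty?} = state) do
    # Refresh at most once per interval, and only if something changed.
    if dirty?, do: MyApp.Repo.query!("REFRESH MATERIALIZED VIEW game_stats")
    Process.send_after(self(), :tick, @interval)
    {:noreply, %{state | dirty?: false}}
  end
end
```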
You can create a materialized view in a migration, you just have to do
execute("""
CREATE MATERIALIZED VIEW game_stats
AS
SELECT
-- ...
""")
The only way I know to update that later is to write a new migration where you drop the old one and create the new one - copy, paste, and edit from the prior migration.
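So an update typically ends up as a drop-and-recreate migration along these lines (a sketch; the view definition and table names are assumptions):

```elixir
defmodule MyApp.Repo.Migrations.UpdateGameStatsView do
  use Ecto.Migration

  def up do
    execute("DROP MATERIALIZED VIEW game_stats")

    execute("""
    CREATE MATERIALIZED VIEW game_stats AS
    SELECT
      -- new definition, copied and edited from the prior migration
      g.id AS game_id,
      count(*) AS plays
    FROM games g
    JOIN plays p ON p.game_id = g.id
    GROUP BY g.id
    """)
  end

  def down do
    execute("DROP MATERIALIZED VIEW game_stats")
    # Recreate the previous definition here to keep the migration reversible.
  end
end
```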
I think I knew that you could use views with Ecto schemas, but it’s good to be reminded of it – seems potentially really handy!
I also like your idea of using a GenServer to refresh the view. There are so many things one can do with little servers like this.
I’ve already run into scenarios where it was best to use raw SQL in migrations, e.g. instead of referencing schemas (which aren’t migrated, hence subject to change, thus breaking migration code).
There is an OR REPLACE option for CREATE VIEW, but that seems pretty much equivalent to DROP VIEW ... followed by CREATE VIEW ... anyways.