My expectation is that the multiple tables approach would outperform, and impose less load than, the approach involving embedded schemas. RDBMS systems, especially mature ones like PostgreSQL, are highly optimized for dealing with row-oriented data. That is, after all, their purpose in life. As a rule of thumb, absent any reason of substance, don’t try to second-guess the normal mode of working with the tool: use that mode until there is a demonstrable reason to go to some alternative approach.
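For concreteness, here’s a rough sketch of the two designs being compared; the table and column names are just placeholders, since I don’t know your actual schema:

```sql
-- Approach 1: embedded schema -- each user row carries all of its
-- entries in a single JSONB array (hypothetical names throughout).
CREATE TABLE users_embedded (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name    text NOT NULL,
    entries jsonb NOT NULL DEFAULT '[]'   -- array of entry objects
);

-- Approach 2: multiple tables -- entries normalized into their own
-- table, one narrow row per entry, joined back to the user.
CREATE TABLE users (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE user_entries (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id    bigint NOT NULL REFERENCES users (id),
    label      text NOT NULL,
    value      text,
    created_at timestamptz NOT NULL DEFAULT now()
);
```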
But let’s think through some of the trade-offs.
Note that my knowledge of the following can be described as “hand-wavy” as I’m just a frequent user of databases and not a builder of them… expect some errors in fact or understanding in what follows.
First, let’s consider how PostgreSQL deals with rows at a lower level.
Under ordinary circumstances, PostgreSQL stores rows in 8K chunks or “pages” in the file that makes up a table, and it aims to fit at least 4 rows in each page. This means the data (and overhead) of each row is effectively limited to about 2K before PostgreSQL starts moving values out of the main table. Of course PostgreSQL allows rows that are far bigger than 8K, let alone 2K, and the way it does that is via the TOAST (The Oversized-Attribute Storage Technique) mechanism. This is a different file on disk which stores the larger values (PostgreSQL: Documentation: 16: 73.2. TOAST).
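If you want to poke at this on a real table, queries along these lines will show the page size your server was built with and whether a given table already has a TOAST table attached (the relation name is just an example):

```sql
-- Page (block) size the server was compiled with; normally 8192 bytes.
SHOW block_size;

-- Does this table have a TOAST table, and how big is it?
SELECT c.relname                                AS table_name,
       t.relname                                AS toast_table,
       pg_size_pretty(pg_relation_size(t.oid))  AS toast_size
FROM pg_class c
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'user_entries';
```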
If your embedded schema JSON object pushes past those in-table storage limits, you’re going to be adding values to a second TOAST file anyway; in some ways this acts like a de facto second table. To be fair, TOAST works really well in practice and I rarely sweat whether I’m going to be TOASTing attributes or not, but if we’re splitting hairs at this level, we’re in the range where these questions will matter. TOASTed values do have a property that can make updates pretty expensive: the entire value is duplicated on every update.
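A quick, rough way to see whether your embedded values are in that territory, assuming the hypothetical users_embedded table sketched above:

```sql
-- pg_column_size() reports the stored (possibly compressed) size of a
-- value; rows whose JSONB pushes much past ~2KB are TOAST candidates.
SELECT id,
       pg_column_size(entries) AS stored_bytes
FROM users_embedded
ORDER BY stored_bytes DESC
LIMIT 10;
```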
So, let’s say that your embedded_schema entries are stored as an array of JSON objects for each user record. If a user has a lot of entries, adding/deleting/updating a single entry will result in the entire collection being re-written. On the other hand, I’ll bet that the data you’re talking about, stored as regular rows in a table, would pretty easily let each entry sit as an individual row within the 8K pages of the primary table file. Inserts are just new entries in the table, updates duplicate only the individual row being updated, and deletes simply mark the record as deleted. No re-writing the big JSON value for each change.
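To make that contrast concrete, here is roughly what the two update patterns look like; the ids and the JSON path are made up purely for illustration:

```sql
-- Embedded approach: changing one field of one entry rewrites the whole
-- JSONB array for that user (and its TOASTed copy, if it has one).
UPDATE users_embedded
SET entries = jsonb_set(entries, '{3,label}', '"renamed"')
WHERE id = 42;

-- Normalized approach: only the single affected row gets a new version.
UPDATE user_entries
SET label = 'renamed'
WHERE user_id = 42
  AND id = 1234;
```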
If table data is actually giving you a performance problem, in an RDBMS there are typically better ways to address that than denormalizing the data into blob-like structures. Alternative indexing strategies (BRIN, GiST, etc.) or techniques such as partitioning are often better than trying to devise one-off methodologies.
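Purely as an illustration of the kind of tools I mean (the created_at column and the month boundaries here are assumptions, not anything from your schema):

```sql
-- A BRIN index suits large, naturally ordered data such as an
-- append-mostly created_at column, at a fraction of a btree's size.
CREATE INDEX user_entries_created_brin
    ON user_entries USING brin (created_at);

-- Range partitioning by month keeps individual partitions (and their
-- indexes) small without denormalizing anything.
CREATE TABLE user_entries_part (
    user_id    bigint NOT NULL,
    label      text NOT NULL,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE user_entries_2024_06
    PARTITION OF user_entries_part
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
```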
Really the only time I would suggest using JSON/JSONB (and similar complex data representations) is when the data itself is genuinely unstructured. Otherwise use tables.
For many applications, genuinely unstructured data is rarer than you might expect.
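When there is a genuinely free-form remainder, the way I’d typically handle it is a hybrid: keep everything with a known shape in real columns and reserve one JSONB column for the rest (again, names are purely illustrative):

```sql
CREATE TABLE user_entries_hybrid (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id bigint NOT NULL,
    label   text NOT NULL,
    extras  jsonb NOT NULL DEFAULT '{}'   -- genuinely unstructured bits only
);

-- A GIN index makes containment queries on the unstructured part usable.
CREATE INDEX user_entries_hybrid_extras_gin
    ON user_entries_hybrid USING gin (extras);

-- e.g. find entries whose free-form attributes contain a given key/value.
SELECT id, label
FROM user_entries_hybrid
WHERE extras @> '{"source": "import"}';
```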
Here’s some further reading that might help in understanding some of the dynamics better: