Opinions on using stored procedures

That is the whole point of a DBA; most places do not have one anymore, so they’ve lost access to that knowledge. :wink:

1 Like

I like Riak as a massively distributed store.

However, I use PostgreSQL as a KV store too; it excels at that with the JSONB datatype!
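
E.g. a minimal sketch (the table and key names are just illustrative):

```sql
-- A simple KV table: text keys, schemaless JSONB values
CREATE TABLE kv_store (
    key   text PRIMARY KEY,
    value jsonb NOT NULL
);

-- Upsert a value for a key
INSERT INTO kv_store (key, value)
VALUES ('user:42', '{"name": "Ada", "roles": ["admin"]}')
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;

-- Point lookup, plus querying inside the value itself
SELECT value->>'name' FROM kv_store WHERE key = 'user:42';
SELECT key FROM kv_store WHERE value @> '{"roles": ["admin"]}';

-- A GIN index makes the containment (@>) queries fast
CREATE INDEX kv_store_value_idx ON kv_store USING gin (value);
```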

1 Like

It’s a complicated subject. :slight_smile:

1 Like

It’s exactly this type of thinking that made the approach popular at first - “just before the turn of the century”. And the fact that there is only “one place” that needs to be updated during maintenance - rather than having to push out yet another deploy of all the affected (desktop) applications. But in reality there are a lot of things that can interfere with that ideal …

tl;dr: Stored procedures are powerful, especially for addressing vendor-specific optimizations and issues - however, once they are used as an application-development Swiss army knife, any kind of (poorly informed) misuse or abuse will quickly land you in a rigid (and brittle) cul-de-sac.

I’m surprised that nobody has mentioned what I perceive as the most severe pain point: stored procedures exist in an environment where circumstances can quickly conspire to make (unit-)testing them a royal PitA - often to the point that “proper” testing doesn’t get done (more on that later).

To put matters in perspective one has to take into account what was going on “just before the turn of the century” - some of which may have left entrenched manifestations to this day in the guise of “legacy, mission-critical systems”. Before DevOps was invented, DBAs and application developers mostly lived in two entirely separate cultures and camps (aka departments), and while it was recognized that application development should receive solid DBA support, good DBAs were scarce and typically overloaded with priority work in operations.

Meanwhile, among some application developers the view wasn’t that uncommon that SQL, like other DB-related technologies, was just too strange and arcane to be bothered with (“Why can’t I just use a QBE tool to generate that query for me?”). So there was often an “us vs. them” situation between application developers and DBAs. Right off the bat this was not an ideal environment for stored procedures to be developed in.

Now there are situations where the use of stored procedures is entirely justified; however, the golden hammer was a common affliction, which could have one of two extreme outcomes for stored procedures:

  • Application developers don’t even bother to learn the vendor-specific procedural language first and are then surprised when there are significant differences from THAT OTHER LANGUAGE they would rather be using - i.e. the stored procedures aren’t going to be particularly well written. (Some stored procedures can be written in a hosted/managed language - Java in Oracle, .NET in SQL Server - but there can be constraints and penalties that make these types of solutions less than ideal.)
  • An ambitious application developer seeks to exploit the procedural language extension to the fullest extent (i.e. actually puts some effort into learning it) - and promptly goes overboard by implementing actual domain logic inside the stored procedures. This basically lets the implementation of your bounded context dip into the database. Within the previously described culture there were typically two types of planned deployments to production: the slow one, when no DB changes were required, and the really slow (and tedious) one, when DB changes like updating stored procedures were required (i.e. additional coordination with the highly in-demand DBAs was required). So in reality the inherent deployment inertia associated with stored procedures served as a significant disincentive to place anything in them that had any risk of needing to change in the future.

I also believe DB-based stored procedures are the wrong place to “decouple the schema from the application”. Some aspects of the object-relational impedance mismatch don’t simply evaporate when you stop using OO. In some sense the DB is dealing with “data-at-rest” while the application is dealing with “data-in-use”, and those are very different responsibilities facilitated by some very different “engines”. So while the DB schema and the application schema may be similar initially, pressure to keep both running at peak efficiency can cause them to diverge over time.

To manage complexity the application should only deal with the application schema, while the database, and that includes the stored procedures, should only deal with the DB-schema. The schema transformation should be the responsibility of a repository (which would actually wrap a mapper like an Ecto repo rather than be represented by it).

Well, for one, the procedural extensions can feel quite restrictive, especially when compared to the facilities available in current mainstream programming languages - at times it can feel like you are working in an environment that was current 10 years ago. Then there is the environment the code has to run in - there are packages that allow activities like manipulating files and accessing the network, but they often have arcane interfaces and are locked down for security reasons. And you need to know the (vendor-specific) RDBMS engine in detail if you want to avoid stepping on landmines - for example:

“Final question: what happens when a PL/SQL block executes a query that calls a user-defined function that, in turn, executes a SQL statement?” I asked.
“Lots and lots of context switches?”
“Yes, indeed, a whole mess of context switches, and each switch is a delay. So the more you switch, the slower your code goes. Remember FORALL from that training I did a few months ago? Reduction of context switches was the primary design goal of that feature.”

That may be true for a SaaS or PaaS product. But there are cases where the application is the product and corporate clients want to use their own in-house DBA expertise - i.e. they want to use their preferred RDBMS. In that case you want to minimize the use of stored procedures, as they would only increase the porting effort to “yet another database”.

Back to my other point. Popular wisdom often declares that unit tests should not involve the database - because interacting with the database slows the unit tests down (and they need to be fast, otherwise they aren’t run often enough, etc.). Well, stored procedures live in the database - worse yet, they usually select, insert, update, and/or delete data directly from the DB tables. Unit testing stored procedures proceeds at a glacial pace, especially if you have to test across transaction boundaries with lots of inserts, updates, deletes (truncates), and verifications during test setup and cleanup. By and large I see stored procedures as a potential impediment to automated testing. And even if you can get past the slowness of automated DB testing, you typically still need full autonomy over your testing environment.
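
For what it’s worth, in-database testing frameworks do exist - e.g. pgTAP for PostgreSQL - but notice how even a trivial test needs real rows and transactional cleanup (a rough sketch; the orders table and the order_total function under test are made up):

```sql
-- Rough pgTAP sketch (requires: CREATE EXTENSION pgtap;)
BEGIN;
SELECT plan(2);

-- Test setup: real rows have to exist before the function runs
INSERT INTO orders (id, amount) VALUES (1, 100), (2, 250);

SELECT is(order_total(1), 100::numeric, 'total for a single order');
SELECT is(order_total(2), 250::numeric, 'total for another order');

SELECT * FROM finish();
ROLLBACK; -- cleanup: throw away the test data
```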

Let’s go back to “just before the turn of the century” and see what was going on there.

  • The corporate databases are HUGE. So huge in fact that due to cost (licenses and hardware) there are only three instances of each database in the entire organization: dev, test and prod.
  • Due to hardware cost, “dev” can’t even accommodate a production-sized set of data - to test against representative data you have to run against “test”. Shared-database integration is rampant, so one single “dev” instance is shared among all the application developers of the various active application development project teams. While the application developers typically have the necessary privileges to place testing instrumentation into the “dev” instance, they don’t bother because there is no point: “dev” is everybody’s experimental playground and people frequently don’t clean up after themselves, so packages may be invalid, constraints disabled, etc. - or your unit test may fail because of what somebody else is doing to the database at the time. So when you are truly testing, you are running against “test” - however …
  • “test” is locked down entirely and completely under the control of the “more operations-oriented” (and hence rarely available) DBAs. To thoroughly test your stored procedures you need a DBA to move your code into “test” for you. And introducing test instrumentation code into the “test” instance is not an encouraged practice, as that isn’t “production code”. Furthermore, any manipulation of the data in “test” outside of production applications is discouraged, in order to keep the data representative of production between the data reloads that happen maybe every 3 to 6 months. Basically “stored procedures” were synonymous with “lots of red tape”.

All in all, under circumstances like that most people would avoid using stored procedures unless absolutely necessary. This may help explain why “not many people advocate using it anymore”. Of course there are always other opinions. I suppose that if an application/service has full autonomy over its database then there are fewer downsides to stored procedures - however, that only works if the product team is in full control of the development, testing, and production instances (DevOps style). But even then I don’t think you can, even today, adopt a “functional programming” style inside stored functions and procedures, so any logic that would benefit from that would have to stay outside of the database.

Future of Stored Procedures

Personally I view the NoSQL alternatives as exactly that - alternatives for situations where an RDBMS is overkill or an ill fit (e.g. object-relational impedance mismatch). That doesn’t mean there is no room for improvement in the RDBMS concept. For my taste the current mainstream incarnations of RDBMS engines are still pretty monolithic affairs, ideally cramming everything onto one single piece of iron and requiring relatively complex and kludgy configurations when they do need to scale across servers.

I suspect that to scale (for huge data stores) the RDBMS concept needs to be deconstructed into its constituent parts like mass storage, query engine and application data cache which can then be distributed and scaled separately as is alluded to in “Turning the database inside out with Apache Samza”. For the time being Datomic seems to be a step in that direction - note however that Datomic doesn’t actually store any data but simply uses existing persistent storage solutions for its mass storage component.

Given that kind of outlook I don’t see stored procedures having much of a resurgence in the future either, save for engine/platform-specific optimizations. I guess it would be possible to (optionally) host code in the application data cache (collocated with the application), but who knows whether that would still be considered equivalent to “stored procedures”.

9 Likes

Wow, this is a candidate for the most detailed and well-structured response I’ve ever encountered on this forum so far. While I would disagree with a fairly large number of the points you made, I really appreciate how well you made your case :). I will try to address your points when I have a bit more time; for now, just a quick note.

This was a specific reply to @OvermindDL1 :slight_smile:

You can use Python, JavaScript, Perl, Java, R, Tcl, etc. to write stored procedures in PG.

Agreed on the number of languages. Additionally, the ability to connect to external data sources and read/write directly to them, as well as to notify listening external code of changes via the LISTEN/NOTIFY pub/sub functionality, makes the conversation entirely different when discussing stored procedures in PG versus all of the others.
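
For anyone who hasn’t seen it, the LISTEN/NOTIFY piece looks roughly like this (a sketch - the table, channel, and trigger names are made up):

```sql
-- Trigger function that publishes row changes on a channel
CREATE OR REPLACE FUNCTION notify_orders_changed() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('orders_changed', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_changed_trg
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE PROCEDURE notify_orders_changed();

-- Any connected client (e.g. your application) can then subscribe:
LISTEN orders_changed;
```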

If we were purely talking about code that only mattered confined within the single database platform I’d agree with @peerreynders completely, but that’s not at all the case when it comes to Postgres.

To put it another way: if you needed to ingest log data with Elasticsearch, would you want to alert and react to incoming data from within your application code, or would you want to use the functionality built into Elasticsearch? The functionality within ES makes more sense in that use case.

Postgres stored procedures are no different. Sometimes it makes more sense to use them; sometimes it doesn’t. The biggest thing you have to avoid is subscribing to a hard ideology without understanding what you are giving up for the sake of potential portability. With Postgres you’re giving up a lot of tools to adhere to that, while simultaneously using a database that is at the bottom of the “motivation to move off” rankings.

Doesn’t mean you should do everything in stored procedures, just that you shouldn’t bypass great tooling at your fingertips without really good reasons.

I actually came across a good use case for a stored procedure because of a problem that Ecto was giving me, for example: http://www.brightball.com/articles/postgresql-functions-with-elixir-ecto

Brilliant way of putting it!

Very well said. While creating a clean interface to the data layer through stored procedures sounds great, organizationally it seems like it always ends up as a fail. Creating a silo around the database just creates new and difficult challenges when trying to move the application or system forward.

It seems not a single person mentioned security :slight_smile: Be honest: to what degree would your app be screwed if someone gained your web app’s DB-user level of access :)?

If someone from outside got DB read access you’re screwed. Data will be leaked.

If someone from outside got DB read/write access you’re screwed anyway. And it doesn’t matter whether you have logic at the app layer or the DB layer. Data will be leaked. And whatever was written/changed in the database has no importance, because you’ll be restoring it from backup anyway (just like you would with app code).

Well that’s how I see it anyway, but I would love to be proved wrong, it might prevent potential headache in future :wink:

Well, in a tight setup your DB will have:

  • a web app user that only has execute permissions on the stored procedures specific to your user-facing functionality
  • an admin interface user that has execute permissions on the stored procedures specific to admin functionality

On top of that:

  • you run the admin app in isolation, accessible via VPN only
  • you set up the DB such that the web app user can only connect from the user-facing web servers
  • you would also use UUIDs or other hard-to-guess ids
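
Sketched in SQL, that kind of lockdown looks roughly like this (the role and function names are made up for illustration):

```sql
-- Take away direct table access, leave only procedure entry points
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM web_app, admin_app;

-- Functions are executable by PUBLIC by default, so revoke that first
REVOKE ALL ON FUNCTION place_order(uuid, int) FROM PUBLIC;
REVOKE ALL ON FUNCTION refund_order(uuid) FROM PUBLIC;

-- Each role gets execute rights only on its own entry points
GRANT EXECUTE ON FUNCTION place_order(uuid, int) TO web_app;
GRANT EXECUTE ON FUNCTION refund_order(uuid) TO admin_app;

-- SECURITY DEFINER lets the function touch tables its caller cannot
ALTER FUNCTION place_order(uuid, int) SECURITY DEFINER;
```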

Just because you can doesn’t mean you should. I’m in no position to criticize Postgres’s approach or implementation, but in my experience it is helpful to approach features like that with a healthy amount of skepticism. Before using any of it I would have to research/investigate issues like:

  • What is gained by running the code inside the database rather than simply collocating it on the same server hardware? Does the code perform worse because the virtualized environment executes inside the database server software? What is the actual performance benefit, if any, of having direct access to the database inside the server software?
  • What are the constraints on the type and shape of data, and on the manner in which it can be passed both ways to the code, compared to “in process” and “RPC” implementations? What is the data marshaling overhead compared to an “RPC” implementation, both in terms of development effort and run-time overhead?

Furthermore, I expect that any code going into the database is the result of some sort of optimization effort related to a database-intensive task - the raison d’être for procedural extensions like PL/pgSQL - so the procedural extension should likely be the first choice for code like that to get optimal results. It just so happens that the “domain logic” may not be as clearly expressed and separated as it could be in another paradigm. But by moving to THAT OTHER LANGUAGE you may be giving up some of the optimization benefits that the procedural extension could give you. In the end it isn’t that unusual for clarity to be traded off during optimization - but it still needs to be recognized that there is a trade-off. So when you put code into the database it had better be worth it!
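
To illustrate the kind of database-intensive task I mean, here is a contrived PL/pgSQL sketch (the invoices table is made up) - the win comes from one set-based statement inside the database instead of a row-per-round-trip loop in the application:

```sql
-- Contrived sketch: applying a late fee to all overdue invoices.
-- One set-based UPDATE replaces N application-side round trips.
CREATE OR REPLACE FUNCTION apply_late_fees(fee numeric) RETURNS integer AS $$
DECLARE
    updated integer;
BEGIN
    UPDATE invoices
       SET balance = balance + fee
     WHERE due_date < current_date
       AND paid = false;
    GET DIAGNOSTICS updated = ROW_COUNT;
    RETURN updated;  -- how many invoices were touched
END;
$$ LANGUAGE plpgsql;
```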

I’m not trying to be dogmatic either - but I would adjust your statement to “make use of great tooling at your fingertips as long as it makes sense” - just because it’s available doesn’t automatically imply that it is beneficial in your particular situation. While you may need to use the facilities of libraries, frameworks or other systems to get the job done, it is also part of dependency management to ensure that you minimize entanglement with those resources while getting exactly what you need out of them.

That is a convenience feature - not a security feature. It’s convenient because it saves the administrator from having to micro-manage the rights and privileges on the involved database artifacts individually.

The last statistic I ran across claimed that 70% of all cyber crime originates from behind the firewall. In that type of situation stored procedures may help data integrity, but they won’t do much for the organization’s “integrity” or that of its clients.

Something being a useful mitigation for 30% of situations is still an important security feature :slight_smile: The amount of damage inflicted while having to guess procedure names, call signatures, and row ids versus running DROP and bulk-selecting, updating, and deleting whatever you please - those are two very different things.

I agree that you shouldn’t use something just because you can. Use it because it makes sense.

Postgres’s search features are a good example here. Entirely specific to Postgres, but by using them you avoid the need for a standalone search engine that you must maintain separately and keep in sync with your data. Pick a database because the database suits your needs.
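
For example, a rough sketch of the built-in full-text search (the articles table is illustrative):

```sql
-- Full-text search without a separate search engine to keep in sync
ALTER TABLE articles ADD COLUMN tsv tsvector;

UPDATE articles
   SET tsv = to_tsvector('english',
                         coalesce(title, '') || ' ' || coalesce(body, ''));

CREATE INDEX articles_tsv_idx ON articles USING gin (tsv);

-- Ranked matches for a query
SELECT title
  FROM articles
 WHERE tsv @@ to_tsquery('english', 'stored & procedure')
 ORDER BY ts_rank(tsv, to_tsquery('english', 'stored & procedure')) DESC;
```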

The biggest problems I saw with Stored Procedures / Stored Functions (aside from a supremely boneheaded way one of them is implemented in MySQL, which I’d gladly discuss by voice somewhere but typing it up would make my blood boil) are as follows:

  • No freaking source control.
  • (that’s really the main thing)

I’ve never seen any organization handle these well in production in such a way that one could easily audit changes, roll back, etc. I had a grand idea for how I’d do it in Rails, but the DBA at the time really just preferred replacing them ad hoc with no oversight, so I didn’t fight that battle at all. This was for a bit of software we built from the ground up that sold for many hundreds of millions of dollars and handled > $1B in tx/year at scale (a consumer financial product). So it’s not like it was poorly funded at all. This was in ~2008, so it’s also not like we didn’t know better.

Anyway, that’s the last time I used them in production. I despised everything that had to do with testing them, and the “CTO” (in quotes for sure) that they brought in replaced a lot of battle-tested but slow code with an ultra-fast (but incorrect!) implementation - after I had spent weeks building him a test harness so he could verify against more than a year of past hand-verified daily reports. (He never ran the tests. He repeatedly lied to management and claimed he had.) He was the belle of the ball until they found out he’d ruined their rapport with the finance department of their biggest client because the new fast reports were in no way correct, at which point he got unceremoniously booted out the door.

So yeah…I basically never want to think about them again, but only because operationally they suck.

Hmm, that seems to me more a criticism of a mismanaged release process and a “dumb” CTO than of using SPs :). Also, MySQL is a very poor indicator of the usefulness of many RDBMS features :slight_smile:

I only partially agree. I’ve worked with a ton of teams of varying stature, and I’ve never seen a well-managed stored procedure version control solution (the closest I’ve seen is procs with dates appended and functions that were upgraded over time to call the “correct” proc, which at least allows appropriate rollback).

And I agree wholeheartedly on MySQL. I wasn’t a fan of MySQL at the time, but that project was what pushed me into the “never again” column.

But yeah, I worked for a Microsoft Gold Partner in the past, which meant all the biz logic lived in SPs. It was fine and fast and good, and development was slow and untestable and awful. :slight_smile: These were MCSD/E folks as well, so on paper they knew their stuff at least.

We just use migrations. I would also guess, based on your points, that there were multiple apps using the same DB and SPs? I am not saying we would use SPs on every project, but we go full-SP on projects that need to be extra tight security-wise, and we also abstract all DB objects via SPs on projects that are likely to grow a lot, so we can tweak underlying structures without affecting higher-level code.
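
E.g. every change to a stored procedure goes through a plain SQL migration file like any other schema change (a sketch - the file name and function are made up):

```sql
-- migrations/20170301120000_update_order_total.sql
-- Function changes live in source control like any other migration
CREATE OR REPLACE FUNCTION order_total(p_order_id int) RETURNS numeric AS $$
    SELECT coalesce(sum(quantity * unit_price), 0)
      FROM order_lines
     WHERE order_id = p_order_id;
$$ LANGUAGE sql STABLE;
```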

Andrei,

This isn’t about who’s right and who’s wrong - but I’ve only met people with your “enthusiasm for stored procedures” who were just starting out with them. Typically, people “who have been using them in the trenches” have a much less enthusiastic attitude towards them.

Now granted, most of that experience comes from a time when IT organizations structured themselves around technology layers rather than business capabilities (application projects vs. domain solution products) - but if you follow the contemporary micro-services “news” you will notice that it is the organizational change, not the technology, that is potentially the biggest hurdle to successful adoption of “beneficial” tools and approaches - I guess it is just so much “easier” to organize around technology layers.

Going forward, you (and your organization) just need to be extra careful to not repeat past mistakes. To maintain development velocity you need to be able to refactor; to refactor effectively you need source control and unit tests - this also applies to every aspect of the database - but all too often those aspects were neglected in the past - with predictable results.

Scott Ambler - Database Refactoring (2006):
p. 15

Database refactorings are conceptually more difficult than code refactorings: Code refactorings only need to maintain behavioral semantics, whereas database refactorings must also maintain informational semantics. Worse yet, database refactorings can become more complicated by the amount of coupling resulting from your database architecture …

p. 68

5.13 Beware of Politics

Introducing evolutionary database techniques, and in particular database refactoring, is likely to be a major change within your organization. As Manns and Rising (2005) point out in Fearless Change, politics is likely to rear its ugly head as you try to make these improvements to the way that you work. Although many IT professionals prefer to ignore politics, naively believing that they are above it, they do so at their peril. The techniques that we describe in this book work; we know this because we have done it. We also know that many traditional data professionals are threatened by these techniques, and rightfully so, because these techniques force them to make significant changes to the way that they will work in the future. You will likely need to “play the political game” if you want to adopt database refactoring within your organization.

Good Luck!

1 Like

C’mon dude, that’s a bit of a cheap shot :slight_smile: Making arguments based on personal assumptions is a very weak-sauce way to make a point :slight_smile: [quote=“peerreynders, post:39, topic:1629”]
Going forward, you (and your organization) just need to be extra careful to not repeat past mistakes. To maintain development velocity you need to be able to refactor; to refactor effectively you need source control and unit tests - this also applies to every aspect of the database - but all too often those aspects were neglected in the past - with predictable results.
[/quote]

Very true and absolutely orthogonal to the question at hand :slight_smile:

1 Like