Functional and non-functional composition

One thing bugs me, and I’d like to comment on it.

Programs have functional and non-functional properties.

Pure functional programs compose - this means that f(g(X)) behaves as if you had written:

Y = g(X) and then computed f(Y)

But this is not the end of the story.

Suppose g(X) takes time T1 to compute and f(Y) takes time T2 to compute

How long does it take to compute f(g(X))? Is it T1, T2, max(T1, T2), or T1 + T2?

The answer is we just don’t know.

We have to measure and see.

The time a function takes to evaluate is a non-functional property of a program.

Non-functional properties do not compose.

This is why you can’t just re-use library code: it might be too slow, it might be too big, or it might not run on your architecture.
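
To make “measure and see” concrete, here is a minimal Erlang sketch (the f/1 and g/1 below are just placeholders, not anything from a real library): it times g(X), f(Y), and the composition f(g(X)) separately with timer:tc/1.

-module(compose_timing).
-export([measure/1]).

%% Placeholder pure functions standing in for the f and g above.
g(X) -> [I * I || I <- lists:seq(1, X)].
f(Y) -> lists:sum(Y).

measure(X) ->
    {T1, Y} = timer:tc(fun() -> g(X) end),      %% microseconds for g(X)
    {T2, _} = timer:tc(fun() -> f(Y) end),      %% microseconds for f(Y)
    {T3, _} = timer:tc(fun() -> f(g(X)) end),   %% microseconds for the composition
    io:format("T1=~p T2=~p T1+T2=~p measured=~p (microseconds)~n",
              [T1, T2, T1 + T2, T3]).

Calling compose_timing:measure(1000000). a few times makes the point: the types say nothing about T1, T2, or T3; only the measurement does.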

4 Likes

Computing f(g(X)) can take less time than max(T1, T2) in optimal evaluators (namely, we can apply an operation N times in O(log N) steps).

So composition of any form can change the behaviour of your program.


But still, that evaluation time should not change between Y = g(X), f(Y) and f(g(X)) (in lazily evaluated languages and with non-optimising compilers).

1 Like

While side effects can be mostly eliminated, and the rest of them tamed somewhat, in most FP languages, IMO the much bigger issue is the extremely arcane nature of modern processors and OSes (even knowing your CPU’s machine code is not enough; there is microcode that translates that machine code into something even lower level). Those make performance a nightmare to reproduce predictably across different machines.

There are good techniques to avoid this – warm-ups and loads of repetition come to mind – but the problem still remains that the architecture you run your code on has way too many moving parts. And the edges of your code almost always end up surprising you.

It’s not hard to understand why many newish DB projects nowadays are advertised as being optimised specifically for spinning hard disks or specifically for SSDs – this being one example.


I’ve become convinced over the last several years that we need strong isolation between programs (OS processes) if we are going to have any hope of good, solid computing from here on. Qubes OS might be a good start, I guess. We also need very strict hardware-enforced policies on how much of each resource a program gets.

I made it a personal goal to repurpose an old laptop and install Slackware Linux on it – since it’s famous for never doing anything automatically, which presumably gives you a much better platform on which to measure and understand the non-functional properties of your programs.

2 Likes

I think this is the wrong conclusion to reach. You don’t know that with your own code either, unless you analyze and measure it. You can do that with library code as well, particularly if it’s open-source and you can audit and learn how it works.

In either case, since as you said the integration/composition of both pieces of code (f and g) is unknown ahead of time, you might be stuck needing to do this analysis yourself no matter what.

9 Likes

@joeerl

Programs have functional and non-functional properties.

I do not think that this is an important distinction to make.
Rather, let me rephrase it as:

Pieces of code (programs, subroutines, methods, functions) have properties.

In some languages, we try to capture more of these properties in the function’s signature than in others:

  • In most languages, we capture the function’s symbolic name and arity (number of accepted parameters).
  • In typed languages, we capture the function’s input and return type(s).
  • In pure (functional) languages, we capture the type of effect(s) a function might return (and restrict or disallow untracked ‘side effects’).
  • In dependently typed languages, we capture even more pre- and post-conditions of functions, and we can even prove that our functions will not get stuck in an unintended infinite loop (by proving termination for recursive functions and productivity for corecursive, ‘infinitely running’ functions).
  • In formal verification, we might even reason about the theoretical and actual time-complexity of an algorithm and implementation, respectively.

In all of these cases (except maybe for certain kinds of formal verification, which, because of their ‘total’ approach, are time-consuming and hard), knowing more of the properties of a piece of code helps us to reason about how it might perform.

No, this information is usually not complete. But that does not mean that we cannot (re-)use these pieces of code! Usually, the pieces are ‘good enough’. And if measurement shows that they are not, we can inspect and refine the pieces we are unhappy with, while still re-using existing code for the rest of the program.

And this means that existing code, libraries, and open-source software, are, despite us not having a 100% understanding of what they are doing, very useful.


As a side note: f(g(X)), which can be rewritten as Y = g(X); f(Y), will, assuming a strict execution environment and no compiler-optimisation tricks (the optimal evaluation @hauleth is talking about being one such trick), take t[g(X)] + t[f(Y)] to execute. Yes, it is very true that the exact execution path taken inside g(X) determines what Y will be exactly, and that this in turn determines what f(Y) will do exactly (as well as how long that might take), but this does not change the execution time for any given X: the answer will always be T1 + T2, never max(T1, T2) or something else.

3 Likes

I think that in this case “behaves as if” and “evaluates” need not be the same thing.

With a pure functional language the compiler is quite free to reorder, combine or split evaluation as it sees fit, because it knows the ordering is irrelevant. One interesting case of this is a sequence of maps over an initial list, where the compiler could combine the operations into a single pass over the list and avoid creating intermediate data, at no extra cost.
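
As a small Erlang illustration of that idea (hand-fused here, since this is not an optimisation the BEAM compiler performs for you): the two functions below compute the same list for every input, so they are functionally identical, but one traverses the list twice and allocates an intermediate list while the other makes a single pass.

-module(fusion).
-export([two_passes/1, one_pass/1]).

%% Two maps: traverses the list twice and allocates an intermediate list.
two_passes(L) ->
    lists:map(fun(X) -> X + 1 end,
              lists:map(fun(X) -> X * 2 end, L)).

%% Hand-fused version: one traversal, no intermediate list, same result.
one_pass(L) ->
    lists:map(fun(X) -> (X * 2) + 1 end, L).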

This of course is not possible if you have side effects, as the effect on the rest of the system will vary depending on the ordering. It is actually even a bit more complicated in Erlang/Elixir, as the handling of errors is defined in the language. So if an error will occur during evaluation, then any reordering can affect when the error occurs, which can affect the rest of the system. Perhaps you could view errors as side effects?

This is exactly the difference between a naïve compiler/runtime and an optimizing compiler/runtime.
In the case of an optimizing one, the known properties of a function (those that are in its signature) are indeed used to, for example, fuse it. This is what Haskell does when combining multiple list operations. This fusion system is, by the way, a set of rewrite rules curated by humans (rather than being fully automated). See the list of rewrite rules for Haskell’s Data.List.map here.

Other compilers, like the C and C++ compilers, perform similar steps to do loop fusion.

All optimization is designed to speed up the common/average runtime case, but this inevitably means (because compilers/runtimes themselves only have the incomplete information about a function’s properties that was specified in the code) that there will be some edge cases that run slower.

A good language-agnostic example of this is the decision to implement Quicksort: on average, it takes O(n * log(n)) steps to sort an enumerable, and because in many languages its implementation is simpler than that of other sorting algorithms with the same time complexity, it is a common choice in many environments. However, its worst-case time complexity is O(n²).

… unfortunately, when implemented in the most straightforward, deterministic way, this worst-case behaviour happens on an already-sorted enumerable.
So many implementations pick a ‘random’ element as the pivot, making the implementation suddenly non-deterministic (with all the problems of non-determinism as a result). But even when deciding on another deterministic way of picking the pivot, there will always be enumerables that trigger the O(n²) case.
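
For illustration, here is the classic short Erlang quicksort with a deterministic head pivot; it is functionally correct on every input, but its running time on an already-sorted list is exactly the worst case described above.

-module(naive_qsort).
-export([sort/1]).

%% Deterministic pivot: always the head of the list.
sort([]) -> [];
sort([Pivot | Rest]) ->
    sort([X || X <- Rest, X < Pivot]) ++
        [Pivot] ++
        sort([X || X <- Rest, X >= Pivot]).

%% naive_qsort:sort(lists:seq(1, N)) always splits into an empty left part
%% and a right part of length N-1, so a sorted input recurses N levels deep
%% and performs on the order of N^2 comparisons.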

To get back to your point:

Correct. In the example of Quicksort, we often like to think only about how it behaves in the average (‘behaves as if’) case. (The one exception is when we write real-time software.)


At least in Haskell, raising an error is viewed as a side-effect. In fact, Haskell’s Control.Exception module states that:

In addition to exceptions thrown by IO operations, exceptions may be thrown by pure code (imprecise exceptions) or by external events (asynchronous exceptions), but may only be caught in the IO monad.

The kind of exceptions that might be thrown in pure code are things like:

  • failed pattern matches (which means that your code, while ‘pure’, was not total),
  • division by zero (which means that your ‘pure’ math is not total),
  • allocation exceptions (which means that, while your pure implementation would work perfectly on a computer with more memory, we have reached the boundaries of this physical one).

In each of these cases, Haskell says ‘our pure representation of the world is now breaking down, completely interrupt the normal pure dataflow and go to the nearest, effectful exception handler.’

Also note that Haskell has some other kinds of exception-handling mechanisms as well, and actually has multiple escape hatches for when you require some crazy hack. People will scoff at you, and it is very good that accursedUnutterablePerformIO is well-hidden, but if we decide that the only reasonable way to solve our difficult problem on the current hardware/architecture stack needs it, it is possible to reach for these tools. Doing so is very similar to using unsafe in Rust, or writing a NIF in Elixir/Erlang.


I think that, speaking for Erlang/Elixir, re-ordering statements in a way that affects when an error occurs (or, to be more exact: in which order errors might occur) should be forbidden (or be opt-in behind an ‘I understand that I might mess with the rest of the program’ flag). Because if you don’t, you alter the semantics of the code, which is something that compilers/interpreters are not allowed to do.
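
A tiny made-up Erlang example of why such a reordering is observable: both elements of the tuple below raise an error, so whichever call the generated code evaluates first decides which error actually reaches the handler.

-module(order_matters).
-export([demo/0]).

fails_a() -> erlang:error(error_a).
fails_b() -> erlang:error(error_b).

%% Both calls raise. The error that reaches the catch clause is the one
%% from whichever call is evaluated first; a compiler that swapped the
%% two calls would change this observable result, i.e. the semantics.
demo() ->
    try {fails_a(), fails_b()} of
        _ -> no_error
    catch
        error:Which -> Which
    end.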

2 Likes

Back to my original argument.

In my many years of programming, most of my most serious problems have been caused by the non-functional parts of the code.

The problem is “side effects” - now, in an FPL context, the discussion of side effects usually just talks about things like mutable variables and so on - this is the part of “side effects” that is “well understood”.

Let’s imagine a type specification language, which I’ll use to specify factorial. I might write

-spec factorial(integer()) -> integer()

What I’d really like to say is

-spec factorial(N) -> M when
    is_integer(N), N < 10000
    is_integer(M)
    computes_within(2.3 milliseconds)
    needs(4 KB stack)

or something.

And yes, libraries, frameworks (whatever) are good and might save me time - but if the code in the library is too slow then it is useless.

Unfortunately there is no way I know of other than downloading a library, writing code, running it, testing it, and measuring to see if it’s any use - all of this is a colossal time waster.

I must have wasted tens of years downloading code that is correct and does work “in some other context” BUT which does not work in my context - because it’s too slow, too big, or incompatible with my code.

True parallel programming on isolated CPUs may be a way out of this - running a single program on a single core provides an environment where there is a tiny chance we might understand what we are doing. Things like occam on a transputer can be analysed for things like energy usage.

Predicting the non-functional behaviour of a program by analysis is impossible (it’s the halting problem :slight_smile:).

Getting a program through some fancy type system is the easy part of the problem - getting it fast enough, small enough, secure enough, energy efficient enough is the real problem.

  • energy efficiency, security, run-time, responsiveness, fragility, brittleness, upgradeability, observability, reproducibility … (I could go on) are all non-functional properties of code.

My feeling is that these areas of computer science are under-studied.

A few years ago I supervised a project “how to debug programs that have no observed errors”

The idea was to debug a program that had never ever shown any signs of error - this led to ideas of coverage analysis and tracing internals - and we did find errors in programs that we thought were error free.

I could tell some longer stories here - but the margin is too short :slight_smile:

5 Likes

Unfortunately I cannot see this happening, as all these values depend on the hardware (ignoring the fact that they also depend on the input size): on x86 it may use 4 KB of stack, but on ARM or RISC-V it may use a different amount of memory, and the same goes for execution time. And as a developer, especially in languages like Erlang which are compile-once-run-everywhere, I have no way to know how long my algorithm will take or how much memory it will use. Of course I can document big-O complexity, but that will not help with your problem, as an O(1) algorithm can be slower than an O(n²) algorithm (for some values of n, since this depends on the constants).
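
To make that last point concrete, here is a rough benchmark sketch (with O(1)-per-lookup maps versus O(n)-per-lookup property lists standing in for the two complexity classes; the module name is invented): which one wins on a small collection is decided by constant factors, the runtime, and the hardware, so you have to measure.

-module(constants_matter).
-export([compare/1]).

%% N lookups of random keys in an 8-element collection, once through a map
%% (O(1) per lookup) and once through a property list (O(n) per lookup).
compare(N) ->
    Pairs = [{K, K * K} || K <- lists:seq(1, 8)],
    Map   = maps:from_list(Pairs),
    Keys  = [rand:uniform(8) || _ <- lists:seq(1, N)],
    {TMap, _}  = timer:tc(fun() -> [maps:get(K, Map) || K <- Keys] end),
    {TList, _} = timer:tc(fun() -> [proplists:get_value(K, Pairs) || K <- Keys] end),
    io:format("map: ~p microseconds, proplist: ~p microseconds~n", [TMap, TList]).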

Yes, but I do not see any other way to do it. Code that is blazingly fast on x86 can be slow on RISC-V; there is no way to know that other than fetching the code and benchmarking it yourself. What is more, it can depend on how you have compiled your VM (what flags, what compiler, on what architecture), how you have compiled your code (whether you use +native or not), or what libraries you have available on your OS (something fast on OpenSSL x can be slower on OpenSSL y).

And that will always be a problem, because I, as the author of a library, have no idea what your context might be; I need to focus on my own work. So unfortunately there is no silver bullet that could fix such problems without omniscience.

True, but as it depends so heavily on the context and the environment where the code will run, it is as hard as solving the halting problem.

For example, I once read a story about the ultimate garbage collector:

From: k...@rational.com (Kent Mitchell)
Subject: Re: Does memory leak?
Date: 1995/03/31

Norman H. Cohen (nco...@watson.ibm.com) wrote:
: The only programs I know of with deliberate memory leaks are those whose
: executions are short enough, and whose target machines have enough
: virtual memory space, that running out of memory is not a concern.
: (This class of programs includes many student programming exercises and
: some simple applets and utilities; it includes few if any embedded or
: safety-critical programs.)

This sparked an interesting memory for me.  I was once working with a
customer who was producing on-board software for a missile.  In my analysis
of the code, I pointed out that they had a number of problems with storage
leaks.  Imagine my surprise when the customers chief software engineer said
"Of course it leaks".  He went on to point out that they had calculated the
amount of memory the application would leak in the total possible flight time
for the missile and then doubled that number.  They added this much
additional memory to the hardware to "support" the leaks.  Since the missile
will explode when it hits its target or at the end of its flight, the
ultimate in garbage collection is performed without programmer intervention.

--
Kent Mitchell                   | One possible reason that things aren't
Technical Consultant            | going according to plan is .....
Rational Software Corporation   | that there never *was* a plan!

-spec factorial(N) -> M when
is_integer(N), N < 10000
is_integer(M)
computes_within(2.3 milliseconds)
needs(4 KB stack)

This is something I have wanted for my entire programming career, ever since I started tinkering with an Apple IIc as a teen! But only Pascal ever offered me anything like it, with bounded integer types. I am not a polyglot though, so there are probably more languages that offer it which I don’t know about.

Outside of that though, I am not seeing it ever happening. There are way too many computing architectures!

1 Like

Ada also has them, and maybe Eiffel. I believe that systems like Coq, Idris, and Agda have support for them as well.

And for sure you can have them in Dialyzer specs.

1 Like

Thanks for the extra info.

Dialyzer can’t enforce anything though.

As @hauleth mentioned, those values depend on hardware. But what about:

-spec factorial(N) -> M 
when is_integer(N), N < 10000
is_integer(M) 
time_complexity_of(log₂(N))
memory_complexity_of(3 integers)

I find it very interesting to see that your argument seems closely related to the Law of Leaky Abstractions that Joel Spolsky wrote about some years back. If you don’t know it, definitely worth the read.

I would like to point out that you probably would have spent even more time if you had decided to re-invent the wheel instead. Imagine if, rather than depending on the C runtime, you had decided to write Erlang in plain assembly because “some part of the C implementation targeting this chip might not be to my liking”.

I agree with you that indeed, sometimes we find out that what someone else has made does not suit our purposes, and that it might be frustrating and time-consuming to measure and test to find out whether this is the case. But I am fairly certain that re-inventing the wheel from the ground up instead usually takes much more time and effort.

4 Likes

To put a famous quote from Donald Knuth in context:

There is no doubt that the grail of efficiency leads to abuse.
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3 %.
A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.

At first I tried to emphasize certain parts of this quote but then I decided against it, since actually all sentences in here are equally important.

Most of the time, the solution provided to us by someone else is fine. Only when we figure out that indeed it is not to our liking because we really really really need the extra speed/energy efficiency/memory efficiency/real-time behaviour/upgradeability/observability should we take the extra effort to replace it with a custom-built solution from pieces of one abstraction layer less.

4 Likes

C++ Coding Standards - 101 Rules, Guidelines, and Best Practices

1 Like

Thanks for sharing, although I am not entirely sure what point you are trying to get across. Could you elaborate?

Years ago I worked on a project to make software for the Swedish Viking satellite.

The software had to control a dozen or so experiments.

Each experiment used a different amount of power.

Each group knew how much power their experiment took.

Amazingly, nobody had added all the numbers up - I added them and came to the conclusion that the total power needed was greater than the power available from the battery and solar panels.

Oooops

So we had to decide which combination could be run together.

The same is often true of software - we know how our SW runs on our computer in our environment - but we do not have a clue as to how it will behave in different environments.

There should be a science lurking in the background here - type systems just capture the mathematical properties of code (i.e. functions are of type int -> int or something) but they do not capture physical properties (time to execute).

@rvirding and I are both ex-physicists - perhaps it shows in our languages :slight_smile: - it seems like we know a lot about the maths of computation but rather little about the physics.

How fast something executes is a question that can be answered by studying physics and statistics - do large assemblies of processes obey Boyle’s law or Boltzmann’s equations?

The fact that type specs do not even attempt to say anything about the execution time is crazy.

If x takes time T1 and y takes time T2, then x followed by y, computed sequentially, taking time T3 where T3 < T1 + T2 should be impossible.

But we don’t write this stuff down so we can never do such calculations.

I did see some work on occam/transputer a couple of years ago that tried to statically predict energy usage - not sure how it’s going.

7 Likes

I agree, although I would like to note that while exact running time and power draw are important for real-time and embedded applications (like your satellite example), there are many environments where these properties are among the far less important ones (composability and replaceability are often more important, because they keep a system flexible to future change, which is very common in web and desktop applications).

I do not think that it is difficult or impossible to add those bits of specifications to type-specs; I only think that it is a lot of work and because people have not deemed it important enough so far, it has not happened yet.

To be honest, personally I’d love it if dependently-typed languages (which could capture these kinds of properties in function specifications with little change, or maybe even today) were used more broadly, but languages like Idris are still in a very early stage, probably in large part because, again, interest from larger groups of people unfortunately is not there (at this point in time).

Correct, although compilers try to cheat whenever they can (because usually we are only happy if software runs faster than expected). Conversely, this does mean that some compilers do reason about the (relative) speed of performing one operation versus another. A great example of this is the CppCon 2017 talk “What has my compiler done for me lately?”, where presenter Matt Godbolt shows multiple examples of programmers trying to optimize code by out-clevering the compiler, with the compiler winning every time – because it knows, for instance, that XOR-ing a register with itself in x86 assembly is faster than setting it to zero.


So my tl;dr is that while I agree that it would be nice if we could specify concrete efficiency and memory usage in our specs, I think it has not happened yet because other constraints are often deemed more important by the CS community at large.


EDIT: Actually, what might be even more useful is to work with the Dependency Inversion principle (AKA working with Interfaces, Behaviours or Typeclasses, depending on your language), where you specify in your own code that you want ‘an implementation that is fast enough’, and then this can type-check against multiple lower-level implementations (one being faster but less memory-efficient, another being slower but having more predictable real-time behaviour, etc.) to fetch the best one depending on the actual properties that your code ends up needing. In your satellite example: if all experiments had been given a clear upper limit on power draw from the start, you’d have been in a much different situation (and no doubt some of the experiment groups would have chosen their components and implemented their software differently).
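
A rough Erlang sketch of that last idea, using a behaviour as the ‘interface’ (every name here - the behaviour, the callbacks, the candidate modules - is invented for illustration): each implementation advertises its non-functional trade-offs, and the caller selects against declared requirements instead of hard-coding one module.

-module(sorter).

%% The "interface": implementations share one functional contract and
%% additionally declare their non-functional properties.
-callback sort([term()]) -> [term()].
-callback properties() -> map().  %% e.g. #{speed => fast, memory => high}

-export([pick/2]).

%% Return the first candidate module whose advertised properties include
%% everything the caller requires, e.g.
%%   Mod = sorter:pick([quick_sorter, merge_sorter], #{memory => low}),
%%   Sorted = Mod:sort(List).
pick(Candidates, Required) ->
    [Mod | _] = [M || M <- Candidates,
                      maps:with(maps:keys(Required), M:properties()) =:= Required],
    Mod.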

2 Likes

@joeerl I really enjoy your thoughts here. I was a bit confused where it was heading earlier, but now I think I understand better. I also hold physics degrees, so maybe that helps :grin:

You’re getting at the distinction between computer science and computer engineering. Computer Science (or perhaps more accurately: Computing Science) is a branch of abstract mathematics. This is the discipline where category theory and type systems are from. Once their algorithms have to execute on physical computers, that’s when physics and engineering have to be considered. (I love how that is so core to Erlang!)

You seem to be talking more about some academic computer engineering practices to experimentally determine runtime guarantees from libraries. I don’t think many computer scientists are trained in these skills. Are you aware of any universities studying this, perhaps in some cross-disciplinary group?

3 Likes

I perceived it as a play on the notion of “non-functional requirements”, i.e.:

  • the non-functional characteristics exhibited by some piece of code as it executes in a specific operational environment
  • assessing whether a piece of code is “fit for purpose” in a given operational environment by meeting certain non-functional constraints, like generating its result reliably within a given time and power budget.

All too often, “don’t optimize prematurely” is taken as a licence to disregard efficiency.

Ideally, idiomatic practices should aim to be efficient by default (or at least promote the awareness and practice of efficiency) - even at the risk of raising the bar for potential newcomers. Only then does optimization become an exceptional practice of trading off other beneficial characteristics to gain efficiency beyond the already-established norm.

“Don’t optimize prematurely” isn’t an invitation to be sloppy, lazy, or plain ignorant on matters of efficiency.

2 Likes