ExUnit checking if Enum.at is called?

epailty · September 21, 2016, 5:41am

I need to be able to check if Enum.at() is called, inside of ExUnit tests.

I tried the following:

test "...my description excluded ..." do
    with_mock Enum, [:passthrough], [at: fn(["A","B"], 1) -> "B" end] do
        .... my code excluded here
        assert called Enum.at(["A","B"], 1)
    end
end

And I get the following:

** (EXIT from #PID<0.69.0>) killed

Looking at the source for mock, the cause is obvious - it’s using Enum, so mocking Enum will break the internals of mock. The ExUnit sources also use Enum throughout.

So using meck instead of mock is not going to help (and in fact I tried it before looking at the ExUnit source).

OK so my question is not how to mock Enum, because apparently the answer to that will be pretty difficult…
My question is: how to check if Enum.at is called - without using mock?

(I can get around this by creating a new module with a function that just proxies calls to Enum.at - but this seems really unacceptable because it will slow down run-time code.)

josevalim · September 21, 2016, 10:04am

Why do you need to check if Enum is called? If you do so, you will be asserting how the function is implemented and not how the function behaves. That’s undesirable because you may change the implementation, for example by calling Enum.fetch instead of Enum.at, and now your tests break even when the code exhibits the same behaviour.

The question you want to ask yourself is which properties you expect the code to exhibit by calling Enum.at and how you can assert those properties through testing without relying on mocks.
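As a minimal sketch of that idea (Picker.second/1 is a hypothetical function, not something from the original post), a test can pin down what the function returns and stay valid whether the implementation uses Enum.at or Enum.fetch:

```elixir
ExUnit.start()

defmodule Picker do
  # Hypothetical function under test. Today it happens to use Enum.at/2,
  # but the tests do not care about that.
  def second(list), do: Enum.at(list, 1)
end

defmodule PickerTest do
  use ExUnit.Case, async: true

  test "returns the second element" do
    assert Picker.second(["A", "B"]) == "B"
  end

  test "returns nil when there is no second element" do
    assert Picker.second(["A"]) == nil
  end
end
```

Swapping the body of second/1 for a pattern match or Enum.fetch/2 leaves both tests green as long as the behaviour is unchanged.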

OvermindDL1 · September 21, 2016, 2:27pm

It sounds like he’s coming from what I’ve seen some JavaScript testing libraries do, which is to ‘hook’ certain other calls and confirm they are called with specific input.

I personally hate that style as it is testing internals instead of interfaces, but I can see the reasoning behind it in non-pure languages.

epailty · September 21, 2016, 10:22pm

You are right, and in this case - in this function - I don’t need to check the implementation. This is the simplest example I could post. In other functions, part of the required behaviour is efficiency. Needlessly calling Enum.map, for example, and recreating data would be a bug.

Here’s a contrived example:

input data is a list:

["a", ["b"]]

Expected format is list(list(String.t))

The function will pass the list through unchanged if it’s conformant. If not, it will fix it and log a warning.
So in this case it will return:

[["a"], ["b"]]

If it’s passed something like this, however:

[["a"],["b"],["c"]]

then it should just return the list.
(A big question in my mind is of course whether the cost of checking, which would use Enum.all?, is much different from simply using Enum.map on every list in any case.)
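To make the contrived example concrete, here is a rough sketch of the check-then-fix version. The module name, the function name and the log message are made up for illustration, and Logger.warning/1 assumes Elixir 1.11 or later (older versions used Logger.warn/1):

```elixir
defmodule ListNormalizer do
  require Logger

  # Hypothetical implementation of the function described above.
  # Conformant input (every element is a list) is returned untouched;
  # anything else is wrapped element by element and a warning is logged.
  def normalize(list) when is_list(list) do
    if Enum.all?(list, &is_list/1) do
      list
    else
      Logger.warning("non-conformant list fixed")

      Enum.map(list, fn
        item when is_list(item) -> item
        item -> [item]
      end)
    end
  end
end
```

Note that the non-conformant path walks the list twice (once for Enum.all?, once for Enum.map), which is exactly the cost the question is about.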

This is only a concern to me because I want the functions that work on the lists to be able to assume the list format is as expected, and to check it only as it comes into the system (from 3rd parties who often get it wrong; as a service provider we let them know, but we don’t want to halt the machinery by being pedantic if we can do a few fixes here and there).

Perhaps the answer is to let the data in unchanged, let it loose on the handlers - who assume the data is conformant - and will raise errors if not… wrap the entry with try and then if it fails, log the error and attempt to fix the data and then retry…

I have probably partially answered my own question above (try) but also, just realised I have been amazingly dumb because when it “fixes” the data, it logs a warning - so all I need to do is to mock Logger.bare_log and refute called Logger.bare_log…
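As an alternative to mocking Logger at all, ExUnit ships with ExUnit.CaptureLog, which returns whatever was logged while a function ran. This is a different technique from the mock-based check above, shown only as a sketch: Fixer.normalize/1 and its log message are hypothetical stand-ins, and Logger.warning/1 assumes Elixir 1.11 or later.

```elixir
ExUnit.start()

defmodule Fixer do
  require Logger

  # Hypothetical stand-in for the function described in this thread.
  def normalize(list) do
    if Enum.all?(list, &is_list/1) do
      list
    else
      Logger.warning("non-conformant list fixed")
      Enum.map(list, fn item -> if is_list(item), do: item, else: [item] end)
    end
  end
end

defmodule FixerTest do
  use ExUnit.Case
  import ExUnit.CaptureLog

  test "logs a warning when the input had to be fixed" do
    log =
      capture_log(fn ->
        assert Fixer.normalize(["a", ["b"]]) == [["a"], ["b"]]
      end)

    assert log =~ "non-conformant list fixed"
  end

  test "stays silent for conformant input" do
    input = [["a"], ["b"], ["c"]]
    log = capture_log(fn -> assert Fixer.normalize(input) == input end)

    refute log =~ "non-conformant list fixed"
  end
end
```

The tests still only look at observable behaviour (the return value and the log output), so they survive a rewrite of the function body.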

However, from a best practices point of view, what’s your opinion on using try as suggested above? I am inclined to think it is better to avoid try and let errors crash the process, because they represent bugs rather than bad input data. I am thinking this way because of the way exceptions mess with lazy code…

KallDrexx · September 22, 2016, 1:42am

That is extremely vague behavior. Just because it is required to be efficient doesn’t mean it requires the use of Enum.any?. What happens when you figure out a way to make it more “efficient” that doesn’t require Enum.any?? Why does Enum.any? need to be tested and not any other part of the function’s algorithm?

Furthermore, efficiency in this context makes no sense. Do you mean memory efficiency? If so, then you should be running your function repeatedly, checking how the process heap size changes and using that as your benchmark (which only requires observable behaviour, not implementation details). Do you mean latency efficiency? If so, then you should be measuring how long the function takes to run and using that as a performance benchmark that gets recorded and logged, so you know when it drifts too far outside an acceptable range.
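A rough sketch of both kinds of measurement, using only :timer.tc/1 and Process.info/2. The Measure module and its function names are made up for this example, and single runs like this are noisy, so a dedicated benchmarking library (Benchee, for instance) would give more trustworthy numbers:

```elixir
defmodule Measure do
  # Wall-clock time of one call, as {microseconds, result}.
  def time(fun) when is_function(fun, 0), do: :timer.tc(fun)

  # Runs `fun` in a fresh process and reports that process's total heap
  # size (in machine words) once the call has returned, as {words, result}.
  def heap_words(fun) when is_function(fun, 0) do
    task =
      Task.async(fn ->
        result = fun.()
        {:total_heap_size, words} = Process.info(self(), :total_heap_size)
        {words, result}
      end)

    Task.await(task)
  end
end
```

Usage might look like this, comparing two candidate implementations on the same input and logging the numbers over time:

```elixir
input = for i <- 1..10_000, do: ["item#{i}"]

{micros, _result} = Measure.time(fn -> Enum.all?(input, &is_list/1) end)
{words, _result} = Measure.heap_words(fn -> Enum.map(input, &List.wrap/1) end)

IO.inspect({micros, words}, label: "microseconds / heap words")
```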

The idea that recreating data would be a bug doesn’t make any sense. Under the hood the BEAM VM is extremely efficient, and one of the advantages of immutability is that the VM re-uses common data within a single process, meaning that unless it’s creating the data from scratch (and even then) it’s not actually “recreating” the data but reusing it.

This means that Enum.map can be efficient, and you would need to prove that an extra Enum.any? call would actually increase efficiency rather than decrease it (especially in the 99% case). In fact, having to call Enum.any? when not needed may make the whole application more inefficient in the long run, unless most real-life inputs cause Enum.any? to bail out early. Remember that Enum.any? has to iterate through the list until it finds the first matching element, which means that every time the function is called with 1000 elements and 0 matches, you’ve looked at 1000 individual items before moving on.

So none of what you posted describes why you need to test for Enum.any? rather than testing that a given set of inputs produces a specific output, and it sounds like you are not only trying to prematurely optimize, but to test your premature optimization.

NobbZ · September 22, 2016, 5:25am

While checking if the list is valid, you are touching every element. Say the last one needs to get repaired: you hand the list over to your repair function, and it will touch every element again before it finds that invalid one again.

Please do a benchmark to see if repairing valid data is really that much slower than checking first and repairing only if necessary. Also, if you are able to repair invalid data, why not handle the invalid data directly, broken as it is? You probably can’t use Enum then, but you can write your own set of functions.

The case you have shown here is easy, but how to handle ["a", "b"]? Is it [["a", "b"]] or [["a"], ["b"]]? And don’t tell me that the inner list will not have more than a single element…

Personally I do prefer to reject broken input and ask the source for the correct data set, or even simpler, drop it and crash!

KallDrexx · September 22, 2016, 11:07am

Just for an actual answer: if you still decide that you really want to check whether Enum.any? (or any other algorithm) is called, you’ll have to abstract that out. That means having your function take a function (or a module name) as a parameter, and then calling that instead of calling Enum.any? directly.

In your test code you would then pass a test function (instead of the wrapped Enum.any? call). That lets you either test a change in result depending on whether your test function returns true or false, or use an Agent to store a flag that it was called and verify that.

I don’t know of any other way, and this can easily balloon out of control. It also adds integration complexity because you have to make sure the right function is passed in by all the callers.
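Roughly, the Agent variant could look like the following. Everything here is illustrative: Conformance.needs_fix?/2 is a made-up function, and defaulting the parameter to &Enum.any?/2 is just one way to keep production callers unchanged.

```elixir
defmodule Conformance do
  # Hypothetical function. `any_fun` defaults to Enum.any?/2, so normal
  # callers pass nothing extra; tests can inject a spy instead.
  def needs_fix?(list, any_fun \\ &Enum.any?/2) do
    any_fun.(list, fn item -> not is_list(item) end)
  end
end

defmodule SpyDemo do
  # Returns {result_of_needs_fix?, number_of_times_the_spy_was_called}.
  def run do
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    # The spy records the call in the Agent, then delegates to the real thing.
    spy = fn enumerable, fun ->
      Agent.update(counter, &(&1 + 1))
      Enum.any?(enumerable, fun)
    end

    result = Conformance.needs_fix?([["a"], ["b"]], spy)
    calls = Agent.get(counter, & &1)
    Agent.stop(counter)

    {result, calls}
  end
end
```

Here SpyDemo.run() returns {false, 1}: the input is already conformant, and the injected function was invoked exactly once. In a real test the body of run/0 would sit inside a test block with assertions on both values.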

epailty · September 22, 2016, 11:08pm

Thanks for your answer. The list format is actually a lot more complex than how I presented it.
But I can do some benchmarking, and may end up writing a specialised tail-recursive function which does the fix so efficiently that it’s not worthwhile checking first.
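For the simplified format from earlier in the thread (the real one is more complex, so treat this purely as a sketch with made-up names), a single-pass tail-recursive version might look like this. It walks the list once, remembers whether anything had to be wrapped, and hands back the original list when nothing did:

```elixir
defmodule SinglePass do
  # Returns {:ok, list} with the original list when it was already
  # conformant, or {:fixed, new_list} when at least one element was wrapped.
  def normalize(list) when is_list(list), do: walk(list, list, [], false)

  # End of input, nothing was changed: return the original list as-is.
  defp walk([], original, _acc, false), do: {:ok, original}

  # End of input, something was changed: the accumulator is in reverse order.
  defp walk([], _original, acc, true), do: {:fixed, Enum.reverse(acc)}

  # Element is already a list: keep it and carry the flag forward.
  defp walk([head | tail], original, acc, fixed?) when is_list(head),
    do: walk(tail, original, [head | acc], fixed?)

  # Element is not a list: wrap it and remember that a fix happened.
  defp walk([head | tail], original, acc, _fixed?),
    do: walk(tail, original, [[head] | acc], true)
end
```

The caller can log a warning only on {:fixed, _}. Whether this actually beats check-then-map is something the benchmark would have to show.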