ExUnit / Data-Driven Tests / Macros

AndyL · October 8, 2016, 4:41pm

Elixir newbie here working on a stemmer for a document indexing engine. The stemming algorithm has dozens of steps with many tests per step. I’ve got a data-driven approach that is helpful, but I think it must be possible to be even more efficient with a little meta-programming. Here is an example of one of my test modules:

defmodule StemEx.StepsTest do

  use ExUnit.Case, async: true

  # ----- step1a -----
  
  step1a_vals = [ 
     ["caresses" , "caress"],
     ["ponies"   , "poni"  ],
     ["ties"     , "ti"    ],
     ["caress"   , "caress"],
     ["cats"     , "cat"   ],
  ]

  for [input, output] <- step1a_vals do
    @input  input
    @output output
    test "step1a: '#{input}' has output of '#{output}'" do
      assert StemEx.Steps.step1a(@input) == @output
    end
  end

  # ----- step1b -----

  step1b_vals = [
     ["feed"     , "feed"    ],
     ["agreed"   , "agree"   ],
     ["plastered", "plaster" ],
     ["bled"     , "bled"    ],
     ["motoring" , "motoring"],
     ["sing"     , "sing"    ],
  ]

  for [input, output] <- step1b_vals do
    @input  input
    @output output
    test "step1b: '#{input}' has output of '#{output}'" do
      assert StemEx.Steps.step1b(@input) == @output
    end
  end
end

This works great, but I don’t like the repetitive boilerplate in the for blocks. Ideally I could reduce each block to a single call, something like test_loop_for("step1b", step1b_vals).

Would it be possible to use defmacro to automagically generate the loop code? Can someone post an example? Thanks in advance.
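For reference, here is a minimal sketch of the kind of macro being asked about. It is not code from this thread: StemEx.TestHelpers and test_loop_for are hypothetical names, and the StemEx.Steps module is a toy stand-in (Porter step 1a suffix rules only) so the script runs on its own with `elixir`. The macro expands into the same for/@input/@output/test pattern as the original, once per call.

```elixir
ExUnit.start(autorun: false)

defmodule StemEx.Steps do
  # Toy stand-in for the real module: Porter step 1a suffix rules only.
  def step1a(word) do
    cond do
      String.ends_with?(word, "sses") -> String.replace_suffix(word, "sses", "ss")
      String.ends_with?(word, "ies") -> String.replace_suffix(word, "ies", "i")
      String.ends_with?(word, "ss") -> word
      String.ends_with?(word, "s") -> String.replace_suffix(word, "s", "")
      true -> word
    end
  end
end

defmodule StemEx.TestHelpers do
  # Hypothetical macro: expands into one ExUnit test per [input, output]
  # pair, each calling StemEx.Steps.<step>/1. `test` and `assert` are not
  # imported here, so they resolve in the caller, which must `use ExUnit.Case`.
  defmacro test_loop_for(step, vals) do
    quote bind_quoted: [step: step, vals: vals] do
      for [input, output] <- vals do
        @step String.to_atom(step)
        @input input
        @output output
        test "#{step}: '#{input}' has output of '#{output}'" do
          assert apply(StemEx.Steps, @step, [@input]) == @output
        end
      end
    end
  end
end

defmodule StemEx.StepsTest do
  use ExUnit.Case, async: true
  import StemEx.TestHelpers

  step1a_vals = [
    ["caresses", "caress"],
    ["ponies", "poni"],
    ["ties", "ti"],
    ["caress", "caress"],
    ["cats", "cat"]
  ]

  # One line per step instead of a hand-written for block.
  test_loop_for("step1a", step1a_vals)
end

result = ExUnit.run()
```

`bind_quoted` evaluates `step` and `vals` once in the calling module, which is why a module-body variable such as step1a_vals can be passed straight in.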

sheharyarn · October 8, 2016, 7:41pm

How about Enum.each/2? I do something like this myself. Here’s how it would look:

defmodule StemEx.StepsTest do
  use ExUnit.Case, async: true

  tests = [
    step1a: [
      ["caresses" , "caress"],
      ["ponies"   , "poni"  ],
      ["ties"     , "ti"    ],
      ["caress"   , "caress"],
      ["cats"     , "cat"   ],
    ],

    step1b: [
      ["feed"     , "feed"    ],
      ["agreed"   , "agree"   ],
      ["plastered", "plaster" ],
      ["bled"     , "bled"    ],
      ["motoring" , "motoring"],
      ["sing"     , "sing"    ],
    ]
  ]

  Enum.each tests, fn {name, values} ->
    @name name

    for [input, output] <- values do
      @input  input
      @output output

      test "#{@name}: '#{input}' has output of '#{output}'" do
        result = apply(StemEx.Steps, @name, [@input])
        assert result == @output
      end
    end
  end

end

AndyL · October 8, 2016, 8:37pm

I ended up with a solution that looks much like your example. It’s simpler than metaprogramming (or maybe this is metaprogramming?!). I’m not sure exactly what is going on under the covers, but it seems to give a lot of flexibility for data-driven tests. The tests run very fast and the failure messages are easy to decipher. It also looks like you can load datasets from the filesystem; I’ve got files with ~22K test examples. Just saved a lot of typing!
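Because the module body (including the for comprehension) runs while the test module is being compiled, the value list can come from a file read at compile time. A minimal sketch, assuming a hypothetical fixture with one whitespace-separated `input output` pair per line (the script writes it to the temp directory itself) and the same toy StemEx.Steps stand-in as before; `@external_resource` asks Mix to recompile the test module when that file changes.

```elixir
ExUnit.start(autorun: false)

# Hypothetical fixture: one "input output" pair per line.
File.write!(
  Path.join(System.tmp_dir!(), "stem_pairs.txt"),
  "caresses caress\nponies poni\ncats cat\n"
)

defmodule StemEx.Steps do
  # Toy stand-in for the real module: Porter step 1a suffix rules only.
  def step1a(word) do
    cond do
      String.ends_with?(word, "sses") -> String.replace_suffix(word, "sses", "ss")
      String.ends_with?(word, "ies") -> String.replace_suffix(word, "ies", "i")
      String.ends_with?(word, "ss") -> word
      String.ends_with?(word, "s") -> String.replace_suffix(word, "s", "")
      true -> word
    end
  end
end

defmodule StemEx.FileDrivenTest do
  use ExUnit.Case, async: true

  @data_path Path.join(System.tmp_dir!(), "stem_pairs.txt")
  # Recompile this module whenever the data file changes (relevant under Mix).
  @external_resource @data_path

  # Read and parse the file at compile time; each line becomes [input, output].
  pairs =
    @data_path
    |> File.read!()
    |> String.split("\n", trim: true)
    |> Enum.map(&String.split/1)

  for [input, output] <- pairs do
    @input input
    @output output
    test "step1a: '#{input}' has output of '#{output}'" do
      assert StemEx.Steps.step1a(@input) == @output
    end
  end
end

result = ExUnit.run()
```

This still generates one test per line, so it suits modest fixtures; later in the thread this per-row approach is reported to break down at ~22K rows.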

defmodule StemEx.StepsTest do

  use ExUnit.Case, async: true

  functions = %{
    step1a: &StemEx.Steps.step1a/1  ,
    step1b: &StemEx.Steps.step1b/1  ,
  }
  
  values = [
     [:step1a  ,  "caresses"  , "caress"  ],
     [:step1a  ,  "ponies"    , "poni"    ],
     [:step1a  ,  "ties"      , "ti"      ],
     [:step1a  ,  "caress"    , "caress"  ],
     [:step1a  ,  "cats"      , "cat"     ],

     [:step1b  ,  "feed"      , "feed"    ],
     [:step1b  ,  "agreed"    , "agree"   ],
     [:step1b  ,  "plastered" , "plaster" ],
     [:step1b  ,  "bled"      , "bled"    ],
     [:step1b  ,  "motoring"  , "motoring"],
     [:step1b  ,  "sing"      , "sing"    ],
  ]

  for [label, input, output] <- values do
    @label  label
    @input  input
    @output output
    @func   functions[@label]
    test "#{label}: '#{input}' has output of '#{output}'" do
      assert @func.(@input) == @output
    end
  end
end

AndyL · October 9, 2016, 8:32pm

Last note on this thread: this ‘dynamic generation’ approach didn’t work with 22K examples; it times out and fails to run. Here is the approach I used instead. It runs all 22K examples in 0.3 seconds, which is very acceptable, and the error messages are quite good.

defmodule StemExTest do
  use ExUnit.Case, async: true

  doctest StemEx

  test "stem transformations" do
    input_list  = String.split(File.read!("test/data/voc.txt")   , "\n")
    output_list = String.split(File.read!("test/data/output.txt"), "\n")
    io_list     = List.zip([input_list, output_list])
    for {input, output} <- io_list do
      assert StemEx.stem(input) == output
    end
  end
end

axelson · July 11, 2017, 4:12am

For anyone else who comes across this as an example of data-driven tests and wants to improve their test reporting output, you can do something like the following:

test "stem transformations" do
  input_list  = String.split(File.read!("test/data/voc.txt")   , "\n")
  output_list = String.split(File.read!("test/data/output.txt"), "\n")
  io_list     = List.zip([input_list, output_list])
  for {input, output} <- io_list do
    actual = StemEx.stem(input)

    message = """
    StemEx.stem(
      input = #{inspect input}
    )
    was expected to output #{inspect output} but instead was #{inspect actual}
    """

    assert actual == output, message
  end
end

This will give you pretty output like:

1) test stem transformations (StemExTest)
   test/stem_ex_test.exs:10
   StemEx.stem(
     input = "some input"
   )
   was expected to output false but instead was true

Whereas the default output shows something more like:

Assertion with == failed
code:  StemEx.stem(input) == output
left:  true
right: false

Which isn’t that helpful, since you can’t tell what the input was without adding some IO.puts calls to your test code. There are probably more elegant ways to accomplish what I’m outlining, but I think this is at least a good start!
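One possible variation, sketched here as a hypothetical alternative rather than anything posted in the thread: compare every pair first and assert once on the list of mismatches, so a single failure report lists every failing input instead of stopping at the first one. The inline lists stand in for test/data/voc.txt and test/data/output.txt, and StemEx.stem/1 is a toy stand-in that only strips a trailing "s".

```elixir
ExUnit.start(autorun: false)

defmodule StemEx do
  # Toy stand-in for the real stemmer: strips one trailing "s".
  def stem(word), do: String.replace_suffix(word, "s", "")
end

defmodule StemExTest do
  use ExUnit.Case, async: true

  test "stem transformations" do
    # In the real test these would be read from the two data files.
    inputs = ["cats", "dogs", "tree"]
    outputs = ["cat", "dog", "tree"]

    # Keep only the pairs where the stemmer disagrees with the expected output.
    mismatches =
      Enum.zip(inputs, outputs)
      |> Enum.map(fn {input, expected} -> {input, expected, StemEx.stem(input)} end)
      |> Enum.reject(fn {_input, expected, actual} -> expected == actual end)

    # A single assertion; on failure the message lists every mismatch.
    assert mismatches == [],
           "#{length(mismatches)} mismatch(es) as {input, expected, actual}:\n" <>
             Enum.map_join(mismatches, "\n", &inspect/1)
  end
end

result = ExUnit.run()
```

The trade-off is that all 22K comparisons always run, even after the first failure, in exchange for seeing the full set of failing words in one report.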