Split Thread: Fixtures vs Factories in Elixir

hauleth · August 10, 2020, 7:51pm

> Use factories and not fixtures

Gods, I cannot even express how much I disagree with it.

BartOtten · August 12, 2020, 10:11pm

Please try; otherwise your comment is nothing but negative energy.

halostatue · August 14, 2020, 3:14am

Factories hide a lot of complexity and make it really easy for you to test toward positive cases (they also typically use the same flows that you should be testing). With fixtures, you can pretty much ensure that you have data that looks a lot more like what you’re going to see in real data. Factories are far more likely to result in subtle logic bugs than fixtures, because they’re not typically just applied data.

I haven’t found a good way to do fixtures in Elixir and have been tempted more than once to try to port Rails fixtures over for Ecto (but don’t have the time), but I enforced the use of fixtures over factories on my last two Rails projects and we had more tests that ran faster than the last time I worked with a Rails project that used factories. The factories themselves introduced a 30%+ slowdown in test running.

I’m using simplified factories in my current Elixir projects, but I hate them. They have been the source of a half-dozen test bugs across the projects, and result in more churn than I would like.

In my previous Rails projects, when I needed to test a particular set of “empty” table behaviours, I would just truncate the table in question and build from there. But that only applies to the first few times your code runs in the real world, most of the time.

tcoopman · August 14, 2020, 4:57am

Interesting. Could you maybe give a code example of what the difference is exactly between a fixture and a factory and why fixtures are better?

I’ve been doing some quick googling, but the results all point in the direction that factories are better because fixtures hide the data you are using in the tests.

ityonemo · August 14, 2020, 6:39am

Am I misunderstanding fixtures? Usually I have something like this:

test/support/my_data_fixture.ex

defmodule MyAppTest.MyDataFixture do
  @default_fields [foo: "bar", baz: nil]
  def new(supplied_fields) do
    # Supplied fields go second so that they override the defaults.
    fields = Keyword.merge(@default_fields, supplied_fields)

    MyData
    |> struct(fields)
    |> apply_to_db_or_other_state()
  end
end

halostatue · August 14, 2020, 2:07pm

> Interesting. Could you maybe give a code example of what the difference is exactly between a fixture and a factory and why fixtures are better?

It’s hard, because part of the point is that fixtures aren’t code. Fixtures are a set of data that represents a baseline set of data that covers a large portion of your tests for the unit under test. If a particular unit could benefit from similar but not quite the same data, you can manipulate the fixture data (after it’s loaded) in the test (or test set, e.g., a describe block).

With Rails fixtures, you define the data as YAML files, which the FixtureSet code turns into SQL inserts into the database not using your model’s new/save functions. (It uses your model to help determine relations, but it sidesteps all of the validation and other business logic.) Because it’s a YAML file, you might have something like:

# users.yml
luke:
  first_name: Luke
  last_name: Skywalker
  title: Jedi Knight

You can refer to this fixture as users(:luke), which is pretty much the same as create(:user, :luke) would be in one of the factory providers…without any hidden behaviour that might be present in the code behind either the user or user_luke factories.

There’s more to it than that (fixture YAML files are parsed through ERB prior to processing, which means you can generate large amounts of data as fixtures if you need to do so through loops).

Essentially, though, the point is that it’s named (mostly) static data that is loaded through direct interfaces rather than through your application code in the first place.
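To make "named, mostly static data loaded through direct interfaces" concrete in Elixir terms, here is a minimal sketch. Everything in it is an assumption for illustration: the `UserFixtures` module, the `leia` row, and the idea of passing the insert function in from outside are not from any library. The point is only that the rows are plain maps and that nothing in the loading path calls a changeset or other application code.

```elixir
defmodule UserFixtures do
  @moduledoc "Hypothetical named fixture rows for a users table: plain data, no changesets."

  # Fixture name => row. The :leia row is made up to give the set a second record.
  @rows %{
    luke: %{id: 1, first_name: "Luke", last_name: "Skywalker", title: "Jedi Knight"},
    leia: %{id: 2, first_name: "Leia", last_name: "Organa", title: "General"}
  }

  @doc "All rows, sorted by id so that loading order is deterministic."
  def rows, do: @rows |> Map.values() |> Enum.sort_by(& &1.id)

  @doc "Looks up one row by its fixture name; raises KeyError for unknown names."
  def fetch!(name), do: Map.fetch!(@rows, name)

  @doc "Hands every row to `insert_fun` in a single call and returns its result."
  def load(insert_fun) when is_function(insert_fun, 1), do: insert_fun.(rows())
end

# Stand-in loader that only counts rows, so the sketch runs without a database.
IO.inspect(UserFixtures.load(&length/1))
IO.inspect(UserFixtures.fetch!(:luke).title)
```

In a project with Ecto, the loader could be something like `&MyApp.Repo.insert_all("users", &1)` (with `MyApp.Repo` being your own repo module), which writes the rows directly to the table without going through a schema's changeset.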

> I’ve been doing some quick googling but they all point into the direction that factories are better because fixtures hide the data you are using in the tests

Yeah. That’s nonsense promulgated by people who don’t understand fixture data and the value of having a small but reasonable set of data loaded quickly into your database. Let’s also be clear, betterspecs is also lying about what a fixture is in the example provided. Look at the linked issue, and you’ll see a ton of discussion about it in the Rails context, and proper use of fixtures looks nothing like the example provided on the betterspecs page.

I’m talking about database fixtures here, but fixtures are often used by the very people who deride them in contexts other than the database:

If you’re doing CSV parser tests, the test files themselves are fixtures. The only time you’d write a CSV to disk and then read it would be to test the roundtrip capability of your CSV generator and parser, for example.
If you’re doing payload tests, if you have VCR files or JSON response files that represent a payload…that’s fixtures.

With fixtures, you understand your data much better because you have to think about it in terms of the data, not in terms of your objects (don’t get me started on a rant about OO modelling vs data modelling and why you can’t do the former if you don’t understand the latter).

halostatue · August 14, 2020, 2:10pm

That’s basically a factory. A simple factory (the one that I am using in my current Elixir codebase is more complex, but not that much more complex), but it’s not a fixture. In database terms, you’d create some SQL statements to seed your test data (usually before your test transactions start, but in Ecto you’d need to run that in the same process, so…) and then manipulate the records that are present to get them into the shape you need for a particular test—but the default set of data loaded would cover > 80% of your tests (no, really, and it would only take 2–3 records per table to do that most of the time).

ityonemo · August 14, 2020, 2:52pm

Edit: deleted since I didn’t see @halostatue’s response above the response to mine.

ityonemo · August 14, 2020, 3:12pm

I did some reading on fixtures and I think it wouldn’t be so hard to write something.

test/support/fixture.ex

defmodule MyAppTest.MyFixture do
  require EEx

  file = Path.join(__DIR__, "fixtures/my_fixture.yaml")
  EEx.function_from_file(:defp, :to_rows, file, [:assigns])

  # The generator receives indices 0..count-1, so its argument can be zero.
  @spec load(pos_integer, (non_neg_integer -> %{assigns: map})) :: :ok
  def load(count, generator) do
    0..count-1
    |> Enum.flat_map(fn idx ->
      idx
      |> generator.()
      |> to_rows()
      |> Yaml.from_string  # this is not the correct function name, but I don't usually use yaml.
    end)
    |> Enum.each(&add_to_database/1)
  end

  defp add_to_database(map) do
    ...
  end
end

lpil · August 14, 2020, 3:15pm

Another argument in favour of fixtures over factories is that they are much faster: you don’t need to insert a bunch of data into the database at the start of each test, because it is inserted once at the start of the suite.

edit: Oh! I now see @halostatue already covered that!

LostKobrakai · August 14, 2020, 3:35pm

I’m not sure if trading the need to handle cleanup / setup each time is worth the time saved by not preparing data for each test individually.

lpil · August 14, 2020, 4:19pm

There’s no cleanup to do; Ecto handles that automatically, the same way as with factories.

You don’t need to write any setup or teardown code with fixtures, while with factories you write the setup.

halostatue · August 14, 2020, 4:24pm

Something like that might work, yes. There’s a bit more work to it than just that, because Rails fixtures also give you a way to refer to the records from your tests. I’ve got some ideas, but no time to actually build this out, but I think it would entirely be possible to get something like Rails fixtures working with Ecto.

I’d probably avoid using YAML (to avoid an unusual dependency), but don’t have a better format offhand (JSON5 would be useful, but I don’t think that Jason can parse JSON5) except maybe .exs files that are supposed to produce a list of maps.
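The .exs idea needs nothing beyond the standard library, since `Code.eval_file/1` returns the value of the last expression in a file. A rough sketch follows; the `FixtureFile` module, the file name, and the `name` key are hypothetical, and the file is written to a temp directory here only so the example is self-contained.

```elixir
defmodule FixtureFile do
  @moduledoc "Hypothetical loader for fixture files that evaluate to a list of maps."

  def load!(path) do
    # Code.eval_file/1 returns {value_of_last_expression, bindings}.
    {rows, _bindings} = Code.eval_file(path)

    unless is_list(rows) and Enum.all?(rows, &is_map/1) do
      raise ArgumentError, "#{path} must evaluate to a list of maps"
    end

    rows
  end
end

# Write a tiny fixture file so the sketch can run on its own.
path = Path.join(System.tmp_dir!(), "users_fixture.exs")

File.write!(path, """
[
  %{name: "luke", first_name: "Luke", last_name: "Skywalker", title: "Jedi Knight"}
]
""")

rows = FixtureFile.load!(path)
IO.inspect(Enum.map(rows, & &1.name))
```

Because the file is ordinary Elixir, loops and comprehensions are available for generating bulk rows, which covers roughly what ERB does for Rails fixture YAML. The trade-off is that a fixture file can then run arbitrary code, so it is only "static data" by convention.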

LostKobrakai · August 14, 2020, 4:26pm

That does setup for each test though. Not once for the whole test suite run.

ityonemo · August 14, 2020, 4:31pm

Depends on where you call it. It’s in test/support, indicating that it’s compiled before test_helper.exs, so you could call it there. The fun (the generator argument) is just there for a wee bit of flexibility. You’d be expected to have more than one object in there (as indicated by the use of flat_map).

halostatue · August 14, 2020, 4:36pm

The thing is, it’s ultimately not an either/or.

In some cases, it’s better to use direct configuration—usually when you’re testing what happens when your table is empty or you want it empty to make it easier to reason about what changed. When you’re using fixtures, you’d truncate the table(s) in question and create new database records as you are currently talking about. It’s an extra step, but a small one.

I’ve got code that requires that data in ~6 tables are set up with the correct relationships—and I run ~10 tests on that code. With fixtures that data is configured once and is then available for all ten tests. With per-test configuration, most of your test is configuration, not assertion. (Yes, you can do that in a setup block in describe; there are cases when you want that data available for tests that don’t fit into that describe without repeating yourself.)

You can do the same with factories, but factories hide the complexity behind typically one function call (e.g., if you need to set up a user, credentials, and profile record for each user, your create(:user) factory function might actually create three records behind the scenes and you don’t really know/remember).
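A toy version of that hidden fan-out, for anyone who has not been bitten by it yet. Nothing here is a real factory library: `FakeStore` is an in-memory stand-in for the database and `Factory.create/1` is a hand-written function, both made up for the illustration.

```elixir
defmodule FakeStore do
  @moduledoc "In-memory stand-in for a database: a list of {table, row} tuples."
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> [] end, name: __MODULE__)

  # Records the row under the given table and returns the row.
  def insert(table, row) do
    Agent.update(__MODULE__, &[{table, row} | &1])
    row
  end

  # Total number of rows across all tables.
  def count, do: Agent.get(__MODULE__, &length/1)

  # Sorted list of the tables that have received at least one row.
  def tables do
    Agent.get(__MODULE__, fn rows ->
      rows |> Enum.map(&elem(&1, 0)) |> Enum.uniq() |> Enum.sort()
    end)
  end
end

defmodule Factory do
  # The caller asks for one user, but three tables are written to.
  def create(:user) do
    user = FakeStore.insert(:users, %{id: 1, name: "Luke"})
    FakeStore.insert(:credentials, %{user_id: user.id, password_hash: "not-a-real-hash"})
    FakeStore.insert(:profiles, %{user_id: user.id, title: "Jedi Knight"})
    user
  end
end

{:ok, _pid} = FakeStore.start_link()
_user = Factory.create(:user)

IO.inspect(FakeStore.count())   # 3
IO.inspect(FakeStore.tables())  # [:credentials, :profiles, :users]
```

The call site reads as "one user", while the store ends up with three rows in three tables. With fixture data, those three rows would be written out explicitly where the data is defined.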

Done properly, fixtures let you set up a minimal amount of meaningful data with no ceremony and explicit configuration. That usually far better reflects the state your application’s database will be in most of the time (because your database isn’t going to be empty for long).

Pure functions can and should be tested without touching the database. Using fixtures when you have any complexity to your data at all is going to be far easier to reason about than factories or per-test (or per-group) setup.

lpil · August 14, 2020, 4:48pm

That’s right, all tests start with the same dataset which is inserted at the start of the suite. Ecto rolls back changes after each test, it does not truncate the database.

tcoopman · August 14, 2020, 5:14pm

If I’m reading everything correctly, it seems that the only real difference between factories and fixtures is that fixtures are just the “real data” and factories go through some domain logic.
The other differences mentioned above seem to be things you can do with both kinds of data creation.

For me this means that you probably want to use both factories and fixtures depending on what kind of test you’re writing. A factory adds more complexity because you go through the domain logic, but this prevents you from creating invalid data. A fixture is more standalone, so easier to reason about, but you have the risk that you have incorrect fixtures.

josevalim · August 14, 2020, 5:31pm

I will add some cents to the discussion that I haven’t seen mentioned yet. Below, I will be talking about database-backed fixtures (and not necessarily fixtures as a whole):

One of the downsides of fixtures is that they are data shared across all of your tests. So sometimes you will change your fixtures, because you need new data for some new tests, and other tests may now fail. This gets worse if projects define a large amount of fixture data. The correct approach, as mentioned earlier, is to define fixtures for a basic feature set that will be shared across all tests.
The particular fixtures implementation in Rails caused some issues because referential integrity and data validations are disabled or not really used in Rails, so you could easily end up with invalid data or data that would never exist in the database through the regular application workload.
It is actually super straightforward to have fixtures in Ecto: just write to the database in your test_helper.exs before you start the SQL sandbox.
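A sketch of what the last point might look like in practice. The assumptions are `MyApp.Repo` as the repo module, a `users` table with only the three columns shown (no required timestamps), and that writes made before the sandbox is switched to `:manual` are committed rather than rolled back, which is how the advice above reads. Treat it as a starting point rather than a recipe.

```elixir
# test/test_helper.exs (sketch; MyApp.Repo and the "users" columns are assumptions)
alias MyApp.Repo

# These writes happen outside any per-test transaction, so they persist.
# Clearing the table first keeps repeated `mix test` runs from piling up rows.
Repo.delete_all("users")

Repo.insert_all("users", [
  %{first_name: "Luke", last_name: "Skywalker", title: "Jedi Knight"}
])

ExUnit.start()

# From here on, each test checks out its own sandboxed connection and its
# changes are rolled back afterwards; the baseline rows above remain visible.
Ecto.Adapters.SQL.Sandbox.mode(Repo, :manual)
```

Using `insert_all` with a table name instead of a schema keeps the fixture rows independent of changesets. Foreign keys and other database constraints still apply, which addresses part of the referential integrity concern raised about the Rails implementation.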

Finally, a summary that was given to me a long time ago that explains when to use fixtures vs factories well is:

Use fixtures for the setup data (i.e. the data you need to define before you can start writing your test)
Use factories for the data under test, especially because you want the data being tested close to the test itself

In my opinion, factories are actually easier to get started as they require less discipline (with bigger costs in the long term), while fixtures are really easy to mess up at the beginning (but pay off if well-structured).

halostatue · August 14, 2020, 8:23pm

Spot on, Jose. I really came to love the Rails fixtures for their speed, and the referential integrity stuff was only slightly annoying for me (I always use FKs at the database level, so the only time we ran into any sort of problem with this was with polymorphic relations, which we used factory-like functions for).

If I ever make an Ecto.Fixtures-type library, it’ll be set up to work against the database tables directly as opposed to Ecto schema modules. That would force you to think in terms of your underlying data model. The hard part is going to be, as you say, the referential integrity part, and then figuring out a way to make sure that the inserted data is readily available using similar naming conventions as the Rails ones. Maybe an Agent that stores {table, name} => pk values so that you could say something like Repo.get(User, Ecto.Fixtures.id("users", "luke")).
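The Agent part of that idea is small enough to sketch. To be clear, no `Ecto.Fixtures` library is being described here: `FixtureIds` and its functions are hypothetical names, and the primary keys are entered by hand where a real loader would record them as it inserts rows.

```elixir
defmodule FixtureIds do
  @moduledoc "Hypothetical registry mapping {table, fixture_name} to a primary key."
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  @doc "Remembers the primary key that a named fixture row received."
  def put(table, name, pk), do: Agent.update(__MODULE__, &Map.put(&1, {table, name}, pk))

  @doc "Returns the primary key for a named fixture; raises if it was never registered."
  def id(table, name) do
    # Use the non-raising lookup inside the Agent so a miss cannot crash it;
    # the error is raised in the calling process instead.
    case Agent.get(__MODULE__, &Map.fetch(&1, {table, name})) do
      {:ok, pk} -> pk
      :error -> raise ArgumentError, "no fixture #{inspect(name)} in table #{inspect(table)}"
    end
  end
end

{:ok, _pid} = FixtureIds.start_link()
FixtureIds.put("users", "luke", 1)
IO.inspect(FixtureIds.id("users", "luke"))
```

A test could then write `Repo.get(User, FixtureIds.id("users", "luke"))`, matching the shape suggested above. Since the Agent is registered under its module name, starting it once from test_helper.exs would make it reachable from every test process.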

But that’s for a time when I, well, actually have time.