How to refactor and test large functions in a Phoenix context?

Hi. I am busy building my first Phoenix web app but I got stuck with the following problem: my context functions seem to be getting enormous and I do not have the functional programming experience or knowledge to refactor and test them. The particular code that I am struggling with is the functions involved in uploading files to the website. Here is the relevant functions:

def upload_file(%Plug.Upload{
            filename: filename,
            path: tmp_path,
            content_type: mime
        }) do
        Repo.transaction fn ->
            with hash <- hash_file(tmp_path),
                {:ok, storename} <- generate_file_name(hash, mime),
                {:ok, location} <- filestore().store(
                    :uploads,
                    tmp_path,
                    storename
                ),
                {:ok, size} <- file_size(tmp_path),
                {:ok, file} <- create_file(%{
                    name: filename,
                    content_type: mime,
                    size: size,
                    location: location,
                    hash: hash
                }) do
            file #transaction returns {:ok, _} already
          else
            {:error, reason} -> Repo.rollback(reason)
          end
        end
    end

    defp filestore() do
        Application.get_env(:boomhuis, :filestore)
        |> Map.get(:cabinet, Filestore.Local)
    end

    defp hash_file(path) do
          File.stream!(path, [], 2048)
          |> Enum.reduce(
              :crypto.hash_init(:sha256),
              &(:crypto.hash_update(&2, &1))
          )
          |> :crypto.hash_final()
          |> Base.encode16()
          |> String.downcase()
    end

    defp generate_file_name(hash, content_type) do
        case MIME.valid?(content_type) do
            true ->
                extension =
                    content_type
                    |> MIME.extensions()
                    |> Enum.at(0)

                {:ok, hash <> "." <> extension}

            false ->
                {:error, :unsupported}
        end
    end

    defp file_size(path) do
        case FUtils.stat(path) do
            {:ok, %FUtils.Stat{size: size}} ->
                {:ok, size}

            other ->
                other
        end
    end

I have a few concerns / questions:
1.) As this function is clearly doing many things, how can I refactor this code to make it more succinct and clean? Obviously I want to split it into multiple function clauses, but I do not know where to begin or how to do it.
2.) How should I test this function as it relies on a lot of different functionalities? In particular, how does one test Filestore part as it interacts with the filesystem?
3.) Is one of the problems of the testing difficulty the fact that the bottom functions are private? Should I make them public or not?
4.) How would you guys (the experts / better versed programmers) implement this?

Any advice, examples and resources are appreciated. Please excuse my questions as I am still a beginner Elixir programmer/hobbyist.

One thing you could do is extract the file-related functions into a new File module and make them public, and therefore more easily testable.

When creating modules that are very similar to the standard Elixir modules, my pattern is to put them in an Extra submodule, like MyApp.Extra.File. In a project, I’ll often have a few such modules, like Extra.File, Extra.String, Extra.Enum, etc. with functions that are pretty specific to my project. Then in the calling code, I alias MyApp.Extra so that calling them looks like Extra.File.hash(path) so I can distinguish between calls to my module and the built-in File module. (I’d love to hear other peoples’ patterns for this sort of thing.)

As far as testing interactions with the filesystem, fixture files often come in handy. For example, you could create a fixture file called test/fixtures/207_byte_file.txt and then test it like assert Extra.File.size("test/fixtures/207_byte_file.txt") == 207.

2 Likes

Good ideas from @eahanson. A couple other notes to consider:

  • Given you are already use Ecto, you might consider using Ecto.Multi (https://hexdocs.pm/ecto/Ecto.Multi.html). Multis will automatically be rolled back when {:error, ...} is returned in any step. I usually reach for it with operations involving more than one side effect. I find it to be a clean and consistent API once you get over the additional verbosity. Not all of those steps necessarily need to be their own Multi, but whatever Multis are used as steps need to either return {:ok, ...} or {:error, ...}.
  • For testing, you might consider using something like Mox (https://hexdocs.pm/mox/Mox.html) to mock the “filestore”. This will allow a different store to be used in production (ex. S3) and test (ex. local) environments. This probably means that your reference to Application.get_env(:boomhuis, :filestore) returns something that has the store function. Currently, this is piped to |> Map.get(:cabinet, Filestore.Local) which seems environment specific.
1 Like

Thank you @eahanson! I will definitely extract that extra functionality into a module of its own for testability. Does it not seem wrong to you that my upload_file function is so long? Or is that not a problem? Also, I currently do use a fixture file, which I thought is wrong as does not allow my test to be run in async mode because it could cause interference with other tests. If however a file fixture is acceptable, where should one put the code to clean up after the test? For instance, my upload_file copies the file from a temporary path (for the test this is the path to the file fixture) to an uploads folder. Where should I place the code that removes the file from the uploads folder after the test?

Thanks @baldwindavid! I will definitely look into both Multi and the Mox package.

It doesn’t bother me personally, but if it bothers you, you could experiment with different ways of splitting it up to see if there’s something you’re more happy with. To get a sense of how more experienced Elixir developers structure their code, you could read through open source Elixir codebases.

In my example above, I meant that you’d create a 207 (or whatever) byte text file and check it in. I can’t think of why there would be issues with accessing a static text file from an async test. For the upload test, reading a fixture file should also have no issues in an async test.

Your upload function could upload into a temp directory (via something like Briefly) and then your operating system would take care of deleting the files.

Alternatively, your tests you could use on_exit to clean up.

Thank you very much @eahanson. I appreciate your time and effort.