FN75W

Avoiding repeating and hardcoding keys across module definitions

I have a large text file that I need to read line by line, collecting various statistics about its contents, e.g. the number of certain phrases, words, characters, etc. Then I need to turn those statistics into a neat human-readable string.

My current code roughly looks like this:

defmodule MyStruct do
  @moduledoc false
  defstruct total: 0, foo: 0, bar: 0, baz: 0
end
defmodule Parser do
  @moduledoc false
  defp translations do
    %{
      "Example String 1" => %{translation: :foo},
      "Example String 2" => %{translation: :bar}
      # Some complex strings that optionally may need to be further parsed with regexps, etc., hence maps as values
      # ...
    }
  end

  def parse_file do
    "large_text_file.txt"
    |> File.stream!()
    |> Enum.reduce(%MyStruct{}, fn line, acc -> maybe_update_acc(line, acc) end)
  end

  defp increment(map, key), do: Map.update(map, key, 1, &(&1 + 1))

  defp find_tl(string) do
    Enum.find_value(translations(), fn {k, v} -> if String.contains?(string, k), do: v.translation end)
  end

  defp maybe_update_acc(line, acc) do
    tl = find_tl(line)

    acc
    |> increment(:total)
    |> increment(tl)
    # Irrelevant logic for regexps and other conditions to update the acc further
    # ...
  end
end
defmodule Stringifier do
  @moduledoc false
  defp translations do
    # Is a list because the result needs to be ordered in a certain way
    [
      total: "Everything, duh",
      foo: "Those fabulous foos",
      bar: "Them beautiful bars"
      # ...
    ]
  end

  def translate(%MyStruct{} = struct) do
    Enum.map(translations(), fn {key, string} ->
      # struct[key] would raise here: structs don't implement the Access behaviour
      struct_value = Map.get(struct, key)
      if struct_value, do: translate_key(key, string, struct_value)
    end)
  end

  def translate_key(key, string, value) do
    case key do
      # Keys that need unique formatting or other logic
      # ...

      key -> "#{string} - #{value}"
    end
  end
end

This approach has some serious flaws. The most obvious one is that every time I need to add a new string to collect data about, or remove or change an existing one, I have to edit my code in three places: the struct definition, the parser and the stringifier. This is bad. If I forget one of them, the best case is that my application crashes; the worst case is that I get a completely wrong result.

However, I can’t come up with a better way to do things. Maybe I could make a single keyword list like this, [foo: %{term: "Example String 1", string: "Those fabulous foos"}], which would give me the required ordering and let me unify the translations from the Parser and Stringifier modules.

At first glance, this probably won’t put too much strain on my find_tl/1 function, but because I also have “meta” keys (like :total) that don’t resolve to any terms from the file, it will in fact increase the complexity. Usually I don’t care about such overhead if it buys better readability, but the text file can have billions of lines, so I need all the speed I can get.

Some “meta” keys need to be in the front, some in the back, some in the middle, so simply prepending/appending them in the Stringifier module is not an option.

Besides, this would mean that I would still have to edit the struct. Though I guess that is probably not very important, because I don’t do any struct-related compile-time checks.

Marked As Solved

al2o3cr

I don’t see the struct adding any value in this case; a plain map would serve the same purpose but save editing one place when adding a new value to collect.

A good way to spot this is that both reading and writing of the counts use map operations ([] and Map.update) rather than compile-time references like .some_field.

I’d recommend a list of two-element tuples for Parser’s translations:

  @moduledoc false
  defp translations do
    [
      {"Example String 1", :foo},
      {"Example String 2", :bar}
      # Some complex strings that optionally may need to be further parsed with regexps, etc., hence bigger tuples
      # {~r/some weird thing/, extra_args, :baz}
    ]
  end

This ensures that functions like Enum.find_value(translations(), ...) always check to see if they match in the same order, ensuring that overlapping definitions like:

      {"Example String", :short_foo},
      {"Example String 1", :foo},

always give the same result.
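As a rough sketch of how find_tl/1 might look with such a list, assuming only two-element {pattern, key} entries: the module name OrderedParser, the module attribute and the public visibility are illustrative choices, not part of the original code. Here the more specific pattern is listed first so that it wins; swapping the two entries would make :short_foo win for every line containing "Example String".

```elixir
defmodule OrderedParser do
  @moduledoc false

  # Hypothetical entries. The list is scanned top to bottom,
  # so the first matching pattern decides the result.
  @translations [
    {"Example String 1", :foo},
    {"Example String", :short_foo}
  ]

  # Returns the key of the first pattern contained in the line,
  # or nil when no pattern matches.
  def find_tl(line) do
    Enum.find_value(@translations, fn {pattern, key} ->
      if String.contains?(line, pattern), do: key
    end)
  end
end
```

A line containing "Example String 1" resolves to :foo, while a line that only contains "Example String" falls through to :short_foo.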


Stringifier keeps a similar but separate list, just as in the code you already posted; it is slightly easier to write because the first element of every tuple is an atom.

Again the list ensures that the result iterates over the keys in the correct (& consistent) order.


One place to NOT use a list is for keeping the counts! For instance:

    # CAUTION: PERFORMANCE HAZARD BELOW
    "large_text_file.txt"
    |> File.stream!()
    |> Enum.reduce([total: 0], fn line, acc ->
      tl = find_tl(line)
      acc
      |> Keyword.update(:total, 1, &(&1+1))
      |> Keyword.update(tl, 1, &(&1+1))
    end)

Unlike in the previous cases, the key order of the result isn’t something we care about, and maintaining it has a cost: Keyword.update has to walk the list to find the key, so it is slower than Map.update by a factor of roughly the size of acc. This kind of random access is a good indicator that a map is the right choice.
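For comparison, a minimal sketch of the same reduction with a map as the accumulator. The module name CountSketch and the two-entry pattern table are placeholders, and skipping lines that match no pattern is an assumption about the desired behaviour (the code in the question would call increment/2 with nil instead).

```elixir
defmodule CountSketch do
  @moduledoc false

  # Hypothetical stand-in for the real pattern table.
  @translations [{"Example String 1", :foo}, {"Example String 2", :bar}]

  defp find_tl(line) do
    Enum.find_value(@translations, fn {pattern, key} ->
      if String.contains?(line, pattern), do: key
    end)
  end

  defp increment(counts, key), do: Map.update(counts, key, 1, &(&1 + 1))

  # Accepts any enumerable of lines, e.g. File.stream!("large_text_file.txt").
  def count(lines) do
    Enum.reduce(lines, %{total: 0}, fn line, counts ->
      counts = increment(counts, :total)

      case find_tl(line) do
        # Assumption: lines without a known pattern only affect :total.
        nil -> counts
        key -> increment(counts, key)
      end
    end)
  end
end
```

Because count/1 takes any enumerable, it can be tried on a small in-memory list before pointing it at the real file stream.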


This still leaves two places that need editing when adding a new value to be counted, but IMO that’s not unreasonable:

  • one place says “this is how to collect this count”
  • the other place says “this is how to display this count”

Fusing the two together into one list would force one of the orderings discussed above to match the other; either the display order would need to exactly follow the matching order, or the other way around.

Also Liked

FN75W

I don’t see the struct adding any value in this case; a plain map would serve the same purpose but save editing one place when adding a new value to collect.

True. In this particular case, I only use a struct because it, in my opinion, improves code readability by making it easier to understand what’s being passed around. Other than that, it serves pretty much no function in my code. Once again, no compile-time references, and I don’t need to implement any protocols or whatever. I’m not sure if defining an empty struct just for the sake of readability is not a code smell, so I’ll probably just remove the struct altogether.

This ensures that functions like Enum.find_value(translations(), ...) always check to see if they match in the same order, ensuring that overlapping definitions like:

Sorry, I should probably have mentioned that such overlapping strings aren’t possible in the file, so it’s not a concern and there’s no reason to use a list of tuples. I’ll take note of your suggestion in case things change, or for other projects, so thank you anyway.

This still leaves two places that need editing when adding a new value to be counted, but IMO that’s not unreasonable

Yeah, I guess that’s perfectly reasonable. I was so obsessed with attempting to make all those translations, uh, “monolithic”, that I completely forgot that I’m doing two completely different things in two completely different parts of my application. Thank you for reminding me of that and for essentially clearing my doubts on whether such repetitions are a bad practice or not.

Seems like the correct way to tackle this problem would be to simply write a test that checks if all translations from the parser are present in the stringifier to ensure that there are no typos or whatnot.
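A sketch of the check such a test could perform, assuming both tables can be read from outside their modules (in the code above translations/0 is private in both). TranslationCheck and its functions are hypothetical names, and the parser table is assumed to hold two-element {pattern, key} tuples.

```elixir
defmodule TranslationCheck do
  @moduledoc false

  # Hypothetical copies of the two tables; in the real code they would be
  # exposed by Parser and Stringifier, e.g. through public functions.
  def parser_translations do
    [{"Example String 1", :foo}, {"Example String 2", :bar}]
  end

  def stringifier_translations do
    [total: "Everything, duh", foo: "Those fabulous foos", bar: "Them beautiful bars"]
  end

  # Keys the parser can produce that the stringifier cannot display.
  def missing_keys(parser, stringifier) do
    parser_keys = Enum.map(parser, fn {_pattern, key} -> key end)
    Enum.uniq(parser_keys) -- Keyword.keys(stringifier)
  end
end
```

An ExUnit test would then only need to assert that missing_keys/2 returns an empty list; "meta" keys such as :total exist only on the stringifier side and do not affect the result.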
