FN75W

FN75W

Avoiding repeating and hardcoding keys across module definitions

I have a large text file that I need to read line-by-line and collect various statistics about its contents, e.g. the amount of certain phrases, words, characters, etc. Then, I need to turn those statistics into a neat human-readable string.

My current code roughly looks like this:

defmodule MyStruct do
  @moduledoc false
  defstruct total: 0, foo: 0, bar: 0, baz: 0
end
defmodule Parser do
  @moduledoc false
  defp translations do
    %{
      "Example String 1" => %{translation: :foo},
      "Example String 2" => %{translation: :bar}
      # Some complex strings that optionally may need to be further parsed with regexps, etc., hence maps as values
      # ...
    }
  end

  def parse_file do
    "large_text_file.txt"
    |> File.stream!()
    |> Enum.reduce(%MyStruct{}, fn line, acc -> maybe_update_acc(line, acc) end)
  end

  defp increment(map, key), do: Map.update(map, key, 1, &(&1 + 1))

  defp find_tl(string) do
    Enum.find_value(translations(), fn {k, v} -> if String.contains?(string, k), do: v.translation end)
  end

  defp maybe_update_acc(line, acc) do
    tl = find_tl(line)

    acc
    |> increment(:total)
    |> increment(tl)
    # Irrelevant logic for regexps and other conditions to update the acc further
    # ...
  end
end
defmodule Stringifier do
  @moduledoc false
  defp translations do
    # Is a list because the result needs to be ordered in a certain way
    [
      total: "Everything, duh",
      foo: "Those fabulous foos",
      bar: "Them beautiful bars"
      # ...
    ]
  end

  def translate(%MyStruct{} = struct) do
    Enum.map(translations(), fn {key, string} ->
      struct_value = struct[key]
      if struct_value, do: translate_key(key, string, struct_value)
    end)
  end

  def translate_key(key, string, value) do
    case key do
      # Keys that need unique formatting or other logic
      # ...

      key -> "#{string} - #{value}"
    end
  end
end

This approach has some serious flaws. The most obvious one is that every time I need to add a new string to collect data about or remove/change an existing one, I need to edit my code in three places: the struct definition, the parser and the stringifier. This is bad. If I forget about one, in the best case scenario my application crashes, in the worst case scenario I get a completely wrong result.

However, I can’t come up with a better way to do things. Maybe I could make a single keyword list like this, [foo: %{term: "Example String 1", string: "Those fabulous foos"}], which will give me the required ordering and will allow to unify the translations from Parser and Stringifier modules.

At a first glance, this probably won’t add too much strain on my find_tl/1 function, but because I also have “meta” keys (like :total) that don’t resolve to any terms from the file, this, in fact, will actually increase the complexion. Usually I don’t care about such overhead if it means for better readability, but the text file can have billions of lines, so I need all the speed I can get.

Some “meta” keys need to be in the front, some in the back, some in the middle, so simply prepending/appending them in the Stringifier module is not an option.

Besides, this would mean that I still will have to edit the struct. Though I guess this is probably not that important, because I don’t do any struct-related compile-time checks.

Marked As Solved

al2o3cr

al2o3cr

I don’t see the struct adding any value in this case; a plain map would serve the same purpose but save editing one place when adding a new value to collect.

A good way to spot this is that both reading or writing of the counts uses map operations ([] and Map.update) versus compile-time references like .some_field.

I’d recommend a list of two-element tuples for Parser’s translations:

  @moduledoc false
  defp translations do
    [
      {"Example String 1", :foo],
      {"Example String 2", :bar],
      # Some complex strings that optionally may need to be further parsed with regexps, etc., hence bigger tuples
      # {~r/some weird thing/, extra_args, :baz}
    ]
  end

This ensures that functions like Enum.find_value(translations(), ...) always check to see if they match in the same order, ensuring that overlapping definitions like:

      {"Example String", :short_foo],
      {"Example String 1", :foo],

always give the same result.


Stringifier has a similar-but different list just like in your already-posted code, which is slightly easier to write since the first element of all the tuples is an atom.

Again the list ensures that the result iterates over the keys in the correct (& consistent) order.


One place to NOT use a list is for keeping the counts! For instance:

    # CAUTION: PERFORMANCE HAZARD BELOW
    "large_text_file.txt"
    |> File.stream!()
    |> Enum.reduce([total: 0], fn line, acc ->
      tl = find_tl(line)
      acc
      |> Keyword.update(:total, 1, &(&1+1))
      |> Keyword.update(tl, 1, &(&1+1))
    end)

Unlike in the previous cases, the key order of the result isn’t something we care about - and maintaining it means that Keyword.update is slower than Map.update by a factor roughly the size of acc. This kind of random access is a good indicator that a map is the right choice.


This still leaves two places that need editing when adding a new value to be counted, but IMO that’s not unreasonable:

  • one place says “this is how to collect this count”
  • the other place says “this is how to display this count”

Fusing the two together into one list would force one of the orderings discussed above to match the other; either the display order would need to exactly follow the matching order, or the other way around.

Also Liked

FN75W

FN75W

I don’t see the struct adding any value in this case; a plain map would serve the same purpose but save editing one place when adding a new value to collect.

True. In this particular case, I only use a struct because it, in my opinion, improves code readability by making it easier to understand what’s being passed around. Other than that, it serves pretty much no function in my code. Once again, no compile-time references, and I don’t need to implement any protocols or whatever. I’m not sure if defining an empty struct just for the sake of readability is not a code smell, so I’ll probably just remove the struct altogether.

This ensures that functions like Enum.find_value(translations(), ...) always check to see if they match in the same order, ensuring that overlapping definitions like:

Sorry, I should’ve probably mentioned that such overlapping strings aren’t possible in the file, so it’s not a concern and there’s no reason to use a list of tuples. But I’ll take note of your suggestion in case things change, though, or for other projects, so thank you anyway.

This still leaves two places that need editing when adding a new value to be counted, but IMO that’s not unreasonable

Yeah, I guess that’s perfectly reasonable. I was so obsessed with attempting to make all those translations, uh, “monolithic”, that I completely forgot that I’m doing two completely different things in two completely different parts of my application. Thank you for reminding me of that and for essentially clearing my doubts on whether such repetitions are a bad practice or not.

Seems like the correct way to tackle this problem would be to simply write a test that checks if all translations from the parser are present in the stringifier to ensure that there are no typos or whatnot.

Where Next?

Popular in Questions Top

aadeshere1
I have a another noob question about loop. Since elixir is immutable, while loop is not directly possible. total = 10 while total != 0 ...
New
_russellb
I want to try my hand at web scraping. What tools/libraries do I need to use. I’m hoping to turn this into something professional so don’...
New
marius95
Hello everyone, I try to use an Javascript Event Handler in my root.html.leex file. Therefore I created a function in the app.js file: ...
New
lessless
I believe there are people here who are dealing with CSV files import on the daily basis, and since Excel is a really popular tool there ...
New
hariharasudhan94
lets say i have a sample like a = 20; b = 10; if (a > b) do {:ok, "a"} end if (a < b) do {:ok, b} end if (a == b) do {:ok, "equa...
New
baxterw3b
Hi guys, i’m new in the Elixir world, and i have to say, that i love it! i’m having some problem to understand anonymous functions with ...
New
itssasanka
Hi all, Trying to get some more clarity over utc_datetime and naive_datetime for Ecto: The documentation above suggests that while ...
New
rms.mrcs
Hi, I need to transform a list of numbers into a map where the keys are the indexes and the values are the original values of the list. ...
New
nsuchy
Hi. I’ve noticed that Windows Powershell has it’s own IEX command and you cannot access Elixir’s IEX due to the conflict. This isn’t a cr...
New
marick
I had some trouble figuring out how to make many-to-many associations work. Once I got it working, I wrote a blog post. Because I’m a nov...
New

Other popular topics Top

lessless
I believe there are people here who are dealing with CSV files import on the daily basis, and since Excel is a really popular tool there ...
New
johnnyicon
Hi all, I’ve just started learning Elixir and Phoenix Framework, so please pardon my n00bness at this stage. I’m trying to use Postgres...
New
jononomo
I am trying to figure out how Mix knows whether the environment is test, dev, or prod – where is this set? Thanks.
New
stefanluptak
Hello everybody, usually, I use a 29" ultra-wide monitor for VSCode which can easily accomodate explorer (files panel) + file with code ...
New
freewebwithme
Using vs code and installed ElixirLS: support and debugger. And I got an error popped up on start up says Failed to run ‘elixir’ comma...
New
RisingFromAshes
I’ve read in another post that it may be possible with a router helper - but I couldn’t find an appropriate one, and tbh, I’m still just ...
New
klo
Got a question about when to concat vs. prepending items to list then reversing to achieve appending. So i know lists boil down to [1 | ...
New
komlanvi
Hi everyone, I was playing with phoenix liveView but I run into an issue. I have a form and want to validate each input text when the te...
New
marick
I had some trouble figuring out how to make many-to-many associations work. Once I got it working, I wrote a blog post. Because I’m a nov...
New
jononomo
For some reason my phoenix channels are working for me in my local dev environment, but as soon as I deploy via Docker, I get a 403 error...
New

We're in Beta

About us Mission Statement