FN75W
Avoiding repeating and hardcoding keys across module definitions
I have a large text file that I need to read line-by-line and collect various statistics about its contents, e.g. the amount of certain phrases, words, characters, etc. Then, I need to turn those statistics into a neat human-readable string.
My current code roughly looks like this:
defmodule MyStruct do
@moduledoc false
defstruct total: 0, foo: 0, bar: 0, baz: 0
end
defmodule Parser do
@moduledoc false
defp translations do
%{
"Example String 1" => %{translation: :foo},
"Example String 2" => %{translation: :bar}
# Some complex strings that optionally may need to be further parsed with regexps, etc., hence maps as values
# ...
}
end
def parse_file do
"large_text_file.txt"
|> File.stream!()
|> Enum.reduce(%MyStruct{}, fn line, acc -> maybe_update_acc(line, acc) end)
end
defp increment(map, key), do: Map.update(map, key, 1, &(&1 + 1))
defp find_tl(string) do
Enum.find_value(translations(), fn {k, v} -> if String.contains?(string, k), do: v.translation end)
end
defp maybe_update_acc(line, acc) do
tl = find_tl(line)
acc
|> increment(:total)
|> increment(tl)
# Irrelevant logic for regexps and other conditions to update the acc further
# ...
end
end
defmodule Stringifier do
@moduledoc false
defp translations do
# Is a list because the result needs to be ordered in a certain way
[
total: "Everything, duh",
foo: "Those fabulous foos",
bar: "Them beautiful bars"
# ...
]
end
def translate(%MyStruct{} = struct) do
Enum.map(translations(), fn {key, string} ->
struct_value = struct[key]
if struct_value, do: translate_key(key, string, struct_value)
end)
end
def translate_key(key, string, value) do
case key do
# Keys that need unique formatting or other logic
# ...
key -> "#{string} - #{value}"
end
end
end
This approach has some serious flaws. The most obvious one is that every time I need to add a new string to collect data about or remove/change an existing one, I need to edit my code in three places: the struct definition, the parser and the stringifier. This is bad. If I forget about one, in the best case scenario my application crashes, in the worst case scenario I get a completely wrong result.
However, I can’t come up with a better way to do things. Maybe I could make a single keyword list like this, [foo: %{term: "Example String 1", string: "Those fabulous foos"}], which will give me the required ordering and will allow to unify the translations from Parser and Stringifier modules.
At a first glance, this probably won’t add too much strain on my find_tl/1 function, but because I also have “meta” keys (like :total) that don’t resolve to any terms from the file, this, in fact, will actually increase the complexion. Usually I don’t care about such overhead if it means for better readability, but the text file can have billions of lines, so I need all the speed I can get.
Some “meta” keys need to be in the front, some in the back, some in the middle, so simply prepending/appending them in the Stringifier module is not an option.
Besides, this would mean that I still will have to edit the struct. Though I guess this is probably not that important, because I don’t do any struct-related compile-time checks.
Marked As Solved
al2o3cr
I don’t see the struct adding any value in this case; a plain map would serve the same purpose but save editing one place when adding a new value to collect.
A good way to spot this is that both reading or writing of the counts uses map operations ([] and Map.update) versus compile-time references like .some_field.
I’d recommend a list of two-element tuples for Parser’s translations:
@moduledoc false
defp translations do
[
{"Example String 1", :foo],
{"Example String 2", :bar],
# Some complex strings that optionally may need to be further parsed with regexps, etc., hence bigger tuples
# {~r/some weird thing/, extra_args, :baz}
]
end
This ensures that functions like Enum.find_value(translations(), ...) always check to see if they match in the same order, ensuring that overlapping definitions like:
{"Example String", :short_foo],
{"Example String 1", :foo],
always give the same result.
Stringifier has a similar-but different list just like in your already-posted code, which is slightly easier to write since the first element of all the tuples is an atom.
Again the list ensures that the result iterates over the keys in the correct (& consistent) order.
One place to NOT use a list is for keeping the counts! For instance:
# CAUTION: PERFORMANCE HAZARD BELOW
"large_text_file.txt"
|> File.stream!()
|> Enum.reduce([total: 0], fn line, acc ->
tl = find_tl(line)
acc
|> Keyword.update(:total, 1, &(&1+1))
|> Keyword.update(tl, 1, &(&1+1))
end)
Unlike in the previous cases, the key order of the result isn’t something we care about - and maintaining it means that Keyword.update is slower than Map.update by a factor roughly the size of acc. This kind of random access is a good indicator that a map is the right choice.
This still leaves two places that need editing when adding a new value to be counted, but IMO that’s not unreasonable:
- one place says “this is how to collect this count”
- the other place says “this is how to display this count”
Fusing the two together into one list would force one of the orderings discussed above to match the other; either the display order would need to exactly follow the matching order, or the other way around.
Also Liked
FN75W
I don’t see the struct adding any value in this case; a plain map would serve the same purpose but save editing one place when adding a new value to collect.
True. In this particular case, I only use a struct because it, in my opinion, improves code readability by making it easier to understand what’s being passed around. Other than that, it serves pretty much no function in my code. Once again, no compile-time references, and I don’t need to implement any protocols or whatever. I’m not sure if defining an empty struct just for the sake of readability is not a code smell, so I’ll probably just remove the struct altogether.
This ensures that functions like
Enum.find_value(translations(), ...)always check to see if they match in the same order, ensuring that overlapping definitions like:
Sorry, I should’ve probably mentioned that such overlapping strings aren’t possible in the file, so it’s not a concern and there’s no reason to use a list of tuples. But I’ll take note of your suggestion in case things change, though, or for other projects, so thank you anyway.
This still leaves two places that need editing when adding a new value to be counted, but IMO that’s not unreasonable
Yeah, I guess that’s perfectly reasonable. I was so obsessed with attempting to make all those translations, uh, “monolithic”, that I completely forgot that I’m doing two completely different things in two completely different parts of my application. Thank you for reminding me of that and for essentially clearing my doubts on whether such repetitions are a bad practice or not.
Seems like the correct way to tackle this problem would be to simply write a test that checks if all translations from the parser are present in the stringifier to ensure that there are no typos or whatnot.
Popular in Questions
Other popular topics
Categories:
Sub Categories:
Forums
Popular Tags
- #ecto
- #liveview
- #troubleshooting
- #learning-elixir
- #deployment
- #library
- #erlang
- #testing
- #genserver
- #mix
- #absinthe
- #remote-other
- #otp
- #plug
- #how-to-question
- #macros
- #postgres
- #channels
- #elixirconf
- #exunit
- #discussion
- #javascript
- #code-sync
- #podcasts
- #onsite
- #dialyzer
- #docker
- #authentication
- #umbrella
- #full-time-contract
- #podcasts-by-brainlid
- #ecto-query
- #elixir-ls
- #phoenix_html
- #iex
- #blog-post
- #graphql
- #genstage
- #ai
- #websockets
- #supervisor
- #advent-of-code
- #elixirconf-us
- #distillery
- #processes
- #forms
- #api
- #metaprogramming
- #security
- #performance








