I have a large text file that I need to read line by line while collecting various statistics about its contents, e.g. the number of occurrences of certain phrases, words, characters, etc. Then I need to turn those statistics into a neat human-readable string.
My current code roughly looks like this:
defmodule MyStruct do
  @moduledoc false
  defstruct total: 0, foo: 0, bar: 0, baz: 0
end
defmodule Parser do
  @moduledoc false

  defp translations do
    %{
      "Example String 1" => %{translation: :foo},
      "Example String 2" => %{translation: :bar}
      # Some complex strings that optionally may need to be further parsed with regexps, etc., hence maps as values
      # ...
    }
  end

  def parse_file do
    "large_text_file.txt"
    |> File.stream!()
    |> Enum.reduce(%MyStruct{}, fn line, acc -> maybe_update_acc(line, acc) end)
  end

  # A line that matches no translation yields nil; skip it instead of adding a nil key to the struct
  defp increment(map, nil), do: map
  defp increment(map, key), do: Map.update(map, key, 1, &(&1 + 1))

  # Returns the key of the first translation whose string occurs in the line, or nil
  defp find_tl(string) do
    Enum.find_value(translations(), fn {k, v} -> if String.contains?(string, k), do: v.translation end)
  end

  defp maybe_update_acc(line, acc) do
    tl = find_tl(line)

    acc
    |> increment(:total)
    |> increment(tl)
    # Irrelevant logic for regexps and other conditions to update the acc further
    # ...
  end
end
defmodule Stringifier do
  @moduledoc false

  defp translations do
    # Is a list because the result needs to be ordered in a certain way
    [
      total: "Everything, duh",
      foo: "Those fabulous foos",
      bar: "Them beautiful bars"
      # ...
    ]
  end

  def translate(%MyStruct{} = struct) do
    Enum.map(translations(), fn {key, string} ->
      # Structs do not implement the Access behaviour, so struct[key] would raise
      struct_value = Map.get(struct, key)
      if struct_value, do: translate_key(key, string, struct_value)
    end)
  end

  def translate_key(key, string, value) do
    case key do
      # Keys that need unique formatting or other logic
      # ...
      _ -> "#{string} - #{value}"
    end
  end
end
This approach has some serious flaws. The most obvious one is that every time I need to add a new string to collect data about, or remove/change an existing one, I need to edit my code in three places: the struct definition, the parser and the stringifier. This is bad. If I forget one of them, in the best-case scenario my application crashes, and in the worst-case scenario I get a completely wrong result.
However, I can’t come up with a better way to do things. Maybe I could make a single keyword list like this, [foo: %{term: "Example String 1", string: "Those fabulous foos"}], which would give me the required ordering and allow me to unify the translations from the Parser and Stringifier modules.
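A minimal sketch of that single-list idea. The module name StatSpec and the functions terms/0 and labels/0 are made up for illustration; only the shape of the keyword list comes from the description above.

```elixir
defmodule StatSpec do
  @moduledoc false

  # Hypothetical single source of truth. The order of this list is the
  # order in which the stringifier would print the results.
  @entries [
    foo: %{term: "Example String 1", string: "Those fabulous foos"},
    bar: %{term: "Example String 2", string: "Them beautiful bars"}
  ]

  # {term, key} pairs, i.e. what the parser needs to search a line for
  def terms do
    for {key, %{term: term}} <- @entries, do: {term, key}
  end

  # {key, label} pairs in declaration order, i.e. what the stringifier needs
  def labels do
    for {key, %{string: string}} <- @entries, do: {key, string}
  end
end
```

With this layout, adding or removing a statistic would touch only the @entries list, and both modules would read their view of it from StatSpec.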
At first glance, this probably won’t put much extra strain on my find_tl/1 function, but because I also have “meta” keys (like :total) that don’t correspond to any terms in the file, it will actually increase the complexity. Usually I don’t care about such overhead if it buys better readability, but the text file can have billions of lines, so I need all the speed I can get.
Some “meta” keys need to be in the front, some in the back, some in the middle, so simply prepending/appending them in the Stringifier module is not an option.
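One way the “meta” keys might stay in the ordered list without costing anything per line is to filter them out once, at compile time. The sketch below assumes a meta key is marked with term: nil; the module name StatSpecWithMeta and the function find_key/1 are hypothetical stand-ins, not part of the code above.

```elixir
defmodule StatSpecWithMeta do
  @moduledoc false

  # Assumption: a "meta" key has term: nil and keeps its position in the list,
  # so it can sit at the front, in the middle or at the back of the output.
  @entries [
    total: %{term: nil, string: "Everything, duh"},
    foo: %{term: "Example String 1", string: "Those fabulous foos"},
    bar: %{term: "Example String 2", string: "Them beautiful bars"}
  ]

  # Built once when the module is compiled. Meta keys are dropped here,
  # so the per-line lookup only ever iterates over real search terms.
  @terms for({key, %{term: term}} <- @entries, is_binary(term), do: {term, key})

  # All keys, meta ones included, in declaration order
  @labels for({key, %{string: string}} <- @entries, do: {key, string})

  # Returns the key of the first term contained in the line, or nil
  def find_key(line) do
    Enum.find_value(@terms, fn {term, key} ->
      if String.contains?(line, term), do: key
    end)
  end

  def labels, do: @labels
end
```

Because @terms and @labels are module attributes, the filtering happens during compilation and not for each of the billions of lines; the lookup itself does the same work as the current find_tl/1.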
Besides, this would mean that I would still have to edit the struct. Though I guess this is probably not that important, because I don’t do any struct-related compile-time checks.
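If the struct is worth keeping, its fields could probably be generated from the same keys, since defstruct accepts any expression that evaluates to a list of fields at compile time. A small sketch, where DerivedStats and the @keys attribute are hypothetical and the key list is assumed to come from the shared keyword list:

```elixir
defmodule DerivedStats do
  @moduledoc false

  # Hypothetical key list; in practice it would be taken from the shared
  # keyword list (e.g. via Keyword.keys/1) instead of being written out here.
  @keys [:total, :foo, :bar]

  # Every key becomes a struct field with a default value of 0
  defstruct Enum.map(@keys, &{&1, 0})
end
```

One caveat of this approach: the module that owns the key list has to be compiled before the struct module, so the list would need to live in a separate module or in the same module as the defstruct.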