# Computing Shannon entropy as my first program in Elixir. What concepts should I learn to improve the implementation?

Hello!

I heard about Elixir a few months ago (a Honeypot documentary on Elixir), and I got curious. Unfortunately, I have not had a chance to try it out until today.

Today, I finally took the first step in writing a program in Elixir: a way to compute the Shannon entropy of a word or phrase. I think I got the job done, but I wonder in what ways I can improve this implementation.

Perhaps I’m missing some key feature or concept that makes implementing this function more… Elixir-ly? (By the way, is there a term for ideal Elixir style, the way the Python community has “pythonic”?)

```elixir
defmodule Entropy do
  def prepare(input) do
    [
      input |> String.graphemes() |> Enum.frequencies(),
      input |> String.length()
    ]
  end

  def calc(input) do
    [letter_frequency, input_length] = Entropy.prepare(input)

    letter_frequency
    |> Map.values()
    |> Enum.map(fn x -> x / input_length end)
    |> Enum.map(fn x -> x * :math.log2(1 / x) end)
    |> Enum.sum()
  end
end
```

Generally this looks fine. Two small suggestions:

• I’d use a tuple as the return value of `prepare/1`. It’s the more idiomatic type for returning two distinct values.
• I don’t see a good reason to keep the two `Enum.map` calls apart. You could collapse them into a single one and save yourself from iterating the list twice.
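A hypothetical rewrite applying both suggestions might look like this (same math as the original, just a tuple return and a single pass):

```elixir
defmodule Entropy do
  # Tuple instead of a list: the idiomatic shape for returning
  # two related but distinct values.
  def prepare(input) do
    {input |> String.graphemes() |> Enum.frequencies(), String.length(input)}
  end

  def calc(input) do
    {letter_frequency, input_length} = prepare(input)

    letter_frequency
    |> Map.values()
    # One map instead of two: compute p and p * log2(1/p) together.
    |> Enum.map(fn count ->
      p = count / input_length
      p * :math.log2(1 / p)
    end)
    |> Enum.sum()
  end
end
```

Pattern-matching on the tuple in `calc/1` also fails loudly if `prepare/1` ever changes shape, which a list match does less clearly.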

Also, you can use an `is_binary/1` guard on the `calc` function, just in case,
or collapse them all into a single reduce?

```elixir
|> Enum.reduce(0, &(&2 + (&1 * :math.log2(input_length / &1)) / input_length))
```
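Putting both of those together, a hypothetical version of the original module might read (keeping `prepare/1` as-is; the reduce works on the raw counts, since count/len × log2(len/count) is the same term as p × log2(1/p) with p = count/len):

```elixir
defmodule Entropy do
  def prepare(input) do
    [
      input |> String.graphemes() |> Enum.frequencies(),
      input |> String.length()
    ]
  end

  # The guard rejects non-string input with a FunctionClauseError
  # instead of a confusing error deeper in the pipeline.
  def calc(input) when is_binary(input) do
    [letter_frequency, input_length] = Entropy.prepare(input)

    letter_frequency
    |> Map.values()
    |> Enum.reduce(0, fn count, acc ->
      acc + count * :math.log2(input_length / count) / input_length
    end)
  end
end
```

Whether the single reduce is clearer than the two maps is a matter of taste; the reduce avoids building the intermediate lists, which only matters for very long inputs.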