First Elixir Project (Probably terrible code)

nix2intel · May 13, 2024, 9:51am

I just finished the first iteration of my first Elixir project. I’m still super new to all of this and would love feedback, pull requests, etc. The code is available here GitHub - nix2intel/builtwith: Builtwith API wrapper written in Elixir and the documentation is here on hex builtwith. I would love to know if I’m doing dumb things, if there are different ways I should approach the problems, etc. I love elixir so far but still trying to grok it as python has been my main programming language for years.

dimitarvp · May 13, 2024, 10:02am

To start with, I’d use Req and not HttpPoison.

Also the Map.get function calls you could just replace with pattern-matching IMO.

D4no0 · May 13, 2024, 12:57pm

Maybe Req is now the cool kid in town, however HttpPoison is a battle-tested library that was long before req was created, I use it a lot even to this day. Personally I would just decouple the http client and have a default implementation with whatever is easier to use these days.

nix2intel · May 13, 2024, 2:10pm

Could you describe more what this would look like in this case? right now it kind of looks like this:
Builtwith.make_request(domain: example.com bwpass: api_token_here)

and then it handles building the string in the backend in this case using httpoison. I just want to make sure i’m writing elixiy code and not python code in elixir

nix2intel · May 13, 2024, 2:12pm

Are there any good resources you could point me to for pattern matching? This is a fairly new concept to me. I’d love to not have to do map finds in enum statements to get a map that I can pull out top level items from the json.

arcyfelix · May 13, 2024, 2:33pm

Just out of curiosity - what is your argument for one, not the other?

D4no0 · May 13, 2024, 3:25pm

Instead of using HttpPoison directly in your codebase, you can define a generic http client interface that is fully configurable.

Here is an example of a interface: lib/ssl_moon/network/http/http_client.ex · main · SSL MOON / SSL MOON · GitLab
And implementation: lib/ssl_moon/network/http/http_client_impl.ex · main · SSL MOON / SSL MOON · GitLab

If you plan on using that code, don’t mind the logic related to redirects as that is irrelevant in your case.

muelthe · May 13, 2024, 3:34pm

on the topic of pattern matching, from the docs:
https://hexdocs.pm/elixir/pattern-matching.html

And personally, I really like this source as well:

seeplusplus · May 13, 2024, 4:41pm

AFAIK it’s not advisable to check in the doc directory. It is ignored by the .gitignore generated by ElixirLS; gitignore/Elixir.gitignore at main · github/gitignore · GitHub.

seeplusplus · May 13, 2024, 5:26pm

Some other general feedback (not Elixir specific, hope that’s ok):

Improve names

Method names like make_request can feel obvious in the moment you are writing the code, but don’t truly communicate what they are doing. Instead, consider lookup_by_domain.

Do one thing

Most of the get_ methods are doing more than one thing, for example:

 def get_subdomains(builtwithjson) do
      get_results(builtwithjson)
      |> Enum.find(fn map -> Map.has_key?(map, "Result") end)
      |> case do
        nil ->
          nil
        subdomains ->
        Map.get(subdomains, "Result")
        |> Map.get("Paths")
        |> Enum.map(fn x -> Map.get(x, "SubDomain") end)
        |> Enum.reject(&(&1 == ""))
      end
  end

You are getting the first result from a list of results and getting the SubDomain from it. I know most likely the first result of the list is probably the correct one, otherwise you wouldn’t have built it this way. The issue though is that your library is making a choice that’s hidden from the user. Namely, which one of the results is “correct.” What if my lookup by domain had multiple results and I actually wanted the second one? Instead, make these get_ methods only work with a Result, not a list of Results, e.g. change the above to:

 def get_subdomains(builtwithjson_result) do
      builtwithjson_result
          |> Map.get(subdomains, "Result")
          |> Map.get("Paths")
          |> Enum.map(fn x -> Map.get(x, "SubDomain") end)
        |> Enum.reject(&(&1 == ""))
  end

Now, you do force the library user to pick which result they want, and they could get an error when using the get_ methods, but your current library has the following behavior:

# Imagine the list of results has, in order:
# 1. A result with a "Result" field, but no "Attributes" field.
# 2. A result with an "Atttribute" field.
results = Builtwith.lookup_by_domain("google.com", my_api_key)

# Reads the "SubDomain" field from the first entry in the list of results. 
subdomains = results |> Builtwith.get_subdomains() 

# Read the Attributes field from the second entry in the list of results.
attributes = results |> Builtwith.get_attributes()

The user now has a list of subdomains and a list of attributes, but they aren’t from the same result. This could be very bad for some APIs. This exact scenario may be impossible with the Builtwith API, but it could be possible with another API with a different data model and could have very bad consequences if you get this mixed up.

Micro optimization: Avoid double Enum calls

This is a small one and someone may correct me if I’m incorrect about this. In get_subdomains:

|> Enum.map(fn x -> Map.get(x, "SubDomain") end)
|> Enum.reject(&(&1 == ""))

Each call to Enum will basically allocate a new list (which will increase memory usage). Instead you can use the Stream API:

|> Stream.map(fn x -> Map.get(x, "SubDomain") end)
|> Enum.reject(&(&1 == ""))

This will lead to a slight optimization to the memory footprint of your code. Note, Streams are lazy - the output of a Stream method is always a Stream, so you will always need to end with an Enum method to get the type you actually want at the end.

kokolegorille · May 13, 2024, 5:47pm

BTW Req is inspired by requests, a Python HTTP client…

sodapopcan · May 13, 2024, 5:48pm

Not to speak for @dimitarvp but Andrea L. has a good write-up from a year ago now:

Despite HTTPoinson not being listed in the mini table of contents, he talks about it near the end.

kokolegorille · May 13, 2024, 5:52pm

While I have been using HTTPoison for most of my HTTP clients, I am now switching to Req, it has some cool stuff baked in (retry…)

nix2intel · May 13, 2024, 7:35pm

Pattern-matchings is blowing my mind, thank you! goodbye python, this is so much cooler!

nix2intel · May 13, 2024, 7:36pm

i’ll look into this in the future, I can already tell a major refactoring going to happen here with some of the above and as I get to know elixir better.

nix2intel · May 13, 2024, 7:37pm

this is good to know thank you, what about priv? i’m trying to find all those edge cases.

nix2intel · May 13, 2024, 7:39pm

thank you so much! these are definetly the problems I want to avoid, i’ll be refactoring based on these comments. Elixir seems like such a beautiful and amazing language and community.

seeplusplus · May 13, 2024, 7:42pm

I have priv committed in some of my Phoenix projects, because it contains non-machine generated code (e.g., migrations, javascript, etc).

thank you so much! these are definetly the problems I want to avoid, i’ll be refactoring based on these comments. Elixir seems like such a beautiful and amazing language and community.

I am glad the comments were helpful. And I am glad you are enjoying the language and the community. I feel the same way. The community is one of the best I have seen.

sodapopcan · May 13, 2024, 10:56pm

priv is the canonical place to store external resources, ie, non-Elixir files. Stuff in assets gets copied into it which is ignored by default if you use phx_new. You can also store such things in lib if you use the @external_resource module attribute though that’s a bit more advanced!