Structuring projects - this ok?

Hello,

With your help from Generate a list of structs from a list and Improve a function I have now a working first simple version of my program. Before I want to extend that program, I am wondering whether the structure of my project is a good one or not.

My program just gets the data of the first 100.000 leagues (a “league” has an id, name, etc.) from an external url and inserts them into a repo.

It looks like this (I stripped the files as much as possible):

program.ex

defmodule Program do
  use Application 

  alias Program.LeagueAPI

  def start(_type, _args) do
    children = [
      Program.Repo
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end 

  def start() do
    for x <- 1..100_000 do 
      LeagueAPI.get_data(x) 
    end
  end
end

leagueAPI.ex is like that:

defmodule Program.LeagueAPI do 

  @api_url "http://--hidden-external-api-url"

  alias ProgramAuth
  alias Program.Repo
  alias League
  import SweetXml

  defp get_url(league_id) do
    @api_url <> "&leagueID=" <> to_string(league_id)
  end 

  def get_data(league_id) do
    url = get_url(league_id)
    response = ProgramAuth.get_response(url)

    league_id =
      response.body
      |> xpath(~x"/ProgramData/LeagueID/text()"l)
      |> List.first()
      |> to_string() 

    league_name =
      response.body
      |> xpath(~x"/ProgramData/LeagueName/text()"l)
      |> List.first()
      |> to_string()

    # same for all the other fields

    %Program.League{}
    |> Ecto.Changeset.change(%{
      league_id: league_id, 
      # ... same for all the other fields
    })
    |> Repo.insert(on_conflict: :nothing)
  end
end

and finally league.ex is like that:

defmodule Program.League do
  use Ecto.Schema
  import Ecto.Changeset

  schema "leagues" do
    field(:league_id, :string, unique: true)
    # same for all the other fields
    timestamps()
  end

  def changeset(league, attrs) do
    league
    |> cast(attrs, [
      :league_id, 
      # same for all the other fields
    ])
    |> validate_required([:league_id])
    |> unique_constraint(:league_id)
    |> validate_number(:league_id, greater_than_or_equal_to: Decimal.new(0))
    # same for all the other fields
  end
end

Is my structure with a file for the main program, one file for the schema and one for the API with the functions I implemented in there a good practise or should I put them differently?

Especially I am not sure where the Repo.insert should be placed. Should a module called leagueAPI know the Repo?
Now Repo.insert is in the leagueAPI.ex, but maybe it should be in program.ex and leaguesAPI should only return a league (which is a schema) to program which inserts it? But should leaguesAPI know the schema “league”? Maybe it should return a list/struct instead?

It may seem that I overcomplicate things, but I think it’s better to think in correct structure before attempting much heavier implementations.

Thanks for advice :slight_smile:

1 Like

The parts where you say ‘same for all the other fields’ can probably be extracted to a separate function, to make the get_data function shorter/more readable. Maybe something like:

def get_data(league_id) do
    league_id
  |> build_url()
  |> ProgramAuth.get_response()
  |> parse_response()
end

def parse_response(%{body: response_body}) do
  %Program.League{}
  |> Ecto.Changeset.change(%{
    league_id: extract_xpath(body, ~x"/ProgramData/LeagueID/text()"l),
    league_id: extract_xpath(body, ~x"/ProgramData/LeagueID/text()"l),
    # ... the same for the other fields.
  )
end

def extract_xpath(body, xpath) do
  body
  |> xpath(xpath)
  |> List.first
  |> to_string()
end

Also note that:

  • I decided in the following example to use get for the HTTP request call, and build for local data manipulation.
  • I split off from the get_data function a separate parse_response function. Besides being more readable, this also means that you can test the parsing separately from invoking the remote API.
  • I did not insert anything into the repository inside the API module, to keep that module free from state-modifications. It’s probably nicer to insert the thing into the repository at the place where you call it. For instance the controller inside a Phoenix application (or, if you are working with domain-driven-design, the context module that is invoked by the controller, which itself calls into your API module as part of its work). In your example application, Program is indeed where I would put it.

I do think that it makes sense that the leaguesAPI knows the internals of the League datatype, since it is what it will return. You might like to have a module called League for the datatype, and have the API module live at League.API to indicate clearly that it is conceptually a part of League.

3 Likes

Thanks, quite a bunch of good hints. I am doing my best to get those changes in.

Is there a better way then this?

case LeagueAPI.data(league_id) do
  {:ok, league} -> league |> Repo.insert(on_conflict: :nothing)
end

LeagueAPI.data is now giving a tuple with :ok as first element if the API fetched a valid league.

You are close. However, what do you want to happen when the answer of LeageAPI.data(league_id) is {:error, some_error}? And are there other steps, where you only want to continue if they all are OK?

So I’d suggest either adding an {:error, some_error}-clause (or multiple clauses for various kinds of errors) to the case-statement, or instead use a with statement:

with {:ok, league} <- LeagueAPI.data(league_id),
       # ... maybe some other steps,
  do
    Repo.insert(league, on_conflict: :nothing)
  else 
    # Handle error somehow
end

(And side note: Many developers find a simple function call more readable than a pipeline of only one function. I suggest using credo to get some tips about common ways to structure code and pitfalls to avoid, especially while still starting out. Of course many of the tips are somewhat subjective, but they are a great baseline for a starting Elixir developer.)

2 Likes

Thanks again :slight_smile: