Please help me improve my solution to the "Computing GC Content" challenge on the Rosalind site (bioinformatics topic)

Hi guys!
I would like your help to improve my solution.
my test
I solved the code challenge, but my code is not very clear.

I think it could be much better with other people's help.

Eh, it could be shortened a bit and made a bit more efficient but not really that much more readable either. I’d probably have the actual calculation be something like this though (as it is significantly faster):

iex(1)> dna = "AGCTATAG"
iex(2)> Enum.reduce(to_charlist(dna), 0, &if(&1==?C or &1==?G, do: &2+1, else: &2))/byte_size(dna)

If that were wrapped in a case do ... end, the whole thing could be pipelined into just a dozen lines or so. :slight_smile:


Here’s a version with some helper functions split out, using Stream to eliminate intermediate lists, and binary matching to count the G and C characters:

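defmodule Gc do
  # Returns the {id, gc_percent} pair with the highest GC content.
  def gc_content(dataset) do
    {key, gc_percent} =
      dataset
      |> parse_lines()
      |> Stream.map(fn {k, v} -> {k, gc_percent(v)} end)
      |> Enum.max_by(&elem(&1, 1))

    {key, gc_percent}
  end

  # Strips newlines, splits the records on ">", then separates each ID from
  # its sequence. Assumes every ID is exactly 13 characters ("Rosalind_xxxx").
  @spec parse_lines(String.t()) :: Enumerable.t()
  def parse_lines(dataset) do
    dataset
    |> String.replace("\n", "")
    |> String.split(">", trim: true)
    |> Stream.map(&String.split_at(&1, 13))
  end

  @spec gc_percent(String.t()) :: float
  def gc_percent(val), do: Float.round(100 * gc_count(val) / String.length(val), 7)

  # Counts G and C by matching on the head of the binary, one character at a time.
  @spec gc_count(String.t(), integer) :: integer
  def gc_count(val, n \\ 0)
  def gc_count("", n), do: n
  def gc_count("G" <> rest, n), do: gc_count(rest, n + 1)
  def gc_count("C" <> rest, n), do: gc_count(rest, n + 1)
  def gc_count(<<_::utf8>> <> rest, n), do: gc_count(rest, n)
end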

Cool solution!
You're right, your code is hard to read :sweat_smile:
But sometimes performance matters more.
Since bioinformatics usually deals with large files and datasets, that's really important.

I've been thinking about your solution and how to make it more readable.
The "if" can be turned into function clauses.
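
A minimal sketch of what that might look like, assuming a hypothetical module `GcCount` with a private helper `count_base/2` (both names are made up for illustration and do not come from the thread). It keeps the `Enum.reduce/3` approach from the one-liner above, but moves the `if` into three function clauses:

```elixir
defmodule GcCount do
  # Hypothetical module and function names, for illustration only.

  # Returns the fraction of G and C bases in a non-empty DNA string (0.0 to 1.0).
  def gc_ratio(dna) when is_binary(dna) and byte_size(dna) > 0 do
    gc =
      dna
      |> to_charlist()
      |> Enum.reduce(0, &count_base/2)

    gc / byte_size(dna)
  end

  # One clause per case replaces the inline `if`.
  defp count_base(?G, acc), do: acc + 1
  defp count_base(?C, acc), do: acc + 1
  defp count_base(_other, acc), do: acc
end
```

For example, `GcCount.gc_ratio("AGCTATAG")` counts three G/C bases out of eight. Whether this reads better than the inline `if` is a matter of taste, and it probably won't be any faster.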

What do you think?
