New to elixir, would appreciate a code review very much

I’m very new to Elixir and hoped someone might have a few moments to check over my first Elixir program. It’s a pretty direct translation of a Racket program I wrote, so I would especially value any feedback on more effective or Elixiry approaches.

The purpose of the program is to ingest a BibLaTeX file (@bibfile), which will be processed and written out to @outfile. Processing involves

  • Regularizing the keywords (conveniently on lines starting "“Keywords = {”), so each keyword is separated by commas (rather than comma or semicolon) and capitalized
  • Passing all of other lines unchanged

For example,

@book{Snoopy-Dark-01,
  Author = {Snoopy},
  Keywords = {fiction; Unfinished Books},
  Booktitle = {It was a Dark and Stormy Night}}

should end up as

@book{Snoopy-Dark-01,
Author = {Snoopy},
Keywords = {Fiction, Unfinished books},
 Booktitle = {It was a Dark and Stormy Night}}

Pretty basic, but a good learning experience.

Thank you very much in advance. I’m really enjoying both the Elixir language and community.

defmodule BibCleaner do
  @bibfile "/Users/ghoetker/BibDeskPapers/masterbibliography.bib"
  @outfile "test_out.bib"
  @moduledoc """
  Clean BibLaTeX files
  """

  def clean do
    {:ok, data} = File.read(@bibfile)
    {:ok, file} = File.open(@outfile, [:write, :utf8])

    data
    |> String.split("\n")
    |> Enum.map(&process(&1))
    |> Enum.map(&IO.puts(file, &1))

    File.close(file)
  end

  defp process(astring) do
    cond do
      String.contains?(astring, "\tKeywords = {") ->
        astring
        |> String.replace_leading("\tKeywords = {", "")
        |> String.replace_trailing("},", "")
        |> String.split([", ", "; "])
        |> Enum.map(&String.capitalize/1)
        |> Enum.join(", ")
        |> (fn x -> "\tKeywords = {" <> "#{x}" <> "}," end).()

      true ->
        "#{astring}"
    end
  end
end
2 Likes

From a quick glance, lgtm. Having said that - and this has nothing to do with Elixir - if this is more than a one-off, I would parse that thing into some data structure, operate on the data structure, and then write it out to a string again. May be me, but I prefer to have the ugly string scanning stuff and the stuff I’m actually trying to accomplish separate :wink:

2 Likes

Instead of using cond, you could do this which seems a bit more Elixir-y to me:


  defp process(astring) do
    if String.contains?(astring, "\tKeywords = {"),
      do: do_process(astring),
      else: astring
  end

  defp do_process(astring)
    astring
    |> String.replace_leading("\tKeywords = {", "")
    |> String.replace_trailing("},", "")
    |> String.split([", ", "; "])
    |> Enum.map(&String.capitalize/1)
    |> Enum.join(", ")
    |> (fn x -> "\tKeywords = {" <> "#{x}" <> "}," end).()
  end

Was there any particular reason you were using "#{astring}"? It seems like this just creates a string that is identical to the previous string …

3 Likes

Eh, that itself is technically a shorter way of just calling to_string(astring), which can be important is astring is an IOList or integer or something.

I suppose, but the input seems to be coming from String.split anyways :slight_smile:

1 Like

Thank you all very, very much.

Replacing cond makes a lot of sense. Not sure why I’d gone to "#{astring}", but just astring works just fine.

I agree that parsing into a data structure would make sense if I were doing anything more complex. Bib(La)TeX is infamous for being difficult to parse (in its defense, we’ve learned a lot in the 33 years since it was invented), so I’ve avoided that pain point for this simple task.

Very helpful and much appreciated. Thank you again.

2 Likes

If you’re already interpolating, you can avoid the binary concats.

(fn x -> “\tKeywords = { #{x} },” end).()

And in general this whole line is kind of a smell to me. Not in what it does, but in the way you’re forced to write it. I tried to come up with something better, but couldn’t. Would be nice if there was a Pipe module with convenience functions to help make this type of stuff prettier.

|> Pipe.map(“\tKeywords = { #{&1} },”)

or some such.

PS: As another elixir noobie, I love this thread. Wish there were many more.

Thanks for the lead on avoiding the binary concats. In terms of pipes going anywhere but the “front” of a function, that may just be a trade-off of what makes them so straight-forward. I could have written a private function to do that, I think, and probably will next time.

Glad you found the thread useful. I’ve really enjoyed the Forum, overall.

Happy Elixiring.

1 Like

Sure, you can definitely write your own private map function and then do the same thing and clean up the syntax a bit. But I assume this is a very common problem, and I’m surprised this is the accepted solution. I’m still hoping for one of my betters to demonstrate something truly elegant that we’re apparently missing.

Also, what’s the best way to pipe to a file? Seems like an obvious thing people would want to do.

"contents"
|> <???>.write(path) # or <???>.write(iodevice)

This isn’t ideal, but looks slightly better:

    astring
    |> String.replace_leading("\tKeywords = {", "")
    |> String.replace_trailing("},", "")
    |> String.split([", ", "; "])
    |> Enum.map(&String.capitalize/1)
    |> Enum.join(", ")
    |> String.replace_prefix("", "\tKeywords = {  ")
    |> String.replace_postfix("", " },")

I wouldn’t mind a Pipe.map/2. In this case a String.prepend/2 and String.append/2 would also work if they existed.

I also feel like this processing pipeline could be made more resilient and clear by using a regex to extract the stuff. Then process the stuff. Then interpolate back into a string.

I know that when I was playing around locally, the string I passed into the process function didn’t include a leading “\t”. And I didn’t get the expected output. Is it critical to always anchor on a leading tab? I dunno. Possibly not.

I don’t have a working regex example yet, but it would be something I’d look into. You’d only save 1 line, but clarity could be improved. There may be a String.replace function that is more easily piped. You’d supply your string containing an anchor to replace, and then your fixed keywords. Could work.

I also wonder if the pipe macro could be improved to accept an optional parameter position for injection. Something like:

"contents"
|> 1, File.write("path") # 0 index default

You could also try this if you don’t like the full fn syntax:

|> (&("\tKeywords = { " <> &1)).()
|> (&(&1 <> " }")).()

@CptnKirk @blatyo There is a package on Hex, PipeTo, that appears to offer much of this functionality. Adds a new pipe ~> with a placeholder _, leading to functionality like the following example

1 ~> Enum.at(1..3, _) # yields 2

The |> pipe continues to function as normal. In fact, if no placeholder is used, ~> performs just like |>. So, one could, I suppose, use ~> exclusively.

Haven’t played with it at all. Until I get my feet fully under me, I’ll err on the side of the basics, but you might find it worth looking into. Thanks for all your help.

@CptnKirk Interesting difference in philosophies. Perhaps because of my limited skills, I generally follow the quote attributed to Jamie Zawinski: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”

1 Like