I’m very new to Elixir and hoped someone might have a few moments to check over my first Elixir program. It’s a pretty direct translation of a Racket program I wrote, so I would especially value any feedback on more effective or Elixiry approaches.
The purpose of the program is to ingest a BibLaTeX file (@bibfile), which will be processed and written out to @outfile. Processing involves
Regularizing the keywords (conveniently on lines starting "“Keywords = {”), so each keyword is separated by commas (rather than comma or semicolon) and capitalized
Passing all of other lines unchanged
For example,
@book{Snoopy-Dark-01,
Author = {Snoopy},
Keywords = {fiction; Unfinished Books},
Booktitle = {It was a Dark and Stormy Night}}
should end up as
@book{Snoopy-Dark-01,
Author = {Snoopy},
Keywords = {Fiction, Unfinished books},
Booktitle = {It was a Dark and Stormy Night}}
Pretty basic, but a good learning experience.
Thank you very much in advance. I’m really enjoying both the Elixir language and community.
defmodule BibCleaner do
@bibfile "/Users/ghoetker/BibDeskPapers/masterbibliography.bib"
@outfile "test_out.bib"
@moduledoc """
Clean BibLaTeX files
"""
def clean do
{:ok, data} = File.read(@bibfile)
{:ok, file} = File.open(@outfile, [:write, :utf8])
data
|> String.split("\n")
|> Enum.map(&process(&1))
|> Enum.map(&IO.puts(file, &1))
File.close(file)
end
defp process(astring) do
cond do
String.contains?(astring, "\tKeywords = {") ->
astring
|> String.replace_leading("\tKeywords = {", "")
|> String.replace_trailing("},", "")
|> String.split([", ", "; "])
|> Enum.map(&String.capitalize/1)
|> Enum.join(", ")
|> (fn x -> "\tKeywords = {" <> "#{x}" <> "}," end).()
true ->
"#{astring}"
end
end
end
From a quick glance, lgtm. Having said that - and this has nothing to do with Elixir - if this is more than a one-off, I would parse that thing into some data structure, operate on the data structure, and then write it out to a string again. May be me, but I prefer to have the ugly string scanning stuff and the stuff I’m actually trying to accomplish separate
Eh, that itself is technically a shorter way of just calling to_string(astring), which can be important is astring is an IOList or integer or something.
Replacing cond makes a lot of sense. Not sure why I’d gone to "#{astring}", but just astring works just fine.
I agree that parsing into a data structure would make sense if I were doing anything more complex. Bib(La)TeX is infamous for being difficult to parse (in its defense, we’ve learned a lot in the 33 years since it was invented), so I’ve avoided that pain point for this simple task.
Very helpful and much appreciated. Thank you again.
If you’re already interpolating, you can avoid the binary concats.
(fn x -> “\tKeywords = { #{x} },” end).()
And in general this whole line is kind of a smell to me. Not in what it does, but in the way you’re forced to write it. I tried to come up with something better, but couldn’t. Would be nice if there was a Pipe module with convenience functions to help make this type of stuff prettier.
|> Pipe.map(“\tKeywords = { #{&1} },”)
or some such.
PS: As another elixir noobie, I love this thread. Wish there were many more.
Thanks for the lead on avoiding the binary concats. In terms of pipes going anywhere but the “front” of a function, that may just be a trade-off of what makes them so straight-forward. I could have written a private function to do that, I think, and probably will next time.
Glad you found the thread useful. I’ve really enjoyed the Forum, overall.
Sure, you can definitely write your own private map function and then do the same thing and clean up the syntax a bit. But I assume this is a very common problem, and I’m surprised this is the accepted solution. I’m still hoping for one of my betters to demonstrate something truly elegant that we’re apparently missing.
Also, what’s the best way to pipe to a file? Seems like an obvious thing people would want to do.
"contents"
|> <???>.write(path) # or <???>.write(iodevice)
I also feel like this processing pipeline could be made more resilient and clear by using a regex to extract the stuff. Then process the stuff. Then interpolate back into a string.
I know that when I was playing around locally, the string I passed into the process function didn’t include a leading “\t”. And I didn’t get the expected output. Is it critical to always anchor on a leading tab? I dunno. Possibly not.
I don’t have a working regex example yet, but it would be something I’d look into. You’d only save 1 line, but clarity could be improved. There may be a String.replace function that is more easily piped. You’d supply your string containing an anchor to replace, and then your fixed keywords. Could work.
I also wonder if the pipe macro could be improved to accept an optional parameter position for injection. Something like:
"contents"
|> 1, File.write("path") # 0 index default
@CptnKirk@blatyo There is a package on Hex, PipeTo, that appears to offer much of this functionality. Adds a new pipe ~> with a placeholder _, leading to functionality like the following example
1 ~> Enum.at(1..3, _) # yields 2
The |> pipe continues to function as normal. In fact, if no placeholder is used, ~> performs just like |>. So, one could, I suppose, use ~> exclusively.
Haven’t played with it at all. Until I get my feet fully under me, I’ll err on the side of the basics, but you might find it worth looking into. Thanks for all your help.
@CptnKirk Interesting difference in philosophies. Perhaps because of my limited skills, I generally follow the quote attributed to Jamie Zawinski: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”