Replacing placeholders in a string from CSV values

hakarabakara · June 12, 2021, 3:32am

Hi,

My application accepts a string input and dynamic values from an uploaded CSV file and should generate a response based on the csv.

The string has placeholders have corresponding values in the CSV under the headers.

I was wondering whether there is a more efficient way of reading the stream once to perform the whole operation as opposed to fetching columns which are in the first row then streaming through the file again replacing the occurrence of the header in the string.

mindok · June 12, 2021, 4:20am

Hi,

It’s not quite clear to me what you’re trying to achieve. It might be useful to show a small example of the two inputs (string input and CSV file) and expected output.

hakarabakara · June 12, 2021, 4:33am

Thank you for your response and apologies for not being clear.

Assuming this is my csv:

name,email,balance
Francis Waters,jolir@jalih.mz,$1810.08
Ina Thomas,duzzigip@hizjos.cl,$5639.13
George Cortez,siw@jijol.ma,$222.81
Oscar Nguyen,ov@rici.nu,$7167.56
Wayne Campbell,nad@tuj.jp,$964.14

My text looks something like this:

#name, you owe us #balance this month. Please see your statement here #email

The result for each row should be something like this:

Francis Waters, you owe us $1810.08 this month. Please see your statement here jolir@jalih.mz

Sebb · June 12, 2021, 6:13am

Still not sure what you want, is it this?

test "sentences" do
  data = """
  name,email,balance
  Francis Waters,jolir@jalih.mz,$1810.08
  Ina Thomas,duzzigip@hizjos.cl,$5639.13
  George Cortez,siw@jijol.ma,$222.81
  Oscar Nguyen,ov@rici.nu,$7167.56
  Wayne Campbell,nad@tuj.jp,$964.14
  """
  
  rows = String.split(data, "\n")
  Enum.map(rows, fn line -> line |> String.split(",") |> sentence() end)
end


defp sentence([name, email, balance]) do
  "#{name}, you owe us #{balance} this month. Please see your statement here #{email}"
end
defp sentence(_), do: ""

["name, you owe us balance this month. Please see your statement here email",
 "Francis Waters, you owe us $1810.08 this month. Please see your statement here jolir@jalih.mz",
 "Ina Thomas, you owe us $5639.13 this month. Please see your statement here duzzigip@hizjos.cl",
 "George Cortez, you owe us $222.81 this month. Please see your statement here siw@jijol.ma",
 "Oscar Nguyen, you owe us $7167.56 this month. Please see your statement here ov@rici.nu",
 "Wayne Campbell, you owe us $964.14 this month. Please see your statement here nad@tuj.jp",
 ""]

mindok · June 12, 2021, 6:28am

Using Enum.take(1) on a stream to get the headers should be pretty efficient. Using Nimble CSV:

  alias NimbleCSV.RFC4180, as: CSV

 ...
 # Creates a list of header names
 [header] 
  = input_csv_stream 
    |> CSV.parse_stream(skip_headers: false) 
    |> Enum.take(1)

 # Then you can do what you need to with the main CSV. If you are a bit lazy (like me) you can zip
 # the header with the value into a map to allow easy lookup for processing your string template
 data 
  = input_csv_stream
    |> CSV.parse_stream(skip_headers: true)
    |> Stream.map(fn row -> Enum.zip(header, row) |> Enum.into(%{}) end)
    |> Stream.map(fn row_map -> template_wrangling_stuff(template, row_map) end)
    |> Enum.take(10) # or however you want to deal with the output

Sebb · June 12, 2021, 7:02am

It takes about 1 sec to load and convert a 1.000.000 lines CSV.
So I’d just stick with the naive implementation for this one (and - having about 1 Billion revenue in that file - just buy a supercomputer to speed things up if need be).

hakarabakara · June 12, 2021, 8:17am

Thank you Sebb for this.

My columns are however dynamic, won’t always be name, email and balance.

Sebb · June 12, 2021, 8:31am

OK, more information needed. It’s still not clear what you want to do.

hakarabakara · June 12, 2021, 8:56am

I need to achieve exactly what you did but bearing in mind that the csv titles and content would be dynamic, for example the next template can be exam scores so message would look like this:

hi #name, you have scored #physics in physics, #chemistry in chemistry and #maths in maths

The csv would then look something like this:

name,maths,chemistry,physics
James Ortega,44,14,49
Eula Garrett,26,74,16
Emily Marsh,47,22,73
Tommy Jenkins,37,54,55
Ronnie Nash,54,04,31

Sebb · June 12, 2021, 9:12am

I hacked that into the first solution, so you can get an idea.

def load(csv) do
  [header | rows] = String.split(csv, "\n")
  fun = &sentence(&1, header)
  Enum.map(rows, fn line -> line |> String.split(",") |> fun.() end)
end

defp sentence([name, email, balance], "name,email,balance") do
  "#{name}, you owe us #{balance} this month. Please see your statement here #{email}"
end

defp sentence([other, header], "other,header") do
  "#{other}, ... #{header} ..."
end

defp sentence(_, _), do: ""

EDIT this wont work: If this does what you want, I’d create a lookup header => template instead of the multiple functions.

amnu3387 · June 12, 2021, 3:00pm


template =
      "hi #name, you have scored #physics in physics, #chemistry in chemistry and #maths in maths"

input = """
    name,maths,chemistry,physics
    James Ortega,44,14,49
    Eula Garrett,26,74,16
    Emily Marsh,47,22,73
    Tommy Jenkins,37,54,55
    Ronnie Nash,54,04,31
"""

[headers] =
      input
      |> NimbleCSV.RFC4180.parse_string(skip_headers: false)
      |> Enum.take(1)

replacement_hs =
      headers
      |> Enum.map(fn header -> "#" <> header end)

input
|> NimbleCSV.RFC4180.parse_string(skip_headers: true)
|> Enum.map(fn row ->
      Enum.zip_reduce(row, replacement_hs, template, fn col_val, header, acc ->
        String.replace(acc, header, col_val)
    end)
end)

The only reasons to use NimbleCSV here is because sometimes there can be escaped separation characters and if you have a streamable source it’s just easier to get it working. Using @mindok example but with a string I would just map the headers into their replacement tokens ("#some_header") and then brute force the replacements unless it turned out to be slower than acceptable at which maybe compiling a custom regex could be faster.

Sebb · June 12, 2021, 5:56pm

Definitely use a CSV-parser. I just wanted to keep it simple.

I like your little template-engine, but maybe its not really beginner-friendly (and it doesn’t address the selection of the template by CSV header).

my solution with nimble

def load(csv) do
  [header | rows] = NimbleCSV.RFC4180.parse_string(csv, skip_headers: false)
  Enum.map(rows, fn row -> sentence(row, header) end)
end

defp sentence([name, email, balance], ["name", "email", "balance"]) do
  "#{name}, you owe us #{balance} this month. Please see your statement here #{email}"
end

defp sentence([other, header], ["other", "header"]) do
  "#{other}, ... #{header} ..."
end

@mnussbaumer’s solution is better because you don’t need to write code for the templates, could even put them in a json if you want. Just create a map header -> template from which to select the template you need.

hakarabakara · June 13, 2021, 2:01am

Thank you very much.

This works perfectly on my Ubuntu box which allows upgrading to 1.12 of elixir which introduced Enum.zip_reduce. Production VM runs Centos 7 so I’m stuck at 1.10 unfortunately. The IP is whitelisted by a service provider and they’ve dragged their feet in getting another whitelisted where we can have Debian

amnu3387 · June 13, 2021, 3:13am

In the docs it’s mentioned it’s basically sugar for Enum.reduce(Stream.zip(enums), acc, reducer)

It should work with:

Stream.zip(row, replacement_hs)
|> Enum.reduce(template, fn {col_val, header}, acc ->
      String.replace(acc, header, col_val)
end)

Perhaps extracting that bit into its own fun

@spec template_token_replacement(template :: String.t(), row :: list(String.t()), tokens :: list(String.t()) :: String.t()
def template_token_replacement(template, row, replacement_tokens) do
  row
  |> Stream.zip(replacement_tokens)
  |> Enum.reduce(template, fn {col_val, header}, acc ->
      String.replace(acc, header, col_val)
  end)
end

Sebb · June 13, 2021, 1:48pm

I think this is more readable.

Another thing about the templates: just found out, that EEx is part of Elixir, so the templates could also be written in EEx (without any dependency). But as I understand it, the assigns have to be in the form [{:atom, <value>}, ...] so the header would have to be converted to atoms.

EEx.eval_string(
   "<%= @name %>, you owe us <%= @balance %> this month...",
   [name: "Wayne Campbell", balance: "$964.14", ...]
)

l00ker · June 13, 2021, 6:16pm

Not necessarily. You can use a mix release which will include ERTS and the compiled Elixir app. You don’t need to have Erlang & Elixir installed on the production VM. That’s the real beauty of a mix release.

A little off topic but relevant to my suggestion:

The easiest way to do this is to install the lxd snap on your Ubuntu box and then launch a CentOS 7 container and shell into it.

Inside the CentOS 7 container install asdf, then the Erlang & Elixir asdf plugins. Then install all of the dependencies required to build Erlang (see the asdf erlang plugin repo).

After that install Erlang & Elixir via asdf, copy your mix project into the container and build the release. Upload the release the the production VM and run it.

It might sound like a bit much, but it’s actually not that complicated and once it’s done, you will then have the CentOS 7 container around to use for future releases as well as a playground to build and test anything else that you don’t want to install on the production CentOS server. I’ve been using LXC containers for all sorts of things like that for years now.