Excel Sheet(xslx) Import to Postgres DB in Elixir Phoenix

I am working on one project where I have to import data from an Excel sheet to Postgres DB. In my Previous Project, I have done the CSV File import using https://hex.pm/packages/nimble_csv. It works like a charm.

In this Project, I am dealing with Excel files.

  NimbleCSV.define(NimbleCSV.Spreadsheet, [])
  NimbleCSV.define(MyParser,  escape: "\"")

  alias NimbleCSV.RFC4180, as: CSV

  def import_shipments(conn, %{"params" => %{"import_file" => params} }) do

    params.path
      |> File.stream!(read_ahead: 100_000)
      |> MyParser.parse_stream([skip_headers: true])
      |> Stream.map(fn n ->
          case n do
            [name, barcode] ->
                IO.inspect name
                IO.inspect barcode
            _ -> IO.inspect "empty"
          end
        end)
      |> Stream.run


    render(conn, "index.html")
  end

I am getting this error.

** (NimbleCSV.ParseError) unexpected escape character " in <<101, 236, 208, 147, 208, 215, 243, 190, 72, 245, 234, 52, 141, 234, 136, 196, 67, 12, 6, 170, 162, 4, 133, 193, 197, 118, 8, 189, 129, 253, 238, 237, 225, 25, 86, 205, 162, 222, 224, 104, 37, 143, 176, 31, 18, 171, 188, 19, 216, 128, ...>>

Can anyone give me insight on this? Your help is greatly appreciated. Thanks

The data doesn’t, at first glance, look like UTF-8 encoding. Did you export the data explicitly as UTF-8? ie CSV UTF-8 (Comma-delimited) (.csv) below?

I got the sample excel file from the client. It was stored under this format.

Screenshot from 2020-05-06 09-57-31

I think the issue is that you actually want UTF-16 encoding, but the Parser is trying to use UTF-8. Try changing

alias NimbleCSV.RFC4180, as: CSV

to

alias NimbleCSV.Spreadsheet, as: CSV

as described in the docs.

I changed

  NimbleCSV.define(NimbleCSV.Spreadsheet, [])
  NimbleCSV.define(MyParser,  escape: "\"")

  alias NimbleCSV.Spreadsheet, as: CSV

  def import_shipments(conn, %{"shipment_params" => %{"import_file" => params} }) do

    params.path
      |> File.stream!(read_ahead: 100_000)
      |> MyParser.parse_stream([skip_headers: true])
      |> Stream.map(fn n ->
          case n do
            [name, barcode] ->
                IO.inspect name
                IO.inspect barcode
            _ -> IO.inspect "empty"
          end
        end)
      |> Stream.run


    render(conn, "index.html")
  end

But again the same error is coming

** (NimbleCSV.ParseError) unexpected escape character " in <<80, 75, 3, 4, 20, 0, 8, 8, 8, 0, 34, 35, 166, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 120, 108, 47, 95, 114, 101, 108, 115, 47, 119, 111, 114, 107, 98, 111, 111, 107, 46, 120, 109, ...>>

Is it a CSV file you are parsing or an excel file?

Edit: It doesn’t look like CSV, in which case NimbleCSV is the wrong tool for the job…

1 Like

Did you export the file as CSV? The library assumes that starting format, not XLSX, etc.

1 Like

Here is the test file which I am using

It’s an excel file.

What I’m saying is I think you’'ll have to export it (save as CSV), in order to use NimbleCSV to parse it (irrespective of whether you use the UTF-16 or UTF-8 parser from the library).

I don’t know if this is your specific situation, but I have had multiple issues reading data from excel when the data was coming from a weird encoding database export (saving the excel as utf-8 csv did nothing to help).

I had to use codepagex in order to successfully read that data.

1 Like

I tried with Codepagex lib

  def import_shipments(conn, %{"shipment_params" => %{"import_file" => params} }) do

    params.path
      |> File.stream!(read_ahead: 100_000)
      |> MyParser.parse_stream([skip_headers: true])
      |> Stream.map(fn n ->
          case n do
            [name, barcode] ->
                IO.inspect Codepagex.to_string(name, :iso_8859_1)
                IO.inspect Codepagex.to_string(barcode, :iso_8859_1)
            _ -> IO.inspect "empty"
          end
        end)
      |> Stream.run


    render(conn, "index.html")
  end

But still the same error.

** (NimbleCSV.ParseError) unexpected escape character " in <<80, 75, 3, 4, 20, 0, 8, 8, 8, 0, 34, 35, 166, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 120, 108, 47, 95, 114, 101, 108, 115, 47, 119, 111, 114, 107, 98, 111, 111, 107, 46, 120, 109, ...>>

I think in this line itself it is throwing error

[name, barcode]

Is <<0x50, 0x4b, 0x03, 0x04, …>>, which again is the “magic number” for ZIP archives, which again is the envelop format for excel spreadsheets.

nimble_csv can’t work with those, as nimble_csv is a CSV parser, not an Excel Spreadsheet parser.

Please use a library suited to Excel Spreadsheets or open your data in Excel and export it as CSV.

4 Likes

Use this for excel

https://github.com/TheFirstAvenger/elixir-xlsx_parser
1 Like