Explorer DataFrame.new like Pandas (rows and columns)

Dear all,

I’m trying to use the new Explorer library’s DataFrame the way we use pandas (Python), loading it from arrays of rows and columns.

I have code that parses an HTML table and produces one array of columns and one array of rows.

  def transform_request(body) do
    columns =
      body
      |> Floki.parse_document!()
      |> Floki.find("td.dxgvHeader_DevEx")
      |> Floki.find("tr")
      |> Enum.map(fn {"tr", [],
                      [
                        {"td", [{"style", "font-size:10pt;font-weight:normal;"}], [topic]},
                        {"td", [{"style", "width:1px;text-align:right;"}], [_topic_span]}
                      ]} ->
                        String.downcase(topic)
      end)

    rows =
      body
      |> Floki.parse_document!()
      |> Floki.find("tr.dxgvDataRow_DevEx")
      |> Floki.find("td")
      |> Enum.map(&parse_row/1)

    DF.new(rows, columns)
  end

  defp parse_row(
         {"td", [{"class", "dxgv"}, {"align", "right"}, {"style", "font-family:Arial;"}], [topic]}
       ) do
    #%{data: String.trim(topic)}
    String.trim(topic)
  end

  defp parse_row(_), do: ""

How can I build an Explorer DataFrame from columns and rows as input? By the way, the equivalent code in pandas works well, e.g. pd.DataFrame(rows[1:], columns=rows[0]).

Can you show an example of the input and output you’re expecting? I don’t understand what’s supposed to happen with rows and columns both just being lists of strings.

Hi!

Sure:

I would expect JSON as output, or something like this:

{patient: Sergio, age: 40, gender: M, …}
{patient: Maria, age: 12, gender: F, …}
{patient: João, age: 21, gender: M, …}
{patient: Mirela, age: 52, gender: F, …}

In addition, it will be the output of a REST API.

Thanks.

P.S. Sorry for my English.

Explorer accepts a list of maps in its DataFrame.new function.

columns = ["patient", "age", "gender"]

rows = [
  ["Sergio", 40, "M"],
  ["Maria", 12, "F"],
  ["João", 21, "M"],
  ["Mirela", 52, "F"]
]

rows
|> Enum.map(fn row ->
  columns
  |> Enum.zip(row)
  |> Enum.into(%{})
end)
|> Explorer.DataFrame.new()
#Explorer.DataFrame<
  Polars[4 x 3]
  age integer [40, 12, 21, 52]
  gender string ["M", "F", "M", "F"]
  patient string ["Sergio", "Maria", "João", "Mirela"]
>
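
As far as I know, DataFrame.new also accepts column-oriented data (a map or keyword list of column name to list of values), so another option is to transpose the rows instead of building one map per row. A minimal sketch in plain Elixir, reusing the sample data above; the variable names column_values and data are only illustrative, and the final call is left as a comment because it needs Explorer installed:

```elixir
columns = ["patient", "age", "gender"]

rows = [
  ["Sergio", 40, "M"],
  ["Maria", 12, "F"],
  ["João", 21, "M"],
  ["Mirela", 52, "F"]
]

# Transpose the rows: each inner list now holds the values of one column.
column_values = Enum.zip_with(rows, fn values -> values end)

# Pair every column name with its list of values.
data = columns |> Enum.zip(column_values) |> Map.new()

# With Explorer available, the map can be handed to the constructor:
# Explorer.DataFrame.new(data)
```

This assumes every row has the same number of values as there are columns; Enum.zip_with stops at the shortest row, so a short row would silently truncate the others.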

By the way, it sounds like you will be pushing this into a DB and serving it over an API. Unless you are using Explorer for other data manipulation, you might want to skip DataFrames altogether.
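
If you do skip the DataFrame, the list of maps on its own is already the shape a JSON encoder wants. One thing to watch: in the original transform_request/1, rows looks like a flat list of cells, because Floki.find("td") collects the cells of every row into a single list, so it would need to be split into rows first. A rough sketch, assuming each table row has exactly as many cells as there are column names; TableRows.to_maps/2 is a hypothetical helper, not part of Floki or Explorer:

```elixir
defmodule TableRows do
  # Hypothetical helper: turns a flat list of cells into one map per table row.
  def to_maps(columns, cells) do
    width = length(columns)

    cells
    # Split the flat list into rows of `width` cells, dropping an incomplete trailing row.
    |> Enum.chunk_every(width, width, :discard)
    # Pair each cell with its column name.
    |> Enum.map(fn row -> columns |> Enum.zip(row) |> Map.new() end)
  end
end

columns = ["patient", "age", "gender"]
# Parsed cells arrive as strings, in document order.
cells = ["Sergio", "40", "M", "Maria", "12", "F"]

records = TableRows.to_maps(columns, cells)
```

The resulting records can then go to whatever JSON library the API already uses. Note that the values stay strings here; converting "40" to an integer would be a separate step.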


You’re right! It worked!

Thank you!