Error decoding data from external source

Hi all, I’m trying to grab some data from a txt file online and running into a strange error when trying to use Jason.decode and Poison.decode. Here is code to reproduce and then the error.

{:ok, %HTTPoison.Response{status_code: 200, body: body}} = HTTPoison.get("https://www.ndbc.noaa.gov/data/realtime2/44097.spec")

body |> String.split("\n") |> Enum.map(fn row -> String.split(row, " ", trim: true) end) |> Enum.drop_every(2_500) |> Enum.drop_every(2_500) |> Jason.decode!()
(Jason.DecodeError) unexpected byte at position 16: 0x2E (".")

Well, the data is not JSON. Even after you split it in pieces, those pieces are still not valid JSON.

Working with the first text row that is not commented out:

iex(1)> text = "2023 07 27 15 56  1.1  0.1 10.5  1.1  4.8   S  SW VERY_STEEP  4.1 217"
"2023 07 27 15 56  1.1  0.1 10.5  1.1  4.8   S  SW VERY_STEEP  4.1 217"
iex(2)> String.split(text, " ", trim: true)
["2023", "07", "27", "15", "56", "1.1", "0.1", "10.5", "1.1", "4.8", "S", "SW",
 "VERY_STEEP", "4.1", "217"]

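Jason.decode!/1 expects JSON text as its input, for example a string containing a JSON array or a JSON object. Your nested list is instead treated as iodata and concatenated into one string, which is not valid JSON. A valid input looks like this: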

iex(3)> "[\"2023\", \"07\"]" |> Jason.decode!()
["2023", "07"]

So your error is very logical. What’s your expected output per row? This looks more like CSV data (with a space or tab separator) than JSON.

You’re right, I can use the data as is; I was just adding an unnecessary step. Thanks.

You can either use a CSV library like the excellent nimble_csv with a custom separator (skipping the first N rows, which are commented out), or do your own mapping. For example, you can take the record from above, ["2023", "07", "27", "15", "56", "1.1", "0.1", "10.5", "1.1", "4.8", "S", "SW", "VERY_STEEP", "4.1", "217"], and feed it to a function that converts it to a map with informative keys.
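
A minimal sketch of the "own mapping" route, using only the standard library. The module name, the function name and the map keys are my own choices (the keys follow the column names used in this thread), not an official schema. It also assumes every numeric column really holds a number; a placeholder for a missing value would raise a MatchError here.

```elixir
defmodule SpecRecord do
  # Hypothetical helper module, not part of any library.
  # Converts one already-split row (a list of 15 strings) into a map
  # with a timestamp, numeric values and the remaining text columns.
  def to_map([
        yy, mm, dd, hh, min,
        wvht, swh, swp, wwh, wwp,
        swd, wwd, steepness, apd, mwd
      ]) do
    [year, month, day, hour, minute] =
      Enum.map([yy, mm, dd, hh, min], &String.to_integer/1)

    # The file has no seconds column, so seconds are set to 0.
    {:ok, timestamp} = NaiveDateTime.new(year, month, day, hour, minute, 0)

    %{
      timestamp: timestamp,
      wave_height: to_float(wvht),
      swell_height: to_float(swh),
      swell_period: to_float(swp),
      wind_wave_height: to_float(wwh),
      wind_wave_period: to_float(wwp),
      swell_direction: swd,
      wind_wave_direction: wwd,
      steepness: steepness,
      average_period: to_float(apd),
      mean_wave_direction: String.to_integer(mwd)
    }
  end

  # Strict parse: fails loudly if anything is left over after the number.
  defp to_float(string) do
    {value, ""} = Float.parse(string)
    value
  end
end

record = ["2023", "07", "27", "15", "56", "1.1", "0.1", "10.5", "1.1", "4.8",
          "S", "SW", "VERY_STEEP", "4.1", "217"]

reading = SpecRecord.to_map(record)
```

Matching on the whole list in the function head means a row with a different number of columns fails immediately instead of producing a half-filled map.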

In any case, there are people around here who are always curious about data ingestion and conversion tasks (me included) so if you can post your final solution, that would benefit them and potential future readers. :smiley:

Sure, I can do that. This data source is very inconvenient in that it only offers the data in that plain text format. I’ll take a look at nimble_csv.

To be fair to all sides, it’s not necessary to convert a list record to a map at all. You can just feed each record to a function that accepts it in its original list form and then deconstruct it inside:

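def process_record(record) when is_list(record) do
  # bind the leading fields by position; the remaining columns stay in `rest`
  [year, month, day, hour, minute | rest] = record
  # ... work with year, month, day, hour, minute and rest here
end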

And then work with each variable.

Anyhow, this is of interest to me so I will probably post sample code later.


I looked at this again, but I don’t understand the names or the meaning of some of the columns, so I can’t write well-informed code.

How far did you get?

It looks like weather data where the column headers are on the first row and the units are split onto the second row.

[column_headers, units | data] =
  body
  |> String.split("\n", trim: true)
  |> Enum.map(&String.split(&1, " ", trim: true))
  ...
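
A runnable version of that split on a tiny sample, in case it helps. The two "#" lines below are made-up stand-ins that only imitate a header row and a units row; the data row is the one quoted earlier in the thread.

```elixir
# Made-up, abbreviated stand-in for the downloaded body.
body = """
#YY MM DD hh mm WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD
#yr mo dy hr mn m m sec m sec - degT - sec degT
2023 07 27 15 56  1.1  0.1 10.5  1.1  4.8   S  SW VERY_STEEP  4.1 217
"""

# The first two rows are bound separately; everything else is data.
[column_headers, units | data] =
  body
  |> String.split("\n", trim: true)
  |> Enum.map(&String.split(&1, " ", trim: true))

# Pair each header with its unit to see what a column holds.
header_units = Enum.zip(column_headers, units)
```

Note that the first header keeps its leading "#" with this approach, so it would need trimming before being used as a key.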

For now, my use case is simple: all I need to do is grab the most recent row of data. As my project grows, I will need to use all the rows, as well as other data from this website that is served in a similar way, so my solution will evolve then.

spectral_data
    |> String.split("\n")
    |> Enum.map(fn row -> String.split(row, " ", trim: true) end)
    # skip the two header rows (column names and units)
    |> Enum.drop(2)
    |> Enum.at(0)
    |> then(fn row ->
      [
        year,
        month,
        day,
        hour,
        minute,
        wave_height,
        swell_height,
        swell_period,
        wind_wave_height,
        wind_wave_period,
        swell_direction,
        wind_wave_direction,
        steepness,
        average_period,
        mean_wave_direction
      ] = row

      %{
        wave_height: wave_height,
        swell_height: swell_height,
        swell_period: swell_period,
        wind_wave_height: wind_wave_height,
        swell_direction: swell_direction,
        wind_wave_period: wind_wave_period,
        wind_wave_direction: wind_wave_direction,
        steepness: steepness,
        average_period: average_period,
        mean_wave_direction: mean_wave_direction
      }
    end)
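
A possible variation, sketched under two assumptions the thread does not confirm: that every commented-out line starts with "#", and that the newest reading is the first data row. Instead of relying on the position of the header rows, it skips comment lines by their prefix. The module and function names are hypothetical.

```elixir
defmodule SpecRows do
  # Hypothetical helper: returns the first row that is not a comment,
  # split into columns, or nil when the body has no data rows.
  def first_data_row(body) do
    body
    |> String.split("\n", trim: true)
    |> Enum.find(&(not String.starts_with?(&1, "#")))
    |> case do
      nil -> nil
      row -> String.split(row, " ", trim: true)
    end
  end
end

# Made-up comment lines followed by the row quoted earlier in the thread.
sample = """
#header line
#units line
2023 07 27 15 56  1.1  0.1 10.5  1.1  4.8   S  SW VERY_STEEP  4.1 217
"""

first_row = SpecRows.first_data_row(sample)
```

This keeps working if the number of comment lines ever changes, at the cost of one extra check per line until the first data row is found.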

The most recent row is the topmost one (after the commented out lines)?

Word list sigils paired with Enum.zip or Enum.zip_with could be nice here.

[_columns, _units | spectral_data] =
  body
  |> String.split("\n", trim: true)
  |> Enum.map(&String.split(&1, " ", trim: true))
  ...

# word list sigil
column_headers = ~w[
  year month day hour minute
  wave_height swell_height swell_period
  wind_wave_height wind_wave_period
  swell_direction wind_wave_direction
  steepness average_period mean_wave_direction
]a # type modifier for atoms

first_row_map =
  column_headers
  |> Enum.zip(hd(spectral_data))
  |> Enum.into(%{})
# or
first_row_map =
  spectral_data
  |> hd() # or `Enum.at(0)`
  |> then(fn first_row -> Enum.zip(column_headers, first_row) end)
  |> Enum.into(%{})
# or even
first_row_map =
  spectral_data
  |> hd() # or `Enum.at(0)`
  |> Enum.zip_with(column_headers, fn c, h -> {h, c} end)
  |> Enum.into(%{})
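
Since you mentioned needing all the rows later, the same zip idea extends to the whole list. This is only a sketch: `spectral_data` is stubbed here with the single row quoted earlier in the thread, and the values stay as strings.

```elixir
column_headers = ~w[
  year month day hour minute
  wave_height swell_height swell_period
  wind_wave_height wind_wave_period
  swell_direction wind_wave_direction
  steepness average_period mean_wave_direction
]a

# Stub for the already-split data rows (header rows removed).
spectral_data = [
  ["2023", "07", "27", "15", "56", "1.1", "0.1", "10.5", "1.1", "4.8",
   "S", "SW", "VERY_STEEP", "4.1", "217"]
]

# One map per row, keyed by the atoms in column_headers.
readings =
  Enum.map(spectral_data, fn row ->
    column_headers
    |> Enum.zip(row)
    |> Map.new()
  end)
```

Enum.zip stops at the shorter list, so a row with missing columns would silently produce a map with fewer keys; a length check per row may be worth adding if that matters.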