Flow + Jaxon - JSON stream parsing with Flow

Hello. I’m trying to stream a JSON file and process it with Flow:

    |> File.stream!
    |> Jaxon.Stream.from_enumerable
    |> Jaxon.Stream.query([:root, "rows", :all])
    |> Flow.from_enumerable()
    |> Flow.partition() # should I use partition here?
    |> Flow.reduce(fn -> %{} end, fn inv, acc ->
      ...
    end)
    |> Enum.to_list()

but this is much slower than the version that only uses Jason.decode!() |> Flow.from_enumerable() ... Maybe someone who worked on these libraries could tell me whether they are compatible?


Streaming doesn’t make things faster, and it isn’t meant to. Streaming trades more CPU cycles for lower memory usage and is generally useful whenever your data does not comfortably fit into memory. This is compounded by the fact that JSON is not well suited to being parsed in chunks. It’s possible, but it’s not a great format for streaming. So if your JSON is not large, just parse it in one go.
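
As a rough illustration of that trade-off, here is a minimal sketch that uses only the Elixir standard library (no Jaxon or Flow). The module name, the helper names `count_eager` and `count_lazy`, and the temp-file name are all made up for this example, and it uses newline-delimited records instead of one large JSON document because that format is trivial to read in chunks:

```elixir
defmodule StreamingTradeoff do
  # Eager: load the whole file into memory, then split and count.
  # Peak memory grows with the size of the file.
  def count_eager(path) do
    path
    |> File.read!()
    |> String.split("\n", trim: true)
    |> length()
  end

  # Lazy: File.stream!/1 yields one line at a time, so only the
  # current line and the running counter are kept around.
  def count_lazy(path) do
    path
    |> File.stream!()
    |> Enum.reduce(0, fn _line, acc -> acc + 1 end)
  end
end

# Hypothetical sample file: 1,000 small newline-delimited records.
path = Path.join(System.tmp_dir!(), "streaming_tradeoff_sample.ndjson")
File.write!(path, Enum.map(1..1_000, fn i -> ~s({"id": #{i}}\n) end))

eager = StreamingTradeoff.count_eager(path)
lazy = StreamingTradeoff.count_lazy(path)
File.rm!(path)

IO.inspect({eager, lazy})
# → {1000, 1000}
```

Both functions return the same answer. The lazy one does a bit more work per element but never holds the full file in memory, which is the only reason to prefer it.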


The JSON is large, though; that’s why I’m trying to stream it. I may have to handle 0.8-1.5 GB of JSON at a time, with several of those being processed at once. That would hog all the memory, so I’m looking for a solution.