Process a large gRPC client response as it is being streamed

Hi!

Trying to efficiently parse a 111 MB JSON gRPC response that is bound to keep growing for the foreseeable future. I expect it to hit gigabytes in the mid to long term… we’re talking about the Lightning Network graph, pulled straight from a local node.

Currently using this lib: {:grpc, github: "elixir-grpc/grpc"}

As things stand, there seems to be no way to access any of that data until the entire response has been downloaded and parsed in RAM. I’m not sure that approach scales…
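To make the pain point concrete, here’s roughly what the current unary flow looks like. This is a hedged sketch: `GRPC.Stub.connect/2` is real elixir-grpc API, but every `Lnrpc.*` module, function, and field name below is an assumption standing in for whatever the node’s proto definitions generate.

```elixir
# Sketch of the blocking, unary flow (all Lnrpc.* names are hypothetical).
{:ok, channel} = GRPC.Stub.connect("localhost:10009")

# The call only returns once the entire ~111 MB response has been
# received and decoded into memory.
{:ok, graph} =
  Lnrpc.Lightning.Stub.describe_graph(channel, %Lnrpc.ChannelGraphRequest{})

# Only at this point can any of the data be inspected at all.
IO.inspect(length(graph.nodes), label: "node count")
```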

If it’s possible at all, I’d like to decode the JSON as it is being streamed from the server. That would make this workable even on the smallest computers, such as a Raspberry Pi, which is very popular hardware among node operators.

Thanks for any insight on how to achieve this!

Marc


Just today I found myself wondering the same thing. There’s Jaxon, but I’m not sure I can invest the time to rewrite the relevant code in the projects I work on. Replacing a simple Jason.decode with who knows how many lines that incrementally ingest a JSON stream can bring metric tons of complexity.
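For the record, going by Jaxon’s README, the streaming pipeline looks something like the sketch below: it walks a stream of raw JSON chunks and emits matching records one at a time, so the full document never has to sit in memory. The file name and query path are made up for illustration.

```elixir
# Stream a big JSON document from disk and emit records one by one.
# "lightning_graph.json" and the "nodes" path are hypothetical.
"lightning_graph.json"
|> File.stream!([], 64 * 1024)                 # read 64 KB binary chunks
|> Jaxon.Stream.from_enumerable()              # chunks -> JSON event stream
|> Jaxon.Stream.query([:root, "nodes", :all])  # emit each element of "nodes"
|> Stream.each(&IO.inspect/1)                  # handle records as they arrive
|> Stream.run()
```

Still a far cry from a one-line Jason.decode, which is exactly the complexity trade-off I mean.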

Maybe a better question for you would be: is there a way to consume the API in a one-JSON-record-per-line fashion?
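If it could be consumed that way, the client side would stay nearly as simple as today. A minimal sketch, assuming a newline-delimited JSON (NDJSON) source; the file name is a placeholder:

```elixir
# One complete JSON document per line: decode lazily, line by line.
"graph.ndjson"
|> File.stream!()                # File.stream! defaults to line-by-line
|> Stream.map(&Jason.decode!/1)  # each line decodes independently
|> Stream.each(&IO.inspect/1)    # process each record as it's read
|> Stream.run()
```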


I have no control over how the server sends data… it’s sending it in one enormous blob…

The problem is also about reading the data while it’s coming in… I’m not even sure that’s possible with the current grpc lib.
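From a skim of the elixir-grpc README, replies apparently do arrive lazily when the RPC itself is declared server-streaming in the .proto, so the blocker looks like the unary shape of this endpoint rather than the client library. A sketch under that assumption – every Lnrpc.* name, and the streaming RPC itself, is hypothetical, since the real service doesn’t offer one:

```elixir
# This only works if the server exposes a `stream`-typed RPC in its
# .proto; the module/function/message names below are hypothetical.
{:ok, channel} = GRPC.Stub.connect("localhost:10009")

{:ok, reply_stream} =
  Lnrpc.Lightning.Stub.subscribe_graph_chunks(channel, %Lnrpc.GraphRequest{})

# Replies are consumed as they arrive instead of as one giant blob.
Enum.each(reply_stream, fn
  {:ok, chunk} -> IO.inspect(chunk)
  {:error, reason} -> IO.inspect(reason, label: "stream error")
end)
```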

Shame the service doesn’t offer alternatives. 🙁

Forking the library is always an option though; if you get desperate, go for it. In my case I have to parse no more than 20-30 MB for the moment, and the API server is slow – it sometimes takes 15-20 seconds to deliver data – but our app isn’t real-time for 99% of the data, so it’s okay.

Hope you can find a solution.
