kodepett

kodepett

Processing json files

Hi,
I’m working on an elixir application that processes json files(contain transactions) from an upstream server. The average file size is 10mb. The upstream server will generate the file(s) and push it to my server. I need to parse the json file and generate alerts where applicable based on threshold configuration. A lot of files will be generated by the upstream server and I’m expecting to process about 10GB in total everyday. I don’t have control over the rate at which the files are generated and pushed by the upstream server.

  1. What will be the most efficient solution for the above. (I’ve heard of FLOW/BROADWAY/GENSTAGE - which one will be ideal)

  2. Is there any filesystem monitoring api that can monitor the filesystem and notify my application once a new file is pushed by the upstream server.

  3. What are some of the gotchas I should be aware of, especially if I want to persist the file content to a database(Postgresql)

Thanks for your continuous guidance.

Most Liked

Eiji

Eiji

@kodepett I have working in a bit similar case. I was doing a migration from old JSON BigData (thousands of small files) to PostgreSQL database. Therefore I did not need to make it fastest ever, but of course it should not run for hours. For that case simply Flow was enough.

From what you said you should have about 1000 files per day which is not that big number. If you will make parsing each file enough fast then number of files should not be a problem for you.

I’m sure about only one thing: there is nothing ideal :smiley:. It may depend on your logic. Look that simply putting JSON to jsonb column without even parsing JSON to Elixir structs is way different comparing to a big process of parsing and processing data. If working with single file will be really short then maybe you do not even need to think about putting an extra dependency you don’t know about just for this case. However typically the whole process is longer than just reading file and it may be worth to think about splitting job into few stages. For this case I would recommend Broadway.

You may also be interested in Flow by Plataformatec. Both Broadway and Flow are built on top of GenStage. Flow is a more general abstraction than Broadway that focuses on data as a whole, providing features like aggregation, joins, windows, etc. Broadway focuses on events and on operational features, such as metrics, automatic acknowledgements, failure handling, and so on.
Source: GitHub - elixir-broadway/broadway: Concurrent and multi-stage data ingestion and data processing with Elixir · GitHub

Yes, it is. Even phoenix_live_reload is using it, so it may be worth to check it’s source. The library is called file_system. For this please make sure that your backend is prepared as in some cases it’s a must have.

There are lots of gotchas. From basic Elixir up to specific to your use case.

First of all for a well known gotchas in Elixir you may read this forum topic:

When working in file you are doing few operations (especially with a big number of files) then it’s worth to use different File.open or sometimes also File.stream. Therefore in some cases Jaxon may be interesting for you:

https://moboudra.com/intro-to-jaxon-json-parser-for-elixir/

Finally you should be aware of typical overusing some features of Elixir like GenServer as there are not good for every use case:

https://learn-elixir.dev/dangers-of-genservers

axelson

axelson

Scenic Core Team

If the average file size is 10mb I think you’d probably be okay reading the whole file into memory and parsing it all at once. But you should definitely do some benchmarking on your own to see what works well for your use-case.

Where Next?

Popular in Questions Top

Darmani72
If I have a post route which an argument: post /my_post_route/:my_param1, MyController.my_post_handler How would get the post params ...
New
_russellb
I want to try my hand at web scraping. What tools/libraries do I need to use. I’m hoping to turn this into something professional so don’...
New
vertexbuffer
Hello, can anybody help here..? I have a list of players and I what to delete an element, but every for loop the list is reverting to ori...
New
siddhant3030
Hi, I have to write a raw query for one of my project. But till now I have used ecto queries and don’t have much experience writing raw ...
New
Patoshizzle
After calling mix ecto.create I get this error: 17:00:32.162 [error] GenServer #PID<0.412.0> terminating ** (Postgrex.Error) FATAL...
New
minhajuddin
I have seen a lot of code which picks the first element from a list using Enum.at(0) instead of List.first. Is there a reason why people ...
New
jerry
Good day to you all. I have been struggling to get a query involving like and ilike to work. Can anyone assist me on this, please? pro...
New
ycv005
I have followed this StackOverflow post to install the specific version of Erlang. And When I am running mix ecto.setup then getting fol...
New
stefanluptak
Hello everybody, usually, I use a 29" ultra-wide monitor for VSCode which can easily accomodate explorer (files panel) + file with code ...
New
Emily
I have VueJS GUIs with the project generated using Webpack. I have Elixir modules that will need to be used by the VueJS GUIs. I forese...
New

Other popular topics Top

vertexbuffer
Hello, can anybody help here..? I have a list of players and I what to delete an element, but every for loop the list is reverting to ori...
New
mcarvalho
What is the difference between System.get_env and Application.get_env? For example, what are best practices to use one versus another.
New
lessless
I believe there are people here who are dealing with CSV files import on the daily basis, and since Excel is a really popular tool there ...
New
vegabook
I’m brand new to Phoenix and I have stripped one of the demo applications to the bone. I just want to get an svg up on the screen. Here i...
New
baxterw3b
Hi guys, i’m new in the Elixir world, and i have to say, that i love it! i’m having some problem to understand anonymous functions with ...
New
pmjoe
I have a relationship of love and hate with Elixir. Lots of things are just absolutely right, but there are some things that are kind of ...
New
vrod
I am using the Starship cross-shell prompt – it seems pretty nice, but I get some errors: [WARN] - (starship::utils): Executing command ...
New
fayddelight
I tried installing elixir 1.11.2 erlang 23.3.4 via asdf in my zsh shell. Enabled the versions locally and globally. When I list them ...
New
jason.o
In the code below, if the create action is not set to accept “extra_key” as an input, it errors out with a message shown above. Is there ...
New
lanycrost
Hi everyone! I need implement if…else if…else condition from my elixir code, and anymore of this control flow structures not work proper...
New

We're in Beta

About us Mission Statement