kodepett

Processing JSON files

Hi,
I’m working on an Elixir application that processes JSON files (containing transactions) from an upstream server. The average file size is 10 MB. The upstream server generates the files and pushes them to my server. I need to parse each JSON file and generate alerts where applicable, based on a threshold configuration. The upstream server will generate a lot of files, and I expect to process about 10 GB in total every day. I don’t have control over the rate at which the upstream server generates and pushes the files.

What would be the most efficient solution for the above? (I’ve heard of Flow, Broadway and GenStage; which one would be ideal?)
Is there a filesystem monitoring API that can watch the filesystem and notify my application once the upstream server pushes a new file?
What are some of the gotchas I should be aware of, especially if I want to persist the file content to a database (PostgreSQL)?

Thanks for your continuous guidance.


#development


2022-05-29 12:26:05 UTC

Most Liked

Eiji

@kodepett I have worked on a somewhat similar case: a migration of old JSON "big data" (thousands of small files) into a PostgreSQL database. It did not need to be the fastest possible, but of course it should not run for hours. For that case plain Flow was enough.

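From what you said (about 10 GB per day at roughly 10 MB per file), you should have about 1,000 files per day, i.e. on average one file every 86 seconds or so, which is not a big number. Since you don’t control the upload rate, files may arrive in bursts, but if parsing each file is fast enough, the number of files should not be a problem for you.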

I’m sure of only one thing: there is no ideal choice. It depends on your logic. Simply putting the JSON into a jsonb column, without even decoding it into Elixir data structures, is very different from a large process of parsing and processing the data. If handling a single file is really quick, you may not need to add an extra dependency you don’t know just for this case. Typically, however, the whole process takes longer than just reading the file, and it may be worth splitting the job into a few stages. For that case I would recommend Broadway.
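To illustrate the "no extra dependency" route: if the per-file work is short, the standard library’s Task.async_stream/3 already gives bounded concurrency. A minimal sketch, assuming each file is a JSON array of transaction objects with a hypothetical "amount" key, and Elixir 1.18+ for the built-in JSON module (use Jason.decode!/1 on older versions). The module and function names are made up for illustration:

```elixir
defmodule TxAlerts do
  # Stdlib-only sketch: decode JSON files concurrently and keep the
  # transactions whose (hypothetical) "amount" exceeds a threshold.
  # Assumes each file is a top-level JSON array of objects and that
  # JSON.decode!/1 is available (Elixir 1.18+).

  # Runs alerts_for_file/2 over all paths with bounded concurrency.
  # Task.async_stream/3 links its tasks to the caller, so a crash or a
  # timeout on any single file propagates to the caller.
  def process(paths, threshold, max_concurrency \\ System.schedulers_online()) do
    paths
    |> Task.async_stream(&alerts_for_file(&1, threshold),
      max_concurrency: max_concurrency,
      # Per-file time limit.
      timeout: :timer.minutes(1),
      # Results are emitted as soon as each file finishes.
      ordered: false
    )
    |> Enum.flat_map(fn {:ok, alerts} -> alerts end)
  end

  # Reads the whole file at once (reasonable for ~10 MB files) and keeps
  # only transactions with a numeric amount above the threshold.
  def alerts_for_file(path, threshold) do
    path
    |> File.read!()
    |> JSON.decode!()
    |> Enum.filter(fn
      %{"amount" => amount} when is_number(amount) -> amount > threshold
      _ -> false
    end)
    |> Enum.map(fn tx -> %{file: path, transaction: tx} end)
  end
end
```

The `is_number/1` guard matters: in Erlang term ordering `nil > 100` is true, so a missing amount would otherwise be flagged. If one bad file should not take down the whole batch, Task.Supervisor.async_stream_nolink/4 is the variant to look at.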

You may also be interested in Flow by Plataformatec. Both Broadway and Flow are built on top of GenStage. Flow is a more general abstraction than Broadway that focuses on data as a whole, providing features like aggregation, joins, windows, etc. Broadway focuses on events and on operational features, such as metrics, automatic acknowledgements, failure handling, and so on.
Source: elixir-broadway/broadway on GitHub (Concurrent and multi-stage data ingestion and data processing with Elixir)
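For comparison, a rough Flow version of the same job, under the same hypothetical assumptions (files are JSON arrays, a made-up "amount" key, Elixir 1.18+ for JSON.decode!/1). It is a sketch rather than a tuned pipeline; a Broadway version would additionally need a producer that emits file paths, which is not shown here.

```elixir
Mix.install([{:flow, "~> 1.2"}])

defmodule FlowAlerts do
  # Sketch: read, decode and filter files with Flow. Each file is assumed
  # to be a JSON array of objects with a hypothetical "amount" key.
  def run(paths, threshold) do
    paths
    # max_demand: 1 makes each stage ask for one file at a time, which
    # suits a small number of relatively expensive events (~10 MB each).
    |> Flow.from_enumerable(max_demand: 1)
    # One file becomes many transaction maps.
    |> Flow.flat_map(fn path -> path |> File.read!() |> JSON.decode!() end)
    |> Flow.filter(fn
      %{"amount" => amount} when is_number(amount) -> amount > threshold
      _ -> false
    end)
    # A flow is enumerable; Enum.to_list/1 runs it and collects the results.
    |> Enum.to_list()
  end
end
```

Result order is not guaranteed, since stages run concurrently.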

Yes, there is. The library is called file_system, and even phoenix_live_reload uses it, so its source may be worth a look. Make sure the platform backend it relies on is installed, because on some systems it is a hard requirement (on Linux, for example, it needs inotify-tools).
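A minimal watcher, following the subscribe/handle_info pattern from the file_system README: a GenServer that watches one directory and hands `*.json` paths to a callback. The module name and the `:dir`/`:on_file` options are hypothetical, and on Linux this needs inotify-tools installed.

```elixir
Mix.install([{:file_system, "~> 1.0"}])

defmodule JsonWatcher do
  # Sketch: watch a directory and call `on_file.(path, events)` for every
  # event on a *.json path. Options :dir and :on_file are hypothetical.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    {:ok, watcher} = FileSystem.start_link(dirs: [Keyword.fetch!(opts, :dir)])
    FileSystem.subscribe(watcher)
    {:ok, %{watcher: watcher, on_file: Keyword.fetch!(opts, :on_file)}}
  end

  @impl true
  def handle_info({:file_event, watcher, {path, events}}, %{watcher: watcher} = state) do
    # One upload can produce several events, so the callback may be
    # invoked more than once for the same path.
    if Path.extname(path) == ".json", do: state.on_file.(path, events)
    {:noreply, state}
  end

  def handle_info({:file_event, watcher, :stop}, %{watcher: watcher} = state) do
    # The underlying monitor stopped; let a supervisor restart us.
    {:stop, :watcher_stopped, state}
  end
end
```

A file can be reported before the upstream server has finished writing it. One common approach is to have the upstream write under a temporary name and rename it when done, and to de-duplicate paths before processing.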

There are lots of gotchas, from basic Elixir ones up to ones specific to your use case.

First of all, for well-known Elixir gotchas you may read this forum topic:

When you do several operations on a file (especially across a large number of files), it’s worth looking at alternatives such as File.open or, sometimes, File.stream!. For the same reason, a streaming JSON parser like Jaxon may be interesting in some cases:

https://moboudra.com/intro-to-jaxon-json-parser-for-elixir/

Finally, be aware of the typical overuse of some Elixir features, such as GenServer, which are not a good fit for every use case:

https://learn-elixir.dev/dangers-of-genservers
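On the PostgreSQL part of the question, one concrete gotcha: a single statement can carry at most 65,535 bind parameters (Postgrex raises an error above that), so a bulk insert of rows with F columns has to be split into chunks of at most 65,535 / F rows. A small helper sketch; the module name is hypothetical and rows are assumed to be maps with identical keys:

```elixir
defmodule InsertChunks do
  # PostgreSQL caps one statement at 65_535 bind parameters, so a bulk
  # insert of N rows with F columns needs N * F <= 65_535.
  @max_params 65_535

  # Splits rows (maps that all have the same keys) into chunks small
  # enough for a single multi-row INSERT each.
  def chunk([]), do: []

  def chunk([first | _] = rows) do
    per_chunk = max(div(@max_params, map_size(first)), 1)
    Enum.chunk_every(rows, per_chunk)
  end
end

# Usage with Ecto (MyApp.Repo and the "transactions" table are hypothetical):
#   rows |> InsertChunks.chunk() |> Enum.each(&MyApp.Repo.insert_all("transactions", &1))
```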

Post #2

axelson

Scenic Core Team

If the average file size is 10 MB, I think you’d probably be okay reading the whole file into memory and parsing it all at once. But you should definitely do some benchmarking of your own to see what works well for your use case.
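A rough way to start that benchmarking with only the standard library is to time read-plus-decode with :timer.tc/1. The helper names and the synthetic transactions are hypothetical, JSON.decode!/1 and JSON.encode!/1 need Elixir 1.18+, and a benchmarking library such as Benchee gives more reliable numbers than a single run:

```elixir
defmodule ParseBench do
  # Single-run timing of "read the whole file, then decode it" using
  # Erlang's :timer.tc/1, which returns {microseconds, result}.
  # Assumes the file holds a JSON array.
  def run(path) do
    {micros, decoded} = :timer.tc(fn -> path |> File.read!() |> JSON.decode!() end)

    %{
      megabytes: File.stat!(path).size / 1_000_000,
      items: length(decoded),
      millis: micros / 1000
    }
  end

  # Writes `count` synthetic, hypothetical transactions as one JSON array.
  def write_sample!(path, count) do
    body = Enum.map(1..count, fn i -> %{"id" => i, "amount" => rem(i * 37, 10_000)} end)
    File.write!(path, JSON.encode!(body))
  end
end
```

For example, write a sample of a few hundred thousand transactions with `ParseBench.write_sample!/2`, call `ParseBench.run/1` on it, and compare the reported time with how often files are expected to arrive.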

Post #6
