@kodepett I have worked on a somewhat similar case: a migration from old JSON BigData (thousands of small files) to a PostgreSQL database. It did not need to be the fastest ever, but of course it should not run for hours. For that case plain Flow was enough.
From what you said, you should have about 1000 files per day, which is not a big number. If you make parsing each file fast enough, the number of files should not be a problem for you.
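To get a feeling for whether per-file work is fast enough, a standard-library baseline can be tried before adding any dependency. The sketch below is not Flow; it swaps in `Task.async_stream/3` as a stand-in. The module name `MigrationSketch` and the `process_file/1` step (which only reads the file and returns its size) are hypothetical placeholders for your real parsing logic.

```elixir
defmodule MigrationSketch do
  # Hypothetical per-file step: reads the file and returns its size in bytes.
  # Replace the body with your real parsing and database insert.
  def process_file(path) do
    path |> File.read!() |> byte_size()
  end

  # Processes the given paths concurrently, one task per file,
  # limited to the number of online schedulers.
  def run(paths) do
    paths
    |> Task.async_stream(&process_file/1,
      max_concurrency: System.schedulers_online(),
      timeout: 30_000
    )
    # Results arrive in input order by default; each one is wrapped in {:ok, _}.
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

If one file takes a few milliseconds, a thousand files per day leaves a lot of headroom, and a staged pipeline may only be worth it once the per-file work grows.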
I’m sure about only one thing: there is no ideal solution. It depends on your logic. Note that simply putting JSON into a jsonb column, without even parsing the JSON into Elixir structs, is very different from a big process of parsing and processing data. If working with a single file is really short, then maybe you do not even need to think about adding an extra dependency you don’t know just for this case. However, typically the whole process is longer than just reading a file, and it may be worth splitting the job into a few stages. For that case I would recommend Broadway.
You may also be interested in Flow by Plataformatec. Both Broadway and Flow are built on top of GenStage. Flow is a more general abstraction than Broadway that focuses on data as a whole, providing features like aggregation, joins, windows, etc. Broadway focuses on events and on operational features, such as metrics, automatic acknowledgements, failure handling, and so on.
Source: GitHub - dashbitco/broadway: Concurrent and multi-stage data ingestion and data processing with Elixir
Yes, it is. Even phoenix_live_reload uses it, so it may be worth checking its source. The library is called file_system. Please make sure that your backend is set up for it, as in some cases it’s a must-have.
There are lots of gotchas, from basic Elixir ones up to ones specific to your use case.
First of all, for well-known gotchas in Elixir you may read this forum topic:
When you do only a few operations per file (especially with a big number of files), it’s worth using different File.open modes, or sometimes File.stream! instead. In some cases Jaxon may also be interesting for you:
https://moboudra.com/intro-to-jaxon-json-parser-for-elixir/
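As a rough illustration of the two standard-library options mentioned above, here is a minimal sketch. The module and function names (`FileReadSketch`, `count_lines/1`, `first_two_lines/1`) are made up for this example; only `File.stream!/1`, `File.open/3` and `IO.read/2` come from Elixir itself.

```elixir
defmodule FileReadSketch do
  # Lazily streams the file line by line, so the whole file
  # is never loaded into memory at once.
  def count_lines(path) do
    path |> File.stream!() |> Enum.count()
  end

  # Opens the file once with explicit modes and performs several reads
  # on the same device; the file is closed when the function returns.
  def first_two_lines(path) do
    File.open(path, [:read, :utf8], fn device ->
      {IO.read(device, :line), IO.read(device, :line)}
    end)
  end
end
```

Streaming tends to pay off for large files, while for thousands of tiny files a single `File.read/1` per file is often simpler.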
Finally, you should be aware of the common overuse of some Elixir features like GenServer, as they are not good for every use case:
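A small sketch of what such overuse can look like, with hypothetical names (`Parser`, `NameServer`): the same stateless transformation written as a plain function and wrapped in a GenServer. The GenServer version holds no state worth keeping, yet every call goes through one process mailbox, so callers are served one at a time.

```elixir
defmodule Parser do
  # Stateless transformation: a plain function that any process
  # can call concurrently without coordination.
  def normalize_name(name), do: name |> String.trim() |> String.downcase()
end

defmodule NameServer do
  use GenServer

  # Wraps the same logic in a process. All callers are serialized
  # through this single process even though there is no state to protect.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, nil)

  def normalize(pid, name), do: GenServer.call(pid, {:normalize, name})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:normalize, name}, _from, state) do
    {:reply, Parser.normalize_name(name), state}
  end
end
```

When there is no state, resource, or ordering to manage, the plain module is usually the better fit.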