Greetings,
I'm trying to build a toy project in both Go and Elixir for fun. I'd love to hear suggestions, libraries to use, and strategies to get it done faster.
Every day, ingest about 1,000 XML files containing articles (about 2 million articles in total), with slight schema differences:
- file size varies from 20 MB to 8 GB
- schemas vary slightly: different node/element/attribute names

The ingestion part is already done using Saxy.Handler and File.stream! (rough sketch below).
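This is roughly the shape of the streaming parse (a minimal sketch only; the module name, the `article` element, and the state layout are placeholders, not the real schema):

```elixir
defmodule MyApp.ArticleHandler do
  @moduledoc "Sketch: collects <article> elements from a streamed XML file."
  @behaviour Saxy.Handler

  @impl true
  def handle_event(:start_element, {"article", attrs}, state) do
    # Start accumulating a new article; attribute names vary per feed.
    {:ok, %{state | current: %{attrs: Map.new(attrs), text: []}}}
  end

  def handle_event(:characters, chars, %{current: %{} = cur} = state) do
    {:ok, %{state | current: %{cur | text: [chars | cur.text]}}}
  end

  def handle_event(:end_element, "article", %{current: cur} = state) do
    # In practice, hand each finished article off to the pipeline here
    # rather than accumulating 2 million of them in the handler state.
    {:ok, %{state | articles: [cur | state.articles], current: nil}}
  end

  def handle_event(_event, _data, state), do: {:ok, state}
end

# Stream the file in 64 KB chunks so an 8 GB file never sits in memory.
"/data/feed.xml"
|> File.stream!([], 65_536)
|> Saxy.parse_stream(MyApp.ArticleHandler, %{articles: [], current: nil})
```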
Then, for every article in each XML file:
- do the processing
- check whether it already exists in the database, and insert or update it accordingly
- write the changed/new articles out to an XML file each day as a backup

Should this be a job queue? GenStage? Flow? (a rough Flow sketch below)
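For context, something like this is what I have in mind with Flow (a sketch only; `MyApp.Parser.stream_articles/1`, a helper that wraps the Saxy handler into a lazy stream of parsed articles, and `MyApp.Articles.upsert_batch/1`, sketched under optimizations below, are hypothetical names):

```elixir
defmodule MyApp.Ingest do
  def process_file(path) do
    path
    |> MyApp.Parser.stream_articles()
    # Group articles so each DB round trip handles ~500 rows at once.
    |> Stream.chunk_every(500)
    # Limit concurrency so the database is not flooded (back pressure).
    |> Flow.from_enumerable(stages: 4, max_demand: 2)
    # Each chunk: normalize, diff against the DB, upsert, collect changes.
    |> Flow.map(&MyApp.Articles.upsert_batch/1)
    |> Flow.run()
  end
end
```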
Requirements:
Telemetry (rough sketch after this list):
- count of new/updated articles each day
- time taken
- error notifications
- a dashboard
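For telemetry I'm thinking plain :telemetry events per file, with Telemetry.Metrics and Phoenix LiveDashboard covering the dashboard part. A sketch (event names and measurement keys are made up, not an existing API):

```elixir
defmodule MyApp.IngestTelemetry do
  require Logger

  @event [:my_app, :ingest, :file_done]

  def attach do
    :telemetry.attach("ingest-logger", @event, &handle_event/4, nil)
  end

  # Emit once per file with the daily counters and timing.
  def emit(path, %{new: new, updated: updated, duration_ms: ms}) do
    :telemetry.execute(@event, %{new: new, updated: updated, duration_ms: ms}, %{file: path})
  end

  defp handle_event(@event, measurements, %{file: file}, _config) do
    Logger.info("ingested #{file}: #{inspect(measurements)}")
  end
end
```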
Back pressure (sketch below):
- the pipeline should not slow down or overwhelm the database
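With GenStage/Flow, the demand settings are what actually provide back pressure: the consumer only asks for as many batches as it can write. A sketch of a consumer stage, reusing the hypothetical MyApp.Articles.upsert_batch/1 from above:

```elixir
defmodule MyApp.DbWriter do
  use GenStage

  def start_link(producer), do: GenStage.start_link(__MODULE__, producer)

  @impl true
  def init(producer) do
    # max_demand/min_demand cap how much work is in flight against the DB.
    {:consumer, :ok, subscribe_to: [{producer, max_demand: 10, min_demand: 5}]}
  end

  @impl true
  def handle_events(batches, _from, state) do
    # Each event is already a chunk of ~500 articles (see the Flow sketch above).
    Enum.each(batches, &MyApp.Articles.upsert_batch/1)
    {:noreply, [], state}
  end
end
```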
Optimizations (sketch below):
- take articles in chunks of 500 and use a single DB query to check whether they already exist
- batch the updates/inserts, etc.
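A sketch of the chunk-of-500 idea with Ecto (MyApp.Repo, the Article schema, and the :external_id field are assumptions about the data model, and each article is assumed to be a plain map of Article fields):

```elixir
defmodule MyApp.Articles do
  import Ecto.Query
  alias MyApp.{Repo, Article}

  def upsert_batch(articles) when is_list(articles) do
    ids = Enum.map(articles, & &1.external_id)

    # One query to find which of the ~500 articles already exist.
    existing =
      Repo.all(from a in Article, where: a.external_id in ^ids, select: a.external_id)
      |> MapSet.new()

    {updates, inserts} = Enum.split_with(articles, &MapSet.member?(existing, &1.external_id))

    # One batched upsert instead of 500 individual writes.
    # Note: insert_all does not fill in timestamps; if the schema has them,
    # they need to be present in the maps.
    Repo.insert_all(Article, inserts ++ updates,
      on_conflict: {:replace_all_except, [:id, :inserted_at]},
      conflict_target: :external_id
    )

    # Return what changed so the daily backup XML can be written from it.
    %{new: inserts, updated: updates}
  end
end
```

That keeps it to two queries per chunk of 500 instead of hundreds of individual round trips, and the returned new/updated lists feed both the telemetry counters and the daily backup XML.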