msonawane

XML Data Pipeline - suggestions needed

Greetings,
I'm trying to get a toy project done in both Go and Elixir for fun. I would love to hear suggestions on libraries to use and strategies to get it done faster.

  • ingest about 1000 XML files every day, containing articles (about 2 million of them) with slight schema differences

    • size varies from 20 MB to 8 GB
    • schema varies slightly: different node / element / attribute names

    Using Saxy.Handler and File.stream!, the tasks above have been completed.

  • for every article in an XML file:

    • do processing
    • check if it exists in the database, then insert / update the article in the database
    • create an XML file of changed / new articles each day for backup

    Should this be a job queue? GenStage? Flow?

  • Requirements

    • telemetry

      • number of new / updated articles every day
      • time taken
      • any error notification
      • dashboard
    • back pressure

      • should not slow down / overwhelm databases
    • optimizations

      • take articles in chunks of 500 and do a single DB query to check if they exist in the database
      • batch updates / inserts etc.
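
To make the chunking idea above concrete, here is a minimal sketch using only the Elixir standard library. It is not tied to any real database: the `db` map stands in for a table, `Map.take/2` stands in for a single `SELECT ... WHERE id IN (...)` per chunk, and the module name `ChunkedUpsert`, the `id` / `title` fields and the sample data are all made up for illustration. The counters are the kind of numbers the telemetry bullet asks for.

```elixir
defmodule ChunkedUpsert do
  @chunk_size 500

  # `db` is a plain map of id => article, a stand-in for a real table.
  def run(articles, db) do
    articles
    |> Stream.chunk_every(@chunk_size)
    |> Enum.reduce({db, %{new: 0, updated: 0, unchanged: 0}}, fn chunk, {db, stats} ->
      # One lookup per chunk instead of one per article.
      existing = Map.take(db, Enum.map(chunk, & &1.id))

      {new, rest} = Enum.split_with(chunk, &(not is_map_key(existing, &1.id)))
      {unchanged, updated} = Enum.split_with(rest, &(existing[&1.id] == &1))

      # One batched write per chunk (a bulk insert / upsert in a real system).
      db = Enum.reduce(new ++ updated, db, &Map.put(&2, &1.id, &1))

      stats = %{
        stats
        | new: stats.new + length(new),
          updated: stats.updated + length(updated),
          unchanged: stats.unchanged + length(unchanged)
      }

      {db, stats}
    end)
  end
end

# Made-up data: ids 1..600 already stored, ids 301..1300 arrive today,
# and ids 301..450 arrive with a changed title.
db = Map.new(1..600, fn id -> {id, %{id: id, title: "v1"}} end)
incoming = for id <- 301..1_300, do: %{id: id, title: if(id <= 450, do: "v2", else: "v1")}

{micros, {new_db, stats}} = :timer.tc(fn -> ChunkedUpsert.run(incoming, db) end)
IO.inspect(stats, label: "stats")
IO.puts("took #{micros} microseconds, #{map_size(new_db)} articles stored")
```

Because `Stream.chunk_every/2` is lazy, only one chunk is held at a time, so the same shape should work when `articles` is a stream fed by the XML parser rather than a list. The `new ++ updated` list per chunk is also what would go into the daily backup file.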

Most Liked

BartOtten

For back pressure and chunking I use GenStage. Works like a charm :) I have no computer nearby, so I can't provide an example. You can probably find one by searching for 'genstage stream chunk'.
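
For readers without a search engine handy, here is a rough sketch of what the GenStage approach mentioned above could look like. It is not the code referred to in the reply, just one possible shape. It assumes Elixir 1.12 or later and network access so that `Mix.install/1` can fetch `gen_stage`; the module name `ChunkConsumer`, the `:producer` and `:report_to` options, and the fake articles are invented for the example. The stream is chunked before it becomes a producer, so each GenStage event is one chunk of up to 500 articles, and `max_demand: 2` means the consumer never has more than two chunks in flight.

```elixir
# Assumption: Elixir >= 1.12 and network access so Mix.install can fetch gen_stage.
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule ChunkConsumer do
  # Hypothetical consumer: each event it receives is one chunk of articles.
  use GenStage

  def start_link(opts), do: GenStage.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    producer = Keyword.fetch!(opts, :producer)
    report_to = Keyword.fetch!(opts, :report_to)

    # max_demand bounds the number of chunks in flight: this is the back pressure.
    # cancel: :temporary keeps the consumer alive after the producer finishes.
    subscription = {producer, max_demand: 2, min_demand: 1, cancel: :temporary}
    {:consumer, report_to, subscribe_to: [subscription]}
  end

  @impl true
  def handle_events(chunks, _from, report_to) do
    for chunk <- chunks do
      # Placeholder for the real work: one existence query and one
      # batched insert / update per chunk.
      send(report_to, {:chunk_done, length(chunk)})
    end

    {:noreply, [], report_to}
  end
end

# Fake articles standing in for the output of the XML parser.
{:ok, producer} =
  1..1_200
  |> Stream.map(&%{id: &1})
  |> Stream.chunk_every(500)
  |> GenStage.from_enumerable()

{:ok, _consumer} = ChunkConsumer.start_link(producer: producer, report_to: self())

# Collect one report per chunk (1200 articles make three chunks).
sizes =
  for _ <- 1..3 do
    receive do
      {:chunk_done, size} -> size
    after
      5_000 -> :timeout
    end
  end

IO.inspect(sizes, label: "chunk sizes")
```

The producer only pulls more of the stream when the consumer asks for it, so a slow database slows down parsing instead of filling memory. Adding more consumers subscribed to the same producer would be one way to raise throughput while keeping the per-consumer limit.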
