msonawane
XML Data Pipeline - suggestions needed
Greetings,
I'm working on a toy project in both Go and Elixir for fun. I'd love to hear suggestions: libraries to use and strategies to get it done quicker and make it run faster.
Ingestion: about 1000 XML files every day, containing about 2 million articles in total, with slight schema differences.
- file size varies from 20 MB to 8 GB
- the schema varies slightly (different node, element, and attribute names)
These tasks are already done using Saxy.Handler and File.stream!.
Processing, for every article in an XML file:
- do processing
- check whether it exists in the database, then insert or update it
- create an XML file of changed/new articles each day for backup
Should this be a job queue? GenStage? Flow?
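A sketch of the per-article "exists? insert or update" decision, assuming (hypothetically) that each stored row keeps a content hash and that one batched query has already returned a map of id to stored hash for the current chunk. Anything missing from the map is new, anything whose hash differs is updated, and the union of the two is what would go into the daily backup XML. Unchanged articles cause no write at all.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Article fields are hypothetical placeholders.
type Article struct {
	ID    string
	Title string
	Body  string
}

// ContentHash fingerprints an article so unchanged rows can be skipped.
func ContentHash(a Article) string {
	h := sha256.New()
	h.Write([]byte(a.Title))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(a.Body))
	return hex.EncodeToString(h.Sum(nil))
}

// Diff holds the articles that need a write; New+Updated is the backup set.
type Diff struct {
	New     []Article
	Updated []Article
}

// Classify splits a batch using existing, a map of id -> stored content hash
// (assumed to come from a single query for the whole chunk).
func Classify(batch []Article, existing map[string]string) Diff {
	var d Diff
	for _, a := range batch {
		stored, found := existing[a.ID]
		switch {
		case !found:
			d.New = append(d.New, a)
		case stored != ContentHash(a):
			d.Updated = append(d.Updated, a)
		}
		// found with an equal hash: unchanged, nothing to write or back up
	}
	return d
}
```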
Requirements
Telemetry:
- new/updated articles every day
- time taken
- error notifications
- dashboard
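For the telemetry counters, a minimal Go sketch: sync/atomic counters that many goroutines can bump safely, plus the elapsed time for the run. It assumes Go 1.19+ (for atomic.Int64), and the names are illustrative. Exporting these to a dashboard (for example via the standard library's expvar package or a Prometheus client) and sending error notifications are not shown.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// RunStats holds per-run counters; atomic.Int64 (Go 1.19+) makes them safe
// to update from many goroutines without a mutex.
type RunStats struct {
	New, Updated, Errors atomic.Int64
	started              time.Time
}

// NewRunStats starts the clock for a run.
func NewRunStats() *RunStats {
	return &RunStats{started: time.Now()}
}

// Summary renders the counters and elapsed time, e.g. for a log line or an
// end-of-run notification.
func (s *RunStats) Summary() string {
	return fmt.Sprintf("new=%d updated=%d errors=%d elapsed=%s",
		s.New.Load(), s.Updated.Load(), s.Errors.Load(),
		time.Since(s.started).Round(time.Millisecond))
}
```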
Back pressure:
- should not slow down or overwhelm the databases
Optimizations:
- take articles in chunks of 500 and do a single DB query to check whether they already exist in the database
- batch updates/inserts, etc.
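For the "one query per chunk of 500" idea, assuming PostgreSQL and a driver that can bind a Go slice as an array parameter (pgx can, for example), the existence check can be a single `= ANY($1)` query and the writes a single multi-row upsert. The table, column, and Row names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ExistsSQL checks a whole chunk in one round trip; $1 is bound to the
// slice of ids. Table and column names are hypothetical.
const ExistsSQL = "SELECT id, content_hash FROM articles WHERE id = ANY($1)"

// Row is a hypothetical flattened article ready to write.
type Row struct{ ID, Title, Hash string }

// UpsertSQL builds one multi-row INSERT ... ON CONFLICT statement for n rows
// of (id, title, content_hash).
func UpsertSQL(n int) string {
	var b strings.Builder
	b.WriteString("INSERT INTO articles (id, title, content_hash) VALUES ")
	for i := 0; i < n; i++ {
		if i > 0 {
			b.WriteString(", ")
		}
		p := i * 3
		fmt.Fprintf(&b, "($%d, $%d, $%d)", p+1, p+2, p+3)
	}
	b.WriteString(" ON CONFLICT (id) DO UPDATE SET title = EXCLUDED.title, content_hash = EXCLUDED.content_hash")
	return b.String()
}

// UpsertArgs flattens rows in the same column order as UpsertSQL.
func UpsertArgs(rows []Row) []any {
	args := make([]any, 0, len(rows)*3)
	for _, r := range rows {
		args = append(args, r.ID, r.Title, r.Hash)
	}
	return args
}
```

A chunk of 500 rows with 3 columns uses 1,500 bind parameters, well below PostgreSQL's limit of 65,535 per statement. One caveat: ON CONFLICT DO UPDATE raises an error if the same id appears twice in one statement, so duplicates within a chunk should be collapsed first.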
BartOtten
For back pressure and chunking I use GenStage. Works like a charm.
I have no computer nearby, so I can't provide an example. You can probably find one by searching for 'genstage stream chunk'.
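A rough Go analogue of chunking a stream with GenStage: a goroutine collects items from an input channel into batches of a fixed size and flushes the remainder when the input closes. Because the output channel is unbuffered, the batcher stops reading while the consumer is still busy with the previous batch, which loosely mirrors GenStage's demand-driven flow. This is a sketch (Go 1.18+ for generics), not a port of GenStage.

```go
package main

// Batch reads items from in and emits slices of up to size items on the
// returned channel, flushing any remainder when in is closed. The unbuffered
// output means a slow consumer pauses the batcher, which in turn pauses
// whoever is sending on in.
func Batch[T any](in <-chan T, size int) <-chan []T {
	out := make(chan []T)
	go func() {
		defer close(out)
		buf := make([]T, 0, size)
		for item := range in {
			buf = append(buf, item)
			if len(buf) == size {
				out <- buf // blocks until the consumer asks for more
				buf = make([]T, 0, size)
			}
		}
		if len(buf) > 0 {
			out <- buf // final partial batch
		}
	}()
	return out
}
```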