Designing a migration system with Elixir

Happy New Year!


I’m trying to create a migration system that takes ids from other websites and converts them to my app’s ids.

There are 3 websites to import data from. Each website has ~200,000 ids, divided into 4 different migration categories (e.g. movie ids, series ids, etc.).

The migration data (website id to my app id) is stored in the database, but some users can have really large lists of content to migrate, so fetching every single migration value from the database isn’t a good option. I decided to use Elixir to handle the problem, and my main idea looks like this:

  1. Create an Agent to maintain the state
  2. On init, the Agent queries the database, loads all the migration records and keeps them in memory
  3. The user gives a list of ids and the Agent returns the corresponding app ids
  4. If a given website id isn’t in the Agent, query the database to check whether it exists there; if it does, return the value and update the Agent state (the database can receive new migration ids over time)
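
Here is a minimal sketch of that plan (MyApp.Repo and a MyApp.Migration schema with website, source_id and app_id fields are just placeholders for my real modules):

```elixir
defmodule MyApp.MigrationCache do
  use Agent

  import Ecto.Query

  def start_link(_opts) do
    Agent.start_link(&load_all/0, name: __MODULE__)
  end

  # 2. Load every migration record into a map keyed by {website, source_id}.
  defp load_all do
    from(m in MyApp.Migration, select: {m.website, m.source_id, m.app_id})
    |> MyApp.Repo.all()
    |> Map.new(fn {website, source_id, app_id} -> {{website, source_id}, app_id} end)
  end

  # 3. Translate a list of website ids into app ids (nil for unknown ids).
  def lookup(website, source_ids) do
    Agent.get(__MODULE__, fn state ->
      Enum.map(source_ids, &Map.get(state, {website, &1}))
    end)
  end

  # 4. Fall back to the database for a missing id and cache it when found.
  def lookup_or_fetch(website, source_id) do
    case Agent.get(__MODULE__, &Map.get(&1, {website, source_id})) do
      nil ->
        case MyApp.Repo.get_by(MyApp.Migration, website: website, source_id: source_id) do
          nil ->
            nil

          %{app_id: app_id} ->
            Agent.update(__MODULE__, &Map.put(&1, {website, source_id}, app_id))
            app_id
        end

      app_id ->
        app_id
    end
  end
end
```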

But here is where I need some help designing the app:

  • Using a single Agent to maintain the state can lead to bottlenecks if many processes request that data
  • Point 4 can lead to performance problems: if a user gives a huge list of non-existing ids, the database will be queried a large number of times just to check whether each id exists
  • Since the migration ids in the database are updated regularly, the Agent has no way to know when a new migration id is added, which is why point 4 is on the list
  • I’m not sure about this, but memory usage could be really high, since I’d need to keep ~800,000 ids (the ids from the 3 websites plus my app’s ids) in memory at all times

Those are the problems I’ve identified (feel free to tell me if there are more). This is the first serious project I’m doing with Elixir, and I would like some feedback and help to design this migration system in an Elixir way. What are the approaches to solve this problem? ^^

I also have a question regarding the project in general… I have in mind adding this system to my Phoenix app, so every instance of my app running on a server would have this migration system with its own state. Is this a good approach? Or is it better to run this service on a separate machine and have the Phoenix app communicate with it? (I want to point out that the migration process is computationally expensive, since I need to parse some import files and in some cases scrape content.)

I hope you can help me find the best option to implement this ^^

Cheers!

PS: Feel free to tell me (or edit the post) if there are any grammatical errors

Using a single Agent to maintain the state can lead to bottlenecks if many processes request that data

You can allow concurrent reads of the data by letting the Agent manage an ETS table.
Reads don’t need to go through the Agent process; just query ETS directly.
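
Something roughly along these lines (just a sketch; the table and module names are made up, and load_all/0 stands in for whatever query loads the mappings):

```elixir
defmodule MyApp.MigrationTable do
  use Agent

  @table :migration_ids

  def start_link(_opts) do
    Agent.start_link(
      fn ->
        # The Agent owns the table, so it lives as long as the process does.
        table = :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
        :ets.insert(@table, load_all())
        table
      end,
      name: __MODULE__
    )
  end

  # Reads hit ETS directly, so they never serialize through the Agent process.
  def lookup(website, source_id) do
    case :ets.lookup(@table, {website, source_id}) do
      [{_, app_id}] -> app_id
      [] -> nil
    end
  end

  # Writes still go through the owning Agent (the table is :protected).
  def put(website, source_id, app_id) do
    Agent.update(__MODULE__, fn table ->
      :ets.insert(table, {{website, source_id}, app_id})
      table
    end)
  end

  # Placeholder: return [{{website, source_id}, app_id}, ...] from the database.
  defp load_all, do: []
end
```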

You can also consider using something like the Registry to have an Agent per user to avoid bottlenecks.
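
Rough sketch of the per-user idea, assuming a Registry (keys: :unique, name: MyApp.MigrationRegistry) and a DynamicSupervisor (name: MyApp.MigrationSupervisor) are already started in your supervision tree:

```elixir
defmodule MyApp.UserMigration do
  use Agent

  def start_link(user_id) do
    name = {:via, Registry, {MyApp.MigrationRegistry, user_id}}
    Agent.start_link(fn -> load_for_user(user_id) end, name: name)
  end

  # Find the Agent for this user, starting it on first use.
  def for_user(user_id) do
    case Registry.lookup(MyApp.MigrationRegistry, user_id) do
      [{pid, _}] ->
        {:ok, pid}

      [] ->
        case DynamicSupervisor.start_child(MyApp.MigrationSupervisor, {__MODULE__, user_id}) do
          {:ok, pid} -> {:ok, pid}
          {:error, {:already_started, pid}} -> {:ok, pid}
        end
    end
  end

  # Placeholder: load only this user's migration data.
  defp load_for_user(_user_id), do: %{}
end
```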

is it better to run this service on a separate machine and have the Phoenix app communicate with it?

If the memory usage is not too high, then just run it alongside each Phoenix app as another umbrella app.


(EDIT) I’ll go for ETS…

In that case every Agent would need to hold the same migration state, and that could consume a lot of RAM, no?

If the data is already keyed by user_id, each agent can query the database and store only the data required for a particular user.
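
For example, the load_for_user/1 placeholder from the earlier sketch could become an Ecto query scoped to one user (this assumes the migration rows really do carry a user_id column):

```elixir
defmodule MyApp.UserMigration.Loader do
  import Ecto.Query

  # Only the rows belonging to this user end up in the Agent state.
  def load_for_user(user_id) do
    from(m in MyApp.Migration,
      where: m.user_id == ^user_id,
      select: {m.website, m.source_id, m.app_id}
    )
    |> MyApp.Repo.all()
    |> Map.new(fn {website, source_id, app_id} -> {{website, source_id}, app_id} end)
  end
end
```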


Before doing anything, I would really check if I need to cache this information. Can’t you just query the database all the time?

However, if the number of reads is very high, I would put the mapping info in an ETS table and keep refreshing it every second or every few seconds. If your ids are sequential, you can store the max id seen so far (in a separate ETS table, or in the application configuration using Application.put_env) and just fetch the delta when refreshing the ETS tables.
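
A rough sketch of that refresh loop (module names and the 5-second interval are arbitrary, the ids are assumed to be integers that only grow, and here the max id is simply kept in the GenServer state instead of a second ETS table or Application.put_env):

```elixir
defmodule MyApp.MigrationRefresher do
  use GenServer

  import Ecto.Query

  @table :migration_ids
  @interval :timer.seconds(5)

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok) do
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    last_id = refresh(0)
    schedule()
    {:ok, last_id}
  end

  @impl true
  def handle_info(:refresh, last_id) do
    new_last_id = refresh(last_id)
    schedule()
    {:noreply, new_last_id}
  end

  # Pull only the rows newer than the highest id already loaded.
  defp refresh(last_id) do
    rows =
      from(m in MyApp.Migration,
        where: m.id > ^last_id,
        select: {m.id, m.website, m.source_id, m.app_id}
      )
      |> MyApp.Repo.all()

    :ets.insert(@table, Enum.map(rows, fn {_id, w, s, app_id} -> {{w, s}, app_id} end))
    Enum.reduce(rows, last_id, fn {id, _, _, _}, acc -> max(id, acc) end)
  end

  defp schedule, do: Process.send_after(self(), :refresh, @interval)
end
```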

Also, if you are doing this for caching, putting it in the same Phoenix app is a good idea. On a decent server, memory shouldn’t really be an issue.


I’m thinking about it, since I need to check the performance. @mbuhot’s idea of having an Agent per user that queries the database looks like a better solution than keeping all the data in memory.