Detect and pause a process's work until its memory goes down

Hello,

I have a process (A) that continuously gets new data from some external source. Every time another process requests some data from A, A returns it and removes it from its state, meaning that as long as data requests are at least as fast as new data is collected by A, memory is not an issue.

As you can imagine, this is not always the case: the rate at which data is removed from A varies, so sometimes it is faster than the rate at which new data arrives and sometimes it is not.

To manage that, I was thinking of establishing a maximum amount of memory A is allowed to use: if it reaches that limit, it stops collecting data until its memory is lower again.

Example:
The maximum memory A is allowed to have is 5 GB.

After collecting some data, A reaches 5.1 GB. Instead of collecting more data, it sleeps for 10 seconds and then checks its RAM usage again; if it is still above 5 GB, it sleeps again, otherwise it resumes collecting new data.

My question is, first, is this a good solution/idea?

Also, how can I get this memory information for process A efficiently? I think I can get it from Process.info, but I'm not sure which field (total_heap_size, heap_size, stack_size, etc.) I should use.
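For illustration, the gate I have in mind would look roughly like the sketch below. I'm guessing at the memory item of Process.info here (the Erlang docs describe it as the total size of the process in bytes, including call stack, heap, and internal structures), but that guess is exactly the part I'm unsure about:

```elixir
defmodule Collector do
  @max_bytes 5 * 1024 * 1024 * 1024  # 5 GB limit
  @check_interval 10_000             # recheck every 10 seconds

  # Block until this process's memory drops below the limit again.
  def wait_for_memory do
    # :memory is my guess: the total size of the process in bytes
    {:memory, bytes} = Process.info(self(), :memory)

    if bytes > @max_bytes do
      Process.sleep(@check_interval)
      wait_for_memory()
    else
      :ok
    end
  end
end
```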

Hello,
have you already looked at GenStage?
If I understand correctly, you are looking for some form of backpressure.
Hth.
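For reference, GenStage's backpressure comes from the fact that a producer only emits events when consumers ask for them. A minimal sketch (essentially the counter producer from the GenStage docs):

```elixir
defmodule Counter do
  use GenStage

  def start_link(initial) do
    GenStage.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def init(counter), do: {:producer, counter}

  # Only called when consumers have outstanding demand:
  # no demand, no new events. That is the backpressure.
  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end
```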

I did, but as far as I know I cannot change the max_demand logic to check the memory the process is storing rather than just a number of events. If that were possible, I think GenStage would make perfect sense.

I see. Having no real-world experience with this kind of problem, I'm probably misunderstanding it, but after searching for a while I found a 'solved' thread in the forum where a memory issue was encountered with GenStage (edit: actually the subject was Flow): "[…] memory was growing quite fast to more than 1gb, I ctrl+ced to not exhaust my RAM".
Maybe that is an indirect way to keep memory under control? Sorry if I'm getting it wrong.

If your state is going to be that big, you'll want to keep that data in ETS; otherwise every iteration of that process's state will be kept in memory until it's garbage collected. And you can easily get the size of an ETS table and base some logic off of that.
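For example (a sketch; :my_table is a placeholder for your table name):

```elixir
# :ets.info/2 reports table memory in machine words;
# multiply by the VM word size to get bytes.
words = :ets.info(:my_table, :memory)
bytes = words * :erlang.system_info(:wordsize)

# Or, if you would rather gate on the number of stored objects:
count = :ets.info(:my_table, :size)
```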

Take a look at Broadway for processing from that queue. It’s GenStage but with a lot of things already done for you.
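A Broadway pipeline is mostly configuration plus a handle_message callback. A skeleton along these lines, where MyProducer stands in for whatever GenStage producer wraps your external source:

```elixir
defmodule MyPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {MyProducer, []},  # hypothetical producer for your source
        concurrency: 1
      ],
      processors: [
        default: [concurrency: 2]
      ]
    )
  end

  # Every event emitted by the producer flows through here.
  def handle_message(_processor, message, _context) do
    message
  end
end
```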

Hey @patrickdm, thanks for the suggestion. You are right in thinking that limiting the chunk size would fix it, but if I got it right, that only applies when I'm streaming something (e.g. a file's contents), right?

In my case, I have multiple external sources whose data varies in size (that's why I can't limit it with max_demand).

I didn't use ETS initially because the data is a list that new items are prepended to (in an ordered manner), so simply using a list sounded like a better fit.

But I guess I can get the same behavior if I use an ordered_set ETS table.
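Something like this sketch, using a monotonic integer key so that key order matches arrival order:

```elixir
table = :ets.new(:buffer, [:ordered_set, :public])

# ordered_set keeps entries sorted by key, so an ever-increasing
# key preserves insertion order.
add = fn data ->
  :ets.insert(table, {System.unique_integer([:monotonic]), data})
end

add.("first")
add.("second")

# :ets.first/1 gives the oldest key, :ets.last/1 the newest.
oldest_key = :ets.first(table)
[{^oldest_key, oldest_data}] = :ets.lookup(table, oldest_key)
```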

Thanks, I will try that!