Getting started with OTP; I have a sample app whose main purpose is to ingest data.
A GenServer is subscribed to a given channel and receives loads of data passed around through PubSub. Its single purpose is to store this data in the DB (using Ecto). It does not store any state.
def init(_args) do
  Phoenix.PubSub.subscribe(PubSub, "CHANNEL_NAME")
  {:ok, []}
end

def handle_info({:matcher, data}, state) do
  upsert(data)
  {:noreply, state} # returning {:noreply, state, :hibernate} instead solved the memory leak, but ¯\_(ツ)_/¯
end

defp upsert(klines) do
  Repo.insert_all(
    ModuleName,
    klines, # list of structs
    on_conflict: :replace_all,
    conflict_target: [:key1, :key2, :key3]
  )
end
Its memory invariably increases to around 100 MB and never goes back down. Googling around, I found out about {:noreply, state, :hibernate}, and it solved my issue.
But I have no clue why this memory increase ended up being permanent. Is it because of Ecto? Because of the way the data is passed around (PubSub)?
Hope the question is somehow relevant; I would like to get a better understanding.
It's not required, but sorry, I don't have answers for you on when the GenServer gets gc'd. Are your messages maybe coming in more rapidly than you can dispatch them to the database? (I.e. do you always have a message in the message queue?) This is just a guess, but I think that if all of the messages are dispatched, the process goes into a tail call that calls GC, but if there are messages in the queue, the process is only put on ice by the scheduler.
Could also be some weird things with binary references, which are handled differently and have different lifetimes.
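If you want to check both, something like this from IEx should show it (a sketch; MyApp.Ingester is just a placeholder for however that GenServer is registered):
pid = Process.whereis(MyApp.Ingester)  # hypothetical registered name; substitute your own

# :message_queue_len staying > 0 means messages pile up faster than they are handled;
# :binary lists the refc binaries the process still holds references to.
Process.info(pid, [:message_queue_len, :memory, :binary])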
Thanks a lot for those links! And sorry, I didn't mean to demand an answer, rather to ask for general advice.
I should have mentioned: the GenServer receives data every 15 minutes. It takes it something like a minute to ingest the data; then it just hangs around not receiving anything, waiting for the next batch. Which leads me to think the GC was never triggered.
Well, it's actually not a leak, it's just how the garbage collection works. I think you're receiving a big amount of structures (in the {:matcher, data} message), which is not collected right away.
@hst337 what happens with klines in the GenServer after the call to Task.async? Are those not garbage collected? Based on our testing, the GenServer memory usage immediately dropped.
First of all, unused data will always be garbage collected. The question here is at which point it'll be garbage collected.
Second, here the klines get copied into the Task process you've created. That copy might trigger a GC in the GenServer, but the klines will then live in the Task until it dies.
But this is a bad solution, because you don't need a process here and you don't need to copy this data. You just need to call Repo.insert_all and then trigger a minor GC.
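For reference, a minimal sketch of what that would look like in the handler from the original post (same names as above, no Task involved):
def handle_info({:matcher, data}, state) do
  upsert(data)
  # Force an immediate collection of this process now that the large
  # message has been turned into DB rows, instead of hibernating.
  :erlang.garbage_collect()
  {:noreply, state}
end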
Hibernate is okay, but I don't know why you'd want to collect garbage straight away. I mean, GC will be called when there's no space left, so you don't have to worry about running out of memory (in this situation). And when garbage collection is initiated by the VM, it will be more time-efficient than calling GC manually every time.
The gen_server process can go into hibernation (see erlang:hibernate/3) if a callback function specifies 'hibernate' instead of a time-out value. This can be useful if the server is expected to be idle for a long time. However, use this feature with care, as hibernation implies at least two garbage collections (when hibernating and shortly after waking up) and is not something you want to do between each call to a busy server.
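If you want to see how often the VM is actually collecting that process, something like this shows the per-process GC settings and counters (again, the registered name is just a placeholder):
pid = Process.whereis(MyApp.Ingester)  # hypothetical registered name

# :minor_gcs counts minor collections since the last full sweep;
# :fullsweep_after is how many minor runs are allowed before a major one.
Process.info(pid, :garbage_collection)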
I'm playing around on Heroku, on purpose, as I'm learning this whole OTP thing and would like to see where things get 'heavy'.
The thing is, I don't see the 'no space left' situation doing anything; instead I get system errors (memory quota exceeded). I might see what's up on other kinds of instances.
You can list all processes in your system, then for each get the memory usage, and then print that sorted in your console. Maybe other processes are consuming too much memory.
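A rough sketch of that in IEx (just a diagnostic snippet, not production code):
Process.list()
|> Enum.map(fn pid -> {pid, Process.info(pid, :memory)} end)
|> Enum.reject(fn {_pid, info} -> is_nil(info) end)         # skip processes that exited meanwhile
|> Enum.map(fn {pid, {:memory, bytes}} -> {pid, bytes} end)
|> Enum.sort_by(fn {_pid, bytes} -> bytes end, :desc)
|> Enum.take(10)                                             # top 10 memory consumers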
Your GenServer is receiving a large amount of data, yet does very little 'work' on it, so GC was triggered late; I think GC is triggered by the amount of 'work' done since the last GC. Just curious, why do you need a GenServer here? You could just upsert the data in the original process; Ecto will take care of the atomicity.
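I.e. roughly this in whichever process produces the klines, reusing the same insert_all call from the snippet above (the broadcast call shown in the comment is an assumption about how the data is currently published):
# Instead of Phoenix.PubSub.broadcast(PubSub, "CHANNEL_NAME", {:matcher, klines}),
# the producing process could upsert directly:
Repo.insert_all(
  ModuleName,
  klines,
  on_conflict: :replace_all,
  conflict_target: [:key1, :key2, :key3]
)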
@lud I'm running within Phoenix, so I opened the dashboard route to see what's up; that might explain what I describe below: it is not the memory of a given process, but the number of GenServers running.
@derek-zhou what do you mean by 'work'? You bring up another issue with the upsert.
This GenServer receives data from numerous other running ones. Doing the upsert in the original processes (the numerous other ones) just timed out the DB super fast. Centralizing the upsert within this single GenServer solved the DB timeouts. This surprised me a lot, as the amount of data and queries is exactly the same; it just flows another way.
But you're right, I could totally do the subscribe/upsert in the main process, not in a GenServer.
My understanding of the GC is that it is triggered every so many 'reductions', which is a BEAM-defined metric of 'work'. If running the upsert in all the originating processes jams the system, then serializing them onto one process will cause a serious backlog on its incoming message queue. You may have other problems.