I have a situation where multiple apps compete to process a job. When one of the apps completes the job it tells the other apps, so they can stop processing it. My jobs have IDs and take time to process. In my mind what I want is:
GenServer client calls cast({:process_job, 1})
GenServer server receives handle_cast({:process_job, job_id}) and starts processing
The GenServer is notified that the job with ID 1 has been completed by someone else, so it should cancel the processing started in step 2
It doesn't seem possible to cancel a "cast" after it has started.
I could solve this by using an Agent and a Task, where the Agent would be responsible for storing a mapping between the ID of a running job and the PID of the Task. Then I can easily cancel a Task that is processing an already-terminated job.
Still new to Elixir, so I'm wondering what would be the best design for this…
A GenServer's messages (like any process's) are serialized and processed one after another. So as soon as your handle_cast function starts processing, that GenServer won't read any new messages until it's finished. So there's no way to stop it.
Since, as you said, your job processing takes time, wouldn't it be better not to even start processing jobs in multiple processes in the first place? I mean, you could end up blocking a good chunk of workers with doing the same thing, only for one of them to finish a few seconds earlier, making the work of all the other workers useless.
Thank you for your answer. I didn't know that messages to a GenServer were serialized and thought it was processing things in parallel. Anyway, that confirms that I can't use a single GenServer for my problem.
Still, the problem stays unchanged: apps compete to get the job done. And they are different apps, communicating over a network. I want them to compete and try to be the first to get the job done. However, once it's done by one app I want the others to stop. I'll probably go with the Agent + Task combination. Seems a good choice.
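For what it's worth, the Agent + Task combination could look something like this minimal sketch (module and function names are my own invention, not from any library): the Agent keeps a map of job ID to worker PID, and cancelling is just Process.exit on the stored PID.

```elixir
defmodule JobTracker do
  # Agent state: %{job_id => pid of the process working on that job}
  def start_link do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Start working on a job and remember which process is doing it.
  def process(job_id, fun) do
    # Task.start/1 starts an *unlinked* task, so killing the worker
    # later will not take the caller down with it.
    {:ok, pid} = Task.start(fn -> fun.(job_id) end)
    Agent.update(__MODULE__, &Map.put(&1, job_id, pid))
    pid
  end

  # Called when another app reports the job as already done.
  def cancel(job_id) do
    case Agent.get_and_update(__MODULE__, &Map.pop(&1, job_id)) do
      nil -> :not_running
      pid -> Process.exit(pid, :kill)
    end
  end
end
```

A completed job should also remove itself from the Agent (e.g. as the last step of `fun`), otherwise the map grows forever; that bookkeeping is left out of the sketch.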
Though I doubt this would be a good idea: hypothetically, if inside the GenServer's work processing you periodically (each time through a loop, for example) did a receive for a specific cancel message with a very short timeout, and aborted processing if it was received, you might achieve the behaviour you're describing. You could also spin up a process per cast received, store the ID, and then kill that process when receiving a cancel message.
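The "poll for a cancel message" idea can be sketched like this (a hypothetical worker, assuming the job can be split into small steps): between steps, the process checks its own mailbox for a cancel message with a zero timeout.

```elixir
defmodule ChunkedWorker do
  # Run `steps` small units of work, checking for a cancel message
  # between each one.
  def run(job_id, steps), do: loop(job_id, steps)

  defp loop(_job_id, 0), do: :done

  defp loop(job_id, steps_left) do
    receive do
      {:cancel, ^job_id} -> :cancelled
    after
      # Timeout 0: just peek at the mailbox, don't block.
      0 ->
        do_step(job_id)
        loop(job_id, steps_left - 1)
    end
  end

  # Stand-in for one small unit of real work.
  defp do_step(_job_id), do: :timer.sleep(10)
end
```

This only works if the work is naturally chunkable; a single long-running BIF or NIF call cannot be interrupted this way, which is why killing a dedicated process is usually the cleaner option.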
When multiple programs perform the same job at the same time, this is not really parallelization but a case of duplicate work: you'll throw away everything except one of the computed results. The only cases I can think of where you'd want two different applications to really perform the same work are:
When performing benchmarks.
When ensuring that the new implementation of some specification matches the behaviour of the existing one.
As such, it is not something you should ever do in production code.
If you truly want jobs to be performed in a parallelized way, the easiest approach is to invert the control in this scenario: instead of the work-giver telling the GenServer(s) which jobs to perform, they should look up the next job themselves:
One GenServer is a work queue.
A pool of N GenServers can perform tasks.
Some outside place (or multiple) can add work to the work queue GenServer.
The pool is normally dormant, but as soon as work is added to the work queue, the work queue GenServer will wake up all GenServers in the task pool. These will then look up the first job in the queue: because the work queue GenServer only handles work requests one after the other, the work items each GenServer receives will be different. After performing its job, each of them will repeat this procedure until the queue is empty again (and when that happens, go back to sleep).
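A minimal sketch of that pull-based design (names are illustrative, and the "wake the pool up" notification is omitted): the queue GenServer hands out one job per request, so no two workers can ever receive the same item.

```elixir
defmodule WorkQueue do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, :queue.new(), name: __MODULE__)
  end

  # Producers push jobs; workers pop them one at a time.
  def push(job), do: GenServer.cast(__MODULE__, {:push, job})
  def pop, do: GenServer.call(__MODULE__, :pop)

  @impl true
  def init(queue), do: {:ok, queue}

  @impl true
  def handle_cast({:push, job}, queue), do: {:noreply, :queue.in(job, queue)}

  @impl true
  def handle_call(:pop, _from, queue) do
    # Calls are serialized, so each worker gets a different item.
    case :queue.out(queue) do
      {{:value, job}, rest} -> {:reply, {:ok, job}, rest}
      {:empty, rest} -> {:reply, :empty, rest}
    end
  end
end

defmodule Worker do
  # Each pool member loops until the queue is empty, then goes dormant.
  def work_loop do
    case WorkQueue.pop() do
      {:ok, job} ->
        process(job)
        work_loop()

      :empty ->
        :sleep_until_woken
    end
  end

  # Stand-in for the real job processing.
  defp process(_job), do: :ok
end
```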
Guys, thank you for all your answers, but if my question is asked this way it's because it is a requirement.
I was focusing my question on a "simple" pattern to be efficient, but maybe it lacks some context that would help you understand why I want to do this before you explain that it is not something I should want to do.
In Bitcoin, all nodes are connected by a P2P network. In this network there are miners, which are responsible for computing a proof-of-work (something hard that takes time). The first miner to get the proof-of-work right gets a reward. But when it gets it right, it's simple courtesy to tell the others that they can stop trying to compute the proof-of-work, as only one node will get the reward. So the others can just cancel, save CPU cycles, and wait for the next challenge.
So that's why I'm looking for a clean way in Elixir to:
start a long job in a process
be able to cancel it if I need to
For this I need to keep track of a mapping between the ID of the block I'm mining and the PID of the process that is computing the proof-of-work, so that if someone on the network tells me "too late, I took care of this block", I can shut the process down.
To write a Bitcoin proof-of-work miner, you do not give all GenServers the same task: they should all try different nonces.
Of course, instead of handing them the nonces to try outright, you could reduce the amount of chatter by letting each of them pick a random number in the 0..2^256 range each time (the possibility of repeated work when doing this is astronomically small and can thus be neglected).
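Picking such a random number is a one-liner: 32 random bytes decoded as an unsigned integer gives a uniform value in 0..2^256 - 1.

```elixir
# 32 bytes = 256 bits, so the result lies in 0..2^256 - 1.
nonce =
  :crypto.strong_rand_bytes(32)
  |> :binary.decode_unsigned()
```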
And indeed, as @NobbZ noted, you can manage a pool of worker GenServers, and when their work is no longer useful, simply use Process.exit on them to close down that pool. Afterwards, you can just start a new pool for the next Bitcoin block.
EDIT:
Do note that for this kind of computation, Elixir, being a bytecode-interpreted language, is not nearly as fast as code written in statically compiled languages. Running mining jobs on your GPU is of course even faster. For serious mining, people now use dedicated hardware, making CPU-bound software miners no longer profitable.
But of course, they're still fun to build as hobby projects!
Again, @Qqwy, thank you, but I never talked about parallelizing a proof-of-work computation. I'm talking about nodes competing to get the nonce right. I'm not writing mining software; I'm working on an "as simple as possible" cryptocurrency in Elixir for educational purposes.
I've had the need to do something similar: cancel a long-running job because the input data has changed and we don't care about the result of the currently running jobs any more.
To do this, I've had a GenServer act as a long-running server. When it receives the data to be computed, it spawns and monitors a plain process:
Store the pid and ref in the GenServer state. The monitor is used to tell whether the job has finished successfully, crashed, etc.
If I want to cancel the request, I just do Process.exit(pid, :kill) (making sure to handle the resulting :DOWN monitor message).
There might be a way to do this with Task, but since in my particular case I don't care about the work's result (if the process exits normally, everything is OK), this approach seems to be working.
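That approach can be sketched roughly like this (module and function names are illustrative): the GenServer spawns and monitors one plain process per request, keeps {pid, ref} in its state, and kills the process to cancel.

```elixir
defmodule Runner do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def compute(data), do: GenServer.cast(__MODULE__, {:compute, data})
  def cancel, do: GenServer.cast(__MODULE__, :cancel)

  @impl true
  def init(_), do: {:ok, nil}

  @impl true
  def handle_cast({:compute, data}, _state) do
    # spawn_monitor: we get a :DOWN message whatever way the worker exits.
    {pid, ref} = spawn_monitor(fn -> do_work(data) end)
    {:noreply, {pid, ref}}
  end

  def handle_cast(:cancel, {pid, _ref} = state) do
    # Kill the worker; its :DOWN message (reason :killed) clears the state.
    Process.exit(pid, :kill)
    {:noreply, state}
  end

  def handle_cast(:cancel, nil), do: {:noreply, nil}

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, {_wpid, ref}) do
    # _reason is :normal on success, :killed after cancel, else a crash.
    {:noreply, nil}
  end

  def handle_info(_msg, state), do: {:noreply, state}

  # Stand-in for the long-running computation.
  defp do_work(_data), do: :timer.sleep(60_000)
end
```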
From my side (as I see it) it looks really bad: my node might not get the reward because it is slower than the other nodes, so the other nodes either start the same work long after my node does, or simply never get to finish it. For me this kind of computing is really unprofitable, and I might therefore choose a different node network (if any exists).
Looking at it from a RESTful angle, it could be solved more easily with these steps:
0. index: list of not-yet-accepted challenges
1. new: simple form to accept a challenge
2. insert: submit the accept-challenge form
3. update: mark the challenge finished and save the result
So the first node could mark a challenge as its own and work on it, and when another node wants to accept this challenge, you return an error. Of course you could say that you can't make a RESTful site in a P2P network, but that's not entirely true. If I remember well, there was a JavaScript framework that makes edits in a P2P network really simple. If that's possible in Elixir then I would recommend this (longer) way (you'd need to write the library if it does not exist), because it looks much better than stopping a long job in the middle. Otherwise you only waste energy, time, and money, so as I see it, this is much more profitable for you and all the nodes.
Thank you for your input @Eiji. I mostly agree with you, however this would mean two things:
hashing power isn't relevant anymore, since you only need an HTTP request to "book" a block
slower hashing and thus slower block validation
Having nodes compete to get a block validated makes the transaction validation fast.
As far as I know, this is how Bitcoin works, and knowing that entire companies are dedicated to mining bitcoins, I can't see how mining is profitable for someone like me. This is still something I have to understand…
I don't know how it looks inside. If I understand well, you mean that you don't want to reserve a challenge (a group of blocks to compute) per node. If you know how many blocks you have in each challenge, then you can reserve a single block for a node instead of a group of them. Is that what you want?
I don't know how much time your node needs to compute a block, but let's imagine that each node needs an hour and always loses 5 minutes because another node already completed the block in question. If you have 12 nodes, then after an hour you have lost one block's worth of work, and your loss looks like:
x BTC (the cost of a block) + y (your currency here) for wasted energy (duplicate work on the same block)
As you can see in this simple example, when you multiply the node count you also multiply the loss from work conflicts. Of course you still earn more than you lose, but you should not need to lose anything except the energy spent on real (validated) work.
In the worst case all nodes could work on the same block at the same time, so you could lose:
(n − 1) * x BTC + (n − 1) * y (your currency here), where n is the number of nodes, x the cost of a block, and y the energy wasted on duplicate work on the same block
Of course, in all cases you also spend money on energy, so add:
+ z (your currency here) for the normal energy cost (of validated block work)
I'm not a cryptocurrency expert, so let me know if I made a mistake.
If you're not sure about that in detail, I'd really advise you to clear up that gap in your knowledge before diving into code without really knowing what to do. It might just be time spent on incorrect assumptions.
But still, in an attempt to clear things up:
Try to minimize the possibility of workers doing the "same" work. In the case of blockchain hash mining, do your best to ensure processes never check the same nonce twice for the hash generation; it'll just result in the same hash being output. What you can do, on the other hand, is have multiple workers try to solve the same "next block" within different ranges of nonces to test. You might call that competing, but it's actually much more like working together on a common goal, a.k.a. solving the block as fast as possible to get the reward.
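Splitting the nonce space into disjoint ranges, one per worker, could look like this sketch (module name invented; a 32-bit nonce space is assumed, as in Bitcoin block headers):

```elixir
defmodule NonceRanges do
  # Bitcoin block headers use a 32-bit nonce field.
  @max_nonce 0xFFFFFFFF

  # Return `workers` disjoint ranges that together cover 0..@max_nonce.
  def split(workers) do
    chunk = div(@max_nonce + 1, workers)

    for i <- 0..(workers - 1) do
      first = i * chunk
      # The last worker absorbs any rounding remainder.
      last = if i == workers - 1, do: @max_nonce, else: first + chunk - 1
      first..last
    end
  end
end

# NonceRanges.split(4)
# => [0..1073741823, 1073741824..2147483647,
#     2147483648..3221225471, 3221225472..4294967295]
```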
I think the fact most answers above do not address is that mining is not a fixed-size repeatable operation. If different processes are not trying out the same nonces, then it's parallelization and not wasted computation.
@LostKobrakai and @Eiji, you are both right. I'll dig more into this point. My goal is to build an "as simple as possible" cryptocurrency that mimics what happens in the real world. So I've learned about the blockchain, about transactions and validations, and the last piece I'm missing is this whole mining thing.
I'll document myself more, but as for my initial question, @orestis gave me the answer I was looking for.
Skipping a lot of this, to cancel a cast I'd do:
If the arguments passed in are "unique", then just pass them in again with a :cancel cast or so.
If they are not unique, then make them unique by passing in a ref, then pass in that ref with a :cancel cast or so.
The GenServer itself would manage its pool and cancel whatever is mapped to those args (via Process.exit or a message or whatever, based on how you design it).
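The ref-based variant might look like this sketch (all names are illustrative): the caller gets a ref back when it submits a job and can cast that ref later to cancel.

```elixir
defmodule RefPool do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  # Submit a job; the returned ref uniquely identifies it for cancelling.
  def process(args) do
    ref = make_ref()
    GenServer.cast(__MODULE__, {:process, ref, args})
    ref
  end

  def cancel(ref), do: GenServer.cast(__MODULE__, {:cancel, ref})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:process, ref, args}, jobs) do
    # Unlinked task per job, so killing it doesn't kill the GenServer.
    {:ok, pid} = Task.start(fn -> do_work(args) end)
    {:noreply, Map.put(jobs, ref, pid)}
  end

  def handle_cast({:cancel, ref}, jobs) do
    case Map.pop(jobs, ref) do
      {nil, jobs} ->
        {:noreply, jobs}

      {pid, jobs} ->
        Process.exit(pid, :kill)
        {:noreply, jobs}
    end
  end

  # Stand-in for the long-running work.
  defp do_work(_args), do: :timer.sleep(60_000)
end
```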
Briefly: use a process group that contains all workers attacking a given common problem, and when the first worker completes, after it sends its results to your aggregation process, send a final kill to the entire process group. There are a couple of options regarding OTP structure that will depend on how distributed (across physical nodes and BEAM VMs) you are: either run all common workers under a single supervisor using the one_for_all strategy, where on completion the entire worker set is terminated via OTP and can restart working on a new target, or use pg2 (http://erlang.org/doc/man/pg2.html) for a more distributed approach and do the explicit termination yourself.
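The pg2 variant boils down to a few calls (sketch only; note that :pg2 was later deprecated in favour of Erlang's :pg module, so on modern OTP you'd use that instead):

```elixir
# Each worker joins a named, cluster-wide group when it starts.
:pg2.create(:miners)
:pg2.join(:miners, self())

# When the first worker finds the result, it kills everyone else
# in the group (across all connected nodes).
for pid <- :pg2.get_members(:miners), pid != self() do
  Process.exit(pid, :kill)
end
```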