Cancelling gen_server "cast"

Hello all,

I have a situation where multiple apps compete to process a job. When one of the app completes the job it tells the other apps, so they can stop processing the job. My jobs have IDs and take times to be processed. In my mind what I want is:

  1. GenServer client calls cast({process_job, 1})
  2. GenServer server receive handle_cast({process_job: job_id}) and start processing
  3. GenServer is notified that job with id 1 has been completed by someone else so it should cancel processing started in step 2

It doesnā€™t seems possible to cancel a ā€œcastā€ after it started.

I could solve this by using an Agent and Task, where the Agent would be responsible to store a mapping between the id of a running job and the pid of the Task. Then I can easily cancel a Task that is processing an already terminated job.

Still new to Elixir so wondering what would be the best design to answer thisā€¦

Kind regards,
Robin

GenServersā€™ (like any processesā€™) messages are serialized and processed one after another. So as soon as your handle_cast function does start processing that GenServer wonā€™t read any new messages until itā€™s finished. So thereā€™s no way to stop them.

As you said your job processing takes time wouldnā€™t is be better to not even start processing jobs in multiple processes in the first place? I mean you could end up blocking a good chunk of workers with doing the same thing only for one of them finishing a few seconds earlier making the work of all other workers useless.

1 Like

Thank you for your answer. I ignored that calls to GenServer where serialized and thought it was processing things in parallel. Anyway thatā€™s confirm that I canā€™t use a GenServer for my problem.
Still the problem stay unchanged, apps compete to get the job done. And they are different apps, communicating through a network. I wanā€™t them to compete and try to be the first to get the job done. However one itā€™s done by one app I wanā€™t the others to stop. Iā€™ll probably go with the Agent + Task combination. Seems a good choice

From your dispatcher close all the ports driving the programms.

Though I doubt this would be a good idea, i think hypothetically if, inside the gen serverā€™s work processing, you periodically/each time through a loop or whatever did a receive for a specific cancel message with a super short timeout, and if received then abort processing, you might achieve the behavior youā€™re describingā€¦you could also spin up a process per cast received, store the id, and then kill that process when receiving a cancel message.

When multiple programs perform the same job at the same time, this is not really parallellization, but a case of duplicate work: Youā€™ll throw away everything except one of the computed results. The only cases I can think of where youā€™ll want two different applications to really perform the same work is:

  1. When performing benchmarks.
  2. When ensuring that the new implementation of some specification matches the behaviour of the existing one.

As such, it is not something you should ever do in production code.

If you truly want jobs to be performed in a parallellized way, the easiest way is to invert the control in this scenario: Instead of the work-giver telling the GenServer(s) the jobs to perform, they should look up the next job themselves:

  • One GenServer is a work queue.
  • A pool of N GenServers can perform tasks.
  • Some outside place (or multiple) can add work to the work queue GenServer.

The pool is normally dormant, but as soon as work is added to the work queue, the work queue GenServer will wake all GenServers in the task pool up. Then these will look up the first job in the queue: Because the work queue GenServer will only handle work-requests one after the other, the work items each GenServer receives will be different. After performing this job, they will each repeat this procedure until the queue is empty again (and when that happens, go back to sleep).

Guys thank you for all your answers but if my question is asked this way itā€™s because it is a requirement.

I was focusing my question to a ā€œsimpleā€ pattern to be efficient but maybe it lacks some context that will help you know why I wanā€™t to do this before explaining this is not something I wanā€™t to do :stuck_out_tongue:

In Bitcoin, every nodes is connected by a P2P network. In this network there are miners which are responsible to compute a proof-of-work (something hard and that takes time). First miner to get the proof-of-work right have a reward. But when it gets it right, itā€™s simple courtesy to tell others that they can stop trying to compute the proof-of-work as only one node will have a reward. So others can just cancel, save CPU cycles and wait for the next challenge.

So thatā€™s why Iā€™m looking for a clean way in Elixir to:

  • start a long job in a process
  • be able to cancel it if I need

For this I need to keep track of a mapping between the ID of a block Iā€™m mining and the PID of the process that is computing the proof-of-work, so, if someone on the network tell me ā€œtoo late, I took care of this blockā€, I can shutdown the process.

Exactly this.

  • Find which process is working on that work item
  • Process.exit the worker.

Use a GenServer, which does start a worker process elsewhere to do work. If it does receive a message saying it should abort, it can kill that worker.

To write a Bitcoin proof-of-work miner, you do not give all GenServerā€™s the same task: They should all try different Nonces.

Of course, instead of giving them the nonces to try outright, you could reduce the amount of chatter by letting them pick a random number in the 0ā€¦2^256-range each time themselves (the possibility of repeated work when doing this is astronomically small and can thus be neglected).

And indeed, as @NobbZ noted, you can manage a pool of worker GenServers, and when their work is no longer useful, simply use Process.exit on them, to close down that pool. Afterwards, you can just start a new pool for the next Bitcoin block.


EDIT:
Do note that this kind of computation is something that Elixir, being a bytecode-interpreted language, is not nearly as fast as code written in statically compiled languages. Running mining jobs on your GPU is of course even faster. For serious mining, people now use dedicated hardware, making CPU-bound software miners no longer profitable.
But of course, theyā€™re still fun to build as hobby projects :smiley: !

Again @Qqwy thank you but I never talked about parallelizing a proof of work computation. Iā€™m talking about nodes competing to get the nounce right. Iā€™m not writing a mining software, Iā€™m working on an ā€œas simple as possibleā€ cryptocurrency in Elixir for educational purpose.

1 Like

Iā€™ve had the need to do something similar: cancel a long-running job because input data has changed and we donā€™t care about the result of the current-running jobs any more.

To do this, Iā€™ve had a GenServer act a long-running server. When it receives the data to be computed, it spawns and monitors a plain process:

{pid, ref} = spawn_monitor(fn() -> do_some_work(data) end)

Store the pid and ref in the GenServer state. The monitor is used to be able to tell if the job has finished successfully or crashed etc.

If I want to cancel the request, I just do Process.exit(pid, :kill) (making sure to handle the resulting monitor signal).

There might be a way to do this with Task but since in my particular case I donā€™t care about the workā€™s result (if the process exits normally, everything is OK), this approach seems to be working.

2 Likes

From my side (as I see it) it looks really bad, because my node could not get reward, because it could be slower than other nodes, so other nodes needs to start same work much more time after my node will or simply donā€™t work on it, so for me counting fortunately is really unprofitable and therefore I could choose other nodes network (if any).

Looking from RESTful side it could be easier solved by this steps:
0. index: list of not accepted challenges

  1. new: simple form to accept challenge
  2. insert: submit accept challenge form
  3. update: set challenge finished and save result

So first node could mark challenge as itā€™s own and work on it and when another node want to accept this challenge then you returns error. Of course you could say that you canā€™t make RESTful site in P2P netowk, but itā€™s not fully true. If I well remember there was an JavaScript framework to make edits in P2P network really simple. If itā€™s possible for Elixir then I would recommend this (longer) way (you need to write library if itā€™s does not exists), because it looks much better than stopping long job in the middle. You only waste energy, time and moneys, so as I can see itā€™s much more profitable for you and all nodes.

Thank you for your input @Eiji. I mostly agree with you however this would mean 2 things:

  • hashing power isnā€™t relevant anymore since you only need a http request to ā€œbookā€ a block
  • slower hashing and thus slower block validation

Having nodes to compete to get a block valid make the transaction validation fast.

As far as I know, this is how bitcoin work, and knowing than entire companies are dedicated to mine bitcoins, I canā€™t see how mining is profitable for someone like me. This is still something I have to understand ā€¦

I donā€™t know how it looks inside. If I well understand you mean that you donā€™t want to reserve challenge (group of blocks to compute) per node. If you know how much blocks you have in each challenge then you can reserve block instead of group of them for one node. Is it what you want?

Donā€™t know how much time your node needs to compute a block, but letā€™s imagine that each node needs a hour for this and each node lose always 5 min., because another node already completed specified block. If you have 12 nodes then after hour you loss 1 block and finally your loss looks:

x BTC (cost of block) + y (your currency here) for energy lose (multiple work on same block)

As you can see in this simple example when you multiple nodes count then you also multiply lose for work conflict. Of course you still earn more than lose, but anyway you do not need to lose anything except energy for real (validated) work. :slight_smile:

In worst case all nodes could work on same block in same time, so you could lose:

n - 1 (n - number of nodes) * x BTC (cost of block) + n - 1 (same here) * y (your currency here) for energy lose (multiple work on same block)

of course for in all case you spend money for energy, so:

+ z (your current here) normal energy lose (for validated block work) :slight_smile:

Iā€™m not cryptocurrency expert, so let me know if I made a mistake.

If youā€™re not sure about that in detail Iā€™d really advice you to clear up that gap of knowledge before diving into code without really knowing what to do. It just might be time spend on incorrect assumptions.

But still in an attempt to clear things up.

Try to minimize the possibilities of workers doing the ā€œsameā€ work. In case of blockchain hash mining, do your best so processes do not check the same nonce for the hash generation twice, itā€™ll just result in the same hash to be outputted. What you can do on the other hand: Have multiple workers try to solve the same ā€œnext blockā€ within different ranges of nonces to test. You might call that competing, but itā€™s actually much more working together on a common goal, a.k.a. solving the block as fast as possible to get the reward.

I think the fact most answers above do not address, is that mining is not a fixed sized repeatable operation. If different process are not trying out the same nonces then itā€™s parallelization and not wasted computation.

@LostKobrakai and @Eiji you are both right. Iā€™ll dig more on this point. My goal is to build an ā€œas simple as possibleā€ cryptocurrency that mimic what happens in real world. So Iā€™ve learned about blockchain, about transactions and validations and the last piece Iā€™m missing is this whole mining thing.
Iā€™ll document myself more but, as for my initial question, @orestis gives me the answer I was looking for.

Maybe this can get you started: https://github.com/anders94/blockchain-demo/

Skipping a lot of this, to cancel a cast Iā€™d do:

  • If the arguments passed in are ā€˜uniqueā€™ then just pass them in again with a ā€˜:cancelā€™ cast or so.
  • If they are not unique then make them unique by passing in a ref, then pass in that ref as a :cancel cast or so.
  • The GenServer itself would manage itā€™s pool and cancel whatever is mapped to those args (via Process.kill or a message or whatever based on how you design it).

Briefly, use a process group that contains all workers that are attacking a given common problem, and when the first worker completes, after sending in its results to your aggregation process, send a final kill to the entire process group. There are a couple of options regarding OTP structure that will depend on how distributed (across physical nodes & BEAM VMs) you are, either run all common workers under a single supervisor using one_for_all strategy, where on completion the entire worker set is terminated via OTP and can restart working on a new target, or use PG2 http://erlang.org/doc/man/pg2.html for a more distributed approach, and do explicit termination yourself.