I have a situation where multiple apps compete to process a job. When one of the apps completes the job it tells the other apps, so they can stop processing it. My jobs have IDs and take time to process. In my mind what I want is:
GenServer client calls cast({:process_job, 1})
GenServer server receives handle_cast({:process_job, job_id}) and starts processing
The GenServer is notified that the job with ID 1 has been completed by someone else, so it should cancel the processing started in step 2
It doesn't seem possible to cancel a "cast" after it has started.
I could solve this by using an Agent and a Task, where the Agent would be responsible for storing a mapping between the ID of a running job and the PID of the Task. Then I can easily cancel a Task that is processing an already-terminated job.
Still new to Elixir, so I'm wondering what would be the best design for this…
A GenServer's messages (like any process's) are serialized and processed one after another. So as soon as your handle_cast function starts processing, that GenServer won't read any new messages until it's finished. So there's no way to stop it.
Since, as you said, your job processing takes time, wouldn't it be better not to even start processing jobs in multiple processes in the first place? I mean, you could end up blocking a good chunk of workers with doing the same thing, only for one of them to finish a few seconds earlier, making the work of all the other workers useless.
Thank you for your answer. I didn't know that messages to a GenServer were serialized and thought it was processing things in parallel. Anyway, that confirms that I can't use a single GenServer for my problem.
Still, the problem stays unchanged: apps compete to get the job done. And they are different apps, communicating over a network. I want them to compete and try to be the first to get the job done. However, once it's done by one app I want the others to stop. I'll probably go with the Agent + Task combination. Seems a good choice.
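For what it's worth, the Agent + Task combination could look something like this minimal sketch (module and function names are my own invention, not from any library): the Agent keeps a map of job ID to worker PID, and cancelling is just Process.exit on the stored PID.

```elixir
defmodule JobTracker do
  # Agent state: %{job_id => pid of the process working on that job}
  def start_link do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Start working on a job and remember which process is doing it.
  def process(job_id, fun) do
    # Task.start/1 starts an *unlinked* task, so killing the worker
    # later will not take the caller down with it.
    {:ok, pid} = Task.start(fn -> fun.(job_id) end)
    Agent.update(__MODULE__, &Map.put(&1, job_id, pid))
    pid
  end

  # Called when another app reports the job as already done.
  def cancel(job_id) do
    case Agent.get_and_update(__MODULE__, &Map.pop(&1, job_id)) do
      nil -> :not_running
      pid -> Process.exit(pid, :kill)
    end
  end
end
```

A completed job should also remove itself from the Agent (e.g. as the last step of `fun`), otherwise the map grows forever; that bookkeeping is left out of the sketch.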
Though I doubt this would be a good idea: hypothetically, if inside the GenServer's work processing you periodically (each time through a loop, for example) did a receive for a specific cancel message with a very short timeout, and aborted processing if it was received, you might achieve the behaviour you're describing. You could also spin up a process per cast received, store the ID, and then kill that process when receiving a cancel message.
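The "poll for a cancel message" idea can be sketched like this (a hypothetical worker, assuming the job can be split into small steps): between steps, the process checks its own mailbox for a cancel message with a zero timeout.

```elixir
defmodule ChunkedWorker do
  # Run `steps` small units of work, checking for a cancel message
  # between each one.
  def run(job_id, steps), do: loop(job_id, steps)

  defp loop(_job_id, 0), do: :done

  defp loop(job_id, steps_left) do
    receive do
      {:cancel, ^job_id} -> :cancelled
    after
      # Timeout 0: just peek at the mailbox, don't block.
      0 ->
        do_step(job_id)
        loop(job_id, steps_left - 1)
    end
  end

  # Stand-in for one small unit of real work.
  defp do_step(_job_id), do: :timer.sleep(10)
end
```

This only works if the work is naturally chunkable; a single long-running BIF or NIF call cannot be interrupted this way, which is why killing a dedicated process is usually the cleaner option.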
When multiple programs perform the same job at the same time, this is not really parallelization but a case of duplicate work: you'll throw away everything except one of the computed results. The only cases I can think of where you'd want two different applications to really perform the same work are:
When performing benchmarks.
When ensuring that the new implementation of some specification matches the behaviour of the existing one.
As such, it is not something you should ever do in production code.
If you truly want jobs to be performed in a parallelized way, the easiest approach is to invert the control in this scenario: instead of the work-giver telling the GenServer(s) which jobs to perform, they should look up the next job themselves:
One GenServer is a work queue.
A pool of N GenServers can perform tasks.
Some outside place (or multiple) can add work to the work queue GenServer.
The pool is normally dormant, but as soon as work is added to the work queue, the work queue GenServer will wake up all GenServers in the task pool. These will then look up the first job in the queue: because the work queue GenServer only handles work requests one after the other, the work items each GenServer receives will be different. After performing its job, each of them will repeat this procedure until the queue is empty again (and when that happens, go back to sleep).
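A minimal sketch of that pull-based design (names are illustrative, and the "wake the pool up" notification is omitted): the queue GenServer hands out one job per request, so no two workers can ever receive the same item.

```elixir
defmodule WorkQueue do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, :queue.new(), name: __MODULE__)
  end

  # Producers push jobs; workers pop them one at a time.
  def push(job), do: GenServer.cast(__MODULE__, {:push, job})
  def pop, do: GenServer.call(__MODULE__, :pop)

  @impl true
  def init(queue), do: {:ok, queue}

  @impl true
  def handle_cast({:push, job}, queue), do: {:noreply, :queue.in(job, queue)}

  @impl true
  def handle_call(:pop, _from, queue) do
    # Calls are serialized, so each worker gets a different item.
    case :queue.out(queue) do
      {{:value, job}, rest} -> {:reply, {:ok, job}, rest}
      {:empty, rest} -> {:reply, :empty, rest}
    end
  end
end

defmodule Worker do
  # Each pool member loops until the queue is empty, then goes dormant.
  def work_loop do
    case WorkQueue.pop() do
      {:ok, job} ->
        process(job)
        work_loop()

      :empty ->
        :sleep_until_woken
    end
  end

  # Stand-in for the real job processing.
  defp process(_job), do: :ok
end
```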
Guys, thank you for all your answers, but if my question is asked this way it's because it is a requirement.
I was focusing my question on a "simple" pattern to be efficient, but maybe it lacks some context that would help you understand why I want to do this before you explain that it is not something I should want to do.
In Bitcoin, all nodes are connected by a P2P network. In this network there are miners, which are responsible for computing a proof-of-work (something hard that takes time). The first miner to get the proof-of-work right gets a reward. But when it gets it right, it's simple courtesy to tell the others that they can stop trying to compute the proof-of-work, as only one node will get the reward. So the others can just cancel, save CPU cycles, and wait for the next challenge.
So that's why I'm looking for a clean way in Elixir to:
start a long job in a process
be able to cancel it if I need to
For this I need to keep track of a mapping between the ID of the block I'm mining and the PID of the process that is computing the proof-of-work, so that if someone on the network tells me "too late, I took care of this block", I can shut the process down.
To write a Bitcoin proof-of-work miner, you do not give all GenServers the same task: they should all try different nonces.
Of course, instead of handing them the nonces to try outright, you could reduce the amount of chatter by letting each of them pick a random number in the 0..2^256 range each time (the possibility of repeated work when doing this is astronomically small and can thus be neglected).
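Picking such a random number is a one-liner: 32 random bytes decoded as an unsigned integer gives a uniform value in 0..2^256 - 1.

```elixir
# 32 bytes = 256 bits, so the result lies in 0..2^256 - 1.
nonce =
  :crypto.strong_rand_bytes(32)
  |> :binary.decode_unsigned()
```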
And indeed, as @NobbZ noted, you can manage a pool of worker GenServers, and when their work is no longer useful, simply use Process.exit on them to close down that pool. Afterwards, you can just start a new pool for the next Bitcoin block.
EDIT:
Do note that for this kind of computation, Elixir, being a bytecode-interpreted language, is not nearly as fast as code written in statically compiled languages. Running mining jobs on your GPU is of course even faster. For serious mining, people now use dedicated hardware, making CPU-bound software miners no longer profitable.
But of course, they're still fun to build as hobby projects!
Again, @Qqwy, thank you, but I never talked about parallelizing a proof-of-work computation. I'm talking about nodes competing to get the nonce right. I'm not writing mining software; I'm working on an "as simple as possible" cryptocurrency in Elixir for educational purposes.
I've had the need to do something similar: cancel a long-running job because the input data has changed and we don't care about the result of the currently running jobs any more.
To do this, I've had a GenServer act as a long-running server. When it receives the data to be computed, it spawns and monitors a plain process:
Store the pid and ref in the GenServer state. The monitor is used to tell whether the job has finished successfully, crashed, etc.
If I want to cancel the request, I just do Process.exit(pid, :kill) (making sure to handle the resulting :DOWN monitor message).
There might be a way to do this with Task, but since in my particular case I don't care about the work's result (if the process exits normally, everything is OK), this approach seems to be working.
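That approach can be sketched roughly like this (module and function names are illustrative): the GenServer spawns and monitors one plain process per request, keeps {pid, ref} in its state, and kills the process to cancel.

```elixir
defmodule Runner do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def compute(data), do: GenServer.cast(__MODULE__, {:compute, data})
  def cancel, do: GenServer.cast(__MODULE__, :cancel)

  @impl true
  def init(_), do: {:ok, nil}

  @impl true
  def handle_cast({:compute, data}, _state) do
    # spawn_monitor: we get a :DOWN message whatever way the worker exits.
    {pid, ref} = spawn_monitor(fn -> do_work(data) end)
    {:noreply, {pid, ref}}
  end

  def handle_cast(:cancel, {pid, _ref} = state) do
    # Kill the worker; its :DOWN message (reason :killed) clears the state.
    Process.exit(pid, :kill)
    {:noreply, state}
  end

  def handle_cast(:cancel, nil), do: {:noreply, nil}

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, {_wpid, ref}) do
    # _reason is :normal on success, :killed after cancel, else a crash.
    {:noreply, nil}
  end

  def handle_info(_msg, state), do: {:noreply, state}

  # Stand-in for the long-running computation.
  defp do_work(_data), do: :timer.sleep(60_000)
end
```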
From my side (as I see it) it looks really bad: my node might not get the reward because it is slower than the other nodes, so the other nodes either start the same work long after my node does, or simply never get to finish it. For me this kind of computing is really unprofitable, and I might therefore choose a different node network (if any exists).
Looking at it from a RESTful angle, it could be solved more easily with these steps:
0. index: list of not-yet-accepted challenges
1. new: simple form to accept a challenge
2. insert: submit the accept-challenge form
3. update: mark the challenge finished and save the result
So the first node could mark a challenge as its own and work on it, and when another node wants to accept this challenge, you return an error. Of course you could say that you can't make a RESTful site in a P2P network, but that's not entirely true. If I remember well, there was a JavaScript framework that makes edits in a P2P network really simple. If that's possible in Elixir then I would recommend this (longer) way (you'd need to write the library if it does not exist), because it looks much better than stopping a long job in the middle. Otherwise you only waste energy, time, and money, so as I see it, this is much more profitable for you and all the nodes.
Thank you for your input @Eiji. I mostly agree with you, however this would mean two things:
hashing power isn't relevant anymore, since you only need an HTTP request to "book" a block
slower hashing and thus slower block validation
Having nodes compete to get a block validated makes the transaction validation fast.
As far as I know, this is how Bitcoin works, and knowing that entire companies are dedicated to mining bitcoins, I can't see how mining is profitable for someone like me. This is still something I have to understand…
I don't know how it looks inside. If I understand well, you mean that you don't want to reserve a challenge (a group of blocks to compute) per node. If you know how many blocks you have in each challenge, then you can reserve a single block for a node instead of a group of them. Is that what you want?
I don't know how much time your node needs to compute a block, but let's imagine that each node needs an hour and always loses 5 minutes because another node already completed the block in question. If you have 12 nodes, then after an hour you have lost one block's worth of work, and your loss looks like:
x BTC (the cost of a block) + y (your currency here) for wasted energy (duplicate work on the same block)
As you can see in this simple example, when you multiply the node count you also multiply the loss from work conflicts. Of course you still earn more than you lose, but you should not need to lose anything except the energy spent on real (validated) work.
In the worst case all nodes could work on the same block at the same time, so you could lose:
(n − 1) * x BTC + (n − 1) * y (your currency here), where n is the number of nodes, x the cost of a block, and y the energy wasted on duplicate work on the same block
Of course, in all cases you also spend money on energy, so add:
+ z (your currency here) for the normal energy cost (of validated block work)
I'm not a cryptocurrency expert, so let me know if I made a mistake.
If you're not sure about that in detail, I'd really advise you to clear up that gap in your knowledge before diving into code without really knowing what to do. It might just be time spent on incorrect assumptions.
But still, in an attempt to clear things up:
Try to minimize the possibility of workers doing the "same" work. In the case of blockchain hash mining, do your best to ensure processes never check the same nonce twice for the hash generation; it'll just result in the same hash being output. What you can do, on the other hand, is have multiple workers try to solve the same "next block" within different ranges of nonces to test. You might call that competing, but it's actually much more like working together on a common goal, a.k.a. solving the block as fast as possible to get the reward.
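Splitting the nonce space into disjoint ranges, one per worker, could look like this sketch (module name invented; a 32-bit nonce space is assumed, as in Bitcoin block headers):

```elixir
defmodule NonceRanges do
  # Bitcoin block headers use a 32-bit nonce field.
  @max_nonce 0xFFFFFFFF

  # Return `workers` disjoint ranges that together cover 0..@max_nonce.
  def split(workers) do
    chunk = div(@max_nonce + 1, workers)

    for i <- 0..(workers - 1) do
      first = i * chunk
      # The last worker absorbs any rounding remainder.
      last = if i == workers - 1, do: @max_nonce, else: first + chunk - 1
      first..last
    end
  end
end

# NonceRanges.split(4)
# => [0..1073741823, 1073741824..2147483647,
#     2147483648..3221225471, 3221225472..4294967295]
```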
I think the fact most answers above do not address is that mining is not a fixed-size repeatable operation. If different processes are not trying out the same nonces, then it's parallelization and not wasted computation.
@LostKobrakai and @Eiji, you are both right. I'll dig more into this point. My goal is to build an "as simple as possible" cryptocurrency that mimics what happens in the real world. So I've learned about the blockchain, about transactions and validations, and the last piece I'm missing is this whole mining thing.
I'll document myself more, but as for my initial question, @orestis gave me the answer I was looking for.
Skipping a lot of this, to cancel a cast I'd do:
If the arguments passed in are "unique", then just pass them in again with a :cancel cast or so.
If they are not unique, then make them unique by passing in a ref, then pass in that ref with a :cancel cast or so.
The GenServer itself would manage its pool and cancel whatever is mapped to those args (via Process.exit or a message or whatever, based on how you design it).
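The ref-based variant might look like this sketch (all names are illustrative): the caller gets a ref back when it submits a job and can cast that ref later to cancel.

```elixir
defmodule RefPool do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  # Submit a job; the returned ref uniquely identifies it for cancelling.
  def process(args) do
    ref = make_ref()
    GenServer.cast(__MODULE__, {:process, ref, args})
    ref
  end

  def cancel(ref), do: GenServer.cast(__MODULE__, {:cancel, ref})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:process, ref, args}, jobs) do
    # Unlinked task per job, so killing it doesn't kill the GenServer.
    {:ok, pid} = Task.start(fn -> do_work(args) end)
    {:noreply, Map.put(jobs, ref, pid)}
  end

  def handle_cast({:cancel, ref}, jobs) do
    case Map.pop(jobs, ref) do
      {nil, jobs} ->
        {:noreply, jobs}

      {pid, jobs} ->
        Process.exit(pid, :kill)
        {:noreply, jobs}
    end
  end

  # Stand-in for the long-running work.
  defp do_work(_args), do: :timer.sleep(60_000)
end
```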
Briefly: use a process group that contains all workers attacking a given common problem, and when the first worker completes, after it sends its results to your aggregation process, send a final kill to the entire process group. There are a couple of options regarding OTP structure that will depend on how distributed (across physical nodes and BEAM VMs) you are: either run all common workers under a single supervisor using the one_for_all strategy, where on completion the entire worker set is terminated via OTP and can restart working on a new target, or use pg2 (http://erlang.org/doc/man/pg2.html) for a more distributed approach and do the explicit termination yourself.
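The pg2 variant boils down to a few calls (sketch only; note that :pg2 was later deprecated in favour of Erlang's :pg module, so on modern OTP you'd use that instead):

```elixir
# Each worker joins a named, cluster-wide group when it starts.
:pg2.create(:miners)
:pg2.join(:miners, self())

# When the first worker finds the result, it kills everyone else
# in the group (across all connected nodes).
for pid <- :pg2.get_members(:miners), pid != self() do
  Process.exit(pid, :kill)
end
```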