Dirty CPU NIF blocking scheduler

Hello everyone!

I’m currently working on a web spider/metadata extractor, and because writing parsers for many file types would be too much work, I’m using libextractor.

I wrote a NIF, and then learned that NIFs are only meant for functions that return in less than ~1 ms. So I just changed this:

static ErlNifFunc funcs[] = {
  { "extract", 2, extract }
};

into this:

static ErlNifFunc funcs[] = {
  { "extract", 2, extract, ERL_NIF_DIRTY_JOB_CPU_BOUND }
};

The problem: it seems like scheduler utilization goes to 100%, which is bad because the spider downloads a lot of files in parallel, so it becomes impossible to do things like launch the observer while I’m calling the NIF. Any idea if this is normal? Is there a way I can prevent the NIF from blocking the rest of the system?

Here is a picture of the observer while I test the calls to my main function:

Thanks!

1 Like

Is it possible to run erlang:system_info to check schedulers_state, scheduling_statistics, dirty_cpu_schedulers_online and dirty_cpu_schedulers?

And the most useful would be to check the schedulers using microstate accounting: msacc:start(1000), msacc:print(), msacc:stop(). It might not be wise to use in a production env, though…

Can you set the number of dirty NIF schedulers to the number of cores - 1? Probably the easiest way to test this is to do the following at startup:

:erlang.system_flag(:dirty_cpu_schedulers_online, System.schedulers_online() - 1)
1 Like

Are you sure that your NIF is CPU bound and not IO bound?

3 Likes

I don’t know how useful this is; I’m just running iex -S mix with my NIF, it’s not running in production or anything, but I hope it helps.

Is it possible to run erlang:system_info to check schedulers_state, scheduling_statistics, dirty_cpu_schedulers_online and dirty_cpu_schedulers?

iex(1)> {:ok, file_data} = File.read("/home/solanav/krptkn/final_form.png")
{:ok,
 <<137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82, 0, 0, 6, 50, 0,
   0, 2, 229, 8, 6, 0, 0, 0, 110, 46, 127, 76, 0, 0, 11, 218, 116, 69, 88, 116,
   109, 120, 102, 105, 108, 101, 0, ...>>}

iex(2)> Extractor.extract(file_data)
Extracting plugins...[OK] [0.046875]
Reading file binary...[OK] [0.000000]
Extracting the data...[OK] [0.218750]
Removing the plugins...[OK] [0.000000]
Creating list of data...[OK] [0.000000]
Destroying the data array...[OK] [0.000000]
[
  {'png', 'mimetype', 1, 'text/plain', 'image/png'},
  {'png', 'image dimensions', 1, 'text/plain', '1586x741'},
  {'mime', 'mimetype', 1, 'text/plain', 'image/png'},
  {'png', 'keywords', 1, 'text/plain', 'url encoded data'},
  {'thumbnailgtk', 'image dimensions', 1, 'text/plain', '1586x741'},
  {'thumbnailgtk', 'thumbnail', 2, 'image/png',  [137, 80, 78, 71, 13, 10, 26, 10]},
  {'gstreamer', 'mimetype', 1, 'text/plain', 'image/png'},
  {'gstreamer', 'video dimensions', 1, 'text/plain', '1586x741'},
  {'gstreamer', 'video depth', 1, 'text/plain', '32'},
  {'gstreamer', 'pixel aspect ratio', 1, 'text/plain', '1/1'}
]

iex(3)> :erlang.system_info(:schedulers_state)
{12, 12, 12}

iex(4)> :erlang.system_info(:scheduling_statistics)
[
  {:process_max, 0, 0},
  {:process_high, 0, 0},
  {:process_normal, 0, 0},
  {:process_low, 0, 0},
  {:port, 0, 0}
]

iex(5)> :erlang.system_info(:dirty_cpu_schedulers_online)
12

iex(6)> :erlang.system_info(:dirty_cpu_schedulers)
12

And the most useful would be to check the schedulers using microstate accounting: msacc:start(1000), msacc:print(), msacc:stop(). It might not be wise to use in a production env, though…

Here I did :msacc.start(1000), ran the NIF, printed and then stopped.

iex(8)> :msacc.print()
Average thread real-time    : 1001601 us
Accumulated system run-time :    7208 us
Average scheduler run-time  :     593 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
       aux( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.99%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(11)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(12)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
      poll( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
 scheduler( 1)    0.00%    0.05%    0.00%    0.00%    0.00%    0.00%   99.94%
 scheduler( 2)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler( 3)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler( 4)    0.00%    0.00%    0.00%    0.00%    0.07%    0.00%   99.93%
 scheduler( 5)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler( 6)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler( 7)    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%   99.99%
 scheduler( 8)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler( 9)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler(10)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler(11)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%
 scheduler(12)    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%   99.94%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.99%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
          poll    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
     scheduler    0.00%    0.00%    0.00%    0.00%    0.05%    0.00%   99.94%
:ok

So I tried doing that, and it seems like the result is the same, probably because the number of dirty schedulers was already 12, so I guess not much changed.

Well, I am not. Not really. I know libextractor (the library I am wrapping with this NIF) does not wait for user input and does not connect to the internet, but I think it may be running code loaded from dynamic libraries.

But I have tried changing it to the IO-bound dirty scheduler, and even though the peak lasted less time, it still reached 80-100%. The graph was almost the same, but with no dotted line, just a normal one.

From the chart it seems that the performance surge only lasts for one or two seconds. :msacc.start(1000) is a bit too short to catch the spike, since it only samples the schedulers for 1000 ms. Could you increase the time passed to :msacc.start/1 and try to fit the surge in that period?

1 Like

And could you provide the OTP version?

I’m using OTP 23.1 (tried both on WSL and on an Ubuntu 20.04 virtual machine, with the same results).

I can’t get it to work; the extractor won’t start running until :msacc.start/1 has finished. Is this code alright?

def test do
  {:ok, file_data} = File.read("/home/solanav/krptkn/final_form.png")

  :msacc.start(2_000)
  Extractor.extract(file_data)
  :msacc.print()
  :msacc.stop()
end

While a NIF is running on a scheduler, its utilization will be plotted as 100%. This is normal.

If the NIF is running on a dirty scheduler, it should not block the regular schedulers. Here’s how I tested it locally:

  1. In the C code of the dirty NIF function I added sleep(10);
  2. I started the shell with ELIXIR_ERL_OPTIONS="+S 1" iex -S mix (to make sure there’s just one scheduler and one dirty CPU scheduler)
  3. From the shell I started the NIF as spawn(&my_nif/0) (sketched below)
  4. As soon as the process was spawned, the iex shell was responsive and I was able to start the observer.
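
A quick sketch of what steps 3 and 4 look like from the shell, assuming the dirty NIF is wrapped as MyNif.my_nif/0 (a placeholder name) and sleeps for ~10 s in the C code:

# started with: ELIXIR_ERL_OPTIONS="+S 1" iex -S mix
spawn(&MyNif.my_nif/0)
# the prompt comes back immediately while the dirty scheduler runs the NIF,
# so the observer can be started right away
:observer.start()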

If you invoke a lot of longer-running dirty NIFs concurrently, you may end up occupying all the dirty scheduler threads, which might block some other operations. For example, length will switch to dirty schedulers for larger lists, and if all dirty schedulers are busy for a couple of seconds, length will be blocked. I was able to reproduce this by spawning a long-running NIF and then invoking length(Enum.to_list(1..1_000_000)), which blocked until the NIF finished.
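
A rough way to reproduce that, assuming a long-running dirty NIF wrapped as MyNif.my_nif/0 (placeholder name) and the single-dirty-scheduler setup (+S 1) from above:

# occupy the only dirty CPU scheduler for the duration of the NIF call
spawn(&MyNif.my_nif/0)
# length/1 on a large list may be moved to a dirty scheduler, so this call
# blocks until the NIF returns instead of completing almost instantly
length(Enum.to_list(1..1_000_000))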

Based on the description of the scenario, this looks like a case where a port might be a better fit.
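
For reference, a minimal sketch of what the port version could look like, assuming libextractor’s extract command-line tool is installed and its textual output is good enough to parse; it takes a file path rather than an in-memory binary, and the module and function names are made up for illustration:

defmodule Krptkn.PortExtractor do
  def extract(path) do
    exe = System.find_executable("extract")

    port =
      Port.open({:spawn_executable, exe}, [
        :binary,
        :exit_status,
        args: [path]
      ])

    collect(port, "")
  end

  # accumulate stdout until the external program exits
  defp collect(port, acc) do
    receive do
      {^port, {:data, data}} -> collect(port, acc <> data)
      {^port, {:exit_status, 0}} -> {:ok, acc}
      {^port, {:exit_status, status}} -> {:error, status}
    end
  end
end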

9 Likes

Good point. I should have also mentioned that it should be run in parallel with the NIF execution.
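
Something along these lines should do it, reusing the test/0 function from above (the 30_000 ms await timeout is arbitrary):

def test do
  {:ok, file_data} = File.read("/home/solanav/krptkn/final_form.png")

  # run the NIF in another process so it executes while msacc is sampling
  task = Task.async(fn -> Extractor.extract(file_data) end)

  # :msacc.start/1 blocks for the given time while collecting statistics
  :msacc.start(2_000)
  :msacc.print()
  :msacc.stop()

  Task.await(task, 30_000)
end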

Thank you so much for the explanation, I will rewrite the NIF as a port to see if it’s a better choice.

I’m a big fan by the way, keep up the good work!

2 Likes

Based on your screenshot it seems that a single extract call takes a few seconds, right? If that’s the case, I recommend limiting the amount of concurrency, i.e. making sure you’re not running more than x simultaneous extracts, where x should probably be smaller than the number of CPU cores.
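
For example, something like this would cap the number of simultaneous extracts (a sketch assuming the downloaded files arrive as a list of binaries; the exact limit and timeout are just tuning knobs):

max_extracts = max(System.schedulers_online() - 2, 1)

results =
  file_binaries
  |> Task.async_stream(&Extractor.extract/1,
    max_concurrency: max_extracts,
    timeout: 60_000
  )
  |> Enum.to_list()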

1 Like

For NIF-related code I usually refer to Paul Davis, Jesper Andersen, and Steve Vinoski, in no particular order. These are all in Erlang, but that’s not really important for NIFs anyway.

If people have other examples in Elixir: I expect Frank Hunleth has a bunch of amazing Nerves ones, and I’d hope they can chime in with some examples.

Paul authored jiffy, probably the most widely used NIF of all; Steve was instrumental in documenting and authoring the first working dirty scheduler code, to my knowledge; and Jesper’s enacl NIF comes with excellent explanatory notes in addition.

The first two are both used extensively and are, IMO, canonical examples of building cross-platform compilation solutions. The middle one is really well documented, and the last one shows how you can ship C source as part of the code and have it compiled locally.

9 Likes