Reading mtime (stat) to microsecond accuracy

the_wildgoose · March 21, 2023, 5:11pm

I tried using

File.stat("some/file")

and discovered that it truncates the timestamps to whole seconds. It seems to be a wrapper for the underlying Erlang call:

:file.read_file_info('some/file', [{:time, :posix}])

…which returns times in whole seconds…

Any suggestions on the most performant way to read mtimes at the maximum accuracy the filesystem supports?

Equivalently, the use case is debouncing reads of a small JSON file, so a possible workaround would be a fast way to checksum the file. Suggestions?

(The file is small, but we have a problem (insert reasons) that means we need to acquire an expensive lock (hundreds of milliseconds) before reading it, so I need a robust way to detect whether it has changed since the last read. Yes, long term we can work on improving the speed of locking, but…)

jhogberg · March 21, 2023, 5:33pm

Relying on time stamps, however accurate, is not a reliable way to check whether files have changed. Neither are content comparisons (checksums or otherwise) if there’s a possibility of the file being modified while you’re reading it. Try writing a NIF that uses inotify(7) or similar instead and listen for changes.
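A minimal sketch of the inotify(7) side of that suggestion in plain C, the kind of code a NIF could wrap. It assumes Linux, and the helper names watch_dir and drain_events are hypothetical. It watches the directory rather than the file, so a writer that replaces the file via rename(2) still shows up (as IN_MOVED_TO) under the file's name; reads are non-blocking, so the caller decides when to check, for example from a poll(2) loop.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/inotify.h>

/* Create a non-blocking inotify instance watching directory `dir`.
 * Watching the directory (not the file) also reports writers that replace
 * the file with rename(2), which a watch on the original inode can miss.
 * Returns the inotify fd, or -1 on error. */
static int watch_dir(const char *dir)
{
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0) {
        perror("inotify_init1");
        return -1;
    }
    if (inotify_add_watch(fd, dir, IN_CLOSE_WRITE | IN_MOVED_TO) < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }
    return fd;
}

/* Read every queued event without blocking. Returns 1 if at least one
 * event named `name`, 0 if none did, -1 on an unexpected read error.
 * Draining the whole queue at once collapses a burst into one answer. */
static int drain_events(int fd, const char *name)
{
    /* Buffer aligned for struct inotify_event, as in the inotify(7) example. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int hit = 0;

    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? hit : -1;
        if (len == 0)
            return hit;
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            /* ev->name is only present (and NUL-padded) when ev->len > 0 */
            if (ev->len > 0 && strcmp(ev->name, name) == 0)
                hit = 1;
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}
```

Calling drain_events after a burst of writes reports the whole burst as a single "changed" result.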

I have a feeling that you’re trying to solve the wrong problem, though: why are you communicating over the file system?

D4no0 · March 21, 2023, 5:35pm

Since hashing is so common nowadays, the CPU you are using may well have hardware acceleration for some specific hashing algorithms. If I were choosing, I would start with CRC.
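A sketch of the CRC route, assuming the standard reflected CRC-32 (polynomial 0xEDB88320, the variant zlib and Erlang's built-in :erlang.crc32/1 use). This bitwise version is the slowest way to compute a CRC but has no dependencies; for a small config file it should be dwarfed by the flash read itself. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
 * Pass crc = 0 to start; pass a previous result to continue a stream. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    crc = ~crc;                       /* pre-inversion (initial value 0xFFFFFFFF) */
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)   /* shift out one bit, XOR poly if it was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                      /* post-inversion */
}

/* Checksum a whole file. Returns 0 and stores the CRC in *out on success,
 * -1 if the file cannot be opened or read. */
static int crc32_file(const char *path, uint32_t *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    unsigned char buf[4096];
    uint32_t crc = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        crc = crc32_update(crc, buf, n);
    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = crc;
    return 0;
}
```

Keep the CRC computed during the last locked read; if a later unlocked crc32_file gives a different value, take the lock and re-read. An unlocked read can race with a writer, and distinct contents can collide, so a matching CRC only means "probably unchanged".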

dimitarvp · March 21, 2023, 6:13pm

I’m interested in your actual scenario: how did you end up having to modify a JSON file and check whether it was modified recently?

the_wildgoose · March 21, 2023, 6:29pm

I’m using inotify. I need to debounce multiple notifications from it.
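For the debouncing itself, a common pattern is a trailing-edge debounce: every notification pushes the deadline back, and the reload happens once the file has been quiet for a fixed window. A minimal sketch, assuming the caller supplies millisecond timestamps from a monotonic clock (for example clock_gettime(CLOCK_MONOTONIC)); the type and function names are hypothetical. In Elixir the same idea is usually a GenServer that cancels and restarts a Process.send_after/3 timer on each event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trailing-edge debouncer: a burst of events yields one "reload" once no
 * new event has arrived for quiet_ms milliseconds. */
typedef struct {
    uint64_t quiet_ms;   /* required silence before firing */
    uint64_t last_ms;    /* monotonic time of the most recent event */
    bool     pending;    /* an event arrived and has not fired yet */
} debouncer;

/* Call for every inotify notification. */
static void debounce_event(debouncer *d, uint64_t now_ms)
{
    d->last_ms = now_ms;
    d->pending = true;
}

/* Call periodically (e.g. on a poll timeout). Returns true exactly once
 * per burst, when the quiet period has elapsed. */
static bool debounce_ready(debouncer *d, uint64_t now_ms)
{
    if (d->pending && now_ms - d->last_ms >= d->quiet_ms) {
        d->pending = false;
        return true;
    }
    return false;
}
```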

Can I just add that I’m not hoping for an argument on the merits of mtimes; I’m comfortable with their limitations.

The larger situation is that multiple processes across an embedded system all edit a master JSON config file. Think of something like OpenWrt and its config system. You need two basic primitives: open with a shared lock and open with an exclusive lock. Because I need to support a variety of implementations and programming languages, I’m using flock to implement my locking. For various reasons, the incantations required to use flock via exec under Elixir are “slow”: they call the native “flock” binary, which itself needs to bring up a shell to be safe against process failure.

Longer term I will rewrite flock in Rustler. I already wrote an implementation in Zigler, but that project seems to have lost traction, and while the code runs fine on a number of platforms, it fails to execute correctly on arm32. Development time is the limitation here. I also see that there is a pull request for OTP to implement flock in OTP 27, which should land in about 14 months or so. So I just need to keep this project working until then.

So, back to the original problem: I have a database file, and it’s slow to lock it for (synchronised) reading. I can read its raw contents without a problem, e.g. to run a CRC check on it, or I can check its mtime to see whether it has changed since I last read it. The file is read regularly but written very rarely. The goal is to avoid calling flock on every read. The cache is invalidated on every inotify event and whenever Elixir alters the file. However, I’m still getting stale reads in some corner case, so I’m looking to implement a safety net that also re-reads if the file has changed according to its mtime (or perhaps something else).
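One way to build that safety net without exec'ing stat is to remember more than the mtime. A minimal sketch, assuming Linux (glibc or musl), where struct stat exposes st_mtim with nanosecond resolution per POSIX.1-2008; file_stamp, stamp_file and stamp_changed are hypothetical names. Comparing the device, inode and size as well catches a file replaced via rename(2), and a write that changes the size even when two writes land within the filesystem's timestamp granularity.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes st_mtim on glibc */
#endif
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* What we remember about the file from the last (locked) read. */
typedef struct {
    dev_t  dev;
    ino_t  ino;                  /* changes if the file is replaced via rename(2) */
    off_t  size;
    struct timespec mtime;       /* seconds + nanoseconds */
} file_stamp;

/* Fill *s from stat(2). Returns 0 on success, -1 on error (errno set). */
static int stamp_file(const char *path, file_stamp *s)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    s->dev = st.st_dev;
    s->ino = st.st_ino;
    s->size = st.st_size;
    s->mtime = st.st_mtim;       /* Linux/POSIX.1-2008 field name */
    return 0;
}

/* True if anything tracked differs, i.e. the cache should be re-read. */
static bool stamp_changed(const file_stamp *a, const file_stamp *b)
{
    return a->dev != b->dev || a->ino != b->ino || a->size != b->size ||
           a->mtime.tv_sec != b->mtime.tv_sec ||
           a->mtime.tv_nsec != b->mtime.tv_nsec;
}
```

Taking the stamp while still holding the shared lock makes it describe exactly the content that was cached; later reads stamp again and only go through the slow flock path when stamp_changed returns true.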

OK, is the problem space clear enough? Any suggestions on accessing mtime that are neater than running exec “stat” and parsing the output?

the_wildgoose · March 21, 2023, 7:05pm

Implement an API to access a JSON file…

People call the API to write stuff, then later something else reads it, and so on. We have a web interface; users can make changes and hit save. Then every time you open a page, we read the config, decode the JSON, etc.

It’s an embedded processor. Reading files from slow flash, slow flocking, decoding JSON, etc. are all performance hotspots. I think caching as a necessary evil around a slow resource is a generally accepted solution?

jhogberg · March 21, 2023, 7:52pm

Fair enough, we’re just trying to help you better.

Since you just want to catch writes that somehow slip the inotify net (how?), I think the quickest way forward is to read the whole file and compare it with the cached version or a checksum of it. If the files are reasonably small, it shouldn’t be much more expensive than checking the modification time, and you won’t have to think about how to get a more accurate timestamp, so why not check whether it works well enough?
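A sketch of the read-and-compare approach in plain C for illustration (on the BEAM it is essentially File.read!/1 followed by comparing against the cached binary). read_all, content_changed and the blob type are hypothetical names. An unlocked read can race with a writer, so a "changed" result should trigger the locked re-read rather than being used directly.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;   /* malloc'd file contents */
    size_t len;
} blob;

/* Read the entire file into a freshly malloc'd blob. Returns 0 on success. */
static int read_all(const char *path, blob *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t cap = 4096, len = 0, n;
    unsigned char *buf = malloc(cap);
    if (!buf) { fclose(f); return -1; }

    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {                       /* buffer full: grow it */
            unsigned char *bigger = realloc(buf, cap * 2);
            if (!bigger) { free(buf); fclose(f); return -1; }
            buf = bigger;
            cap *= 2;
        }
    }
    int err = ferror(f);
    fclose(f);
    if (err) { free(buf); return -1; }
    out->data = buf;
    out->len = len;
    return 0;
}

/* True if the file's current bytes differ from the cached copy. */
static bool content_changed(const blob *cached, const blob *now)
{
    return cached->len != now->len ||
           memcmp(cached->data, now->data, now->len) != 0;
}
```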

dimitarvp · March 21, 2023, 9:05pm

Feel free to ignore the following since you already outlined your time and energy budget constraints. I am still posting it because I feel it might be valuable now or in the future.

If you are looking for something that ubiquitous – even on ARM32 – then do consider SQLite; there are one or two good Elixir NIF libraries for it (I never got around to writing my own due to various nasty factors). SQLite also has very solid JSON support, and its WAL / WAL2 journal modes let readers proceed in parallel with a writer.

BradS2S · March 22, 2023, 1:16am

Here’s a NIF:

```c
#include <erl_nif.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

// Detect the operating systems whose struct stat layout I know about
#ifdef __APPLE__
    #define PLATFORM_MACOS
#elif defined(__linux__)
    #define PLATFORM_LINUX
#endif

// Useful resource:
// https://andrealeopard.com/posts/using-c-from-elixir-with-nifs/#defining-a-nif

// ERL_NIF_TERM is a "wrapper" type that represents all Erlang types
// (like binary, list, tuple, and so on) in C.
static ERL_NIF_TERM get_file_time(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
    char file_path[1024];
    // 64-bit so seconds past 2038 survive on 32-bit targets such as arm32;
    // nanoseconds stays 0 on platforms that only expose whole seconds.
    ErlNifSInt64 seconds = 0, nanoseconds = 0;

    // The path must be a charlist: enif_get_string returns <= 0 if the term
    // is not a list or does not fit in the buffer (NUL included).
    if (argc != 1 ||
        enif_get_string(env, argv[0], file_path, sizeof(file_path), ERL_NIF_LATIN1) <= 0) {
        return enif_make_badarg(env);
    }

    struct stat filestat;
    if (stat(file_path, &filestat) != 0) {
        // Save errno before any other call, then return {:error, errno}.
        // https://stackoverflow.com/questions/503878/how-to-know-what-the-errno-means
        int err = errno;
        return enif_make_tuple2(env,
                                enif_make_atom(env, "error"),
                                enif_make_int(env, err));
    }

#if defined(PLATFORM_MACOS)
    seconds = filestat.st_mtimespec.tv_sec;
    nanoseconds = filestat.st_mtimespec.tv_nsec;
#elif defined(PLATFORM_LINUX)
    seconds = filestat.st_mtim.tv_sec;
    nanoseconds = filestat.st_mtim.tv_nsec;
#else
    seconds = filestat.st_mtime;
#endif

    // erl_nif.h provides enif_make_* functions to convert C values back to
    // Erlang terms; this builds {:ok, {seconds, nanoseconds}}.
    return enif_make_tuple2(env,
                            enif_make_atom(env, "ok"),
                            enif_make_tuple2(env,
                                             enif_make_int64(env, seconds),
                                             enif_make_int64(env, nanoseconds)));
}

// {erl_function_name, erl_function_arity, c_function}
static ErlNifFunc nif_funcs[] =
{
    {"get_file_time", 1, get_file_time}
};

// Export the function to the BEAM. The module name must match the Elixir
// module that loads the library (Elixir.FileTime).
ERL_NIF_INIT(Elixir.FileTime, nif_funcs, NULL, NULL, NULL, NULL)
```

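I compiled it using this command (these are macOS linker flags; on Linux, use -shared instead of -dynamiclib -undefined dynamic_lookup, and point -I at the include directory of your own Erlang installation):

```shell
gcc -fPIC -I/usr/local/lib/erlang/usr/include/ \
     -dynamiclib -undefined dynamic_lookup \
     -o file_time.so file_time.c
```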

Here’s the module I wrapped it in:

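```elixir
defmodule FileTime do
  @on_load :load_nifs

  def load_nifs do
    # Path is relative to the current working directory; ".so" is appended.
    :erlang.load_nif("./file_time", 0)
  end

  # Replaced by the C implementation once the NIF library is loaded.
  def get_file_time(_file) do
    :erlang.nif_error(:nif_not_loaded)
  end
end
```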

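The path has to be a charlist; passing an Elixir string (a binary) makes the NIF return badarg:

```elixir
{:ok, {seconds, nanoseconds}} = FileTime.get_file_time('file_as_char_list.txt')

# Keep the sub-second part (DateTime stores up to microsecond precision)
{:ok, datetime} = DateTime.from_unix(seconds * 1_000_000 + div(nanoseconds, 1_000), :microsecond)
```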

I tested it on a couple of files and it seemed to be working OK.

the_wildgoose · April 8, 2023, 1:57pm

Thanks all. Some really great ideas here!

I think I can find a way to use the NIF idea as well. Thanks!