So I have been playing with this, using a script that writes directly to the disk, without going through Mnesia:
defmodule FileIO do
  @moduledoc """
      iex> fd = FileIO.open! "test.txt"
      #PID<0.292.0>
      iex> FileIO.append!(fd, "test it") && FileIO.read!("test.txt")
      "test it\n"
      iex> FileIO.close fd
      :ok
  """

  @write_mode [:append, :binary]
  # @write_mode [:append, :binary, :delayed_write]
  # @write_mode [:append, :binary, {:delayed_write, 1, 1}]

  def open!(path, write_mode \\ @write_mode) do
    File.open!(path, write_mode)
  end

  def close(file_descriptor) do
    File.close(file_descriptor)
  end

  def append!(file_descriptor, data) do
    {:ok, start_position} = :file.position(file_descriptor, :cur)
    :ok = IO.binwrite(file_descriptor, "#{data}\n")
    {:ok, end_position} = :file.position(file_descriptor, :cur)

    %{
      file_descriptor: file_descriptor,
      start_position: start_position,
      end_position: end_position,
      size: byte_size(data)
    }
  end

  def read!(path) do
    {:ok, data} = :file.read_file(path)
    data
  end
end
I can confirm that the 2-second delay is indeed coming from the BEAM when the file is opened for writing with delayed_write, which defaults to a buffer of roughly 64 KB or a delay of 2 seconds, as per the Erlang docs:
delayed_write
The same as {delayed_write, Size, Delay} with reasonable default values for Size and Delay (roughly some 64 KB, 2 seconds).
If in the above script I use @write_mode [:append, :binary] or @write_mode [:append, :binary, {:delayed_write, 1, 1}], I can read the file immediately after writing to it and see that my last write is persisted on disk. If I instead use @write_mode [:append, :binary, :delayed_write], I cannot see the last write in the file unless I wait 2 seconds before reading it.
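To make that check repeatable, here is a minimal sketch of a probe that writes once and reads the file back right away through a separate handle. The module and function names (VisibilityCheck, visible_after_write?) are hypothetical and not part of my script above. Note that it only tells you whether the write has left the BEAM buffer and is visible to other readers; it does not prove the operating system has flushed it to the physical disk (that would need something like :file.sync/1).

```elixir
defmodule VisibilityCheck do
  # Hypothetical helper for illustration only.
  # Returns true if `data` can be read back from `path` immediately
  # after it was written with a file opened in `write_mode`.
  def visible_after_write?(path, write_mode, data \\ "probe") do
    # Start from a clean file; ignore the error if it does not exist yet.
    File.rm(path)
    fd = File.open!(path, write_mode)
    :ok = IO.binwrite(fd, data)

    # Read through a separate handle while the writer is still open,
    # so buffered (delayed) data has not been flushed by a close.
    visible? = File.read!(path) == data

    File.close(fd)
    visible?
  end
end
```

Calling it with [:append, :binary] and then with [:append, :binary, :delayed_write] should reproduce the difference I described, assuming the behavior I observed holds on your OTP version.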
Just to recap, Mnesia, when a table is set to disc_copies, uses the Erlang disk_log library, which opens the file with {:delayed_write, max_size, max_time}, thus opening a window in which data can be lost.
So, the next step is to find a commit-to-disk time that is as low as possible without hurting throughput too much, favoring consistency over write speed. When I find a value I am happy with, I will configure Mnesia with it and see if it copes under load as well as my script above does.
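As a starting point for comparing write modes, this is a rough sketch that times a batch of writes with :timer.tc/1. The module name, the function names and the {:delayed_write, 64 * 1024, 100} candidate (64 KB buffer, 100 ms delay) are assumptions picked for illustration, not values I have validated. A single run like this is noisy, so treat the numbers as a first impression only.

```elixir
defmodule WriteModeTiming do
  # Hypothetical helper for illustration only.
  # Writes `count` lines with the given `write_mode` and returns the
  # elapsed time in microseconds (the final close is not included).
  def time_writes(path, write_mode, count \\ 10_000) do
    # Start from a clean file; ignore the error if it does not exist yet.
    File.rm(path)
    fd = File.open!(path, write_mode)

    {micros, :ok} =
      :timer.tc(fn ->
        Enum.each(1..count, fn i -> :ok = IO.binwrite(fd, "line #{i}\n") end)
      end)

    # Closing flushes anything still held in the delayed-write buffer.
    :ok = File.close(fd)
    micros
  end

  # Runs the same workload against a few candidate write modes and
  # returns a list of {write_mode, microseconds} tuples.
  def compare(path, count \\ 10_000) do
    modes = [
      [:append, :binary],
      [:append, :binary, :delayed_write],
      # Assumed candidate: 64 KB buffer, 100 ms maximum delay.
      [:append, :binary, {:delayed_write, 64 * 1024, 100}]
    ]

    for mode <- modes, do: {mode, time_writes(path, mode, count)}
  end
end
```

Usage would be something like WriteModeTiming.compare("bench.txt", 100_000), then repeating it a few times and varying the delay value to see where throughput starts to suffer.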
Any recommendations on how to benchmark my script and Mnesia?