Thanks for the detailed response!
- How many rows are you trying to insert? From the look of it 50_000 - 75_000 records with 10 attributes in each.
I am trying to do 10k writes per event. There will be consecutive, independent events, each coming in with about 10k records, and I have to persist them into Mnesia.
- What is the size of the table?
The table is not that big, maybe tens or hundreds of megabytes?
- What table copy type do you have? (ram_copies, disc_copies, disc_only_copies)
I thought Mnesia defaults to ram_copies unless specified otherwise?
- What type of table? (set, ordered_set, bag)
I am using bag for the records. Does it have a performance hit because it has to check for duplicate tuples? (My understanding of that check is sketched below.)
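For example, my (possibly wrong) mental model of bag semantics, using a hypothetical :my_table just for illustration:

:mnesia.dirty_write({:my_table, :a, 1})
# writing the exact same tuple again keeps only one copy, which is why
# I assume every write has to compare against existing records for that key
:mnesia.dirty_write({:my_table, :a, 1})
# a record with the same key but different contents is stored alongside
:mnesia.dirty_write({:my_table, :a, 2})
:mnesia.dirty_read({:my_table, :a})
# => both records, e.g. [{:my_table, :a, 1}, {:my_table, :a, 2}]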
- Is the table in use by many processes? I.e are there lots of concurrent inserts happening at the same time?
The table is used by a supervised process; no other process actively uses it until the write is complete, at which point the consumer is notified so it can read the data back from Mnesia.
So my understanding is that the dirty_* operations forgo a bunch of checks involving data replication, sync, and checkpoints, and do not use transactions. But each call is still a serialized operation? Meaning that if I had 10,000 records to insert and did it in an Enum.each, each record has to be written and return before the next one can be written, as in the sketch below.
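A minimal sketch of the serial approach I mean, assuming records is the list of 10k record tuples:

# each dirty_write blocks the caller until that single write returns,
# so the 10k writes happen strictly one after another
Enum.each(records, fn record ->
  :ok = :mnesia.dirty_write(record)
end)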
Even the documentation for the async_dirty activity states:
Calls the Fun in a context that is not protected by a transaction. The Mnesia function calls performed in the Fun are mapped to the corresponding dirty functions. This still involves logging, replication, and subscriptions, but there is no locking, local transaction storage, or commit protocols involved. Checkpoint retainers and indexes are updated, but they are updated dirty. As for normal mnesia:dirty_* operations, the operations are performed semi-asynchronously. For details, see mnesia:activity/4 and the User’s Guide.
So it's not truly async, in the sense that the write does not just complete on its own in the background. If I were to iterate through the 10k records and do the following for each one:
spawn(fn -> :mnesia.dirty_write(record) end)
this would effectively spawn 10k processes, each of which would write to Mnesia, meaning there could be race conditions that might corrupt the dataset. (Would bounding the concurrency, as in the sketch below, behave any differently?)
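For example, a hypothetical variant that caps the number of concurrent writers instead of firing off 10k unsupervised processes; records is again an assumed list of record tuples:

# assumption on my part: Task.async_stream bounds the concurrency and
# waits for every writer to finish, unlike bare spawn/1 which is fire-and-forget
records
|> Task.async_stream(fn record -> :mnesia.dirty_write(record) end,
  max_concurrency: System.schedulers_online()
)
|> Stream.run()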
I guess what I am asking is: what is the difference between these two blocks of code?
Enum.each(records, fn row -> :mnesia.activity(:async_dirty, fn -> :mnesia.write(row) end) end)
Enum.each(records, fn row -> spawn(fn -> :mnesia.activity(:async_dirty, fn -> :mnesia.write(row) end) end) end)
At the end of the day, if I need to persist a large number of records, what's the best way to do that? I'd imagine iterating over them one at a time and performing each write is not the best option. How can I do better? For instance, would wrapping the whole batch in a single transaction, as sketched below, be the right approach? How do you load 100k records in for your tests?
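A sketch of what I mean, assuming records is the full batch; the idea being that the locking and commit overhead is paid once per batch instead of once per record:

# all the writes happen inside one transaction, so they either all
# commit together or all abort together
:mnesia.transaction(fn ->
  Enum.each(records, fn record -> :mnesia.write(record) end)
end)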