Best way to keep read-only table in memory

Cruz · December 14, 2018, 11:53pm

Hello,

I have a design related question. I’d like to keep a very large list of products in-memory to serve multiple clients and queries. My first implementation uses structs, and it’s a good start but I now want to explore some of the more advanced options of the ecosystem, e.g. ETS, Mnesia, etc.

I was in the middle of refactoring to an ETS-based solution, but I stopped when I found out I couldn’t have two different keys on the same ETS table. I think I still can use “ETS match patterns” to get data based on the 2nd key, but I wonder if I should just use Mnesia instead.

This is just a read-only cache level for my app. There’s a DB that I’ll query to populate the ETS or Mnesia in-memory table on startup. The data will change only every two or three months; and at that point, reloading it is totally fine. So, I don’t think I need to go with something more complex like CacheX.

Obviously, I can use a bit of brute force, and have a 2nd ETS table with the 2nd key as key, get the 1st key, and then query the 1st table. Kind of an reverse index table. However, this seems somehow ugly. So, again, should I use Mnesia or something else instead?

Thank you,

axelson · December 15, 2018, 12:06am

Will you always be querying by both keys? If so that makes it easier. Also Cachex is quite easy to integrate. You could probably get a lot of benefit with very little code using Cachex.fetch/4:

Cachex.fetch(:my_cache, {key1, key2}, fn {key1, key2} ->
  result = DB.lookup(key1, key2)
  {:commit, result}
end)

Cruz · December 15, 2018, 12:12am

Sorry, I should have provided some examples. No, I’d never need to query with both keys together. It’d be either one or the other one. I might need to add a 3rd key in the future though.

Is CacheX easier to use than Mnesia? I don’t need any of its advanced features but cleaner code is always welcome. I was just reading about Amnesia as well

axelson · December 15, 2018, 12:24am

If it’s just in memory (:ets) then querying for one after the other should be fine. And I would say that Cachex is easier to use than Mnesia and suits your use-case better (since you already have a primary database).

Cruz · December 15, 2018, 12:34am

Your comment about CacheX intrigues me, but with the little reading that I’ve done, it’s not obvious how I will maintain two or three keys for the same record. While with something like Mnesia or Amnesia (which seems easier to use), I’d just declare as many indexes as I need. Am I missing something?

kokolegorille · December 15, 2018, 12:36am

If it’s just about the keys, You can define ets table of different types, which one did You use?

From https://elixirschool.com/en/lessons/specifics/ets/#table-types

set — This is the default table type. One value per key. Keys are unique.
ordered_set — Similar to set but ordered by Erlang/Elixir term. It is important to note that key comparison is different within ordered_set . Keys need not match so long as they compare equally. 1 and 1.0 are considered equal.
bag — Many objects per key but only one instance of each object per key.
duplicate_bag — Many objects per key, with duplicates allowed.

Cruz · December 15, 2018, 12:43am

I don’t follow. The different table types allows me to have multiple records on the same key, but that’s not my issue. Let me provide an example:

key1, key2, key3, value1, value2, …, valueN

Use cases:
Given key1, find the one record that matches it

Given key2, find the one record that matches it (this could be the same record as the previous case)

Given key3, find all the records that matches it

I’m using duplicate_bag, but this is just a trick. I know the records are not duplicate, and I don’t want the system wasting time checking for that

benwilson512 · December 15, 2018, 1:15am

Can you make this a bit more concrete? What does match mean? equality? pattern matching? are there multiple keys for the same value?

Cruz · December 15, 2018, 1:18am

What does match mean?
It means equality. It’s basically the equivalent of the SQL query:

Select * from table where key = value

are there multiple keys for the same value?
Yes, that’s what I meant before. There are two ways to access each unique row on the table. That’s why I have to support two different queries based on those two different keys

benwilson512 · December 15, 2018, 1:34am

How many values are we talking here (order of magnitude)? How does your current implementation work? Structs don’t store data in a way that is accessible from different processes.

One option is to have two tables, one which stores k1 -> v1 and then some other table which stores, k1 -> k1, k2 -> k1 mappings. You do a lookup in table 2 first to get the canonical key, and then a lookup in table 1 to get the real value.

EDIT: Actually your second table could just store k2 -> k1, and if you do a lookup in the second table and there’s no values then you know you already have the canonical key.

Cruz · December 15, 2018, 1:47am

How many values are we talking here (order of magnitude)?

We’re talking about 15K records

How does your current implementation work? Structs don’t store data in a way that is accessible from different processes.

The list of structs is in a GenServer. I did that precisely to allow concurrent access

One option is to have two tables, one which stores k1 -> v1 and then some other table which stores, k1 -> k1, k2 -> k1 mappings. You do a lookup in table 2 first to get the canonical key, and then a lookup in table 1 to get the real value.

Yes, that’s what I clumsily tried to explain before:
"Obviously, I can use a bit of brute force, and have a 2nd ETS table with the 2nd key as key, get the 1st key, and then query the 1st table. Kind of an reverse index table. "

And, my question still is if I should use something like Amnesia instead. I think that will allow me to more easily support queries on other columns that are not unique keys but that group data never the less.

axelson thinks CacheX is a better option for me, but I don’t see it. I still need to support queries on different columns.

benwilson512 · December 15, 2018, 1:53am

If you need in memory multi column (non key) lookups you’re going to start making things pretty complex. No matter what you do, if you want to avoid linear scans you’ll need to maintain secondary indices. Adding in mnesia won’t really change that, or even something fancy like an in memory sqlite table.

15k items isn’t really all that many. You could easily just have 1 table for canonical key -> value pairs, and then N tables, one per column you want to query on, containing value -> canonical key pairs.

Cruz · December 15, 2018, 2:02am

Yes, that was my first idea. I just thought I should ask before going with it.

Looking at the Mnesia/Amnesia documentation it seems straightforward to declare multiple indexes on the same table. But, I’m very new to Elixir so I don’t really know what I’ll be getting myself into. Any reason why I should avoid this, and implement all the tables/indexes on my own?

benwilson512 · December 15, 2018, 2:03am

I definitely think giving :mnesia a shot may be worthwhile, particularly if you’re just doing in memory tables and not worrying about any of the distributed bits. Amnesia is OK as wrapper libraries go, it’s VERY macro heavy so it can be a bit hard to debug. Your use case is relatively simple though so maybe just time box it to a couple hours and see how it goes?

Cruz · December 15, 2018, 2:05am

Yes, that’s precisely my use case. Thank you so much. You’ve been very patient and helpful

Kabie · December 15, 2018, 3:28am

I would compile those into a module with macro. Similar to the new :persistent_term.

Cruz · December 15, 2018, 4:29pm

Thank you for the info. I initially didn’t think this will help me because it doesn’t allow me to have two keys on the same record. However, I could use the same idea we discussed in this thread: Implement the two indexes on their own structures, and have a third structure for the non-key based query.

I’d like to implement this after the Mnesia based solution, and compare both. Can you tell me where to find more info? I know I can get the latest Erlang to be able to use :persistent_term but you’re suggesting to manually implement something similar with macros. Correct?

dimitarvp · December 16, 2018, 2:16am

I might be severely off the mark here but I believe they were referring to something like FastGlobal. It produces a compiled module for you that would contain code lines like these:

def get("column1-key1"), do: "value1"
def get("column2-key1"), do: "value1"
def get("column1-key2"), do: "value2"

(This is the code generated by the library. You don’t write that. You do FastGlobal.put(:key, "value") and that’s it.)

…Which is the fastest ever access you can get with Erlang / Elixir. However, every changing of value has a heavy runtime cost (make sure to go through the README file). So only use FastGlobal for very rare writes and a ton of reads.

As Ben said, 15K records is nothing. Caching every value several times (on as many keys as you need) is a perfectly fine strategy at that small scale.

Cruz · December 16, 2018, 2:26am

This is awesome sir. Thanks a lot

dimitarvp · December 16, 2018, 2:27am

We will be happy if you share what solution you ended up with eventually. It’s interesting for the future readers.