How many values are we talking here (order of magnitude)?
We’re talking about 15K records
How does your current implementation work? Structs don’t store data in a way that is accessible from different processes.
The list of structs is in a GenServer. I did that precisely to allow concurrent access.
One option is to have two tables: one which stores k1 -> v1, and another which stores the k1 -> k1 and k2 -> k1 mappings. You do a lookup in table 2 first to get the canonical key, and then a lookup in table 1 to get the real value.
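In ETS terms, a minimal sketch (table names and keys here are made up for illustration):

# two tables: one holds the real values, one is a reverse index of keys
:ets.new(:values, [:named_table, :set, read_concurrency: true])
:ets.new(:key_index, [:named_table, :set, read_concurrency: true])

# store the value once, and map both keys to the canonical one
:ets.insert(:values, {:k1, %{some: :value}})
:ets.insert(:key_index, [{:k1, :k1}, {:k2, :k1}])

# lookup by the second key: index table first, then the value table
[{_, canonical}] = :ets.lookup(:key_index, :k2)
[{_, value}] = :ets.lookup(:values, canonical)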
Yes, that’s what I clumsily tried to explain before:
"Obviously, I can use a bit of brute force, and have a 2nd ETS table with the 2nd key as key, get the 1st key, and then query the 1st table. Kind of an reverse index table. "
And my question still is whether I should use something like Amnesia instead. I think that would allow me to more easily support queries on other columns that are not unique keys but that group data nevertheless.
axelson thinks CacheX is a better option for me, but I don’t see it. I still need to support queries on different columns.
If you need in memory multi column (non key) lookups you’re going to start making things pretty complex. No matter what you do, if you want to avoid linear scans you’ll need to maintain secondary indices. Adding in mnesia won’t really change that, or even something fancy like an in memory sqlite table.
15k items isn’t really all that many. You could easily just have 1 table for canonical key -> value pairs, and then N tables, one per column you want to query on, containing value -> canonical key pairs.
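For example (names are again just illustrative), a write puts the record into the canonical table plus one entry per index table, and a query on a non-unique column goes index table -> canonical keys -> records:

# one canonical table plus one index table per queryable column
:ets.new(:records, [:named_table, :set])
:ets.new(:by_other_key, [:named_table, :bag])  # :bag because the column isn't unique

record = %{id: 1, other_key: :a}
:ets.insert(:records, {record.id, record})
:ets.insert(:by_other_key, {record.other_key, record.id})

# query on the non-unique column
for {_, id} <- :ets.lookup(:by_other_key, :a),
    {_, rec} <- :ets.lookup(:records, id) do
  rec
end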
Yes, that was my first idea. I just thought I should ask before going with it.
Looking at the Mnesia/Amnesia documentation it seems straightforward to declare multiple indexes on the same table. But, I’m very new to Elixir so I don’t really know what I’ll be getting myself into. Any reason why I should avoid this, and implement all the tables/indexes on my own?
I definitely think giving :mnesia a shot may be worthwhile, particularly if you’re just doing in memory tables and not worrying about any of the distributed bits. Amnesia is OK as wrapper libraries go, it’s VERY macro heavy so it can be a bit hard to debug. Your use case is relatively simple though so maybe just time box it to a couple hours and see how it goes?
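If you do try it, here’s a rough, untested sketch (column names made up) of a purely in-memory table with one extra index:

# ram-only table keyed on :id, with a secondary index on :other_key
:mnesia.start()

:mnesia.create_table(:catalog,
  attributes: [:id, :other_key, :value],
  index: [:other_key],
  ram_copies: [node()]
)

:mnesia.dirty_write({:catalog, 1, :a, %{some: :value}})

:mnesia.dirty_read(:catalog, 1)                      # lookup on the primary key
:mnesia.dirty_index_read(:catalog, :a, :other_key)   # lookup via the secondary index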
Thank you for the info. I initially didn’t think this would help me because it doesn’t allow me to have two keys on the same record. However, I could use the same idea we discussed in this thread: implement the two indexes as their own structures, and have a third structure for the non-key-based query.
I’d like to implement this after the Mnesia-based solution and compare the two. Can you tell me where to find more info? I know I can get the latest Erlang to be able to use :persistent_term, but you’re suggesting manually implementing something similar with macros. Correct?
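(For reference, the :persistent_term calls I have in mind are just these two, available since OTP 21.2; reads are very cheap but updates are costly, so it only fits rarely-changing data:)

# MyApp here is just an example namespace for the key
:persistent_term.put({MyApp, :catalog}, %{"some" => "data"})
:persistent_term.get({MyApp, :catalog})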
I might be severely off the mark here, but I believe they were referring to something like FastGlobal. It produces a compiled module for you that would contain code lines like these:
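Roughly this shape (an illustrative sketch only; the module FastGlobal actually compiles looks different):

# hypothetical sketch of the kind of module a FastGlobal-style library compiles for you;
# the stored term becomes a literal in a function head, so a read is just a function call
defmodule FastGlobal.Generated do
  def value, do: "value"
end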
(Code along those lines is generated by the library; you don’t write it yourself. You just do FastGlobal.put(:key, "value") and that’s it.)
…which is about the fastest access you can get with Erlang / Elixir. However, every change of a value has a heavy runtime cost (make sure to go through the README file). So only use FastGlobal for very rare writes and a ton of reads.
As Ben said, 15K records is nothing. Caching every value several times (on as many keys as you need) is a perfectly fine strategy at that small scale.
Yes, it’s super simple if you don’t need to add things at runtime.
defmodule Data do
  # load external data
  @data [
    %{id: 1, other_key: :a},
    %{id: 2, other_key: :b},
    %{id: 3, other_key: :c}
  ]

  for key <- [:id, :other_key] do
    for row <- @data do
      def unquote(:"by_#{key}")(unquote(row[key])) do
        unquote(Macro.escape(row))
      end
    end

    def unquote(:"by_#{key}")(_) do
      nil
    end
  end
end
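With that compiled, lookups are plain function calls:

Data.by_id(2)          # => %{id: 2, other_key: :b}
Data.by_other_key(:c)  # => %{id: 3, other_key: :c}
Data.by_id(99)         # => nil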
OK. I hit an issue. I have the data in a CSV file, and to load it, I’m doing this:
def csv_to_map do
  "../test0.csv"
  |> Path.expand(__DIR__)
  |> File.stream!()
  |> CSV.decode(headers: true)
  |> Enum.to_list()
end
That works fine, but if I now save it to a variable instead of the attribute you used, I get an error:
data = csv_to_map()

for key <- [:id, :other_key] do
  for row <- data do
    def unquote(:"by_#{key}")(unquote(row[key])) do
      unquote(Macro.escape(row))
    end
  end
end
Thank you, but I still get the same error. The compiler doesn’t see “key” as being defined and expands it as a function. This is the code I have now:
defmodule Catalog do
  def csv_to_map do
    "../test0.csv"
    |> Path.expand(__DIR__)
    |> File.stream!()
    |> CSV.decode(strip_fields: true, headers: true)
    |> Enum.to_list()
  end

  def map_list_to_mem do
    data = csv_to_map()

    for key <- [:id, :other_key], row <- data do
      def unquote(:"by_#{key}")(unquote(row[key])) do
        unquote(Macro.escape(row))
      end

      def unquote(:"by_#{key}")(_) do
        nil
      end
    end
  end
end
And, I still get:
== Compilation error in file lib/catalog.ex ==
** (CompileError) lib/catalog.ex:15: undefined function key/0
(elixir) src/elixir_bitstring.erl:142: :elixir_bitstring.expand_expr/4
(elixir) src/elixir_bitstring.erl:27: :elixir_bitstring.expand/7
(elixir) src/elixir_bitstring.erl:20: :elixir_bitstring.expand/4
(stdlib) lists.erl:1354: :lists.mapfoldl/3
(elixir) expanding macro: Kernel.def/2
The contents here need to happen in the module body if you’re trying to generate functions. This will all happen at compile time. If compile time isn’t when you want to do this then you’ll want to pick one of the other strategies.
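Something along these lines (an untested sketch; it assumes CSV.decode(headers: true) yields {:ok, row} tuples where each row is a map with string header keys like "id" and "other_key"):

defmodule Catalog do
  # everything below runs at compile time, in the module body rather than inside a def
  @external_resource Path.expand("../test0.csv", __DIR__)

  @rows "../test0.csv"
        |> Path.expand(__DIR__)
        |> File.stream!()
        |> CSV.decode(headers: true)
        |> Enum.map(fn {:ok, row} -> row end)

  for key <- ["id", "other_key"] do
    for row <- @rows do
      def unquote(:"by_#{key}")(unquote(row[key])) do
        unquote(Macro.escape(row))
      end
    end

    def unquote(:"by_#{key}")(_), do: nil
  end
end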
I see. Beginner’s mistake; I don’t know why I didn’t try that before.
BTW, I was able to do the ETS-based solution, and it’s working fine. I decided not to bother with Mnesia, as I just have one additional index and I don’t think I’d gain anything else from it.
But I’d like to finish at least one of the macro-based solutions. Even if I don’t use it, I’m learning a lot.