Enum.member? in a map with multiple lists

What is an efficient way to find out if a user_id is within a struct that contains multiple lists in any of the lists?

I have a struct that contains 5 keys each with a list as the value.

I want to see if a user_id is actively searching in any of the categories (each individual list). If they are it should halt and return true.

I tried Enum.each over the struct, but it says that the protocol isn’t implemented.

:wave:

Maybe

defmodule ListStruct do
  defstruct [:list_a, :list_b, :list_c]
end
your_struct = %ListStruct{list_a: [1, 2, 3], list_b: [4, 5, 6], list_c: [7, 8, 9]}
looking_for_user_id = 5 # to search for

{:list_b, [4, 5, 6]} =
  your_struct
  |> Map.from_struct()
  |> Enum.find(fn {_key, user_ids} ->
    Enum.find(user_ids, fn user_id ->
      looking_for_user_id == user_id
    end)
  end)
2 Likes

Well if you keep your data structure the same then you’ll have to do a linear search through each list. If you change your datastructure (or build a complementary one) then you can improve the performance of this search. But that’s probably only worth it if you are doing this search many times or if the size of the lists is quite large.

One easy way would be a :set ets table with user ids as keys and a list of “lists” (if they overlap) as values:

{user_id = 5, lists = [:list_a, :list_b]}
{user_id = 6, lists = [:list_b]}
{user_id = 7, lists = [:list_c]}

Or a :duplicate_set ets table with user id as the key and one of lists as the value:

{user_id = 5, list = :list_a}
{user_id = 5, list = :list_b}
{user_id = 6, list = :list_b}
{user_id = 7, list = :list_c}
1 Like

A bit better:

defmodule Example do
  def sample(data, value, keys \\ [:list_a, :list_b, :list_c]),
    do: data |> Map.take(keys) |> Enum.find(&do_sample(&1, value))

  defp do_sample({_key, list}, value) when is_list(list), do: value in list
end

Important differences:

  1. Your way does not works with structs which have extra unrelated keys.
    As a developer I absolutely never assume that anything will not change - especially when you are working for startups. :smiley:
  2. Your way does not checks if value of key is enumerable (is_list/1 guard).
    This is more problematic if you are looking on all keys - especially if keys parameter is passed dynamically.
  3. My way is dynamic (Map.take/2 is just designed for such things)

In short my way is basically easier to modify on everyone’s needs.

I’m not sure, but maybe you are saving user_id as PostgreSQL's array. In such case I strongly recommend to use relation and perform SQL search which should be faster and safer (duplicated id constraint etc.).

@idi527 :ets and even :dets (if there is such need) are option too, but I don’t see that every newbie would understand what you mean. Even I don’t remember this topic well as I did not need them recently. I believe that well documented Elixir functions are better at start. :slight_smile:

For beginners which want to use really big enumerable(s) I recommend to take a look at Stream module and even flow library if there is such a need. Both are also well documented and should not cause bigger problems.

3 Likes

I’ll elaborate a little bit as to what I’m working on.

I am creating a bot for a marketplace that is limited to 6 items. When a user decides they want an item(s) they would typically have to refresh the browser until someone else drops it on the “trading board”.

I have two GenServers that I figured could solve this. One is responsible for the botting commands (Using Hound it creates new session, logs in, selects item…) and the other is responsible for delegating who gets to pick up from the board next and sending a msg to their process to pick up an item.

The second GenServer is the one in question and it holds a struct of user_ids per each item (%StructName{ exclusive_item1: [1,2,3], exclusive_item2: []…}.
When exclusive_item1 gets posted to the board after the bot has been running for a few hours (the typical amount of time for one to drop) I want it to grab the first user_id from exclusive_item1. The users have to be ordered because it should delegate the item to the person who’s waited the longest.

I have read a lot about ETS and DETS, but never implemented them. I’m essentially spending the next year of my life dedicating myself to Elixir. I left my Rails job to continue traveling and I plan on hiring some sort of consultant to make this investment of time really worth it. I don’t mind reaching for some of the more advanced tools right now because this is what my time is for, but I also want to make sure I have a solid understanding of the fundamentals.

Edit: Obligatory permission from site owner has already been granted as I am testing out new features for their company to show proof of concept.

What’s considered a big enumerable? The map I was trying to Enumerate over has 5 keys and each had an empty list except one. When I was getting the protocol error for Enum and it suggested Stream that was my first thought, but looking at the two functions I couldn’t figure out why Enum.each failed but Stream did not. I then wondered if it had something to do with Stream’s compostability and I was probably just messing that up too.

You probably did not started Stream :smiley:

Here is small example:

iex(1)> ["a", "b", "c"] |> Stream.map(& &1 + 1)
#Stream<[
  enum: ["a", "b", "c"],
  funs: [#Function<49.131689479/1 in Stream.map/2>]
]>
iex(2)> ["a", "b", "c"] |> Stream.map(& &1 + 1) |> Stream.run()
** (ArithmeticError) bad argument in arithmetic expression: "a" + 1
    :erlang.+("a", 1)
    (elixir) lib/stream.ex:565: anonymous fn/4 in Stream.map/2
    (elixir) lib/enum.ex:3317: Enumerable.List.reduce/3
    (elixir) lib/stream.ex:1568: Enumerable.Stream.do_each/4
    (elixir) lib/stream.ex:640: Stream.run/1

Tip: Stream is automatically started when you pass it to any Enum function. This is useful in some cases.

Firstly it’s not possible to pipe value, because this function returns :ok. You can deal with it only by passing function which is bad way, because loop will not stop when you will find value.

This is more what I was trying

# Check if user is currently searching
def handle_call({:active_search?, user_id}, _FROM, queue) do
    answer = false
    # active_users that are currently searching for particular item
    Enum.each(queue, fn ({item, active_users}) ->
      case Enum.member?(active_users, user_id) do
        true ->
          answer = true # At this point I was just trying to get something to work before refactoring.
      end
    end)
    {:reply, answer, queue}
  end

Enumerate over the initial struct then check if the current user_id is a member of the list.

as said each is bad here - Enum.find/2 as we suggested is much better

If you have some time for this I suggest to use developer tools and find server private API. Of course private API could change at any time without control you can always debug it well. Simple task which run once a day and do something without submitting should be ok for most cases. I generally don’t like hound as its doing heavy browser job which could be reduced (for some experienced people is just a matter of day or less - rarely more - depends on complexity). I have experience with writing scrapers, so for me it’s trivial task.

Looks like a typical scenario for database. Probably SQLite 3 or mnesia (:disc_copies mode). I really like mnesia last time as its pretty easy to work with multiple nodes without creating standalone database somewhere. ecto (database wrapper) should have good support for both of them. With database you do not need to worry about memory usage. For sure here you are storing only ids, but later you maybe would like to store also some extra data. Things are changing and extra limiting at start is mostly bad way.

I suggest to create schema like:

defmodule MyApp.MyContext.TradingQueue do
  use Ecto.Schema

  schema "trading_queue" do
    # or item_id and user_id fields if you do not want to save any extra data
    belongs_to :item, MyApp.MyContext.Item
    belongs_to :user, MyApp.MyContext.User
    # extra fields goes here
    timestampts()
  end
end

with this (and of course Item and User schema + migrations) you can easily write query in which you order by id field.

There’s no set rule but I would think you’d want a total of 200 or so items over those 5 keys before it makes sense performance wise to move to a different data structure. But actually analyzing the performance is always best.

The GenServer holding onto the ids has a few other jobs related to its purpose and the ids are very temporary and have no chance of changing in the future. If I am using two GenServers, mostly for the purpose of organization (one is responsible for the browser actions and the other is for delegating tasks needed to specific users). I have a Postgres db that ultimately stores the items that get picked up and by which users.

I am trying to move my state away from the boundaries of my app (funneling everything into a db) and using the BEAM more with things like GenServers, :ets, and even mnesia. So what would the difference between storing ids of users searching for items within a GenServer versus storing it within an :ets table?

I was just breaching a more advanced rails position when I left and began learning Elixir for my own projects. I do wish that I had more experience when I made the transition because building Rails apps finally started to seem easy then I abruptly changed to Elixir. I feel like l can see more of the bigger picture this time, but I’m still working out the kinks that really slow me down. I love OTP, I just have to figure it out. :joy:

Edit: The other thing I want to note, there will be a user scraping the page and checking against the state in the GS once to several times a second.

I also chose Hound because the site is 100% javascript. It’s a tiny app, but does not work without it. I’ve only deconstructed one other site using MITM once before and didn’t know how I could simulate the same things.

I would also be willing to pay someone to help me build the rest of an app to further understand Elixir. I really want to start contributing to the community, but I’m just not quite there yet.

It doesn’t work on structs though.

This is what I just realized. It was the struct that put a kink in it for me. When I start my GS I initialize it with a map with the 5 values with empty lists now and it seems to work.

It explains everything. You have all you need to create a relation like I described already.

If you have PostgreSQL database already then you do not need extra database. It only complicates your code. I was thinking about SQLite 3 (single file database) or :mnesia (which stores data in directory) for simplicity i.e. no need to setup database somewhere in /var/ or something like that. For example :mnesia is builtin in Erlang, so there is no need to install something like PostgreSQL. Everything changes if you already have database. Relation like I described is good, because it’s easy to extend (just by adding some extra fields).

This does not matters if you know how to write scrapers well. Basically you are doing something on site and inspecting it, so you see what request is send when you are registering or logging in etc. Having information about parameters, request and response content types and headers you can write any scraper without doing any JavaScript work. For some sites it’s easier, for some harder, but whole process looks really similar i.e. do something, observe requests/responses and try to reproduce them. Sometimes you need to read cookies from headers, sometimes token from response body, but scraper is scraper.

Look it’s pretty simple. For doing X scenario JavaScript code needs to send some requests and receive their responses. Developer need to inspect all of them and write Elixir version based on inspected data. Of course instead of static data like tokens you just need a simple function to fetch them from response dynamically.

As said I have experience writing scrapers. I can even write one for you/your company if you want. I just need to know url, full scenario (where to click, what to type etc.) and example data (like test account login data).

Feel free to reach me. :077:

What do you mean?

defmodule Example do
  def sample(data, value, keys \\ [:list_a, :list_b, :list_c]),
    do: data |> Map.take(keys) |> Enum.find(&do_sample(&1, value))

  defp do_sample({_key, list}, value) when is_list(list), do: value in list
end

defmodule ListStruct do
  defstruct [:list_a, :list_b, :list_c]
end

Example.sample(%ListStruct{list_a: [1, 2, 3], list_b: [4, 5, 6], list_c: [7, 8, 9]}, 5)
{:list_b, [4, 5, 6]}

For me it’s working well.

Sorry, did you mean that you tried to do:

Example.sample(%ListStruct{list_a: nil, list_b: [4, 5, 6], list_c: nil}, 5)

If so then there are 3 solutions (depends on use case):

# 1. change struct definition:
defstruct [:list_a, :list_b, :list_c]
# to
defstruct [list_a: [], list_b: [], list_c: []]

# 2. use keys enforcing which prevents to create struct with nil values
@enforce_keys [:list_a, :list_b, :list_c]
# in such case this would fail at compile time:
%ListStruct{list_a: nil, list_b: [4, 5, 6], list_c: nil}

# 3. add extra guard:
defmodule Example do
  def sample(data, value, keys \\ [:list_a, :list_b, :list_c]),
    do: data |> Map.take(keys) |> Enum.find(&do_sample(&1, value))

  defp do_sample({_key, nil}, value), do: false
  defp do_sample({_key, list}, value) when is_list(list), do: value in list
end