Accessing an array by index with Ecto

For my hobby project, I’m writing software to analyze a board game, working with a large dataset of recorded online games. I’m using Postgres, have a ‘games’ table, and am storing the moves of each game in a ‘moves’ field. Each move encodes to a short string, so the ‘moves’ field is just an array of strings.

Here’s my migration:

def change 
  create table(:games) do
    # other fields omitted
    add :moves, {:array, :string}
  end

  create index(:records, [:moves], using: "GIN")
end

And here’s my schema:

schema "games" do
  # other fields omitted
  field :moves, {:array, :string}
end

My issue - I’m trying to fetch the games where the move at ‘index’ is ‘move’. I expected something like this to work:

Repo.all(from game in Game, where: game.moves[^index] == ^move)

However, it gave me this error:

** (RuntimeError) expected field `moves` to be an embed or a map, got: `{:array, :string}`
    (elixir 1.11.3) lib/enum.ex:1533: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
    (elixir 1.11.3) lib/enum.ex:2193: Enum."-reduce/3-lists^foldl/2-0-"/3
    (elixir 1.11.3) lib/enum.ex:2193: Enum."-reduce/3-lists^foldl/2-0-"/3

When I drop down to a SQL fragment, it works just fine:

Repo.all(from game in Games, where: fragment("moves[?] = ?", ^index, ^move))

Is there a way, with Ecto, to access my array by index without having to resort to the fragment of raw SQL? I was genuinely surprised when the natural way to access an array didn’t work.

While you’re here, is this the best way to be doing this? Right now I’m working with a toy subset of the data, but the full set is over two billion games. Is the GIN index the way to go? Will it work with an array of strings, or should I encode them as integers instead? And will the index still benefit me if I’m querying using more than one field – say, all games with move[1] = ‘move’ AND with player = ‘Joe’?

Sorry for all the questions – I haven’t yet found a good resource for learning advanced Postgres.

1 Like

Have you tried it with :map?

def change 
  create table(:games) do
    # other fields omitted
    add :moves,:map
  end

  create index(:records, [:moves], using: "GIN")
end

and then in schema use embeds to save the moves as JSON or has_many to use it in another table.

If I remember correctly, :array was used in the older versions and now you should use :map instead.

I am curious about the board game… what kind of game?

Anyway, if it is the game I think of You have many ways to store moves, but sometime, it’s also interesting to store positions as well.

Most board games have a text representation for the moves… PGN for chess, SGF for go etc.

And You might use bitboard to store positions in an efficient way.

You might also use hash function (like zobrist hash) to identify unique position.

Each with their own advantage…

I did something similar, but use a schema for moves with…

game_id
move_index
move
position
zobrist hash

This allows me to query any moves at move_index, but also each identical position (hash) and patterns in positions (bitboard), like all games where this particular piece is on this particular square.

But my moves db holds hundreds of millions of records :slight_smile:

Note that this is not the way to access lists in Elixir:

iex(1)> a = [1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
iex(2)> a[2]
** (ArgumentError) the Access calls for keywords expect the key to be an atom, got: 2
    (elixir 1.11.4) lib/access.ex:311: Access.get/3

I suspect that is why it’s not implemented that way in the Ecto DSL, which tries to match Elixir semantics.

3 Likes

So was there anytime a Ecto solution to this question?
I found a mailing list discussion about it, with no conclusion?

I also got caught by this now. Have an existing database with type {:array, :string}.
Tried to access it via select: t.array[1] no luck.
I thought this would be the intuitive way in this case.

It’s not that simple. That type can be backed in postgres by an array column or a json/-b column. Both require different sql for access by index and ecto doesn’t know which one you used. The [] syntax was more or less recently implemented for Ecto.Query.API.json_extract_path/2, which as the name suggests only works if the backing column is of json/-b type.

Thank. I’m already aware of json_extract_path/2.
So there won’t be a solution for array then, right?

Means going the fragment way.