You need to do a couple of things to get this working. You could probably do it with joins, but I personally prefer the materialised-view approach with a background process that refreshes it. This is viable if the search doesn't need to reflect writes immediately; as long as you can create a unique index on the view, PostgreSQL will be able to refresh it concurrently.
The process can be summarised as:
- Create the materialised view
```elixir
defmodule Db.Repo.Migrations.CreateSearchTable do
  use Ecto.Migration

  def change do
    execute("""
    CREATE MATERIALIZED VIEW your_search_table AS
    SELECT 'product' AS result_type, products.id, products.title AS display_text, to_tsvector('english', products.title) AS tsv_search, product_categories.category_id AS category_id, product_subcategories.subcategory_id AS subcategory_id
    FROM products
    LEFT JOIN product_categories ON product_categories.product_id = products.id
    LEFT JOIN product_subcategories ON product_subcategories.product_id = products.id
    WHERE products.published
    UNION ALL
    SELECT 'store' AS result_type, stores.id, stores.company_name AS display_text, to_tsvector('english', stores.company_name) AS tsv_search, NULL AS category_id, NULL AS subcategory_id
    FROM stores
    WHERE stores.type = 'marketplace'
    """)

    execute("CREATE UNIQUE INDEX id_type_index ON your_search_table (result_type, id)")
    execute("CREATE INDEX tsv_index ON your_search_table USING gin(tsv_search)")
    execute("CREATE INDEX categories_index ON your_search_table (category_id, subcategory_id) WHERE result_type = 'product'")
  end
end
```
Note that some things were removed here, like setting weights on the tsvectors, so it might not work copy-pasted, but it should give you the idea.
Perhaps you should also add a reverse action to the `execute` statements so the migration can be rolled back. The unique index is necessary for the concurrent refresh: by combining a type and an id, where each id is unique within the scope of its type, you can be sure the index is unique. If for some reason you can't have that, you would instead create new materialised views on the fly and rename & drop them as you go, so you always have a table ready and no queries get blocked waiting for the refresh.
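As a sketch of the reverse action: `Ecto.Migration.execute/2` takes the rollback command as its second argument, so each statement above could become reversible like this (names as in the migration, SELECT elided):

```elixir
# The second argument runs on rollback (mix ecto.rollback).
execute(
  "CREATE MATERIALIZED VIEW your_search_table AS ...",
  "DROP MATERIALIZED VIEW your_search_table"
)

execute(
  "CREATE UNIQUE INDEX id_type_index ON your_search_table (result_type, id)",
  "DROP INDEX id_type_index"
)
```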
- Define a schema for it:

```elixir
defmodule SearchTable do
  use Ecto.Schema

  @derive {Jason.Encoder, only: [:result_type, :id, :display_text]}
  @primary_key false
  schema "your_search_table" do
    field :result_type, :string, primary_key: true
    field :id, :id, primary_key: true
    field :display_text, :string
    field :tsv_search, TSVector.Type
    field :category_id, :id
    field :subcategory_id, :id
  end
end
```
- Define a type for the tsvector:

```elixir
defmodule TSVector.Type do
  @behaviour Ecto.Type

  def type, do: :tsvector
  def cast(tsvector), do: {:ok, tsvector}
  def load(tsvector), do: {:ok, tsvector}
  def dump(tsvector), do: {:ok, tsvector}
end
```
- Now, to query the table you'll need a fragment to make the tsvector usable in normal Ecto queries, so define a macro somewhere and import it into whichever context does the querying:

```elixir
defmacro tsquery(field, text) do
  quote do
    fragment("?::tsvector @@ to_tsquery('english', ?)", unquote(field), unquote(text))
  end
end
```
- You’ll also need to prepare the incoming text for the tsquery:

```elixir
def split_text_for_tsquery(text) do
  text
  |> String.split(" ", trim: true)
  # drop tokens containing characters tsquery treats as syntax
  |> Enum.reject(fn token -> Regex.match?(~r/[\(\)\[\]\{\}]/, token) end)
  # turn each remaining token into a prefix match, ANDed together
  |> Enum.map(fn token -> token <> ":*" end)
  |> Enum.join(" & ")
end
```
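For example, given that helper (wrapped in a module here so it can be run standalone):

```elixir
defmodule SearchHelpers do
  # Same helper as above: rejects tokens containing tsquery syntax
  # characters, turns the rest into prefix matches joined with AND.
  def split_text_for_tsquery(text) do
    text
    |> String.split(" ", trim: true)
    |> Enum.reject(fn token -> Regex.match?(~r/[\(\)\[\]\{\}]/, token) end)
    |> Enum.map(fn token -> token <> ":*" end)
    |> Enum.join(" & ")
  end
end

SearchHelpers.split_text_for_tsquery("red shoes")    # => "red:* & shoes:*"
SearchHelpers.split_text_for_tsquery("weird (input") # => "weird:*"
SearchHelpers.split_text_for_tsquery("")             # => ""
```

Note the empty-string result for input that yields no usable tokens — that's the case the `search/2` function below matches on to skip the query.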
Then you can use it as:
```elixir
# requires `import Ecto.Query` and the tsquery macro defined above
def search(terms, _params) do
  case split_text_for_tsquery(terms) do
    "" ->
      []

    search ->
      SearchTable
      |> where([yst], tsquery(yst.tsv_search, ^search))
      |> Db.Repo.all()
  end
end
```
- Finally, you’ll need to keep the table updated. You can write a process that loops every X and refreshes the view, e.g. every 5 minutes:

```elixir
Db.Repo.query("REFRESH MATERIALIZED VIEW CONCURRENTLY #{@your_search_table_name}", [], timeout: 360_000)
```
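A minimal sketch of such a looping process as a GenServer; the refresh is injected as a function so the module itself stays independent of the Repo (module and option names here are illustrative, not from the original):

```elixir
defmodule SearchTableRefresher do
  use GenServer

  # Calls the injected `:refresh` function every `:interval` ms, e.g.:
  #
  #   refresh = fn ->
  #     Db.Repo.query(
  #       "REFRESH MATERIALIZED VIEW CONCURRENTLY your_search_table",
  #       [], timeout: 360_000
  #     )
  #   end
  #   SearchTableRefresher.start_link(refresh: refresh, interval: :timer.minutes(5))

  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  @impl true
  def init(state) do
    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:refresh, state) do
    state.refresh.()
    schedule(state.interval)
    {:noreply, state}
  end

  defp schedule(ms), do: Process.send_after(self(), :refresh, ms)
end
```

Hooking it into your supervision tree means the loop restarts with the app; a failed concurrent refresh only crashes this process, not your queries.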