Semantic Search Fields / Search Engine: Detect tags, location and time information in search input

Hello beloved forum members 🙂

I am currently curious about good search experiences.
Most pages have some kind of search somewhere,
so I guess this is a pretty universal “problem”.

I was recently browsing a house marketplace website where you have TONS of filter options, like living space, room count, energy efficiency, age, renovation state, plot size, location, price, price per square meter, heating energy source, …

I was wondering: While having tons of different input fields for all the different pieces of information is very explicit and probably makes a lot of sense for refining the search filter, I personally would like to just enter something like:

  • 5 room house around 120m^2 available this summer or
  • 200 sqm energy efficient flat roof house with gas heating and garden built after 2010 in London or
  • 100-120m2 house around Cologne, Leverkusen or Düsseldorf with less than 3 minutes driving to the highway

Basically, I’d like to write down what I would tell a realtor.

So here we have the description of some properties, dates/times or even timespans and locations.

I wonder:

  1. Most search fields, except those in search engines, do not offer this kind of search experience.
    Is that because it's not feasible, or just because adding 10+ different inputs is simply easier to implement?
    Exceptions seem to be log analysers and written documentation, probably because they contain only text and are therefore well suited to full-text search.

  2. How would you go about extracting the different pieces of information?
    Run a full-text search over the tags and categories that your page knows?
    Or just match on those exactly?
    Match on common date input formats and try to parse them, or push the input into an AI language model?
    Maybe keep a list of snippets like ‘next week’ or ‘this weekend’, plus their translations into other languages, that the input can be matched against?

  3. Have you ever worked with/on search inputs like described above?
    What worked well, what was a pain?
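
To make the snippet-matching idea from question 2 a bit more concrete, here is a minimal Python sketch (Python is used only for illustration). It matches a small, hypothetical phrase table plus ISO-style dates and resolves each hit to a date range. The phrase table, the helper names and the "week starts on Monday" convention are all assumptions for this sketch, not an established approach.

```python
import re
from datetime import date, timedelta


def next_week(today):
    # Monday to Sunday of the following calendar week (assumes weeks start on Monday).
    monday = today + timedelta(days=7 - today.weekday())
    return monday, monday + timedelta(days=6)


def this_weekend(today):
    # Saturday and Sunday of the current week; on a Sunday this points back to yesterday.
    saturday = today + timedelta(days=5 - today.weekday())
    return saturday, saturday + timedelta(days=1)


# Hypothetical phrase table; other languages would need their own entries.
RELATIVE_PHRASES = {
    "today": lambda d: (d, d),
    "tomorrow": lambda d: (d + timedelta(days=1), d + timedelta(days=1)),
    "next week": next_week,
    "this weekend": this_weekend,
}

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text, today):
    """Return a list of (matched text, start date, end date) tuples."""
    found = []
    lowered = text.lower()
    for phrase, resolve in RELATIVE_PHRASES.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            start, end = resolve(today)
            found.append((phrase, start, end))
    for m in ISO_DATE.finditer(text):
        try:
            d = date(int(m[1]), int(m[2]), int(m[3]))
        except ValueError:
            continue  # looks like a date but is not a valid one, e.g. 2024-13-40
        found.append((m[0], d, d))
    return found
```

Vague phrases like "this summer" would need their own (fairly arbitrary) range definitions, which is probably where this approach starts to hurt.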


Making sense of human language is a non-trivial problem, which seems unnecessarily complex for a search setup. Fantastical’s calendar entry is the only place I know of where such a pattern exists and works reasonably well. But I guess that’s mostly because sentences of the form “Do x at y on z”, and variations of that, are reasonably easy to parse.


The easiest way I can see is to use the GPT API and its function calling to define the schema. But it would be costly.
A cheaper alternative could be Bumblebee with Llama, having it reply with parseable JSON.
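
Whichever model produces the JSON, the reply still has to be checked before it becomes a search filter. A rough sketch of that step in Python (for illustration only; no model is called here): the schema, its field names and `parse_model_reply` are hypothetical, and the type check is deliberately shallow.

```python
import json

# Hypothetical schema for a house search; the field names are illustrative only.
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "rooms": {"type": "integer"},
        "min_area_sqm": {"type": "number"},
        "max_area_sqm": {"type": "number"},
        "locations": {"type": "array", "items": {"type": "string"}},
        "heating": {"type": "string"},
    },
}

# Maps JSON Schema type names to the Python types json.loads produces.
_PY_TYPES = {"integer": int, "number": (int, float), "string": str, "array": list}


def parse_model_reply(reply):
    """Parse a model's JSON reply and keep only fields whose type matches the schema.

    Array items are not inspected; a real validator would check them too.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    filters = {}
    for name, spec in SEARCH_SCHEMA["properties"].items():
        value = data.get(name)
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, expected) and not isinstance(value, bool):
            filters[name] = value
    return filters
```

Unknown or wrongly typed fields are silently dropped, so a confused model reply degrades to a broader search instead of an error.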

That sounds like classic named-entity recognition:

Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one:

Jim bought 300 shares of Acme Corp. in 2006.

And producing an annotated block of text that highlights the names of entities:

[Jim]{Person} bought 300 shares of [Acme Corp.]{Organization} in [2006]{Time}.
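
To show the shape of that input/output, here is a toy Python sketch that reproduces the annotation above with hand-written patterns. It is not how real NER systems work (those are usually trained statistical models); the pattern list and `annotate` are assumptions made up for this example.

```python
import re

# Toy "gazetteer": hand-written patterns standing in for a trained NER model.
PATTERNS = [
    ("Person", r"\bJim\b"),
    ("Organization", r"\bAcme Corp\."),
    ("Time", r"\b(?:19|20)\d{2}\b"),  # four-digit years only
]

# One alternation with a named group per label, so the match tells us its category.
COMBINED = re.compile("|".join(f"(?P<{label}>{pattern})" for label, pattern in PATTERNS))


def annotate(text):
    """Wrap every recognised entity as [entity]{Label}."""
    def tag(match):
        return f"[{match.group(0)}]{{{match.lastgroup}}}"
    return COMBINED.sub(tag, text)


print(annotate("Jim bought 300 shares of Acme Corp. in 2006."))
```

For a search field, the labels would be things like location, size or date rather than person and organization.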

Here are two libraries that turned up when searching for Elixir entity recognition:

Agreed, it’s not perfect, but overall their implementation is quite slick, especially how it incrementally populates the event form as you type. For example, entering “Watch Lord of the Rings tomorrow afternoon” will populate the date field upon typing the w at the end of “tomorrow” and the time field upon typing the n at the end of “afternoon”. LiveView could definitely help in creating similar real-time interactions, much like it does with form validation.
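
A small Python sketch of that incremental behaviour (for illustration only; this is not Fantastical's implementation): re-parse the input after every keystroke and record when each field first gets a value. The keyword tables, the clock times and both function names are assumptions.

```python
# Hypothetical keyword tables; the clock times are arbitrary placeholders.
DATE_WORDS = {"today": 0, "tomorrow": 1}  # offset in days from today
TIME_WORDS = {"morning": "09:00", "afternoon": "15:00", "evening": "19:00"}


def parse_event(text):
    """Return whichever fields can be filled from the text typed so far."""
    fields = {}
    for word in text.lower().split():
        if word in DATE_WORDS:
            fields["day_offset"] = DATE_WORDS[word]
        elif word in TIME_WORDS:
            fields["time"] = TIME_WORDS[word]
    return fields


def first_filled_at(text):
    """Replay the input one keystroke at a time, noting when each field first appears."""
    seen = {}
    for i in range(1, len(text) + 1):
        prefix = text[:i]
        for field in parse_event(prefix):
            seen.setdefault(field, prefix)
    return seen
```

In a LiveView form, the equivalent of `parse_event` would presumably run in the change event handler, so the form fields update on each keystroke.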


Yup, my example with the house was maybe a bit overkill: too many factors and no clear list of things to check for.

Maybe a used car search would be a better example: a lot of parameters like seat count, horsepower, mileage, age etc., but every attribute is very specific and known.
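
With a fixed, known attribute list like that, plain pattern matching might already get quite far. A minimal Python sketch (for illustration only): the attribute names, the accepted wordings and `extract_car_filters` are all assumptions, and only English input with kilometres is handled.

```python
import re

# One hypothetical pattern per known attribute; each captures a single number.
ATTRIBUTE_PATTERNS = {
    "seats": re.compile(r"(\d+)\s*seats?\b", re.I),
    "horsepower": re.compile(r"(\d+)\s*(?:hp|horse\s*power)\b", re.I),
    "max_mileage_km": re.compile(r"(?:under|less than|max\.?)\s*(\d[\d.,]*)\s*km\b", re.I),
    "min_year": re.compile(r"(?:after|from|since)\s*((?:19|20)\d{2})\b", re.I),
}


def extract_car_filters(query):
    """Return a dict of attribute name to integer value for every pattern that matches."""
    filters = {}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(query)
        if match:
            # Strip thousands separators such as "80,000" or "80.000".
            filters[name] = int(re.sub(r"[.,]", "", match.group(1)))
    return filters


print(extract_car_filters("5 seats, 150 hp, under 80,000 km, built after 2015"))
```

Anything the patterns do not recognise could still be passed on to a normal full-text search, so the free-text field never does worse than a plain search box.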

Oh, a new term for me to look into, that is awesome! I’ll dig into it.