Apache Lucene equivalent in Elixir

Hi all,

I need to index patents from sources available on the web. The current implementation supports querying a database of patents published by the USPTO.

Current implementation: Apache Nutch, Apache Lucene and PostgreSQL.

Is there an Apache Lucene equivalent in Elixir?

3 Likes

I’m not experienced enough with Elixir to comment on a direct equivalent of Lucene, but perhaps swapping out Lucene with Solr might do the trick.

Then you might be able to use an Elixir-based client.

2 Likes

Is it a project requirement for the full-text index to be embedded, or is it possible to use a search server like Elasticsearch or Solr (both based on Lucene internally)? In that case there are Elixir clients available.

5 Likes

It really wouldn’t be hard to set up your own Lucene server, then query it from Elixir. Full-text indexing and search are very CPU-intensive; I’d call it a good example of something the BEAM is not suited for.

4 Likes

As @lucaong suggested, Solr and ElasticSearch are nice services based on Lucene, gotta see what is better for you. I use Elastic, but instead of a client package to interact with it, I use just a module that provides a minimal API to talk to it.
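For anyone wondering what such a hand-rolled module might look like, here is a minimal sketch, not the module mentioned above. The module name, base URL, index name and field names are all hypothetical. It assumes Elixir 1.18+ for the built-in `JSON` module and uses Erlang's `:httpc` client, so `:inets` would need to be listed under `extra_applications` in a Mix project.

```elixir
defmodule PatentSearch.Elastic do
  @moduledoc """
  Minimal sketch of a hand-rolled Elasticsearch wrapper.
  Module name, base URL, index and field names are hypothetical.
  """

  # Assumed local single-node setup; adjust for your cluster.
  @base_url "http://localhost:9200"

  @doc "URL of the `_search` endpoint for an index."
  def search_url(index), do: "#{@base_url}/#{index}/_search"

  @doc "Body of a `match` query against a single field."
  def match_body(field, text, size \\ 10) do
    %{"query" => %{"match" => %{field => text}}, "size" => size}
  end

  @doc """
  Sends the query with Erlang's built-in :httpc client.
  Assumes Elixir 1.18+ for the JSON module; swap in another encoder otherwise.
  """
  def search(index, field, text) do
    {:ok, _} = Application.ensure_all_started(:inets)
    url = String.to_charlist(search_url(index))
    body = JSON.encode!(match_body(field, text))
    request = {url, [], ~c"application/json", body}

    case :httpc.request(:post, request, [], body_format: :binary) do
      {:ok, {{_version, 200, _reason}, _headers, resp}} -> {:ok, JSON.decode!(resp)}
      {:ok, {{_version, status, _reason}, _headers, resp}} -> {:error, {status, resp}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Keeping `search_url/1` and `match_body/3` as pure functions means the request shape can be checked without a running Elasticsearch node; only `search/3` touches the network.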

Definitely not a recommendation, but yes, there is at least one:

riak_search is a Lucene-like full-text search engine on the BEAM (Erlang). It no longer seems to be under active development, and I’m not sure whether others are still working with it. It looks like it provides full-text search purely on the BEAM, and also integrates with Solr.

3 Likes

It is not a requirement of the project. ElasticSearch and Solr are interesting options.

Good idea! I will try this approach. :smile:

Thank you so much!

2 Likes

There is a Rust alternative called tantivy, which is not particularly hard to interact with from Elixir via Rustler.

6 Likes

There is also Amazon’s rip-off of Elasticsearch :slight_smile:

Nice, Rust is the cool kid on the block :slight_smile:

4 Likes

And don’t forget: if you’re using Postgres as the database, its full-text search capabilities are quite good and can often remove the need for an external engine.

https://www.postgresql.org/docs/11/textsearch.html
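As a rough illustration of the Postgres route, the sketch below builds a ranked full-text query using `to_tsvector`, `plainto_tsquery` and `ts_rank` from the docs linked above. The `patents` table and its `id`, `title` and `abstract` columns are hypothetical, and the columns are assumed to be non-null.

```elixir
defmodule PatentSearch.Postgres do
  @moduledoc """
  Sketch of a Postgres full-text query for patents.
  Table and column names are hypothetical; columns are assumed NOT NULL.
  """

  # $1 is the user's search text, $2 the maximum number of rows.
  @sql """
  SELECT id, title,
         ts_rank(to_tsvector('english', title || ' ' || abstract), query) AS rank
  FROM patents, plainto_tsquery('english', $1) AS query
  WHERE to_tsvector('english', title || ' ' || abstract) @@ query
  ORDER BY rank DESC
  LIMIT $2
  """

  @doc "Returns the SQL string and its positional parameters."
  def search_query(text, limit \\ 10), do: {@sql, [text, limit]}
end
```

The returned tuple could then be passed to whatever driver is in use, for example `Postgrex.query!(conn, sql, params)`. On a larger table, a GIN index on the same `to_tsvector(...)` expression would likely be needed to keep the search fast.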

8 Likes

If you’re looking to integrate with Elasticsearch, there’s https://github.com/tsloughter/erlastic_search. Take a look at the .travis.yml if you’d like to set up a similar integration test (I’m a maintainer on that project).

1 Like