Apache Lucene equivalent in Elixir

fun2src · July 12, 2019, 2:01pm

Hi all,

I need to index patents from sources available on the web. The current implementation supports querying a database of patents published by the USPTO.

Current implementation: Apache Nutch, Apache Lucene and PostgreSQL.

Is there an Apache Lucene equivalent in Elixir?

Ted · July 12, 2019, 4:12pm

I’m not experienced enough with Elixir to comment on a direct equivalent of Lucene, but perhaps swapping Lucene for Solr might do the trick.

Then you might be able to use an Elixir-based client.

lucaong · July 12, 2019, 4:16pm

Is it a project requirement for the full-text index to be embedded, or is it possible to use a search server like ElasticSearch or Solr (both based on Lucene internally)? In the latter case there are Elixir clients available.

sribe · July 12, 2019, 4:49pm

It really wouldn’t be hard to set up your own Lucene server, then query it from Elixir. Full-text indexing and search is very CPU intensive; I’d call it a good example of something the BEAM is not suited for.

rodrigues · July 12, 2019, 5:04pm

As @lucaong suggested, Solr and ElasticSearch are nice services based on Lucene; you'll have to see which is better for you. I use Elastic, but instead of a client package, I just use a module that provides a minimal API to talk to it.
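To illustrate the "thin module" idea, here is a minimal sketch of such a wrapper. It is written in Python rather than Elixir only to keep it free of dependencies, and it is not the module described above. The base URL, the `patents` index and the `abstract` field are assumptions made up for the example. It relies only on Elasticsearch's `_search` endpoint with a `match` query.

```python
import json
import urllib.request

# Assumed local Elasticsearch node; adjust for your deployment.
DEFAULT_BASE_URL = "http://localhost:9200"


def build_search_request(index, field, text, size=10, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a full-text `match` query against one index."""
    body = {"size": size, "query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(index, field, text, **kwargs):
    """Send the query and return the list of hits from the JSON response."""
    request = build_search_request(index, field, text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["hits"]["hits"]
```

Something like `search("patents", "abstract", "lithium battery")` would then return the matching documents, assuming a node is running at that address. Keeping request building separate from sending makes the wrapper easy to test without a live cluster.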

Definitely not a recommendation, but yes, there is at least one:

riak_search is a Lucene-like full-text search engine on the BEAM (Erlang). It does not seem to be under active development anymore, and I'm not sure whether others are still working with it. It looks like it provides full-text capabilities using just the BEAM, as well as integration with Solr.

fun2src · July 12, 2019, 5:55pm

It is not a requirement of the project. ElasticSearch and Solr are interesting options.

Good idea! I will try this approach.

Thank you so much!

idi527 · July 12, 2019, 7:32pm

There is a Rust alternative called tantivy, which is not particularly hard to interact with from Elixir via Rustler.

mkunikow · July 12, 2019, 9:20pm

There is also Amazon's rip-off of Elasticsearch.

mkunikow · July 12, 2019, 9:22pm

Nice, Rust is the cool kid on the block.

mkunikow · July 13, 2019, 3:34pm

brightball · July 13, 2019, 4:39pm

And don’t forget, if you’re using Postgres as the database, its full-text search capabilities are quite good and can often remove the need for an external engine.

https://www.postgresql.org/docs/11/textsearch.html
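As a rough sketch of what that can look like, the snippet below builds a ranked full-text query using the built-in `to_tsvector`, `plainto_tsquery`, `ts_rank` functions and the `@@` match operator. The `patents` table and its `id`, `title` and `abstract` columns are hypothetical, and the `%s` placeholders assume a psycopg2-style driver. It only builds the SQL and parameters, it does not connect to anything.

```python
# Hypothetical schema: a `patents` table with `id`, `title` and `abstract`
# columns, both text columns assumed NOT NULL (a NULL would null the concat).
SEARCH_SQL = """
SELECT id, title,
       ts_rank(to_tsvector('english', title || ' ' || abstract),
               plainto_tsquery('english', %s)) AS rank
FROM patents
WHERE to_tsvector('english', title || ' ' || abstract)
      @@ plainto_tsquery('english', %s)
ORDER BY rank DESC
LIMIT %s
"""


def build_patent_search(text, limit=10):
    """Return (sql, params) ready for a DB-API cursor.execute call."""
    # The search text is passed twice: once for ranking, once for matching.
    return SEARCH_SQL, (text, text, limit)
```

With a real connection this would be used as `cursor.execute(*build_patent_search("lithium battery"))`. For anything beyond a small table, a GIN index on the same `to_tsvector` expression is what keeps the query fast.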

binarytemple · July 17, 2019, 8:02am

If you’re looking to integrate with Elasticsearch, there’s https://github.com/tsloughter/erlastic_search. Take a look at the .travis.yml if you’d like to set up a similar integration test (I’m a maintainer on that project).