Complexity of a search withing a range in a ets?

Background

We have a file that has lines with IP ranges like the following:

IP1,     IP2,      Country, Region
1.0.0.1, 1.0.0.10, pt,      l

We use this file to know where an IP belongs to. So, for example, If I am given the IP 1.0.0.9, because it is between IP1 and IP2, I know it is in Portugal, Lisbon.

As you may imagine, this file has millions of lines like this. We need an efficient way to get the location of an IP.

ETS

ETS tables are known for being fast. So one of my first reactions was: what if I put this file in an ETS table with keys / values and then use pattern matching to check if a given IP is within the range?

Problem

However one of my co-workers pointed that although this would be possible using select the complexity would be O(n) for worst case, meaning I would run the entire table until I pattern matched into the answer.

This is not acceptable, so I was wondering if there is a more efficient way of doing this using ETS tables.

Any suggestions?

Compile into a module at runtime. Very bad load time and in general not good write times, but fast for read, constant time.

There are even packages abstracting those away for you.

There are even packages abstracting those away for you.

And there’s also Locus: Geolocation and ASN lookup of IP addresses

This is true of select generally because the patterns can be arbitrarily complex. That doesn’t mean that a range check is necessarily O(n) worst case. I’d do some testing and see.

1 Like

We tried doing that but we ran out of atoms. When we increased the number of atoms, we ran out of RAM at compile time. We are still looking into it though…

Due to current company policies, this is not an option for us.

How would you test this exactly?

You can look into how Locus, for example, approached the problem and apply a similar solution … For compilation into a module at runtime you can look into foil or fastglobal, but I’d still go with locus’s approach instead.

1 Like

How?

A single module, with a single function, this doesn’t seem to be much of atoms…

If though you want to have all IP addresses as atoms, well then there is not much we could do for you, if though you use tuple notation for them, it should be fine.

I’m sorry for you. But perhaps you might take a look at them and check how they do it? Alternatively there is the :persistent_term module in a recent enough OTP.

From all of what I’ve heard, it is meant to be used as a write once, read fast store of data.

1 Like

We are using multi-clause functions. Somehow, we ran out of atoms because of that. We are generating a module that has 10_000_000 functions. Admittedly this is something I plan on asking in another discussion, since this topic is supposed to be about ETS tables or similar alternatives.

Rather interesting but unfortunately the API doesn’t have support for range queries. Thanks for the effort though!

As an aside, this seems like a situation where an interval tree could be useful.

Have a set of IPs or ranges you want to look up, benchmark how long it takes you to look them up on the ets table at various sizes. Plot. If it’s a line you’ve got O(n), if it’s an increasingly flat curve you’ve got O(log(n))

3 Likes