Which database for time-series data?

Hi,

I am building a Channels-based app. Users will send their latitude and longitude as messages once per second, and I need to store every message. There could be hundreds of thousands of users sending their locations, and each message is under 100 bytes.

Which database would you recommend, given that I will be storing all of these messages in real time? Each channel process will store a message as soon as it is received, via Task.async.
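For a rough sense of scale, here is a back-of-envelope estimate of the write load. It assumes exactly 100,000 users (the low end of "hundreds of thousands") and takes the 100-byte figure as an upper bound, so treat the results as order-of-magnitude only.

```elixir
# Back-of-envelope write load. The user count is an assumed round figure.
users = 100_000            # assumed: low end of "hundreds of thousands"
msgs_per_user_per_sec = 1  # one location message per second
bytes_per_msg = 100        # upper bound on message size

writes_per_sec = users * msgs_per_user_per_sec
bytes_per_sec = writes_per_sec * bytes_per_msg
bytes_per_day = bytes_per_sec * 86_400

IO.puts("writes/s: #{writes_per_sec}")
IO.puts("MB/s:     #{bytes_per_sec / 1_000_000}")
IO.puts("GB/day:   #{bytes_per_day / 1_000_000_000}")
```

Under these assumptions that is 100,000 writes per second, about 10 MB/s, and about 864 GB of raw payload per day before any indexing or replication overhead.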

Please advise!

4 Likes

Are you going to need all that information forever? I think probably not. (Rather, you’d want to create some smoothing algorithm that only stores a location once it is far enough from a user’s previous location.)
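One way such a filter could look is sketched below. It is only an illustration: the module name LocationFilter, the haversine distance, and the 25-metre threshold are all assumptions, not something prescribed in this thread.

```elixir
defmodule LocationFilter do
  @moduledoc "Keeps a point only if it is at least `min_m` metres from the last kept point."

  # Mean Earth radius in metres (spherical approximation).
  @earth_radius_m 6_371_000.0

  # Great-circle distance in metres between two {lat, long} pairs given in degrees.
  def haversine_m({lat1, lon1}, {lat2, lon2}) do
    rad = :math.pi() / 180
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad

    a =
      :math.pow(:math.sin(dlat / 2), 2) +
        :math.cos(lat1 * rad) * :math.cos(lat2 * rad) * :math.pow(:math.sin(dlon / 2), 2)

    2 * @earth_radius_m * :math.asin(:math.sqrt(a))
  end

  # Walks the points in arrival order and drops those too close to the last kept one.
  def filter([], _min_m), do: []

  def filter([first | rest], min_m) do
    {kept, _last} =
      Enum.reduce(rest, {[first], first}, fn point, {acc, last} ->
        if haversine_m(last, point) >= min_m do
          {[point | acc], point}
        else
          {acc, last}
        end
      end)

    Enum.reverse(kept)
  end
end

# The second point is about 1 m from the first and is dropped;
# the third is about 111 m away and is kept.
points = [{52.0, 4.0}, {52.00001, 4.0}, {52.001, 4.0}]
IO.inspect(LocationFilter.filter(points, 25))
```

A fixed distance threshold is the simplest choice. A real app might also want to keep a point after some maximum time gap, so that a stationary user still produces occasional records.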

If I am correct in that assumption, I think you could store the raw, real-time data in Mnesia (Erlang’s built-in database, which can keep tables in memory and is designed for concurrent reads and writes) and, after processing it, store the smoothed values in a more conventional relational database such as PostgreSQL.
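As a minimal sketch of the Mnesia side, assuming a single node and a RAM-only table: the table name :raw_location and the {user_id, timestamp} key are made up for illustration.

```elixir
# Start Mnesia with its default RAM-only schema on the local node.
:ok = :mnesia.start()

# :raw_location is a hypothetical table. The first attribute is the key;
# here it is a {user_id, unix_timestamp} tuple so each reading is kept.
{:atomic, :ok} =
  :mnesia.create_table(:raw_location, attributes: [:key, :lat, :long])

# dirty_write skips transactions, trading safety guarantees for speed.
:ok = :mnesia.dirty_write({:raw_location, {42, 1_465_000_000}, 52.0, 4.0})

# dirty_read returns a list of matching records.
[{:raw_location, {42, 1_465_000_000}, lat, long}] =
  :mnesia.dirty_read(:raw_location, {42, 1_465_000_000})

IO.inspect({lat, long})
```

Dirty operations are used here only to keep the example short. Whether they are acceptable depends on how much consistency the raw data needs before it is smoothed and moved to PostgreSQL.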

3 Likes

InfluxDB came to mind because it focuses on storing time-series data. There is also TrailDB which was recently released by AdRoll but may not fit your use case as well.

In general, I would think about appending these messages to a log or message broker such as Apache Kafka or RabbitMQ, and then creating subscribers/consumers that store the data in special-purpose databases like InfluxDB. This allows for more flexibility as your application grows.

6 Likes

Yes, I agree. I think Apache Kafka is the best message queue right now. It scales easily by adding more nodes when needed.

But the main question is what you will do with this data: how will you process it, and how do you want to access it later?

With Kafka you can have many consumers, each processing the data and writing it to a different destination.
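The property that matters here is that reading from a log does not remove messages, so each consumer can track its own position. The toy below shows only that idea. EventLog is a hypothetical in-memory stand-in, not a Kafka client, and the consumer names are invented.

```elixir
defmodule EventLog do
  @moduledoc "In-memory stand-in for an append-only log; not a Kafka client."

  # The log is a plain list kept in arrival order.
  def new, do: []

  # Appending to the end of a list is O(n); fine for a sketch.
  def append(log, message), do: log ++ [message]

  # Returns the messages from `offset` onward plus the consumer's next offset.
  # The log itself is left untouched, so other consumers are unaffected.
  def read(log, offset) do
    {Enum.drop(log, offset), length(log)}
  end
end

log =
  EventLog.new()
  |> EventLog.append(%{user: 1, lat: 52.0, long: 4.0})
  |> EventLog.append(%{user: 2, lat: 48.1, long: 11.5})

# Two independent consumers, each with its own offset into the same log.
{for_tsdb, tsdb_offset} = EventLog.read(log, 0)
{for_search, search_offset} = EventLog.read(log, 1)

IO.inspect({length(for_tsdb), tsdb_offset})
IO.inspect({length(for_search), search_offset})
```

Here the first consumer starts from the beginning and sees both messages, while the second has already processed one and sees only the newer message.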

There was an interesting article on Ars Technica: “Power tools: Sorting through the crowded specialized database toolbox”.

You could also put the data into Solr (http://lucene.apache.org/solr/) or Elasticsearch (https://www.elastic.co/products/elasticsearch) if you want it optimized for search.

If you are only interested in querying data over time, InfluxDB may be a good fit.
Overview of InfluxDB

Summary:
Always optimize how you store data for how you will access it later.

3 Likes

You might want to try Riak TS.

2 Likes

Riak TS is engineered to be faster than Cassandra.

Thanks, I just discovered Cassandra a few days ago, and now Riak TS! :grinning:
How would I use Riak TS from my Elixir app? It seems that Cassandra’s community is much bigger than Riak TS’s…

1 Like

Riak TS is fairly new, which is why the community is rather small at the moment.

You can use the riak client (timeseries release): https://github.com/basho/riak-erlang-client/releases

The official documentation has snippets for querying and data processing.

1 Like

I have installed Riak via Docker:


And I have installed the Elixir Riak client:

Connecting worked fine:
{:ok, pid} = Riak.Connection.start_link('192.168.99.100', 32774)

However, putting a user failed. Here is my code:

def store_to_riak do
  {:ok, pid} = Riak.Connection.start_link('192.168.99.100', 32774)
  IO.inspect pid
  o = Riak.Object.create(bucket: "user", key: "my_key", data: "Han Solo")
  IO.inspect o
  Riak.put(pid, o)
end

The result:

iex(1)> App.LocationController.store_to_riak
#PID<0.314.0>
%Riak.Object{bucket: "user", content_type: 'application/json', data: "Han Solo",
 key: "my_key",
 metadata: {:dict, 1, 16, 16, 8, 80, 48,
  {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []},
  {{[], [], [], [], [], [], [], [], [], [],
    [["content-type", 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 47,
      106, 115, 111, 110]], [], [], [], [], []}}}, type: :undefined,
 vclock: :undefined}
nil
** (EXIT from #PID<0.312.0>) :disconnected

Interactive Elixir (1.2.5) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> [error] GenServer #PID<0.314.0> terminating
** (stop) :disconnected
Last message: {:tcp_closed, #Port<0.12186>}
State: {:state, '192.168.99.100', 32774, false, false, :undefined, false, :gen_tcp, :undefined, {[], []}, 1, [], :infinity, :undefined, :undefined, :undefined, :undefined, [], 100}

But since I am new to Riak, I don’t know how to handle such errors. If you can help, that would be great.

Why pick the timeseries release?

I suggest using this client and branch for Riak TS.

Isn’t Riak TS the same thing that gets installed here: http://basho.com/posts/technical/riak-quick-start-with-docker/ ?

1 Like

Thanks, I tried that branch. I successfully connected to the Riak TS cluster, but I could not run any query. Can you please point me to a simple example? I just need to put and get data.

Here is my implementation:

def store_to_riak do
  {:ok, pid} = Riak.Connection.start_link('192.168.99.100', 8098)
  IO.inspect pid
  o = Riak.Object.create(bucket: "user", key: "my_key", data: "Han Solo")
  IO.inspect o
  Riak.put(pid, o)
end

Here is the error I got:

iex(2)> App.LocationController.store_to_riak
#PID<0.390.0>
%Riak.Object{bucket: "user", content_type: 'application/json', data: "Han Solo",
 key: "my_key",
 metadata: {:dict, 1, 16, 16, 8, 80, 48,
  {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []},
  {{[], [], [], [], [], [], [], [], [], [],
    [["content-type", 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 47,
      106, 115, 111, 110]], [], [], [], [], []}}}, type: :undefined,
 vclock: :undefined}
** (EXIT from #PID<0.354.0>) :disconnected

Interactive Elixir (1.2.5) - press Ctrl+C to exit (type h() ENTER for help)
[error] GenServer #PID<0.390.0> terminating
** (stop) :disconnected
Last message: {:tcp_closed, #Port<0.13088>}
State: {:state, '192.168.99.100', 8098, false, false, :undefined, false, :gen_tcp, :undefined, {[], []}, 1, [], :infinity, :undefined, :undefined, :undefined, :undefined, [], 100}

I suspect that you are confusing Riak KV (key-value) with Riak TS.

Please read through the usage documentation for Riak TS. Unfortunately there is not much Elixir information on this yet, but you can follow the Erlang examples and read through the Elixir Riak Client code. That’s what I do.

The Riak TS usage documentation can be found here: http://docs.basho.com/riak/ts/1.3.0/using/

I might do a Riak TS integration with Ecto, since one uses SQL to interface with it, but I don’t have much time these days. If somebody else is up for the challenge, it would be appreciated.

1 Like

Thanks, I will check the Erlang examples. Would Riak.Object.create and Riak.put (https://github.com/drewkerrigan/riak-elixir-client) be sufficient to put data into a Riak TS database?

1 Like

They are for Riak KV and cannot be used with Riak TS.

In the timeseries branch of the Elixir Riak Client there is a Timeseries module with a put/3 function: https://github.com/drewkerrigan/riak-elixir-client/blob/timeseries/lib/riak/timeseries.ex#L8-L10

1 Like

I was confused because the README on the timeseries branch needs to be updated; it currently uses put/2. I am very new to both Riak and Elixir!

No problem. We are all here to learn.

Perhaps Basho should have rebranded Riak TS as something different than “Riak.” They are based on the same technology, but you interface them differently and they are tuned to do different things.

2 Likes