Right way to write data coming from concurrent requests to Kafka

Hi all,
I’m building an event collector. It receives an HTTP request like http://collector.me/?uuid=abc123&product=D3F4&metric=view and writes the event to Apache Kafka. Right now I’m using Plug, Cowboy, and KafkaEx.

defmodule Collector.Router do
  @behaviour Plug

  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    # Parse the query string into conn.query_params
    conn = fetch_query_params(conn)

    # Publish the params to partition 0 of the "test" topic; with no
    # :worker_name option this goes through the default :kafka_ex worker
    KafkaEx.produce("test", 0, "#{inspect conn.query_params}")

    conn
    |> put_resp_content_type("text/plain")
    |> send_resp(200, "OK")
  end
end

AFAIK, Cowboy spawns a new process for each request, so I think writing to Kafka inside the call function is reasonable, since it’s easy to run hundreds of thousands of processes in Elixir. But I wonder whether this is the right way to do it. Do I need a queue in front of Kafka or something like that? My goal is to handle as many concurrent requests as possible.

According to the docs on hex.pm, if you do not pass a worker in the options of the produce call, it defaults to the :kafka_ex worker … so your calls would end up being serialized through that one worker.
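For illustration, roughly what routing around the default worker looks like; the worker name :collector_worker and the payload variable are just placeholders for this sketch:

# Start a dedicated, named worker (e.g. from your supervision tree)
{:ok, _pid} = KafkaEx.create_worker(:collector_worker)

# Route the produce call through that worker instead of :kafka_ex
KafkaEx.produce("test", 0, payload, worker_name: :collector_worker)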

I do not know how much throughput a kafka_ex worker can manage, but if it is a concern you can create a pool of unnamed workers using e.g. poolboy and then pull workers from the pool, as sketched below. How to do this is documented in the kafka_ex readme.
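A minimal sketch of that idea, assuming KafkaEx.create_worker(:no_name) returns {:ok, pid} and that produce accepts a pid as :worker_name; the pool name, pool sizes, wrapper module, and payload variable are all made up for the example:

defmodule Collector.KafkaWorker do
  # Hypothetical poolboy wrapper: each pool member is an unnamed
  # KafkaEx worker, so produce calls are spread across them
  def start_link(_args) do
    KafkaEx.create_worker(:no_name)
  end
end

# Pool spec, e.g. added to your application's supervision tree
poolboy_config = [
  name: {:local, :kafka_pool},
  worker_module: Collector.KafkaWorker,
  size: 10,
  max_overflow: 5
]
children = [:poolboy.child_spec(:kafka_pool, poolboy_config, [])]

# In the Plug, check a worker out per request
:poolboy.transaction(:kafka_pool, fn worker ->
  KafkaEx.produce("test", 0, payload, worker_name: worker)
end)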

As always, measure before making your app more complicated :)

Thank you for the hint, I’ll try it and post the results back for further discussion :)