Secure file transfer servers in Elixir?

Jaypee · March 25, 2019, 8:47am

Dear Elixir Community,

First, I hope I chose the right place to post…

After Clojure last year, I’m currently studying Elixir.
I’ve read several books, watched tons of videos and also struggled with some exercises on Exercism/Elixir (a bit less “beginner friendly” than its Clojure counterpart “4Clojure”!).

It happens that the advantages of Elixir/OTP could benefit the project I’m working on at the moment, so I would like to run some tests/benchmarks on some basic components of this project:

I want to implement small servers in Elixir, which could be located on several computers/VMs, that simply send and receive files and can also verify their integrity (MD5 checksum?).

The benchmark will mainly focus on scalability (multiply the servers and test a lot of simultaneous file transfers) and robustness (test the transfer of huge files between two servers).

At the moment, I must confess that I’m a bit confused by the Elixir/OTP GenServers, Supervisors, Workers and so on, and by how to fit them together correctly.
So I would like some advice/suggestions/hints on the correct way to use and organise these entities in order to implement the most efficient file transfer server in Elixir, given the goals of the benchmark.

I’m also open to suggestions concerning tips and tricks I could use to optimize my FT servers.

To my knowledge, Elixir is not used in my (big) company and I’d really like to succeed in providing a “proof of concept” !

Best regards and thanks in advance for your help.

P.S.: Please forgive my English as long as you roughly understand what I write !

Fl4m3Ph03n1x · March 25, 2019, 12:03pm

Can you send the files via HTTP POST? If so, a simple app using HTTPoison (or HTTPotion) with cowboy would likely suffice.

Otherwise, using Phoenix may be more up your alley:

https://phoenixframework.org/blog/file-uploads

Generally speaking, using Phoenix will likely be easier, since the framework already does most of the work for you, but you will lose some performance compared to the first option.

brightball · March 25, 2019, 1:15pm

But not much

Jaypee · March 25, 2019, 1:29pm

I thought about Phoenix (and tried it, following the “Chat” course) but it seemed to me that a framework could be a bit “heavy” for my simple FT servers.
I will consider it, then.
Thanks.

Fl4m3Ph03n1x · March 25, 2019, 1:35pm

Depends on the level of incoming traffic you actually have. The more traffic, the more you will feel Phoenix dragging.

Jaypee · March 25, 2019, 1:54pm

My first idea was to set up my servers as nodes and send them file streams.
This would allow me to calculate the MD5 checksum incrementally while the files are streamed, and avoid wasting time.
Something like:

# Read the file in 2048-byte chunks and fold each chunk into a running MD5 state.
File.stream!("./thefile", [], 2048)
|> Enum.reduce(:crypto.hash_init(:md5), fn chunk, acc -> :crypto.hash_update(acc, chunk) end)
|> :crypto.hash_final()
|> Base.encode16(case: :lower)
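
A minimal sketch of the same idea as a reusable function, checked against a one-shot hash of the whole file. The module name, function names and temp file are hypothetical, and it reads through IO.binstream/2 instead of File.stream!, on the assumption that this behaves the same across Elixir versions:

```elixir
defmodule ChecksumSketch do
  # Hypothetical helper (not from the thread): hash a file chunk by chunk,
  # so memory use does not grow with the file size.
  def md5_stream(path, chunk_bytes \\ 2048) do
    File.open!(path, [:read, :binary], fn io ->
      io
      |> IO.binstream(chunk_bytes)
      |> Enum.reduce(:crypto.hash_init(:md5), fn chunk, acc ->
        :crypto.hash_update(acc, chunk)
      end)
      |> :crypto.hash_final()
      |> Base.encode16(case: :lower)
    end)
  end

  # Reference version: load the whole file and hash it in one call.
  def md5_whole(path) do
    Base.encode16(:crypto.hash(:md5, File.read!(path)), case: :lower)
  end
end

# Usage: both digests should agree on an arbitrary file (here a random temp file).
path = Path.join(System.tmp_dir!(), "checksum_sketch.bin")
File.write!(path, :crypto.strong_rand_bytes(10_000))
true = ChecksumSketch.md5_stream(path) == ChecksumSketch.md5_whole(path)
File.rm!(path)
```

The chunk size only affects how many reads are needed, not the resulting digest.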

brightball · March 25, 2019, 2:23pm

Maybe this hasn’t been updated in a while but this really shouldn’t be an issue.


https://gist.github.com/josevalim/d3bf2f0654e1abe36c9e

phoenix showdown rackspace onmetal io.md

Comparative Benchmark Numbers @ Rackspace
==

I've taken the benchmarks from [Matthew Rothenberg](https://github.com/mroth)'s [phoenix-showdown](https://github.com/mroth/phoenix-showdown), updated Phoenix to 0.13.1 and ran the tests on the most powerful machines available at Rackspace.

Results
--

| Framework         | Throughput (req/s) | Latency (ms) | Consistency (σ ms) |
|-------------------|-------------------:|-------------:|-------------------:|

This file has been truncated.

AstonJ · March 25, 2019, 2:32pm

What is your experience on which you base such comments?

As shown in the benchmarks by @brightball any performance hit by Phoenix for the majority of apps would generally be considered to be negligible.

Fl4m3Ph03n1x · March 25, 2019, 2:46pm

Our system, which handles hundreds of thousands of requests per second, and the fact that we are looking into alternatives to Phoenix templating because of that.

I don’t know if the OP will test his system the same way we use ours, and it may also be that we are not using Phoenix to its full potential, but this is my experience.

Still, I don’t think what I said is controversial - I still recommended Phoenix to the OP and even directed him to some documentation.

It’s up to the OP to check both options and decide for himself. If Phoenix were borderline unusable with high traffic (like some community packages on Hex) then I wouldn’t have recommended it in the first place.

AstonJ · March 25, 2019, 2:48pm

Can you share what app/site that is? I’d love to see

Fl4m3Ph03n1x · March 25, 2019, 2:55pm

I can’t

We do backend work and, for all intents and purposes, we are invisible to people. We deal mainly with data from providers and we use Phoenix Views as an easier means to modify XML, HTML, JS and JSON files (a decision was made that using Phoenix would be easier than dealing with ASTs). We are now updating Phoenix from 1.2 but it is proving to be a battle we want to avoid.

Not only that, sharing company code would get me fired. Hence the reason why I always post MWEs in my forum threads and not real code.

I know this is not the answer one would expect. It even undermines my argument (I am asking readers to trust my word without real proof).

To add to that, I don’t even know if the OP will use Phoenix the same way we did. These reasons are exactly why I still recommended that the OP try it out in my first post.

I think Phoenix is awesome; it’s just not working for us, and I am sorry I can’t provide additional context.

easco · March 25, 2019, 3:07pm

If your goal is file transfers only, then it seems like Erlang should contain mechanisms to handle that already. I believe Erlang’s ssh application ships with SFTP support: the link here is for the client module, ssh_sftp (http://erlang.org/doc/man/ssh_sftp.html), and the server side is ssh_sftpd. I also see this article:

https://ninenines.eu/articles/ranch-ftp/

It talks about building an FTP server on top of Ranch.

Accessing those things from Elixir to help with distributed coordination shouldn’t be too hard.

brightball · March 25, 2019, 3:48pm

Honestly, if you find a faster option for templating at 100k req / sec I think we’d all like to know.

It might even be worth contacting the folks over at Discord for some suggestions. Their engineering team has done a lot to push the limits of Elixir, and they have open sourced and blogged about much of it too.

Fl4m3Ph03n1x · March 25, 2019, 4:00pm

We are currently looking into this engine:

https://elixirforum.com/t/fast-eex-iolist-option-for-eex-engine/16145/7

Looks promising thus far, I hope it can serve you well.

This is not anything new, however. Some folks in the community already gave it a shot, but I am not sure why it was never adopted (to the best of my knowledge, nothing was done with it).

dimitarvp · March 25, 2019, 8:51pm

That’s pushing the limits indeed. But IMO at this scale you’ll be much better off having 2-3 servers and putting a load balancer in front of them. Trying to squeeze every last drop of performance per watt is not putting dynamic languages like Erlang and Elixir to good use. They can crumble under such pressure.

If you really really want that server to remain singular then maybe it’s time to look into Rust or Go or OCaml.

OvermindDL1 · April 1, 2019, 8:16pm

At that scale on a single server you are looking at pure socket performance, so C/C++ or Rust with specific libraries, or Nim (I’m still heavily impressed by its socket performance under significant load). I’d not recommend Go or OCaml, as neither performs as well as C/C++/Rust/Nim when finely tuned (though they do well when not everything is finely tuned). You will be looking at memory pools, no-alloc structures, etc… Honestly, just adding more servers is the better option, as redundancy is always better for such things, at which point Elixir is just fine.

Still, hundreds of thousands per second is crazy high, I’m exceptionally curious what this is. ^.^

joeerl · April 2, 2019, 2:26pm

Well a while back I wanted to transfer some files so …

joeerl · April 2, 2019, 2:36pm

Well you have sockets, and something to compute an MD5 checksum - which is actually all you need.

Forget all the generic this and that and make a simple socket client and server (homework - find out the smallest socket client and server code)

Making a client/server over a socket is virtually the first thing I do in any language when learning - very instructive

Repeat in C, TCL, JS, Java, Ruby, Python, Perl, C#

Seriously, the round trip client -> server -> client with a check that you transfer all data and lose nothing is an essential programming technique - learn to do this first THEN add JSON/XML (whatever) on top.
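
A rough sketch of that round trip in Elixir, using only :gen_tcp and :crypto. The module name, function names and the tiny wire format (an 8-byte length header, then the raw bytes, answered by a 16-byte MD5 digest) are assumptions made for illustration, not something taken from this thread:

```elixir
defmodule SocketSketch do
  # Hypothetical server: accept one connection, read an 8-byte length header,
  # hash exactly that many bytes as they arrive, reply with the raw MD5 digest.
  def serve_once(listen) do
    {:ok, sock} = :gen_tcp.accept(listen)
    {:ok, <<size::64>>} = :gen_tcp.recv(sock, 8)
    digest = recv_n(sock, size, :crypto.hash_init(:md5))
    :ok = :gen_tcp.send(sock, digest)
    :gen_tcp.close(sock)
  end

  # All announced bytes have been seen: finish the hash.
  defp recv_n(_sock, 0, ctx), do: :crypto.hash_final(ctx)

  # Otherwise take whatever is available and fold it into the running hash.
  defp recv_n(sock, remaining, ctx) do
    {:ok, data} = :gen_tcp.recv(sock, 0)
    recv_n(sock, remaining - byte_size(data), :crypto.hash_update(ctx, data))
  end

  # Hypothetical client: send header + payload, then compare the digest the
  # server computed with a locally computed one.
  def send_and_verify(port, payload) do
    {:ok, sock} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])
    :ok = :gen_tcp.send(sock, <<byte_size(payload)::64>>)
    :ok = :gen_tcp.send(sock, payload)
    {:ok, remote_digest} = :gen_tcp.recv(sock, 16)
    :gen_tcp.close(sock)
    remote_digest == :crypto.hash(:md5, payload)
  end
end

# Usage: listen on an ephemeral port, serve one transfer in a task, verify it.
{:ok, listen} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
{:ok, port} = :inet.port(listen)
server = Task.async(fn -> SocketSketch.serve_once(listen) end)
true = SocketSketch.send_and_verify(port, :crypto.strong_rand_bytes(100_000))
Task.await(server)
:gen_tcp.close(listen)
```

The length header is what lets the server know it has received everything; without it (or some other framing) the receiver cannot tell a finished transfer from a stalled one.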

Once you can get two programs in different languages communicating raw bytes over a socket then you can start having fun - Use the low level socket libraries and NOT fancy frameworks - this is the slow way to program - BUT in the long term the best way - understanding is the key to programming. Libraries which hide what you are really doing should only be used once you understand what’s really happening.

Cheers

sribe · April 2, 2019, 4:41pm

Possibly even DPDK…

Jaypee · April 4, 2019, 8:38am

I will follow your “slow” way in Elixir to learn it with the help of your article “Why I often implement things from scratch”.
Thanks for your advice !