Connect ecto to SAP HANA by means of odbc / jdbc fdw

tschnibo · November 19, 2020, 8:56pm

Good day!

Does anyone of you have any experience with hooking up elixir with or without ecto to SAP HANA ˋdirectlyˋ or over Postgresql fdw to SAP Hana? I am working on a liveview frontend would be awesome if I could consume the data from our HANA Datawarehouse.

If yes, how does that work performance wise? And in general? I am now working with mongodb…

Your help is much appreciated. Thank you for reading.

mmmrrr · November 20, 2020, 7:45am

Hi @tschnibo

I have not yet connected to SAP HANA, but I have some knowledge using ODBC with Elixir because I need to connect to a rather esotheric DB over ODBC for a customer.

What I did was to patch up https://github.com/findmypast-oss/mssqlex as “ODBX”, which is as of yet ugly enough to not publish it… nevertheless: that is a starting point.

I also started implementing the Ecto adapter for this driver and got it somewhat working but decided that it’s not worth it to use Ecto, since we have no use for migrations in this specific project.

That’s why we simply use a patched version of https://github.com/lpil/yesql by the dear community member @lpil
To use it we only had to patch in the aforementioned ODBX driver and make some adjustments on how we receive results. The benefit is that we can simply write the SQL dialect our database needs while avoiding injection attacks. This freed us from having to maintain our own Ecto DSL mapping.

To the other questions: ODBC has a decent performance. erlang_odbc is hands down the best ODBC implementation I came across, which is mainly the reason we went with Elixir in that project. Although we found the connection to be somewhat brittle if you do lots and lots of parallel requests (this might still be a problem in our driver, actually).

Apart from that: the hardest part was really to figure out how to properly setup unixODBC.

Since you’re currently working with mongodb, I wonder if your problem should be represented in a relational database at all. I’d assume you were tasked to “put that data in the main system”, weren’t you? ^^

tschnibo · November 20, 2020, 2:19pm

Hi @mmmrrr

yes, you guessed completely right. The main system just moved to HANA - and my system is a high visibility project they would be glad to be the data provider for. Just the idea of a nice software stack differs quite a lot ;). I am willing to try since neither mongodb is officially supported… at first I used mysql (with python, not elixir), then mongodb, so it would “just” be touching quite a lot of DB related stuff again - good time to refactor a bit ;).

Thank you for your insights! That is very helpful. I have never heard of yesql before. To make sure I understand this correctly, you used unixODBC with the driver you received from your database toolchain. ODBX as a “driver” on the erlang-elixir side and yesql for querying and data consumption, right?

So in case I would use unixODBC together with the ODBC driver included in the SAP HANA client, there is no generic erlang/elixir ODBC driver which can handle this. This is where mssqlex in a modified version can be used. I don’t understand the role of yesql and mssqlex completely, is yesql a necessity, or would the modified ODBX also be sufficient for sql queries, but yesql does this more elegantly? And or is ODBC for the different dialects not generic enough, this is why you need to patch mssqlex?

Again on the indirect idea of a postgres-fdw, would you know the steps to hook up this exotic DB to postgres?
Would the ODBC_fdw package https://github.com/CartoDB/odbc_fdw, possibily work with the HANA client driver, or would I need a separate tool too?

Sorry, many questions, I sure can dive into it deeper myself - still I am very glad for pushes into the right direction, as you already provided. These tools seem like an excellent starting point.

Thank you very much for your advice, and best regards!

mmmrrr · November 20, 2020, 10:46pm

https://github.com/CartoDB/odbc_fdw looks really interesting. I simply wasn’t aware of it so I cannot say if this would or would not work. Nevertheless I’d always be hesitant to introduce another layer of indirection if not absolutely necessary or beneficial.

The setup/architecture contains 4 parts:

unixODBC
a) The program from your package manager. This will establish the low level connection.
b) Database specific driver (for you, this would be the HANA ODBC driver)
erlang_odbc, which is a binding to unixODBC
“ODBX” is the Elixir wrapper I wrote (based on mssqlex), that is modeled after postgrex. This manages the connection pooling, supervision of the requests and a streamlined API for an Elixir consumer. It also translates UTF-16 to UTF-8 and vice versa due to some unfortunate decisions in the past This is the core part: You could simply pass SQL strings to this library and it will yield results
Yesql is the thing we’re using to write the actual SQL. It translates something like select * from foo where bar = :baz into an Elixir function that takes an argument baz: "something" and returns a query like {'select * from foo where bar = ?', ['something']}. Note the ?? This is a placeholder and most database engines will populate these placeholders with the values from the passed list. The cool thing about this is that now these queries are protected against SQL injections, which we would not have if we had simply concatenated strings.

Actually the question mark was one of the things we had to patch in because Postgres and MySQL use $1, $2, etc. to identify the placeholders…

tschnibo · November 23, 2020, 7:46pm

Hi @mmmrrr,

thank you for your quick and awesome explanations!! Very helpful!

Now I understand (at least I think so now, but there is at least fifty shades of understanding ), and I’ll do some trying. Step 1 is going to be the same for fdw, obviously this is going to be my starting point. The whole process is going to take a while, because I can not work on this continuously…

Maybe I approach you again if I am deeper in the details and struggle.

Much appreciation!

tschnibo · January 31, 2021, 11:54pm

Hi @mmmrrr ,
thank you again for your help.
I finally spend some time on this problem.

My first experimentally working setup follows exactly your line with the exact same four parts!

The first steps including a slightly patched version of Mssqlex where even quite troubleless I think I changed two lines in the code. e.g. the “ping” function’s SQL statement was not compatible… and some connection parameter.

Just the hooking up of yesql to mssql took me quite a few hours now, but not due to complexity, but just because I had, until now, not looked at supervisors and “starting” of DB pools and such. I was partially lost

Thank you again for your insights and help!

Ps. maybe you can tell me if this is about how it “should” be done to start up the mssqlex connectionpool for yesql, or do I miss some important parts here? Its working at the moment

  use Application

  def start(_type, _args) do

        conn_params = [
                        name: xxxHana.ConnectionPool, #the name used in yesql later on...
                        database: "xxx",
                        hostname: "xxx:12345",
                        username: "xxx",
                        password: "xxx"
                ]

        child_spec_mssqlex = %{
                id: :Mssqlex,
                start: {Mssqlex, :start_link, [conn_params]}
        }

        children = [child_spec_mssqlex
                ]

    Supervisor.start_link(children, [strategy: :one_for_one])

  end

mmmrrr · February 2, 2021, 10:08am

Hm… that looks alright to me.

In the meantime I found one caveat with the Erlang ODBC driver which renders this setup useless for me, so watch out: a VARCHAR cell in the database will only transfer 4096 bytes - regardless of how many bytes are stored in that cell. This is a hard limit in the erlang ODBC driver. There exists at least one patched version (eodbc) but for me it sliced some data out and was not reliable - and I have no intention of maintaining a C code base

Other than that this setup worked really well for us, so if you don’t expect your cells to get large you should be good to go.

tschnibo · February 4, 2021, 5:30pm

Hey, thank you for your insights - again

This might be a problem, because I want to save JSON files to “collections” (to have mongodb interoperability)…
I just tested and found that my JSON get cut too…

Quite a setback, I have to admit. And when I see that this problem is already quite old, I see that the erlang / elixir universe is maybe not as big as I imagined it to be in the last few months.
I was in an elixir bubble

I tried also to fiddle around with “binary_strings: :off” / “binary_strings: :on” option in erlang odbc, but untill now without success.

For some tasks the setup is still going to be useful, but to store my data its not going to work then.

What do you do now?

I have no experience in writing a ODBC library… maybe this would be a nice project to write some common lisp or something. To dive into C is also not really on my list.

Of course I could try the postgres_fdw stuff now, but also this is not going to be working without patching for hana I suppose.

I don’t really understand how all the other people work around this. Do they all use postgres or have just small varchar fields. Or is there an other path (e.g. JDBC style), which I am missing completely at the moment…
Mainly talking to myself here, sorry…

I keep you informed in case I find a “golden bullet”… !

mmmrrr · February 5, 2021, 1:43pm

Actually I don’t know the workarounds. I was also floored when I found this problem a few weeks ago.
I think the best bet would be to start a cnode or something with JInterface (I tried clojure with GitHub - clojang/clojang: Clojure API for Erlang/OTP Communications (built on jiface) but that seems to be not very well maintained).

I also tried starting a clojure wrapper over an Erlang port which didn’t work because stdio also truncated for some of the test data. Then I tried a Rustler NIF which “worked” in that the data was correctly transmitted but I couldn’t solve the problem of sharing the connection without making the NIF stateful - which one should not do (at least that is my understanding).

So I’m down to using either using Java and JInterface, using Clojure directly or to use something completely different like Haskell or Rust directly. And honestly: since the project doesn’t have a hard requirement on the BEAM it will probably end up being one of the latter options because it will make the setup much, much simpler.

I think what could be worth a shot is: GitHub - CartoDB/odbc_fdw: PostgreSQL Foreign-data Wrapper for ODBC
If that driver does not impose the limit we could simply use ecto with postgrex and call it a day…

tschnibo · February 5, 2021, 4:59pm

I see, you have been diving quite deeply already! I totally understand the frustration!

Just right now stuff like NIF and calling other language’s implementations via port etc. is way above my current knowledge. I was also googling around and looking at stuff like the “go” driver for hana, until you wrote. But if the process of calling go functions over NIF or whatever one needs for that is as involving as it sounds, maybe this is not going to be an “easy” solution either. ( i was thinking about go for ease of deployment ).

I also wonder whether this FDW’s would support the “CLOB” datatype I am interested in. This CartoDB version does not list it specifically at least…

As I don’t yet understand all the pieces needed I am not of much help at the moment. I would like to spend some time on this, but for the next two weeks I have to concentrate on other stuff. I’ll return to the issue afterwards. If you should stumble upon a nice solution until then, please let me know! (Otherwise I might write here again later).

Thank you!

Edit: ps. I am sure you checked whether for your “exotic DB” exsists a specific postgres wrapper… just for completeness in case someone else follows…: Foreign data wrappers - PostgreSQL wiki most of them are outdated I think (or read somewhere on reddit ).

mmmrrr · November 15, 2021, 9:31am

Just FYI. We actually followed through with using Rust but in a more creative way: we use rust to provide a wrapper for the ODBC driver (because there are surprisingly few languages that do not have this 4kb cell size limit).

The cool thing about rust is that you have a relatively straight forward way of including it in Elixir: GitHub - rusterlium/rustler: Safe Rust bridge for creating Erlang NIF functions

So we used that and in the end wrote an Ecto wrapper for our custom database driver because that way we don’t have to retrain people to use something else.

Why am I writing this now? Because I found this - and my own answer - through google because another problem keeps creeping up. Since Rustler is communicating as a NIF and the ODBC driver we wrapped is… not as good as I hoped, it keeps crashing the webserver when the database goes away.

No biggie, just restart the server and all is good. But it bugs me that this will drop requests. So now I’m attempting to wrap our custom driver in a port so that it can simply be restarted by the supervision tree, which would be amazing for our general uptime

tschnibo · November 17, 2021, 12:50am

Thank you a lot for this follow-up! I am currently not working on this problem (and company) any-more. Nevertheless, very nice to hear that you find good ways!

At the moment I tend to avoid DB specific stuff up to an absolute minimum, just storing :term_to_binary stuff… (no ecto needed… ) well this just as a side note not really insightful and not performance tested in any ways.

Have a good time!