Things from my experience that could be helpful in Ecto (plug-in ideas)



Hey, I've worked on some projects and collected some notes/proposals for Ecto itself, or for an Ecto library/plug-in:

  1. `exactly_one_of`: useful when designing polymorphic associations, for example:
     - when casting the source association, we check that exactly one of the sources exists (is not nil)
     - when fetching comments, we could allow loading the source association dynamically

```elixir
defmodule MyApp.Comment do
  use Ecto.Schema

  schema "comments" do
    belongs_to :author, MyApp.User

    exactly_one_of as: :source, required: true do # required defaults to true
      belongs_to :post, MyApp.Post
      # more belongs_to ...
    end
  end
end
```
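The exactly-one-of rule itself is easy to sketch without any Ecto machinery. A hypothetical validator (module name, keys, and messages are my own, not an existing API) could check the cast params like this:

```elixir
defmodule ExactlyOneOfCheck do
  @moduledoc "Sketch: validate that exactly one of the given keys is non-nil."

  # Count the non-nil keys; exactly one means the polymorphic source is valid.
  def validate(params, keys) do
    case Enum.count(keys, fn key -> Map.get(params, key) != nil end) do
      1 -> :ok
      n -> {:error, "expected exactly one of #{inspect(keys)} to be set, got #{n}"}
    end
  end
end
```

A real implementation would plug this into `Ecto.Changeset` as a custom validation, but the core check is just this counting step.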
  2. `group_assocs`: similar to `exactly_one_of`; for example, when fetching comments Ecto could allow loading all source associations in one call:

```elixir
defmodule MyApp.Comment do
  use Ecto.Schema

  schema "comments" do
    belongs_to :author, MyApp.User

    group_assocs :sources do
      belongs_to :post, MyApp.Post
      # more belongs_to ...
    end

    # or:
    belongs_to :post, MyApp.Post
    group_assocs :sources, [:post]
  end
end
```
  3. An Ecto plug-in with methods for easily working with multiple PostgreSQL schemas. Currently Ecto does not support queries that join across schemas, because it's a feature specific to PostgreSQL.

  4. An Ecto recording library used as a cache. I've worked with exvcr and I think something similar for Ecto could be really helpful, but:

     - the plug-in needs to allow implementing custom back-ends, for example: ETS, DETS, Mnesia, Redis and more
     - it should use configuration for data expiration, like:
     - it should have a size limit and/or item-count limit, like: keep the 5 most-used query results
     - it should have priorities in configuration: `prefer_never`, `prefer_mostly_used`
     - it should allow caching all queries (for the whole application), or automatically prefer the slowest queries

```elixir
use Mix.Config

config :ecto_recording_library, :expiration,
  count: 5,
  unit: :minute
```
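A sketch of how such an expiration setting could be interpreted by the hypothetical library (the unit names and module are my assumptions, mirroring the config above):

```elixir
defmodule CacheExpiry do
  # Seconds per supported unit; extend as needed.
  @seconds %{second: 1, minute: 60, hour: 3_600, day: 86_400}

  # Convert an expiration config like [count: 5, unit: :minute] into seconds.
  def ttl_seconds(count: count, unit: unit), do: count * Map.fetch!(@seconds, unit)

  # An entry inserted at `inserted_at` (unix seconds) is expired once the TTL has passed.
  def expired?(inserted_at, now, config), do: now - inserted_at >= ttl_seconds(config)
end
```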

I don’t have time to learn how Ecto works and create plug-in(s) that could implement these and other features, but maybe someone has the time and no idea for a useful library …
What do you think about my suggestions?

Polymorphism in Ecto schemas

I liked your solution for polymorphic associations there! +1 for a plug-in with this


Ecto supports Postgres schemas, check out prefix in the docs.


@ericmj: ok, I was not clear - I corrected that point


Then please please tell me how to convert this working SQL into Ecto? :smiley:

```sql
SELECT ap.pidm, s.spriden_last_name, s.spriden_first_name
FROM public.accounts_pidm AS ap
JOIN banner.spriden AS s ON s.spriden_pidm = ap.pidm AND s.spriden_change_ind IS NULL
```
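For what it's worth, later Ecto versions (3.x) added per-source prefixes, so a query along these lines should be expressible; this is a sketch, not tested against that exact schema:

```elixir
import Ecto.Query

# Schemaless sources with explicit PostgreSQL schema prefixes (Ecto 3.x `prefix:` option).
query =
  from ap in "accounts_pidm",
    prefix: "public",
    join: s in "spriden",
    prefix: "banner",
    on: s.spriden_pidm == ap.pidm and is_nil(s.spriden_change_ind),
    select: %{
      pidm: ap.pidm,
      last_name: s.spriden_last_name,
      first_name: s.spriden_first_name
    }

# MyApp.Repo.all(query)
```

At the time of this thread that option did not exist, which is exactly the pain point being discussed.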

@Eiji Love the list!

  1. A join could get them all at once efficiently if they are the same shape; however, I find this is usually not the case, so would this be multiple queries, or would you want it to build one massive query that returns all of the possible values as one big row from all the tables? Either could be done, but that could get to be a huge SQL call…

  2. Ditto here, but potentially much Much bigger too.

  3. Please Yes Like This Would Help Me SO Much!!! Not just in PostgreSQL; I have to do joins across Oracle’s version of schemas as well, and right now I’m having to hand-write the SQL and pray that I did it right via Ecto’s execute, since Ecto is entirely unusable in this case!!!

  4. This would indeed be useful, and I have created a kind of my own. Right now, when a user logs in, my cache will scan the database for permissions, grab extra permission data from the Oracle database, grab extra permission data from the LDAP server and a couple of other places, munge that all together and store it in ETS (5-minute time-out, up to 10 minutes stored, due to a 5-minute janitor process). I’ve not had a case yet where caching the database itself is useful, but caching calculations done on the database data is highly useful, like in this case.


@OvermindDL1: thanks for your reply
In Ecto it is possible to use `join` - to be clear for others, you and I are talking about something like:

```elixir
MyApp.Post # starting query (assumed)
|> MyApp.Repo.preload_from_another_schema(:comments, :not_posts_schema) # proposed API
|> MyApp.Repo.all()
```
  1. I'm thinking about loading them into an Elixir list in one query (if possible), like: `[["post_id", nil], ["another_id", 5]]`, so finally we could use `Enum.find/2` to take the one whose id is not nil.
  2. Here I'm thinking about fetching them normally (but with a call like loading a single association) and joining all the data into one list.
     We already know what type of item we are trying to add (a comment changeset), so we could require using the same model struct only once per group, but I originally thought only of preloading multiple associations into one list.
     For example:
```elixir
# params = ...
comment_changeset = MyApp.Comment.changeset(%MyApp.Comment{}, params)

post
|> MyApp.Post.changeset(params)
|> Ecto.Changeset.add_to_group(:items, comment_changeset) # proposed function
```
  3. Me Too - I Already Have Plans For Using It!!! :smiley:
  4. I know some use cases:
     a) User info, settings etc. in RESTful applications (no need for a single-page, WebSocket-based application) - when we have 10% really active users, then that 10% could have a chance to load pages faster than other users - depends on the specified limits
     b) Reports: weekly, monthly, quarterly and annual - reports can be really big (especially annual reports) and have really long queries (in number of records and associations) - caching them without writing our own code per project would be really helpful
     c) Cache only products, so the more popular product pages will load faster
     d) Other data that changes rarely or is searched really often by multiple users - there are lots of examples


For number 4 you could use something like cachex: just access information via its API instead of Ecto’s, and you can have it load the data from wherever (such as Ecto) if it does not exist or is expired; otherwise it returns the last used value from ETS. It would indeed be nice to have Ecto handle such a cache itself using its normal query API, but I’m unsure - it seems like it would be a hidden thing, so sometimes you wouldn’t know whether a call would be fast or slow…


@OvermindDL1: I think a little differently than you about the 4th point:

  1. By default it should work similarly to the exvcr API.
     Here is part of an example:

```elixir
use_cassette "example_httpotion" do
  assert HTTPotion.get("http://example.com", []).body =~ ~r/Example Domain/
end
```

     I would like to have something similar, for example:

```elixir
opts = [entries_limit: 5, memory_limit: {2, :mb}, strategy: :latest_query]
# or:
opts = [entries_limit: 5, memory_limit: {2, :mb}, strategy: :mostly_used]
# or move `opts` to the configuration file, so the cache will look it up by `cache_id`

use_ecto_cache :cache_id, opts do
  # Ecto API calls
end
```
  2. In some cases it could be used globally (after changing the configuration)

  3. You mentioned ETS - how about Redis? I've already thought about using Redis as a cache, but currently I would need to write the expiration validation manually.

  4. Use a second API? Maybe it's better to replace the Postgrex adapter configuration with the cache library, and have that library use Postgrex or another Ecto adapter, catching all of its queries - but I don't know Ecto well enough to know whether that is possible.


Redis is still a remote call, and if you are doing that, why not just use the database as the cache? Just make a MATERIALIZED VIEW to have the database itself create a mini-table of the interesting cached data and ignore all of the older data you do not care about, or make a SQL procedure to manage an internal MRU cache instead. There will not be much of a time difference, if any, and it is one less system to manage.

I’ve personally never seen a point in Redis if you already have something like PostgreSQL. PGSQL has K/V support, streaming data, etc… etc…

Also, ETS does not have expiration but things like CacheX do, with janitor processing and all (it is what I use).

Hmm, that could actually work then, just have to make a new Ecto PostgreSQL adapter that uses Postgrex…


I don’t know the details, but someone asked me about Redis because he was interested in it, so it’s probably going to be a project requirement - but I see what you mean. Thanks, now we need to restart our work on it :smile:

MATERIALIZED VIEW plus a cache system - that’s why I wrote that it would be helpful to have a library for this.

In the library you linked:

  1. There is only one policy implemented, which removes the oldest entries (the library should have at least one more that prefers entries accessed more often than others).
  2. I also see that there is no memory limit. We need to estimate the max size of each entry and manually calculate how much memory it could take. I would like to set a percentage of memory that should stay available for other algorithms, but currently that’s a detail.
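The two eviction policies being contrasted here (oldest-first versus keeping the most-used entries) can be sketched in a few lines; the entry shape and policy names are my own illustration:

```elixir
defmodule EvictionPolicy do
  # Entries: %{key => %{hits: n, inserted_at: t}}.
  # :oldest_first evicts the entry inserted earliest;
  # :prefer_mostly_used keeps hot entries, evicting the least-used one instead.
  def victim(entries, :oldest_first),
    do: entries |> Enum.min_by(fn {_key, meta} -> meta.inserted_at end) |> elem(0)

  def victim(entries, :prefer_mostly_used),
    do: entries |> Enum.min_by(fn {_key, meta} -> meta.hits end) |> elem(0)
end
```

The point of the second policy is visible when an old entry is still hot: oldest-first would throw it away, while a usage-based policy evicts a colder entry.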

btw. streaming data - do you mean something like Elixir Streams?

The more I read about databases, the more I want to create my own. I have lots more ideas and an API draft in my crazy brain, but I need to learn more about how the core of one works, and it’s of course lots of work - yes, lots of work - it would probably become a complete database framework rather than a small library. If someone is interested, then I can help work on it, building on private core methods and creating a really advanced API. I have lots of ideas for that framework - some things I haven’t seen in any other projects I know. :smiley:

Update: ok, but what if you want to work with more than one node? A simple in-memory cache will work only per node. Is there a way to share it between nodes?


PostgreSQL has the ability to ‘register’ to streams, so say when a table gets changed you get notified of the change via a pushed message to you, or a direct message can be sent to a stream from elsewhere, it is a classic case that Redis is used for, which is entirely useless if you have PostgreSQL (or heck, even Phoenix PubSub, except PostgreSQL streams can persist the data more easily ^.^).

Creating a database is a hard, and I do mean HARD, problem; do it for the learning opportunity, but not as an end task. ^.^

I have each node cache its own data. Whenever I change, say, a user’s permissions, I broadcast a message over Phoenix.PubSub that clears the account-permission cache ‘now’, and so forth. PostgreSQL used as a cache would already be multi-node anyway. ^.^

Also, if the in-memory cache is shared across nodes rather than kept per node, then there is some outside communication, like PostgreSQL/Redis, which kind of defeats the point of the uber-fast in-memory lookup speeds anyway. Do not worry about duplicating cache data; just use Phoenix.PubSub to send messages that synchronize clearing and/or updating them. :slight_smile:
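This broadcast-invalidation idea can be sketched with bare processes (Phoenix.PubSub does the real fan-out across nodes; the module and message shapes here are made up for illustration):

```elixir
defmodule NodeCache do
  # Each "node" is modeled as a process holding its own cache map,
  # clearing a key when it receives an :invalidate message.
  def start do
    spawn(fn -> loop(%{}) end)
  end

  defp loop(cache) do
    receive do
      {:put, key, value} ->
        loop(Map.put(cache, key, value))

      {:invalidate, key} ->
        loop(Map.delete(cache, key))

      {:get, key, from} ->
        send(from, {:value, Map.get(cache, key)})
        loop(cache)
    end
  end

  # Broadcast an invalidation to every node's cache process.
  def invalidate_all(caches, key), do: Enum.each(caches, &send(&1, {:invalidate, key}))
end
```

Each node keeps its own (possibly duplicated) copy; only the tiny invalidation message crosses the wire, so lookups stay in-memory fast.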



  1. Hmm, that could be useful, thanks.

  2. I know how hard it is (that’s why I’ll probably need good help on the core methods), but it could give us some good things. Anyway, the best way is to start it as a learning project and try to make it more popular, because only then does it become easier to implement further features. It would be awesome to release a 1.0.0 version, but it could take many months of work.

  3. That’s also interesting. I need to do more research on that topic.


How do you write Oracle queries with Ecto?
Even if you don’t write Ecto queries, how are you connecting Ecto/Elixir with an Oracle database?

It’s a big pain for me too. Currently I’m using Ruby to send data from Oracle to Postgres, but it’s getting more and more complex and hard to maintain over time.

Any help is hugely appreciated.


There is an in-dev-but-mostly-working Ecto thing for Oracle, but I’m actually using the Oracle FDW layer in PostgreSQL to talk to Oracle all through PostgreSQL itself.


Can you share the links? I am looking at the Ecto adapters but am unable to find them.

Wow, I didn’t know about this layer. I will explore it.
Thanks for the pointers. :slight_smile:


I don’t know the Ecto name/link offhand, but the base library is:

PostgreSQL can treat a LOT of different things as just native inside it, Oracle is just one of many. :slight_smile:

/me still does not know why someone would not pick PostgreSQL when they need a database