Replicating SQL data into Ecto

So i have an application that consumes an external web API via CRUD endpoints. The application is not always connected to the network, so i need to have a system that downloads/syncs the data, and enters it into an Ecto Repo.

The problem i have is that the documents are relational, but i don’t know how to enter them in the correct order. That means I don’t actually get to use Ecto/SQL relations so i’m doing all the splicing together manually at runtime. I also need to maintain all the migrations separately.

As an extension to the HTTP fetches, i also get real time messages over AMQP of the documents, but that is somewhat easier to work with as long as the initial HTTP fetch of the cache worked properly

Has anyone done this with Ecto before? I know HTTP/SQL might not be the best solution for this also. I’d be happy to hear of any other techs that would be better at solving this problem that would work with a Rails backend.

Can you please clarify on that? Not sure I understand how that makes it impossible to enter the data in a DB through Ecto.

sorry - i can enter the data into ecto, i just don’t get to use the schema helpers has_many, belongs_to, has_one etc.

There is nothing stopping me from using these, it’s just downloading and entering the data into Ecto efficiently.

so i do SomeHttpAdapter.get("/api/things") that returns a JSON array of relational data relying on other data, that i fetch in a different network request.

I guess one solution would be to fetch the data in some sort of tree/recursive manor.

Are you open to making a public code repository with already downloaded data put in the priv directory? Or is the data privacy sensitive?

I am curious to look at your problem in detail.

This is an open source project, just a pretty complex one. you can check it out here

as an example here is a schema for an asset that gets downloaded.

and here and here are the accompanying models.

in the Rails API, there is a regimen with many regimen_items.
In the Elixir project i do get("/api/regimens") and instead of replicating regimen with many regimen_items, i just store a regimen with a json blob for regimen_items. This makes indexing the regimen_items a little difficult.

1 Like

This might take me a while though. I an willing to help you with your particular problem. Can I get several files with relations between them quickly explained so I can code a module that inserts them in the right order and uses foreign keys? (You should also include the order in which the files are downloaded.)

Thanks for the dedication!

As for order, I don’t particularly care about order, but my ideal solution would be something that is not order dependent, or at least the order could be generated dynamically somehow in a somewhat generic manor.

Let me whip up some example documents.

1 Like

okay one that might actually be pretty simple to solve:

get("/api/regimens"):

[
     %{
       "color" => "yellow",
       "device_id" => 6,
       "id" => 174,
       "in_use" => false,
       "name" => "Go around the square every day at 10 am",
       "regimen_items" => [
         #...
         %{
           "id" => 6120,
           "regimen_id" => 174,
           "sequence_id" => 1897,
           "time_offset" => 36300000
         },
         %{
           "id" => 6121,
           "regimen_id" => 174,
           "sequence_id" => 1897,
           "time_offset" => 122700000
         }
         #...
       ]
     }
   ]

so in a normal Ecto setup i would define these four modules:

defmodule Regimen do
  use Ecto.Schema

  schema "regimens" do
    field :color, :string
    field :in_use, :boolean
    field :name, :string
    has_many :regimen_items, RegimenItem
    belongs_to :device, Device
  end
end

defmodule RegimenItem do
  use Ecto.Schema

  schema "regimen_items" do
    field :time_offset, :integer
    belongs_to :regimen, Regimen
    belongs_to :sequence, Sequence
  end
end

defmodule Device do
  use Ecto.Schema

  schema "devices" do
    #... other stuff
    has_many :regimens, Regimen
    has_many :sequences, Sequence
    #... other stuff
  end
end

defmodule Sequence do
  use Ecto.Schema

  schema "sequences" do
    #... other stuff
    has_many :regimen_items, RegimenItem
    belongs_to :device, Device
    #... other stuff
  end
end

The other thing i forgot to mention: since the ID of the resources are provided by the API, i’m currently doing primary_key: false or something to make Ecto not auto generate ids because i’m not aloud to supply an id when inserting a record. (this may be a sqlite3 or adapter issue?)

Edit

I just had a small realization. It is pretty obvious now, but i’m somehow just now noticing/understanding this.

NOTE: Device is sort of the central resource in my case.

get("/api/device") doesn’t explicitly say it has sequences since they are huge, and would probably lead to endless recursion.
BUT a device most definitely has many sequences.

It might just be that the data is presented to the consumer in the wrong way for replication.

Not sure about sqlite3 right now but what I always do is – I allow a normal DB primary key but also add a column called something like external_id.

i do @primary_key {:local_id, :binary_id, autogenerate: true} on all of my schemas and keep id as the external id. I think this is essentially the same thing as you are suggesting right?

Nope. I mean have a regular primary key id that is autogenerated and has zero relation to the external IDs that come from the downloaded data, and use a field like external_id to capture those, without it belonging in a composite primary key. Just have it separately and only put unique: true on it. This is what I do normally and I found it to be the lesser headache.

1 Like

BTW I have been inserting records through Ecto in Postgres while specifying an id and never had trouble. Have not tried updating them though.

So maybe what you describe is sqlite3-specific indeed.

I’ll try that out. Maybe i just have something setup weird there.

My offer still stands. Prepare a small project with the data already downloaded in the priv directory. And put a few docs explaining the relation of the data (you already explained a part). And your desired outcome.

I am just about to start new jobs and I will prioritize them – so can’t guarantee you a very quick help but the area is interesting to me so I will help.

1 Like

Thanks! I’ve been thinking about this pretty much nonstop for a few days. I think i’ll make a dummy up that explains the issue better.

1 Like

Okay i made a pretty simple project here

the server application will serve the files in server/fixture. These are modeled almost exactly like what will be served by the real server.

1 Like

Thank you. No promises for exact time but I will definitely check it out and give my input.

1 Like