How should I use put_assoc with upsert ? (many-to-many)

Hello! I build a crawler and using Ecto as a database.

The crawler scrapes some data from the doctor-hospitals site.
One doctor can work in many hospitals, and one hospital can have many doctors.
I use many-to-many associations to connect them.

I will try to explain how crawler works:

  1. Go to the Doctor’s page. Grab all doctor’s info and its hospitals.
  2. Put these hospitals’ links to the queue.
  3. Crawled data go to Ecto Changeset. Which build the associations via put_assoc, add to hospital table:hospital_profile, which contains just the links and no other data yet.
  4. After the hospital link is visited(from point 2), the crawler should enrich data by fill the field.
 %Doctor{}
   |> cast(params, fields)
   |> validate_required(:doctor_profile)
   |> put_assoc(:hospitals, hospitals_to_map(params.hospitals))
   |> upsert_doctors()

  def upsert_doctors(changeset) do
   changeset
   |> Repo.insert(on_conflict: :replace_all_except_primary_key, conflict_target: :doctor_profile)
  end

Below is my debugging process.

  1. Add the Doctor fields with associated Hospitals - Success! The references to “hospitals_doctors” table added.
  2. Enrich data for the Hospitals - Sucess! (for this I using usperts)
  3. Add a new Doctor which related from one of the Hospitals which already has added(points above) - Error!
** (Ecto.ConstraintError) constraint error when attempting to insert struct:
   * hospitals_index (unique_constraint)
If you would like to stop this constraint violation from raising an
exception and instead add it as an error to your changeset, please
call `unique_constraint/3` on your changeset with the constraint
`:name` as an option.
The changeset has not defined any constraint.

So If a doctor has many hospitals then I will get many ids in hospitals_doctors table. However if I want to add a new doctor which also works in one of the hospitals where work the first one, I will meat the error. Somehow can I fix it ? What I am doing wrong?

Code here: https://gist.github.com/Unumus/1f8c5029c7e0f3ffe2c1314b777d90e6

I could figure out it actually. Need do some refactoring, but an idea I think I got. Are there some other way to figure out such situations?

   {:ok, doctor} =
    %Doctor{}
    |> cast(params, fields)
    |> validate_required(:doctor_profile)
    |> upsert_doctors()

  {_n, struct} =
  upsert_hospitals(parse_hospitals_to_map(params.hospitals))

  Repo.insert_all("hospitals_doctors", parse_ids(doctor, struct))
  end

  def upsert_doctors(changeset) do
    Repo.insert(changeset, on_conflict: {:replace_all_except, [:doctor_profile, :id]}, conflict_target: :doctor_profile, returning: true)
  end

  def upsert_hospitals(hospitals) do
    Repo.insert_all(Hospital, hospitals, on_conflict: :nothing, returning: true)
  end

  def parse_ids(doctor, struct) do
    struct
    |> Enum.map(&%{doctor_id: doctor.id, hospital_id: &1.id})
  end

  def parse_hospitals_to_map(list_of_links) do
      list_of_links
      |> Enum.map(&%{hospital_profile: &1, inserted_at: ~N[2020-04-02 19:22:42], updated_at: ~N[2020-04-02 19:22:42]})
  end

Just a sanity check: but did you add the unique_constraint/3 call inside your cast-using function?

The unique_constraint helper is unrelated to on_conflict. @013 I don’t think that the association helpers handle upserts.

@dimitarvp yes, I did try a lot of variations did not found a way how to do associations out of the box.

It seems like I did meet some of this type of limitation.

Because of the inability to know if the struct is up to date or not, inserting a struct with associations and using the :on_conflict option at the same time is not recommended, as Ecto will be unable to actually track the proper status of the association.