I’m in the process of migrating a codebase and its data to Elixir/Ecto, and I notice that my embedded schemas (backed by jsonb columns) now store every field defined in the schema, even when no value is provided for it.
Consider the following data model:
The `Asset` model holds information about an attachment asset. Not all of this information will be known or provided:
```elixir
defmodule Asset do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :url, :string
    field :filename, :string
    field :mime_type, :string
    field :size, :integer
    field :width, :integer
    field :height, :integer
  end

  def changeset(asset, attrs) do
    # Cast every schema field so that any provided value is kept.
    asset
    |> cast(attrs, [:url, :filename, :mime_type, :size, :width, :height])
  end
end
```
The `Attachment` model holds information about media attachments, which can include assets in several “editions”, i.e. the original plus derived editions such as thumbnails, medium squares, etc.:
```elixir
defmodule Attachment do
  use Ecto.Schema

  schema "attachments" do
    field :attachment_provider, AttachmentProvider
    field :attachment_type, AttachmentType
    field :attached_count, :integer, read_after_writes: true
    timestamps(updated_at: false)

    embeds_one :content_data, ContentData, on_replace: :delete, primary_key: false do
      field :provider_id, :string
      embeds_one :original, Asset
      embeds_one :full, Asset
      embeds_one :medium, Asset
      embeds_one :thumb, Asset
    end
  end
end
```
Attachments may come from direct user uploads or from third-party links. Third-party links have no editions. Saving a third-party link like this:
```elixir
attrs = %{
  attachment_provider: :some_provider,
  attachment_type: :image,
  content_data: %{
    provider_id: "deadbeef",
    original: %{
      url: "https://example.com/image-link"
    }
  }
}

%Attachment{}
|> change(attrs)
|> Repo.insert!()
```
results in the following data in the jsonb `content_data` column:
```json
{
  "full": null,
  "thumb": null,
  "medium": null,
  "original": {
    "url": "https://example.com/image-link",
    "size": null,
    "width": null,
    "height": null,
    "filename": null,
    "mime_type": null
  },
  "provider_id": "deadbeef"
}
```
Is there a way to store only the data that was actually provided? The goal is to store just the following:
```json
{
  "original": {
    "url": "https://example.com/image-link"
  },
  "provider_id": "deadbeef"
}
```
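The desired result is the stored map with every nil-valued key removed, at every level of nesting. A minimal, stdlib-only sketch of that transformation follows; `CompactMap` and `drop_nils/1` are hypothetical names, not part of Ecto. It assumes the data has already been dumped to plain maps (as in the jsonb output above), so structs are passed through untouched rather than pruned.

```elixir
defmodule CompactMap do
  # Hypothetical helper (not an Ecto API): recursively drops nil-valued keys.

  # Leave structs (e.g. DateTime) untouched; they are not plain maps to prune.
  def drop_nils(%_{} = struct), do: struct

  # For plain maps, keep only non-nil values and prune each kept value too.
  def drop_nils(map) when is_map(map) do
    for {key, value} <- map, not is_nil(value), into: %{} do
      {key, drop_nils(value)}
    end
  end

  # Recurse into lists so maps nested inside them are pruned as well.
  def drop_nils(list) when is_list(list), do: Enum.map(list, &drop_nils/1)

  # Any other term (string, integer, boolean) is returned as-is.
  def drop_nils(other), do: other
end

# The first stored example, as the map Ecto would produce.
stored = %{
  "full" => nil,
  "thumb" => nil,
  "medium" => nil,
  "original" => %{
    "url" => "https://example.com/image-link",
    "size" => nil,
    "width" => nil,
    "height" => nil,
    "filename" => nil,
    "mime_type" => nil
  },
  "provider_id" => "deadbeef"
}

CompactMap.drop_nils(stored)
# => %{"original" => %{"url" => "https://example.com/image-link"}, "provider_id" => "deadbeef"}
```

Only `nil` is removed; `false`, `0` and `""` are kept. On its own this gives up schema casting and validation, so it would have to run after validation, for example on raw maps before insert or inside a custom `Ecto.Type`'s `dump/1` callback.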
Here is another example, where editions are provided but not all fields are known:
```elixir
attrs = %{
  attachment_provider: :us,
  attachment_type: :image,
  content_data: %{
    full: %{
      url: "https://example.com/our-uploads-full.jpg",
      filename: "our-uploads-full.jpg",
      mime_type: "image/jpeg"
    },
    medium: %{
      url: "https://example.com/our-uploads-med.jpg",
      filename: "our-uploads-med.jpg",
      mime_type: "image/jpeg"
    },
    thumb: %{
      url: "https://example.com/our-uploads-thumb.jpg",
      filename: "our-uploads-thumb.jpg",
      mime_type: "image/jpeg"
    }
  }
}

%Attachment{}
|> change(attrs)
|> Repo.insert!()
```
The resulting data in the jsonb `content_data` column is:
```json
{
  "full": {
    "url": "https://example.com/our-uploads-full.jpg",
    "size": null,
    "width": null,
    "height": null,
    "filename": "our-uploads-full.jpg",
    "mime_type": "image/jpeg"
  },
  "thumb": {
    "url": "https://example.com/our-uploads-thumb.jpg",
    "size": null,
    "width": null,
    "height": null,
    "filename": "our-uploads-thumb.jpg",
    "mime_type": "image/jpeg"
  },
  "medium": {
    "url": "https://example.com/our-uploads-med.jpg",
    "size": null,
    "width": null,
    "height": null,
    "filename": "our-uploads-med.jpg",
    "mime_type": "image/jpeg"
  },
  "original": null,
  "provider_id": null
}
```
versus what I would like to store:
```json
{
  "full": {
    "url": "https://example.com/our-uploads-full.jpg",
    "filename": "our-uploads-full.jpg",
    "mime_type": "image/jpeg"
  },
  "thumb": {
    "url": "https://example.com/our-uploads-thumb.jpg",
    "filename": "our-uploads-thumb.jpg",
    "mime_type": "image/jpeg"
  },
  "medium": {
    "url": "https://example.com/our-uploads-med.jpg",
    "filename": "our-uploads-med.jpg",
    "mime_type": "image/jpeg"
  }
}
```
Across millions of records, this adds up to a lot of wasted space. I could probably get what I need by storing raw maps, but I would like to keep the benefits of schemas, i.e. validation, custom types, etc. Is this possible with embedded schemas?
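One possible workaround, sketched under assumptions rather than as a built-in Ecto option: instead of `embeds_one :original, Asset`, declare the field with a custom `Ecto.Type` that casts through `Asset.changeset/2` (so casting and validation still use the schema) but drops nil fields when dumping. `CompactAsset` is a hypothetical name; `Ecto.embedded_dump/2`, `Ecto.embedded_load/3` and the `Ecto.Type` callbacks are real Ecto APIs. The `Mix.install` line assumes a standalone script and Ecto 3.10 or later, and the `Asset` schema is repeated so the script runs on its own. `embed_as/1` returns `:dump` so that `dump/1` is also applied when the field sits inside an embedded schema such as `content_data`.

```elixir
# Assumption: run as a standalone .exs script with network access to Hex.
Mix.install([{:ecto, "~> 3.10"}])

defmodule Asset do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :url, :string
    field :filename, :string
    field :mime_type, :string
    field :size, :integer
    field :width, :integer
    field :height, :integer
  end

  def changeset(asset, attrs) do
    cast(asset, attrs, [:url, :filename, :mime_type, :size, :width, :height])
  end
end

defmodule CompactAsset do
  # Hypothetical custom type: validates via Asset.changeset/2, stores only non-nil fields.
  use Ecto.Type

  # Stored as a JSON object (jsonb column or inside another embed).
  def type, do: :map

  # Inside embeds, call dump/1 instead of handing the raw struct to the JSON encoder.
  def embed_as(_format), do: :dump

  # Already-cast structs pass through unchanged.
  def cast(%Asset{} = asset), do: {:ok, asset}

  # Plain params go through the schema's changeset, so validations still apply.
  def cast(params) when is_map(params) do
    changeset = Asset.changeset(%Asset{}, params)

    case Ecto.Changeset.apply_action(changeset, :insert) do
      {:ok, asset} -> {:ok, asset}
      # :error makes the parent changeset record an "is invalid" error.
      {:error, _changeset} -> :error
    end
  end

  def cast(_), do: :error

  # Dump every field with Ecto's own embedded dumper, then drop the nil ones.
  def dump(%Asset{} = asset) do
    compact =
      asset
      |> Ecto.embedded_dump(:json)
      |> Enum.reject(fn {_field, value} -> is_nil(value) end)
      |> Map.new()

    {:ok, compact}
  end

  def dump(_), do: :error

  # Keys missing from the stored JSON load as nil struct fields.
  def load(data) when is_map(data), do: {:ok, Ecto.embedded_load(Asset, data, :json)}
  def load(_), do: :error
end
```

In `Attachment` this would read `field :original, CompactAsset` (likewise for `full`, `medium` and `thumb`). It relies on the values passing through `Ecto.Changeset.cast/4`, which calls `CompactAsset.cast/1`; `change/2` stores values without casting, so a plain map would then fail to dump. Two trade-offs remain: the inline `content_data` embed still writes `"full": null` and similar keys for missing editions unless it gets the same treatment, and an invalid asset surfaces as a single "is invalid" error on the parent field instead of per-field embed errors.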