Hi!
I’m experiencing a weird RAM-usage spike in a relatively simple update action. It does involve relationships, embeds, and a fair amount of data (approx. 7 MB), but the spike reaches over 30 GB on a dev machine, and 16 GB followed by an out-of-memory crash in production.
I have found a workaround, but it is not ideal (it moves part of the work into a different transaction). I am also curious in general whether there’s an issue with Ash or with my understanding of how things work.
Here are the details:
A `Document` represents a PDF for which we need to save its OCR data (coming from an external service): basically, we store and use the coordinates of all the recognized words on all the pages.
Here’s how this data is modeled:
```elixir
defmodule MyApp.Document do
  use Ash.Resource,
    domain: MyApp.Domain,
    data_layer: AshPostgres.DataLayer,
    notifiers: [Ash.Notifier.PubSub]

  alias MyApp.Page

  attributes do
    integer_primary_key :id
    attribute :status, :string # it's more complex than this, but not very relevant
  end

  relationships do
    has_many :pages, Page do
      source_attribute :document_id
      sort :index
    end
  end

  # ...
end
```
```elixir
defmodule MyApp.Page do
  use Ash.Resource,
    domain: MyApp.Domain,
    data_layer: AshPostgres.DataLayer

  @derive Jason.Encoder

  attributes do
    attribute :index, :integer, allow_nil?: false, constraints: [min: 0], primary_key?: true
    attribute :width, :integer, allow_nil?: false, constraints: [min: 0]
    attribute :height, :integer, allow_nil?: false, constraints: [min: 0]
    attribute :words, {:array, MyApp.Page.Word}, default: []
  end

  relationships do
    belongs_to :document, MyApp.Document do
      allow_nil? false
      primary_key? true
      attribute_type :integer
      attribute_writable? true
      destination_attribute :document_id
    end
  end

  calculations do
    calculate :preview_url, :string, MyApp.GeneratePagePreviewURL
  end

  identities do
    identity :document_and_index, [:document_id, :index]
  end

  actions do
    defaults [:read]

    create :add do
      primary? true
      accept [:index, :width, :height, :words, :document_id]
      upsert? true
      upsert_identity :document_and_index
      upsert_fields :replace_all
    end
  end
end
```
```elixir
defmodule MyApp.Page.Word do
  use Ash.Resource,
    data_layer: :embedded

  @derive Jason.Encoder

  attributes do
    attribute :global_index, :integer, public?: true, allow_nil?: false, constraints: [min: 0]
    attribute :symbols, :string, public?: true
    attribute :bounding_box, MyApp.Page.RectangleOnPage, public?: true
  end

  actions do
    defaults [:read, :destroy, create: :*]
  end
end
```
```elixir
defmodule MyApp.Page.RectangleOnPage do
  use Ash.Resource,
    data_layer: :embedded

  @derive {Jason.Encoder, only: [:page_index, :left_x, :top_y, :width, :height]}

  attributes do
    attribute :page_index, :integer, public?: true, allow_nil?: false, constraints: [min: 0]
    attribute :left_x, :float, public?: true, allow_nil?: false
    attribute :top_y, :float, public?: true, allow_nil?: false
    attribute :width, :float, public?: true, allow_nil?: false
    attribute :height, :float, public?: true, allow_nil?: false
  end

  actions do
    defaults [:read, :destroy, create: :*]
  end
end
```
Here’s how the saving of the data is implemented:
```elixir
defmodule MyApp.Document do
  # ...

  actions do
    # `create` is out of scope

    update :save_ocr_data do
      require_atomic? false
      argument :pages, {:array, :map}, allow_nil?: false

      # this is the naïve implementation,
      # which gets progressively slower the more pages are saved
      change manage_relationship(:pages, type: :create)
      change set_attribute(:status, "ready")
    end
  end
end
```
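For reference, the action is invoked roughly like this (variable names are illustrative; `pages` is the list of maps decoded from the OCR service response):

```elixir
# `document` is a previously loaded %MyApp.Document{}
document
|> Ash.Changeset.for_update(:save_ocr_data, %{pages: pages})
|> Ash.update!()
```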
What I’ve tried:

1. Replaced `manage_relationship` with a bulk create (`Ash.bulk_create!`) inside the same transaction:

```elixir
# instead of manage_relationship:
change before_action(fn changeset, _context ->
  changeset
  |> Ash.Changeset.get_argument(:pages)
  |> Ash.bulk_create!(MyApp.Page, :add)

  changeset
end)
```

→ Similar memory usage, though it does seem a bit lower.
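A variant that might shave off more memory inside the transaction is tuning the bulk options: `Ash.bulk_create!` accepts `return_records?` and `batch_size` options, so the inserted pages never need to be materialized and kept around. This is an untested sketch for this workload, not a confirmed fix:

```elixir
change before_action(fn changeset, _context ->
  changeset
  |> Ash.Changeset.get_argument(:pages)
  # avoid building/returning a struct per inserted page,
  # and insert in smaller batches
  |> Ash.bulk_create!(MyApp.Page, :add,
    return_records?: false,
    batch_size: 100
  )

  changeset
end)
```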
2. Replaced `manage_relationship` with a bulk create (`Ash.bulk_create!`) outside the transaction:

```elixir
# instead of manage_relationship:
change before_transaction(fn changeset, _context ->
  changeset
  |> Ash.Changeset.get_argument(:pages)
  |> Ash.bulk_create!(MyApp.Page, :add)

  changeset
end)
```

→ This completely removes the problem: there is no noticeable spike in RAM, and it also runs much faster (seconds instead of minutes). But it does sacrifice some consistency guarantees, since the pages are inserted outside the update’s transaction.
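One possible compromise I’m considering (an untested sketch, assuming the Ecto repo is `MyApp.Repo`): keep the fast `before_transaction` bulk insert, but wrap the whole action call in an explicit repo transaction at the call site, so the page inserts and the status update still commit or roll back together. Whether this reintroduces the memory spike would need measuring:

```elixir
# hypothetical call-site wrapper; `document` and `pages` as above
MyApp.Repo.transaction(fn ->
  document
  |> Ash.Changeset.for_update(:save_ocr_data, %{pages: pages})
  |> Ash.update!()
end)
```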
I would like to ask whether there is anything wrong with the initial implementation, or whether this could be an issue with Ash itself.
Is there still a decent way to perform everything within the same transaction boundaries?
P.S.
- The sample data is a document with 18 pages, each page contains a few hundred words
- Elixir 1.15 + Erlang 26.1.2
- ash: 3.0.14
- ash_postgres: 2.0.9