Huge memory usage when inserting related resources

Hi!

I’m experiencing a weird RAM-usage spike in a relatively simple update action. It does involve relationships and embeds and a lot of data (approx. 7MB), but the spike reaches over 30GB on a dev machine and 16GB + out-of-memory crash on production.

I have found a workaround, but it is not ideal (different transaction). And I am in general curious if there’s an issue with Ash or with my understanding of how things work.

Here are the details:

  • Document represents a PDF for which we need to save its OCR data (coming from an external service) - basically, we store and use coordinates of all the recognized words on all the pages.

Here’s how this data is modeled:

defmodule MyApp.Document do
  use Ash.Resource,
    domain: MyApp.Domain,
    data_layer: AshPostgres.DataLayer,
    notifiers: [Ash.Notifier.PubSub]

  alias MyApp.Page

  attributes do
    integer_primary_key :id
    status :string # it's more complex that this, but not very relevant
  end

  relationships do
    has_many :pages, Page do
      source_attribute :document_id
      sort :index
    end
  end

# ...
end


defmodule MyApp.Page do
  use Ash.Resource,
    domain: MyApp.Domain,
    data_layer: AshPostgres.DataLayer

  @derive Jason.Encoder

  attributes do
    attribute :index, :integer, allow_nil?: false, constraints: [min: 0], primary_key?: true
    attribute :width, :integer, allow_nil?: false, constraints: [min: 0]
    attribute :height, :integer, allow_nil?: false, constraints: [min: 0]
    attribute :words, {:array, MyApp.Page.Word}, default: []
  end

  relationships do
    belongs_to :document, MyApp.Document do
      allow_nil? false
      primary_key? true
      attribute_type :integer
      attribute_writable? true
      destination_attribute :document_id
    end
  end

  calculations do
    calculate :preview_url, :string, MyApp.GeneratePagePreviewURL
  end

  identities do
    identity :document_and_index, [:document_id, :index]
  end

  actions do
    defaults [:read]

    create :add do
      primary? true
      accept [:index, :width, :height, :words, :document_id]

      upsert? true
      upsert_identity :document_and_index
      upsert_fields :replace_all
    end
  end
end


defmodule MyApp.Page.Word do
  use Ash.Resource,
    data_layer: :embedded

  @derive Jason.Encoder

  attributes do
    attribute :global_index, :integer, public?: true, allow_nil?: false, constraints: [min: 0]
    attribute :symbols, :string, public?: true
    attribute :bounding_box, MyApp.Page.RectangleOnPage, public?: true
  end

  actions do
    defaults [:read, :destroy, create: :*]
  end
end


defmodule MyApp.Page.RectangleOnPage do
  use Ash.Resource,
    data_layer: :embedded

  @derive {Jason.Encoder, only: [:page_index, :left_x, :top_y, :width, :height]}

  attributes do
    attribute :page_index, :integer, public?: true, allow_nil?: false, constraints: [min: 0]
    attribute :left_x, :float, public?: true, allow_nil?: false
    attribute :top_y, :float, public?: true, allow_nil?: false
    attribute :width, :float, public?: true, allow_nil?: false
    attribute :height, :float, public?: true, allow_nil?: false
  end

  actions do
    defaults [:read, :destroy, create: :*]
  end
end

Here’s how saving of the data is implemented

defmodule MyApp.Document do
  # ... 

  actions
    # `create` is out of scope

    update :save_ocr_data do
      require_atomic? false
      argument :pages, {:array, :map}, allow_nil?: false

      # this is the naïve implementation, 
      # which works progressively slower when trying to save more pages
      change manage_relationship(:pages, type: :create)

      change set_attribute(:status, "ready")
    end
  end
end

What I’ve tried:

Replaced manage_relationship with bulk_insert inside the same transaction

# instead of manage_relationship:
change before_action(fn changeset, _context ->
   changeset
   |> Ash.Changeset.get_argument(:pages)
   |> Ash.bulk_create!(MyApp.Page, :add)

   changeset
 end)

→ This has similar memory usage, it does seem a bit lower though

Replaced manage_relationship with bulk_insert outside the transaction

# instead of manage_relationship:
change before_transaction(fn changeset, _context ->
   changeset
   |> Ash.Changeset.get_argument(:pages)
   |> Ash.bulk_create!(MyApp.Page, :add)

   changeset
 end)

This completely removes the problem, there is no noticeable spike in RAM. This also works much faster (seconds instead of minutes). But it does remove some consistency guarantees.

I would like to ask whether there is anything wrong with the initial implementation, or could it be an issue with Ash?
Is there still a decent way to perform everything within the same transaction boundaries?

P.S.

  • The sample data is a document with 18 pages, each page contains a few hundred words
  • Elixir 1.15 + Erlang 26.1.2
  • ash: 3.0.14
  • ash_postgres: 2.0.9

:thinking: That is quite surprising.

What isn’t surprising is that manage_relationship is quite slow. Soon, I’ll be rewriting them to leverage bulk actions just as you have done, but in their current implementation they do their work iteratively.

What is very surprising is that the refactor appears to be “move it out of the transaction” as opposed to “use a bulk action”. That part I don’t fully understand.

I’d like to fully understand this issue to ensure it doesn’t happen for others. Lets start with some questions:

  • When doing this the new way in bulk action, are you seeing one single CREATE statement?

  • If so, do you see the same thing when doing it in a transaction?

  • Are you calling the save_ocr_data action in bulk?

1 Like

Thank you for the quick reaction, Zach!

  • When doing this the new way in bulk action, are you seeing one single CREATE statement?
  • If so, do you see the same thing when doing it in a transaction?

Yes, I’m seeing a single CREATE statement. Both when running this within the transaction and before the transaction.

While debugging, I have actually tried replacing bulk_create with a hand-made loop of Ash.create!, one per page. This produced multiple CREATE statements, one per page, as I expected - but, the delay between each CREATE was getting longer and longer after each page.

  • Are you calling the save_ocr_data action in bulk?

No, the action is called for one document at a time, using code interface generated function.

# Domain:
resource Document do
  define :get_document, action: :read, get_by: [:id]
  # ...
  define :save_ocr_data, action: :save_ocr_data, args: [:pages]
  # ...
end

# Usage (within an Oban worker):
document_id
  |> Domain.get_document!()
  |> Domain.save_ocr_data!(pages)

For interested parties, we’ve isolated this to an issue with bulk actions and lists of embedded resources. I’m working on a benchmark and will use that to solve the issue.

4 Likes

Hey @arconautishche I’ve pushed some significant optimizations to main. There are three relevant changes:

  1. right now, we include the parent changeset as context on embedded changesets. This is actually quite expensive, and will be changed in 4.x (someday in the distant future). For backwards compatibility, we cannot change this, but it can be disabled in config with
config :ash, :include_embedded_source_by_default?, false

Setting this config is now in our getting started guides.

  1. resources can be heavy handed for certain kinds of simple data, and so going through the entire create flow was unnecessary. I’ve made a “fast path” for embedded resources that don’t have any special customizations to bypass action logic. We still need to apply aggressive optimizations to resource actions overall, but this allows embedded resources that are essentially just typed-maps to bypass the internal resource/action logic when casting.

  2. Over time, I’d like to push the idea of using non-embedded types when all that you want to do is model a map attribute or list of map attributes. To this end, I’ve pushed support for the :fields constraint for the :struct type to main. This allows for using a much simpler abstraction than Ash.Resource when you simply want to represent a data type.

How it could be applied in your case:

defmodule MyApp.Page.Word do
  defstruct [:key1, :key2]

  use Ash.Type.NewType, subtype_of: :struct, constraints: [
    instance_of: __MODULE__,
    fields: [
      key1: [type: :string, allow_nil?: false],
      key2: [type: :integer]
    ]
  ]
end

which then allows:

    attribute :words, {:array, MyApp.Page.Word}, default: []

in the same way that you currently have it.

Using Ash.Type.NewType with the :struct type like this can allow you to have the ergonomics of usage of an embedded resource with a simpler tool.

Long term

Ultimately, there is no good reason for embedded resources to have even .01% of the performance impacts that you noted, and I will be investigating that separately. These changes, however, should provide short term relief. Please give the main branch a try and let me know how it works for you :slight_smile:

2 Likes

We’ve made various optimizations to this process in main, it should use orders of magnitude less memory going forward.

This is pretty awesome!

With the latest changes on main, and without any changes to my code, the memory spike is significantly smaller on the same data - approx. 200MB instead of 15-20GB on Ash 3.3.3. So it looks like change #2 already has a huge impact - thanks!

Then I’ve tried the include_embedded_source_by_default?: false config option with Ash 3.3.3 → this doesn’t seem to produce noticeable improvements, the memory spike still went up to 18GB-ish.

And then I’ve tried converting the embedded types to the simple structs with the new constraints on Ash.Type.NewType (on latest main) → this completely removed any noticeable memory spikes :partying_face:. This simpler way to define the types is indeed sufficient in this case, since we’re indeed only trying to represent pretty basic data types.

Side note: I’ve never encountered Ash.Type.NewType before, good to know it exists.

Thanks for all the improvements, can’t wait to use the next version whenever it comes out!

1 Like

Good to hear :slight_smile: I ought to be able to get the performance impacts of embedded resources down to roughly the same levels as the map type, but that will take a bit longer, hunting down a few smaller optimizations.

We’ve recently upgraded Ash to 3.4 and changed the code back to do everything in a single transaction (before_transactionbefore_action). And I can confirm that things are running very smoothly, nothing to complain about!

Thanks, @zachdaniel!

6 Likes