Blink - Fast bulk seeding for Ecto/PostgreSQL with clean, declarative syntax

Blink is a library for fast bulk data insertion into PostgreSQL databases using the COPY command. It provides a clean, declarative syntax for defining seeders.

Features:

  • Uses PostgreSQL’s COPY for fast bulk inserts
  • Tables inserted in declaration order to respect foreign key constraints
  • Access data from previously defined tables when building subsequent tables
  • Store auxiliary context data that won’t be inserted into the database
  • Load data from CSV/JSON files with Blink.from_csv/2 and Blink.from_json/2
  • :transform option for type conversion when loading from files (see the sketch after this list)
  • Integrates nicely with ExMachina
  • Rollback on errors
  • Adapter pattern for supporting other databases
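
A minimal sketch of the file loaders, assuming Blink.from_csv/2 takes a file path plus options: the :transform option name comes from the list above, but its exact shape (here, a map of column name to conversion function) is an assumption, as are the module name and file path.

defmodule MyApp.CsvSeeder do
  use Blink

  def call do
    new()
    |> add_table(:users)
    |> insert(MyApp.Repo)
  end

  def table(_store, :users) do
    # Hypothetical :transform shape: a map from column name to a
    # one-argument conversion function applied to each raw value.
    Blink.from_csv("priv/seeds/users.csv",
      transform: %{id: &String.to_integer/1, admin: &(&1 == "true")}
    )
  end
end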

Example:

defmodule MyApp.Seeder do
  use Blink

  def call do
    new()
    |> add_table(:users)
    |> add_table(:posts)
    |> insert(MyApp.Repo)
  end

  def table(_store, :users) do
    [
      %{id: 1, name: "Alice", email: "alice@example.com"},
      %{id: 2, name: "Bob", email: "bob@example.com"}
    ]
  end

  def table(store, :posts) do
    users = store.tables.users

    # Build posts referencing users by id
    for user <- users do
      %{id: user.id, title: "First post by #{user.name}", user_id: user.id}
    end
  end
end
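
To run the seeder, invoke it from a seed script, for example in priv/repo/seeds.exs:

MyApp.Seeder.call()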


10 Likes

Good library.

I’ve read the code and found a couple of fairly obvious bugs (like unescaped strings in the generated CSV) and limitations (like reading everything into memory), so I made a PR with fixes.

I also offer fairly cheap consultancy services if you want this kind of review and contribution in your private projects.

2 Likes

Great. Ty.

I was aware of the memory issue and had a fix in mind similar to the one in your PR. I’ll have a closer look when I have time.

v0.5.0 Released

Version 0.5.0 is now available. This release marks a big step toward 1.0.0: it covers all the major changes I had planned. The focus now shifts to gathering feedback, fixing bugs, and addressing any remaining breaking changes before 1.0.0, though none are currently planned.

The headline feature is stream support, which enables memory-efficient seeding of large datasets.

In the example below, both table/2 clauses return streams; returning lists still works as before.

defmodule Blog.Seeder do
  use Blink

  def call do
    new()
    |> with_table("users")
    |> with_table("posts")
    |> run(Blog.Repo, timeout: :infinity)
  end

  def table(_seeder, "users") do
    Stream.map(1..200_000, fn i ->
      %{
        id: i,
        name: "User #{i}",
        email: "user#{i}@example.com",
        # ... more fields elided
        inserted_at: ~U[2024-01-01 00:00:00Z],
        updated_at: ~U[2024-01-01 00:00:00Z]
      }
    end)
  end
  
  def table(seeder, "posts") do
    users_stream = seeder.tables["users"]

    Stream.flat_map(users_stream, fn user ->
      for i <- 1..20 do
        %{
          id: (user.id - 1) * 20 + i,
          title: "Post #{i} by #{user.name}",
          body: "This is the content of post #{i}",
          user_id: user.id,
          # ... more fields elided
          inserted_at: ~U[2024-01-01 00:00:00Z],
          updated_at: ~U[2024-01-01 00:00:00Z]
        }
      end
    end)
  end
end

Other highlights

  • JSONB support — nested maps are automatically JSON-encoded during insertion (see the sketch after this list)
  • Configurable timeout — :timeout option for long-running transactions
  • Configurable batch size — :batch_size option controls stream chunking (default: 10,000 rows)
  • Performance improvement — CSV encoding executes significantly faster
  • Bug fix — CSV escaping now correctly handles pipes, quotes, newlines, and backslashes
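
A minimal sketch combining the first three highlights; the accounts table and its jsonb settings column are made up for illustration, and the option values are only examples:

defmodule Blog.AccountSeeder do
  use Blink

  def call do
    new()
    |> with_table("accounts")
    # :timeout and :batch_size as listed above; the values are examples.
    |> run(Blog.Repo, timeout: :infinity, batch_size: 5_000)
  end

  def table(_seeder, "accounts") do
    [
      # The nested :settings map is JSON-encoded automatically on insert,
      # assuming accounts.settings is a jsonb column.
      %{id: 1, name: "Acme", settings: %{theme: "dark", flags: %{beta: true}}}
    ]
  end
end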

Breaking changes

  • Blink.Store → Blink.Seeder
  • insert/3 → run/3
  • add_table/2 → with_table/2
  • add_context/2 → with_context/2
  • Return values simplified to :ok (raises on failure)
  • Adapter call/4 callback now receives table_name as a string
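
A before/after sketch of the renames, written as pipeline fragments inside a module that uses Blink, with string table names as in the example above:

# 0.4.x
new() |> add_table(:users) |> insert(MyApp.Repo)

# 0.5.0: returns :ok on success, raises on failure
new() |> with_table("users") |> run(MyApp.Repo)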

Full changelog: v0.5.0 release

1 Like