Blink is a library for fast bulk data insertion into PostgreSQL databases using the COPY command. It provides a clean, declarative syntax for defining seeders.
Features:
Uses PostgreSQL’s COPY for fast bulk inserts
Tables inserted in declaration order to respect foreign key constraints
Access data from previously defined tables when building subsequent tables
Store auxiliary context data that won’t be inserted into the database
Load data from CSV/JSON files with Blink.from_csv/2 and Blink.from_json/2 (sketched after the example below)
:transform option for type conversion when loading from files
Integrates nicely with ExMachina
Rollback on errors
Adapter pattern for supporting other databases
Example:
defmodule MyApp.Seeder do
  use Blink

  def call do
    new()
    |> add_table(:users)
    |> add_table(:posts)
    |> insert(MyApp.Repo)
  end

  def table(_store, :users) do
    [
      %{id: 1, name: "Alice", email: "alice@example.com"},
      %{id: 2, name: "Bob", email: "bob@example.com"}
    ]
  end

  def table(store, :posts) do
    users = store.tables.users
    # Build posts referencing users...
  end
end
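The file loaders plug into the same table/2 callbacks. As an illustrative sketch of Blink.from_csv/2 with :transform (the file path, the string-keyed rows, and the exact shape of the options are assumed for the example, not taken from the docs):

def table(_store, :users) do
  # Assumed: from_csv/2 takes a file path plus options, rows arrive as maps
  # with string keys, and :transform is applied to each row for type conversion.
  Blink.from_csv("priv/seeds/users.csv",
    transform: fn row -> Map.update!(row, "id", &String.to_integer/1) end
  )
end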
I’ve read the code and found a couple of fairly obvious bugs (like non-escaped strings in the generated CSV) and limitations (like reading everything into memory), so I made a PR with fixes.
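For reference, the escaping fix amounts to standard CSV quoting, roughly like this (a sketch, not the actual code from the PR):

defp escape_csv_field(value) do
  string = to_string(value)

  # Quote any field containing the delimiter, a quote, or a newline,
  # and double embedded quotes (standard CSV escaping rules).
  if String.contains?(string, [",", "\"", "\n", "\r"]) do
    "\"" <> String.replace(string, "\"", "\"\"") <> "\""
  else
    string
  end
end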
I also offer fairly cheap consultancy services if you want this kind of review and contribution in your private projects.
Version 0.5.0 is now available. This release marks a big step toward 1.0.0 — it covers all the major changes I had planned. Now the focus shifts to gathering feedback, fixing bugs, and addressing any remaining breaking changes before 1.0.0 (though I don’t have any in mind).
The headline feature is stream support, which enables memory-efficient seeding of large datasets.
Both table/2 clauses return streams in the example below, but returning lists still works as before.
defmodule Blog.Seeder do
  use Blink

  def call do
    new()
    |> with_table("users")
    |> with_table("posts")
    |> run(Blog.Repo, timeout: :infinity)
  end

  def table(_seeder, "users") do
    Stream.map(1..200_000, fn i ->
      %{
        id: i,
        name: "User #{i}",
        email: "user#{i}@example.com",
        # ...
        inserted_at: ~U[2024-01-01 00:00:00Z],
        updated_at: ~U[2024-01-01 00:00:00Z]
      }
    end)
  end

  def table(seeder, "posts") do
    users_stream = seeder.tables["users"]

    Stream.flat_map(users_stream, fn user ->
      for i <- 1..20 do
        %{
          id: (user.id - 1) * 20 + i,
          title: "Post #{i} by #{user.name}",
          body: "This is the content of post #{i}",
          user_id: user.id,
          # ...
          inserted_at: ~U[2024-01-01 00:00:00Z],
          updated_at: ~U[2024-01-01 00:00:00Z]
        }
      end
    end)
  end
end
Other highlights
JSONB support — nested maps are automatically JSON-encoded during insertion
Configurable timeout — :timeout option for long-running transactions
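For the JSONB point, a row map can simply nest another map and it will be JSON-encoded before the COPY. A minimal sketch (the settings column name is just an example):

def table(_seeder, "users") do
  [
    %{
      id: 1,
      name: "Alice",
      # assuming a jsonb `settings` column; the nested map is encoded automatically
      settings: %{theme: "dark", notifications: %{email: true}}
    }
  ]
end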
You missed a couple of other important things from my PR:
Doing
try do
  adapter.call(...)
rescue
  UndefinedFunctionError ->
    raise "Module #{inspect adapter} must implement call/4"
end
is a strange approach. Removing the try completely would result in a more readable and meaningful exception.
Plus, it is a buggy approach. Take, for example, a situation where the call function itself calls an undefined function: this try clause would hide that error, making debugging a nightmare.
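To make that failure mode concrete, here is a sketch (module and argument names made up) of an adapter that does implement call/4 but whose body hits an undefined function; the rescue above would report the wrong problem:

defmodule BrokenAdapter do
  # call/4 exists, so "must implement call/4" is not the real issue here
  def call(_data, _table, _repo, _opts) do
    # this raises UndefinedFunctionError for an entirely different reason,
    # yet the rescue clause turns it into the misleading "must implement call/4" message
    SomeMissingModule.helper()
  end
end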
Your new approach opens and parses the CSV file twice in stream mode: once to get the headers and a second time to stream the data. That is not a problem when there is one huge file, but it is when there are a lot of small files, since opening a file is more expensive than reading from one that is already open.
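A single-pass alternative is to split the header off inside the stream itself, e.g. with Stream.transform/3 (a sketch; parse_header/1 and build_row/2 are hypothetical helpers):

path
|> File.stream!()
|> Stream.transform(nil, fn
  # first line: parse it into headers, stash them in the accumulator, emit nothing
  line, nil -> {[], parse_header(line)}
  # every other line: build a row map from the headers captured above
  line, headers -> {[build_row(headers, line)], headers}
end)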