How to use Postgres cursors and multi-insert in Ruby with Sequel

This post is part of our PostgreSQL series.
SQL cursors are a great way to stream data out of a database without loading it all into memory.
Reading using a cursor
Sequel has use_cursor:
BATCH_SIZE = 1000
DB[:widgets].
  select(:id, :name, :power).
  use_cursor(rows_per_fetch: BATCH_SIZE).each do |row|
    row[:id] # whatever
  end
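Under the hood, a Postgres cursor is just DECLARE, FETCH, and CLOSE inside a transaction; use_cursor takes care of that dance for you. A rough hand-rolled equivalent looks something like this (the cursor name and loop structure are illustrative, not what Sequel actually generates):

# a rough hand-rolled equivalent of what use_cursor handles for us
DB.transaction do
  DB.run "DECLARE widgets_cursor NO SCROLL CURSOR FOR SELECT id, name, power FROM widgets"
  loop do
    rows = DB.fetch("FETCH FORWARD #{BATCH_SIZE} FROM widgets_cursor").all
    break if rows.empty?
    rows.each { |row| row[:id] } # whatever
  end
  DB.run "CLOSE widgets_cursor"
end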
Writing using cursors and multi-insert
Here's a real-world example: we had to copy rows from widgets to widgets_copy.
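For concreteness, assume both tables look roughly like this (the schema here is hypothetical, made up for this post):

# hypothetical schema, just to make the copy example concrete
DB.run <<-SQL
  CREATE TABLE widgets (
    id uuid PRIMARY KEY,
    name text,
    power integer,
    created_at timestamptz,
    updated_at timestamptz
  )
SQL
DB.run "CREATE TABLE widgets_copy (LIKE widgets)"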
We thought we were being really clever, keeping all the data inside Postgres:
# NOTE: this is not the recommended approach (see below for why)
batches = (desired_count.to_f / BATCH_SIZE).ceil
batches.times do |i|
  offset = i * BATCH_SIZE
  DB.run <<-SQL
    INSERT INTO widgets_copy (
      id,
      name,
      power,
      created_at,
      updated_at
    ) (
      SELECT
        uuid_generate_v4(),
        name,
        power,
        now(),
        now()
      FROM widgets
      ORDER BY id
      LIMIT #{BATCH_SIZE}
      OFFSET #{offset}
    )
  SQL
end
Even with 244GB of RAM, this sometimes stopped dead 75% of the way through, most likely because Postgres has to produce and throw away every row before each batch's OFFSET, so each batch costs more than the one before it. We rewrote it, pulling the data into Ruby this time, and it's both faster and doesn't stall partway through. We're using Sequel's multi_insert:
batch = []
now = Sequel.function :now
uuid_generate_v4 = Sequel.function :uuid_generate_v4

DB[:widgets].
  select(:name, :power).
  order(:id).
  use_cursor(rows_per_fetch: BATCH_SIZE).each do |row|
    batch << row.merge!(
      id: uuid_generate_v4,
      updated_at: now,
      created_at: now
    )
    if batch.length == BATCH_SIZE
      DB[:widgets_copy].multi_insert batch
      batch.clear
    end
  end

# don't forget whatever is left over in the final partial batch
DB[:widgets_copy].multi_insert batch unless batch.empty?
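For reference, multi_insert takes an array of hashes and, on Postgres, sends them as a single multi-row INSERT rather than a round trip per row, which is where most of the write speedup comes from. A standalone sketch with made-up values:

# standalone illustration; the widget values are made up
rows = [
  { id: Sequel.function(:uuid_generate_v4), name: 'sprocket', power: 9,
    created_at: Sequel.function(:now), updated_at: Sequel.function(:now) },
  { id: Sequel.function(:uuid_generate_v4), name: 'cog', power: 3,
    created_at: Sequel.function(:now), updated_at: Sequel.function(:now) }
]
DB[:widgets_copy].multi_insert rows
# roughly: INSERT INTO widgets_copy (id, name, power, created_at, updated_at)
#          VALUES (uuid_generate_v4(), 'sprocket', 9, now(), now()),
#                 (uuid_generate_v4(), 'cog', 3, now(), now())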
Thanks to @dkastner and @devTristan who promoted these ideas internally!