Introduction
Let's say that I have a model declaration of a Book with attr_accessor named :purchased.
class Book < ActiveRecord::Base
# contains fields like :id, :title, :description, :archived_at
...
attr_accessor :purchased
def purchased
@purchased || false
end
scope :unarchived, -> { where(archived_at: nil) }
...
end
Problem
I have an arbitrary method
def mark_purchased(purchased_ids)
# 1. Returns ActiveRecord::Relation
unarchived_books = Book.unarchived
# 2. Mark only purchased books
purchased_books = unarchived_books.where(id: purchased_ids)
purchased_books.update_attr_accessor(purchased: true)
# 3. Return unarchived books with update :purchased attr_accessor
unarchived_books
end
This method takes an ActiveRecord::Relation and updates attr_accessor on a subset of this relation, and then should return the whole relation with updated attr_accessor
Question
How to accomplish this with the lowest amount of code and as efficient as possible?
Let's assume that we have 200 000 books and the user purchased only 50 of them and let's avoid the pagination solution for now. I know that I'm going to call :purchased 200 000 times when displaying them, but I want to update only a subset of unarchived_books (50 records to be specific) instead of looping through all books and checking if they are in purchased_ids.
The best solution that I thought of, was to
- Get all unarchived books
- Get a subset and update them one by one using
.eachloop (because there is nothing like:update_attr_accessoron wholeActiveRecord::Relation - Merge these relations together, overriding the first relation with updated one.
But there's one problem with number 3 of my solution - the :merge method actually ignores the attr_accessors, so I have no idea where to go from there.
Do you have any tips on how to improve my solution?
Loading and updating all the records one by one is guaranteed to run out of memory with a non trivial amount of data. The fastest way is to use
update_allwhich updates all the records in a single query without instantiating them:If you really have to update them one by one for example to trigger callbacks make sure you use batches:
This will be slower by an order of magnitude though as each UPDATE query has be sent to the DB.