Your script should have the following features:
- Print output and show progress in stdout, so the run is easy to monitor.
- Provide a `limit` argument, so you can test on a small subset.
- Provide an `offset` argument, so you can skip already processed rows.
Keep in mind while writing:
- Ensure idempotency, so multiple runs do not process the same data multiple times.
- Keep the `task` block small. Use methods for business logic; the `task` should just provide simple iteration.
- ActiveRecord calls add up. Be mindful of where bottlenecks may occur.
- Consider dropping down to raw SQL for mass insertions/updates (see the sketch after this list).
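For that last point, `update_all` and `insert_all` are the usual middle ground before hand-written SQL: each issues a single statement instead of instantiating and saving records one by one. A minimal sketch, assuming the `Album` model with a `processed` flag from the example below:

```ruby
# One UPDATE for the whole set. Skips callbacks and validations,
# so only reach for this when bypassing them is safe.
Album.where(processed: false).update_all(processed: true)

# One INSERT for many rows (Rails 6+). The attribute hashes here
# are made-up placeholder data.
Album.insert_all([
  { title: "Example Album A", processed: false },
  { title: "Example Album B", processed: false }
])
```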
Before you run:
- Have a plan for ensuring the results are as expected (see the count-check sketch after this list).
- Ensure it runs on a local dataset.
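One simple plan is a count check: record how many rows should change before the run, then confirm the numbers afterwards. A sketch for a Rails console, assuming the same `Album` model as the example below:

```ruby
# Before the run: how many rows is the backfill expected to touch?
before = Album.where(processed: false).count

# ... run the rake task ...

# After the run: the remaining count should have dropped by roughly
# the number of rows the task reported as processed.
after = Album.where(processed: false).count
puts "backfilled #{before - after} of #{before} albums"
```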
While it's running & after it's completed:
- Keep an eye on the process for unexpected errors.
- Spot check the output occasionally to confirm everything is working (see the console sketch after this list).
- Ask yourself: will this task ever be used again? If the answer is no, why not delete it?
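A spot check can be as light as pulling the most recently touched rows in a console and eyeballing them. A sketch, again assuming the `Album` model from the example:

```ruby
# Inspect the five most recently updated albums to confirm the
# backfilled state looks right.
Album.where(processed: true)
     .order(updated_at: :desc)
     .limit(5)
     .each { |album| puts "#{album.id}: processed=#{album.processed}" }
```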
**Example backfill task**
```ruby
namespace :backfill do
  # The :environment prerequisite loads the Rails app so ActiveRecord
  # models are available inside the task.
  task :albums, [:limit, :offset] => :environment do |_task, args|
    limit = args.limit.to_i
    offset = args.offset.to_i

    data = get_data_for_backfill(limit, offset)
    total = data.count

    data.each_with_index do |album, index|
      puts "PROCESSING #{index + 1}/#{total}, ALBUM ID #{album.id}"
      backfill_album(album)
    end
  end

  def backfill_album(album)
    # The processed? guard keeps the task idempotent: re-running it
    # skips rows that an earlier run already handled.
    if album.processed?
      puts "ALREADY BEEN PROCESSED #{album.id}"
    else
      was_successful = album.process!
      if was_successful
        puts "SUCCESSFULLY PROCESSED #{album.id}"
      else
        puts "FAILED TO PROCESS #{album.id}"
      end
    end
  end

  def get_data_for_backfill(limit, offset)
    Album
      .where(processed: false)
      .limit(limit)
      .offset(offset)
  end
end
```
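To use the `limit` and `offset` arguments, invoke the task as `bundle exec rake "backfill:albums[10,0]"` for a small test run, then something like `bundle exec rake "backfill:albums[1000,10]"` to pick up past rows you have already covered (the quotes stop some shells, such as zsh, from interpreting the brackets).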