@cesartalves
Last active September 13, 2022 17:21
Solidus Importer Redesign Notes / Sketch
# SolidusImporter::Row now has an `entity_id` field.
# It helps aggregate rows that belong to the same "entity" (an Order, for now) so we can process them together.
# With this, there's no longer any need to maintain a huge context inside the `ProcessRow` job.
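# Concretely, the new field is just a column on the rows table. A minimal migration
# sketch of that (table and column names are assumptions based on the gem's conventions):
class AddEntityIdToSolidusImporterRows < ActiveRecord::Migration[6.1]
  def change
    add_column :solidus_importer_rows, :entity_id, :string
    # Grouping queries filter by import and entity together, so index both.
    add_index :solidus_importer_rows, [:import_id, :entity_id]
  end
end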
# So how might we?
# lib/solidus_importer/process_row.rb
# 1. Associate the Row with an `Identifier` we can use to group related rows together
def prepare_rows(data)
  messages = check_data(data)

  if messages.empty?
    data.each do |row|
      # Since the Import contains metadata about the import type, we always
      # know what kind of entity the row represents (an Order, for example).
      @import.rows << ::SolidusImporter::Row.new(data: row.to_h, entity_id: row['Identifier'])
    end
    { success: true }
  else
    @import.update(state: :failed, messages: messages.join(', '))
    { success: false, messages: messages.join(', ') }
  end
end
# 2. Instead of processing inline, move to a background job
def process_rows(initial_context)
  groups_of_rows = @import.rows.created_or_failed.order(id: :asc).group_by(&:entity_id)

  groups_of_rows.each do |_entity_id, rows|
    row_ids = rows.map(&:id)
    # Since the Rows contain all the info we need, no `initial_context` is required.
    # This allows full background processing of the related Rows!
    ::SolidusImporter::ProcessRowGroupJob.perform_later(@import.id, row_ids)
  end
end
# 3. A new Job
class ProcessRowGroupJob < ApplicationJob
  # Wraps ProcessRow so that it can apply to a group of related rows.
  def perform(import_id, row_ids)
    import = SolidusImporter::Import.find(import_id)
    rows = SolidusImporter::Row.where(id: row_ids).order(id: :asc).to_a
    @importer = import.importer

    context = {}
    rows.each do |row|
      # Thread the context through the group so each row can build on the
      # results of the rows processed before it.
      context = SolidusImporter::ProcessRow.new(@importer, row).process(context)
    end

    # Key point! `after_import` now happens after a group of related rows is
    # processed (no longer after everything has been processed).
    @importer.after_import(context)
  end
end
@aamyot
aamyot commented Sep 13, 2022

@import.rows << ::SolidusImporter::Row.new(data: row.to_h, entity_id: row['Identifier'])
Does that mean we need to add a new Identifier column to each CSV?

This is GREAT! It's something we've been talking about for a long time, and I think it will really help with performance and reliability. What would also be nice is making sure we can re-import the same file without creating duplicates.

One question: how are we planning to update the Import#state attribute? There are different ways. I was just wondering if you had thought about it.

@cesartalves
Author

Does that mean we need to add a new Identifier column to each CSV?

I gave this some thought over lunch, and we can infer the `entity_id` from the row data itself: something like `data['Number'] || data['Slug'] || data['Email']`, etc., in a `before_create` hook, as sketched below.
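A minimal sketch of that hook, assuming it lives on the Row model, `data` is the stored row hash, and the keys are the natural identifiers each CSV type already carries (Orders have a Number, Products a Slug, Customers an Email):

module SolidusImporter
  class Row < ApplicationRecord
    before_create :infer_entity_id

    private

    # Fall back through the natural key of each import type.
    def infer_entity_id
      self.entity_id ||= data['Number'] || data['Slug'] || data['Email']
    end
  end
end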

One question: how are we planning to update the Import#state attribute? There are different ways. I was just wondering if you had thought about it.

I think that's another outstanding question. We could simply update the import to failed whenever there's an error. That wouldn't change much from the way it currently is, but then I wonder how to indicate that we need to reprocess, since failures would no longer occur inside the job that processes the Import itself. Which... is what is happening right now anyway during the Order import 😮‍💨
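A rough sketch of one option: re-derive the Import state from its rows after each group finishes. The `failed`/`created` row scopes and the `completed` state here are assumptions, not existing API:

# Hypothetical helper, called at the end of ProcessRowGroupJob#perform.
def refresh_import_state(import)
  if import.rows.failed.any?
    import.update(state: :failed)
  elsif import.rows.created.none? # nothing left to process
    import.update(state: :completed)
  end
end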

I'm still wondering whether this design will work or whether it will break too much stuff. Perhaps we even want to take the RowGroup concept a step further and expose it in the UI, with an option to Retry. I'm gonna work on some minor errors in the extension for now, until I feel comfortable wading deeper into the water 😛
