```ruby
require 'fileutils'

class CustomMailExporter
  attr_accessor :service, :target_users, :target_start_date, :target_end_date,
                :path, :errors, :filename_counter, :emails_found

  # Dates should be in Date or DateTime format. Users should be an array.
  def initialize(service, target_users, target_start_date, target_end_date)
    @service = service
    @target_users = target_users.map(&:downcase)
    @target_start_date = target_start_date
    @target_end_date = target_end_date
    @path = "/mnt/#{service.id}/"
    @errors = []
    @filename_counter = 0
    @emails_found = 0
  end

  # Exports every message from a target user within the date range.
  # Returns true if no errors were recorded, false otherwise.
  def process
    FileUtils.mkdir_p(path) unless File.directory?(path)
    service.metadatum_class.find_each(service.id) do |datum|
      if target_user?(datum.from) && date_in_range?(datum.date)
        write_contents_to_file(datum)
        @emails_found += 1
      end
    end
    errors.empty?
  end

  # Turns "Some User <user@example.com>" into "user@example.com".
  def convert_to_email_address(from)
    from.split("<").last.split(">").first
  end

  def target_user?(from)
    target_users.include?(convert_to_email_address(from).downcase)
  end

  def date_in_range?(date)
    date > target_start_date && date < target_end_date
  end

  def write_contents_to_file(datum)
    path_and_name = path
    # present? comes from ActiveSupport (Rails).
    if datum.respond_to?(:content_filename) && datum.content_filename.present?
      path_and_name += datum.content_filename.gsub(/[.<>:"\/\\|\?\*']/, "")
      #use gsub for problem characters in subjects
    else
      path_and_name += filename_counter.to_s
      # Increment the instance variable; a bare `filename_counter += 1` would
      # create a nil local variable and raise NoMethodError.
      @filename_counter += 1
    end
    path_and_name += ".eml" unless path_and_name.include? ".eml"
    # Write the content chunk by chunk via the block form of datum.content.
    File.open(path_and_name, 'wb') do |f|
      datum.content { |chunk| f << chunk }
    end
  rescue => e
    # Record the failure and keep going with the remaining messages.
    errors << [datum.key, e]
  end
end
```
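One line in `write_contents_to_file` deserves a closer look: `filename_counter += 1`. Inside a method, Ruby parses that as an assignment to a new local variable, so it becomes `filename_counter = filename_counter + 1` with the local still `nil`, which raises `NoMethodError`. Because the method rescues everything into `errors`, any datum without a `content_filename` would never be written. A minimal reproduction, using a hypothetical `Counter` class; assigning through `self.` (or the instance variable directly) avoids the problem:

```ruby
# Minimal reproduction with a hypothetical Counter class.
class Counter
  attr_accessor :count

  def initialize
    @count = 0
  end

  # Broken: Ruby sees an assignment to `count`, creates a local variable that
  # starts as nil, and evaluates `count = count + 1`, raising NoMethodError.
  def broken_increment
    count += 1
  end

  # Fixed: `self.count += 1` reads and writes through the accessor methods.
  def fixed_increment
    self.count += 1
  end
end

counter = Counter.new
begin
  counter.broken_increment
rescue NoMethodError => e
  puts "broken_increment raised #{e.class}"
end
counter.fixed_increment
puts counter.count
```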
Overall looks pretty good. Just a few comments. Nice work.
What is the relationship between service and target_users?
Rather than trying to extract the email address in `convert_to_email_address`, which could prove difficult if some `from` fields are formatted differently than you expect, you could try creating regular expressions from each of the `target_users` and using them to do the comparison in the `target_user?` method. The result would look something like this:
```ruby
def initialize(service, target_users, target_start_date, target_end_date)
  # Other initialization code
  @target_users = target_users.map do |email_address|
    # Regexp.escape will properly escape any special characters in the email address, such as '.'
    # The second boolean argument tells Regexp that the expression should be case insensitive.
    Regexp.new(Regexp.escape(email_address), true)
  end
  # Other initialization code
end

def target_user?(from)
  # The === method is defined on the Regexp class and returns true if the other String, in this case 'from',
  # matches the regular expression.
  target_users.any? { |regexp| regexp === from }
end
```
For more info on regular expressions in Ruby, check out http://ruby-doc.org//core-2.1.1/Regexp.html
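A self-contained version of the suggestion, runnable outside the exporter, is sketched below. The addresses are hypothetical, and `Regexp::IGNORECASE` is passed instead of `true`; both ask for a case-insensitive match. One caveat: an unanchored pattern matches anywhere in the string, so `alice@example.com` also matches `malice@example.com`. The `BOUNDED_PATTERNS` variant is a hypothetical refinement, not part of the suggestion above: it requires the address to be preceded by the start of the string, whitespace, or `<`, and followed by the end of the string, whitespace, or `>`.

```ruby
# Hypothetical target addresses for illustration.
TARGET_ADDRESSES = ["alice@example.com", "bob.smith@example.com"].freeze

# The suggested approach: one escaped, case-insensitive pattern per address.
LOOSE_PATTERNS = TARGET_ADDRESSES.map do |address|
  Regexp.new(Regexp.escape(address), Regexp::IGNORECASE)
end

# Hypothetical refinement: the address must sit at a boundary
# (start/whitespace/"<" before it, end/whitespace/">" after it).
BOUNDED_PATTERNS = TARGET_ADDRESSES.map do |address|
  Regexp.new("(?:\\A|[\\s<])#{Regexp.escape(address)}(?:\\z|[\\s>])", Regexp::IGNORECASE)
end

# Regexp#=== returns true when the pattern matches somewhere in `from`.
def target_user?(from, patterns)
  patterns.any? { |regexp| regexp === from }
end

puts target_user?("Alice <ALICE@example.com>", LOOSE_PATTERNS)   # true
puts target_user?("Bob <bobxsmith@example.com>", LOOSE_PATTERNS) # false: '.' is escaped
puts target_user?("malice@example.com", LOOSE_PATTERNS)          # true: substring match
puts target_user?("malice@example.com", BOUNDED_PATTERNS)        # false
```

Whether the stricter variant is worth it depends on how closely other senders' addresses resemble the target addresses.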
With regard to the `date_in_range?` method, you should make sure that you're dealing with consistent objects. The comment above your `initialize` method suggests that `target_start_date` and `target_end_date` should be `Date` or `DateTime` objects, but the `GoogleMailDatum#date` method returns a `Time` object. The differences between these types could lead to unexpected behavior when you compare them, as you do in `date_in_range?`. This script would be much more reliable and easier to work with if you picked one of these types and used it throughout. I would suggest `Time` over `Date` or `DateTime`, since `Time` is the one used by the Datum class.
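To make this concrete: in plain Ruby, comparing a `Time` with a `Date` raises `ArgumentError` instead of returning true or false. ActiveSupport, which a Rails app loads, patches these comparisons so they succeed, but the result then depends on how the `Date` gets turned into a point in time. A minimal sketch of the suggested fix converts both bounds to `Time` once, up front; `TimeWindow` is a hypothetical class standing in for the relevant parts of `initialize` and `date_in_range?`.

```ruby
require 'date'

# Hypothetical stand-in for the exporter's date handling: both bounds are
# converted to Time once, so every comparison is Time vs Time.
class TimeWindow
  attr_reader :start_time, :end_time

  def initialize(start_date, end_date)
    # Date#to_time yields midnight local time; DateTime#to_time and
    # Time#to_time keep the same instant.
    @start_time = start_date.to_time
    @end_time = end_date.to_time
  end

  # Same exclusive comparison as date_in_range?, on like-for-like objects.
  def cover?(time)
    time > start_time && time < end_time
  end
end

window = TimeWindow.new(Date.new(2014, 1, 1), DateTime.new(2014, 2, 1))
puts window.cover?(Time.new(2014, 1, 15, 12, 0, 0)) # true

# Mixing the types directly fails in plain Ruby (no ActiveSupport loaded).
begin
  Time.new(2014, 1, 15) > Date.new(2014, 1, 1)
rescue ArgumentError => e
  puts e.message
end
```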
Also regarding `date_in_range?`, the comparison you're making, `date > target_start_date && date < target_end_date`, treats `target_start_date` and `target_end_date` as exclusive bounds. Is this desirable?
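If the answer is no, Ruby ranges make the boundary choice explicit: `..` includes both ends, while `...` includes the start and excludes the end. A short sketch, assuming all three values are `Time` objects; the method names are hypothetical.

```ruby
# Hypothetical helpers contrasting boundary rules; all arguments are Time objects.

# Current behavior of date_in_range?: both bounds excluded.
def in_range_exclusive?(time, start_time, end_time)
  time > start_time && time < end_time
end

# Both bounds included.
def in_range_inclusive?(time, start_time, end_time)
  (start_time..end_time).cover?(time)
end

# Start included, end excluded.
def in_range_half_open?(time, start_time, end_time)
  (start_time...end_time).cover?(time)
end

start_time = Time.new(2014, 1, 1)
end_time = Time.new(2014, 2, 1)
puts in_range_exclusive?(start_time, start_time, end_time) # false
puts in_range_inclusive?(start_time, start_time, end_time) # true
puts in_range_half_open?(end_time, start_time, end_time)   # false
```

Half-open windows have a convenient property: if exports are run for adjacent windows (January, then February), a message sent exactly at the boundary is exported once, not twice or zero times.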
Minor FYI: `S3Datum` has a method for writing content to a file; see `Concerns::S3Datum#write_content_to_file` for details.
Regarding lines 46-52, I don't think that you need to worry about Datums missing or not responding to `content_filename`. The `content_filename` method is required by the S3Datum interface for files to be stored in S3. Without it, we wouldn't be able to store or fetch content for them anyway. Also, with regard to

```ruby
path_and_name += datum.content_filename.gsub(/[.<>:"\/\\|\?\*']/, "")
#use gsub for problem characters in subjects
```

it seems like the `GoogleMailDatum` and `GoogleMailRestModels::CannonicalDatum` classes both use the `message_id` to generate the filename, not the subject, so the `gsub` call may not be completely necessary, though it's possible that I'm missing something.
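For reference, here is what that character class does to a hypothetical `content_filename` shaped like a Message-ID. Because `.` is in the class, dots are stripped too, which also means a name that already ended in `.eml` loses its dot and gets a second extension appended. `sanitized_filename` is a hypothetical helper that mirrors the filename logic in `write_contents_to_file`, minus the directory prefix.

```ruby
# The character class from write_contents_to_file.
PROBLEM_CHARACTERS = /[.<>:"\/\\|\?\*']/

# Hypothetical helper mirroring the filename logic, without the directory prefix.
def sanitized_filename(content_filename)
  name = content_filename.gsub(PROBLEM_CHARACTERS, "")
  name += ".eml" unless name.include?(".eml")
  name
end

puts sanitized_filename("<CAB.123@mail.gmail.com>") # CAB123@mailgmailcom.eml
puts sanitized_filename("message.eml")              # messageeml.eml
```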
Overall, I think it's good. Nice job!
I would not do the check for `datum.content.nil?` on line 50. It would load all of the content into memory, which we don't want. Without the check we'll wind up with some empty files, but we can clean these up on the command line after the export finishes.
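A sketch of that cleanup step, written in Ruby to match the rest of the script; the `remove_empty_exports` name is hypothetical. From a shell, `find` with `-name '*.eml' -empty -delete` would do the same job. The sketch only looks at `*.eml` files directly inside the given directory, which matches how `write_contents_to_file` lays files out under `path`.

```ruby
# Hypothetical post-export cleanup: deletes zero-byte .eml files directly
# inside `dir` and returns the paths it removed.
def remove_empty_exports(dir)
  empty_files = Dir.glob(File.join(dir, "*.eml")).select { |file| File.zero?(file) }
  empty_files.each { |file| File.delete(file) }
  empty_files
end

# Usage, assuming `exporter` is a CustomMailExporter that has finished processing:
#   remove_empty_exports(exporter.path)
```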