Lawson,
re: http://viget.com/extend/make-remote-files-local-with-ruby-tempfile
I have created my own This American Life archive (http://thisamericanlife.co), and I am using your AWESOME LocalResource class to import the remote jpg and mp3 associated with each episode into an S3 bucket. It has been awesome and very functional for months, but it just stopped working the other day. I think part of the problem was that they changed the download link. I got that sorted, but then I kept getting errors and went down a big rabbit hole, and now here I am creating this gist in hopes that some of this makes sense and you can help steer me in the right direction. So, here's my breakdown...
To start off, I am requiring both open-uri and httparty in my application.rb. Also, I pulled LocalResource into its own class (lib/local_resource.rb):
class LocalResource
  attr_reader :uri

  def initialize(uri)
    @uri = uri
  end

  def file
    @file ||= Tempfile.new(tmp_filename, tmp_folder, encoding: encoding).tap do |f|
      io.rewind
      f.write(io.read)
      f.close
    end
  end

  def io
    @io ||= uri.open
  end

  def encoding
    io.rewind
    io.read.encoding
  end

  def tmp_filename
    [
      Pathname.new(uri.path).basename,
      Pathname.new(uri.path).extname
    ]
  end

  def tmp_folder
    Rails.root.join('tmp')
  end
end
I am still using your "local resources from url" method:
def local_resource_from_url(url)
  LocalResource.new(URI.parse(url))
end
And here is my bloated import method:
def import
  if new_episode?
    episode = this_week
    doc = Nokogiri::HTML(open("http://www.thisamericanlife.org/radio-archives/episode/#{episode}")).css("div#content")
    number = doc.css("h1.node-title").text.split(":").first.to_i
    title = doc.css("h1.node-title").text.split(":").last.strip
    description = doc.css("div.description").text.strip
    date = Date.parse(doc.css("div.date").text).strftime("%F")
    image = doc.css("div.image img").attribute('src')
    podcast = "http://www.podtrac.com/pts/redirect.mp3/podcast.thisamericanlife.org/podcast/#{episode}.mp3"
    begin
      local_podcast = local_resource_from_url(podcast)
      local_copy_of_podcast = local_podcast.file
      local_image = local_resource_from_url(image)
      local_copy_of_image = local_image.file
      s3 = AWS::S3.new
      bucket = s3.buckets["#{ENV['S3_BUCKET_NAME']}"]
      if !bucket.objects["podcasts/#{episode}.mp3"].exists?
        bucket.objects["podcasts/#{episode}.mp3"].write(:file => local_copy_of_podcast.path, :acl => :public_read)
      end
      if !bucket.objects["images/#{episode}.jpg"].exists?
        bucket.objects["images/#{episode}.jpg"].write(:file => local_copy_of_image.path, :acl => :public_read)
      end
    ensure
      local_copy_of_podcast.close
      local_copy_of_podcast.unlink
      local_copy_of_image.close
      local_copy_of_image.unlink
    end
    Podcast.create!(number: number, title: title, description: description, date: date)
    redirect_to root_path, notice: "New Episode Imported! :)"
  else
    redirect_to root_path, notice: "No New Episodes. :("
  end
end
All of this worked for quite some time, but then I got this error:
undefined method `close' for nil:NilClass
    ensure
      local_copy_of_podcast.close
      local_copy_of_podcast.unlink
      local_copy_of_image.close
      local_copy_of_image.unlink
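A NoMethodError like this is usually a symptom rather than the cause. If anything raises inside `begin` before `local_copy_of_podcast` is assigned, the `ensure` block still runs, calls `close` on `nil`, and that second error hides the first one. Here is a minimal sketch of a cleanup that tolerates tempfiles that were never created. The `with_tempfiles` helper is hypothetical and not part of LocalResource:

```ruby
require 'tempfile'

# Hypothetical helper: yields a list that the block appends Tempfiles to,
# then cleans up whichever ones were actually created.
def with_tempfiles
  files = []
  yield files
ensure
  # files is still [] if the block raised before creating anything, so the
  # original exception propagates instead of a NoMethodError on nil.
  files.each(&:close!) # Tempfile#close! closes and unlinks in one call
end

# Usage: the IOError raised inside the block is the error that surfaces.
begin
  with_tempfiles do |files|
    files << Tempfile.new(['522', '.mp3'])
    raise IOError, 'download failed'
  end
rescue IOError => e
  puts e.message
end
```

With this shape, the real failure (for example, a download error) is the one that gets reported.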
I removed the "begin / ensure / end" part of the import method, ran it again, and got this:
unexpected prefix: #<Pathname:552.mp3>
  def file
    @file ||= Tempfile.new(tmp_filename, tmp_folder, encoding: encoding).tap do |f|
      io.rewind
      f.write(io.read)
      f.close
That seemed like a weird error because, as I understand things, tmp_filename is just trying to grab the actual file name and then the file type and create an array like this: ['522', '.mp3']. But when I looked into the Pathname#basename method, I found that it returns the entire thing, "522.mp3", wrapped in this weird Pathname object. Keep in mind, this HAD been working for MONTHS, but either way I decided that my next move was to forget about tmp_filename altogether, so I just hardcoded a file name:
@file ||= Tempfile.new(['522', '.mp3'], tmp_folder, encoding: encoding)
I ran import again and got this:
private method `open' called for #<URI::Generic:0x007ff28e8914e8>
  def io
    @io ||= uri.open
  end
And this is where I stop because I can't figure out what to do next. Does any of this make sense? If you've made it this far and things are still totally out there and you want to take a look at the actual source files, have at it!
https://github.com/eliduke/thisamericanlife.co
Thanks ahead of time!
Eli
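One possible explanation for the last error, offered as a guess: open-uri adds a public `open` method to `URI::HTTP`, `URI::HTTPS` and `URI::FTP`, but not to `URI::Generic`. `URI.parse` returns a `URI::Generic` when the string has no scheme, for example a protocol-relative image `src` such as `//host/path.jpg`. Calling `open` on it then falls through to the private `Kernel#open`. The sketch below shows the difference and one way to normalize such URLs. The `example.com` URLs are placeholders and `absolute_uri` is a hypothetical helper:

```ruby
require 'uri'
require 'open-uri'

# Placeholder URLs, not the real episode assets.
with_scheme    = URI.parse('http://example.com/images/522.jpg')
without_scheme = URI.parse('//example.com/images/522.jpg')

with_scheme.class    # URI::HTTP, which open-uri extends with a public #open
without_scheme.class # URI::Generic, where #open is only the private Kernel#open

# Hypothetical helper: give scheme-less URLs a default scheme before parsing.
# to_s stringifies non-String inputs (such as a Nokogiri attribute) first.
def absolute_uri(url, default_scheme = 'http')
  uri = URI.parse(url.to_s)
  uri = URI.parse("#{default_scheme}:#{url}") if uri.scheme.nil? && uri.host
  uri
end

absolute_uri('//example.com/images/522.jpg').class # URI::HTTP
```

If this is the cause, `local_resource_from_url` could call something like `absolute_uri(url)` instead of `URI.parse(url)`.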
I just had the same thing happen to me. No idea exactly when it stopped working, but it appears that Pathname#basename returns a Pathname object rather than a string and doesn't strip the extension.
I changed the tmp_filename method to the following and it seems to be working:
There are definitely nicer ways of doing this, but essentially you need to end up with an array in the following format:
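A minimal sketch of that format, which may differ from the exact change described above: `Tempfile.new` takes the name as a `[prefix, suffix]` pair of plain strings. So the extension has to be stripped from the basename, and the resulting `Pathname` converted with `to_s`. For illustration, `tmp_filename` is written here as a standalone method that takes the URI. Inside `LocalResource` it would read `uri` from the instance. The URL is a placeholder.

```ruby
require 'pathname'
require 'tempfile'
require 'uri'

# Sketch only: builds ["522", ".mp3"] from a URI whose path ends in 522.mp3.
def tmp_filename(uri)
  path = Pathname.new(uri.path)
  ext  = path.extname              # ".mp3", already a String
  [path.basename(ext).to_s, ext]   # basename(ext) drops the suffix
end

uri  = URI.parse('http://example.com/podcast/522.mp3') # placeholder URL
name = tmp_filename(uri)           # ["522", ".mp3"]

file = Tempfile.new(name)          # file name starts with "522", ends with ".mp3"
file.close!                        # close and delete the temp file
```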