@zackster
Created August 29, 2016 16:08
Download 8-K filings from the SEC
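EDGAR publishes one `form.idx` file per calendar quarter, under `edgar/full-index/<year>/QTR<n>/`. The sketch below shows how the script's year/quarter loop maps onto those files. It is a minimal illustration with some assumptions:

- It assumes the HTTPS Archives root. The gist itself used `ftp://ftp.sec.gov/`.
- The helper name `quarterly_index_urls` is hypothetical.

```ruby
# Sketch: enumerate EDGAR quarterly form-index URLs, one per year/quarter.
# ASSUMPTION: HTTPS Archives layout; the original gist used ftp://ftp.sec.gov/.
# quarterly_index_urls is a hypothetical helper, not part of any library.
EDGAR_ROOT = "https://www.sec.gov/Archives"

# Returns [local_file_name, url] pairs in chronological order,
# using the same "Y<year>Q<quarter>-form.idx" naming as the script.
def quarterly_index_urls(years, quarters = 1..4)
  years.to_a.product(quarters.to_a).map do |year, quarter|
    ["Y#{year}Q#{quarter}-form.idx",
     "#{EDGAR_ROOT}/edgar/full-index/#{year}/QTR#{quarter}/form.idx"]
  end
end

pairs = quarterly_index_urls(2010..2016)
puts pairs.size        # → 28
puts pairs.first.last  # → https://www.sec.gov/Archives/edgar/full-index/2010/QTR1/form.idx
```

`Array#product` varies the quarter fastest, so the files come out in date order: 2010 Q1 through 2016 Q4.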
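```ruby
# Download 8-K filings listed in SEC EDGAR's quarterly form indexes.
# Needs only the Ruby standard library plus wget on the PATH.
#
# This gist originally used ftp://ftp.sec.gov/. EDGAR's FTP service has
# since been retired, so the equivalent HTTPS Archives paths are used here.
# SEC asks automated clients to identify themselves in the User-Agent
# header; the value below is a placeholder to replace with your own details.
USER_AGENT = "Your Name your.email@example.com"
EDGAR_ROOT = "https://www.sec.gov/Archives"

years    = (2010..2016).to_a
quarters = (1..4).to_a

years.product(quarters).each do |year, quarter|
  puts "Year: #{year} quarter #{quarter}"
  save_name = "Y#{year}Q#{quarter}-form.idx"
  index_uri = "#{EDGAR_ROOT}/edgar/full-index/#{year}/QTR#{quarter}/form.idx"

  # Array-form system() bypasses the shell, so nothing gets shell-interpolated.
  unless system("wget", "--user-agent=#{USER_AGENT}", index_uri, "-O", save_name)
    warn "Could not download #{index_uri}; skipping #{year}-#{quarter}"
    next
  end

  # Same selection as `grep ^8-K`: keeps 8-K and variants such as 8-K/A.
  # Read as binary so stray non-UTF-8 bytes in company names cannot raise.
  form_lines = File.foreach(save_name, encoding: Encoding::BINARY)
                   .select { |line| line.start_with?("8-K") }
  puts "Found #{form_lines.size} files for #{year}-#{quarter}"
  sleep 5

  form_lines.each_with_index do |line, line_index|
    puts "#{line_index}/#{form_lines.size} - #{save_name}"
    # The last column of form.idx is the document path
    # (edgar/data/<CIK>/<accession>.txt); keep its basename locally.
    document_path  = line.split.last
    local_filename = File.basename(document_path)

    # Maybe we already downloaded it on a previous run
    # (File.size? returns nil for missing or empty files).
    if File.size?(local_filename)
      puts "Skipping; already downloaded"
    else
      # -O pins the output name, so an empty file left by a failed attempt is
      # overwritten on retry rather than wget saving a new copy as name.1.
      system("wget", "--user-agent=#{USER_AGENT}",
             "#{EDGAR_ROOT}/#{document_path}", "-O", local_filename)
    end
  end
end
```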
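The script keeps only the last whitespace-separated field of each `form.idx` row. The fields in a row are, in order: Form Type, Company Name, CIK, Date Filed, File Name. Because company names contain spaces, the remaining fields are best taken from the right.

Here is a minimal sketch of pulling every field out of an 8-K row, under these assumptions:

- The names `FormEntry`, `parse_form_line`, and `eight_k_entries` are hypothetical.
- The sample rows are made up for illustration.
- It assumes the form type has no spaces. That holds for the 8-K family, but not for forms such as "SC 13D".

```ruby
# Sketch: select 8-K rows from form.idx lines and split them into fields.
# FormEntry, parse_form_line and eight_k_entries are hypothetical names;
# the sample rows below are invented, not real EDGAR filings.
FormEntry = Struct.new(:form_type, :company, :cik, :date_filed, :path)

# ASSUMPTION: the form type is a single token (true for 8-K, 8-K/A, ...).
def parse_form_line(line)
  fields = line.split
  cik, date_filed, path = fields.pop(3)  # rightmost three columns
  form_type = fields.shift               # leftmost column
  FormEntry.new(form_type, fields.join(" "), cik, date_filed, path)
end

# Mirrors `grep ^8-K`, so amendments such as 8-K/A are kept too.
def eight_k_entries(lines)
  lines.select { |line| line.start_with?("8-K") }
       .map { |line| parse_form_line(line) }
end

sample = [
  "Form Type   Company Name      CIK         Date Filed  File Name",
  "8-K         EXAMPLE CORP      1234567     2016-07-26  edgar/data/1234567/0001234567-16-000001.txt",
  "10-Q        OTHER CO INC      7654321     2016-07-27  edgar/data/7654321/0007654321-16-000002.txt",
]

entries = eight_k_entries(sample)
entry = entries.first
puts entries.size               # → 1
puts entry.company              # → EXAMPLE CORP
puts entry.date_filed           # → 2016-07-26
puts File.basename(entry.path)  # → 0001234567-16-000001.txt
```

If only original 8-K filings are wanted, compare `entry.form_type == "8-K"` instead of using the prefix match.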