@JDStraughan
Last active December 10, 2015 12:58
Harvest job postings from all USA Craigslist boards for the given areas (default: software, web, and internet engineering)
#!/usr/bin/env ruby
require 'rubygems'
require 'feedzirra'
require 'csv'
cities = %w[abilene akroncanton albany albanyga albuquerque altoona amarillo ames anchorage annarbor annapolis appleton asheville ashtabula athensga athensohio atlanta auburn augusta austin bakersfield baltimore batonrouge battlecreek beaumont bellingham bemidji bend billings binghamton bham bismarck bloomington bn boise boone boston boulder bgky bozeman brainerd brownsville brunswick buffalo butte capecod catskills cedarrapids cnj cenla centralmich chambana charleston charlestonwv charlotte charlottesville chattanooga chautauqua chicago chico chillicothe cincinnati clarksville cleveland clovis collegestation cosprings columbiamo columbia columbus columbusga cookeville corpuschristi corvallis chambersburg dallas danville dayton daytona decatur nacogdoches delrio delaware denver desmoines detroit dothan dubuque duluth eastidaho eastoregon eastco newlondon eastnc eastky martinsburg easternshore eauclaire elpaso elko elmira erie eugene evansville fairbanks fargo farmington fayetteville fayar fingerlakes flagstaff flint shoals florencesc keys fortcollins fortdodge fortsmith fortwayne frederick fredericksburg fresno fortmyers gadsden gainesville galveston glensfalls goldcountry grandforks grandisland grandrapids greatfalls greenbay greensboro greenville gulfport norfolk hanford harrisburg harrisonburg hartford hattiesburg honolulu cfl helena hickory rockies hiltonhead holland houma houston hudsonvalley humboldt huntington huntsville imperial indianapolis inlandempire iowacity ithaca jxn jackson jacksontn jacksonville onslow janesville jerseyshore jonesboro joplin kalamazoo kalispell kansascity kenai kpr racine killeen kirksville klamath knoxville kokomo lacrosse lasalle lafayette tippecanoe lakecharles loz lakeland lancaster lansing laredo lascruces lasvegas lawrence lawton allentown lewiston lexington limaohio lincoln littlerock logan longisland losangeles louisville lubbock lynchburg macon madison maine ksu mankato mansfield masoncity mattoon mcallen meadville medford 
memphis mendocino merced meridian milwaukee minneapolis missoula mobile modesto mohave monroemi monroe montana monterey montgomery morgantown moseslake muncie muskegon myrtlebeach nashville nh newhaven neworleans blacksburg newyork lakecity nd newjersey northmiss northplatte nesd northernwi nmi wheeling nwct nwga nwks enid ocala odessa ogden okaloosa oklahomacity olympic omaha oneonta orangecounty oregoncoast orlando outerbanks owensboro palmsprings panamacity parkersburg pensacola peoria philadelphia phoenix csd pittsburgh plattsburgh poconos porthuron portland potsdam prescott provo pueblo pullman quadcities raleigh rapidcity reading redding reno providence richmond richmondin roanoke rmn rochester rockford roseburg roswell sacramento saginaw salem salina saltlakecity sanangelo sanantonio sandiego slo sanmarcos sandusky santabarbara santafe santamaria sarasota savannah scottsbluff scranton seattle sheboygan showlow shreveport sierravista siouxcity siouxfalls siskiyou skagit southbend southcoast sd miami southjersey ottumwa seks juneau semo swv carbondale smd swks marshall natchez bigbend swva swmi spacecoast spokane springfieldil springfield staugustine stcloud stgeorge stjoseph stlouis pennstate statesboro stillwater stockton susanville syracuse tallahassee tampa terrehaute texarkana texoma thumb toledo topeka treasure tricities tucson tulsa tuscaloosa tuscarawas twinfalls twintiers easttexas up utica valdosta ventura burlington victoriatx visalia waco washingtondc waterloo watertown wausau wenatchee wv quincy westky westmd westernmass westslope wichita wichitafalls williamsport wilmington winchester winstonsalem worcester wyoming yakima york youngstown yubasutter yuma zanesville]
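# Job areas to search: software (sof), web (web), and internet engineering (eng).
# %w[] takes bare words; quotes inside it would become part of the strings.
areas = %w[sof web eng]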
@output_filename_suffix = Time.now.to_i
# Append every complete entry of a parsed feed to the output CSV.
def process_feed(feed)
  # Skip anything that is not a parsed RSS feed (for example, a failed fetch).
  return unless feed.is_a? Feedzirra::Parser::RSS
  feed.entries.each do |entry|
    unless entry.title.nil? || entry.url.nil? || entry.summary.nil? || entry.published.nil?
      puts "Adding #{entry.title}..."
      CSV.open("./postings_#{@output_filename_suffix}.csv", "a") do |csv|
        csv << [
          entry.title.chomp,
          entry.url.chomp,
          entry.summary.chomp,
          entry.published
        ]
      end
    end
  end
end

# Create the output file and write the header row.
CSV.open("./postings_#{@output_filename_suffix}.csv", "w") do |csv|
  csv << %w[title url summary published]
end

# Fetch one RSS search feed per city/area pair.
cities.each do |city|
  areas.each do |area|
    process_feed Feedzirra::Feed.fetch_and_parse("http://#{city}.craigslist.org/search/#{area}?query=%20&format=rss")
  end
end
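
The nested loops above request one feed per city/area pair. As a minimal sketch of that pairing, the URLs can be built up front and inspected before any request is made. The `feed_urls` helper is hypothetical and not part of the original script; it reuses the URL pattern from the script and makes no network calls.

```ruby
# Hypothetical helper (an assumption, not in the original gist):
# build one RSS search URL per city/area pair, using the same URL
# pattern as the script's fetch loop.
def feed_urls(cities, areas)
  cities.product(areas).map do |city, area|
    "http://#{city}.craigslist.org/search/#{area}?query=%20&format=rss"
  end
end

# Small illustrative input; the real script uses the full cities list.
urls = feed_urls(%w[austin boston], %w[sof web eng])
puts urls.length
puts urls.first
```

Building the list first makes it easy to check the count (cities times areas) or to try a small subset before running against every board.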