@postmodern
Created March 24, 2009 02:11
A small script that spiders a website and builds a word-list from the text of its paragraphs.
#!/usr/bin/env ruby

gem 'spidr'
require 'spidr'
require 'set'

unless ARGV.length == 2
  STDERR.puts "usage: #{$0} HOST FILE"
  exit(-1)
end

# A Set keeps each word only once.
words = Set[]

# Spider every page on the given host.
Spidr.host(ARGV[0]) do |spidr|
  spidr.every_page do |page|
    if page.html?
      puts "[-] Scanning words from #{page.url}"

      # Collect the whitespace-separated words found inside <p> elements.
      words.merge(page.doc.search('p').inner_text.split)
    end
  end
end

# Write the word-list, one word per line.
File.open(ARGV[1], 'w') do |file|
  words.each { |word| file.puts(word) }
end
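
The word-collection step can be tried on its own, without spidering anything. The sketch below is only an illustration: `collect_words` is a hypothetical helper that is not part of the script above, and the sample strings stand in for the paragraph text a spidered page would provide. It shows how a Set combined with `String#split` removes duplicate words.

```ruby
require 'set'

# Hypothetical helper (not in the original script): gathers the unique
# whitespace-separated words from a list of text fragments.
def collect_words(fragments)
  words = Set.new
  fragments.each { |text| words.merge(text.split) }
  words
end

# Sample input standing in for paragraph text taken from spidered pages.
words = collect_words(["the quick brown fox", "the lazy dog"])

words.each { |word| puts word }
puts "#{words.size} unique words" # "the" appears twice but is stored once
```

Note that splitting on whitespace leaves punctuation and letter case untouched, so "Fox," and "fox" would count as two different words.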