@avsej
Created May 27, 2010 18:15
Google sitemap generator scripts
These files describe how to configure automatic Google sitemap
generation.
1. In each model you want indexed, define a class method named
'get_paths' that returns an array of hashes, one per URL, each with a
:url path and a :last_mod date.
2. In 'lib/tasks/google_sitemap.rb', update the 'sources' and 'host'
variables for your site (the host can also be set with the HOST
environment variable).
3. Configure a cron job to regenerate the sitemaps and ping Google
periodically. The whenever gem works well for this; a sample schedule
is in 'config/schedule.rb'.
4. The task generates a sitemap index plus one sitemap per model,
because Google limits a single sitemap to 50,000 URLs. The sitemaps
are written to 'public/sitemaps' and gzipped to save bandwidth.
5. Submit the sitemap index in Google Webmaster Tools using a URL like
'http://yoursite.com/sitemaps/sitemap_index.xml.gz'.
That's all.
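One sitemap per model keeps files small, but a single model with more than 50,000 rows would still exceed the limit mentioned in step 4. The following is a minimal sketch of how a path list could be split into sitemap-sized chunks. It assumes that MAX_URLS_PER_SITEMAP mirrors the 50,000-URL limit, and that the helpers chunk_paths and chunk_file_names and the numbered file-name scheme are hypothetical, not part of these scripts.

```ruby
# Hypothetical constant mirroring the 50,000-URL-per-sitemap limit.
MAX_URLS_PER_SITEMAP = 50_000

# Hypothetical helper: split a path list into arrays of at most `limit` entries.
def chunk_paths(paths, limit = MAX_URLS_PER_SITEMAP)
  paths.each_slice(limit).to_a
end

# Hypothetical naming scheme: the first chunk keeps the original name,
# later chunks get a numeric suffix (sitemap_post_2.xml.gz, ...).
def chunk_file_names(base, chunk_count)
  (1..chunk_count).map do |i|
    i == 1 ? "sitemap_#{base}.xml.gz" : "sitemap_#{base}_#{i}.xml.gz"
  end
end

paths = (1..120_001).map { |i| { :url => "/posts/#{i}", :last_mod => '2010-05-27' } }
chunks = chunk_paths(paths)
puts chunks.map(&:size).inspect # prints [50000, 50000, 20001]
puts chunk_file_names('post', chunks.size).inspect
```

Each extra chunk would then need its own entry in the sitemap index.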
require 'net/http'
require 'cgi'
require 'zlib'
require 'fileutils'
require 'builder' # Builder ships with Rails as an ActiveSupport / Action Pack dependency
# A class specific to the application which generates a google sitemap from the contents of the database.
# Author: Alastair Brunton
# Modified: Harry Love 2008-06-09
class GoogleSitemapGenerator
  def initialize(base_url, sources)
    @base_url = base_url
    @sources = sources
  end

  # 1. Call each model's .get_paths class method
  # 2. Create a gzipped sitemap file for each model
  # 3. Create the sitemap index file
  # 4. Ping Google
  def generate
    @sources.each do |source|
      # Resolve the model class from its name (safer than eval) and collect its paths.
      path_ar = source.constantize.get_paths
      save_file(source, generate_sitemap(path_ar))
    end
    index = generate_sitemap_index(@sources)
    save_file('index', index)
    update_google
  end

  # Create a sitemap document for a model
  def generate_sitemap(path_ar)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    xml.urlset(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') {
      path_ar.each do |path|
        xml.url {
          xml.loc(@base_url + path[:url])
          xml.lastmod(path[:last_mod])
          xml.changefreq('weekly')
        }
      end
    }
    xml_str
  end

  # Create a sitemap index document pointing at each model's sitemap
  def generate_sitemap_index(sitemaps)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    xml.sitemapindex(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') {
      sitemaps.each do |site|
        xml.sitemap {
          xml.loc(@base_url + "/sitemaps/sitemap_#{site.underscore}.xml.gz")
          xml.lastmod(Time.now.strftime('%Y-%m-%d'))
        }
      end
    }
    xml_str
  end

  # Save the xml document (gzipped) to public/sitemaps/sitemap_<name>.xml.gz
  def save_file(source, xml)
    dir = File.join(RAILS_ROOT, 'public', 'sitemaps')
    FileUtils.mkdir_p(dir)
    # GzipWriter.open writes in binary mode and closes the file when the block ends.
    Zlib::GzipWriter.open(File.join(dir, "sitemap_#{source.underscore}.xml.gz")) do |gz|
      gz.write(xml)
    end
  end

  # Notify Google of the new sitemap index file
  def update_google
    sitemap_uri = @base_url + '/sitemaps/sitemap_index.xml.gz'
    # CGI.escape percent-encodes the whole URL so it is safe as a query-string value.
    escaped_sitemap_uri = CGI.escape(sitemap_uri)
    ping_path = '/webmasters/tools/ping?sitemap=' + escaped_sitemap_uri
    puts 'www.google.com' + ping_path
    puts Net::HTTP.get('www.google.com', ping_path)
  end
end
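To see what generate_sitemap and save_file produce without loading Rails or Builder, the sketch below builds an equivalent urlset document with the standard library and gzips it in memory. It assumes the :url / :last_mod hash shape returned by the models' get_paths. The urlset_xml helper is hypothetical, and the indentation differs from Builder's compact output. Zlib.gzip and Zlib.gunzip require Ruby 2.4 or later.

```ruby
require 'cgi'
require 'zlib'

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

# Hypothetical stand-in for generate_sitemap: same elements, plain strings.
def urlset_xml(base_url, paths)
  entries = paths.map do |path|
    # CGI.escapeHTML escapes &, <, >, " and ' so URLs are valid XML text.
    "  <url>\n" \
    "    <loc>#{CGI.escapeHTML(base_url + path[:url])}</loc>\n" \
    "    <lastmod>#{CGI.escapeHTML(path[:last_mod])}</lastmod>\n" \
    "    <changefreq>weekly</changefreq>\n" \
    "  </url>\n"
  end
  %(<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="#{SITEMAP_NS}">\n) +
    entries.join + "</urlset>\n"
end

xml = urlset_xml('http://mysite.com', [{ :url => '/posts/1-hello', :last_mod => '2010-05-27' }])
gzipped = Zlib.gzip(xml)          # the bytes save_file would write to disk
puts Zlib.gunzip(gzipped) == xml  # prints true
```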
class Page < ActiveRecord::Base
  # One { :url, :last_mod } hash per page, consumed by GoogleSitemapGenerator
  def self.get_paths
    all.map do |page|
      { :url => "/pages/#{page.to_param}", :last_mod => page.updated_at.strftime('%Y-%m-%d') }
    end
  end
end
class Post < ActiveRecord::Base
  # One { :url, :last_mod } hash per post, consumed by GoogleSitemapGenerator
  def self.get_paths
    all.map do |post|
      { :url => "/posts/#{post.to_param}", :last_mod => post.updated_at.strftime('%Y-%m-%d') }
    end
  end
end
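Any class can act as a sitemap source as long as its get_paths returns the same shape as Page and Post. The following is a plain-Ruby sketch of that contract, assuming no ActiveRecord. The Article class, its in-memory ROWS, and the valid_paths? checker are hypothetical and for illustration only.

```ruby
# Hypothetical model stand-in backed by an in-memory array instead of a table.
class Article
  Record = Struct.new(:slug, :updated_at)
  ROWS = [
    Record.new('1-hello', Time.utc(2010, 5, 27, 18, 15)),
    Record.new('2-sitemaps', Time.utc(2010, 5, 28, 9, 0))
  ]

  # Same shape as Page.get_paths / Post.get_paths: one hash per public URL.
  def self.get_paths
    ROWS.map do |row|
      { :url => "/articles/#{row.slug}", :last_mod => row.updated_at.strftime('%Y-%m-%d') }
    end
  end
end

# Hypothetical check: root-relative :url and a YYYY-MM-DD :last_mod.
def valid_paths?(paths)
  paths.all? do |p|
    p[:url].to_s.start_with?('/') && p[:last_mod].to_s =~ /\A\d{4}-\d{2}-\d{2}\z/
  end
end

puts Article.get_paths.inspect
puts valid_paths?(Article.get_paths) # prints true
```

The :url must start with '/' because the generator simply appends it to the host.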
# config/schedule.rb (whenever): regenerate the sitemaps once a day
set :output, "#{RAILS_ROOT}/log/cron.log"
every 1.day do
  rake "google_sitemap:generate"
end
require 'google_sitemap'
namespace :google_sitemap do
  desc "Generate a Google sitemap from the models"
  task(:generate => :environment) do
    # Generate sitemaps for each of the models listed in the array
    sources = %w(Post Page)
    # Override the default host with HOST=http://example.com when running rake
    host = ENV['HOST'] || 'http://mysite.com'
    sitemap = GoogleSitemapGenerator.new(host, sources)
    sitemap.generate
  end
end
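The sketch below shows what the task's sources and host settings turn into: the per-model sitemap URLs listed in the index and the ping URL sent to Google. It makes two assumptions. First, underscore_name is a simplified, hypothetical stand-in for ActiveSupport's String#underscore that handles plain CamelCase only. Second, the ping path mirrors GoogleSitemapGenerator#update_google.

```ruby
require 'cgi'

# Hypothetical, simplified version of ActiveSupport's String#underscore.
def underscore_name(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Sitemap URLs that generate_sitemap_index lists for each source model.
def sitemap_urls(host, sources)
  sources.map { |s| "#{host}/sitemaps/sitemap_#{underscore_name(s)}.xml.gz" }
end

# Ping URL built the same way as update_google (index URL fully percent-encoded).
def ping_url(host)
  index = "#{host}/sitemaps/sitemap_index.xml.gz"
  'http://www.google.com/webmasters/tools/ping?sitemap=' + CGI.escape(index)
end

host = ENV['HOST'] || 'http://mysite.com'
puts sitemap_urls(host, %w(Post Page BlogPost))
puts ping_url(host)
```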