@sephraim
Last active October 4, 2022 19:36
[Get web page source] Get the contents of a web page using Ruby's built-in open-uri library
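require 'open-uri'
# Simple version: fetch a page and print its source
url = 'https://www.google.com/'
# The block form closes the underlying handle once the body has been read
contents = URI.open(url) { |f| f.read }
puts contents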
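# Variant: decode HTML entities and split the source into lines
require 'open-uri'
require 'htmlentities' # third-party gem (gem install htmlentities)
# get page source
# (use URI.open: on Ruby 3.0+ a bare open(url) no longer handles URLs)
url = 'https://www.google.com/'
source = URI.open(url) { |f| f.read }.encode!('UTF-8', 'UTF-8', invalid: :replace)
# replace HTML entities
coder = HTMLEntities.new
source = coder.decode(source)
# split into array of lines, stripping surrounding whitespace
# (strip already removes trailing newlines, so no extra chomp is needed)
source = source.split("\n").map(&:strip)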
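
The clean-up steps above (fix the encoding, decode entities, split into lines) can be tried without a network connection. The following is a minimal sketch, not the gist author's code: `clean_lines` is a hypothetical helper name, and it swaps the htmlentities gem for the standard library's `CGI.unescapeHTML`, which only covers the basic entities and numeric references. It also assumes the page bytes are meant to be UTF-8.

```ruby
require 'cgi'

# Hypothetical helper (not part of the original gist): turns raw page
# source into an array of whitespace-stripped lines.
def clean_lines(raw)
  # Assumption: the bytes are UTF-8; invalid sequences become "?".
  text = raw.dup.force_encoding('UTF-8').scrub('?')
  # Stdlib stand-in for HTMLEntities#decode (handles fewer entities).
  text = CGI.unescapeHTML(text)
  # One array element per line, with surrounding whitespace removed.
  text.split("\n").map(&:strip)
end

sample = "  <title>Tom &amp; Jerry</title>  \n\t<p>1 &lt; 2</p>\n"
p clean_lines(sample)
# => ["<title>Tom & Jerry</title>", "<p>1 < 2</p>"]
```

For pages that use a wider range of named entities (such as `&nbsp;` or `&eacute;`), the htmlentities gem used above is likely the better choice.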