@kimmel
Created August 26, 2012 18:06
ruby regexp html parsing 2
# The following regexp will break
# Dots are escaped so they match a literal '.' rather than any character.
indentation = '<img src="http:\/\/ycombinator\.com\/images\/s\.gif" height=1 width=(\d+)><\/td>'
score = '<span id=score_([0-9]+)>([0-9]+) point'
user_id = '<a href="user\?id=([^"]+)">'
time_ago = '</a>([^|]+)\|'
comment_body = '<span class="comment"><font color=#000000>(.*?)</font>'
regexp_str = "#{indentation}.*?#{score}.*?#{user_id}.*?#{time_ago}.*?#{comment_body}"
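
A minimal sketch of how the fragments above might be compiled and applied, and of one way the resulting regexp breaks. The sample HTML is hand-written for illustration and is not taken from a real page. The names `SAMPLE_HTML`, `build_comment_regexp` and `parse_comment` are hypothetical. Compiling with `Regexp::MULTILINE` is an assumption, made so that `.*?` can cross line breaks.

```ruby
# Hypothetical sample row, written by hand to resemble the markup the fragments expect.
SAMPLE_HTML = <<~'HTML'
  <td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td>
  <span id=score_4321>7 points</span> by <a href="user?id=alice">alice</a> 2 hours ago | link
  <span class="comment"><font color=#000000>Regexps are brittle here.</font></span>
HTML

# Builds the combined pattern from the same five fragments, joined by lazy wildcards.
def build_comment_regexp
  indentation  = '<img src="http://ycombinator\.com/images/s\.gif" height=1 width=(\d+)></td>'
  score        = '<span id=score_([0-9]+)>([0-9]+) point'
  user_id      = '<a href="user\?id=([^"]+)">'
  time_ago     = '</a>([^|]+)\|'
  comment_body = '<span class="comment"><font color=#000000>(.*?)</font>'
  source = [indentation, score, user_id, time_ago, comment_body].join('.*?')
  # Assumption: MULTILINE makes '.' match newlines, so '.*?' can span lines.
  Regexp.new(source, Regexp::MULTILINE)
end

# Returns a hash of the captured fields, or nil when the markup does not match.
def parse_comment(html)
  m = build_comment_regexp.match(html)
  return nil unless m

  {
    indent: m[1].to_i,
    score_id: m[2],
    points: m[3].to_i,
    user: m[4],
    age: m[5].strip,
    body: m[6]
  }
end

comment = parse_comment(SAMPLE_HTML)
puts comment.inspect

# A harmless markup change (single quotes around the class name) defeats the pattern.
altered = SAMPLE_HTML.sub('class="comment"', "class='comment'")
puts parse_comment(altered).inspect
```

The second call returns `nil` even though the altered markup is equivalent HTML. That is the kind of fragility the comment at the top warns about; an HTML parser would tolerate the change.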