@knowtheory
Last active December 13, 2015 21:48
Teaching my brother regular expressions.
[2/18/13 11:49:45 AM] schwaumlaut: I don't know regular expressions. :P
[2/18/13 11:49:57 AM] Ted Han: L2REGEXP
[2/18/13 11:49:59 AM] Ted Han: n00b
[2/18/13 11:50:01 AM] schwaumlaut: What is a good thing to read for them?
[2/18/13 11:50:09 AM] Ted Han: http://rubular.com/ is fun.
[2/18/13 11:50:45 AM] schwaumlaut: "I would like to learn to program."
"Here is your command line. ENJOY!"
[2/18/13 11:50:52 AM] Ted Han: haha.
[2/18/13 11:51:17 AM] Ted Han: http://docs.python.org/2/howto/regex.html ?
[2/18/13 11:51:19 AM] Ted Han: here's the basic gist
[2/18/13 11:51:25 AM] schwaumlaut: yaaaaay
[2/18/13 11:51:26 AM] Ted Han: you can match literal characters
[2/18/13 11:51:30 AM] Ted Han: "a"
[2/18/13 11:51:31 AM] Ted Han: "b"
[2/18/13 11:51:32 AM] Ted Han: etc.
[2/18/13 11:51:33 AM] Ted Han: /a/
[2/18/13 11:51:35 AM] Ted Han: /b/
[2/18/13 11:51:44 AM] Ted Han: quantification of those things goes as follows
[2/18/13 11:52:05 AM] Ted Han: /a+/ # match 1 or more "a". Will match a, aa, aaaa, aaaaaaaaaa, and so on
[2/18/13 11:52:27 AM] Ted Han: /a*/ # match 0 or more "a". Will match "", "a", "aa", etc.
[2/18/13 11:53:15 AM] Ted Han: /a{2,4}/ # match some number of chars between lower and upper bound, will match "aa", "aaa", "aaaa", but not "a", or "aaaaa"
[2/18/13 11:53:40 AM] Ted Han: /a?/ # match 0 or 1 "a"
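The quantifiers above can be tried out in Ruby, the language used later in this chat. This is a minimal sketch, not part of the original conversation: `QUANTIFIERS` and `whole_match?` are made-up names, and the `\A` and `\z` anchors are an addition so each check covers the whole string (without them, /a{2,4}/ would still find a match inside "aaaaa"). `match?` assumes Ruby 2.4 or newer.

```ruby
# Hypothetical helper names; the patterns are the ones from the chat,
# wrapped in \A ... \z so they must match the entire string.
QUANTIFIERS = {
  one_or_more:  /\Aa+\z/,
  zero_or_more: /\Aa*\z/,
  two_to_four:  /\Aa{2,4}\z/,
  zero_or_one:  /\Aa?\z/
}.freeze

def whole_match?(name, text)
  QUANTIFIERS.fetch(name).match?(text)
end

puts whole_match?(:one_or_more, "aaaa")    # true
puts whole_match?(:one_or_more, "")        # false, + needs at least one "a"
puts whole_match?(:zero_or_more, "")       # true, * accepts zero repetitions
puts whole_match?(:two_to_four, "aaa")     # true
puts whole_match?(:two_to_four, "aaaaa")   # false, five is over the upper bound
puts whole_match?(:zero_or_one, "aa")      # false, ? allows at most one
```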
[2/18/13 11:53:48 AM] Ted Han: you can do grouping using parentheses.
[2/18/13 11:54:08 AM] Ted Han: /(aa)?/ # match the group "aa" 0 or 1 times.
[2/18/13 11:54:21 AM] Ted Han: you can also do disjunction using |
[2/18/13 11:54:39 AM] Ted Han: /a|b/ # match any string w/ "a" or "b" in it.
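Grouping and disjunction can be sketched the same way. The surrounding "b", "c", and "gr...y" letters are invented purely for illustration, and the constant names are hypothetical; `match?` assumes Ruby 2.4 or newer.

```ruby
# (aa)? makes the whole group optional: zero or one copy of "aa".
OPTIONAL_PAIR = /\Ab(aa)?c\z/
puts OPTIONAL_PAIR.match?("bc")     # true, group absent
puts OPTIONAL_PAIR.match?("baac")   # true, group present once
puts OPTIONAL_PAIR.match?("bac")    # false, a single "a" is not the group

# a|b matches wherever either alternative appears in the string.
EITHER = /a|b/
puts EITHER.match?("cab")           # true
puts EITHER.match?("xyz")           # false

# Parentheses also limit how far the | reaches.
GRAY = /\Agr(a|e)y\z/
puts GRAY.match?("gray")            # true
puts GRAY.match?("grey")            # true
puts GRAY.match?("gry")             # false
```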
[2/18/13 11:54:49 AM] Ted Han: there are also character classes
[2/18/13 11:55:06 AM] Ted Han: /\w/ # match any words (which i think works out to letters, digits, and underscore)
[2/18/13 11:55:17 AM] Ted Han: errrr that's match any word character.
[2/18/13 11:55:25 AM] schwaumlaut: Mm.
[2/18/13 11:55:25 AM] Ted Han: /\d/ # will match any digit.
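A quick way to see what the two character classes cover is to pull every run of them out of a string with `scan`. In Ruby, `\w` covers letters, digits, and the underscore, and `\d` covers the digits 0 through 9. The helper names `words` and `numbers` and the sample strings are made up for this sketch.

```ruby
# Hypothetical helpers: collect every run of word characters or digits.
def words(text)
  text.scan(/\w+/)
end

def numbers(text)
  text.scan(/\d+/)
end

p words("hello, world_42!")      # ["hello", "world_42"]
p numbers("room 101, floor 3")   # ["101", "3"]

# Punctuation is neither whitespace nor a word character.
puts "-".match?(/\w/)            # false
puts "_".match?(/\w/)            # true
```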
[2/18/13 11:55:28 AM] Ted Han: so.
[2/18/13 11:55:31 AM] Ted Han: if you wanted to get a date...
[2/18/13 11:55:41 AM] Ted Han: oh, i forgot. escaping
[2/18/13 11:55:58 AM] Ted Han: some characters you want to match have a special meaning in Regexps (metacharacters), so you have to tell the regexp you mean the literal character by escaping it.
[2/18/13 11:56:03 AM] Ted Han: for example.
[2/18/13 11:56:06 AM] Ted Han: matching the "/" character
[2/18/13 11:56:13 AM] Ted Han: /\//
[2/18/13 11:56:19 AM] Ted Han: \ <- the escape character.
[2/18/13 11:56:31 AM] schwaumlaut: I see.
[2/18/13 11:56:34 AM] Ted Han: "\/" <- means escape "/"
[2/18/13 11:56:41 AM] Ted Han: so "\d" is an escaped "d"
[2/18/13 11:56:44 AM] Ted Han: rather than a literal d.
[2/18/13 11:56:48 AM] schwaumlaut: Just like escaping quotes in strings.
[2/18/13 11:56:52 AM] Ted Han: yep.
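Escaping can be sketched in Ruby as follows. `%r{...}` is an alternative regexp literal that avoids having to escape "/" at all, and `Regexp.escape` is the standard-library call that escapes every special character in a string. The helper name `literal_pattern` and the sample strings are assumptions made for illustration.

```ruby
# Splitting on a literal slash, written two equivalent ways.
p "12/31/80".split(/\//)    # ["12", "31", "80"]
p "12/31/80".split(%r{/})   # ["12", "31", "80"]

# An unescaped dot matches any character; an escaped dot only matches ".".
puts "abc".match?(/a.c/)    # true
puts "abc".match?(/a\.c/)   # false
puts "a.c".match?(/a\.c/)   # true

# Hypothetical helper: build a pattern that matches the text literally.
def literal_pattern(text)
  Regexp.new(Regexp.escape(text))
end

plus_as_quantifier = /1+1/               # "+" means "one or more 1s" here
puts plus_as_quantifier.match?("1+1")    # false
puts literal_pattern("1+1").match?("1+1") # true
```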
[2/18/13 11:56:55 AM] Ted Han: so
[2/18/13 11:57:01 AM] Ted Han: you can match a calendar date
[2/18/13 11:57:23 AM] Ted Han: /\d{2}\/\d{2}\/(\d{2}|\d{4})/
[2/18/13 11:57:44 AM] Ted Han: that'll match 2 digits followed by a slash, followed by 2 more digits, followed by another slash, followed by either 2 or 4 digits.
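Here is a sketch of that date pattern run against a few made-up strings, using `String#[]` to pull out the matched text. One detail worth knowing: alternatives are tried left to right, so with `\d{2}` listed first and nothing anchoring the end, a four-digit year is only matched up to its first two digits. Listing `\d{4}` first is one possible way around that; the constant names are hypothetical.

```ruby
DATE = /\d{2}\/\d{2}\/(\d{2}|\d{4})/

p "due 12/31/80 sharp"[DATE]    # "12/31/80"
p "1/2/80"[DATE]                # nil, month and day need two digits each

# \d{2} is tried first and succeeds, so the match stops after "19".
p "12/31/1980"[DATE]            # "12/31/19"

# Same pattern with the longer alternative listed first.
DATE_LONG_FIRST = /\d{2}\/\d{2}\/(\d{4}|\d{2})/
p "12/31/1980"[DATE_LONG_FIRST] # "12/31/1980"
p "12/31/80"[DATE_LONG_FIRST]   # "12/31/80"
```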
[2/18/13 11:58:21 AM] Ted Han: Now, Perl-compatible regular expressions also let you refer to groups that you've delineated in parentheses
[2/18/13 11:58:23 AM] Ted Han: so.
[2/18/13 11:58:27 AM] Ted Han: /(\d{2}\/\d{2}\/(\d{2}|\d{4}))/
[2/18/13 11:58:37 AM] Ted Han: group 1 is the full match here
[2/18/13 11:58:43 AM] Ted Han: group 2 is the year
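In Ruby, the groups are available on the `MatchData` object returned by `match`. Groups are numbered by the position of their opening parenthesis, and index 0 is always the entire matched text. A minimal sketch, with a made-up input string and a hypothetical constant name:

```ruby
NESTED_DATE = /(\d{2}\/\d{2}\/(\d{2}|\d{4}))/

m = NESTED_DATE.match("born 12/31/80")

p m[0]          # "12/31/80", the entire matched text
p m[1]          # "12/31/80", group 1 wraps the whole date
p m[2]          # "80", group 2 is the year
p m.pre_match   # "born ", whatever came before the match

# match returns nil when nothing matches, so check before indexing.
p NESTED_DATE.match("no date here")   # nil
```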
[2/18/13 11:59:08 AM] Ted Han: i forget how python lets you access it, it might be in the variable \1
[2/18/13 11:59:10 AM] Ted Han: you'll have to read up on that
[2/18/13 11:59:18 AM] schwaumlaut: Mm hm.
[2/18/13 11:59:20 AM] Ted Han: so you can do things in ruby like...
[2/18/13 12:01:31 PM] Ted Han: >> "12/31/80".sub(/(\d{2}\/\d{2}\/)(\d{2})/) { "#{$1}19#{$2}" }
=> "12/31/1980"
[2/18/13 12:02:05 PM] Ted Han: that regexp will break "12/31/80" into two chunks, "12/31/" and "80"
[2/18/13 12:02:17 PM] Ted Han: and then you can mash together variables in an interpolated string
[2/18/13 12:02:19 PM] Ted Han: i forget whether python lets you interpolate or not.
[2/18/13 12:02:53 PM] schwaumlaut: Hmmm. I guess I will find out. Thanks!
[2/18/13 12:02:53 PM] Ted Han: but you get the idea.
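Two ways of using captured groups in a Ruby replacement are sketched below. The block form runs after each match, so `$1` and `$2` refer to that match; an interpolated string passed as a plain argument is built before `sub` runs, so it would see whatever `$1` held beforehand. The second form uses named groups and `\k<name>` references inside a single-quoted replacement string. The method and group names are made up for illustration.

```ruby
NUMBERED = /(\d{2}\/\d{2}\/)(\d{2})/
NAMED    = /(?<month_day>\d{2}\/\d{2}\/)(?<yy>\d{2})/

# Block form: $1 and $2 are set by the match before the block is called.
def expand_year_block(text)
  text.sub(NUMBERED) { "#{$1}19#{$2}" }
end

# Backreference form: single quotes keep the backslashes for sub to read.
def expand_year_backref(text)
  text.sub(NAMED, '\k<month_day>19\k<yy>')
end

p expand_year_block("12/31/80")     # "12/31/1980"
p expand_year_backref("12/31/80")   # "12/31/1980"
```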
[2/18/13 12:03:06 PM] Ted Han: so... remember you'll want to match the rest of the string after the date.
[2/18/13 12:03:16 PM] Ted Han: oh i forgot one of the other character classes
[2/18/13 12:03:25 PM] Ted Han: /./ # the dot will match any character (except a newline, by default)
[2/18/13 12:04:06 PM] Ted Han: so
[2/18/13 12:04:19 PM] Ted Han: /.*/ # will match any string including an empty string.
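Putting the dot together with the earlier advice about matching the rest of the string after the date, a sketch in Ruby might look like this. The sample text and the constant name are invented. By default the dot stops at a newline in Ruby; the `m` flag lifts that restriction.

```ruby
# Group 1 is the date, group 2 is everything after it on the same line.
DATE_AND_REST = /(\d{2}\/\d{2}\/\d{2})(.*)/

m = DATE_AND_REST.match("12/31/80 New Year's Eve party\nnext line")
p m[1]    # "12/31/80"
p m[2]    # " New Year's Eve party", .* stopped at the newline

# .* is happy to match nothing at all.
puts "".match?(/\A.*\z/)        # true

# The dot skips newlines unless the m flag is given.
puts "a\nb".match?(/a.b/)       # false
puts "a\nb".match?(/a.b/m)      # true
```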
[2/18/13 12:05:33 PM] Ted Han: ruby has two methods for replacing stuff in strings, sub and gsub. the first replaces only the first match, the second replaces all matches.
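The difference shows up as soon as a string contains more than one date. A minimal sketch, reusing the two-group pattern from the earlier `sub` example; the method names and the sample string are made up:

```ruby
SHORT_DATE = /(\d{2}\/\d{2}\/)(\d{2})/

# sub stops after the first match.
def expand_first_year(text)
  text.sub(SHORT_DATE) { "#{$1}19#{$2}" }
end

# gsub keeps going, calling the block once per match.
def expand_all_years(text)
  text.gsub(SHORT_DATE) { "#{$1}19#{$2}" }
end

dates = "12/31/80, 01/01/81"
p expand_first_year(dates)   # "12/31/1980, 01/01/81"
p expand_all_years(dates)    # "12/31/1980, 01/01/1981"
```

Both methods return a new string and leave the original untouched.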
[2/18/13 12:06:08 PM] schwaumlaut: Alas, regular expressions look like gibberish to the uninitiated.
[2/18/13 12:06:57 PM] Ted Han: As has been said:
[2/18/13 12:07:26 PM] Ted Han: Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
[2/18/13 12:07:43 PM] schwaumlaut: Heh.