Last active
December 13, 2015 21:48
Teaching my brother regular expressions.
[2/18/13 11:49:45 AM] schwaumlaut: I don't know regular expressions. :P
[2/18/13 11:49:57 AM] Ted Han: L2REGEXP
[2/18/13 11:49:59 AM] Ted Han: n00b
[2/18/13 11:50:01 AM] schwaumlaut: What is a good thing to read for them?
[2/18/13 11:50:09 AM] Ted Han: http://rubular.com/ is fun.
[2/18/13 11:50:45 AM] schwaumlaut: "I would like to learn to program."
"Here is your command line. ENJOY!"
[2/18/13 11:50:52 AM] Ted Han: haha.
[2/18/13 11:51:17 AM] Ted Han: http://docs.python.org/2/howto/regex.html ?
[2/18/13 11:51:19 AM] Ted Han: here's the basic gist
[2/18/13 11:51:25 AM] schwaumlaut: yaaaaay
[2/18/13 11:51:26 AM] Ted Han: you can match literal characters
[2/18/13 11:51:30 AM] Ted Han: "a"
[2/18/13 11:51:31 AM] Ted Han: "b"
[2/18/13 11:51:32 AM] Ted Han: etc.
[2/18/13 11:51:33 AM] Ted Han: /a/
[2/18/13 11:51:35 AM] Ted Han: /b/
[2/18/13 11:51:44 AM] Ted Han: quantification of those things goes as follows
[2/18/13 11:52:05 AM] Ted Han: /a+/ # match 1 or more "a". Will match a, aa, aaaa, aaaaaaaaaa, and so on
[2/18/13 11:52:27 AM] Ted Han: /a*/ # match 0 or more "a". Will match "", "a", "aa", etc.
[2/18/13 11:53:15 AM] Ted Han: /a{2,4}/ # match some number of chars between lower and upper bound, will match "aa", "aaa", "aaaa", but not "a", or "aaaaa"
[2/18/13 11:53:40 AM] Ted Han: /a?/ # match 0 or 1 "a"
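Since the question came from the Python side, here is a minimal sketch of the four quantifiers above using Python's `re` module. The helper name `matches` is made up for illustration; `re.fullmatch` is used so that each result says whether the whole string fits the pattern.

```python
import re

def matches(pattern, text):
    """Return True only if the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

one_or_more = [matches(r"a+", s) for s in ["", "a", "aaaa"]]
zero_or_more = [matches(r"a*", s) for s in ["", "a", "aa"]]
bounded = [matches(r"a{2,4}", s) for s in ["a", "aa", "aaaa", "aaaaa"]]
optional = [matches(r"a?", s) for s in ["", "a", "aa"]]

print(one_or_more)   # [False, True, True]
print(zero_or_more)  # [True, True, True]
print(bounded)       # [False, True, True, False]
print(optional)      # [True, True, False]
```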
[2/18/13 11:53:48 AM] Ted Han: you can do grouping using parentheses.
[2/18/13 11:54:08 AM] Ted Han: /(aa)?/ # match the group "aa" 0 or 1 times.
[2/18/13 11:54:21 AM] Ted Han: you can also do disjunction using |
[2/18/13 11:54:39 AM] Ted Han: /a|b/ # match any string w/ "a" or "b" in it.
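A small Python sketch of grouping and disjunction, with sample strings chosen only for illustration. `re.fullmatch` checks the whole string, while `re.search` looks for the pattern anywhere inside it, which is the "any string with a or b in it" behaviour.

```python
import re

# (aa)? treats "aa" as one unit that may appear zero or one times.
pair_optional = [re.fullmatch(r"(aa)?", s) is not None
                 for s in ["", "aa", "a", "aaaa"]]

# a|b matches either alternative; re.search scans the whole string for it.
has_a_or_b = [re.search(r"a|b", s) is not None
              for s in ["cat", "bed", "dog"]]

print(pair_optional)  # [True, True, False, False]
print(has_a_or_b)     # [True, True, False]
```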
[2/18/13 11:54:49 AM] Ted Han: there are also character classes
[2/18/13 11:55:06 AM] Ted Han: /\w/ # match any words (which i think works out to anything that's not a space)
[2/18/13 11:55:17 AM] Ted Han: errrr that's match any word character: a letter, a digit, or an underscore.
[2/18/13 11:55:25 AM] schwaumlaut: Mm.
[2/18/13 11:55:25 AM] Ted Han: /\d/ # will match any digit.
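To see what the two character classes pick out, here is a short Python sketch with made-up sample strings. Note that `\w` is narrower than "anything that's not a space": punctuation such as `-` is neither whitespace nor a word character.

```python
import re

# \w matches letters, digits, and underscore; it skips spaces and punctuation.
word_chars = re.findall(r"\w", "a_1 -!")

# \d matches digits only.
digits = re.findall(r"\d", "room 42b")

space_is_word = re.fullmatch(r"\w", " ") is not None
dash_is_word = re.fullmatch(r"\w", "-") is not None

print(word_chars)     # ['a', '_', '1']
print(digits)         # ['4', '2']
print(space_is_word)  # False
print(dash_is_word)   # False
```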
[2/18/13 11:55:28 AM] Ted Han: so.
[2/18/13 11:55:31 AM] Ted Han: if you wanted to get a date...
[2/18/13 11:55:41 AM] Ted Han: oh, i forgot. escaping
[2/18/13 11:55:58 AM] Ted Han: some characters you want to match are control characters for Regexps, so you have to tell the regexp you mean the literal character by escaping it.
[2/18/13 11:56:03 AM] Ted Han: for example.
[2/18/13 11:56:06 AM] Ted Han: matching the "/" character
[2/18/13 11:56:13 AM] Ted Han: /\//
[2/18/13 11:56:19 AM] Ted Han: \ <- the escape character.
[2/18/13 11:56:31 AM] schwaumlaut: I see.
[2/18/13 11:56:34 AM] Ted Han: "\/" <- means escape "/"
[2/18/13 11:56:41 AM] Ted Han: so "\d" is an escaped "d"
[2/18/13 11:56:44 AM] Ted Han: rather than a literal d.
[2/18/13 11:56:48 AM] schwaumlaut: Just like escaping quotes in strings.
[2/18/13 11:56:52 AM] Ted Han: yep.
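A hedged Python sketch of escaping. Python patterns are ordinary strings rather than `/.../` literals, so `/` itself needs no backslash there; the example uses `+` instead, since it is a quantifier unless escaped. The sample strings are illustrative only.

```python
import re

# Unescaped, "+" is a quantifier: r"1+1" means "one or more 1s, then a 1".
unescaped_vs_literal = re.fullmatch(r"1+1", "1+1") is not None
unescaped_vs_ones = re.fullmatch(r"1+1", "111") is not None

# Escaped, "\+" means a literal plus sign.
escaped_vs_literal = re.fullmatch(r"1\+1", "1+1") is not None

# "/" is not special in a Python pattern, so it can be written as-is.
slash_found = re.search(r"/", "12/31") is not None

# re.escape adds whatever backslashes a literal string needs.
roundtrip = re.fullmatch(re.escape("1+1=2?"), "1+1=2?") is not None

print(unescaped_vs_literal)  # False
print(unescaped_vs_ones)     # True
print(escaped_vs_literal)    # True
print(slash_found)           # True
print(roundtrip)             # True
```

Writing patterns as raw strings (`r"..."`) keeps Python's own string escaping from interfering with the regexp's backslashes.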
[2/18/13 11:56:55 AM] Ted Han: so
[2/18/13 11:57:01 AM] Ted Han: you can match a calendar date
[2/18/13 11:57:23 AM] Ted Han: /\d{2}\/\d{2}\/(\d{2}|\d{4})/
[2/18/13 11:57:44 AM] Ted Han: that'll match 2 digits followed by a slash, followed by 2 more digits and another slash, followed by either 2 or 4 digits.
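Here is the same date pattern tried out in Python, as a sketch with made-up inputs. One thing worth noticing: because the 2-digit alternative comes first, an unanchored search on a 4-digit year stops after two digits, whereas `re.fullmatch` is forced to use the 4-digit alternative.

```python
import re

# Same pattern as above; "/" needs no escaping in a Python string pattern.
DATE = r"\d{2}/\d{2}/(\d{2}|\d{4})"

short_year = re.fullmatch(DATE, "12/31/80") is not None
long_year = re.fullmatch(DATE, "12/31/1980") is not None
three_digit_year = re.fullmatch(DATE, "12/31/198") is not None

# Unanchored search takes the first alternative that works: two digits.
found = re.search(DATE, "due 12/31/1980").group(0)

print(short_year)        # True
print(long_year)         # True
print(three_digit_year)  # False
print(found)             # 12/31/19
```

Putting the longer alternative first, as in `(\d{4}|\d{2})`, is one way to make the unanchored search pick up the full year.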
[2/18/13 11:58:21 AM] Ted Han: Now, perl compatible regular expressions also let you refer to groups that you've delineated in parentheses
[2/18/13 11:58:23 AM] Ted Han: so.
[2/18/13 11:58:27 AM] Ted Han: /(\d{2}\/\d{2}\/(\d{2}|\d{4}))/
[2/18/13 11:58:37 AM] Ted Han: group 1 is the full match here
[2/18/13 11:58:43 AM] Ted Han: group 2 is the year
[2/18/13 11:59:08 AM] Ted Han: i forget how python lets you access it, it might be in the variable \1
[2/18/13 11:59:10 AM] Ted Han: you'll have to read up on that
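For the open question about Python: groups are read off the match object with `.group(n)` rather than from a variable, and `\1` is the backreference syntax used inside a pattern or a replacement string. A minimal sketch, with sample text invented for illustration:

```python
import re

m = re.search(r"(\d{2}/\d{2}/(\d{2}|\d{4}))", "born on 12/31/80")

whole = m.group(0)   # the entire match
date = m.group(1)    # group 1: the outer parentheses
year = m.group(2)    # group 2: the year
both = m.groups()    # all groups as a tuple

# Inside a pattern, \1 means "the same text group 1 matched".
doubled = re.search(r"(\d)\1", "1223").group(0)

print(whole)    # 12/31/80
print(date)     # 12/31/80
print(year)     # 80
print(both)     # ('12/31/80', '80')
print(doubled)  # 22
```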
[2/18/13 11:59:18 AM] schwaumlaut: Mm hm.
[2/18/13 11:59:20 AM] Ted Han: so you can do things in ruby like...
[2/18/13 12:01:31 PM] Ted Han: >> "12/31/80".sub(/(\d{2}\/\d{2}\/)(\d{2})/) { "#{$1}19#{$2}" }
=> "12/31/1980"
[2/18/13 12:02:05 PM] Ted Han: that regexp will break "12/31/80" into two chunks, "12/31/" and "80"
[2/18/13 12:02:17 PM] Ted Han: and then you can mash together variables in an interpolated string
[2/18/13 12:02:19 PM] Ted Han: i forget whether python lets you interpolate or not.
[2/18/13 12:02:53 PM] schwaumlaut: Hmmm. I guess I will find out. Thanks!
[2/18/13 12:02:53 PM] Ted Han: but you get the idea.
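A rough Python counterpart to the Ruby substitution, offered as a sketch rather than the one right way. `re.sub` accepts either a replacement template or a function; the function name `add_century` is made up here. In the template, `\g<1>` is used instead of `\1` so the group number is not run together with the literal digits `19` that follow.

```python
import re

PATTERN = r"(\d{2}/\d{2}/)(\d{2})"

# Template form: \g<1> is group 1, then the literal "19", then group 2.
with_template = re.sub(PATTERN, r"\g<1>19\2", "12/31/80")

# Function form: closest to Ruby's interpolated string.
def add_century(match):
    return f"{match.group(1)}19{match.group(2)}"

with_function = re.sub(PATTERN, add_century, "12/31/80")

print(with_template)  # 12/31/1980
print(with_function)  # 12/31/1980
```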
[2/18/13 12:03:06 PM] Ted Han: so... remember you'll want to match the rest of the string after the date.
[2/18/13 12:03:16 PM] Ted Han: oh i forgot one of the other character classes
[2/18/13 12:03:25 PM] Ted Han: /./ # the dot will match any character
[2/18/13 12:04:06 PM] Ted Han: so
[2/18/13 12:04:19 PM] Ted Han: /.*/ # will match any string including an empty string.
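A short Python sketch of the dot, including using `.*` to capture the rest of the string after a date. The sample line is invented. One caveat in Python: by default the dot does not match a newline unless the `re.DOTALL` flag is passed.

```python
import re

empty_ok = re.fullmatch(r".*", "") is not None
anything_ok = re.fullmatch(r".*", "anything at all") is not None

# The dot skips newlines by default; re.DOTALL lifts that restriction.
newline_default = re.fullmatch(r".", "\n") is not None
newline_dotall = re.fullmatch(r".", "\n", flags=re.DOTALL) is not None

# A date, a space, then whatever follows on the line.
m = re.fullmatch(r"(\d{2}/\d{2}/\d{2}) (.*)", "12/31/80 New Year's Eve party")
date, rest = m.group(1), m.group(2)

print(empty_ok, anything_ok)            # True True
print(newline_default, newline_dotall)  # False True
print(date)                             # 12/31/80
print(rest)                             # New Year's Eve party
```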
[2/18/13 12:05:33 PM] Ted Han: ruby has two methods for replacing stuff in strings, sub and gsub. sub replaces only the first match, gsub replaces all matches.
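Python has a single function for both jobs: `re.sub` replaces every match by default, and its `count` argument limits how many are replaced. A minimal sketch with a made-up input string:

```python
import re

text = "a-b-c"

# count=1 behaves like Ruby's sub: only the first match is replaced.
first_only = re.sub(r"-", "+", text, count=1)

# With no count, every match is replaced, like Ruby's gsub.
all_matches = re.sub(r"-", "+", text)

print(first_only)   # a+b-c
print(all_matches)  # a+b+c
```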
[2/18/13 12:06:08 PM] schwaumlaut: Alas, regular expressions look like gibberish to the uninitiated.
[2/18/13 12:06:57 PM] Ted Han: As has been said:
[2/18/13 12:07:26 PM] Ted Han: Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
[2/18/13 12:07:43 PM] schwaumlaut: Heh.