Last active December 12, 2015 04:48
GitHub Gist cbare/4716614
An example of regular expressions using capture groups in R.
## An example of regular expressions using capture groups
## in R. See:
## http://stackoverflow.com/questions/14700799/r-regex-gsub-extract-part-of-pattern/14714370#14714370
############################################################
# example data
data <-
"Station lat lon
1940 K01R 31-08N 092-34W
1941 K01T 28-08N 094-24W
1942 K03Y 48-47N 096-57W
1943 K04V 38-05-50N 106-10-07W
1944 K05F 31-25-16N 097-47-49W
1945 K06D 48-53-04N 099-37-15W"
## read the string into a data.frame
df <- read.table(text = data, header = TRUE, stringsAsFactors = FALSE)
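Because the header line `Station lat lon` has one field fewer than each data row, `read.table()` treats the first column (1940, 1941, ...) as row names instead of as data. A minimal base-R check of that behavior on two of the rows; `sample_data` and `stations` are names invented for this sketch:

```r
# Two rows copied from the example data; the header line has one
# field fewer than each data row.
sample_data <- "Station lat lon
1940 K01R 31-08N 092-34W
1943 K04V 38-05-50N 106-10-07W"

stations <- read.table(text = sample_data, header = TRUE,
                       stringsAsFactors = FALSE)

# The unlabeled first column became the row names, not a data column.
print(names(stations))     # "Station" "lat" "lon"
print(rownames(stations))  # "1940" "1943"
```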
## here's the pattern we want to extract
pattern <- "(\\d{1,3})-(\\d{1,3})(?:-(\\d{1,3}))?([NSWE]{1})"
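Read left to right, the pattern captures one to three digits of degrees, a hyphen, one to three digits of minutes, an optional seconds field wrapped in the non-capturing group `(?:-...)?` so the hyphen itself is not captured, and finally the hemisphere letter. The sketch below applies it to one short and one long value using base R's `regexec()` and `regmatches()` instead of stringr; it assumes R 3.3.0 or later for `regexec(perl = TRUE)`, and `x` and `groups` are names chosen for illustration:

```r
# Same pattern as in the script: degrees, minutes, optional seconds, hemisphere.
pattern <- "(\\d{1,3})-(\\d{1,3})(?:-(\\d{1,3}))?([NSWE]{1})"
x <- c("31-08N", "38-05-50N")

# regexec() records the start and length of the whole match and of each
# capture group; regmatches() turns those positions into substrings.
groups <- regmatches(x, regexec(pattern, x, perl = TRUE))

groups[[2]]  # "38-05-50N" "38" "05" "50" "N"
groups[[1]]  # "31-08N" "31" "08" "" "N": the seconds group did not take part
```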
## The stringr package's str_match function returns a character matrix
## in which the first column is the whole matched string and each
## additional column holds the contents of one capture group in the regex.
library(stringr)
str_match(df$lat, pattern)
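For readers without stringr, a rough base-R analogue of `str_match()` can be assembled from `regexec()` and `regmatches()`: one row per input string, the whole match in column 1 and one column per capture group. This sketch assumes every input matches (as all the `lat` values here do); the helper name `match_groups` is hypothetical, and turning an empty seconds field into `NA` is a choice made here, not something base R does on its own:

```r
# Hypothetical helper: a base-R stand-in for stringr::str_match().
# Assumes every element of `x` matches `pattern`.
match_groups <- function(x, pattern) {
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  m <- do.call(rbind, hits)  # one row per input string
  m[m == ""] <- NA           # optional groups that did not match -> NA
  m
}

lat <- c("31-08N", "48-47N", "38-05-50N", "31-25-16N")
pattern <- "(\\d{1,3})-(\\d{1,3})(?:-(\\d{1,3}))?([NSWE]{1})"
m <- match_groups(lat, pattern)
m[, 4]  # seconds column: NA NA "50" "16"
```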
## Alternatively, the gsubfn package defines a strapply function,
## which, in idiomatic R fashion, applies a function to each matching string.
## True to its early version number (0.6-5), I came across some
## bugs.
# http://code.google.com/p/gsubfn
# (Google Code has since shut down; gsubfn is available from CRAN.)
# install.packages('gsubfn')
library(gsubfn)
parts <- strapply(df$lat, pattern, FUN=c, simplify=rbind, backref=NULL)
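The idea behind `strapply()`, calling a function on the capture groups of each match, can also be sketched in base R with `vapply()` over the output of `regmatches()`. Here the applied function converts degrees-minutes-seconds to signed decimal degrees (degrees + minutes/60 + seconds/3600, negative for S and W). `dms_to_decimal` is a hypothetical name used only for this illustration, and the sketch does not reflect how gsubfn works internally:

```r
# Hypothetical helper: convert one set of capture groups to decimal degrees.
dms_to_decimal <- function(deg, min, sec, hemi) {
  sec <- if (sec == "") 0 else as.numeric(sec)  # seconds are optional
  value <- as.numeric(deg) + as.numeric(min) / 60 + sec / 3600
  if (hemi %in% c("S", "W")) -value else value
}

pattern <- "(\\d{1,3})-(\\d{1,3})(?:-(\\d{1,3}))?([NSWE]{1})"
coords <- c("31-08N", "38-05-50N", "106-10-07W")

# Apply the function to the groups of each match (element 1 is the whole match).
decimal <- vapply(regmatches(coords, regexec(pattern, coords, perl = TRUE)),
                  function(g) dms_to_decimal(g[2], g[3], g[4], g[5]),
                  numeric(1))
round(decimal, 4)  # 31.1333 38.0972 -106.1686
```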
Written in response to the Stack Overflow question "R regex / gsub : extract part of pattern".