The `lookfor` function emulates the `lookfor` Stata command in R. It searches for one or more keywords in the variable names of a dataset. It can also search the variable labels of datasets imported into R with the `foreign` and `memisc` packages.
install.packages("devtools")
library(devtools)
# install
source_gist("https://gist.github.com/briatte/4699960")
# recommended
install.packages("memisc")
lookfor(data, keywords = "weight|sample", labels = TRUE, ignore.case = TRUE)
- `data` is a data frame, which can be annotated by the `read.dta` or `read.spss` functions of the `foreign` package, or built by the `data.set` or `importer` methods of the `memisc` package.
- `keywords` is a character string, which can be formatted as a regular expression, or a vector of character strings; the syntax of regular expression patterns must be that of the `grep` function.
- `labels` indicates whether or not to search variable labels, as passed through attributes by either the `foreign` or the `memisc` methods; `labels = TRUE` by default.
- `ignore.case` indicates whether or not to ignore case when matching keywords; `ignore.case = TRUE` by default, which means that, as in the Stata `lookfor` command, case is ignored during matching.
The `lookfor` function requires a dataset and usually takes one or more additional keywords as a character string. Only variable names are searched in datasets with no variable descriptions, as below.
lookfor(iris, "petal")
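As a rough base-R illustration of what a names-only search amounts to (a sketch, not the function's actual source): `iris` carries neither of the label attributes set by the `foreign` import functions, so only its column names can be matched.

```r
# Sketch only: approximates a names-only search with base R.
# iris has no label attributes, so there is nothing else to search.
has_labels <- !is.null(attr(iris, "var.labels", exact = TRUE)) ||
  !is.null(attr(iris, "variable.labels", exact = TRUE))

# Case-insensitive match on the column names.
hits <- grep("petal", names(iris), ignore.case = TRUE, value = TRUE)

print(has_labels)
print(hits)
```

Here `hits` holds the two petal measurements, `Petal.Length` and `Petal.Width`.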
The `memisc` package offers a simple way to try out the command on a richer form of dataset. The following chunk loads the data file of the [American National Election Study of 1948](http://www.electionstudies.org/studypages/1948prepost/1948prepost.htm) in SPSS format.
require(memisc)
nes1948.por <- UnZip("anes/NES1948.ZIP","NES1948.POR", package="memisc")
nes1948 <- spss.portable.file(nes1948.por)
The `lookfor` function accepts either a single keyword, a vector of keywords, or a regular expression that matches the syntax of a `grep` pattern.
# Look for single keyword.
lookfor(nes1948, "truman")
# Look for a vector of keywords.
lookfor(nes1948, c("truman", "dewey"))
# Look for a regular expression.
lookfor(nes1948, "truman|dewey")
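The vector and regular-expression forms can be related in plain R: collapsing a keyword vector with `|` gives an alternation pattern that `grep` matches in the same way. The sketch below uses made-up labels and is not taken from the `lookfor` source; whether the function collapses vectors exactly like this is an assumption.

```r
# Hypothetical variable labels, for illustration only.
var_labels <- c("Vote for Truman", "Vote for Dewey", "Respondent age")

keywords <- c("truman", "dewey")

# Collapse the keyword vector into a single alternation pattern.
pattern <- paste(keywords, collapse = "|")

# Matching the collapsed pattern ...
from_vector <- grep(pattern, var_labels, ignore.case = TRUE)
# ... gives the same positions as the hand-written regular expression.
from_regex <- grep("truman|dewey", var_labels, ignore.case = TRUE)

print(pattern)
print(identical(from_vector, from_regex))
```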
Variable labels can be excluded from the search scope. In that case, the previous examples find nothing, because the keywords do not appear in the variable names alone. Likewise, making the search case sensitive finds nothing.
lookfor(nes1948, "truman", labels = FALSE)
lookfor(nes1948, "truman", ignore.case = FALSE)
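The effect of both arguments can be reproduced on a toy example in base R. The names and labels below are hypothetical stand-ins, not actual NES 1948 variables, and the `grep` calls only approximate what `labels = FALSE` and `ignore.case = FALSE` do.

```r
# Hypothetical variable names and labels (not real NES 1948 variables).
var_names  <- c("v1", "v2")
var_labels <- c("OPINION OF TRUMAN", "OPINION OF DEWEY")

# Names only, as with labels = FALSE: the keyword is not in any name.
in_names <- grep("truman", var_names, ignore.case = TRUE, value = TRUE)

# Labels, case sensitive, as with ignore.case = FALSE:
# lower-case "truman" does not match upper-case "TRUMAN".
case_sensitive <- grep("truman", var_labels, ignore.case = FALSE, value = TRUE)

# Labels, case insensitive, as with the default settings.
case_insensitive <- grep("truman", var_labels, ignore.case = TRUE, value = TRUE)

print(length(in_names))
print(length(case_sensitive))
print(case_insensitive)
```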
The next examples require downloading the data file for the General Social Survey of 2010 in Stata format. The data is first imported with the `memisc` package.
# Download the GSS 2010.
if (!file.exists(zip <- "2010.zip")) {
  url <- "http://publicdata.norc.org/GSS/DOCUMENTS/OTHR/2010_stata.zip"
  download.file(url, zip)
}
# Load as a memisc object.
gss <- UnZip(zip, "2010.dta")
gss <- Stata.file(gss)
If no keyword is specified, the `lookfor` function searches for the 'sample' and 'weight' keywords by default, assuming that the user might start by looking for sampling and weighting variables.
lookfor(gss)
The `lookfor` function will also read variable labels from objects loaded with the `foreign` package.
require(foreign)
unzip(zip)
# Load as a data frame with attributes.
gss <- read.dta("2010.dta")
# Look for a single keyword.
lookfor(gss, "homosex")
Variable labels are searched by looking at the `var.labels` and `variable.labels` attributes in all data frames, and at the results of the `description` function in `memisc` objects.
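A minimal sketch of the attribute lookup for plain data frames, assuming the labels are stored as a character vector in one of the two attributes named above. The helper `get_var_labels` and the toy data frame are hypothetical and not part of `lookfor`.

```r
# Hypothetical helper: return variable labels from whichever attribute exists.
get_var_labels <- function(df) {
  labs <- attr(df, "var.labels", exact = TRUE)        # as set by read.dta
  if (is.null(labs)) {
    labs <- attr(df, "variable.labels", exact = TRUE) # as set by read.spss
  }
  labs
}

# Toy data frame annotated by hand to mimic an imported Stata file.
df <- data.frame(wt = c(1.2, 0.8), age = c(34, 51))
attr(df, "var.labels") <- c("weight variable", "age of respondent")

labs <- get_var_labels(df)

# Search the labels and report the names of the matching variables.
hits <- names(df)[grep("weight", labs, ignore.case = TRUE)]
print(hits)
```

A data frame without either attribute, such as `iris`, yields `NULL` from the helper, which is the case where only variable names can be searched.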
The `query` function of the `memisc` package also allows searching for keywords in a data file, and supports fuzzy search via `agrep`. It also covers value labels, which makes it 'wider' than `lookfor`.
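For readers unfamiliar with fuzzy matching, the base-R `agrep` function can be tried directly. The snippet is a standalone illustration with made-up labels; it does not call `query` and makes no claim about how `memisc` uses `agrep` internally.

```r
# Made-up labels, for illustration only.
var_labels <- c("sampling weight", "age of respondent")

# Exact matching misses the misspelled keyword ...
exact <- grep("weigt", var_labels, value = TRUE)

# ... while approximate matching tolerates one edit (the missing "h").
fuzzy <- agrep("weigt", var_labels, max.distance = 1L, value = TRUE)

print(exact)
print(fuzzy)
```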
Please send comments and suggestions through this Gist or by email.