The lookfor function emulates the lookfor Stata command in R. It searches for one or more keywords in the variable names of a dataset. It can also search the variable labels of datasets imported into R with the foreign and memisc packages.
install.packages("devtools")
library(devtools)
# install
source_gist("https://gist.github.com/briatte/4699960")
# recommended
install.packages("memisc")
lookfor(data, keywords = "weight|sample", labels = TRUE, ignore.case = TRUE)
- `data` is a data frame, which can be annotated by the `read.dta` or `read.spss` functions of the `foreign` package, or built by the `data.set` or `importer` methods of the `memisc` package.
- `keywords` is a character string, which can be formatted as a regular expression, or a vector of character strings; the syntax of regular expression patterns must be that of the `grep` function.
- `labels` indicates whether or not to search variable labels, as passed through attributes by either the `foreign` or the `memisc` methods; `labels = TRUE` by default.
- `ignore.case` indicates whether or not to ignore case when matching keywords; `ignore.case = TRUE` by default, which means that, as in the Stata `lookfor` command, case is ignored during matching.
The lookfor function requires a dataset and usually takes one or more additional keyword(s) as a character string. Only variable names are searched in datasets with no variable descriptions, as below.
lookfor(iris, "petal")
The memisc package offers a simple way to try out the command on a richer form of dataset. The following chunk loads the data file of the [American National Election Study of 1948](http://www.electionstudies.org/studypages/1948prepost/1948prepost.htm) in SPSS format.
require(memisc)
nes1948.por <- UnZip("anes/NES1948.ZIP","NES1948.POR", package="memisc")
nes1948 <- spss.portable.file(nes1948.por)
The lookfor function accepts either a single keyword, a vector of keywords, or a regular expression that matches the syntax of a grep pattern.
# Look for single keyword.
lookfor(nes1948, "truman")
# Look for a vector of keywords.
lookfor(nes1948, c("truman", "dewey"))
# Look for a regular expression.
lookfor(nes1948, "truman|dewey")
Variable labels can be excluded from the search scope. Doing so with the previous examples finds nothing, because the keywords do not appear in the variable names alone. Likewise, making the search case sensitive fails to find anything.
lookfor(nes1948, "truman", labels = FALSE)
lookfor(nes1948, "truman", ignore.case = FALSE)
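The matching behaviour described above can be illustrated with base R alone. The chunk below is only a sketch and not the lookfor source code: it assumes that a vector of keywords is collapsed into a single alternation pattern before being handed to grep, which is one plausible way to make a keyword vector and a regular expression equivalent. It uses the iris data, which has no variable labels.

```r
# Hypothetical illustration, not the lookfor implementation.
keywords <- c("petal", "width")

# Assumption: a keyword vector is joined into one alternation pattern.
pattern <- paste(keywords, collapse = "|")

# Case-insensitive search of the variable names, as with ignore.case = TRUE.
hits <- grep(pattern, names(iris), ignore.case = TRUE, value = TRUE)
print(hits)

# Case-sensitive search: the names are capitalised ("Petal.Length"),
# so the lowercase keyword "petal" matches nothing.
strict <- grep("petal", names(iris), ignore.case = FALSE, value = TRUE)
print(strict)
```

The same reasoning explains the empty results above: the keyword has to appear, with matching case if required, in whichever fields are included in the search.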
The next examples require downloading the data file for the General Social Survey of 2010 in Stata format. The data are first imported with the memisc package.
# Download the GSS 2010.
if (!file.exists(zip <- "2010.zip")) {
  url <- "http://publicdata.norc.org/GSS/DOCUMENTS/OTHR/2010_stata.zip"
  download.file(url, zip)
}
# Load as a memisc object.
gss <- UnZip(zip, "2010.dta")
gss <- Stata.file(gss)
If no keyword is specified, the lookfor function searches for the 'sample' and 'weight' keywords by default, assuming that the user might start by looking for sampling and weighting variables.
lookfor(gss)
The lookfor function will also read variable labels from objects loaded with the foreign package.
require(foreign)
unzip(zip)
# Load as a data frame with attributes.
gss <- read.dta("2010.dta")
# Look for a single keyword.
lookfor(gss, "homosex")
Variable labels are searched by looking at the var.labels and variable.labels attributes in all data frames, and at the results of the description function in memisc objects.
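A minimal sketch of this attribute lookup for data frames is shown below. The helper name lookfor_sketch and its return format are hypothetical and are not taken from the Gist; only the var.labels and variable.labels attribute names and the default keywords come from the description above. The memisc case is left out.

```r
# Hypothetical helper, not the lookfor source: searches variable names
# and, optionally, the label attributes left by the foreign package.
lookfor_sketch <- function(data, keywords = c("weight", "sample"),
                           labels = TRUE, ignore.case = TRUE) {
  pattern <- paste(keywords, collapse = "|")
  # Variable names are always searched.
  found <- grepl(pattern, names(data), ignore.case = ignore.case)
  # read.dta stores labels in "var.labels", read.spss in "variable.labels".
  lab <- attr(data, "var.labels")
  if (is.null(lab)) lab <- attr(data, "variable.labels")
  if (labels && !is.null(lab)) {
    found <- found | grepl(pattern, lab, ignore.case = ignore.case)
  }
  data.frame(
    variable = names(data)[found],
    label = if (is.null(lab)) rep(NA_character_, sum(found)) else unname(lab)[found],
    stringsAsFactors = FALSE
  )
}

# Toy data frame with labels attached the way read.dta attaches them.
df <- data.frame(wtssall = c(1.2, 0.8), vote = c(1L, 2L))
attr(df, "var.labels") <- c("weight variable", "voted for Truman or Dewey")

by_label <- lookfor_sketch(df, "truman")                 # found through the label
no_label <- lookfor_sketch(df, "truman", labels = FALSE) # names only: no match
defaults <- lookfor_sketch(df)                           # default keywords
print(by_label)
```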
The query function of the memisc package also searches for keywords in a data file, and supports fuzzy search via agrep. It also covers value labels, which makes its scope wider than that of lookfor.
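To give an idea of what fuzzy search adds, the chunk below compares grep and agrep from base R on the variable names of iris. It is only an illustration of approximate matching and does not reproduce how the memisc query function works internally.

```r
# Illustration of approximate matching with base R; not memisc::query itself.
vars <- names(iris)

# "petl" is a misspelling of "petal". With the default max.distance of 0.1,
# a four-letter pattern tolerates one edit (0.1 * 4 rounded up).
fuzzy <- agrep("petl", vars, ignore.case = TRUE, value = TRUE)

# Exact pattern matching finds nothing for the misspelled keyword.
exact <- grep("petl", vars, ignore.case = TRUE, value = TRUE)

print(fuzzy)
print(exact)
```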
Please send comments and suggestions through this Gist or by email.