evils_of_attach.md

"Why you should never use `attach()`"

We will use the built-in mtcars dataset to illustrate.

```r
dat <- mtcars
head(dat)
```

```
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

Now we'll attach dat, which adds it to R's search path. We can now access the columns in dat without referencing dat itself. Handy, right?

```r
attach(dat)
mpg
```

```
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
## [29] 15.8 19.7 15.0 21.4
```
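Under the hood, attach() puts a copy of the data frame on R's search path (by default at position 2, right after the global environment), and bare names like mpg are looked up along that path. A minimal sketch to check this yourself; the names path_after_attach and same_values are just for illustration:

```r
dat <- mtcars

# attach() inserts a copy of dat into the search path, by default at position 2
attach(dat)
path_after_attach <- search()[1:2]   # first two entries of the search path
print(path_after_attach)             # ".GlobalEnv" "dat"

# The bare name mpg is found by walking the search path and lands on the attached copy
same_values <- identical(mpg, dat$mpg)
print(same_values)

# Remove the attached copy again so it does not linger in the session
detach(dat)
```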

```r
## plot
plot(mpg, hp)
```

However, what if we want to alter one of the columns? Say, we want to convert weight (wt) from units of 1000 lbs to kilograms:

```r
# convert from 1000 lbs to kg (1 kg is roughly 2.2 lbs)
wt <- wt * 1000 / 2.2
```

If we look at what is in our global environment, we can see that we have a new variable called wt, which is no longer associated with dat.

```r
ls()
```

```
## [1] "dat" "wt"
```
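You can ask R where a name is found with find() from the utils package, which lists every entry on the search path that holds an object of that name, in lookup order. A sketch reproducing the situation above (where_wt and dat_unchanged are illustrative names):

```r
dat <- mtcars
attach(dat)

# At the top level, `<-` assigns into the global environment;
# it never writes back into dat or into the attached copy.
wt <- wt * 1000 / 2.2

# Every search-path entry that contains a `wt`, in the order R checks them
where_wt <- find("wt")
print(where_wt)   # ".GlobalEnv" "dat"

# The data frame itself still holds weights in 1000 lbs
dat_unchanged <- identical(dat$wt, mtcars$wt)
print(dat_unchanged)

detach(dat)
```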

Now, if we reference wt, which one do we get?

```r
wt
```

```
##  [1] 1190.9091 1306.8182 1054.5455 1461.3636 1563.6364 1572.7273 1622.7273
##  [8] 1450.0000 1431.8182 1563.6364 1563.6364 1850.0000 1695.4545 1718.1818
## [15] 2386.3636 2465.4545 2429.5455 1000.0000  734.0909  834.0909 1120.4545
## [22] 1600.0000 1561.3636 1745.4545 1747.7273  879.5455  972.7273  687.7273
## [29] 1440.9091 1259.0909 1622.7273 1263.6364
```

We get the newly created wt in kg, not the original. This is probably good, as we likely want to use the new one. If we want the original, we have to go back to using $, and we can see that it is unmodified:

```r
dat$wt
```

```
##  [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440
## [12] 4.070 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520
## [23] 3.435 3.840 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780
```
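The copy works in the other direction too: attach() takes a snapshot of dat at the moment it is called, so modifying dat afterwards does not change what the bare column names return. A minimal sketch of that gotcha (bare_still_old is an illustrative name):

```r
dat <- mtcars
attach(dat)

# Modify the data frame itself after attaching it
dat$wt <- dat$wt * 1000 / 2.2

# The bare name still resolves to the snapshot taken by attach(), in 1000 lbs
bare_still_old <- identical(wt, mtcars$wt)
print(bare_still_old)   # TRUE

detach(dat)
```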

Now pretend we have another dataset we want to attach (this is a bit contrived, but bear with me). This dataset has the same variables, but is a different size (only the first 10 rows of mtcars):

```r
dat2 <- mtcars[1:10, ]
attach(dat2)
```

```
## The following object is masked _by_ .GlobalEnv:
## 
##     wt
```

```
## The following objects are masked from dat:
## 
##     am, carb, cyl, disp, drat, gear, hp, mpg, qsec, vs, wt
```

We get two messages. The first tells us that the wt in dat2 is masked by the wt in the global environment. This means that if we try to access wt, we will get the wt in kg that we created previously:

```r
wt
```

```
##  [1] 1190.9091 1306.8182 1054.5455 1461.3636 1563.6364 1572.7273 1622.7273
##  [8] 1450.0000 1431.8182 1563.6364 1563.6364 1850.0000 1695.4545 1718.1818
## [15] 2386.3636 2465.4545 2429.5455 1000.0000  734.0909  834.0909 1120.4545
## [22] 1600.0000 1561.3636 1745.4545 1747.7273  879.5455  972.7273  687.7273
## [29] 1440.9091 1259.0909 1622.7273 1263.6364
```
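find() also makes the masking order explicit. Because attach() inserts at position 2 by default, dat2 ends up ahead of dat, and the global environment is ahead of both. A sketch with the setup repeated so it runs on its own (where_wt and where_mpg are illustrative names):

```r
dat <- mtcars
attach(dat)
wt <- wt * 1000 / 2.2            # metric weights, stored in .GlobalEnv

dat2 <- mtcars[1:10, ]
attach(dat2)                      # also goes to position 2, pushing dat down

# Lookup order for each name; the first entry wins
where_wt  <- find("wt")
where_mpg <- find("mpg")
print(where_wt)    # ".GlobalEnv" "dat2" "dat"
print(where_mpg)   # "dat2" "dat"

detach(dat2)
detach(dat)
```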

So now, if we plot something using wt, hoping to learn something about dat2, we get an error: R tries to plot the wt in kg that we created from dat (which lives in the global environment) against the mpg from dat2, and the two have different lengths (32 vs. 10):

```r
plot(wt, mpg)
```

```
## Error in xy.coords(x, y, xlabel, ylabel, log): 'x' and 'y' lengths differ
```
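Checking the lengths before plotting shows the mismatch directly: the bare wt comes from the global environment (all 32 rows of mtcars) while the bare mpg comes from dat2 (10 rows). A sketch, again with the setup repeated (lengths_seen is an illustrative name):

```r
dat <- mtcars
attach(dat)
wt <- wt * 1000 / 2.2             # 32 values, stored in .GlobalEnv

dat2 <- mtcars[1:10, ]
attach(dat2)

# wt and mpg now resolve to objects from different places
lengths_seen <- c(wt = length(wt), mpg = length(mpg))
print(lengths_seen)   # wt 32, mpg 10

detach(dat2)
detach(dat)
```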

Plotting other variables in dat2 gives us what we expect:

```r
plot(mpg, hp)
```

The second message, about am, carb, cyl, disp, drat, gear, hp, mpg, qsec, vs and wt being masked from dat, arises because dat2 was attached after dat and so sits ahead of it on the search path. The original variables in dat are therefore no longer accessible by typing their names directly; we now have to access them using $ (or, with functions such as plot or lm that accept a formula, by naming the data frame in the data argument):

```r
plot(dat$mpg, dat$hp)
```

```r
plot(hp ~ mpg, data = dat)
```
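Another option, which avoids the search path altogether, is base R's with(): it evaluates one expression with the columns of a single data frame in scope, and nothing is left attached afterwards. A sketch using cor() so that it returns a value; with(dat, plot(mpg, hp)) works the same way for the plot above (r_dat and r_dat2 are illustrative names):

```r
dat  <- mtcars
dat2 <- mtcars[1:10, ]

# Each call names exactly one data frame, so there is no ambiguity about mpg or hp
r_dat  <- with(dat,  cor(mpg, hp))
r_dat2 <- with(dat2, cor(mpg, hp))
print(c(dat = r_dat, dat2 = r_dat2))

# Nothing was added to the search path
print("dat" %in% search())   # FALSE
```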

But what if we want to look at dat using the metric values of wt?

```r
plot(mpg ~ wt, data = dat)
```

It's clearly using the original wt in dat, in units of 1000 lbs. If we want to use the metric wt, we have to use a strange mixture of one variable from the data frame and one from the global environment:

```r
plot(dat$mpg ~ wt)
```
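A less confusing alternative is to keep the converted values inside the data frame, so both variables come from the same place and the formula interface can find them. The column name wt_kg below is hypothetical, chosen for illustration:

```r
dat <- mtcars

# Store the metric weight as a new column (wt is in 1000 lbs; 1 kg is roughly 2.2 lbs)
dat$wt_kg <- dat$wt * 1000 / 2.2

# Both variables now come from dat, with no attach() and no loose global copies
fit <- lm(mpg ~ wt_kg, data = dat)
print(coef(fit))

# The plot from the text would then be: plot(mpg ~ wt_kg, data = dat)
```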

It is possible to use attach() without causing errors (or, worse, having things work but not in the way you expect), but it gets confusing very quickly. At the cost of a little more typing, it is strongly recommended that you never use attach().
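If you do use attach() anyway, it helps to undo it as soon as you are done. A minimal sketch: detach() removes a named entry from the search path, so each attached copy can be cleaned up explicitly (still_attached is an illustrative name):

```r
dat  <- mtcars
dat2 <- mtcars[1:10, ]

attach(dat)
attach(dat2)

# ... work with the bare column names here ...

# Remove both copies from the search path once you are done
detach(dat2)
detach(dat)

still_attached <- intersect(c("dat", "dat2"), search())
print(still_attached)   # character(0)
```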

ateucher/evils_of_attach.md