pedroj · February 21, 2012 00:01
diff --git a/table_manip.txt b/table_manip.txt
 *******************************************************************
 *** Solutions to a question I posted on the R-news mailing list ***
 *** about simple table manipulation                             ***
 *** Campbell Webb post in R-help                                ***
 *******************************************************************

 THE QUESTION:

 Hi all,

 I am fairly new to S-plus (actually R, on linux), and am continually
 frustrated at how hard it seems to be to do the most basic table/matrix
 manipulations.  So, if anyone has some spare time for a challenge, I pose
 the following problem...

 Of several canned stats packages, I have found the table manipulation
 options in JMP to be the most helpful/intuitive.  The options there
 are Group, Split, Stack, Join, Subset, Transpose, Sort.  What would be
 the simplest S-plus/R code to do the same?  Some of these are quite easy
 but I leave them in for completeness.  A full solution to this will be an
 invaluable resource - apologies/request for directions if it already
 exists!

 In detail, with examples:

 1) GROUP: Turn this table(A):

 A  a  1
 A  a  3
 A  a  1
 A  b  1
 A  b  2
 B  c  2
 B  d  1

 into this (B):

 Group1 Group2 Nrows SumOfCol3
 A      a      3     5
 A      b      2     3
 B      c      1     2
 B      d      1     1


 2) SPLIT: Turn col3 of the second table (B) into this (C):

   a  b  c  d
 A  3  2  .  .
 B  .  .  1  1


 3) STACK: Turn the above table (C) into this (D):

 A  a  3
 A  b  2
 A  c  .
 A  d  .
 B  a  .
 B  b  .
 B  c  1
 B  d  1


 4) JOIN these tables (E, F):

 A  1
 B  2
 C  3

 A  a
 B  a
 B  a
 B  b
 C  c
 A  b

 to give (G):

 A  a  1
 B  a  2
 B  a  2
 B  b  2
 C  c  3
 A  b  1


 5) SUBSET Table G to give H:

 A  a  1
 A  b  1


 6)  TRANSPOSE H to give:

 A  A
 a  b
 1  1

 7) and finally SORT, by up to 3 keys.


 Solutions referring to data frames would be most helpful. A big thanks to
 anyone who responds!

 Cam

 This message was distributed by [email protected].


 THE ANSWERS:

 1111111111111111111111111111111111111111111111111111111111111111111111111111111

 ********** From: Bill Dunlap <[email protected]> **********

 > 1) GROUP: Turn this table(A):
 >
 > A  a  1
 > A  a  3
 > A  a  1
 > A  b  1
 > A  b  2
 > B  c  2
 > B  d  1
 >
 > into this (B):
 >
 > Group1 Group2 Nrows SumOfCol3
 > A      a      3     5
 > A      b      2     3
 > B      c      1     2
 > B      d      1     1

 You can use aggregate(), but you only get 1 new column for each call
 to aggregate().  E.g.,

 > aggregate(A$Number,  by=A[,c("Group1","Group2")], FUN=sum)

  Group1 Group2 x
 1      A      a 5
 2      A      b 3
 3      B      c 2
 4      B      d 1

 > aggregate(A$Number,  by=A[,c("Group1","Group2")], FUN=length)

  Group1 Group2 x
 1      A      a 3
 2      A      b 2
 3      B      c 1
 4      B      d 1

 Use, e.g., cbind() to paste together the columns of interest:

 > B <- aggregate(A$Number,  by=A[,c("Group1","Group2")], FUN=length)
 > names(B)[3] <- "Nrows"
 > B <- cbind(B,SumOfCol3=aggregate(A$Number,  by=A[,c("Group1","Group2")], FUN=sum)$x)
 > B

  Group1 Group2 Nrows SumOfCol3
 1      A      a     3         5
 2      A      b     2         3
 3      B      c     1         2
 4      B      d     1         1

 aggregate() is a pretty general purpose function and there exist versions
 which are optimized for FUN=sum or length in slightly restricted circumstances.
 They are much faster when you are dealing with a lot of categories.
 Check in the S-news archives for aggsum or agglength.


 ********** From: Ed Kademan <[email protected]> **********
 ## I am doing the following in R.

 A <- data.frame(
       c1 = c('A', 'A', 'A', 'A', 'A', 'B', 'B'),
       c2 = c('a', 'a', 'a', 'b', 'b', 'c', 'd'),
       c3 = c(1, 3, 1, 1, 2, 2, 1))

 problem1 <-
  function(A)
 {
  nRows     <- as.vector(table(A[1:2]))
  sumOfCol3 <- as.vector(tapply(A[[3]], A[1:2], sum))
  groups    <- expand.grid(lapply(A[1:2], levels))
  dimnames(groups)[[2]][1:2] <- c('Group1', 'Group2')
  cbind(groups, nRows, sumOfCol3)[nRows > 0,]
 }

 B <- problem1(A)


 ********** From: Frank Harrell <[email protected]> **********

 Some of what you want is worked out in
 http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf
 section 4.2

 - My libraries will be available in R in a few weeks


 ********** From: "Patrick Connolly" <[email protected]> **********

 Check out aggregate().  It'll do most of what you want.


 ********** From: [email protected] **********

 2 calls to aggregate() do this


 2222222222222222222222222222222222222222222222222222222222222222222222222222222

 ********** From: Bill Dunlap <[email protected]> **********

 > 2) SPLIT: Turn col3 of the second table (B) into this (C):
 >
 >    a  b  c  d
 > A  3  2  .  .
 > B  .  .  1  1

 The fastest way to get this result is to start with the raw data A:

   > table(A$Group1, A$Group2)
     a b c d
   A 3 2 0 0
   B 0 0 1 1

 Do you really want missing values (NA's in Splus) where table puts 0's?
 Do you really want to be able to turn B directly into C?


 ********** From: Ed Kademan <[email protected]> **********

 ## I am not turning B into C as you specified, but maybe this is ok.

 C <- table(A[1:2])


 ********** From: [email protected] **********

 see my attached function table.trans (originally Richard Heiberger's code)
 table.trans(df[,3],df[1:2])

 > table.trans
 function(X, INDICES, ...)
 {
 	len.ind <- length(INDICES)
 	IN1 <- interaction(INDICES[ - len.ind], drop = T)
 	IN2 <- INDICES[[len.ind]]
 	a <- tapply(X, data.frame(IN1, IN2), c, ...)
 	cbind(INDICES[seq(1, nrow(INDICES), length(levels(IN2))),  -
 len.ind], a)
 }

 HELP FILE:
 filename="table_trans.d"

 .BG
 .FN table.trans
 .TL
 Transpose a data frame with 'by' variables
 .DN
 Transposes a data frame with SAS-style 'by' variables
 .CS
 table.trans(X, INDICES, sep="#")
 .RA
 .AG X
 The single data vector to transpose
 .AG INDICES
 All columns except the last are the 'by' variables.
 The last column is a factor whose levels are the names
 of the transposed columns.
 .OA
 .AG sep
 Delimiter used internally. Shouldn't occur as a character in any of the 'by' variables
 .RT
 A data frame with columns corresponding to 'by' variables
 and extra columns corresponding to levels of last column 
 of INDICES whose value comes from X
 .SE
 none
 .DT
 Transposes all values for vector X into columns with names given by the
 levels of the last column of INDICES. Missing entries are denoted by NA. 
 Usually the 'by' variables in INDICES will be factors or discrete values.
 .br
 The inverse of this function is `reshape', which folds columns into rows.
 .SH REFERENCES
 Adapted from code by Richard M Heiberger in s-news 22 Mar 1996
 .SA
 `reshape', as.data.frame.array
 .EX
 > names(dimnames(iris)) <- c("flower", "measurement", "species")
 > iris1 <- as.data.frame.array(iris, row.dims = 1:3)
 > iris1[1:10,  ]
   x.1 flower measurement species 
 1 5.1      1    Sepal L.  Setosa
 2 4.9      2    Sepal L.  Setosa
 3 4.7      3    Sepal L.  Setosa
 4 4.6      4    Sepal L.  Setosa
 5 5.0      5    Sepal L.  Setosa
 6 5.4      6    Sepal L.  Setosa
 7 4.6      7    Sepal L.  Setosa
 8 5.0      8    Sepal L.  Setosa
 9 4.4      9    Sepal L.  Setosa
 10 4.9     10    Sepal L.  Setosa
 > iris.meas <- table.trans(iris1[, 1], iris1[, c(2, 4, 3)])
 > iris.meas[1:10,  ]
   flower    species Sepal.L. Sepal.W. Petal.L. Petal.W. 
 1      1     Setosa      5.1      3.5      1.4      0.2
 2      1 Versicolor      7.0      3.2      4.7      1.4
 3      1  Virginica      6.3      3.3      6.0      2.5
 4      2     Setosa      4.9      3.0      1.4      0.2
 5      2 Versicolor      6.4      3.2      4.5      1.5
 6      2  Virginica      5.8      2.7      5.1      1.9
 7      3     Setosa      4.7      3.2      1.3      0.2
 8      3 Versicolor      6.9      3.1      4.9      1.5
 9      3  Virginica      7.1      3.0      5.9      2.1
 10      4     Setosa      4.6      3.1      1.5      0.2
 > samp <- sample(nrow(iris1), 20)
 > iris.meas <- table.trans(iris1[ - samp, 1], iris1[ - samp, c(2, 3, 4)])
 > iris.meas[1:10,  ]
   flower measurement Setosa Versicolor Virginica 
 1      1    Sepal L.    5.1        7.0       6.3
 2      1    Sepal W.    3.5        3.2       3.3
 3      1    Petal L.    1.4        4.7       6.0
 4      1    Petal W.    0.2        1.4       2.5
 5      2    Sepal L.    4.9        6.4       5.8
 6      2    Sepal W.    3.0        3.2       2.7
 7      2    Petal L.    1.4        4.5       5.1
 8      2    Petal W.    0.2        1.5       1.9
 9      3    Sepal L.    4.7        6.9       7.1
 10      3    Sepal W.    3.2        3.1       3.0

 .KW misc
 .WR


 3333333333333333333333333333333333333333333333333333333333333333333333333333333

 ********** From: Bill Dunlap <[email protected]> **********

 > 3) STACK: Turn the above table (C) into this (D):
 >
 > A  a  3
 > A  b  2
 > A  c  .
 > A  d  .
 > B  a  .
 > B  b  .
 > B  c  1
 > B  d  1

   I think an equivalent result is
   > as.data.frame.array(table(A$Group1, A$Group2), row.dims=1:2)
     X Dim1 Dim2
   1 3    A    a
   2 0    B    a
   3 2    A    b
   4 0    B    b
   5 0    A    c
   6 1    B    c
   7 0    A    d
   8 1    B    d
   See the sorting section on how to reorder it to your taste.
   The GUI has a pretty flexible stacking function.


 ********** From: Ed Kademan <[email protected]> **********

 problem3 <- function(C)
 {
  C <- t(C)
  cbind(expand.grid(dimnames(C))[2:1], as.vector(C))
 }

 D <- problem3(C)


 ********** From: [email protected] **********

 see my attached function reshape()
 reshape(df,col.unfol=1:4)

 > reshape
 function(obj, col.copy = NULL, col.unfold = 1:dim(obj)[2], label = "label",
 value = "value")
 {
 	n <- dim(obj)[1]
 	p <- length(col.unfold)
 	res <- vector("list", length(col.copy) + 2)
 	names(res) <- c(if(is.null(col.copy)) NULL else dimnames(obj[,
 col.copy, drop = F])[[2]], label, value)
 	for(i in seq(along = col.copy)) {
 		res[[i]] <- rep(obj[, col.copy[i]], p)
 	}
 	res[[label]] <- rep(dimnames(obj[, col.unfold, drop = F])[[2]],
 rep(n, p))
 	res[[value]] <- as.vector(data.matrix(obj[, col.unfold, drop = F]))
 	as.data.frame(res)
 }

 HELP FILE:
 filename="reshape.d"

 .BG
 .FN reshape
 .TL
 Reshape, unfold or transpose a data frame
 .DN
 Reshape, unfold or transpose a data frame. 
 .CS
 reshape(obj, col.copy=NULL, col.unfold=1:dim(obj)[2], 
 label="label", value="value")
 .RA
 .AG obj
 the data frame to reshape
 .OA
 .AG col.copy
 The columns of `obj' to copy to result
 .AG col.unfold
 The columns of `obj' to be unfolded to rows
 .AG label
 the name of the label column
 .AG value
 the name of the value column
 .RT
 A data frame consisting of the copied columns replicated p times
 (where p is number of unfolded columns) and two extra columns:
 label contains the name of the unfolded column
 value contains the value
 .SE
 none
 .DT
 see code below. The inverse of this function is table.trans.
 .SH WARNINGS
 the data in the unfolded columns should all have the same mode
 .SA
 `table.trans', as.data.frame.array
 .EX
 > air[1:5,  ]
     ozone radiation temperature wind 
 1 3.448217       190          67  7.4
 2 3.301927       118          72  8.0
 3 2.289428       149          74 12.6
 4 2.620741       313          62 11.5
 5 2.843867       299          65  8.6
 > reshape(air[1:5,],2,c(1,3,4),"attribute")
   radiation   attribute     value 
 1       190       ozone  3.448217
 2       118       ozone  3.301927
 3       149       ozone  2.289428
 4       313       ozone  2.620741
 5       299       ozone  2.843867
 6       190 temperature 67.000000
 7       118 temperature 72.000000
 8       149 temperature 74.000000
 9       313 temperature 62.000000
 10       299 temperature 65.000000
 11       190        wind  7.400000
 12       118        wind  8.000000
 13       149        wind 12.600000
 14       313        wind 11.500000
 15       299        wind  8.600000
 > reshape(air[1:5,],label="attribute")
     attribute      value 
 1       ozone   3.448217
 2       ozone   3.301927
 3       ozone   2.289428
 4       ozone   2.620741
 5       ozone   2.843867
 6   radiation 190.000000
 7   radiation 118.000000
 8   radiation 149.000000
 9   radiation 313.000000
 10   radiation 299.000000
 11 temperature  67.000000
 12 temperature  72.000000
 13 temperature  74.000000
 14 temperature  62.000000
 15 temperature  65.000000
 16        wind   7.400000
 17        wind   8.000000
 18        wind  12.600000
 19        wind  11.500000
 20        wind   8.600000
 .KW misc
 .WR


 4444444444444444444444444444444444444444444444444444444444444444444444444444444

 ********** From: Bill Dunlap <[email protected]> **********

 > 4) JOIN these tables (E, F):
 >
 > A  1
 > B  2
 > C  3
 >
 > A  a
 > B  a
 > B  a
 > B  b
 > C  c
 > A  b
 >
 > to give (G):
 >
 > A  a  1
 > B  a  2
 > B  a  2
 > B  b  2
 > C  c  3
 > A  b  1

 Use merge():

    > E<-data.frame(Group1=c("A","B","C"), Number=1:3)
    > # F is a reserved word, so I call it FF
    > FF <- data.frame(Group1=c("A","B","B","B","C","A"), Group2=c("a","a","a","b","c","b"))
    > G <- merge(FF, E)
    > G
       Group1 Group2 Number
     1      A      a      1
     2      A      b      1
     3      B      a      2
     4      B      a      2
     5      B      b      2
     6      C      c      3

     You get a different order but I think it is an equivalent table.


 ********** From: Ed Kademan <[email protected]> **********

 E <- data.frame(c1 = c('A', 'B', 'C'), c2 = 1:3)
 F <- data.frame(c1 = c('A', 'B', 'B', 'B', 'C', 'A'),
                c2 = c('a', 'a', 'a', 'b', 'c', 'b'))
 ## F is a very bad name for a variable in R, by the way.

 G <- cbind(F, E[[2]][match(F[[1]], E[[1]])])


 ********** From: [email protected] **********

 merge(df1,df2) # if fields have common name


 5555555555555555555555555555555555555555555555555555555555555555555555555555555

 ********** From: Bill Dunlap <[email protected]> **********

 > 5) SUBSET Table G to give H:
 >
 > A  a  1
 > A  b  1

   You didn't say what subset this represents.  To get the first 2 rows
   use G[1:2,].  To get the rows with Number==1 use G[ G$Number==1,].
   To get the rows with Group1=="A" use G[ G$Group1=="A",].  To get the
   rows with Group1=="A" and Group2 != "c" use G[G$Group1=="A" & G$Group2!="c",].


 ********** From: Ed Kademan <[email protected]> **********

 H <- G[G[[1]] == 'A',]
 ## I am assuming you just want those rows whose first column contains
 ## 'A.'


 ********** From: [email protected] **********

 df[df[,1] == "A",]
 also see help(subscript)


 6666666666666666666666666666666666666666666666666666666666666666666666666666666

 ********** From: Bill Dunlap <[email protected]> **********

 > 6)  TRANSPOSE H to give:
 >
 > A  A
 > a  b
 > 1  1

   Transpose is harder because data frames must have only one type of data
   per column.  Our transpose function converts the data frame into a
   matrix (which has one type for the whole matrix) and returns the transpose
   of that matrix.  You would have to convert that back to a data frame:

   > data.frame(t(H))
          X1 X2
   Group1  A  A
   Group2  a  b
   Number  1  1


 ********** From: Ed Kademan <[email protected]> **********

 transposeH <- t(H)


 ********** From: [email protected] **********

 t(df)


 7777777777777777777777777777777777777777777777777777777777777777777777777777777

 ********** From: Bill Dunlap <[email protected]> **********

 > 7) and finally SORT, by up to 3 keys.

   Use order() and subscripting.  E.g.,
   > G[ order(G$Group1, G$Group2, G$Number), ]
     Group1 Group2 Number
   1      A      a      1
   2      A      b      1
   3      B      a      2
   4      B      a      2
   5      B      b      2
   6      C      c      3
   > G[ order(G$Number, G$Group2), ]
     Group1 Group2 Number
   1      A      a      1
   2      A      b      1
   3      B      a      2
   4      B      a      2
   5      B      b      2
   6      C      c      3


 ********** From: Ed Kademan <[email protected]> **********

 dfr       <- as.data.frame(matrix(sample(5, 120, replace = T), 30, 4))
 sortedDfr <- dfr[do.call('order', dfr[1:3]),]
 ## The first column is the primary key, the second is the secondary
 ## key, and the third is tertiary.


 ********** From: [email protected] **********

 use order()

 e.g. df[order(df$a,df$b,df$c),]


 ===============================================================================
	*******************************************************************
	* Solutions to a question I posted on the R-news mailing list *
	* about simple table manipulation *
	* Campbell Webb post in R-help *
	*******************************************************************

	THE QUESTION:

	Hi all,

	I am fairly new to S-plus (actually R, on linux), and am continually
	frustrated at how hard it seems to be to do the most basic table/matrix
	manipulations. So, if anyone has some spare time for a challenge, I pose
	the following problem...

	Of several canned stats packages, I have found the table manipulation
	options in JMP to be the most helpful/intuitive. The options there
	are Group, Split, Stack, Join, Subset, Transpose, Sort. What would be
	the simplest S-plus/R code to do the same? Some of these are quite easy
	but I leave them in for completeness. A full solution to this will be an
	invaluable resource - apologies/request for directions if it already
	exists!

	In detail, with examples:

	1) GROUP: Turn this table(A):

	A a 1
	A a 3
	A a 1
	A b 1
	A b 2
	B c 2
	B d 1

	into this (B):

	Group1 Group2 Nrows SumOfCol3
	A a 3 5
	A b 2 3
	B c 1 2
	B d 1 1


	2) SPLIT: Turn col3 of the second table (B) into this (C):

	a b c d
	A 3 2 . .
	B . . 1 1


	3) STACK: Turn the above table (C) into this (D):

	A a 3
	A b 2
	A c .
	A d .
	B a .
	B b .
	B c 1
	B d 1


	4) JOIN these tables (E, F):

	A 1
	B 2
	C 3

	A a
	B a
	B a
	B b
	C c
	A b

	to give (G):

	A a 1
	B a 2
	B a 2
	B b 2
	C c 3
	A b 1


	5) SUBSET Table G to give H:

	A a 1
	A b 1


	6) TRANSPOSE H to give:

	A A
	a b
	1 1

	7) and finally SORT, by up to 3 keys.


	Solutions referring to data frames would be most helpful. A big thanks to
	anyone who responds!

	Cam

	This message was distributed by [email protected].


	THE ANSWERS:

	1111111111111111111111111111111111111111111111111111111111111111111111111111111

	******** From: Bill Dunlap <[email protected]> ********

	> 1) GROUP: Turn this table(A):
	>
	> A a 1
	> A a 3
	> A a 1
	> A b 1
	> A b 2
	> B c 2
	> B d 1
	>
	> into this (B):
	>
	> Group1 Group2 Nrows SumOfCol3
	> A a 3 5
	> A b 2 3
	> B c 1 2
	> B d 1 1

	You can use aggregate(), but you only get 1 new column for each call
	to aggregate(). E.g.,

	> aggregate(A$Number, by=A[,c("Group1","Group2")], FUN=sum)

	Group1 Group2 x
	1 A a 5
	2 A b 3
	3 B c 2
	4 B d 1

	> aggregate(A$Number, by=A[,c("Group1","Group2")], FUN=length)

	Group1 Group2 x
	1 A a 3
	2 A b 2
	3 B c 1
	4 B d 1

	Use, e.g., cbind() to paste together the columns of interest:

	> B <- aggregate(A$Number, by=A[,c("Group1","Group2")], FUN=length)
	> names(B)[3] <- "Nrows"
	> B <- cbind(B,SumOfCol3=aggregate(A$Number, by=A[,c("Group1","Group2")], FUN=sum)$x)
	> B

	Group1 Group2 Nrows SumOfCol3
	1 A a 3 5
	2 A b 2 3
	3 B c 1 2
	4 B d 1 1

	aggregate() is a pretty general purpose function and there exist versions
	which are optimized for FUN=sum or length in slightly restricted circumstances.
	They are much faster when you are dealing with a lot of categories.
	Check in the S-news archives for aggsum or agglength.


	******** From: Ed Kademan <[email protected]> ********
	## I am doing the following in R.

	A <- data.frame(
	c1 = c('A', 'A', 'A', 'A', 'A', 'B', 'B'),
	c2 = c('a', 'a', 'a', 'b', 'b', 'c', 'd'),
	c3 = c(1, 3, 1, 1, 2, 2, 1))

	problem1 <-
	function(A)
	{
	nRows <- as.vector(table(A[1:2]))
	sumOfCol3 <- as.vector(tapply(A[[3]], A[1:2], sum))
	groups <- expand.grid(lapply(A[1:2], levels))
	dimnames(groups)[[2]][1:2] <- c('Group1', 'Group2')
	cbind(groups, nRows, sumOfCol3)[nRows > 0,]
	}

	B <- problem1(A)


	******** From: Frank Harrell <[email protected]> ********

	Some of what you want is worked out in
	http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf
	section 4.2

	- My libraries will be available in R in a few weeks


	******** From: "Patrick Connolly" <[email protected]> ********

	Check out aggregate(). It'll do most of what you want.


	******** From: [email protected] ********

	2 calls to aggregate() do this


	2222222222222222222222222222222222222222222222222222222222222222222222222222222

	******** From: Bill Dunlap <[email protected]> ********

	> 2) SPLIT: Turn col3 of the second table (B) into this (C):
	>
	> a b c d
	> A 3 2 . .
	> B . . 1 1

	The fastest way to get this result is to start with the raw data A:

	> table(A$Group1, A$Group2)
	a b c d
	A 3 2 0 0
	B 0 0 1 1

	Do you really want missing values (NA's in Splus) where table puts 0's?
	Do you really want to be able to turn B directly into C?


	******** From: Ed Kademan <[email protected]> ********

	## I am not turning B into C as you specified, but maybe this is ok.

	C <- table(A[1:2])


	******** From: [email protected] ********

	see my attached function table.trans (originally Richard Heiberger's code)
	table.trans(df[,3],df[1:2])

	> table.trans
	function(X, INDICES, ...)
	{
	len.ind <- length(INDICES)
	IN1 <- interaction(INDICES[ - len.ind], drop = T)
	IN2 <- INDICES[[len.ind]]
	a <- tapply(X, data.frame(IN1, IN2), c, ...)
	cbind(INDICES[seq(1, nrow(INDICES), length(levels(IN2))), -
	len.ind], a)
	}

	HELP FILE:
	filename="table_trans.d"

	.BG
	.FN table.trans
	.TL
	Transpose a data frame with 'by' variables
	.DN
	Transposes a data frame with SAS-style 'by' variables
	.CS
	table.trans(X, INDICES, sep="#")
	.RA
	.AG X
	The single data vector to transpose
	.AG INDICES
	All columns except the last are the 'by' variables.
	The last column is a factor whose levels are the names
	of the transposed columns.
	.OA
	.AG sep
	Delimiter used internally. Shouldn't occur as a character in any of the 'by' variables
	.RT
	A data frame with columns corresponding to 'by' variables
	and extra columns corresponding to levels of last column
	of INDICES whose value comes from X
	.SE
	none
	.DT
	Transposes all values for vector X into columns with names given by the
	levels of the last column of INDICES. Missing entries are denoted by NA.
	Usually the 'by' variables in INDICES will be factors or discrete values.
	.br
	The inverse of this function is `reshape', which folds columns into rows.
	.SH REFERENCES
	Adapted from code by Richard M Heiberger in s-news 22 Mar 1996
	.SA
	`reshape', as.data.frame.array
	.EX
	> names(dimnames(iris)) <- c("flower", "measurement", "species")
	> iris1 <- as.data.frame.array(iris, row.dims = 1:3)
	> iris1[1:10, ]
	x.1 flower measurement species
	1 5.1 1 Sepal L. Setosa
	2 4.9 2 Sepal L. Setosa
	3 4.7 3 Sepal L. Setosa
	4 4.6 4 Sepal L. Setosa
	5 5.0 5 Sepal L. Setosa
	6 5.4 6 Sepal L. Setosa
	7 4.6 7 Sepal L. Setosa
	8 5.0 8 Sepal L. Setosa
	9 4.4 9 Sepal L. Setosa
	10 4.9 10 Sepal L. Setosa
	> iris.meas <- table.trans(iris1[, 1], iris1[, c(2, 4, 3)])
	> iris.meas[1:10, ]
	flower species Sepal.L. Sepal.W. Petal.L. Petal.W.
	1 1 Setosa 5.1 3.5 1.4 0.2
	2 1 Versicolor 7.0 3.2 4.7 1.4
	3 1 Virginica 6.3 3.3 6.0 2.5
	4 2 Setosa 4.9 3.0 1.4 0.2
	5 2 Versicolor 6.4 3.2 4.5 1.5
	6 2 Virginica 5.8 2.7 5.1 1.9
	7 3 Setosa 4.7 3.2 1.3 0.2
	8 3 Versicolor 6.9 3.1 4.9 1.5
	9 3 Virginica 7.1 3.0 5.9 2.1
	10 4 Setosa 4.6 3.1 1.5 0.2
	> samp <- sample(nrow(iris1), 20)
	> iris.meas <- table.trans(iris1[ - samp, 1], iris1[ - samp, c(2, 3, 4)])
	> iris.meas[1:10, ]
	flower measurement Setosa Versicolor Virginica
	1 1 Sepal L. 5.1 7.0 6.3
	2 1 Sepal W. 3.5 3.2 3.3
	3 1 Petal L. 1.4 4.7 6.0
	4 1 Petal W. 0.2 1.4 2.5
	5 2 Sepal L. 4.9 6.4 5.8
	6 2 Sepal W. 3.0 3.2 2.7
	7 2 Petal L. 1.4 4.5 5.1
	8 2 Petal W. 0.2 1.5 1.9
	9 3 Sepal L. 4.7 6.9 7.1
	10 3 Sepal W. 3.2 3.1 3.0

	.KW misc
	.WR


	3333333333333333333333333333333333333333333333333333333333333333333333333333333

	******** From: Bill Dunlap <[email protected]> ********

	> 3) STACK: Turn the above table (C) into this (D):
	>
	> A a 3
	> A b 2
	> A c .
	> A d .
	> B a .
	> B b .
	> B c 1
	> B d 1

	I think an equivalent result is
	> as.data.frame.array(table(A$Group1, A$Group2), row.dims=1:2)
	X Dim1 Dim2
	1 3 A a
	2 0 B a
	3 2 A b
	4 0 B b
	5 0 A c
	6 1 B c
	7 0 A d
	8 1 B d
	See the sorting section on how to reorder it to your taste.
	The GUI has a pretty flexible stacking function.


	******** From: Ed Kademan <[email protected]> ********

	problem3 <- function(C)
	{
	C <- t(C)
	cbind(expand.grid(dimnames(C))[2:1], as.vector(C))
	}

	D <- problem3(C)


	******** From: [email protected] ********

	see my attached function reshape()
	reshape(df,col.unfol=1:4)

	> reshape
	function(obj, col.copy = NULL, col.unfold = 1:dim(obj)[2], label = "label",
	value = "value")
	{
	n <- dim(obj)[1]
	p <- length(col.unfold)
	res <- vector("list", length(col.copy) + 2)
	names(res) <- c(if(is.null(col.copy)) NULL else dimnames(obj[,
	col.copy, drop = F])[[2]], label, value)
	for(i in seq(along = col.copy)) {
	res[[i]] <- rep(obj[, col.copy[i]], p)
	}
	res[[label]] <- rep(dimnames(obj[, col.unfold, drop = F])[[2]],
	rep(n, p))
	res[[value]] <- as.vector(data.matrix(obj[, col.unfold, drop = F]))
	as.data.frame(res)
	}

	HELP FILE:
	filename="reshape.d"

	.BG
	.FN reshape
	.TL
	Reshape, unfold or transpose a data frame
	.DN
	Reshape, unfold or transpose a data frame.
	.CS
	reshape(obj, col.copy=NULL, col.unfold=1:dim(obj)[2],
	label="label", value="value")
	.RA
	.AG obj
	the data frame to reshape
	.OA
	.AG col.copy
	The columns of `obj' to copy to result
	.AG col.unfold
	The columns of `obj' to be unfolded to rows
	.AG label
	the name of the label column
	.AG value
	the name of the value column
	.RT
	A data frame consisting of the copied columns replicated p times
	(where p is number of unfolded columns) and two extra columns:
	label contains the name of the unfolded column
	value contains the value
	.SE
	none
	.DT
	see code below. The inverse of this function is table.trans.
	.SH WARNINGS
	the data in the unfolded columns should all have the same mode
	.SA
	`table.trans', as.data.frame.array
	.EX
	> air[1:5, ]
	ozone radiation temperature wind
	1 3.448217 190 67 7.4
	2 3.301927 118 72 8.0
	3 2.289428 149 74 12.6
	4 2.620741 313 62 11.5
	5 2.843867 299 65 8.6
	> reshape(air[1:5,],2,c(1,3,4),"attribute")
	radiation attribute value
	1 190 ozone 3.448217
	2 118 ozone 3.301927
	3 149 ozone 2.289428
	4 313 ozone 2.620741
	5 299 ozone 2.843867
	6 190 temperature 67.000000
	7 118 temperature 72.000000
	8 149 temperature 74.000000
	9 313 temperature 62.000000
	10 299 temperature 65.000000
	11 190 wind 7.400000
	12 118 wind 8.000000
	13 149 wind 12.600000
	14 313 wind 11.500000
	15 299 wind 8.600000
	> reshape(air[1:5,],label="attribute")
	attribute value
	1 ozone 3.448217
	2 ozone 3.301927
	3 ozone 2.289428
	4 ozone 2.620741
	5 ozone 2.843867
	6 radiation 190.000000
	7 radiation 118.000000
	8 radiation 149.000000
	9 radiation 313.000000
	10 radiation 299.000000
	11 temperature 67.000000
	12 temperature 72.000000
	13 temperature 74.000000
	14 temperature 62.000000
	15 temperature 65.000000
	16 wind 7.400000
	17 wind 8.000000
	18 wind 12.600000
	19 wind 11.500000
	20 wind 8.600000
	.KW misc
	.WR


	4444444444444444444444444444444444444444444444444444444444444444444444444444444

	******** From: Bill Dunlap <[email protected]> ********

	> 4) JOIN these tables (E, F):
	>
	> A 1
	> B 2
	> C 3
	>
	> A a
	> B a
	> B a
	> B b
	> C c
	> A b
	>
	> to give (G):
	>
	> A a 1
	> B a 2
	> B a 2
	> B b 2
	> C c 3
	> A b 1

	Use merge():

	> E<-data.frame(Group1=c("A","B","C"), Number=1:3)
	> # F is a reserved word, so I call it FF
	> FF <- data.frame(Group1=c("A","B","B","B","C","A"), Group2=c("a","a","a","b","c","b"))
	> G <- merge(FF, E)
	> G
	Group1 Group2 Number
	1 A a 1
	2 A b 1
	3 B a 2
	4 B a 2
	5 B b 2
	6 C c 3

	You get a different order but I think it is an equivalent table.


	******** From: Ed Kademan <[email protected]> ********

	E <- data.frame(c1 = c('A', 'B', 'C'), c2 = 1:3)
	F <- data.frame(c1 = c('A', 'B', 'B', 'B', 'C', 'A'),
	c2 = c('a', 'a', 'a', 'b', 'c', 'b'))
	## F is a very bad name for a variable in R, by the way.

	G <- cbind(F, E[[2]][match(F[[1]], E[[1]])])


	******** From: [email protected] ********

	merge(df1,df2) # if fields have common name


	5555555555555555555555555555555555555555555555555555555555555555555555555555555

	******** From: Bill Dunlap <[email protected]> ********

	> 5) SUBSET Table G to give H:
	>
	> A a 1
	> A b 1

	You didn't say what subset this represents. To get the first 2 rows
	use G[1:2,]. To get the rows with Number==1 use G[ G$Number==1,].
	To get the rows with Group1=="A" use G[ G$Group1=="A",]. To get the
	rows with Group1=="A" and Group2 != "c" use G[G$Group1=="A" & G$Group2!="c",].


	******** From: Ed Kademan <[email protected]> ********

	H <- G[G[[1]] == 'A',]
	## I am assuming you just want those rows whose first column contains
	## 'A.'


	******** From: [email protected] ********

	df[df[,1] == "A",]
	also see help(subscript)


	6666666666666666666666666666666666666666666666666666666666666666666666666666666

	******** From: Bill Dunlap <[email protected]> ********

	> 6) TRANSPOSE H to give:
	>
	> A A
	> a b
	> 1 1

	Transpose is harder because data frames must have only one type of data
	per column. Our transpose function converts the data frame into a
	matrix (which has one type for the whole matrix) and returns the transpose
	of that matrix. You would have to convert that back to a data frame:

	> data.frame(t(H))
	X1 X2
	Group1 A A
	Group2 a b
	Number 1 1


	******** From: Ed Kademan <[email protected]> ********

	transposeH <- t(H)


	******** From: [email protected] ********

	t(df)


	7777777777777777777777777777777777777777777777777777777777777777777777777777777

	******** From: Bill Dunlap <[email protected]> ********

	> 7) and finally SORT, by up to 3 keys.

	Use order() and subscripting. E.g.,
	> G[ order(G$Group1, G$Group2, G$Number), ]
	Group1 Group2 Number
	1 A a 1
	2 A b 1
	3 B a 2
	4 B a 2
	5 B b 2
	6 C c 3
	> G[ order(G$Number, G$Group2), ]
	Group1 Group2 Number
	1 A a 1
	2 A b 1
	3 B a 2
	4 B a 2
	5 B b 2
	6 C c 3


	******** From: Ed Kademan <[email protected]> ********

	dfr <- as.data.frame(matrix(sample(5, 120, replace = T), 30, 4))
	sortedDfr <- dfr[do.call('order', dfr[1:3]),]
	## The first column is the primary key, the second is the secondary
	## key, and the third is tertiary.


	******** From: [email protected] ********

	use order()

	e.g. df[order(df$a,df$b,df$c),]


	===============================================================================
No results found