About R

Semantics:
- assignment:
  - Assigning a variable to another name, e.g. a = b, does not create a new object: both names are bound to the same data. Thanks to copy-on-modify semantics, a copy is made only when one of them is later modified.
- In order to xor booleans, use xor(a, b).
- remainder and quotient
  - %% for remainder and %/% for integer quotient.
- To access a list nested inside a list, use [[index]]; [index] returns a one-element list instead.
- To get a column of a data.frame or data.table as a vector, use df[[one_list_index]].
- slicing:
  - Slicing happens when you [] a container (vector, list, etc.) using more than one index, generated by seq, : or c(). The indices can be integers or characters.
  - When slicing a list, a shallow copy of the subset of the original container is created. That is, a new list is created, but its elements are just references to the originals, with copy-on-modify semantics. See here for more.
  - Positive integer slicing
    - When slicing using positive integer(s), only the elements specified by the integers will be in the new subset.
  - Negative integer slicing
    - This works the opposite way to positive integer slicing: the elements specified by the integers are excluded from the subset. See here for more.
- subset(x, select) function
  - subset can be used to remove a column easily:
```
subset(df, select = -column_name_to_remove) # column_name_to_remove is the bare column name, not a character string
```
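As a rough illustration of the assignment, arithmetic, indexing, and slicing rules above, the following sketch uses only base R. All variable names (`b`, `a`, `nested`, `df`, `v`) and the sample data are made up for the example.

```r
# Copy-on-modify: `a` and `b` share data until one of them is modified.
b <- c(1, 2, 3)
a <- b
a[1] <- 10                                # only now does `a` get its own copy
b_unchanged <- identical(b, c(1, 2, 3))   # `b` was not affected

# Remainder and integer quotient.
r <- 7 %% 3
q <- 7 %/% 3

# [[ ]] extracts one element; [ ] returns a container of the same kind.
nested <- list(inner = list(x = 1, y = "two"), z = 3)
inner_y <- nested[["inner"]][["y"]]       # the string stored in the inner list
sub_list <- nested["z"]                   # still a list, of length 1

# A data frame column as a plain vector.
df <- data.frame(id = 1:3, score = c(90, 80, 70))
score_vec <- df[["score"]]

# Positive indices keep elements, negative indices drop them.
v <- c(10, 20, 30, 40, 50)
kept <- v[c(1, 3)]
dropped <- v[-c(1, 3)]

# subset() with a negated bare column name removes that column.
df_no_score <- subset(df, select = -score)
```

Printing `kept` and `dropped` side by side is a quick way to see that negative indices select the complement of the positive ones.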
- Compare an array/data frame with a single value and generate a logical array/data frame of the same dim
  - Each element is compared with the value, and the result can be indexed the same way as the original array/data frame. E.g. v == value or dataframe$column_name == value.
- Count TRUEs
  - which(x), where x is a logical vector/array, returns an integer vector of the indices of the TRUEs; its length equals sum(x), i.e. the number of TRUEs.
  - sum(x) gives the count directly, since TRUE counts as 1 and FALSE as 0.
  - It seems that sum(bools) is faster than length(which(bools)) when bools is considerably long.
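The comparison and counting idioms above could look like this in practice. This is a small sketch in base R; the vector `v` and the data frame `df` are invented sample data.

```r
v <- c(3, 7, 3, 9, 3)

is_three <- v == 3               # logical vector with the same length as v
positions <- which(is_three)     # integer indices of the TRUE elements
n_true <- sum(is_three)          # TRUE counts as 1, FALSE as 0
same_count <- length(positions) == n_true

# The logical result can be used directly as an index.
df <- data.frame(name = c("a", "b", "c"), score = c(3, 7, 3))
rows_with_three <- df[df$score == 3, ]
```

Whether `sum()` really beats `length(which())` on a given input is best checked by timing both on data of realistic size.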
- Def function:
```
name_of_function = function(arg1, arg2 = 1) { # Arguments can have default values
    # expr
    # An explicit return is not always necessary: the value of the last evaluated
    # expression is returned automatically by R.
    return(expr) # return() with no argument returns NULL. expr can even be a function
}
```
  - To be precise, this defines a lambda (an anonymous function) and binds it to a name, rather than declaring a named function as in other languages.
  - Here, the function is stored in a variable. function can also be used inside the body of another function.
  - It is also worth noting that a function can access variables defined in the environment where the function itself is defined.
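The points about default arguments, implicit return values, and access to the defining environment can be sketched as follows. The function names `add`, `make_stepper`, `add_five`, and `nothing` are hypothetical and exist only for this example.

```r
# Default argument and implicit return of the last evaluated expression.
add <- function(x, y = 1) {
  x + y
}

# A function that returns a function. The inner function reads `step`
# from the environment in which it was defined.
make_stepper <- function(step) {
  function(x) x + step
}
add_five <- make_stepper(5)

# return() with no argument yields NULL.
nothing <- function() {
  return()
}
```

Calling `add(2)` uses the default for `y`, while `add_five(10)` shows that `step` is still reachable after `make_stepper` has returned.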
- stop:
  - stop is a function that takes a message (or a condition object). It stops the execution of the current expression and executes an error action.
- for loop:
```
    for (each in collections) { # collections can be a vector, list, data frame, matrix, etc.
        expr
    }
```
  - "Speeding up your R code - vectorisation tricks for beginners" shows that loops are expensive on large data compared to the apply function family written in R, and that external calls to C functions are even quicker.
  - However, this is not always true, so it is better to benchmark and understand what happens under the hood in order to use them correctly.
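One way to run such a benchmark is sketched below, assuming a sum of squares as the workload (the workload and all variable names are invented for illustration). It uses only base R's `system.time`; timings depend on the machine and R version, so none are shown here.

```r
x <- as.numeric(1:10000)

# Explicit loop.
loop_total <- 0
for (value in x) {
  loop_total <- loop_total + value^2
}

# apply family: sapply calls the R function once per element.
sapply_total <- sum(sapply(x, function(value) value^2))

# Fully vectorised: the arithmetic runs in compiled code.
vector_total <- sum(x^2)

# Time each variant; compare the "elapsed" entries of the results.
loop_time <- system.time({
  total <- 0
  for (value in x) total <- total + value^2
})
sapply_time <- system.time(sum(sapply(x, function(value) value^2)))
vector_time <- system.time(sum(x^2))
```

All three variants compute the same total, so any difference in the timings comes from how the work is dispatched rather than from what is computed.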
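- while, if, else work just like in C
- switch:
  - switch in R is like a function: switch(VALUE, COND1_ret_value, ...). If VALUE is a number, the alternative at that position is returned; if it is a character string, the alternative whose name matches is returned.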
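A minimal sketch of switch together with stop follows, using base R only. The function `describe` and the shape names are hypothetical. With a character VALUE, an unnamed final argument acts as the default, and only the selected alternative is evaluated.

```r
describe <- function(kind) {
  switch(kind,
    circle = "round",
    square = "four equal sides",
    stop("unknown shape: ", kind)   # unnamed last argument is the default
  )
}

# A numeric VALUE selects an alternative by position.
second <- switch(2, "first", "second", "third")

# tryCatch intercepts the error signalled by stop().
msg <- tryCatch(
  describe("blob"),
  error = function(e) conditionMessage(e)
)
```

Because the `stop` call sits in the default position, it only runs when none of the named alternatives match.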
Builtin data structures:
- vector and list
  - vector
    - vector is a homogeneous container. Since there is only one type of element, the elements are stored contiguously. vector also has lower memory consumption compared to list if length is not too large.
    - vector(mode = "logical", length = 0) constructs a vector holding length elements of type mode. For how elements are allocated, see help(vector).
    - c(...) can be used to initialize a vector. It can also combine vectors and new elements of the same type into one flat vector (not a vector of vectors).
  - list
    - list is a heterogeneous container, so it stores each element by storing a pointer to it. It is very useful since you can make a list of lists using list(...).
    - c(...) can be used to combine list and any other type of new elements together into one list (not list of list).
    - To make list of list, you need to use list(...) to combine lists.
  - To append to a list or vector, you can use list.append(.data, ...) from package rlist, where .data is the container and ... is the elements. Base R's append(x, values) and c(x, values) also work.
  - Insert: use list.insert(.data, index, ...) from rlist.
  - push_front: use list.prepend(.data, ...) from rlist.
- vector of logical
  - For element-wise AND, OR and NOT on logical vectors, use &, | and !. && and || are meant for single values.
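The container behaviour described above can be tried out with the sketch below. To keep it runnable without extra packages, it swaps in base R's `append()` and `c()` for the rlist helpers; that substitution is a choice made for this example, not the method the notes prescribe. All variable names are made up.

```r
# c() on atomic values gives one flat vector; mixed types are coerced.
nums <- c(1, 2, c(3, 4))
mixed <- c(1, "a")                               # both become character

# vector() pre-allocates a vector of the given mode and length.
flags <- vector(mode = "logical", length = 3)

# For lists, c() flattens one level while list() nests.
flat <- c(list(1, "a"), list(TRUE))              # one list of 3 elements
nested <- list(list(1, "a"), list(TRUE))         # a list holding 2 lists

# Base R counterparts of append, insert and prepend.
appended <- append(nums, 5)                      # add at the end
inserted <- append(nums, 99, after = 1)          # insert after position 1
prepended <- c(0, nums)                          # add at the front

# Element-wise logic on logical vectors.
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)
both <- p & q
either <- p | q
negated <- !p
```

Comparing `length(flat)` with `length(nested)` makes the difference between `c()` and `list()` on lists easy to see.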
Builtin functions:
- help(x), ?x
  - Provide the manual page about x.
- ??x
  - Search the installed documentation for x (shorthand for help.search).
- object.size(x)
  - Get the size of an object.
- rm(x)
  - Delete the name x. The memory is released by the garbage collector once no other name refers to the same object (names can share an object due to copy-on-modify semantics).
- gc()
  - Do garbage collection immediately. It can be useful to call after a large object has been removed, to return memory to the operating system. GC happens automatically without any user intervention, so normally a call to gc() isn't necessary, and it can hurt performance if called after the removal of every object. For more, see help(gc) and help(gctorture).
- help(Memory):
  - Documents how objects are allocated in R.
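The memory-related functions above might be combined as in this sketch. The object names `big_vec` and `alias_vec` are invented, and the exact byte count reported by `object.size` can vary slightly between platforms.

```r
big_vec <- numeric(1e6)                       # one million doubles, roughly 8 MB
size_bytes <- as.numeric(object.size(big_vec))
size_label <- format(object.size(big_vec), units = "Mb")

alias_vec <- big_vec                          # second name bound to the same data
rm(big_vec)                                   # removes only the name big_vec
still_there <- exists("alias_vec", inherits = FALSE) &&
  !exists("big_vec", inherits = FALSE)

rm(alias_vec)                                 # now no name refers to the data
invisible(gc())                               # usually unnecessary; shown for completeness
```

After the first `rm`, the data is still reachable through `alias_vec`, which is why it cannot be collected until the second `rm`.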
Making packages
1. Write a DESCRIPTION file at the root of the project:
```
Package: Helloworld
Title: What The Package Does (one line, title case required)
Version: 0.1
Authors@R: person("First", "Last", email = "[email protected]",
                  role = c("aut", "cre"))
Description: What the package does (one paragraph)
Depends: R (>= 3.1.0)
License: What license is it under?
LazyData: true
ByteCompile: true
RoxygenNote: 6.1.1
```
2. Put code into root_of_pack/R/*.R.
3. Then run roxygenise() from package roxygen2 with the current working directory at the root of the project, or roxygenise(root_of_project).
The info above is from Creating R packages, the byte compiler and from running vignette("roxygen2", package = "roxygen2") (it does not need library("roxygen2") to work).
4. Then run R CMD check --check-subdirs=yes root_of_pack and fix any errors.
5. Then run R CMD build root_of_pack to generate a *.tar.gz.
6. Run R CMD check --check-subdirs=yes *.tar.gz where *.tar.gz is generated by the previous step.
7. Run R CMD INSTALL *.tar.gz to install the package.
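For the steps that put code into R/*.R and run roxygenise(), a source file might look like the sketch below. The file name (for instance R/hello.R) and the function `hello` are hypothetical. The `#'` comments are roxygen2 blocks, from which roxygenise() generates the help page and the export entry.

```r
#' Say hello
#'
#' @param name Character string naming who to greet.
#' @return A greeting as a single character string.
#' @export
#' @examples
#' hello("world")
hello <- function(name = "world") {
  paste0("Hello, ", name, "!")
}
```

Once the package is installed, `?hello` should show the page built from these comments.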
For more info on packages, check here.

NobodyXu/R.md