- `rbindlist` from `data.table` is very efficient at binding a list of data frames row-wise into one.
- According to here, the most efficient way to remove a column is to use `set` from `data.table`:

      library("data.table")
      # set from data.table
      set(my_df, j = "A", value = NULL)
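A minimal sketch of both points, assuming the `data.table` package is installed. The tables `parts` and `my_df` are made-up example data, not from the original notes.

```r
library(data.table)

# Hypothetical example data: a list of small tables to stack row-wise.
parts <- list(
  data.table(A = 1:2, B = c("x", "y")),
  data.table(A = 3:4, B = c("z", "w"))
)
combined <- rbindlist(parts)       # one data.table with 4 rows

# Work on a copy so that `combined` keeps both columns.
my_df <- copy(combined)
set(my_df, j = "A", value = NULL)  # removes column "A" by reference
names(my_df)
```

Note that `set` modifies `my_df` in place, which is why the sketch copies the table first.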
- Semantics:
  - Assignment:
    - When assigning a variable to another name, e.g. `a = b`, no data is copied: both names refer to the same object until one of them is modified, thanks to the copy-on-modify semantics.
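The effect of copy-on-modify can be seen in a small sketch (the variable names are only for illustration): changing one name never changes the other.

```r
b <- c(1, 2, 3)
a <- b        # no data is copied here; a and b name the same vector

a[1] <- 10    # modifying a triggers the copy; b is left untouched

b             # 1 2 3
a             # 10 2 3
```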
  - To `xor` booleans, use `xor(a, b)`.
  - Remainder and quotient: `%%` for the remainder and `%/%` for the quotient.
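A short sketch of these operators. The negative-dividend case is included because the result takes the sign of the divisor, which differs from C.

```r
x1 <- xor(TRUE, FALSE)   # TRUE
x2 <- xor(TRUE, TRUE)    # FALSE

r  <- 7 %% 3             # 1 (remainder)
q  <- 7 %/% 3            # 2 (quotient)

# With a negative dividend the remainder takes the sign of the divisor.
rn <- (-7) %% 3          # 2
qn <- (-7) %/% 3         # -3
```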
  - To access a `list` inside a `list`, `[[index]]` must be used.
  - To return a vector from a `data.frame` or `data.table`, `df[[one_list_index]]` must be used.
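The difference between `[` and `[[` can be sketched as follows, using a made-up nested list and data frame.

```r
nested <- list(inner = list(x = 1, y = "a"), other = 2)

wrapped <- nested["inner"]     # still a list, of length 1, wrapping the inner list
inner   <- nested[["inner"]]   # the inner list itself
inner$x                        # 1

df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
one_col <- df["score"]         # still a data frame with one column
scores  <- df[["score"]]       # a plain numeric vector
```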
  - Slicing:
    - Slicing happens when you index (`[]`) a container (`vector`, `list`, etc.) with more than one index, generated by `seq`, `:` or `c()`. The indices can be integers or characters.
    - When slicing a list, a shallow copy of the subset of the original container is created. That is, a new list is created, but its elements are just references to the original elements, with the copy-on-modify semantics. See here for more.
    - Positive integer slicing
      - When slicing with positive integer(s), only the elements specified by the integers are in the new subset.
    - Negative integer slicing
      - This works the opposite way to positive integer slicing: the elements specified by the integers are excluded from the subset. See here for more.
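A minimal sketch of the slicing variants described above, on a made-up named vector and list.

```r
v <- c(a = 10, b = 20, c = 30, d = 40)

pos   <- v[c(1, 3)]       # positive indices: keep elements 1 and 3
rng   <- v[2:4]           # a range built with `:`
named <- v[c("a", "d")]   # character indices select by name
neg   <- v[-c(1, 3)]      # negative indices: drop elements 1 and 3

lst <- list(1, "two", c(3, 3, 3))
sub <- lst[2:3]           # slicing a list gives a new, shorter list
```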
  - `subset(x, select)` function
    - The `subset` function can be used to remove a column easily:

          subset(df, select = -column_name_to_remove)
          # "column_name_to_remove" is not a character string, it is just the bare name
  - Compare an array/data frame with a single value to generate a logical array/data frame of the same dimensions
    - Each element is compared with the value, and the result can be indexed in the same way as the original array/data frame. E.g. `v == value` or `dataframe$column_name == value`.
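A small sketch of element-wise comparison, with made-up data, showing that the logical result has the same shape and can be used as an index:

```r
v <- c(3, 7, 3, 9)
is_three <- v == 3            # TRUE FALSE TRUE FALSE
threes   <- v[is_three]       # use the logical result as an index

m <- matrix(1:6, nrow = 2)
big <- m > 2                  # logical matrix with the same dim as m

df <- data.frame(name = c("x", "y", "z"), n = c(1, 5, 1))
matched <- df[df$n == 1, ]    # rows where the column equals the value
```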
  - Count `TRUE`s
    - `which(x)`, where `x` is a logical vector/array, returns an integer vector whose `length` equals `sum(x)`, i.e. the number of `TRUE`s. `sum(x)` can therefore do the counting directly, without `which`.
    - It seems that `sum(bools)` is faster than `length(which(bools))` when `bools` is considerably long.
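Both ways of counting can be compared on a short made-up vector. This sketch only checks that they agree and makes no timing claim.

```r
bools <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

positions <- which(bools)          # positions of the TRUEs: 1 3 4
n_which   <- length(positions)     # 3
n_sum     <- sum(bools)            # 3, since TRUE counts as 1 and FALSE as 0
```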
  - Define a function:

        name_of_function = function(arg1, arg2 = 1) { # Arguments can have default values
            # expr
            # The return statement is not always necessary: the value of the last
            # evaluated expression is returned automatically by R.
            return (expr) # If expr is omitted, NULL is returned. expr can even be a function
        }

    - To be precise, I will call this the definition of a `lambda` rather than of a normal function.
    - Here, the function is stored in a variable. `function` can also be used inside the body of another `function`.
    - It is also worth noting that a function can access the variables defined in the environment where the function itself is defined.
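The points above can be sketched with two small functions. `scale_by` and `make_stepper` are hypothetical names chosen for illustration.

```r
# Default argument and implicit return of the last expression.
scale_by <- function(x, factor = 2) {
  x * factor
}

# A function defined inside another function. The inner function can
# still see `step`, which lives in the environment where it was defined.
make_stepper <- function(step) {
  function(x) x + step
}

add_five <- make_stepper(5)

scale_by(3)       # 6
scale_by(3, 10)   # 30
add_five(1)       # 6
```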
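  - `stop`: `stop` is a function, not a class. It is called with a message and, because R evaluates arguments lazily, a call such as `stop("msg")` can be passed as a function argument and only fires when that argument is evaluated. It stops the execution of the current expression and executes an error action.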
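A minimal sketch of `stop` in use. `safe_sqrt` and `pick` are hypothetical helpers. `tryCatch` supplies the error action, and the last call relies on R evaluating arguments lazily.

```r
safe_sqrt <- function(x) {
  if (x < 0) stop("x must be non-negative")
  sqrt(x)
}

# The error handler receives the condition and returns its message.
result <- tryCatch(safe_sqrt(-1), error = function(e) conditionMessage(e))

# The stop() call is passed as an argument but never evaluated,
# because `fallback` is not used when `cond` is TRUE.
pick <- function(cond, fallback) if (cond) 1 else fallback
picked <- pick(TRUE, stop("never evaluated"))
```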
  - `for` loop:

        for (each in collections) { # collections can be a vector, list, data frame, matrix, etc.
            expr
        }

    - Speeding up your R code - vectorisation tricks for beginners shows that loops are expensive on large data compared to the `apply` function family written in `R`, and that external calls to `C` functions are even quicker.
    - However, this is not always true, so it is better to benchmark and to understand what happens under the hood in order to use them correctly.
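The three styles mentioned above can be sketched side by side on the same made-up task (squaring numbers). Timings vary by machine and R version, so the sketch only shows how one might measure them with `system.time`.

```r
x <- 1:1000

# 1. Explicit loop over a preallocated result.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# 2. apply family.
squares_apply <- sapply(x, function(v) v^2)

# 3. Vectorised arithmetic (the loop runs in compiled code inside R).
squares_vec <- x^2

# Rough timing of one variant; compare the variants on your own data.
timing <- system.time(sapply(x, function(v) v^2))
```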
  - `while`, `if` and `else` work just like in `C`.
  - `switch`: `switch` in `R` is like a function: `switch(VALUE, COND1_ret_value, ...)`.
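A short sketch of `switch` with a character and a numeric `VALUE`, followed by a `while`/`if`/`else` loop. `describe` is a hypothetical helper.

```r
describe <- function(kind) {
  switch(kind,
    cat = "meows",
    dog = "barks",
    "unknown"        # an unnamed last argument acts as the default
  )
}

dog_sound  <- describe("dog")                        # "barks"
fish_sound <- describe("fish")                       # "unknown"
second     <- switch(2, "first", "second", "third")  # picks by position

# while / if / else read much like their C counterparts.
n <- 0
while (n < 3) {
  if (n %% 2 == 0) {
    n <- n + 1
  } else {
    n <- n + 2
  }
}
```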
- Builtin data structures: `vector` and `list`
  - `vector`
    - `vector` is a homogeneous container. Since there is only one type of element, the elements are stored contiguously. `vector` also has lower memory consumption compared to `list` if `length` is not too large.
    - `vector(mode = "logical", length = 0)` is used to construct a `vector` of the given `length` storing elements of type `mode`. For how elements are allocated, see `help(vector)`.
    - `c(...)` can be used to initialize a `vector`. It can also be used to combine vectors and new elements of the same type into one `vector` (not a `vector` of `vector`s).
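A minimal sketch of constructing and combining vectors. The last line is an added illustration of what happens when types are mixed.

```r
flags <- vector(mode = "logical", length = 3)   # FALSE FALSE FALSE
nums  <- vector(mode = "numeric", length = 2)   # 0 0

combined <- c(c(1, 2), 3, c(4, 5))   # flattened into one vector of length 5

# Mixing types does not give a heterogeneous vector: values are coerced.
mixed <- c(1, "a")                   # both elements become character
```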
  - `list`
    - `list` is a heterogeneous container, so it stores each element by storing a pointer to it. It is very useful since you can make a `list` of `list`s using `list(...)`.
    - `c(...)` can be used to combine a `list` and any other type of new elements into one `list` (not a `list` of `list`s).
      - To make a `list` of `list`s, you need to use `list(...)` to combine the `list`s.
    - To append to a `list` or `vector`, you can use `list.append(.data, ...)` from the package `rlist`, where `.data` is the container and `...` is the elements.
    - Insert: use `list.insert(.data, index, ...)` from `rlist`.
    - `push_front`: use `list.prepend(.data, ...)` from `rlist`.
  - `vector` of `logical`
    - To perform an element-wise `&&`, `||` or `!` on a `vector` of `logical`, use `&`, `|` or `!`.
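A short sketch of the element-wise logical operators on two made-up vectors:

```r
p <- c(TRUE, TRUE, FALSE)
q <- c(TRUE, FALSE, FALSE)

both   <- p & q   # TRUE FALSE FALSE
either <- p | q   # TRUE TRUE FALSE
not_p  <- !p      # FALSE FALSE TRUE
```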
- Builtin functions:
  - `help(x)`, `?x`, `??x`
    - Provide the manual page about `x` (`??x` searches the help system for `x`).
  - `object.size(x)`
    - Get the size of an object.
  - `rm(x)`
    - Delete the name `x` and release its memory if no other names refer to it (due to copy-on-modify semantics).
  - `gc()`
    - Do garbage collection immediately. It can be useful to call after a large object has been removed, to return memory to the operating system. GC happens automatically without any user intervention, so normally a call to `gc()` isn't necessary, and it can hurt performance if called after the removal of every object. For more, see `help(gc)` and `help(gctorture)`.
  - `help(Memory)`:
    - Documents how objects are allocated in `R`.
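A minimal sketch of the memory-related functions. `big_vec` is a made-up object, and the explicit `gc()` call is shown only for illustration, since it is rarely needed.

```r
big_vec <- numeric(1e6)         # one million doubles, about 8 MB
size <- object.size(big_vec)
format(size, units = "MB")

rm(big_vec)                     # drop the name
still_there <- exists("big_vec")
invisible(gc())                 # usually unnecessary; R collects garbage on its own
```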
- Making packages
  - Write a `DESCRIPTION` file at the root of the project:

        Package: Helloworld
        Title: What The Package Does (one line, title case required)
        Version: 0.1
        Author: person("First", "Last", email = "[email protected]",
        Maintainer:
        Description: What the package does (one paragraph)
        Depends: R (>= 3.1.0)
        License: What license is it under?
        LazyData: true
        ByteCompile: true
        RoxygenNote: 6.1.1

  - Put code into `root_of_pack/R/*.R`.
  - Then run `roxygenise()` from the package `roxygen2` with the current working directory at the root of the project, or run `roxygenise(root_of_project)`.

    The info above is from Creating R packages, the byte compiler and from running `vignette("roxygen2", package = "roxygen2")` (it does not need `library("roxygen2")` to work).
  - Then run `R CMD check --check-subdirs=yes root_of_pack` and fix any errors.
  - Then run `R CMD build root_of_pack` to generate a `*.tar.gz`.
  - Run `R CMD check --check-subdirs=yes *.tar.gz`, where `*.tar.gz` is the file generated by the previous step.
  - Run `R CMD INSTALL *.tar.gz` to install the package.

  For more info on packages, check here.
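As an illustration of what could go into `root_of_pack/R/*.R`, here is a sketch of a hypothetical file `R/hello.R`. The function `hello` and its documentation are made up. The `#'` lines are roxygen comments, which `roxygenise()` turns into a help page and a `NAMESPACE` export.

```r
#' Say hello
#'
#' @param name Character string naming who to greet.
#' @return A greeting as a character string.
#' @export
hello <- function(name = "world") {
  paste0("Hello, ", name, "!")
}
```

After installing the package, `hello()` would be callable once the package is loaded with `library()`.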