---
title: "data.table keys"
author: "Steph Locke"
date: "4 April 2016"
output:
  md_document:
    variant: markdown_github
---
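
# Glossary

- **KEY**: one or more columns that a data.table is sorted by and that are used for fast lookups and joins
- **COMPOSITE KEY**: a key made up of more than one column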

# Key setting

You can set keys on data.tables to facilitate joins, speed up queries, and sort your data. You can set a key as you create a data.table with `data.table()` or `setDT()`, or afterwards with the dedicated functions `setkey()` and `set2key()` (replaced by `setindex()` in more recent data.table releases).

The iris dataset will be used throughout.

```{r}
library(data.table)
head(setDT(copy(iris)))
```
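
To make the join and query benefits concrete, here is a minimal sketch. The `speciesInfo` table and its `Code` column are made up for illustration, and `Species` is converted to character only so that both tables share the same column type.

```{r}
library(data.table)

irisDT <- setDT(copy(iris))
# Assumption for this sketch: a character key keeps both tables' types identical
irisDT[, Species := as.character(Species)]
setkey(irisDT, Species)

# Keyed query: binary search on the key instead of scanning every row
setosa <- irisDT[.("setosa")]

# Hypothetical lookup table, invented for this example
speciesInfo <- data.table(
  Species = c("setosa", "versicolor", "virginica"),
  Code    = c("SE", "VE", "VI"),
  key     = "Species"
)

# Keyed join: each irisDT row picks up the matching Code from speciesInfo
joined <- speciesInfo[irisDT]
head(joined)
```

Because both tables are keyed on `Species`, neither call needs to name the column to match on.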

## At creation

You can create keys as you create data.tables.

### `data.table()`

`data.table()` has a `key=` argument that sets a key as the data.table is created. The table is sorted by the key column(s), just as it would be by `setkey()`.

```{r}
irisDT <- data.table(iris, key = "Sepal.Width")
head(irisDT)
```

### `setDT()`

Alternatively, `setDT()`, which converts a data.frame to a data.table by reference rather than copying it, also has a `key=` argument.

```{r}
irisDT <- setDT(copy(iris), key = "Sepal.Width")
head(irisDT)
```
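
Whichever constructor you use, `haskey()` and `key()` report whether a key is set and which columns it covers. A short check, where the names `fromDataTable` and `fromSetDT` are chosen just for this sketch:

```{r}
library(data.table)

fromDataTable <- data.table(iris, key = "Sepal.Width")
fromSetDT     <- setDT(copy(iris), key = "Sepal.Width")

haskey(fromDataTable)  # TRUE
key(fromSetDT)         # "Sepal.Width"

# Both routes sort by the key, so the key columns match
identical(fromDataTable$Sepal.Width, fromSetDT$Sepal.Width)
```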

## `setkey()`

`setkey()` assigns a key and physically sorts the table by it. It modifies the data.table by reference, so there is no need to assign the result.

```{r}
irisDT <- setDT(copy(iris))
setkey(irisDT, Sepal.Length)
head(irisDT)
```
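
The physical sort can be checked directly. The sketch below compares the keyed column with a sorted copy of the original, then drops the key again by passing `NULL`; the variable `sortedOK` is only a name used here.

```{r}
library(data.table)

irisDT <- setDT(copy(iris))
setkey(irisDT, Sepal.Length)   # no assignment needed: irisDT itself is changed

key(irisDT)                    # "Sepal.Length"
sortedOK <- identical(irisDT$Sepal.Length, sort(iris$Sepal.Length))

# Passing NULL drops the key; the rows stay in their sorted order
setkey(irisDT, NULL)
haskey(irisDT)                 # FALSE
```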

Passing several columns creates a composite key. Rows are sorted by the first column, then by the second within ties:

```{r}
irisDT <- setDT(copy(iris))
setkey(irisDT, Sepal.Length, Sepal.Width)
head(irisDT)
```
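
With a composite key in place, `.()` can match on all key columns or only on the leading one. A minimal sketch, with the values 5.1 and 3.5 picked arbitrarily from the data:

```{r}
library(data.table)

irisDT <- setDT(copy(iris))
setkey(irisDT, Sepal.Length, Sepal.Width)

# Match on both key columns
both <- irisDT[.(5.1, 3.5)]

# Match on the leading key column only
lengthOnly <- irisDT[.(5.1)]
```

Values in `.()` are matched to the key columns in order, so a lookup on `Sepal.Width` alone would need a different key.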

The `setkey()` function takes unquoted column names, but sometimes you may want to pass in column names dynamically. For this you can use the "v" variant `setkeyv()`, which takes a character vector:

```{r}
irisDT <- setDT(copy(iris))
# Column names held in a variable (named keyCols to avoid masking key())
keyCols <- c("Sepal.Width", "Sepal.Length")
setkeyv(irisDT, keyCols)
head(irisDT)
```
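
One place `setkeyv()` is useful is inside a function that works out the key columns at run time. The helper `keyByPrefix()` below is hypothetical and written only for this sketch:

```{r}
library(data.table)

# Hypothetical helper: key a data.table by all columns whose names start with `prefix`
keyByPrefix <- function(dt, prefix) {
  cols <- names(dt)[startsWith(names(dt), prefix)]
  setkeyv(dt, cols)   # modifies dt by reference
  invisible(dt)
}

irisDT <- setDT(copy(iris))
keyByPrefix(irisDT, "Sepal")
key(irisDT)   # "Sepal.Length" "Sepal.Width"
```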

## `setindex()`

`setindex()`, originally named `set2key()`, assigns a secondary key (an index) and **does not** perform physical sorting on the table. More recent data.table releases have dropped `set2key()`, so the examples here use `setindex()`.

```{r}
irisDT <- setDT(copy(iris))
setindex(irisDT, Sepal.Length)
head(irisDT)
```

It's possible to make a composite index:

```{r}
irisDT <- setDT(copy(iris))
setindex(irisDT, Sepal.Length, Sepal.Width)
head(irisDT)
```

The `setindex()` function takes unquoted column names, but sometimes you may want to pass in column names dynamically. For this you can use the "v" variant `setindexv()` (originally `set2keyv()`):

```{r}
irisDT <- setDT(copy(iris))
# Column names held in a variable (named keyCols to avoid masking key())
keyCols <- c("Sepal.Width", "Sepal.Length")
setindexv(irisDT, keyCols)
head(irisDT)
```
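
To confirm that a secondary key leaves the row order alone, the sketch below inspects the table afterwards. It assumes a data.table release that provides `setindex()` (the successor to `set2key()`), `indices()` and the `on=` argument; `sameOrder` and `hits` are names used only here.

```{r}
library(data.table)

irisDT <- setDT(copy(iris))
setindex(irisDT, Sepal.Length)

indices(irisDT)   # "Sepal.Length"
key(irisDT)       # NULL: an index is not a primary key

# Row order is unchanged from the original data
sameOrder <- identical(irisDT$Sepal.Length, iris$Sepal.Length)

# Subsetting with `on=` can use the index instead of scanning the column
hits <- irisDT[.(5.1), on = "Sepal.Length"]
```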

-----

[Rmd file](https://gist.github.com/stephlocke/dcb5f0dad688b69afb712f7c55824eea)