

@yutannihilation
Last active July 16, 2018 05:52

Reference

nest_join() apparently generalizes every join except full_join(). Interesting.

nest_join() is the most fundamental join since you can recreate the other joins from it. An inner_join() is a nest_join() plus an [tidyr::unnest()], and left_join() is a nest_join() plus an unnest(drop = FALSE). A semi_join() is a nest_join() plus a filter() where you check that every element of data has at least one row, and an anti_join() is a nest_join() plus a filter() where you check every element has zero rows. (https://github.com/tidyverse/dplyr/blob/c85abf0f5f6279cecee45ea6c88daf34ae6eff9a/R/join.r#L57-L61)

Usage

Mutating join

First, simply running nest_join() gives this:

library(dplyr, warn.conflicts = FALSE)

d <- band_members %>%
  nest_join(band_instruments)
#> Joining, by = "name"

d
#> # A tibble: 3 x 3
#>   name  band    data            
#> * <chr> <chr>   <list>          
#> 1 Mick  Stones  <tibble [0 x 1]>
#> 2 John  Beatles <tibble [1 x 1]>
#> 3 Paul  Beatles <tibble [1 x 1]>

Each element of data looks like this:

d$data
#> [[1]]
#> # A tibble: 0 x 1
#> # ... with 1 variable: plays <chr>
#> 
#> [[2]]
#> # A tibble: 1 x 1
#>   plays 
#>   <chr> 
#> 1 guitar
#> 
#> [[3]]
#> # A tibble: 1 x 1
#>   plays
#>   <chr>
#> 1 bass

To get the same result as inner_join() from this, use unnest():

tidyr::unnest(d)
#> # A tibble: 2 x 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass
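As a quick check, the sketch below compares the unnest() result with inner_join() directly. It assumes a recent dplyr/tidyr: there, nest_join() names the list column after y (here band_instruments) unless name is given, so name = "data" is passed explicitly to reproduce the column shown above. The variable names are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)

# name = "data" is an assumption for recent dplyr, which otherwise names
# the list column after `y` ("band_instruments")
nested <- band_members %>%
  nest_join(band_instruments, by = "name", name = "data")

# inner_join() equivalent: unnest() drops rows whose nested tibble is empty
via_nest <- tidyr::unnest(nested, data)
via_join <- inner_join(band_members, band_instruments, by = "name")

# Same column names and same contents (tibble classes stripped before comparing)
stopifnot(
  identical(names(via_nest), names(via_join)),
  isTRUE(all.equal(as.data.frame(via_nest), as.data.frame(via_join)))
)
```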

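To get the same result as left_join(), I thought unnest(.drop = FALSE) would do it... but it doesn't seem to work yet: Mick's row, whose data is empty, is still dropped. Possibly related to tidyverse/tidyr#358?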

tidyr::unnest(d, .drop = FALSE)
#> # A tibble: 2 x 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass
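A likely explanation, stated with some hedging: in tidyr 0.8 the .drop argument controls whether *other* list columns are dropped, not whether rows with empty nested tibbles are kept. tidyr 1.0.0 added a keep_empty argument to unnest() for that purpose. The sketch below assumes tidyr >= 1.0.0 and a dplyr whose nest_join() accepts name.

```r
library(dplyr, warn.conflicts = FALSE)

nested <- band_members %>%
  nest_join(band_instruments, by = "name", name = "data")

# keep_empty = TRUE keeps Mick's row (0-row tibble) and fills plays with NA,
# which is what left_join() does for rows of x without a match
via_nest <- tidyr::unnest(nested, data, keep_empty = TRUE)
via_join <- left_join(band_members, band_instruments, by = "name")

stopifnot(
  identical(names(via_nest), names(via_join)),
  isTRUE(all.equal(as.data.frame(via_nest), as.data.frame(via_join)))
)
```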

Filtering join

The example from https://dplyr.tidyverse.org/articles/two-table.html#filtering-joins:

# tibble() replaces data_frame(), which has since been deprecated
df1 <- tibble(x = c(1, 1, 3, 4), y = 1:4)
df2 <- tibble(x = c(1, 1, 2), z = c("a", "b", "a"))

Filtering with semi_join() in the usual way looks like this:

df1 %>%
  semi_join(df2, by = "x")
#> # A tibble: 2 x 2
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2

Now let's do the same with nest_join().

The nest_join() step on its own gives this:

df1 %>%
  nest_join(df2, by = "x")
#> # A tibble: 4 x 3
#>       x     y data            
#> * <dbl> <int> <list>          
#> 1     1     1 <tibble [2 x 1]>
#> 2     1     2 <tibble [2 x 1]>
#> 3     3     3 <tibble [0 x 1]>
#> 4     4     4 <tibble [0 x 1]>

Then we just need to keep the rows whose data has at least one row.

df1 %>%
  nest_join(df2, by = "x") %>%
  filter(purrr::map_int(data, nrow) > 0)
#> # A tibble: 2 x 3
#>       x     y data            
#>   <dbl> <int> <list>          
#> 1     1     1 <tibble [2 x 1]>
#> 2     1     2 <tibble [2 x 1]>
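The quoted documentation also says anti_join() is a nest_join() plus a filter() that checks for zero rows. A minimal sketch of that, redefining df1 and df2 so the block runs standalone, using base vapply() instead of purrr, and dropping the list column so the result has the same columns as anti_join(). As above, name = "data" is passed as an assumption about recent dplyr.

```r
library(dplyr, warn.conflicts = FALSE)

df1 <- tibble(x = c(1, 1, 3, 4), y = 1:4)
df2 <- tibble(x = c(1, 1, 2), z = c("a", "b", "a"))

# anti_join() equivalent: keep rows of df1 whose nested tibble has zero rows
# (no match in df2), then drop the list column
via_nest <- df1 %>%
  nest_join(df2, by = "x", name = "data") %>%
  filter(vapply(data, nrow, integer(1)) == 0) %>%
  select(-data)

via_join <- df1 %>% anti_join(df2, by = "x")

stopifnot(isTRUE(all.equal(as.data.frame(via_nest), as.data.frame(via_join))))
```

Swapping `== 0` for `> 0` gives the semi_join() version shown above, minus the data column.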

Created on 2018-07-16 by the reprex package (v0.2.0).

Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  Japanese_Japan.932          
#>  tz       Asia/Tokyo                  
#>  date     2018-07-16
#> Packages -----------------------------------------------------------------
#>  package    * version     date       source                          
#>  assertthat   0.2.0       2017-04-11 CRAN (R 3.5.0)                  
#>  backports    1.1.2       2017-12-13 CRAN (R 3.5.0)                  
#>  base       * 3.5.1       2018-07-02 local                           
#>  bindr        0.1.1       2018-03-13 CRAN (R 3.5.0)                  
#>  bindrcpp   * 0.2.2       2018-03-29 CRAN (R 3.5.0)                  
#>  cli          1.0.0       2017-11-05 CRAN (R 3.5.0)                  
#>  compiler     3.5.1       2018-07-02 local                           
#>  crayon       1.3.4       2017-09-16 CRAN (R 3.5.0)                  
#>  datasets   * 3.5.1       2018-07-02 local                           
#>  devtools     1.13.6      2018-06-27 CRAN (R 3.5.0)                  
#>  digest       0.6.15      2018-01-28 CRAN (R 3.5.0)                  
#>  dplyr      * 0.7.99.9000 2018-07-16 local                           
#>  evaluate     0.10.1      2017-06-24 CRAN (R 3.5.0)                  
#>  fansi        0.2.3       2018-05-06 CRAN (R 3.5.1)                  
#>  glue         1.2.0       2017-10-29 CRAN (R 3.5.0)                  
#>  graphics   * 3.5.1       2018-07-02 local                           
#>  grDevices  * 3.5.1       2018-07-02 local                           
#>  htmltools    0.3.6       2017-04-28 CRAN (R 3.5.0)                  
#>  knitr        1.20.2      2018-05-10 local                           
#>  magrittr     1.5         2014-11-22 CRAN (R 3.5.0)                  
#>  memoise      1.1.0       2018-06-13 Github (hadley/memoise@06d16ec) 
#>  methods    * 3.5.1       2018-07-02 local                           
#>  pillar       1.3.0       2018-07-14 CRAN (R 3.5.1)                  
#>  pkgconfig    2.0.1       2017-03-21 CRAN (R 3.5.0)                  
#>  purrr        0.2.5       2018-05-29 CRAN (R 3.5.0)                  
#>  R6           2.2.2       2017-06-17 CRAN (R 3.5.0)                  
#>  Rcpp         0.12.17     2018-05-18 CRAN (R 3.5.0)                  
#>  rlang        0.2.1       2018-05-30 CRAN (R 3.5.0)                  
#>  rmarkdown    1.10        2018-06-11 CRAN (R 3.5.0)                  
#>  rprojroot    1.3-2       2018-01-03 CRAN (R 3.5.0)                  
#>  stats      * 3.5.1       2018-07-02 local                           
#>  stringi      1.2.3       2018-06-12 CRAN (R 3.5.0)                  
#>  stringr      1.3.1       2018-05-10 CRAN (R 3.5.0)                  
#>  tibble       1.4.2       2018-01-22 CRAN (R 3.5.0)                  
#>  tidyr        0.8.1       2018-05-18 CRAN (R 3.5.0)                  
#>  tidyselect   0.2.4       2018-02-26 CRAN (R 3.5.0)                  
#>  tools        3.5.1       2018-07-02 local                           
#>  utf8         1.1.4       2018-05-24 CRAN (R 3.5.0)                  
#>  utils      * 3.5.1       2018-07-02 local                           
#>  withr        2.1.2       2018-06-26 Github (jimhester/withr@fe56f20)
#>  yaml         2.1.19      2018-05-01 CRAN (R 3.5.0)