Apparently nest_join() generalizes every join except full_join(). Interesting.
nest_join() is the most fundamental join since you can recreate the other joins from it. An inner_join() is a nest_join() plus an [tidyr::unnest()], and left_join() is a nest_join() plus an unnest(drop = FALSE). A semi_join() is a nest_join() plus a filter() where you check that every element of data has at least one row, and an anti_join() is a nest_join() plus a filter() where you check every element has zero rows. (https://github.com/tidyverse/dplyr/blob/c85abf0f5f6279cecee45ea6c88daf34ae6eff9a/R/join.r#L57-L61)
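The anti_join() case from the quote is not tried in the walkthrough that follows, so here is a minimal sketch of it. It assumes a released dplyr (0.8.0 or later), where nest_join() has a `name` argument and the list-column is otherwise named after the right-hand table. Passing `name = "data"` is my choice to match the column name used in this post. The variable names `via_nest` and `direct` are only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Assumption: dplyr >= 0.8.0, so that `name` is available.
via_nest <- band_members %>%
  nest_join(band_instruments, by = "name", name = "data") %>%
  # Keep only rows whose nested tibble is empty, i.e. no match on the right.
  filter(purrr::map_int(data, nrow) == 0) %>%
  # Drop the list-column so the shape matches anti_join().
  select(-data)

# The built-in anti join, for comparison.
direct <- anti_join(band_members, band_instruments, by = "name")

all.equal(as.data.frame(via_nest), as.data.frame(direct))
```

The only difference from the semi_join() version is the comparison in filter(): `== 0` instead of `> 0`.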
To start with, here is what a plain nest_join() returns.
library(dplyr, warn.conflicts = FALSE)
d <- band_members %>%
  nest_join(band_instruments)
#> Joining, by = "name"
d
#> # A tibble: 3 x 3
#>   name  band    data
#> * <chr> <chr>   <list>
#> 1 Mick  Stones  <tibble [0 x 1]>
#> 2 John  Beatles <tibble [1 x 1]>
#> 3 Paul  Beatles <tibble [1 x 1]>
Each element of data looks like this.
d$data
#> [[1]]
#> # A tibble: 0 x 1
#> # ... with 1 variable: plays <chr>
#>
#> [[2]]
#> # A tibble: 1 x 1
#>   plays
#>   <chr>
#> 1 guitar
#>
#> [[3]]
#> # A tibble: 1 x 1
#>   plays
#>   <chr>
#> 1 bass
To get the same result as inner_join() from this, use unnest().
tidyr::unnest(d)
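#> # A tibble: 2 x 3
#>   name  band    plays
#>   <chr> <chr>   <chr>
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass
To get the same result as left_join(), unnest(.drop = FALSE) should do it... or so I thought, but it does not seem to work properly yet: the row for Mick, whose data is empty, is still dropped.
Possibly related to tidyverse/tidyr#358?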
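For reference, later tidyr releases offer another route. This is a minimal sketch, assuming tidyr 1.0.0 or later (newer than the 0.8.1 used in this post), where unnest() has a `keep_empty` argument. It also assumes a released dplyr (0.8.0 or later) and passes `name = "data"` to nest_join() so the list-column keeps the name used here. `nested`, `via_nest` and `direct` are illustrative names.

```r
library(dplyr, warn.conflicts = FALSE)

# Assumption: dplyr >= 0.8.0; `name` fixes the list-column name.
nested <- band_members %>%
  nest_join(band_instruments, by = "name", name = "data")

# Assumption: tidyr >= 1.0.0. keep_empty = TRUE keeps rows whose nested
# tibble has zero rows and fills the unnested columns with NA.
via_nest <- tidyr::unnest(nested, data, keep_empty = TRUE)

# The built-in left join, for comparison.
direct <- left_join(band_members, band_instruments, by = "name")

all.equal(as.data.frame(via_nest), as.data.frame(direct))
```

The attempt that follows was run with tidyr 0.8.1, as recorded in the session info.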
tidyr::unnest(d, .drop = FALSE)
#> # A tibble: 2 x 3
#>   name  band    plays
#>   <chr> <chr>   <chr>
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass
Note: the example below is taken from https://dplyr.tidyverse.org/articles/two-table.html#filtering-joins
df1 <- data_frame(x = c(1, 1, 3, 4), y = 1:4)
df2 <- data_frame(x = c(1, 1, 2), z = c("a", "b", "a"))
Filtering with an ordinary semi_join() looks like this.
df1 %>%
  semi_join(df2, by = "x")
#> # A tibble: 2 x 2
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
Now let's do the same thing with nest_join().
The nest_join() on its own returns this.
df1 %>%
  nest_join(df2, by = "x")
#> # A tibble: 4 x 3
#>       x     y data
#> * <dbl> <int> <list>
#> 1     1     1 <tibble [2 x 1]>
#> 2     1     2 <tibble [2 x 1]>
#> 3     3     3 <tibble [0 x 1]>
#> 4     4     4 <tibble [0 x 1]>
All that remains is to keep only the rows whose data has at least one row.
df1 %>%
  nest_join(df2, by = "x") %>%
  filter(purrr::map_int(data, nrow) > 0)
#> # A tibble: 2 x 3
#>       x     y data
#>   <dbl> <int> <list>
#> 1     1     1 <tibble [2 x 1]>
#> 2     1     2 <tibble [2 x 1]>
Created on 2018-07-16 by the reprex package (v0.2.0).
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.5.1 (2018-07-02)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Japanese_Japan.932
#> tz Asia/Tokyo
#> date 2018-07-16
#> Packages -----------------------------------------------------------------
#> package * version date source
#> assertthat 0.2.0 2017-04-11 CRAN (R 3.5.0)
#> backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
#> base * 3.5.1 2018-07-02 local
#> bindr 0.1.1 2018-03-13 CRAN (R 3.5.0)
#> bindrcpp * 0.2.2 2018-03-29 CRAN (R 3.5.0)
#> cli 1.0.0 2017-11-05 CRAN (R 3.5.0)
#> compiler 3.5.1 2018-07-02 local
#> crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
#> datasets * 3.5.1 2018-07-02 local
#> devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
#> digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
#> dplyr * 0.7.99.9000 2018-07-16 local
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.5.0)
#> fansi 0.2.3 2018-05-06 CRAN (R 3.5.1)
#> glue 1.2.0 2017-10-29 CRAN (R 3.5.0)
#> graphics * 3.5.1 2018-07-02 local
#> grDevices * 3.5.1 2018-07-02 local
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
#> knitr 1.20.2 2018-05-10 local
#> magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
#> memoise 1.1.0 2018-06-13 Github (hadley/memoise@06d16ec)
#> methods * 3.5.1 2018-07-02 local
#> pillar 1.3.0 2018-07-14 CRAN (R 3.5.1)
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.5.0)
#> purrr 0.2.5 2018-05-29 CRAN (R 3.5.0)
#> R6 2.2.2 2017-06-17 CRAN (R 3.5.0)
#> Rcpp 0.12.17 2018-05-18 CRAN (R 3.5.0)
#> rlang 0.2.1 2018-05-30 CRAN (R 3.5.0)
#> rmarkdown 1.10 2018-06-11 CRAN (R 3.5.0)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
#> stats * 3.5.1 2018-07-02 local
#> stringi 1.2.3 2018-06-12 CRAN (R 3.5.0)
#> stringr 1.3.1 2018-05-10 CRAN (R 3.5.0)
#> tibble 1.4.2 2018-01-22 CRAN (R 3.5.0)
#> tidyr 0.8.1 2018-05-18 CRAN (R 3.5.0)
#> tidyselect 0.2.4 2018-02-26 CRAN (R 3.5.0)
#> tools 3.5.1 2018-07-02 local
#> utf8 1.1.4 2018-05-24 CRAN (R 3.5.0)
#> utils * 3.5.1 2018-07-02 local
#> withr 2.1.2 2018-06-26 Github (jimhester/withr@fe56f20)
#> yaml 2.1.19 2018-05-01 CRAN (R 3.5.0)