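User-friendly interface that combines dplyr::left_join() and findInterval(): rows of y are joined to x according to the interval of brk values each row of x falls into, rather than by exact equality.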

break_join(x, y, brk = character(), by = NULL, ...)

Arguments

x

A data frame.

y

A data frame containing the reference information to attach to x.

brk

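Name of the column in x and y to join on by interval overlap. Must be coercible to numeric (dates qualify). If the column has different names in the two tables, supply a named vector such as c("date" = "start"), where the name is the column in x and the value is the column in y.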
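Judging from the c("date" = "start") example, the named form of brk appears to follow dplyr's by convention (name = column in x, value = column in y). A minimal sketch of how such a spec could be resolved; resolve_brk() is a hypothetical helper, not part of the package:

```r
# Hypothetical helper (not the package's code): turn a `brk` spec into the
# column names used in x and y, assuming dplyr's `by` convention where the
# name is the column in x and the value is the column in y.
resolve_brk <- function(brk) {
  stopifnot(is.character(brk), length(brk) == 1)
  x_col <- names(brk)
  # An unnamed (or empty-named) spec means the same column name in both tables
  if (is.null(x_col) || x_col == "") x_col <- brk
  list(x = unname(x_col), y = unname(brk))
}

resolve_brk("q")                  # returns list(x = "q", y = "q")
resolve_brk(c("date" = "start"))  # returns list(x = "date", y = "start")
```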

by

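Variables to join by exactly, passed to dplyr::left_join(). If NULL (the default), the variables common to x and y other than brk are used and a "Joining, by = ..." message is printed. See dplyr's mutate-joins documentation.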

...

Additional arguments, routed automatically to findInterval() (e.g. all.inside) or to dplyr::left_join(). Argument names must be spelled out in full; partial matching is not supported.
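One plausible way to route ... by exact name is to compare argument names against formals(findInterval) and send everything else to the join. The sketch below uses a hypothetical helper, split_dots(); it illustrates the idea under that assumption and is not the package's actual mechanism. Because %in% compares whole strings, an abbreviation such as all = TRUE is not recognised as all.inside.

```r
# Hypothetical sketch (assumed mechanism, not the package's code): split `...`
# into arguments for findInterval() and arguments for dplyr::left_join().
split_dots <- function(...) {
  dots <- list(...)
  fi_names <- names(formals(findInterval))  # x, vec, rightmost.closed, all.inside, left.open
  # Exact, whole-name matching only: no partial matching of abbreviations
  to_fi <- names(dots) %in% fi_names
  list(findInterval = dots[to_fi], left_join = dots[!to_fi])
}

parts <- split_dots(all.inside = TRUE, suffix = c(".x", ".y"))
names(parts$findInterval)              # "all.inside"
names(parts$left_join)                 # "suffix"
length(split_dots(all = TRUE)$findInterval)  # 0: "all" is not expanded

# Forward the findInterval() share with do.call()
do.call(findInterval, c(list(c(1, 4, 7), c(3, 5)), parts$findInterval))  # 1 1 1
```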

Value

An object of the same type as x.

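  • All rows of x are returned, though not necessarily in their original order (the example output below is sorted by the join variables and brk).

  • All columns of x are returned, together with the remaining columns of y; y's by and brk columns are not duplicated.

  • Rows of y are matched to rows of x by interval: within each group of by variables, each x value of brk is located among the y values of brk, i.e. findInterval(x$brk, y$brk, ...). Values below the first break get no match (NA) unless all.inside = TRUE.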
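The matching rule above can be sketched in base R. interval_join_sketch() is a hypothetical, simplified stand-in (a single by column, rows kept in their original order, no ... forwarding), not the package's implementation:

```r
# Hypothetical, simplified sketch of an interval-based left join in base R.
interval_join_sketch <- function(x, y, brk, by) {
  # findInterval() needs increasing break points, so sort y within each group
  y <- y[order(y[[by]], y[[brk]]), , drop = FALSE]
  # Row index into y for every row of x; stays NA when no interval applies
  idx <- rep(NA_integer_, nrow(x))
  for (g in unique(x[[by]])) {
    xi <- which(x[[by]] == g)
    yi <- which(y[[by]] == g)
    if (length(yi) == 0) next
    # 0 means "below the first break"; k means break_k <= value < break_(k+1),
    # with the last k covering everything at or beyond the last break
    pos <- findInterval(x[[brk]][xi], y[[brk]][yi])
    hit <- !is.na(pos) & pos > 0
    idx[xi[hit]] <- yi[pos[hit]]
  }
  # Copy the non-key columns of y onto x (NA where idx is NA)
  out <- x
  for (col in setdiff(names(y), c(by, brk))) out[[col]] <- y[[col]][idx]
  out
}

# Usage with the data from the examples (rows stay in a's original order)
set.seed(1)
a <- data.frame(p = c(rep("A", 10), rep("B", 10)), q = runif(20, 0, 10))
b <- data.frame(p = c("A", "A", "B", "B"), q = c(3, 5, 6, 9), r = c("a1", "a2", "b1", "b2"))
head(interval_join_sketch(a, b, brk = "q", by = "p"))
```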

Examples

# joining USA + UK leaders with population time-series
break_join(us_uk_pop, us_uk_leaders, brk = c("date" = "start"))
#> Joining, by = "country"
#> # A tibble: 38 x 5
#>    country date       population name    party
#>    <chr>   <date>          <int> <chr>   <chr>
#>  1 USA     1995-01-20  268039654 Clinton Democratic
#>  2 USA     1996-01-20  271231546 Clinton Democratic
#>  3 USA     1997-01-19  274606475 Clinton Democratic
#>  4 USA     1998-01-20  278053607 Clinton Democratic
#>  5 USA     1999-01-20  281419130 Clinton Democratic
#>  6 USA     2000-01-19  284594395 Clinton Democratic
#>  7 USA     2001-01-18  287532638 Clinton Democratic
#>  8 USA     2002-01-19  290270187 Bush    Republican
#>  9 USA     2003-01-18  292883010 Bush    Republican
#> 10 USA     2004-01-17  295487267 Bush    Republican
#> # … with 28 more rows
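The date example works because findInterval() coerces its inputs with as.double(), so Date columns meet the "coercible to numeric" requirement. A small illustration with hand-entered US inauguration dates (illustrative values, not read from us_uk_leaders):

```r
# Illustrative lookup table: term start dates and the corresponding leader
starts  <- as.Date(c("1993-01-20", "2001-01-20", "2009-01-20"))
leaders <- c("Clinton", "Bush", "Obama")

# Observation dates taken from the output above
obs <- as.Date(c("1995-01-20", "2001-01-18", "2002-01-19"))

# Each observation maps to the last start date on or before it
leaders[findInterval(obs, starts)]  # "Clinton" "Clinton" "Bush"
```

Note that 2001-01-18 still maps to Clinton because it precedes the 2001-01-20 break, matching row 7 of the output.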
# simple dataset
set.seed(1)
a <- data.frame(p = c(rep("A", 10), rep("B", 10)), q = runif(20, 0, 10))
b <- data.frame(p = c("A", "A", "B", "B"), q = c(3, 5, 6, 9), r = c("a1", "a2", "b1", "b2"))
break_join(a, b, brk = "q") # p identified as common variable automatically
#> Joining, by = "p"
#>    p         q    r
#> 1  A 0.6178627 <NA>
#> 2  A 2.0168193 <NA>
#> 3  A 2.6550866 <NA>
#> 4  A 3.7212390   a1
#> 5  A 5.7285336   a2
#> 6  A 6.2911404   a2
#> 7  A 6.6079779   a2
#> 8  A 8.9838968   a2
#> 9  A 9.0820779   a2
#> 10 A 9.4467527   a2
#> 11 B 1.7655675 <NA>
#> 12 B 2.0597457 <NA>
#> 13 B 3.8003518 <NA>
#> 14 B 3.8410372 <NA>
#> 15 B 4.9769924 <NA>
#> 16 B 6.8702285   b1
#> 17 B 7.1761851   b1
#> 18 B 7.6984142   b1
#> 19 B 7.7744522   b1
#> 20 B 9.9190609   b2
break_join(a, b, brk = "q", by = "p") # same result
#>    p         q    r
#> 1  A 0.6178627 <NA>
#> 2  A 2.0168193 <NA>
#> 3  A 2.6550866 <NA>
#> 4  A 3.7212390   a1
#> 5  A 5.7285336   a2
#> 6  A 6.2911404   a2
#> 7  A 6.6079779   a2
#> 8  A 8.9838968   a2
#> 9  A 9.0820779   a2
#> 10 A 9.4467527   a2
#> 11 B 1.7655675 <NA>
#> 12 B 2.0597457 <NA>
#> 13 B 3.8003518 <NA>
#> 14 B 3.8410372 <NA>
#> 15 B 4.9769924 <NA>
#> 16 B 6.8702285   b1
#> 17 B 7.1761851   b1
#> 18 B 7.6984142   b1
#> 19 B 7.7744522   b1
#> 20 B 9.9190609   b2
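# all.inside = TRUE is forwarded to findInterval(), which clamps indices to
# 1, ..., N - 1: no NAs remain, but with only two breaks per group every row
# now maps to the first interval (a1 or b1)
break_join(a, b, brk = "q", all.inside = TRUE)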
#> Joining, by = "p"
#>    p         q  r
#> 1  A 0.6178627 a1
#> 2  A 2.0168193 a1
#> 3  A 2.6550866 a1
#> 4  A 3.7212390 a1
#> 5  A 5.7285336 a1
#> 6  A 6.2911404 a1
#> 7  A 6.6079779 a1
#> 8  A 8.9838968 a1
#> 9  A 9.0820779 a1
#> 10 A 9.4467527 a1
#> 11 B 1.7655675 b1
#> 12 B 2.0597457 b1
#> 13 B 3.8003518 b1
#> 14 B 3.8410372 b1
#> 15 B 4.9769924 b1
#> 16 B 6.8702285 b1
#> 17 B 7.1761851 b1
#> 18 B 7.6984142 b1
#> 19 B 7.7744522 b1
#> 20 B 9.9190609 b1
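The effect of all.inside can be checked directly with findInterval(), using group A's breaks (3 and 5) and a few group A values of q from the output above:

```r
# Group A break points from b, and four group A values of q
breaks <- c(3, 5)
vals <- c(0.6178627, 3.7212390, 5.7285336, 9.4467527)

# Default: 0 below the first break, 2 at or beyond the last break
findInterval(vals, breaks)                     # 0 1 2 2

# all.inside = TRUE maps 0 to 1 and N (= 2) to N - 1 (= 1)
findInterval(vals, breaks, all.inside = TRUE)  # 1 1 1 1
```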
# joining toll prices with vehicle time-series (requires the mopac package)
library(mopac)
library(dplyr, warn.conflicts = FALSE)
library(hms)
express %>%
  mutate(time_hms = as_hms(time)) %>%
  break_join(rates, brk = c("time_hms" = "time"))