sift.Rd
Imagine dplyr::filter that includes neighboring observations. Choose how many observations to include by adjusting inputs sift.col and scope.
sift(.data, sift.col, scope, ...)
Argument | Description |
---|---|
.data | A data frame. |
sift.col | Column name, as symbol, to serve as "sifting/augmenting" dimension. Must be non-missing and coercible to numeric. |
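scope | Specifies augmentation bandwidth relative to "key" observations. Parameter should share the same scale as sift.col. If length 1, bandwidth used is +/- scope. If length 2, bandwidth used is (-scope[1], +scope[2]). |
... | Expressions passed to dplyr::filter; the rows they select serve as the "key" observations. |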
A sifted data frame, with 2 additional columns:
- .cluster <int>: Identifies the group formed by each key observation and its neighboring rows. When key observations are close enough together, their clusters will overlap.
- .key <lgl>: TRUE indicates a key observation.
sift() can be understood as a 2-step process:

1. .data is passed to dplyr::filter, with the expression(s) in ... used for subsetting. We'll refer to these intermediate results as "key" observations.
2. For each key observation, sift expands the row selection bidirectionally along the dimension specified by sift.col. Any row from the original dataset within scope units of a key observation is captured in the final result.
Essentially, this allows us to "peek" at neighboring rows surrounding the key observations.
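
The two steps can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: sift_sketch and the toy data frame are hypothetical names, a length-1 scope is assumed to mean a symmetric window, and the .cluster column is left out because its assignment rule is not described here.

```r
# Hypothetical helper, NOT the package's implementation: a base-R sketch of
# the two-step process (filter for key rows, then widen along one dimension).
sift_sketch <- function(data, col, scope, is_key) {
  x <- as.numeric(data[[col]])   # sifting dimension, coerced to numeric
  scope <- rep_len(scope, 2)     # assumption: length 1 means a symmetric window
  key_rows <- which(is_key)      # step 1: the "key" observations

  # Step 2: keep any row within [key - scope[1], key + scope[2]] of some key.
  near_key <- vapply(
    x,
    function(v) any(v >= x[key_rows] - scope[1] & v <= x[key_rows] + scope[2]),
    logical(1)
  )

  out <- data[near_key, , drop = FALSE]
  out$.key <- is_key[near_key]   # flag which retained rows were key rows
  out
}

# Toy data: ten days, with a single "target" event on day 5.
toy <- data.frame(day = 1:10,
                  event = c(rep("other", 4), "target", rep("other", 5)))

sift_sketch(toy, "day", scope = 2, is_key = toy$event == "target")        # days 3 to 7
sift_sketch(toy, "day", scope = c(0, 2), is_key = toy$event == "target")  # days 5 to 7
```

With a length-2 scope of c(0, 2), the window opens only forward from each key row, which is the pattern used in the "AFTER Trump tested positive" example.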
# See current events from same timeframe as 2020 Utah Monolith discovery.
sift(nyt2020, pub_date, scope = 2, grepl("Monolith", headline))
#> # A tibble: 15 x 8
#>    headline   abstract   byline  pub_date   section_name web_url  .cluster .key
#>    <chr>      <chr>      <chr>   <date>     <chr>        <chr>       <dbl> <lgl>
#>  1 Biden Has… The presi… "By Gi… 2020-11-23 U.S.         https:/…        1 FALSE
#>  2 Pat Quinn… Mr. Quinn… "By Co… 2020-11-23 U.S.         https:/…        1 FALSE
#>  3 Business … At the ur… "By Ka… 2020-11-23 U.S.         https:/…        1 FALSE
#>  4 Pandemic … As urbani… "By Ji… 2020-11-23 U.S.         https:/…        1 FALSE
#>  5 No, Joe B… The video… "By Li… 2020-11-23 Technology   https:/…        1 FALSE
#>  6 Monolith … A metal m… "By St… 2020-11-24 Science      https:/…        1 TRUE
#>  7 Coronavir… Upper Man… "By Tr… 2020-11-24 New York     https:/…        1 FALSE
#>  8 Two Darwi… Cambridge… "By Me… 2020-11-24 World        https:/…        1 FALSE
#>  9 Recent Co… Recent co… "By Is… 2020-11-24 Business Day https:/…        1 FALSE
#> 10 Trump Adm… A key off… "By Mi… 2020-11-24 U.S.         https:/…        1 FALSE
#> 11 The C.D.C… Federal h… "By Ro… 2020-11-25 World        https:/…        1 FALSE
#> 12 A Poem of… The New Y… ""      2020-11-25 U.S.         https:/…        1 FALSE
#> 13 Casualtie… The war i… "By Ri… 2020-11-25 World        https:/…        1 FALSE
#> 14 Iran Free… Iranian s… "By Fa… 2020-11-25 World        https:/…        1 FALSE
#> 15 A Poem of… The New Y… ""      2020-11-25 U.S.         https:/…        1 FALSE

# or Biden's presidential victory.
sift(nyt2020, pub_date, scope = 2, grepl("Biden is elected", headline))
#> # A tibble: 15 x 8
#>    headline   abstract   byline  pub_date   section_name web_url  .cluster .key
#>    <chr>      <chr>      <chr>   <date>     <chr>        <chr>       <dbl> <lgl>
#>  1 As China’… New telev… By Viv… 2020-11-06 World        https:/…        1 FALSE
#>  2 Al Roker,… Mr. Roker… By Joh… 2020-11-06 Business Day https:/…        1 FALSE
#>  3 The Lates… A new stu… By Ame… 2020-11-06 U.S.         https:/…        1 FALSE
#>  4 Secretari… While sta… By Sha… 2020-11-06 U.S.         https:/…        1 FALSE
#>  5 Democracy… WILMINGTO… By Tho… 2020-11-06 U.S.         https:/…        1 FALSE
#>  6 Joe Biden… WILMINGTO… By Kat… 2020-11-07 U.S.         https:/…        1 TRUE
#>  7 Biden def… Joseph R.… By Mik… 2020-11-07 U.S.         https:/…        1 FALSE
#>  8 Tension, … The news … By Joh… 2020-11-07 Business Day https:/…        1 FALSE
#>  9 Voters Sa… About a f… By Sab… 2020-11-07 U.S.         https:/…        1 FALSE
#> 10 After War… Amid the … By Rei… 2020-11-07 U.S.         https:/…        1 FALSE
#> 11 Turkey’s … President… By Car… 2020-11-08 Business Day https:/…        1 FALSE
#> 12 There’s n… On Twitte… By Jim… 2020-11-08 U.S.         https:/…        1 FALSE
#> 13 A Nation … Nebraska … By Dio… 2020-11-08 U.S.         https:/…        1 FALSE
#> 14 Five Take… As he add… By Ada… 2020-11-08 U.S.         https:/…        1 FALSE
#> 15 Read Joe … In his vi… By Mat… 2020-11-08 U.S.         https:/…        1 FALSE

# We can specify lower & upper scope to see what happened AFTER Trump tested positive.
sift(nyt2020, pub_date, scope = c(0, 2), grepl("Trump Tests Positive", headline))
#> # A tibble: 10 x 8
#>    headline   abstract   byline  pub_date   section_name web_url  .cluster .key
#>    <chr>      <chr>      <chr>   <date>     <chr>        <chr>       <dbl> <lgl>
#>  1 "Trump Te… The presi… "By Pe… 2020-10-02 U.S.         https:/…        1 TRUE
#>  2 "‘You don… President… "By Da… 2020-10-02 U.S.         https:/…        1 FALSE
#>  3 "17 Repub… Seventeen… "By Ca… 2020-10-02 U.S.         https:/…        1 FALSE
#>  4 "TV news … Televisio… "By Mi… 2020-10-02 U.S.         https:/…        1 FALSE
#>  5 ""         Positive … ""      2020-10-02 World        https:/…        1 FALSE
#>  6 "Battling… The blaze… "By Ma… 2020-10-03 World        https:/…        1 FALSE
#>  7 "Obama of… Former Pr… "By Li… 2020-10-03 U.S.         https:/…        1 FALSE
#>  8 "Contact … Tracing i… "By Be… 2020-10-03 World        https:/…        1 FALSE
#>  9 "What to … Dr. Conle… "By Al… 2020-10-03 U.S.         https:/…        1 FALSE
#> 10 "Trump’s … Outside e… "By Gi… 2020-10-03 Health       https:/…        1 FALSE

library(mopac)
#> Error in library(mopac): there is no package called ‘mopac’
express %>% group_by(direction) %>% sift(time, 30, plate == "EAS-1671") # row augmentation performed within groups.
#> Error in group_by(., direction): object 'express' not found