Title: | Store Data About Rows |
---|---|
Description: | Tools for keeping track of information, named "keys", about rows of data frame like objects. This is done by creating a special attribute "keys" which is updated after every change in rows (subsetting, ordering, etc.). This package is designed to work tightly with the 'dplyr' package. |
Authors: | Evgeni Chasnovski [aut, cre] |
Maintainer: | Evgeni Chasnovski <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.7.9000 |
Built: | 2024-11-02 03:41:53 UTC |
Source: | https://github.com/echasnovski/keyholder |
keyholder offers a set of tools for storing information about rows of data frame like objects. The common use cases are:
Track rows of a data frame without changing it.
Store columns so they can be restored in the data frame later.
Hide columns for convenient use of dplyr's *_if scoped variants of verbs.
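The third use case can be sketched as follows. This is a minimal illustration, not the package's canonical example: it assumes the keyholder and dplyr packages are installed, and the multiplication by 10 is an arbitrary transformation chosen only to make the effect visible.

```r
library(keyholder)
library(dplyr)

result <- mtcars %>%
  as_tibble() %>%
  # Move 'vs' and 'am' out of the data frame and into keys
  key_by(vs, am, .exclude = TRUE) %>%
  # The scoped verb only sees the columns that remain in the data frame
  mutate_if(is.numeric, function(x) x * 10) %>%
  # Bring the hidden columns back and drop the keys
  restore_keys_all(.remove = TRUE, .unkey = TRUE)

# 'vs' and 'am' were hidden during mutate_if(), so they keep their original values
head(result[, c("mpg", "vs", "am")])
```

Hiding columns this way avoids writing a more complicated predicate just to skip a few numeric columns.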
To learn more about keyholder:
Browse vignettes with browseVignettes(package = "keyholder").
Look how to set keys.
Look at the list of supported functions.
Maintainer: Evgeni Chasnovski [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/echasnovski/keyholder/issues/
These functions perform keying by selection of variables using corresponding scoped variant of select. Appropriate data frame is selected with scoped function first, and then it is assigned as keys.
key_by_all(.tbl, .funs = list(), ..., .add = FALSE, .exclude = FALSE)
key_by_if(.tbl, .predicate, .funs = list(), ..., .add = FALSE, .exclude = FALSE)
key_by_at(.tbl, .vars, .funs = list(), ..., .add = FALSE, .exclude = FALSE)
.tbl | Reference data frame. |
.funs | Parameter for scoped functions. |
... | Parameter for scoped functions. |
.add | Whether to add keys to (possibly) existing ones. If |
.exclude | Whether to exclude key variables from |
.predicate | Parameter for scoped functions. |
.vars | Parameter for scoped functions. |
mtcars %>% key_by_all(.funs = toupper)
mtcars %>% key_by_if(rlang::is_integerish, toupper)
mtcars %>% key_by_at(c("vs", "am"), toupper)
Utility functions for keyed objects, which are implemented with class keyed_df. A keyed object should be a data frame which inherits from keyed_df and contains a data frame of keys in attribute 'keys'.
is_keyed_df(.tbl)
is.keyed_df(.tbl)
## S3 method for class 'keyed_df'
print(x, ...)
## S3 method for class 'keyed_df'
x[i, j, ...]
.tbl | Object to check. |
x | Object to print or extract elements. |
... | Further arguments passed to or from other methods. |
i, j | Arguments for |
is_keyed_df(mtcars)
mtcars %>% key_by(vs) %>% is_keyed_df
# Not valid keyed_df
df <- mtcars
class(df) <- c("keyed_df", "data.frame")
is_keyed_df(df)
Defined methods for dplyr generic single table functions. Most of them preserve the 'keyed_df' class and the 'keys' attribute; the exceptions are summarise (with its scoped variants), distinct and do, which remove them. These methods also modify rows in keys according to the rows modification in the reference data frame (if any).
## S3 method for class 'keyed_df'
select(.data, ...)
## S3 method for class 'keyed_df'
rename(.data, ...)
## S3 method for class 'keyed_df'
mutate(.data, ...)
## S3 method for class 'keyed_df'
transmute(.data, ...)
## S3 method for class 'keyed_df'
summarise(.data, ...)
## S3 method for class 'keyed_df'
group_by(.data, ...)
## S3 method for class 'keyed_df'
ungroup(x, ...)
## S3 method for class 'keyed_df'
rowwise(data, ...)
## S3 method for class 'keyed_df'
distinct(.data, ..., .keep_all = FALSE)
## S3 method for class 'keyed_df'
do(.data, ...)
## S3 method for class 'keyed_df'
arrange(.data, ..., .by_group = FALSE)
## S3 method for class 'keyed_df'
filter(.data, ...)
## S3 method for class 'keyed_df'
slice(.data, ...)
.data, data, x | A keyed object. |
... | Appropriate arguments for functions. |
.keep_all | Parameter for dplyr::distinct. |
.by_group | Parameter for dplyr::arrange. |
dplyr::transmute() is supported implicitly with dplyr::mutate() support.
dplyr::rowwise() is not supposed to be generic in dplyr. Use rowwise.keyed_df directly.
All scoped variants of present functions are also supported.
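How keys follow row modifications can be illustrated with a short sketch. It assumes the keyholder and dplyr packages are installed; the filter condition and sort order are arbitrary choices for the illustration.

```r
library(keyholder)
library(dplyr)

tracked <- mtcars %>%
  # Keys become a tibble with column '.id' holding the original row numbers
  use_id() %>%
  # Row-changing verbs update the keys alongside the data
  filter(cyl == 6) %>%
  arrange(desc(mpg))

# Original row positions of the rows that survived, in their new order
original_rows <- pull_key(tracked, .id)

# The keys still point at the matching rows of the untouched source data
all.equal(mtcars$mpg[original_rows], tracked$mpg)
```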
mtcars %>% key_by(vs, am) %>% dplyr::mutate(gear = 1)
Defined methods for dplyr generic join functions. All of them preserve 'keyed_df' class and 'keys' attribute of the first argument. Also these methods modify rows in keys according to the rows modification in first argument (if any).
## S3 method for class 'keyed_df'
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
## S3 method for class 'keyed_df'
left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
## S3 method for class 'keyed_df'
right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
## S3 method for class 'keyed_df'
full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
## S3 method for class 'keyed_df'
semi_join(x, y, by = NULL, copy = FALSE, ...)
## S3 method for class 'keyed_df'
anti_join(x, y, by = NULL, copy = FALSE, ...)
x, y, by, copy, suffix, ... | Parameters for join functions. |
dplyr::band_members %>%
  key_by(band) %>%
  dplyr::semi_join(dplyr::band_instruments, by = "name") %>%
  keys()
Functions for creating id column and key.
use_id(.tbl)
compute_id_name(x)
add_id(.tbl)
key_by_id(.tbl, .add = FALSE, .exclude = FALSE)
.tbl | Reference data frame. |
x | Character vector of names. |
.add, .exclude | Parameters for |
use_id() assigns as keys a tibble with column '.id' and row numbers of .tbl as values.
compute_id_name() computes a name which is different from every element in x by the following algorithm: if '.id' is not present in x it is returned; if it is taken, '.id1' is checked; if that is taken, '.id11' is checked, and so on.
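The naming rule for compute_id_name() can be sketched in base R. The function sketch_id_name() below is a hypothetical stand-in written only to illustrate the described algorithm; it is not the package's implementation.

```r
# Hypothetical re-implementation of the described rule: start from ".id"
# and keep appending "1" until the candidate is not found in 'x'.
sketch_id_name <- function(x) {
  candidate <- ".id"
  while (candidate %in% x) {
    candidate <- paste0(candidate, "1")
  }
  candidate
}

sketch_id_name(c("a", "b"))            # ".id"
sketch_id_name(c(".id", "a"))          # ".id1"
sketch_id_name(c(".id", ".id1", "a"))  # ".id11"
```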
add_id() creates a column with a unique name (computed with compute_id_name()) and row numbers as values (grouping is ignored). It then puts this column first.
key_by_id() is similar to add_id(): it creates a column with a unique name and row numbers as values (grouping is ignored) and calls the key_by() function to use this column as key. If .add is FALSE the unique name is computed based on .tbl column names; if TRUE, it is based on the column names of both .tbl and its keys.
mtcars %>% use_id()
mtcars %>% add_id()
mtcars %>% key_by_id(.exclude = TRUE)
keyholder offers scoped variants of the following functions:
key_by(). See key_by_all().
.funs | Parameter for scoped functions. |
.vars | Parameter for scoped functions. |
.predicate | Parameter for scoped functions. |
... | Parameter for scoped functions. |
Not scoped manipulation functions
keyholder supports the following functions:
Base subsetting with [.
dplyr one table verbs.
dplyr two table verbs.
Functions for getting information about keys.
keys(.tbl)
raw_keys(.tbl)
has_keys(.tbl)
.tbl | Reference data frame. |
keys() always returns a tibble of keys. If there are no keys, it returns a tibble with the same number of rows as .tbl and zero columns. raw_keys() is just a wrapper for attr(.tbl, "keys").
To know whether .tbl has keys use has_keys().
keys(mtcars)
raw_keys(mtcars)
has_keys(mtcars)
df <- key_by(mtcars, vs, am)
keys(df)
has_keys(df)
Functions to manipulate keys.
remove_keys(.tbl, ..., .unkey = FALSE)
restore_keys(.tbl, ..., .remove = FALSE, .unkey = FALSE)
pull_key(.tbl, var)
rename_keys(.tbl, ...)
.tbl | Reference data frame. |
... | Variables to be used for operations defined in similar fashion as in |
.unkey | Whether to |
.remove | Whether to remove keys after restoring. |
var | Parameter for |
remove_keys() removes keys defined with ... .
restore_keys() transfers keys defined with ... into .tbl and removes them from keys if .remove == TRUE. If .tbl is grouped the following happens:
If restored keys don't contain grouping variables then groups don't change;
If restored keys contain grouping variables then the result will be regrouped based on restored values. In other words, restoring keys overrides the rule of not modifying grouping variables. This follows the ideology of keys: they contain information about rows, and by restoring them you want that information to be available.
pull_key() extracts one specified column from keys with dplyr::pull().
rename_keys() renames columns in keys using dplyr::rename().
df <- mtcars %>% dplyr::as_tibble() %>% key_by(vs, am, .exclude = TRUE)
df %>% remove_keys(vs)
df %>% remove_keys(dplyr::everything())
df %>% remove_keys(dplyr::everything(), .unkey = TRUE)
df %>% restore_keys(vs)
df %>% restore_keys(vs, .remove = TRUE)
df %>% restore_keys(dplyr::everything(), .remove = TRUE)
df %>% restore_keys(dplyr::everything(), .remove = TRUE, .unkey = TRUE)
# Restoring on grouped data frame
df_grouped <- df %>% dplyr::mutate(vs = 1) %>% dplyr::group_by(vs)
df_grouped %>% restore_keys(dplyr::everything())
# Pulling
df %>% pull_key(vs)
# Renaming
df %>% rename_keys(Vs = vs)
A key is a vector whose goal is to provide information about rows in a reference data frame. Its length should always be equal to the number of rows in the data frame. Keys are stored as a tibble in attribute "keys", so one data frame can have multiple keys. A data frame with keys is implemented as class keyed_df.
keys(.tbl) <- value
assign_keys(.tbl, value)
key_by(.tbl, ..., .add = FALSE, .exclude = FALSE)
unkey(.tbl)
.tbl | Reference data frame. |
value | Values of keys (converted to tibble). |
... | Variables to be used as keys defined in similar fashion as in |
.add | Whether to add keys to (possibly) existing ones. If |
.exclude | Whether to exclude key variables from |
key_by ignores grouping when creating keys. Also, if .add == TRUE and names of some added keys match the names of existing keys, the new ones will override the old ones.
Value for keys<- should not be NULL because it is converted to a tibble with zero rows. To remove keys use unkey(), remove_keys() or restore_keys(). assign_keys is a wrapper for keys<- that is more suitable for piping.
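A short sketch of assign_keys() in a pipe, followed by key removal with unkey(), may help here. It assumes the keyholder and dplyr packages are installed; the key name row_label is a hypothetical choice made for this example.

```r
library(keyholder)
library(dplyr)

df <- as_tibble(mtcars)

# 'row_label' is a hypothetical key name; the value must have one row per row of df
keyed <- df %>%
  assign_keys(tibble(row_label = rownames(mtcars)))

has_keys(keyed)
keys(keyed)

# Remove keys with unkey() rather than by assigning NULL
plain <- unkey(keyed)
has_keys(plain)
```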
df <- dplyr::as_tibble(mtcars)
# Value is converted to tibble
keys(df) <- 1:nrow(df)
# This will throw an error
## Not run:
keys(df) <- 1:10
## End(Not run)
# Use 'vs' and 'am' as keys
df %>% key_by(vs, am)
df %>% key_by(vs, am, .exclude = TRUE)
df %>% key_by(vs) %>% key_by(am, .add = TRUE, .exclude = TRUE)
# Override keys
df %>% key_by(vs, am) %>% dplyr::mutate(vs = 1) %>% key_by(gear, vs, .add = TRUE)
# Use select helpers
df %>% key_by(dplyr::one_of(c("vs", "am")))
df %>% key_by(dplyr::everything())
These functions remove a selection of keys using the corresponding scoped variant of select. The .funs argument is omitted because it is redundant.
remove_keys_all(.tbl, ..., .unkey = FALSE)
remove_keys_if(.tbl, .predicate, ..., .unkey = FALSE)
remove_keys_at(.tbl, .vars, ..., .unkey = FALSE)
.tbl | Reference data frame. |
... | Parameter for scoped functions. |
.unkey | Whether to |
.predicate | Parameter for scoped functions. |
.vars | Parameter for scoped functions. |
df <- mtcars %>% dplyr::as_tibble() %>% key_by(vs, am, disp)
df %>% remove_keys_all()
df %>% remove_keys_all(.unkey = TRUE)
df %>% remove_keys_if(rlang::is_integerish)
df %>% remove_keys_at(c("vs", "am"))
These functions rename selection of keys using corresponding scoped variant of rename.
rename_keys_all(.tbl, .funs = list(), ...)
rename_keys_if(.tbl, .predicate, .funs = list(), ...)
rename_keys_at(.tbl, .vars, .funs = list(), ...)
.tbl | Reference data frame. |
.funs | Parameter for scoped functions. |
... | Parameter for scoped functions. |
.predicate | Parameter for scoped functions. |
.vars | Parameter for scoped functions. |
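This section has no examples of its own, so here is a minimal sketch modelled on the examples of the other scoped variants. It assumes the keyholder and dplyr packages are installed and that the renaming variants accept the same kinds of .funs, .predicate and .vars values as the functions shown elsewhere in this manual.

```r
library(keyholder)
library(dplyr)

df <- mtcars %>% as_tibble() %>% key_by(vs, am, disp)

# Rename every key
df_all <- df %>% rename_keys_all(.funs = toupper)

# Rename only keys satisfying the predicate
df_if <- df %>% rename_keys_if(rlang::is_integerish, .funs = toupper)

# Rename only the named keys
df_at <- df %>% rename_keys_at(c("vs", "am"), .funs = toupper)

names(keys(df_all))
names(keys(df_if))
names(keys(df_at))
```

Only the key names change; the columns of the reference data frame are left as they were.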
These functions restore a selection of keys using the corresponding scoped variant of select. The .funs argument can be used to rename some keys (without touching actual keys) before restoring.
restore_keys_all(.tbl, .funs = list(), ..., .remove = FALSE, .unkey = FALSE)
restore_keys_if(.tbl, .predicate, .funs = list(), ..., .remove = FALSE, .unkey = FALSE)
restore_keys_at(.tbl, .vars, .funs = list(), ..., .remove = FALSE, .unkey = FALSE)
.tbl | Reference data frame. |
.funs | Parameter for scoped functions. |
... | Parameter for scoped functions. |
.remove | Whether to remove keys after restoring. |
.unkey | Whether to |
.predicate | Parameter for scoped functions. |
.vars | Parameter for scoped functions. |
df <- mtcars %>% dplyr::as_tibble() %>% key_by(vs, am, disp)
# Just restore all keys
df %>% restore_keys_all()
# Restore all keys with renaming and without touching actual keys
df %>% restore_keys_all(.funs = toupper)
# Restore with renaming and removing
df %>% restore_keys_all(.funs = toupper, .remove = TRUE)
# Restore with renaming, removing and unkeying
df %>% restore_keys_all(.funs = toupper, .remove = TRUE, .unkey = TRUE)
# Restore with renaming keys satisfying the predicate
df %>% restore_keys_if(rlang::is_integerish, .funs = toupper)
# Restore with renaming specified keys
df %>% restore_keys_at(c("vs", "disp"), .funs = toupper)