Format data frames and simple features using common approaches
Source: R/format_data.R
This function can apply the common data cleaning tasks listed under Details.
Usage
format_data(
  x,
  var_names = NULL,
  clean_names = TRUE,
  replace_na_with = NULL,
  replace_with_na = NULL,
  replace_empty_char_with_na = TRUE,
  fix_date = TRUE,
  sf_col = NULL
)

rename_with_xwalk(x, xwalk = NULL)

fix_date(x)

relocate_sf_col(x, .after = dplyr::everything())

rename_sf_col(x, sf_col = "geometry")

bind_address_col(x, city = NULL, county = NULL, state = NULL)

bind_block_col(
  x,
  bldg_num = "bldg_num",
  street_dir_prefix = "street_dir_prefix",
  street_name = "street_name",
  street_suffix = "street_type"
)

bind_boundary_col(x, boundary = NULL, join = NULL, ...)

bind_units_col(x, y, units = NULL, drop = FALSE, keep_all = TRUE, .id = NULL)
Arguments
- x
A tibble or data frame object (an sf object for the simple feature-only functions).
- var_names
A named list following the format list("New var name" = old_var_name), or a two-column data frame with the first column being the new variable names and the second column being the old variable names; defaults to NULL.
- clean_names
If TRUE, pass the data frame to janitor::clean_names; defaults to TRUE.
- replace_na_with
A named list to pass to tidyr::replace_na; defaults to NULL.
- replace_with_na
A named list to pass to naniar::replace_with_na; defaults to NULL.
- replace_empty_char_with_na
If TRUE, replace "" with NA using naniar::replace_with_na_if; defaults to TRUE.
- fix_date
If TRUE, fix UNIX dates (a common issue with dates from FeatureServer and MapServer sources); defaults to TRUE.
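- sf_col
Name to use for the sf column after renaming. For rename_sf_col, defaults to "geometry"; for format_data, defaults to NULL, in which case the sf column is not renamed or relocated.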
- xwalk
A data frame with two columns, using the first column as the names and the second column as the values, or a named list. The existing names of x must be the values and the new names must be the names.
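The crosswalk convention (new names as list names, existing names as values) can be illustrated with a minimal base R sketch. The helper rename_with_xwalk_sketch() is hypothetical and is not the package's rename_with_xwalk(); it assumes every value in the crosswalk matches an existing column name and leaves all other columns unchanged.

```r
# Hypothetical helper illustrating the xwalk convention (not the package code):
# names(xwalk) are the new names; the values are the existing names of x.
rename_with_xwalk_sketch <- function(x, xwalk) {
  # Convert a two-column data frame (first column = new names,
  # second column = existing names) into the equivalent named list
  if (is.data.frame(xwalk)) {
    xwalk <- stats::setNames(
      as.list(as.character(xwalk[[2]])),
      as.character(xwalk[[1]])
    )
  }
  old_names <- unlist(xwalk, use.names = FALSE)
  new_names <- names(xwalk)
  # Fail early if the crosswalk refers to a column that does not exist
  missing_cols <- setdiff(old_names, names(x))
  if (length(missing_cols) > 0) {
    stop("Columns not found in x: ", paste(missing_cols, collapse = ", "))
  }
  # Swap each matched existing name for its new name
  names(x)[match(old_names, names(x))] <- new_names
  x
}

df <- data.frame(OBJECTID = 1:2, ADDR = c("100 Main St", "101 Main St"))
rename_with_xwalk_sketch(df, list(id = "OBJECTID", address = "ADDR"))
rename_with_xwalk_sketch(
  df,
  data.frame(new = c("id", "address"), old = c("OBJECTID", "ADDR"))
)
```

The same list shape is what var_names expects, so a single crosswalk object could plausibly serve both arguments.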
- .after
The column after which to place the sf column; defaults to dplyr::everything(), which moves the sf column to the end.
- city, county, state
City, county, and state values to bind to a data frame or sf object.
- boundary
An sf object with a column named "name", or a list of sf objects where every item in the list has a "name" column.
- join
Geometry predicate function; defaults to NULL. If NULL, join is set to sf::st_intersects if key_list contains only POLYGON or MULTIPOLYGON objects, or to sf::st_nearest_feature if key_list contains other geometry types.
- y
A vector of numeric or units values to bind to x.
- units
Units to use for y (if y is numeric) or to convert y to (if y is of class units); defaults to NULL.
- drop
If TRUE, apply units::drop_units to the column with units class values and return numeric values instead; defaults to FALSE.
- keep_all
If TRUE, keep all columns; if FALSE, return only the named .id column. Defaults to TRUE.
- .id
Name to use for the vector of units provided to the y parameter when y is bound to the x data frame or tibble as a new column.
Details
- Applies stringr::str_squish and stringr::str_trim to all character columns (str_trim_squish)
- Optionally replaces all character values of "" with NA values
- Optionally corrects UNIX-formatted dates with 1970-01-01 origins
- Optionally renames variables by passing a named list of variables
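The first three cleaning steps above can be approximated in base R to show what each one does to the data. The helpers below (trim_squish_sketch(), empty_to_na_sketch(), unix_ms_to_date_sketch()) are hypothetical illustrations, not the package's internals; in particular, the date helper assumes the source values are milliseconds since 1970-01-01, which is how ArcGIS FeatureServer and MapServer layers typically encode date fields.

```r
# Hypothetical base R sketches of three cleaning steps (illustration only;
# format_data() itself uses stringr, naniar, and its own fix_date()).

# Trim leading/trailing whitespace and collapse internal runs of
# whitespace to a single space in every character column
trim_squish_sketch <- function(x) {
  for (col in names(x)[vapply(x, is.character, logical(1))]) {
    x[[col]] <- gsub("\\s+", " ", trimws(x[[col]]))
  }
  x
}

# Replace empty strings ("") with NA in every character column
empty_to_na_sketch <- function(x) {
  for (col in names(x)[vapply(x, is.character, logical(1))]) {
    values <- x[[col]]
    values[!is.na(values) & values == ""] <- NA_character_
    x[[col]] <- values
  }
  x
}

# Convert UNIX timestamps in milliseconds since 1970-01-01 (assumed
# encoding of FeatureServer/MapServer date fields) to Date values
unix_ms_to_date_sketch <- function(ms) {
  as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))
}

streets <- data.frame(
  name = c("  N   Charles St ", "   ", NA),
  stringsAsFactors = FALSE
)
cleaned <- empty_to_na_sketch(trim_squish_sketch(streets))
cleaned$name
unix_ms_to_date_sketch(1609459200000)
```

Running the squish step before the empty-string step matters: a whitespace-only value such as "   " only becomes "" (and then NA) after it has been trimmed.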
Bind columns:
- bind_address_col binds provided values for city, county, and state to a data frame (to supplement address data with consistent values for these variables).
- bind_block_col requires a data frame with columns for the building number, street direction prefix, street name, and street type (named "bldg_num", "street_dir_prefix", "street_name", and "street_type" by default). It binds derived values for whether a building is on the even or odd side of a block, and creates a block number (street segment) identifier and a block face (street segment side) identifier.
- bind_boundary_col uses sf::st_join to assign simple feature data to an enclosing polygon.
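To make the bind_block_col description concrete, the sketch below derives a side, a block number, and block and block face identifiers from address parts. It is a hypothetical base R illustration, not the package's implementation: the hundred-block rule (rounding the building number down to the nearest hundred), the new column names, and the identifier format are all assumptions. Only the argument names and defaults mirror the Usage section.

```r
# Hypothetical sketch of block side, block number, and block face IDs.
# The hundred-block rule, column names, and ID format are assumptions,
# not the package's bind_block_col() implementation.
bind_block_col_sketch <- function(x,
                                  bldg_num = "bldg_num",
                                  street_dir_prefix = "street_dir_prefix",
                                  street_name = "street_name",
                                  street_suffix = "street_type") {
  # Treat missing address parts as empty so they drop out of the IDs
  blank_na <- function(v) {
    v <- as.character(v)
    v[is.na(v)] <- ""
    v
  }
  num <- as.integer(x[[bldg_num]])
  # Even- or odd-numbered side of the block
  x$bldg_side <- ifelse(num %% 2L == 0L, "even", "odd")
  # Block number: building number rounded down to the nearest hundred
  x$block_num <- (num %/% 100L) * 100L
  # Street segment ID, e.g. "N 100 Charles St"
  block_id <- paste(
    blank_na(x[[street_dir_prefix]]), x$block_num,
    blank_na(x[[street_name]]), blank_na(x[[street_suffix]])
  )
  x$block_id <- trimws(gsub("\\s+", " ", block_id))
  # Block face ID: street segment plus side of the street
  x$block_face_id <- paste(x$block_id, x$bldg_side)
  x
}

addresses <- data.frame(
  bldg_num = c(101, 124, 250),
  street_dir_prefix = c("N", "N", NA),
  street_name = c("Charles", "Charles", "Main"),
  street_type = c("St", "St", "St"),
  stringsAsFactors = FALSE
)
blocks <- bind_block_col_sketch(addresses)
blocks$block_face_id
```

Integer arithmetic is used for the block number so that large values paste as digits (for example "100000") rather than in scientific notation.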
Simple feature-only functions:
If sf_col is not NULL, format_data calls rename_sf_col and relocate_sf_col:
- rename_sf_col: Rename the sf column.
- relocate_sf_col: Relocate the sf column after everything (the default) or after a specified column.
bind_boundary_col also works only with simple feature objects.
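The relocation rule (move the column to the end by default, or place it directly after a named column) can be sketched on a plain data frame. relocate_col_sketch() is hypothetical and is not the package's relocate_sf_col(), which takes a dplyr selection such as dplyr::everything(). Real sf objects also record the geometry column name in their "sf_column" attribute, so this plain data-frame version only illustrates the column ordering.

```r
# Hypothetical base R sketch of column relocation (illustration only;
# relocate_sf_col() works on sf objects and dplyr selections).
relocate_col_sketch <- function(x, col = "geometry", .after = NULL) {
  others <- setdiff(names(x), col)
  if (is.null(.after)) {
    # Default: place the column after everything else (last)
    new_order <- c(others, col)
  } else {
    pos <- match(.after, others)
    if (is.na(pos)) stop("Column not found: ", .after)
    # Insert the column immediately after the named column
    new_order <- append(others, col, after = pos)
  }
  x[new_order]
}

df <- data.frame(
  geometry = c("POINT (0 0)", "POINT (1 1)"),
  id = 1:2,
  name = c("a", "b")
)
names(relocate_col_sketch(df))
names(relocate_col_sketch(df, .after = "id"))
```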