Format data frames and simple features using common approaches

format_data applies a set of common data cleaning tasks, which are listed under Details.

Usage

format_data(
  x,
  var_names = NULL,
  clean_names = TRUE,
  replace_na_with = NULL,
  replace_with_na = NULL,
  replace_empty_char_with_na = TRUE,
  fix_date = TRUE,
  sf_col = NULL
)

rename_with_xwalk(x, xwalk = NULL)

fix_date(x)

relocate_sf_col(x, .after = dplyr::everything())

rename_sf_col(x, sf_col = "geometry")

bind_address_col(x, city = NULL, county = NULL, state = NULL)

bind_block_col(
  x,
  bldg_num = "bldg_num",
  street_dir_prefix = "street_dir_prefix",
  street_name = "street_name",
  street_suffix = "street_type"
)

bind_boundary_col(x, boundary = NULL, join = NULL, ...)

bind_units_col(x, y, units = NULL, drop = FALSE, keep_all = TRUE, .id = NULL)

Arguments

x: A tibble or data frame object
var_names: A named list following the format, list("New var name" = old_var_name), or a two column data frame with the first column being the new variable names and the second column being the old variable names; defaults to NULL.
clean_names: If TRUE, pass data frame to janitor::clean_names; defaults to TRUE.
replace_na_with: A named list to pass to tidyr::replace_na; defaults to NULL.
replace_with_na: A named list to pass to naniar::replace_with_na; defaults to NULL.
replace_empty_char_with_na: If TRUE, replace "" with NA using naniar::replace_with_na_if; defaults to TRUE.
fix_date: If TRUE, fix UNIX dates (a common issue with dates from FeatureServer and MapServer sources); defaults to TRUE.
sf_col: Name to use for the sf column after renaming; defaults to NULL for format_data and "geometry" for rename_sf_col.
xwalk: A named list, or a data frame with two columns where the first column holds the names and the second column holds the values. The values must be the existing column names of x and the names must be the new column names.
.after: The column to place the sf column after; defaults to dplyr::everything().
city, county, state: City, county, and state to bind to data frame or sf object.
boundary: An sf object with a column named "name" or a list of sf objects where all items in the list have a "name" column.
join: Geometry predicate function; defaults to NULL, in which case it is set to sf::st_intersects if boundary contains only POLYGON or MULTIPOLYGON objects or sf::st_nearest_feature if boundary contains other geometry types.
y: Vector of numeric or units values to bind to x.
units: Units to use for y (if numeric) or convert to (if y is units class); defaults to NULL.
drop: If TRUE, apply the units::drop_units function to the column with units class values and return numeric values instead; defaults to FALSE.
keep_all: If TRUE, keep all columns. If FALSE, return only the named .id column.
.id: Name to use for vector of units provided to "y" parameter, when "y" is bound to the "x" data frame or tibble as a new column.
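
The y, units, and drop arguments of bind_units_col can be illustrated with the units package directly. This is a minimal sketch of the described behavior, not the function's implementation; the data frame and the column names dist, dist_km, and dist_num are hypothetical.

```r
library(units)

# Hypothetical data frame and numeric vector to bind as a new column.
x <- data.frame(id = 1:2)
y <- c(100, 250)

# A numeric y is given the requested units ("m" is assumed here).
x$dist <- set_units(y, "m", mode = "standard")

# A y that already has units is converted to the requested units.
x$dist_km <- set_units(x$dist, "km", mode = "standard")

# drop = TRUE corresponds to returning plain numeric values.
x$dist_num <- drop_units(x$dist_km)
```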

Value

The input data frame or simple feature object with formatting functions applied.

Details

Applies stringr::str_squish and stringr::str_trim to all character columns (str_trim_squish)
Optionally replaces all character values of "" with NA values
Optionally corrects UNIX formatted dates with 1970-01-01 origins
Optionally renames variables by passing a named list of variables
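
The steps listed above can be sketched in base R. This is an illustration under stated assumptions rather than the code format_data runs: the input data is hypothetical, base functions stand in for the stringr and naniar calls, and the date column is assumed to hold milliseconds since 1970-01-01 (some sources use seconds instead).

```r
# Hypothetical input: padded strings, an empty string, and epoch milliseconds.
x <- data.frame(
  name = c("  Main   Street ", ""),
  updated = c(1609459200000, 1612137600000),
  stringsAsFactors = FALSE
)

is_chr <- vapply(x, is.character, logical(1))

# Trim and collapse whitespace in every character column
# (base R stand-in for stringr::str_trim and stringr::str_squish).
x[is_chr] <- lapply(x[is_chr], function(col) gsub("\\s+", " ", trimws(col)))

# Replace empty strings with NA.
x[is_chr] <- lapply(x[is_chr], function(col) {
  col[!is.na(col) & col == ""] <- NA_character_
  col
})

# Convert epoch milliseconds to date-times using the 1970-01-01 origin.
x$updated <- as.POSIXct(x$updated / 1000, origin = "1970-01-01", tz = "UTC")

# Rename with a named list: names are the new names, values the old names.
var_names <- list("street" = "name", "last_updated" = "updated")
old_names <- unlist(var_names)
names(x)[match(old_names, names(x))] <- names(var_names)
```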

Bind columns:

bind_address_col binds a provided value for city, county, and state to a data frame (to supplement address data with consistent values for these variables).
bind_block_col requires a data frame with columns named "bldg_num", "street_dir_prefix", "street_name", and "street_type". It binds derived values for whether a building is on the even or odd side of a block and creates a block number (street segment) and block face (street segment side) identifier.
bind_boundary_col uses sf::st_join to assign simple feature data to an enclosing polygon.
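
The kind of columns that bind_address_col and bind_block_col add can be sketched in base R. The output column names below are hypothetical, the sample addresses are invented, and the block number is assumed to be the building number rounded down to the nearest hundred; the package may derive these values differently.

```r
# Hypothetical address data using the column names bind_block_col expects.
x <- data.frame(
  bldg_num = c(101, 214, 2250),
  street_dir_prefix = c("N", "N", "E"),
  street_name = c("Charles", "Charles", "Pratt"),
  street_type = c("St", "St", "St"),
  stringsAsFactors = FALSE
)

# Constant address columns, as bind_address_col is described as adding.
x$city <- "Baltimore"
x$state <- "MD"

# Even or odd side of the block.
x$bldg_num_even_odd <- ifelse(x$bldg_num %% 2 == 0, "Even", "Odd")

# Block number: building number rounded down to the hundred (assumption).
x$block_num <- floor(x$bldg_num / 100) * 100

# Block segment (street segment) and block face (street segment side).
x$block_segment <- paste(
  x$block_num, x$street_dir_prefix, x$street_name, x$street_type
)
x$block_face <- paste(x$bldg_num_even_odd, x$block_segment)
```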

Simple feature only functions:

If "sf_col" is not NULL for format_data, the function calls rename_sf_col and relocate_sf_col

rename_sf_col: Rename sf column.
relocate_sf_col: Relocate sf column after everything (default) or specified column.

bind_boundary_col also works only with simple feature objects.
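
The spatial join behind bind_boundary_col can be sketched with sf directly. This is a minimal illustration of the documented approach (sf::st_join with a predicate chosen from the boundary geometry type), not the function's own code; the geometries, the helper square(), and the object names are hypothetical, and no coordinate reference system is set to keep the example planar.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is (xmin, 0).
square <- function(xmin) {
  st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1), c(xmin, 1), c(xmin, 0)
  )))
}

# Boundary data must have a "name" column.
boundary <- st_sf(
  name = c("West", "East"),
  geometry = st_sfc(square(0), square(1))
)

# Points to assign to an enclosing boundary.
pts <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(0.5, 0.5)), st_point(c(1.5, 0.5)))
)

# Pick the predicate as described for the join argument.
is_poly <- all(st_is(boundary, c("POLYGON", "MULTIPOLYGON")))
join_fn <- if (is_poly) st_intersects else st_nearest_feature

joined <- st_join(pts, boundary, join = join_fn)
```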