
Format data frames and simple features using common approaches
Source:R/format_data.R
      format_data.RdThis function can apply the following common data cleaning tasks:
- Applies stringr::str_squish and stringr::str_trim to all character columns 
- Optionally replaces all character values of "" with - NAvalues
- Optionally corrects UNIX formatted dates with 1970-01-01 origins 
- Optionally renames variables by passing a named list of variables 
The address functions previously included with format_data() are now
documented at format_address_data().
Usage
format_data(
  x,
  var_names = NULL,
  xwalk = NULL,
  clean_names = TRUE,
  .name_repair = "check_unique",
  replace_na_with = NULL,
  replace_with_na = NULL,
  replace_empty_char_with_na = FALSE,
  fix_date = FALSE,
  label = FALSE,
  remove_empty = NULL,
  remove_constant = FALSE,
  format_sf = FALSE,
  ...,
  call = caller_env()
)
rename_with_xwalk(
  x,
  xwalk = NULL,
  cols = c("label", "name"),
  label = FALSE,
  .strict = TRUE,
  keep_all = TRUE,
  arg = caller_arg(x),
  call = caller_env()
)
label_with_xwalk(x, xwalk = NULL, label = "var", ...)
make_variable_dictionary(
  x,
  .labels = NULL,
  .definitions = NULL,
  details = c("basic", "none", "full")
)
fix_epoch_date(x, .cols = dplyr::contains("date"), tz = "")Arguments
- x
- A tibble or data frame object 
- var_names
- A named list following the format, - list("New var name" = old_var_name), or a two column data frame with the first column being the new variable names and the second column being the old variable names; defaults to- NULL.
- xwalk
- A data frame with two columns using the first column as name and the second column as value; or a named list. The existing names of x must be the values and the new names must be the names. 
- clean_names
- If - TRUE, set .name_repair to- janitor::make_clean_names(); defaults to- TRUE.
- .name_repair
- Defaults to "check_unique" 
- replace_na_with
- A named list to pass to - tidyr::replace_na(); defaults to- NULL.
- replace_with_na
- A named list to pass to - naniar::replace_with_na(); defaults to- NULL.
- replace_empty_char_with_na
- If - TRUE, replace "" with- NAusing- naniar::replace_with_na_if(), Default:- TRUE
- fix_date
- If - FALSE, fix UNIX epoch dates (common issue with dates from FeatureServer and MapServer sources) using the- fix_epoch_date()function, Default:- TRUE
- label
- For - label_with_xwalk()use- label = "val"to use- labelled::set_value_labels()or "var" (default) to use- labelled::set_variable_labels(). For- rename_with_xwalk(), if label is- TRUE, xwalk is passed to- label_with_xwalk()with label = "var" to label columns using the original names. Defaults to- FALSE.
- remove_empty
- If not - NULL, pass values ("rows", "cols" or c("rows", "cols") (default)) to the which parameter of- janitor::remove_empty()
- remove_constant
- If - TRUE, pass data to janitor::remove_constant() using default parameters.
- format_sf
- If - TRUE, pass x and additional parameters to- format_sf_data().
- ...
- Additional parameters passed to - format_sf_data()
- cols
- Column names to use for crosswalk. 
- .strict
- If - TRUE(default), require that all values from the xwalk are found in the column names of the x data.frame. If- FALSE, unmatched values from the xwalk are ignored.
- keep_all
- If - FALSE, columns that are not named in the xwalk are dropped. If- TRUE(default), all columns are retained. If x is an- sfobject, the geometry column will not be dropped even it is not renamed.
- arg, call
- Additional parameters used internally with - cli::cli_abort()to improve error messages.
- .labels
- Replaces labels column created by - labelled::generate_dictionary()if column is all- NA(no existing labels assigned); defaults to- NULL.
- .definitions
- Character vector of definitions appended to dictionary data frame. Must be in the same order as the variables in the provided data frame x. 
- details
- add details about each variable (full details could be time consuming for big data frames, - FALSEis equivalent to- "none"and- TRUEto- "full")
- .cols
- tidyselect for columns to apply epoch date fixing function to. Defaults to - dplyr::contains("date").
- tz
- Time zone passed to - as.POSIXct().
Examples
nc <- get_location_data(data = system.file("shape/nc.shp", package = "sf"))
format_data(nc)
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -9386880 ymin: 4012991 xmax: -8399788 ymax: 4382079
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> # A tibble: 100 × 15
#>     area perimeter  cnty cnty_id name  fips  fipsno cress_id bir74 sid74 nwbir74
#>  * <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1      10
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0      10
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5     208
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1     123
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9    1066
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7     954
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0     115
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0     254
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4     748
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1     160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: bir79 <dbl>, sid79 <dbl>, nwbir79 <dbl>,
#> #   geometry <MULTIPOLYGON [m]>