
make_area_xwalk() creates a crosswalk data frame based on the weight_col parameter (for year = 2020, use "POP20" for population, "HOUSING20" for households, or "ALAND20" for land area). Using this function with other years requires users to add population data to block_xwalk, because tigris::blocks() only includes population and household count data for 2020. This function has not been tested with areas that include overlapping geometry, and the results may be invalid for those overlapping areas.

Usage

make_area_xwalk(
  area,
  block_xwalk = NULL,
  state = NULL,
  county = NULL,
  year = 2020,
  name_col = "NAME",
  weight_col = "HOUSING20",
  geoid_col = "GEOID",
  tract_col = "TRACTCE20",
  by = c(TRACTCE20 = "TRACTCE"),
  suffix = c("_block", "_tract"),
  placement = c("largest", "surface", "centroid"),
  digits = 2,
  extensive = TRUE,
  coverage = TRUE,
  erase = FALSE,
  area_threshold = 0.75,
  keep_geometry = FALSE,
  crs = NULL,
  ...
)

use_area_xwalk(
  data,
  area_xwalk,
  geography = "area",
  name_col = "NAME",
  geoid_col = "GEOID",
  suffix = c("_area", ""),
  weight_col = "perc_HOUSING20",
  variable_col = "variable",
  value_col = "estimate",
  moe_col = "moe",
  digits = 0,
  perc = TRUE,
  extensive = TRUE
)

Arguments

area

An sf object with an arbitrary geography overlapping with the block_xwalk. Required. If area only partly overlaps with block_xwalk, coverage should be set to TRUE (default).

block_xwalk

Block-tract crosswalk sf object. If NULL, state is required to create a crosswalk using make_block_xwalk().

state

The two-digit FIPS code (string) of the state you want. Can also be state name or state abbreviation.

county

The three-digit FIPS code (string) of the county you'd like to subset for, or a vector of FIPS codes if you desire multiple counties. Can also be a county name or vector of names.

year

The data year; defaults to 2020.

name_col

Name column in area.

weight_col

Column name in input block_xwalk to use for weighting. The weight_col used by use_area_xwalk() should be the same as the weight_col for make_area_xwalk() with the "perc_" prefix added. Defaults to "HOUSING20" for make_area_xwalk() and "perc_HOUSING20" for use_area_xwalk().
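The relationship between a weight column and its "perc_" counterpart can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the blocks data frame and its values are made up, and the assumption is that each "perc_" value is the share of a tract's total weight that falls inside a given area.

```r
# Hypothetical block-level table: each block belongs to one tract and has
# already been assigned to one area. Column names mirror the defaults above;
# the values are invented for illustration.
blocks <- data.frame(
  TRACTCE20 = c("000100", "000100", "000100", "000200", "000200"),
  NAME      = c("North", "North", "South", "South", "South"),
  HOUSING20 = c(40, 60, 100, 30, 70)
)

# Sum the weight column for every tract and area combination.
area_xwalk <- aggregate(HOUSING20 ~ TRACTCE20 + NAME, data = blocks, FUN = sum)

# Assumed definition: share of each tract's weight that falls in each area.
tract_total <- ave(area_xwalk$HOUSING20, area_xwalk$TRACTCE20, FUN = sum)
area_xwalk$perc_HOUSING20 <- round(area_xwalk$HOUSING20 / tract_total, digits = 2)

area_xwalk
```

In this sketch the shares for each tract sum to 1, so a tract split evenly between two areas contributes half of its count to each.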

geoid_col, tract_col

Names of the Census tract GeoID column and the Census tract ID column in block_xwalk.

by

Specification of join variables in the format of c("block column name for tract" = "tract column name"). Passed to dplyr::left_join().

suffix

Suffixes added to the output to disambiguate column names from the block and tract data. Unused for 2020 data.

placement

String with option for joining area and block_xwalk: "largest", "surface", or "centroid". "largest" joins the two using sf::st_join() with largest set to TRUE. "surface" first transforms block_xwalk using sf::st_point_on_surface() and "centroid" uses sf::st_centroid().

digits

Digits to use for percent share of weight value.

extensive

If TRUE (default) calculate new estimate values as weighted sums and re-calculate margin of error with tidycensus::moe_sum(). If FALSE, calculate new estimate values as weighted means (appropriate for ACS median variables) and drop the margin of error. perc is also always set to FALSE if extensive is FALSE.
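The two aggregation modes can be sketched in base R as follows. The tracts data frame is hypothetical, and two things are assumptions rather than documented behavior: that each margin of error is scaled by the same weight as its estimate before combining, and that the combined margin is a plain root sum of squares (any special handling inside tidycensus::moe_sum() is ignored here).

```r
# Hypothetical tract-level ACS rows already joined to the crosswalk shares
# for a single area. Values are invented for illustration.
tracts <- data.frame(
  GEOID          = c("A", "B", "C"),
  estimate       = c(1000, 500, 200),
  moe            = c(120, 80, 50),
  perc_HOUSING20 = c(1, 0.5, 0.25)
)

w <- tracts$perc_HOUSING20

# extensive = TRUE: counts are combined as a weighted sum.
est_sum <- sum(tracts$estimate * w)

# Assumed MOE handling: root sum of squares of the weighted margins.
moe_area <- sqrt(sum((tracts$moe * w)^2))

# extensive = FALSE: values such as medians are combined as a weighted mean,
# and no margin of error is carried forward.
est_mean <- weighted.mean(tracts$estimate, w)

c(sum = est_sum, moe = moe_area, mean = est_mean)
```

A weighted sum suits counts such as households or people, while a weighted mean is only an approximation for statistics such as medians, which is why the margin of error is dropped in that case.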

coverage

If TRUE (default), it is assumed that area does not cover the full extent of the block_xwalk and an additional feature is added with the difference between the unioned area geometry and unioned block_xwalk geometry. This additional coverage ensures that blocks are accurately assigned to this alternate geography but it is excluded from the returned data frame. If coverage is TRUE and all features in area overlap with block_xwalk, the function issues a warning and then resets coverage to FALSE. The reverse option is applied if any features from area do not overlap with block_xwalk.

erase

If TRUE, apply tigris::erase_water() to input area and block_xwalk before joining. Defaults to FALSE. If erase is an sf object, the geometry of the input sf is erased from area and block_xwalk. This option is intended to support erasing open space or other non-developed land as well as water areas.

area_threshold

The percentile rank cutoff of water areas to use in the erase operation, ranked by size. Defaults to 0.75, representing the water areas in the 75th percentile and up (the largest 25 percent of areas). This value may need to be modified by the user to achieve optimal results for a given location.
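The effect of the cutoff can be sketched in base R. The water areas below are hypothetical, and the sketch assumes the cutoff is applied to a percentile rank scaled from 0 to 1 (the same definition as dplyr::percent_rank()); it does not call tigris::erase_water().

```r
# Hypothetical water features with their areas in square meters.
awater <- c(pond = 1200, creek = 5000, lake = 90000,
            reservoir = 250000, bay = 4000000)

# Percentile rank scaled to [0, 1]: (minimum rank - 1) / (n - 1).
pct_rank <- (rank(awater, ties.method = "min") - 1) / (length(awater) - 1)

# Keep only the features at or above the cutoff; these would be erased.
area_threshold <- 0.75
erased <- names(awater)[pct_rank >= area_threshold]

erased
```

Lowering area_threshold erases more, smaller water features; raising it keeps the erase operation limited to the largest ones.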

keep_geometry

If TRUE, area_xwalk is an sf object with the same geometry as the input area. Defaults to FALSE.

crs

Coordinate reference system to use for input data. Recommended to set to a projected CRS if input area data is in a geographic CRS.

...

Passed to make_block_xwalk().

data

A data frame downloaded with tidycensus::get_acs().

area_xwalk

An area crosswalk data frame created with make_area_xwalk(). Required for use_area_xwalk().

geography

A character string used as general description for area geography type. Defaults to "area" but typical values could include "neighborhood", "planning district", or "service area".

variable_col

Variable column name. Defaults to "variable".

value_col, moe_col

Value and margin of error column names (defaults to "estimate" and "moe").

perc

If TRUE (default), use the denominator column ID to calculate each estimate as a percent share of the denominator value and use tidycensus::moe_prop() to calculate a new margin of error for the percent estimate.
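The percent share calculation can be sketched in base R using the standard Census Bureau formula for the margin of error of a proportion. The helper moe_share() is hypothetical and written only for this illustration; it is not the package's code and does not call tidycensus::moe_prop(). The input values are made up.

```r
# Hypothetical helper: margin of error for a proportion num / denom.
moe_share <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  under_root <- moe_num^2 - (p^2 * moe_denom^2)
  # When the value under the root is negative, fall back to the ratio
  # formula, which adds the two terms instead of subtracting them.
  under_root <- ifelse(under_root < 0,
                       moe_num^2 + (p^2 * moe_denom^2),
                       under_root)
  sqrt(under_root) / denom
}

# Invented area-level estimate and its denominator.
num <- 300
denom <- 1300
moe_num <- 45
moe_denom <- 127

perc_estimate <- round(100 * num / denom, digits = 0)
perc_moe <- 100 * moe_share(num, denom, moe_num, moe_denom)

c(perc_estimate = perc_estimate, perc_moe = perc_moe)
```

The result is expressed in percentage points so that the estimate and its margin of error share the same scale.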

Value

A tibble or an sf object.

Details

Using an area crosswalk

After creating an area crosswalk with make_area_xwalk(), you can pass the crosswalk to use_area_xwalk() along with a data frame from tidycensus::get_acs() or get_acs_tables(). At a minimum, the data must have a column with the same name as geoid_col along with columns named "variable", "estimate", and "moe". Please note that this approach to aggregation does not work well if your data contains "jam" values, e.g. the substitution of 0 for "1939 or older" for the Median Year Built variable. Ideally, the weight used for aggregation should be based on household counts when aggregating a household-level variable and on population counts when aggregating an individual-level variable.
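The problem with jam values can be illustrated with a small base R sketch. The data frame is hypothetical, and recoding the jam value to NA is only one possible workaround, not something the package is documented to do.

```r
# Hypothetical Median Year Built estimates for three tracts in one area.
# The 0 in the last row is a jam value standing in for "1939 or older".
tracts <- data.frame(
  estimate       = c(1955, 1962, 0),
  perc_HOUSING20 = c(0.5, 1, 0.5)
)

# Treating the jam value as a real number drags the weighted mean far
# below any plausible construction year.
with_jam <- weighted.mean(tracts$estimate, tracts$perc_HOUSING20)

# One possible workaround (an assumption): recode the jam value to NA and
# drop it, accepting that the oldest tract no longer contributes.
recoded <- ifelse(tracts$estimate == 0, NA, tracts$estimate)
without_jam <- weighted.mean(recoded, tracts$perc_HOUSING20, na.rm = TRUE)

c(with_jam = with_jam, without_jam = without_jam)
```

Neither result is a true median for the area; the comparison only shows how strongly a single jam value can distort a weighted mean.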