Make and use crosswalk data based on U.S. Census block-level weights for U.S. Census tracts and non-Census geographic areas
Source: R/xwalk.R
make_area_xwalk.Rd
make_area_xwalk() creates a crosswalk data frame based on the weight_col
parameter (if year = 2020, use "POP20" for population, "HOUSING20" for
housing units, or "ALAND20" for land area). Using this function with other
years requires users to add population data to the block_xwalk, because the
tigris::blocks() function only includes population and housing unit count
data for 2020. This function has not been tested with areas that include
overlapping geometry, and the results may be invalid for the overlapping
areas.
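The idea of a block-weighted crosswalk can be illustrated with a minimal base R sketch. It assumes that each block has already been assigned to one area and that the resulting weight is each area's share of the tract's total weight; the `blocks` data frame and its values are hypothetical, and this is not the internal implementation of make_area_xwalk().

```r
# Hypothetical block-level data: each block belongs to one tract and has
# already been assigned to one area (e.g. by a spatial join).
blocks <- data.frame(
  tract = c("000100", "000100", "000100", "000200", "000200"),
  area = c("North", "North", "South", "South", "South"),
  HOUSING20 = c(40, 60, 100, 30, 70)
)

# Sum the weight column for each tract-area pair.
xwalk <- aggregate(HOUSING20 ~ tract + area, data = blocks, FUN = sum)

# Share of each tract's total weight that falls in each area (assumed
# definition of the "perc_" column), rounded like the digits argument.
tract_total <- ave(xwalk$HOUSING20, xwalk$tract, FUN = sum)
xwalk$perc_HOUSING20 <- round(xwalk$HOUSING20 / tract_total, digits = 2)

xwalk
```

In this sketch, tract 000100 is split evenly between the two areas, while tract 000200 falls entirely within one area.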
Usage
make_area_xwalk(
  area,
  block_xwalk = NULL,
  state = NULL,
  county = NULL,
  year = 2020,
  name_col = "NAME",
  weight_col = "HOUSING20",
  geoid_col = "GEOID",
  tract_col = "TRACTCE20",
  by = c(TRACTCE20 = "TRACTCE"),
  suffix = c("_block", "_tract"),
  placement = c("largest", "surface", "centroid"),
  digits = 2,
  extensive = TRUE,
  coverage = TRUE,
  erase = FALSE,
  area_threshold = 0.75,
  keep_geometry = FALSE,
  crs = NULL,
  ...
)

use_area_xwalk(
  data,
  area_xwalk,
  geography = "area",
  name_col = "NAME",
  geoid_col = "GEOID",
  suffix = c("_area", ""),
  weight_col = "perc_HOUSING20",
  variable_col = "variable",
  value_col = "estimate",
  moe_col = "moe",
  digits = 0,
  perc = TRUE,
  extensive = TRUE
)
Arguments
- area
An sf object with an arbitrary geography overlapping with the block_xwalk. Required. If area only partly overlaps with block_xwalk, coverage should be set to TRUE (default).
- block_xwalk
Block-tract crosswalk sf object. If NULL, state is required to create a crosswalk using make_block_xwalk().
- state
The two-digit FIPS code (string) of the state you want. Can also be state name or state abbreviation.
- county
The three-digit FIPS code (string) of the county you'd like to subset for, or a vector of FIPS codes if you desire multiple counties. Can also be a county name or vector of names.
- year
The data year; defaults to 2020.
- name_col
Name column in area.
- weight_col
Column name in input block_xwalk to use for weighting. The weight_col used by use_area_xwalk() should be the same as the weight_col for make_area_xwalk() but include the "perc_" prefix. Defaults to "HOUSING20" for make_area_xwalk() and "perc_HOUSING20" for use_area_xwalk().
- geoid_col, tract_col
GeoID for Census tract and Census tract ID column in block_xwalk.
- by
Specification of join variables in the format of c("block column name for tract" = "tract column name"). Passed to dplyr::left_join().
- suffix
Suffixes added to the output to disambiguate column names from the block and tract data. Unused for 2020 data.
- placement
String with option for joining area and block_xwalk: "largest", "surface", or "centroid". "largest" joins the two using sf::st_join() with largest set to TRUE. "surface" first transforms block_xwalk using sf::st_point_on_surface() and "centroid" uses sf::st_centroid().
- digits
Digits to use for percent share of weight value.
- extensive
If TRUE (default), calculate new estimate values as weighted sums and re-calculate margin of error with tidycensus::moe_sum(). If FALSE, calculate new estimate values as weighted means (appropriate for ACS median variables) and drop the margin of error. perc is also always set to FALSE if extensive is FALSE.
- coverage
If TRUE (default), it is assumed that area does not cover the full extent of the block_xwalk and an additional feature is added with the difference between the unioned area geometry and unioned block_xwalk geometry. This additional coverage ensures that blocks are accurately assigned to this alternate geography but it is excluded from the returned data frame. If coverage is TRUE and all features in area overlap with block_xwalk, the function issues a warning and then resets coverage to FALSE. The reverse option is applied if any features from area do not overlap.
- erase
If TRUE, apply tigris::erase_water() to input area and block_xwalk before joining. Defaults to FALSE. If erase is an sf object, the geometry of the input sf is erased from area and block_xwalk. This option is intended to support erasing open space or other non-developed land as well as water areas.
- area_threshold
The percentile rank cutoff of water areas to use in the erase operation, ranked by size. Defaults to 0.75, representing the water areas in the 75th percentile and up (the largest 25 percent of areas). This value may need to be modified by the user to achieve optimal results for a given location.
- keep_geometry
If TRUE, area_xwalk is an sf object with the same geometry as the input area. Defaults to FALSE.
- crs
Coordinate reference system to use for input data. Recommended to set to a projected CRS if input area data is in a geographic CRS.
- ...
Passed to make_block_xwalk().
- data
A data frame downloaded with tidycensus::get_acs().
- area_xwalk
An area crosswalk data frame created with make_area_xwalk(). Required for use_area_xwalk().
- geography
A character string used as general description for area geography type. Defaults to "area" but typical values could include "neighborhood", "planning district", or "service area".
- variable_col
Variable column name. Defaults to "variable".
- value_col, moe_col
Value and margin of error column names (defaults to "estimate" and "moe").
- perc
If TRUE (default), use the denominator column ID to calculate each estimate as a percent share of the denominator value and use tidycensus::moe_prop() to calculate a new margin of error for the percent estimate.
Details
Using an area crosswalk
After creating an area crosswalk with make_area_xwalk(), you can pass the
crosswalk to use_area_xwalk() along with a data frame from
tidycensus::get_acs() or get_acs_tables(). At a minimum, the data must
have a column with the same name as geoid_col along with columns named
"variable", "estimate", and "moe". Please note that this approach to
aggregation does not work well if your data contains "jam" values, e.g. the
substitution of 0 for "1939 or older" for the Median Year Built variable.
Ideally, the weight used for aggregation should be based on household counts
when aggregating a household-level variable and population counts when
aggregating an individual-level variable.
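The weighted-sum (extensive) aggregation described above can be sketched in base R as follows. This is a simplified illustration and not the code used by use_area_xwalk(): the data frames and their values are hypothetical, the margin of error is scaled by the same weight as the estimate (an assumption), and the root-sum-of-squares approximation is written out directly rather than calling tidycensus::moe_sum().

```r
# Hypothetical crosswalk: share of each tract's housing units in each area.
area_xwalk <- data.frame(
  GEOID = c("000100", "000100", "000200"),
  NAME_area = c("North", "South", "South"),
  perc_HOUSING20 = c(0.5, 0.5, 1)
)

# Hypothetical tract-level data with the minimum required columns.
acs <- data.frame(
  GEOID = c("000100", "000200"),
  variable = "B25001_001",
  estimate = c(200, 100),
  moe = c(30, 40)
)

# Join on the GeoID column and apply the weight to estimate and MOE
# (scaling the MOE by the weight is an assumption of this sketch).
joined <- merge(acs, area_xwalk, by = "GEOID")
joined$w_estimate <- joined$estimate * joined$perc_HOUSING20
joined$w_moe <- joined$moe * joined$perc_HOUSING20

# Extensive aggregation: weighted sum of estimates per area and variable.
est <- aggregate(w_estimate ~ NAME_area + variable, data = joined, FUN = sum)

# Margin of error for a sum: square root of the sum of squared MOEs.
moe <- aggregate(
  w_moe ~ NAME_area + variable,
  data = joined,
  FUN = function(x) sqrt(sum(x^2))
)

extensive <- merge(est, moe, by = c("NAME_area", "variable"))
extensive
```

For a median variable, the sums would be replaced by a weighted mean and the margin of error dropped, matching the behavior documented for extensive = FALSE.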