Applying R to community planning and working with spatial data
August 2, 2023
Find the presentation at elipousson.github.io/presentation-tugis2023/ or the code at github.com/elipousson/presentation-tugis2023
A few things about me:
R is best known for as a statistical programming language often used in data science and research.
Like Python or C++, R is an object-oriented, functional programming language where the base set of features can be extended through open-source packages or libraries.
R turns 30 years old this month 🎂
R is based an S—a language created at Bell Labs in 1976 to support exploratory data analysis.
S ran on VAX 11/780 computers like the mainframe above.
The average number of monthly downloads for {sf}
(the most popular package for working with spatial data in R) has grown from just 1,300 in June 2018 to over 58,000 in June 2023.
There can’t be packages for everything—but sometimes it feels that way.
Packages let you work with everything from Microsoft Word documents to 3D renderings of digital elevation data to Google Drive to Google Earth Engine.
If a package doesn’t exist already, there are also packages like {usethis}
or {devtools}
that help you build it.
The {sf}
package, first published in 2016, is the most popular R package for spatial data.
Extension packages include:
{sf}
is built on open source librariessf
objects are data frames with “sticky” geometrysf
objects are an implementation of the simple feature standard.
Attribute data is attached to a “sticky” geometry column.
{sf}
in actionPackages are simple to install:
And to load:
Using the {sf}
package, you can use st_read()
to read data into R from a local file, URL, or database:
Reading layer `md_counties' from data source
`/Users/elipousson/Desktop/presentation-tugis2023/files/md_counties.gpkg'
using driver `GPKG'
Simple feature collection with 24 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -8848525 ymin: 4563419 xmax: -8347435 ymax: 4825776
Projected CRS: WGS 84 / Pseudo-Mercator
You can take a look at the geometry column using st_geometry()
:
Geometry set for 24 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -8848525 ymin: 4563419 xmax: -8347435 ymax: 4825776
Projected CRS: WGS 84 / Pseudo-Mercator
First 5 geometries:
MULTIPOLYGON (((-8380505 4582785, -8380646 4582...
MULTIPOLYGON (((-8772971 4809172, -8772981 4809...
MULTIPOLYGON (((-8539486 4774265, -8539486 4774...
MULTIPOLYGON (((-8434911 4785700, -8434909 4785...
MULTIPOLYGON (((-8537944 4825494, -8537927 4825...
The geometry for a sf
data frame is known as a simple feature collection (sfc
) object. A sfc
object is made up of sfg
objects.
Or, if you drop the geometry, a sf
object is just like a spreadsheet:
STATEFP COUNTYFP COUNTYNS GEOID NAME NAMELSAD LSAD
1 24 047 01668802 24047 Worcester Worcester County 06
2 24 001 01713506 24001 Allegany Allegany County 06
3 24 510 01702381 24510 Baltimore Baltimore city 25
4 24 015 00596115 24015 Cecil Cecil County 06
5 24 005 01695314 24005 Baltimore Baltimore County 06
6 24 013 01696228 24013 Carroll Carroll County 06
7 24 009 01676636 24009 Calvert Calvert County 06
8 24 019 00596495 24019 Dorchester Dorchester County 06
9 24 003 01710958 24003 Anne Arundel Anne Arundel County 06
10 24 021 01711211 24021 Frederick Frederick County 06
11 24 033 01714670 24033 Prince George's Prince George's County 06
12 24 027 01709077 24027 Howard Howard County 06
13 24 023 01711058 24023 Garrett Garrett County 06
14 24 031 01712500 24031 Montgomery Montgomery County 06
15 24 035 00596089 24035 Queen Anne's Queen Anne's County 06
16 24 041 00592947 24041 Talbot Talbot County 06
17 24 045 01668606 24045 Wicomico Wicomico County 06
18 24 039 00596907 24039 Somerset Somerset County 06
19 24 025 01698178 24025 Harford Harford County 06
20 24 037 01697853 24037 St. Mary's St. Mary's County 06
21 24 029 00593907 24029 Kent Kent County 06
22 24 017 01676992 24017 Charles Charles County 06
23 24 043 01714220 24043 Washington Washington County 06
24 24 011 00595737 24011 Caroline Caroline County 06
CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT ALAND AWATER
1 H1 G4020 480 41540 <NA> A 1213156434 586531107
2 H1 G4020 <NA> 19060 <NA> A 1093489884 14710932
3 C7 G4020 548 12580 <NA> F 209649327 28758743
4 H1 G4020 428 37980 48864 A 896912533 185281256
5 H1 G4020 548 12580 <NA> A 1549740652 215957832
6 H1 G4020 548 12580 <NA> A 1159355859 13112464
7 H1 G4020 548 47900 47894 A 552158542 341580668
8 H1 G4020 480 15700 <NA> A 1400573746 1145353068
9 H1 G4020 548 12580 <NA> A 1074353889 448032843
10 H1 G4020 548 47900 23224 A 1710922224 17674121
11 H1 G4020 548 47900 47894 A 1250057213 41922695
12 H1 G4020 548 12580 <NA> A 649956423 6336170
13 H1 G4020 <NA> <NA> <NA> A 1681102437 22498420
14 H1 G4020 548 47900 23224 A 1277193339 35686502
15 H1 G4020 548 12580 <NA> A 962673214 360020725
16 H1 G4020 548 20660 <NA> A 695561637 539363457
17 H1 G4020 480 41540 <NA> A 969767208 66764613
18 H1 G4020 480 41540 <NA> A 828145301 752652883
19 H1 G4020 548 12580 <NA> A 1132152044 231885675
20 H1 G4020 548 15680 <NA> A 928809940 1050592561
21 H1 G4020 <NA> <NA> <NA> A 717497117 353321619
22 H1 G4020 548 47900 47894 A 1185757843 479438830
23 H1 G4020 548 25180 <NA> A 1185655248 24820607
24 H1 G4020 <NA> <NA> <NA> A 827350254 16777066
INTPTLAT INTPTLON
1 +38.2221332 -075.3099315
2 +39.6123134 -078.7031037
3 +39.3000324 -076.6104761
4 +39.5623537 -075.9415852
5 +39.4431666 -076.6165693
6 +39.5633280 -077.0153297
7 +38.5227191 -076.5297621
8 +38.4291957 -076.0474333
9 +38.9916174 -076.5608941
10 +39.4701773 -077.3976358
11 +38.8292778 -076.8481880
12 +39.2522639 -076.9244057
13 +39.5472985 -079.2746192
14 +39.1373815 -077.2030633
15 +39.0406929 -076.0824053
16 +38.7483486 -076.1784757
17 +38.3673699 -075.6320828
18 +38.0744501 -075.8533228
19 +39.5374292 -076.2997894
20 +38.2230765 -076.5344870
21 +39.2412793 -076.1259867
22 +38.4728532 -077.0154272
23 +39.6036207 -077.8146709
24 +38.8715306 -075.8316620
You can plot data with R’s built-in plot()
function:
Square brackets ([]
) are an easy way to subset rows and columns.
You can summarize data using the summary()
function:
STATEFP COUNTYFP COUNTYNS GEOID
Length:24 Length:24 Length:24 Length:24
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
NAME NAMELSAD LSAD CLASSFP
Length:24 Length:24 Length:24 Length:24
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
MTFCC CSAFP CBSAFP METDIVFP
Length:24 Length:24 Length:24 Length:24
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
FUNCSTAT ALAND AWATER INTPTLAT
Length:24 Min. :2.096e+08 Min. :6.336e+06 Length:24
Class :character 1st Qu.:8.279e+08 1st Qu.:2.424e+07 Class :character
Mode :character Median :1.084e+09 Median :2.006e+08 Mode :character
Mean :1.048e+09 Mean :2.908e+08
3rd Qu.:1.222e+09 3rd Qu.:4.559e+08
Max. :1.711e+09 Max. :1.145e+09
INTPTLON geom
Length:24 MULTIPOLYGON :24
Class :character epsg:3857 : 0
Mode :character +proj=merc...: 0
The sf
package adds S3 methods for the base plot()
and summary()
functions. Learn more about S3 methods in Advanced R.
{sf}
with tidyverse packagestidyverse packages work well with {sf}
. For example, we can use geom_sf()
from {ggplot2}
to make a simple map:
Or take a peek at the values of the data with the glimpse()
function from {dplyr}
:
Rows: 24
Columns: 18
$ STATEFP <chr> "24", "24", "24", "24", "24", "24", "24", "24", "24", "24", "…
$ COUNTYFP <chr> "047", "001", "510", "015", "005", "013", "009", "019", "003"…
$ COUNTYNS <chr> "01668802", "01713506", "01702381", "00596115", "01695314", "…
$ GEOID <chr> "24047", "24001", "24510", "24015", "24005", "24013", "24009"…
$ NAME <chr> "Worcester", "Allegany", "Baltimore", "Cecil", "Baltimore", "…
$ NAMELSAD <chr> "Worcester County", "Allegany County", "Baltimore city", "Cec…
$ LSAD <chr> "06", "06", "25", "06", "06", "06", "06", "06", "06", "06", "…
$ CLASSFP <chr> "H1", "H1", "C7", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "…
$ MTFCC <chr> "G4020", "G4020", "G4020", "G4020", "G4020", "G4020", "G4020"…
$ CSAFP <chr> "480", NA, "548", "428", "548", "548", "548", "480", "548", "…
$ CBSAFP <chr> "41540", "19060", "12580", "37980", "12580", "12580", "47900"…
$ METDIVFP <chr> NA, NA, NA, "48864", NA, NA, "47894", NA, NA, "23224", "47894…
$ FUNCSTAT <chr> "A", "A", "F", "A", "A", "A", "A", "A", "A", "A", "A", "A", "…
$ ALAND <dbl> 1213156434, 1093489884, 209649327, 896912533, 1549740652, 115…
$ AWATER <dbl> 586531107, 14710932, 28758743, 185281256, 215957832, 13112464…
$ INTPTLAT <chr> "+38.2221332", "+39.6123134", "+39.3000324", "+39.5623537", "…
$ INTPTLON <chr> "-075.3099315", "-078.7031037", "-076.6104761", "-075.9415852…
$ geom <MULTIPOLYGON [m]> MULTIPOLYGON (((-8380505 45..., MULTIPOLYGON (((…
{sf}
for spatial transformations{sf}
includes a range of spatial transformation functions with names matching the spatial functions of PostGIS.
To show how this works, we can get the center of Baltimore City, buffer by 25 miles, and filter intersecting counties.
{sf}
for spatial transformationsSimple feature collection with 1 feature and 17 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -8528259 ymin: 4764882 xmax: -8528259 ymax: 4764882
Projected CRS: WGS 84 / Pseudo-Mercator
STATEFP COUNTYFP COUNTYNS GEOID NAME NAMELSAD LSAD CLASSFP MTFCC
3 24 510 01702381 24510 Baltimore Baltimore city 25 C7 G4020
CSAFP CBSAFP METDIVFP FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
3 548 12580 <NA> F 209649327 28758743 +39.3000324 -076.6104761
geom
3 POINT (-8528259 4764882)
{sf}
for spatial transformationsbalt_center <- st_centroid(md[3, ])
balt_area <- st_buffer(balt_center, dist = as_units(25, "mi"))
balt_area
Simple feature collection with 1 feature and 17 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -8568493 ymin: 4724649 xmax: -8488026 ymax: 4805116
Projected CRS: WGS 84 / Pseudo-Mercator
STATEFP COUNTYFP COUNTYNS GEOID NAME NAMELSAD LSAD CLASSFP MTFCC
3 24 510 01702381 24510 Baltimore Baltimore city 25 C7 G4020
CSAFP CBSAFP METDIVFP FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
3 548 12580 <NA> F 209649327 28758743 +39.3000324 -076.6104761
geom
3 POLYGON ((-8488026 4764882,...
{sf}
for spatial transformationsbalt_center <- st_centroid(md[3, ])
balt_area <- st_buffer(balt_center, dist = as_units(25, "mi"))
balt_area_counties <- st_filter(md, balt_area, .predicate = st_intersects)
balt_area_counties
Simple feature collection with 9 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -8630342 ymin: 4655334 xmax: -8433120 ymax: 4825554
Projected CRS: WGS 84 / Pseudo-Mercator
STATEFP COUNTYFP COUNTYNS GEOID NAME NAMELSAD LSAD
1 24 510 01702381 24510 Baltimore Baltimore city 25
2 24 005 01695314 24005 Baltimore Baltimore County 06
3 24 013 01696228 24013 Carroll Carroll County 06
4 24 003 01710958 24003 Anne Arundel Anne Arundel County 06
5 24 033 01714670 24033 Prince George's Prince George's County 06
6 24 027 01709077 24027 Howard Howard County 06
7 24 031 01712500 24031 Montgomery Montgomery County 06
8 24 025 01698178 24025 Harford Harford County 06
9 24 029 00593907 24029 Kent Kent County 06
CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT ALAND AWATER INTPTLAT
1 C7 G4020 548 12580 <NA> F 209649327 28758743 +39.3000324
2 H1 G4020 548 12580 <NA> A 1549740652 215957832 +39.4431666
3 H1 G4020 548 12580 <NA> A 1159355859 13112464 +39.5633280
4 H1 G4020 548 12580 <NA> A 1074353889 448032843 +38.9916174
5 H1 G4020 548 47900 47894 A 1250057213 41922695 +38.8292778
6 H1 G4020 548 12580 <NA> A 649956423 6336170 +39.2522639
7 H1 G4020 548 47900 23224 A 1277193339 35686502 +39.1373815
8 H1 G4020 548 12580 <NA> A 1132152044 231885675 +39.5374292
9 H1 G4020 <NA> <NA> <NA> A 717497117 353321619 +39.2412793
INTPTLON geom
1 -076.6104761 MULTIPOLYGON (((-8539486 47...
2 -076.6165693 MULTIPOLYGON (((-8537944 48...
3 -077.0153297 MULTIPOLYGON (((-8566905 48...
4 -076.5608941 MULTIPOLYGON (((-8524879 47...
5 -076.8481880 MULTIPOLYGON (((-8564237 47...
6 -076.9244057 MULTIPOLYGON (((-8545393 47...
7 -077.2030633 MULTIPOLYGON (((-8577825 47...
8 -076.2997894 MULTIPOLYGON (((-8483226 47...
9 -076.1259867 MULTIPOLYGON (((-8479204 47...
{sf}
for spatial transformations{sf}
for spatial transformations{dplyr}
for attribute transformationsChanges to attribute data with functions like summarise()
also change geometry:
md_AWATER_summary <- md |>
mutate(
perc_AWATER = round(AWATER / (ALAND + AWATER), digits = 2),
category = case_when(
perc_AWATER > 0.4 ~ "A lot of water",
perc_AWATER > 0.1 ~ "Some water",
.default = "Not much water"
),
category = factor(category, c("A lot of water", "Some water", "Not much water"))
) |>
group_by(category) |>
summarise(
mean_perc_AWATER = mean(perc_AWATER)
) |>
arrange(category)
glimpse(md_AWATER_summary)
Rows: 3
Columns: 3
$ category <fct> A lot of water, Some water, Not much water
$ mean_perc_AWATER <dbl> 0.475, 0.247, 0.021
$ geom <GEOMETRY [m]> POLYGON ((-8438155 4567026,..., MULTIPOLYGON (((-8384…
{sf}
is not the only package for working with spatial data! Developers are both extending {sf}
and using alternate approaches to support:
mapview()
You can view data interactively with {mapview}
and {leaflet}
(built on top of the Leaflet Javascript library):
You can download data from OpenStreetMap by using the {osmdata}
package to access the Overpass API:
The |>
or %>%
symbol is known as a “pipe” and allows you to chain together functions where the output from one becomes the first input for the next.
The {osmdata}
package returns a list with the original query and the different geometry types extracted from OpenStreetMap:
List of 8
$ bbox : chr "39.0207808733236,-76.9720808621772,39.5801417319894,-76.2492317058742"
$ overpass_call : chr "[out:xml][timeout:25];\n(\n node [\"amenity\"=\"university\"] (39.0207808733236,-76.9720808621772,39.580141731"| __truncated__
$ meta :List of 3
..$ timestamp : chr "[ Tue 3 Aug 2023 18:38:15 ]"
..$ OSM_version : chr "0.6"
..$ overpass_version: chr "Overpass API 0.7.61.4 df4c946a"
$ osm_points :Classes 'sf' and 'data.frame': 2793 obs. of 28 variables:
..$ osm_id : chr [1:2793] "37522081" "49391514" "50379933" "50621967" ...
..$ name : chr [1:2793] NA NA NA NA ...
..$ addr:city : chr [1:2793] NA NA NA NA ...
..$ addr:housenumber : chr [1:2793] NA NA NA NA ...
..$ addr:postcode : chr [1:2793] NA NA NA NA ...
..$ addr:state : chr [1:2793] NA NA NA NA ...
..$ addr:street : chr [1:2793] NA NA NA NA ...
..$ amenity : chr [1:2793] NA NA NA NA ...
..$ barrier : chr [1:2793] NA NA NA NA ...
..$ button_operated : chr [1:2793] NA NA NA NA ...
..$ check_date : chr [1:2793] NA NA NA NA ...
..$ crossing : chr [1:2793] NA NA NA NA ...
..$ ele : chr [1:2793] NA NA NA NA ...
..$ entrance : chr [1:2793] NA NA NA NA ...
..$ gnis:county_id : chr [1:2793] NA NA NA NA ...
..$ gnis:created : chr [1:2793] NA NA NA NA ...
..$ gnis:edited : chr [1:2793] NA NA NA NA ...
..$ gnis:feature_id : chr [1:2793] NA NA NA NA ...
..$ gnis:state_id : chr [1:2793] NA NA NA NA ...
..$ highway : chr [1:2793] NA NA NA NA ...
..$ internet_access : chr [1:2793] NA NA NA NA ...
..$ kerb : chr [1:2793] NA NA NA NA ...
..$ motorcar : chr [1:2793] NA NA NA NA ...
..$ motorcycle : chr [1:2793] NA NA NA NA ...
..$ operator : chr [1:2793] NA NA NA NA ...
..$ short_name : chr [1:2793] NA NA NA NA ...
..$ traffic_signals:sound: chr [1:2793] NA NA NA NA ...
..$ geometry :sfc_POINT of length 2793; first list element: 'XY' num [1:2] -76.5 39.3
..- attr(*, "sf_column")= chr "geometry"
..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
.. ..- attr(*, "names")= chr [1:27] "osm_id" "name" "addr:city" "addr:housenumber" ...
$ osm_lines :Classes 'sf' and 'data.frame': 34 obs. of 40 variables:
..$ osm_id : chr [1:34] "31030343" "31274489" "114709354" "235563773" ...
..$ name : chr [1:34] NA NA NA NA ...
..$ access : chr [1:34] NA NA NA NA ...
..$ addr:city : chr [1:34] NA NA NA NA ...
..$ addr:country : chr [1:34] NA NA NA NA ...
..$ addr:housenumber : chr [1:34] NA NA NA NA ...
..$ addr:postcode : chr [1:34] NA NA NA NA ...
..$ addr:state : chr [1:34] NA NA NA NA ...
..$ addr:street : chr [1:34] NA NA NA NA ...
..$ amenity : chr [1:34] NA NA NA NA ...
..$ barrier : chr [1:34] NA NA NA NA ...
..$ building : chr [1:34] NA NA NA NA ...
..$ building:levels : chr [1:34] NA NA NA NA ...
..$ check_date : chr [1:34] NA NA NA NA ...
..$ denomination : chr [1:34] NA NA NA NA ...
..$ ele : chr [1:34] NA NA NA NA ...
..$ fence_type : chr [1:34] NA NA NA NA ...
..$ gnis:county_id : chr [1:34] NA NA NA NA ...
..$ gnis:county_name : chr [1:34] NA NA NA NA ...
..$ gnis:created : chr [1:34] NA NA NA NA ...
..$ gnis:edited : chr [1:34] NA NA NA NA ...
..$ gnis:feature_id : chr [1:34] NA NA NA NA ...
..$ gnis:import_uuid : chr [1:34] NA NA NA NA ...
..$ gnis:reviewed : chr [1:34] NA NA NA NA ...
..$ gnis:state_id : chr [1:34] NA NA NA NA ...
..$ internet_access : chr [1:34] NA NA NA NA ...
..$ internet_access:fee: chr [1:34] NA NA NA NA ...
..$ name_1 : chr [1:34] NA NA NA NA ...
..$ old_name : chr [1:34] NA NA NA NA ...
..$ operator : chr [1:34] NA NA NA NA ...
..$ operator:wikidata : chr [1:34] NA NA NA NA ...
..$ phone : chr [1:34] NA NA NA NA ...
..$ religion : chr [1:34] NA NA NA NA ...
..$ short_name : chr [1:34] NA NA NA NA ...
..$ source : chr [1:34] NA NA NA NA ...
..$ surface : chr [1:34] NA NA NA NA ...
..$ website : chr [1:34] NA NA NA NA ...
..$ wikidata : chr [1:34] NA NA NA NA ...
..$ wikipedia : chr [1:34] NA NA NA NA ...
..$ geometry :sfc_LINESTRING of length 34; first list element: 'XY' num [1:84, 1:2] -76.7 -76.7 -76.7 -76.7 -76.7 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:84] "364705495" "2821804114" "2821804115" "2821804116" ...
.. .. ..$ : chr [1:2] "lon" "lat"
..- attr(*, "sf_column")= chr "geometry"
..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
.. ..- attr(*, "names")= chr [1:39] "osm_id" "name" "access" "addr:city" ...
$ osm_polygons :Classes 'sf' and 'data.frame': 85 obs. of 40 variables:
..$ osm_id : chr [1:85] "31119729" "31119744" "82222563" "86580064" ...
..$ name : chr [1:85] "Community College of Baltimore County: Catonsville" "Community College of Baltimore County: Essex" "Community College Of Baltimore County Hunt Valley Extension" "Anne Arundel Community College - Arundel Mills" ...
..$ access : chr [1:85] NA NA NA NA ...
..$ addr:city : chr [1:85] "Baltimore" "Baltimore" "Hunt Valley" NA ...
..$ addr:country : chr [1:85] NA NA NA NA ...
..$ addr:housenumber : chr [1:85] "800" "7201" "11101" NA ...
..$ addr:postcode : chr [1:85] "21228" "21237" "21031" NA ...
..$ addr:state : chr [1:85] "MD" NA NA NA ...
..$ addr:street : chr [1:85] "South Rolling Road" "Rossville Boulevard" "McCormick Road" NA ...
..$ amenity : chr [1:85] "college" "college" "college" "college" ...
..$ barrier : chr [1:85] NA NA NA NA ...
..$ building : chr [1:85] NA NA "yes" NA ...
..$ building:levels : chr [1:85] NA NA NA NA ...
..$ check_date : chr [1:85] NA NA NA NA ...
..$ denomination : chr [1:85] NA NA NA NA ...
..$ ele : chr [1:85] NA NA NA "50" ...
..$ fence_type : chr [1:85] NA NA NA NA ...
..$ gnis:county_id : chr [1:85] NA NA NA "003" ...
..$ gnis:county_name : chr [1:85] NA NA NA NA ...
..$ gnis:created : chr [1:85] NA NA NA "08/24/2007" ...
..$ gnis:edited : chr [1:85] NA NA NA NA ...
..$ gnis:feature_id : chr [1:85] "592063" "592133" NA "2132656" ...
..$ gnis:import_uuid : chr [1:85] NA NA NA NA ...
..$ gnis:reviewed : chr [1:85] NA NA NA NA ...
..$ gnis:state_id : chr [1:85] NA NA NA "24" ...
..$ internet_access : chr [1:85] NA NA NA NA ...
..$ internet_access:fee: chr [1:85] NA NA NA NA ...
..$ name_1 : chr [1:85] NA NA NA NA ...
..$ old_name : chr [1:85] NA NA NA NA ...
..$ operator : chr [1:85] NA NA NA NA ...
..$ operator:wikidata : chr [1:85] NA NA NA NA ...
..$ phone : chr [1:85] NA "+1 443 840 2222" "+1-443-840-5830" NA ...
..$ religion : chr [1:85] NA NA NA NA ...
..$ short_name : chr [1:85] NA NA "CCBC - Hunt Valley Ext" NA ...
..$ source : chr [1:85] NA NA NA NA ...
..$ surface : chr [1:85] NA NA NA NA ...
..$ website : chr [1:85] "www.ccbcmd.edu" "http://www.ccbcmd.edu/About-CCBC/Locations/CCBC-Essex.aspx" "http://www.ccbcmd.edu/" NA ...
..$ wikidata : chr [1:85] NA NA NA NA ...
..$ wikipedia : chr [1:85] NA NA NA NA ...
..$ geometry :sfc_POLYGON of length 85; first list element: List of 1
.. ..$ : num [1:68, 1:2] -76.7 -76.7 -76.7 -76.7 -76.7 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:68] "345874523" "6390816250" "6390816249" "345874525" ...
.. .. .. ..$ : chr [1:2] "lon" "lat"
.. ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
..- attr(*, "sf_column")= chr "geometry"
..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
.. ..- attr(*, "names")= chr [1:39] "osm_id" "name" "access" "addr:city" ...
$ osm_multilines : NULL
$ osm_multipolygons:Classes 'sf' and 'data.frame': 11 obs. of 30 variables:
..$ osm_id : chr [1:11] "1599827" "3170935" "6699380" "6849272" ...
..$ name : chr [1:11] "Loyola University Maryland" "Towson University" "Stevenson University Owings Mills Campus" "The Peabody Institute of Johns Hopkins" ...
..$ addr:city : chr [1:11] "Baltimore" "Towson" "" "" ...
..$ addr:housenumber : chr [1:11] "4501" "8000" "" "" ...
..$ addr:postcode : chr [1:11] "21210" "21204" "" "" ...
..$ addr:state : chr [1:11] "" "MD" "" "" ...
..$ addr:street : chr [1:11] "North Charles Street" "York Road" "" "" ...
..$ alt_name : chr [1:11] "" "" "" "" ...
..$ amenity : chr [1:11] "university" "university" "university" "university" ...
..$ check_date : chr [1:11] "" "" "" "2022-05-13" ...
..$ ele : chr [1:11] "99" "" "" "24" ...
..$ gnis:county_id : chr [1:11] "" "" "" "" ...
..$ gnis:created : chr [1:11] "" "" "" "" ...
..$ gnis:feature_id : chr [1:11] "" "" "" "" ...
..$ gnis:state_id : chr [1:11] "" "" "" "" ...
..$ internet_access : chr [1:11] "" "" "" "" ...
..$ internet_access:fee : chr [1:11] "" "" "" "" ...
..$ internet_access:ssid: chr [1:11] "" "" "" "" ...
..$ isced:level:2011 : chr [1:11] "" "" "" "" ...
..$ name:en : chr [1:11] "" "" "" "" ...
..$ operator : chr [1:11] "" "" "Stevenson University" "Johns Hopkins University" ...
..$ operator:short : chr [1:11] "" "" "" "JHU" ...
..$ operator:wikidata : chr [1:11] "" "" "" "" ...
..$ phone : chr [1:11] "" "" "" "" ...
..$ short_name : chr [1:11] "" "" "" "" ...
..$ type : chr [1:11] "multipolygon" "multipolygon" "multipolygon" "multipolygon" ...
..$ website : chr [1:11] "https://www.loyola.edu/" "" "" "" ...
..$ wikidata : chr [1:11] "Q5399470" "Q1474129" "" "Q1156934" ...
..$ wikipedia : chr [1:11] "en:Loyola University Maryland" "" "" "" ...
..$ geometry :sfc_MULTIPOLYGON of length 11; first list element: List of 1
.. ..$ :List of 6
.. .. ..$ 114709358: num [1:65, 1:2] -76.6 -76.6 -76.6 -76.6 -76.6 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:65] "9823629897" "9823720659" "6303560379" "6303560374" ...
.. .. .. .. ..$ : chr [1:2] "lat" "lon"
.. .. ..$ 114709360: num [1:17, 1:2] -76.6 -76.6 -76.6 -76.6 -76.6 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:17] "1298673031" "1298673121" "9823712899" "9823712900" ...
.. .. .. .. ..$ : chr [1:2] "lat" "lon"
.. .. ..$ 673113087: num [1:8, 1:2] -76.6 -76.6 -76.6 -76.6 -76.6 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:8] "6303560373" "6303560372" "6303560371" "6303560370" ...
.. .. .. .. ..$ : chr [1:2] "lat" "lon"
.. .. ..$ 673113079: num [1:18, 1:2] -76.6 -76.6 -76.6 -76.6 -76.6 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:18] "6303560352" "6303560364" "6303560362" "6303560361" ...
.. .. .. .. ..$ : chr [1:2] "lat" "lon"
.. .. ..$ 673113549: num [1:6, 1:2] -76.6 -76.6 -76.6 -76.6 -76.6 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:6] "6303565073" "9823720645" "6303565076" "6303565075" ...
.. .. .. .. ..$ : chr [1:2] "lat" "lon"
.. .. ..$ 673113545: num [1:6, 1:2] -76.6 -76.6 -76.6 -76.6 -76.6 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:6] "6303565069" "6303565072" "9823720636" "6303565071" ...
.. .. .. .. ..$ : chr [1:2] "lat" "lon"
.. ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
..- attr(*, "sf_column")= chr "geometry"
..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
.. ..- attr(*, "names")= chr [1:29] "osm_id" "name" "addr:city" "addr:housenumber" ...
- attr(*, "class")= chr [1:3] "list" "osmdata" "osmdata_sf"
str()
displays the structure of any R object.
A specialized package like {osmdata}
helps you work with the an unfamiliar API or data format with confidence!
If your desktop GIS does something that you don’t know how to do in R, you can even interface directly using:
You can also use the {esri2sf}
package to download FeatureLayer data and metadata.
INSPIRE stands for Investing in Neighborhoods and Schools to Promote Improvement, Revitalization, and Excellence.
The Department of Planning is working with communities in a quarter-mile area (about a five minute walk) around each 21st Century School to create a plan that can guide new private and public investments in each area.
I use R for a little bit of everything:
{officer}
and {officerExtras}
package)This work relies on a set of custom packages {mapbaltimore}
, {bcpss}
, and {mapmaryland}
with city and state-specific data and data access functions.
{mapbaltimore}
packageLoading the {mapbaltimore}
package gives you access to:
{mapbaltimore}
packageinspire_plans
is one of dozens of datasets that are available after the package is loaded:
{mapbaltimore}
packageThe package includes “helper” functions like get_area_streets()
to filter data by name or location:
area_streets <- get_area_streets(area)
area_streets <- st_transform(area_streets, crs = 3857)
area_streets
Simple feature collection with 40 features and 1 field
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: -8526277 ymin: 4763286 xmax: -8525230 ymax: 4764336
Projected CRS: WGS 84 / Pseudo-Mercator
# A tibble: 40 × 2
fullname geometry
* <chr> <GEOMETRY [m]>
1 BOYER ST MULTILINESTRING ((-8525452 4763491, -8525403 4763493, -85253…
2 CONGRESS CT LINESTRING (-8526276 4763464, -8526264 4763464)
3 E BALTIMORE ST MULTILINESTRING ((-8525609 4763559, -8525571 4763561), (-852…
4 E FAIRMOUNT AVE MULTILINESTRING ((-8526123 4763690, -8526068 4763693), (-852…
5 E FAYETTE ST MULTILINESTRING ((-8526029 4763885, -8525948 4763889, -85258…
6 E LOMBARD ST MULTILINESTRING ((-8526075 4763368, -8526049 4763369, -85260…
7 IRVINE PL LINESTRING (-8526266 4763621, -8526267 4763644, -8526269 476…
8 JEFFERSON ST MULTILINESTRING ((-8525488 4764246, -8525457 4764247), (-852…
9 LAMLEY ST MULTILINESTRING ((-8525552 4763798, -8525528 4763800, -85254…
10 MOYER ST MULTILINESTRING ((-8525639 4763641, -8525633 4763642), (-852…
# ℹ 30 more rows
{mapbaltimore}
packageINSPIRE supports improvements along school walking routes but we first need to figure out:
The {mapmaryland}
package includes a function to download and format data from the state open data portal for vehicle crashes:
get_md_crash_data()
is built using the {RSocrata}
package created and maintained by the City of Chicago.
The dplyr::filter()
function limits data to injury crashes:
area_crashes <- st_as_sf(area_crashes, coords = c("longitude", "latitude"))
st_crs(area_crashes) <- 4326
injury_crashes <- filter(area_crashes, report_type == "Injury Crash")
injury_crashes
Simple feature collection with 405 features and 43 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -76.59249 ymin: 39.28995 xmax: -76.58349 ymax: 39.29714
Geodetic CRS: WGS 84
# A tibble: 405 × 44
crash_date dotw weekend year quarter light_desc county_desc county_no
* <date> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
1 2018-05-25 Fri N 2018 Q2 Daylight Baltimore Ci… 24
2 2016-08-15 Mon N 2016 Q3 Daylight Baltimore Ci… 24
3 2019-01-07 Mon N 2019 Q1 Daylight Baltimore Ci… 24
4 2016-07-24 Sun Y 2016 Q3 Dark Lights On Baltimore Ci… 24
5 2018-12-29 Sat Y 2018 Q4 Daylight Baltimore Ci… 24
6 2016-07-07 Thu N 2016 Q3 Daylight Baltimore Ci… 24
7 2017-11-28 Tue N 2017 Q4 Dark Lights On Baltimore Ci… 24
8 2017-10-30 Mon N 2017 Q4 Daylight Baltimore Ci… 24
9 2016-10-15 Sat Y 2016 Q4 Dark Lights On Baltimore Ci… 24
10 2016-02-12 Fri N 2016 Q1 <NA> Baltimore Ci… 24
# ℹ 395 more rows
# ℹ 36 more variables: junction_desc <chr>, collision_type_desc <chr>,
# surf_cond_desc <chr>, rd_cond_desc <chr>, rd_div_desc <chr>,
# fix_obj_desc <chr>, report_no <chr>, acc_date <dbl>, acc_time <time>,
# signal_flag_desc <chr>, signal_flag <chr>, c_m_zone_flag <chr>,
# harm_event_code1 <dbl>, harm_event_desc2 <chr>, harm_event_code2 <dbl>,
# rte_no <dbl>, log_mile <dbl>, logmile_dir_flag_desc <chr>, …
Mapping pedestrian injury crashes help identify priority intersections:
We included crash data on a packet of printed maps helped we used during a walk around the neighborhood to identify priority locations.
The {mapboxapi}
package provided the 15-minute walking distance isochrones and the custom basemap in this {ggplot2}
map.
The INSPIRE program is creating a community plan for each INSPIRE area.
For each plan, we need to know: what are the area demographics and housing conditions?
How do you know who lives in the planning area?
Here are the key steps:
To start, we can get the area from the inspire_plans
data (and use mutate()
add a new column we’ll need shortly):
Simple feature collection with 1 feature and 22 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -8522284 ymin: 4761947 xmax: -8521246 ymax: 4762989
Projected CRS: WGS 84 / Pseudo-Mercator
# A tibble: 1 × 23
plan_name plan_name_short overall_status inspire_lead_planner plan_url
* <chr> <chr> <chr> <chr> <chr>
1 John Ruhrah Elem… John Ruhrah EMS Review in pro… Eli Pousson https:/…
# ℹ 18 more variables: year_launched <chr>, year_adopted <chr>,
# adoption_status <chr>, adoption_date <chr>, document_url <chr>,
# recommendation_report_status <chr>, recommendation_report_url <chr>,
# kick_off_presentation_date <chr>, launch_date_target <chr>,
# walking_route_id_target_date <chr>, recommendations_date_target <chr>,
# commission_review_date_target <chr>, implementation_status <chr>,
# planning_districts <chr>, neighborhoods <chr>, council_districts <chr>, …
We can use {tidycensus}
package download American Community Survey data directly from the Census API:
acs_data <- get_acs(
geography = "tract",
table = "B25105",
county = "Baltimore city",
state = "MD",
year = 2021,
survey = "acs5",
geometry = TRUE
)
acs_data <- st_transform(acs_data, 3857)
acs_data
Simple feature collection with 199 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -8539487 ymin: 4749959 xmax: -8519220 ymax: 4775128
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
GEOID NAME variable
1 24510160300 Census Tract 1603, Baltimore city, Maryland B25105_001
2 24510020100 Census Tract 201, Baltimore city, Maryland B25105_001
3 24510190300 Census Tract 1903, Baltimore city, Maryland B25105_001
4 24510271801 Census Tract 2718.01, Baltimore city, Maryland B25105_001
5 24510272007 Census Tract 2720.07, Baltimore city, Maryland B25105_001
6 24510240200 Census Tract 2402, Baltimore city, Maryland B25105_001
7 24510270903 Census Tract 2709.03, Baltimore city, Maryland B25105_001
8 24510151100 Census Tract 1511, Baltimore city, Maryland B25105_001
9 24510160600 Census Tract 1606, Baltimore city, Maryland B25105_001
10 24510090400 Census Tract 904, Baltimore city, Maryland B25105_001
estimate moe geometry
1 788 223 MULTIPOLYGON (((-8532207 47...
2 1780 231 MULTIPOLYGON (((-8526012 47...
3 589 342 MULTIPOLYGON (((-8532300 47...
4 531 107 MULTIPOLYGON (((-8536825 47...
5 1291 58 MULTIPOLYGON (((-8539487 47...
6 2198 151 MULTIPOLYGON (((-8528101 47...
7 1156 129 MULTIPOLYGON (((-8527043 47...
8 1221 151 MULTIPOLYGON (((-8535831 47...
9 870 187 MULTIPOLYGON (((-8534762 47...
10 959 110 MULTIPOLYGON (((-8528134 47...
We can use st_filter()
to filter the data to our planning area:
The {getACS}
package provides functions to weight tract-level estimates based on share of the block-level population or households that live.
Start by making a crosswalk between block and tract data:
Simple feature collection with 10025 features and 28 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -8539487 ymin: 4749959 xmax: -8519220 ymax: 4775128
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 MTFCC20
1 24 510 270401 4006 245102704014006 Block 4006 G5040
2 24 510 272007 1010 245102720071010 Block 1010 G5040
3 24 510 130200 1007 245101302001007 Block 1007 G5040
4 24 510 250500 3020 245102505003020 Block 3020 G5040
5 24 510 200100 1009 245102001001009 Block 1009 G5040
6 24 510 272003 1002 245102720031002 Block 1002 G5040
7 24 510 030200 1020 245100302001020 Block 1020 G5040
8 24 510 271300 3023 245102713003023 Block 3023 G5040
9 24 510 271501 2014 245102715012014 Block 2014 G5040
10 24 510 250500 1015 245102505001015 Block 1015 G5040
UR20 UACE20 UATYPE20 FUNCSTAT20 ALAND20 AWATER20 INTPTLAT20 INTPTLON20
1 U 04843 U S 19165 0 +39.3455590 -076.5541477
2 U 04843 U S 29938 0 +39.3637168 -076.7028884
3 U 04843 U S 1630 0 +39.3108913 -076.6240941
4 U 04843 U S 5530 0 +39.2246859 -076.5891821
5 U 04843 U S 4407 0 +39.2922426 -076.6466720
6 U 04843 U S 24258 0 +39.3711849 -076.6789280
7 U 04843 U S 8955 0 +39.2881509 -076.6032131
8 U 04843 U S 9388 0 +39.3542857 -076.6369351
9 U 04843 U S 31413 0 +39.3611131 -076.6655301
10 U 04843 U S 65802 0 +39.2059241 -076.5759715
HOUSING20 POP20 STATEFP COUNTYFP GEOID NAME NAMELSAD
1 27 38 24 510 24510270401 2704.01 Census Tract 2704.01
2 166 314 24 510 24510272007 2720.07 Census Tract 2720.07
3 0 0 24 510 24510130200 1302 Census Tract 1302
4 19 35 24 510 24510250500 2505 Census Tract 2505
5 16 35 24 510 24510200100 2001 Census Tract 2001
6 22 118 24 510 24510272003 2720.03 Census Tract 2720.03
7 35 71 24 510 24510030200 302 Census Tract 302
8 3 21 24 510 24510271300 2713 Census Tract 2713
9 23 41 24 510 24510271501 2715.01 Census Tract 2715.01
10 0 0 24 510 24510250500 2505 Census Tract 2505
MTFCC FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
1 G5020 S 1457277 0 +39.3462013 -076.5467855
2 G5020 S 802298 0 +39.3636968 -076.7064881
3 G5020 S 352623 0 +39.3123177 -076.6305781
4 G5020 S 8584546 4070107 +39.2176239 -076.5823444
5 G5020 S 264900 0 +39.2909656 -076.6478189
6 G5020 S 1934714 0 +39.3633286 -076.6854724
7 G5020 S 506034 91912 +39.2865363 -076.6022612
8 G5020 S 2861713 3175 +39.3628280 -076.6395253
9 G5020 S 3628340 18282 +39.3642233 -076.6614760
10 G5020 S 8584546 4070107 +39.2176239 -076.5823444
geometry
1 POLYGON ((-8522055 4771161,...
2 POLYGON ((-8538657 4773870,...
3 POLYGON ((-8529841 4766300,...
4 POLYGON ((-8525952 4753874,...
5 POLYGON ((-8532324 4763585,...
6 POLYGON ((-8536016 4775034,...
7 POLYGON ((-8527463 4763015,...
8 POLYGON ((-8531243 4772484,...
9 POLYGON ((-8534527 4773513,...
10 POLYGON ((-8524756 4751349,...
tidycensus::interpolate_pw()
is another option for interpolating U.S. Census data in R.
Then make a crosswalk between the area and tracts:
We can then use the crosswalk to adjust the ACS data:
# A tibble: 1 × 22
NAME variable column_id table_id estimate moe perc_estimate perc_moe
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 John Ruhrah… B25105_… B25105001 B25105 2977 160 NA NA
# ℹ 14 more variables: geography <chr>, table_title <chr>,
# simple_table_title <chr>, subject_area <chr>, universe <chr>,
# denominator_column_id <chr>, topics <chr>, line_number <dbl>,
# column_title <chr>, indent <dbl>, parent_column_id <chr>,
# denominator_estimate <dbl>, denominator_moe <dbl>,
# denominator_column_title <chr>
But data for a single planning area doesn’t say much without context. {getACS}
extends {tidycensus}
to support downloading multiple geographies at once:
regional_acs_data <- get_acs_geographies(
geography = c("county", "metropolitan statistical area/micropolitan statistical area"),
table = "B25105",
state = "MD",
county = "Baltimore city",
msa = "Baltimore-Columbia-Towson, MD Metro Area"
)
tbl_acs_data <- bind_rows(
area_acs_data,
regional_acs_data
)
tbl_acs_data <- select_acs(tbl_acs_data, .column_title_col = NULL)
Then I can use gt_acs()
(a {getACS}
function built around the gt()
function from the {gt}
package) to create a formatted table:
gt_acs(
data = tbl_acs_data,
rowname_col = "NAME",
value_label = "Median monthly housing costs",
table = "B25105"
) |>
fmt_currency(c("estimate", "moe"), decimals = 0)
Median monthly housing costs | |
---|---|
John Ruhrah EMS | $2,977 ± $160 |
Baltimore city, Maryland | $1,184 ± $11 |
Baltimore-Columbia-Towson, MD Metro Area | $1,510 ± $8 |
Source: 2017-2021 ACS 5-year Estimates, Table B25105. |
{ggplot2}
A similar workflow is used to create production-ready plots and graphics using {ggplot2}
:
Within Baltimore City government, R is used routinely by data and research staff, the Baltimore City Department of Finance, the Mayor’s Office of Recovery Programs, Mayor’s Office of Performance and Innovation, and the Baltimore City Public School System.
“The suite of geospatial visualization and analysis tools in R is head and shoulders better than SAS and Stata and often easier and more reproducible than ArcGIS.”
— Graham MacDonald, CIO, VP for Technology and Data Science from R Is the Best Programming Language for Innovation at Urban
“Most of what we use R for is ‘under the hood’ to access and manipulate data that later gets inserted into plans, reports, and GIS applications.”
— Ben Kraft, Planning Research Coordinator, Montgomery Planning
These GitHub lists include resources on R, Python, and QGIS:
Find the presentation at elipousson.github.io/presentation-tugis2023/ or the code at github.com/elipousson/presentation-tugis2023