{sf}
and {dplyr}
{ggplot2}
{sf}
and {dplyr}
A simple feature (or sf
) object is a data.frame
with a “sticky” geometry column.
read_sf()
Load the {sf}
package and read a shapefile into your environment with read_sf()
:
sf
data.framesYou can subset columns and view a sf object just like any other data.frame in R:
Simple feature collection with 2 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -81.74107 ymin: 36.23436 xmax: -80.90344 ymax: 36.58965
Geodetic CRS: NAD27
# A tibble: 2 × 15
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
2 0.061 1.23 1827 1827 Alleg… 37005 37005 3 487 0 10
# ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
# geometry <MULTIPOLYGON [°]>
sf
data.frames using {dplyr}
You can also use {dplyr}
to sort a data.frame:
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
# A tibble: 100 × 15
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
1 0.08 1.31 1936 1936 Yanc… 37199 37199 100 770 0 12
2 0.086 1.27 1893 1893 Yadk… 37197 37197 99 1269 1 65
3 0.094 1.31 1979 1979 Wils… 37195 37195 98 3702 11 1827
4 0.199 1.98 1874 1874 Wilk… 37193 37193 97 3146 4 200
5 0.142 1.66 2029 2029 Wayne 37191 37191 96 6638 18 2593
6 0.081 1.29 1880 1880 Wata… 37189 37189 95 1323 1 17
7 0.1 1.33 1962 1962 Wash… 37187 37187 94 990 5 521
8 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
9 0.219 2.13 1938 1938 Wake 37183 37183 92 14484 16 4397
10 0.072 1.08 1842 1842 Vance 37181 37181 91 2180 4 1179
# ℹ 90 more rows
# ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
# geometry <MULTIPOLYGON [°]>
sf
data.frames using {dplyr}
You can filter a data.frame by attributes:
Simple feature collection with 11 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -80.06441 ymin: 33.88199 xmax: -76.49254 ymax: 36.06665
Geodetic CRS: NAD27
# A tibble: 11 × 15
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
* <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
1 0.219 2.13 1938 1938 Wake 37183 37183 92 14484 16 4397
2 0.201 1.80 1968 1968 Rand… 37151 37151 76 4456 7 384
3 0.207 1.85 1989 1989 John… 37101 37101 51 3999 6 1165
4 0.203 3.20 2004 2004 Beau… 37013 37013 7 2692 7 1131
5 0.241 2.21 2083 2083 Samp… 37163 37163 82 3025 4 1396
6 0.204 1.87 2100 2100 Dupl… 37061 37061 31 2483 4 1061
7 0.24 2.00 2150 2150 Robe… 37155 37155 78 7889 31 5904
8 0.225 2.11 2162 2162 Blad… 37017 37017 9 1782 8 818
9 0.214 2.15 2185 2185 Pend… 37141 37141 71 1228 4 580
10 0.24 2.37 2232 2232 Colu… 37047 37047 24 3350 15 1431
11 0.212 2.02 2241 2241 Brun… 37019 37019 10 2181 5 659
# ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
# geometry <MULTIPOLYGON [°]>
The geometry column is a simple feature collection or sfc
list column.
[1] "sfc_MULTIPOLYGON" "sfc"
A sfc
object is a list of sfg
or simple feature geometry objects.
[1] "XY" "MULTIPOLYGON" "sfg"
sf
geometryThe {sf}
package provides a range of functions for exploring the geometry, coordinate reference system, bounding box, and other attributes of a simple feature object.
sf
geometryYou can subset the geometry column:
Geometry set for 100 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
First 5 geometries:
sf
geometryYou can check the geometry type with st_geometry_type()
:
[1] MULTIPOLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
sf
geometryYou can look up the coordinate reference system with st_crs()
:
Coordinate Reference System:
User input: NAD27
wkt:
GEOGCRS["NAD27",
DATUM["North American Datum 1927",
ELLIPSOID["Clarke 1866",6378206.4,294.978698213898,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4267]]
sf
geometryYou can get a bounding box for the geometry with st_bbox()
:
xmin ymin xmax ymax
-84.32385 33.88199 -75.45698 36.58965
sf
objectsOf course, you probably want to see the geometry as well. You can use the built-in plot()
method to make simple maps:
{mapview}
You can also explore data interactively with {mapview}
:
{mapview}
{ggplot2}
Before making a map {ggplot2}
, we can start by downloading Maryland counties with the {tigris}
package:
Simple feature collection with 24 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -79.48765 ymin: 37.88661 xmax: -74.98628 ymax: 39.72304
Geodetic CRS: NAD83
First 10 features:
STATEFP COUNTYFP COUNTYNS GEOID NAME NAMELSAD LSAD
131 24 047 01668802 24047 Worcester Worcester County 06
170 24 001 01713506 24001 Allegany Allegany County 06
184 24 510 01702381 24510 Baltimore Baltimore city 25
337 24 015 00596115 24015 Cecil Cecil County 06
656 24 005 01695314 24005 Baltimore Baltimore County 06
817 24 013 01696228 24013 Carroll Carroll County 06
875 24 009 01676636 24009 Calvert Calvert County 06
909 24 019 00596495 24019 Dorchester Dorchester County 06
1026 24 003 01710958 24003 Anne Arundel Anne Arundel County 06
1051 24 021 01711211 24021 Frederick Frederick County 06
CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT ALAND AWATER
131 H1 G4020 480 41540 <NA> A 1213156434 586531107
170 H1 G4020 <NA> 19060 <NA> A 1093489884 14710932
184 C7 G4020 548 12580 <NA> F 209649327 28758743
337 H1 G4020 428 37980 48864 A 896912533 185281256
656 H1 G4020 548 12580 <NA> A 1549740652 215957832
817 H1 G4020 548 12580 <NA> A 1159355859 13112464
875 H1 G4020 548 47900 47894 A 552158542 341580668
909 H1 G4020 480 15700 <NA> A 1400573746 1145353068
1026 H1 G4020 548 12580 <NA> A 1074353889 448032843
1051 H1 G4020 548 47900 23224 A 1710922224 17674121
INTPTLAT INTPTLON geometry
131 +38.2221332 -075.3099315 MULTIPOLYGON (((-75.28336 3...
170 +39.6123134 -078.7031037 MULTIPOLYGON (((-78.80894 3...
184 +39.3000324 -076.6104761 MULTIPOLYGON (((-76.71151 3...
337 +39.5623537 -075.9415852 MULTIPOLYGON (((-75.77209 3...
656 +39.4431666 -076.6165693 MULTIPOLYGON (((-76.69766 3...
817 +39.5633280 -077.0153297 MULTIPOLYGON (((-76.95782 3...
875 +38.5227191 -076.5297621 MULTIPOLYGON (((-76.41579 3...
909 +38.4291957 -076.0474333 MULTIPOLYGON (((-75.913 38....
1026 +38.9916174 -076.5608941 MULTIPOLYGON (((-76.58029 3...
1051 +39.4701773 -077.3976358 MULTIPOLYGON (((-77.45475 3...
{ggplot2}
After loading the {ggplot2}
package, making a map is as simple as supplying your data to ggplot()
and adding a geom_sf()
function:
{ggplot2}
You can modify this map using a ggplot2 theme. theme_void()
, for example, hides the graticules (grid lines) and axes:
You can add text with geom_sf_text()
or a label with geom_sf_label()
(using the nudge_x
and nudge_y
parameters to adjust label or text position):
I’ll use MTA bus stops for this workshop but pick a dataset that interests you.
Find a GeoJSON URL and read with sf::read_sf()
:
dplyr::glimpse()
Load dplyr and you can take a look at the object attributes with dplyr::glimpse()
:
Rows: 4,536
Columns: 13
$ OBJECTID <int> 54570, 54571, 54572, 54573, 54574, 54575, 54576, 5…
$ stop_name <chr> "CYLBURN AVE & GREENSPRING AVE fs wb", "LANIER AVE…
$ Rider_On <int> 212, 34, 105, 31, 13, 4, 50, 47, 8, 0, 14, 6, 3, 1…
$ Rider_Off <int> 172, 20, 18, 36, 24, 1, 21, 14, 6, 3, 12, 1, 12, 1…
$ Rider_Total <int> 384, 54, 124, 67, 37, 5, 71, 61, 14, 3, 26, 8, 15,…
$ Stop_Ridership_Rank <int> 291, 2042, 1129, 1804, 2488, 3860, 1728, 1900, 339…
$ Routes_Served <chr> "94, 31, 91", "94, 31, 91", "94, 31, 91", "91", "9…
$ Distribution_Policy <chr> "E1 - Public Domain - Internal Use Only", "E1 - Pu…
$ Mode <chr> "Bus", "Bus", "Bus", "Bus", "Bus", "Bus", "Bus", "…
$ Shelter <chr> "Yes", "No", "No", "No", "No", "No", "No", "No", "…
$ County <chr> "Baltimore City", "Baltimore City", "Baltimore Cit…
$ stop_id <chr> "1", "3", "4", "8", "10", "11", "12", "15", "16", …
$ geometry <POINT [°]> POINT (-76.66039 39.35095), POINT (-76.66336…
aes()
Make a map with md_counties
and your additional data as a new layer:
aes()
This doesn’t look quite right yet. Let’s try out a few more changes:
ggplot2::coord_sf()
to modify the map limitstigris::roads()
or tigris::area_water()
ggplot2::scale_color_brewer()
or ggplot2::scale_color_distiller()
sf::st_crop()
or sf::st_filter()
coord_sf()
to modify the map limitsFilter Baltimore City from md_counties and convert the sf
object into a bounding box (a bbox
):
Pass the xmin
and xmax
values from the bounding box to xlim
and the ymin
and ymax
values from the bounding box to ylim
:
coord_sf()
to modify the map limitscoord_sf()
to modify the map limits{tigris}
The tigris package is a helpful source for physical data as well as administrative data. For example, you can download roads and water areas:
baltimore_layers <-
list(
geom_sf(data = baltimore_city, fill = "white", color = "black", linewidth = 0.25),
geom_sf(data = baltimore_water, fill = "slateblue4", color = NA),
geom_sf(data = baltimore_roads, color = "gray20", linewidth = 0.035),
theme_void()
)
baltimore_basemap <- ggplot() +
baltimore_layers
baltimore_basemap
Start with the default colors:
scale_color_brewer()
scale_color_brewer()
scale_color_distiller()
scale_color_distiller()
st_crop()
or st_filter()
to trim data to areaMatch the coordinate reference system for two objects to crop or filter data:
Compare the original stop data, stops cropped to Baltimore city’s bounding box, and stops filtered to Baltimore city’s geometry:
You can also pass a list of objects to mapview to create layers and compare interactively:
labs()
to customize title and attribute labelsggrepel::geom_text_repel()
{mapboxapi}
{patchwork}
ggrepel::geom_text_repel()
library(ggrepel)
baltimore_basemap +
geom_sf(
data = slice_max(mta_stops_filtered, order_by = desc(Stop_Ridership_Rank), n = 100),
mapping = aes(size = Rider_Total),
color = "purple", alpha = 0.3
) +
geom_text_repel(
data = slice_max(mta_stops_filtered, order_by = desc(Stop_Ridership_Rank), n = 5),
aes(label = paste0(Stop_Ridership_Rank, ". ", stop_name), geometry = geometry),
size = 2, stat = "sf_coordinates"
) +
baltimore_bbox_lims +
labs(
title = "Baltimore city's 100 busiest bus stops",
size = "Total ridership"
)
ggrepel::geom_text_repel()
mapboxapi::layer_static_mapbox()
mapboxapi::layer_static_mapbox()
patchwork::inset_element()
library(patchwork)
md_counties <- st_transform(md_counties, 3857)
baltimore_city <- st_transform(baltimore_city, 3857)
md_map <- ggplot() +
geom_sf(data = md_counties, fill = "white", color = "black") +
geom_sf(data = baltimore_city, fill = "yellow") +
theme_void()
inset_md_map <-
inset_element(
md_map,
left = 0.1,
right = 0.65,
top = 0.4,
bottom = 0.075
)
mapbox_dark + inset_md_map
patchwork::inset_element()
ggfx::with_shadow()
ggfx::with_shadow()
ggfx::with_shadow()
baltimore_bbox <- st_bbox(baltimore_city)
mapbox_dark +
geom_sf(
data = st_difference(md_counties, baltimore_city),
fill = "white",
alpha = 1,
color = NA
) +
coord_sf(
xlim = c(baltimore_bbox$xmin, baltimore_bbox$xmax),
ylim = c(baltimore_bbox$ymin, baltimore_bbox$ymax)
) +
inset_element(
md_map,
left = 0.1,
right = 0.65,
top = 0.4,
bottom = 0.075
)
ggfx::with_shadow()