--- title: "Getting Started" output: rmarkdown::html_vignette description: > Start here if this is your first time using tidygeocder. vignette: > %\VignetteIndexEntry{Getting Started} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ## Introduction Tidygeocoder provides a unified interface for performing both forward and reverse geocoding queries with a variety of geocoding services. In forward geocoding you provide an address to the geocoding service and you get latitude and longitude coordinates in return. In reverse geocoding you provide the latitude and longitude and the geocoding service will return that location's address. In both cases, other data about the location can be provided by the geocoding service. The `geocode()` and `geo()` functions are for forward geocoding while the `reverse_geocode()` and `reverse_geo()` functions perform reverse geocoding. The `geocode()` and `reverse_geocode()` functions extract either addresses (forward geocoding) or coordinates (reverse geocoding) from the input dataframe and pass this data to the `geo()` and `reverse_geo()` functions respectively which execute the geocoding queries. All extra arguments (`...`) given to `geocode()` are passed to `geo()` and extra arguments given to `reverse_geocode()` are passed to `reverse_geo()`. ## Forward Geocoding ```r library(tibble) library(dplyr) library(tidygeocoder) address_single <- tibble(singlelineaddress = c( "11 Wall St, NY, NY", "600 Peachtree Street NE, Atlanta, Georgia" )) address_components <- tribble( ~street, ~cty, ~st, "11 Wall St", "NY", "NY", "600 Peachtree Street NE", "Atlanta", "GA" ) ``` You can use the `address` argument to specify single-line addresses. Note that when multiple addresses are provided, the batch geocoding functionality of the Census geocoding service is used. Additionally, `verbose = TRUE` displays logs to the console. ```r census_s1 <- address_single %>% geocode(address = singlelineaddress, method = "census", verbose = TRUE) #> #> Number of Unique Addresses: 2 #> Executing batch geocoding... #> Batch limit: 10,000 #> Passing 2 addresses to the US Census batch geocoder #> Querying API URL: https://geocoding.geo.census.gov/geocoder/locations/addressbatch #> Passing the following parameters to the API: #> format : "json" #> benchmark : "Public_AR_Current" #> vintage : "Current_Current" #> Query completed in: 0.4 seconds ``` |singlelineaddress | lat| long| |:-----------------------------------------|--------:|---------:| |11 Wall St, NY, NY | 40.70747| -74.01121| |600 Peachtree Street NE, Atlanta, Georgia | 33.77085| -84.38505| Alternatively you can run the same query with the `geo()` function by passing the address values from the dataframe directly. In either `geo()` or `geocode()`, the `lat` and `long` arguments are used to name the resulting latitude and longitude fields. Here the `method` argument is used to specify the "osm" (Nominatim) geocoding service. Refer to the `geo()` function documentation for the possible values of the `method` argument. ```r osm_s1 <- geo( address = address_single$singlelineaddress, method = "osm", lat = latitude, long = longitude ) #> Passing 2 addresses to the Nominatim single address geocoder #> Query completed in: 2 seconds ``` |address | latitude| longitude| |:-----------------------------------------|--------:|---------:| |11 Wall St, NY, NY | 40.70707| -74.01117| |600 Peachtree Street NE, Atlanta, Georgia | 33.77086| -84.38614| Instead of single-line addresses, you can use any combination of the following arguments to specify your addresses: `street`, `city`, `state`, `county`, `postalcode`, and `country`. ```r census_c1 <- address_components %>% geocode(street = street, city = cty, state = st, method = "census") #> Passing 2 addresses to the US Census batch geocoder #> Query completed in: 0.9 seconds ``` |street |cty |st | lat| long| |:-----------------------|:-------|:--|--------:|---------:| |11 Wall St |NY |NY | 40.70747| -74.01121| |600 Peachtree Street NE |Atlanta |GA | 33.77085| -84.38505| To return the full geocoding service results (not just latitude and longitude), specify `full_results = TRUE`. Additionally, for the Census geocoder you can get fields for geographies such as Census tracts by specifying `api_options = list(census_return_type = 'geographies')`. Be sure to use `full_results = TRUE` with the "geographies" return type in order to allow the Census geography columns to be returned. ```r census_full1 <- address_single %>% geocode( address = singlelineaddress, method = "census", full_results = TRUE, api_options = list(census_return_type = 'geographies') ) #> Passing 2 addresses to the US Census batch geocoder #> Query completed in: 0.5 seconds ``` |singlelineaddress | lat| long| id|input_address |match_indicator |match_type |matched_address |tiger_line_id |tiger_side |state_fips |county_fips |census_tract |census_block | |:-----------------------------------------|--------:|---------:|--:|:----------------------------------------------|:---------------|:----------|:------------------------------------|:-------------|:----------|:----------|:-----------|:------------|:------------| |11 Wall St, NY, NY | 40.70747| -74.01121| 1|11 Wall St, NY, NY, , , |Match |Exact |11 WALL ST, NEW YORK, NY, 10005 |59659656 |R |36 |061 |000700 |1004 | |600 Peachtree Street NE, Atlanta, Georgia | 33.77085| -84.38505| 2|600 Peachtree Street NE, Atlanta, Georgia, , , |Match |Non_Exact |600 PEACHTREE ST, ATLANTA, GA, 30308 |17343689 |L |13 |121 |001902 |2003 | As mentioned earlier, the `geocode()` function passes addresses in dataframes to the `geo()` function for geocoding so we can also directly use the `geo()` function in a similar way: ```r salz <- geo("Salzburg, Austria", method = "osm", full_results = TRUE) %>% select(-licence) #> Passing 1 address to the Nominatim single address geocoder #> Query completed in: 1 seconds ``` |address | lat| long| place_id|osm_type | osm_id|boundingbox |display_name |class |type | importance|icon | |:-----------------|--------:|--------:|---------:|:--------|------:|:----------------------------------------------|:--------------------------|:--------|:--------------|----------:|:------------------------------------------------------------------------------------| |Salzburg, Austria | 47.79813| 13.04648| 297962771|relation | 86538|47.7512115, 47.8543925, 12.9856478, 13.1275256 |Salzburg, 5020, Österreich |boundary |administrative | 0.6854709|https://nominatim.openstreetmap.org/ui/mapicons/poi_boundary_administrative.p.20.png | ## Reverse Geocoding For reverse geocoding you'll use `reverse_geocode()` instead of `geocode()` and `reverse_geo()` instead of `geo()`. Note that the reverse geocoding functions are structured very similarly to the forward geocoding functions and share many of the same arguments (`method`, `limit`, `full_results`, etc.). For reverse geocoding you will provide latitude and longitude coordinates as inputs and the location's address will be returned by the geocoding service. Below, the `reverse_geocode()` function is used to geocode coordinates in a dataframe. The `lat` and `long` arguments specify the columns that contain the latitude and longitude data. The `address` argument can be used to specify the single line address column name that is returned from the geocoder. Just as with forward geocoding, the `method` argument is used to specify the geocoding service. ```r lat_longs1 <- tibble( latitude = c(38.895865, 43.6534817), longitude = c(-77.0307713, -79.3839347) ) rev1 <- lat_longs1 %>% reverse_geocode(lat = latitude, long = longitude, address = addr, method = "osm") #> Passing 2 coordinates to the Nominatim single coordinate geocoder #> Query completed in: 2 seconds ``` | latitude| longitude|addr | |--------:|---------:|:--------------------------------------------------------------------------------------------------------------------------------------------------| | 38.89587| -77.03077|L’Enfant's plan, Pennsylvania Avenue, Washington, District of Columbia, 20045, United States | | 43.65348| -79.38393|Toronto City Hall, 100, Queen Street West, Financial District, Spadina—Fort York, Old Toronto, Toronto, Golden Horseshoe, Ontario, M5H 2N2, Canada | The same query can also be performed by passing the latitude and longitudes directly to the `reverse_geo()` function. Here we will use `full_results = TRUE` so that the full results are returned (not just the single line address column). ```r rev2 <- reverse_geo( lat = lat_longs1$latitude, long = lat_longs1$longitude, method = "osm", full_results = TRUE ) #> Passing 2 coordinates to the Nominatim single coordinate geocoder #> Query completed in: 2 seconds glimpse(rev2) #> Rows: 2 #> Columns: 23 #> $ lat 38.89587, 43.65348 #> $ long -77.03077, -79.38393 #> $ address "L’Enfant's plan, Pennsylvania Avenue, Washington, District of Columbia, 20045, United States", "Toronto City Hall, 100, Queen Street West, F… #> $ place_id 275427865, 152735232 #> $ licence "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright", "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyr… #> $ osm_type "way", "way" #> $ osm_id 899927546, 198500761 #> $ osm_lat "38.895859599999994", "43.6536032" #> $ osm_lon "-77.0306779870984", "-79.38400546703345" #> $ tourism "L’Enfant's plan", NA #> $ road "Pennsylvania Avenue", "Queen Street West" #> $ city "Washington", "Old Toronto" #> $ state "District of Columbia", "Ontario" #> $ `ISO3166-2-lvl4` "US-DC", "CA-ON" #> $ postcode "20045", "M5H 2N2" #> $ country "United States", "Canada" #> $ country_code "us", "ca" #> $ boundingbox <"38.8957273", "38.8959688", "-77.0311667", "-77.0301895">, <"43.6529946", "43.6541458", "-79.3848438", "-79.3830415"> #> $ amenity NA, "Toronto City Hall" #> $ house_number NA, "100" #> $ neighbourhood NA, "Financial District" #> $ quarter NA, "Spadina—Fort York" #> $ state_district NA, "Golden Horseshoe" ``` ## Working With Messy Data Only unique input data (either addresses or coordinates) is passed to geocoding services even if your data contains duplicates. NA and blank inputs are excluded from queries. Input latitudes and longitudes are also limited to the range of possible values. Below is an example of how duplicate and missing data is handled by tidygeocoder. As the console messages shows, only the two unique addresses are passed to the geocoding service. ```r # create a dataset with duplicate and NA addresses duplicate_addrs <- address_single %>% bind_rows(address_single) %>% bind_rows(tibble(singlelineaddress = rep(NA, 3))) duplicates_geocoded <- duplicate_addrs %>% geocode(singlelineaddress, verbose = TRUE) #> #> Number of Unique Addresses: 2 #> Passing 2 addresses to the Nominatim single address geocoder #> #> Number of Unique Addresses: 1 #> Querying API URL: https://nominatim.openstreetmap.org/search #> Passing the following parameters to the API: #> q : "11 Wall St, NY, NY" #> limit : "1" #> format : "json" #> HTTP Status Code: 200 #> Query completed in: 0.9 seconds #> Total query time (including sleep): 1 seconds #> #> #> Number of Unique Addresses: 1 #> Querying API URL: https://nominatim.openstreetmap.org/search #> Passing the following parameters to the API: #> q : "600 Peachtree Street NE, Atlanta, Georgia" #> limit : "1" #> format : "json" #> HTTP Status Code: 200 #> Query completed in: 0.2 seconds #> Total query time (including sleep): 1 seconds #> #> Query completed in: 2 seconds ``` |singlelineaddress | lat| long| |:-----------------------------------------|--------:|---------:| |11 Wall St, NY, NY | 40.70707| -74.01117| |600 Peachtree Street NE, Atlanta, Georgia | 33.77086| -84.38614| |11 Wall St, NY, NY | 40.70707| -74.01117| |600 Peachtree Street NE, Atlanta, Georgia | 33.77086| -84.38614| |NA | NA| NA| |NA | NA| NA| |NA | NA| NA| As shown above, duplicates will not be removed from your results by default. However, you can return only unique results by using `unique_only = TRUE`. Note that passing `unique_only = TRUE` to `geocode()` or `reverse_geocode()` will result in the original dataframe format (including column names) to be discarded in favor of the standard field names (ie. "address", 'lat, 'long', etc.). ```r uniqueonly1 <- duplicate_addrs %>% geocode(singlelineaddress, unique_only = TRUE) #> Passing 2 addresses to the Nominatim single address geocoder #> Query completed in: 2 seconds ``` |address | lat| long| |:-----------------------------------------|--------:|---------:| |11 Wall St, NY, NY | 40.70707| -74.01117| |600 Peachtree Street NE, Atlanta, Georgia | 33.77086| -84.38614| ## Combining Multiple Queries The `geocode_combine()` function allows you to execute and combine the results of multiple geocoding queries. The queries are specified as a list of lists with the `queries` parameter and are executed in the order provided. By default only addresses that are not found are passed to the next query, but this behavior can be toggled with the `cascade` argument. In the first example below, the US Census service is used for the first query while the Nominatim ("osm") service is used for the second query. The `global_params` argument passes the `address` column from the input dataset to both queries. ```r addresses_combine <- tibble( address = c('100 Wall Street NY, NY', 'Paris', 'Not An Address') ) cascade_results1 <- addresses_combine %>% geocode_combine( queries = list( list(method = 'census'), list(method = 'osm') ), global_params = list(address = 'address') ) #> #> Passing 3 addresses to the US Census batch geocoder #> Query completed in: 0.2 seconds #> Passing 2 addresses to the Nominatim single address geocoder #> Query completed in: 2 seconds ``` |address | lat| long|query | |:----------------------|--------:|----------:|:------| |100 Wall Street NY, NY | 40.70516| -74.007346|census | |Paris | 48.85889| 2.320041|osm | |Not An Address | NA| NA| | If `cascade` is set to FALSE then all addresses are attempted by each query regardless of if the address was found initially or not. ```r no_cascade_results1 <- addresses_combine %>% geocode_combine( queries = list( list(method = 'census'), list(method = 'osm') ), global_params = list(address = 'address'), cascade = FALSE ) #> #> Passing 3 addresses to the US Census batch geocoder #> Query completed in: 0.2 seconds #> Passing 3 addresses to the Nominatim single address geocoder #> Query completed in: 3 seconds ``` |address | lat| long|query | |:----------------------|--------:|----------:|:------| |100 Wall Street NY, NY | 40.70516| -74.007346|census | |100 Wall Street NY, NY | 40.70522| -74.006800|osm | |Paris | NA| NA|census | |Paris | 48.85889| 2.320041|osm | |Not An Address | NA| NA|census | |Not An Address | NA| NA|osm | Additionally, the results from each query can be returned in separate list items by setting `return_list = TRUE`. ## Customizing Queries The `limit` argument can be specified to allow multiple results (rows) per input if available. The maximum value for the `limit` argument is often 100 for geocoding services. To use the default `limit` value for the selected geocoding service you can use `limit = NULL` which will prevent the limit parameter from being included in the query. ```r geo_limit <- geo( c("Lima, Peru", "Cairo, Egypt"), method = "osm", limit = 3, full_results = TRUE ) #> Passing 2 addresses to the Nominatim single address geocoder #> Query completed in: 2 seconds glimpse(geo_limit) #> Rows: 6 #> Columns: 13 #> $ address "Lima, Peru", "Lima, Peru", "Lima, Peru", "Cairo, Egypt", "Cairo, Egypt", "Cairo, Egypt" #> $ lat -12.06211, -12.20011, -11.99997, 30.04439, 30.03521, 30.03325 #> $ long -77.03653, -76.28506, -76.83322, 31.23573, 31.56337, 31.56217 #> $ place_id 298428260, 298361673, 298381610, 338723021, 298794275, 298680500 #> $ licence "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright", "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright… #> $ osm_type "relation", "relation", "relation", "node", "relation", "relation" #> $ osm_id 1944756, 1944659, 1944670, 271613766, 5466227, 4103336 #> $ boundingbox <"-12.0797663", "-12.0303496", "-77.0884555", "-77.0017774">, <"-13.3241714", "-10.2741856", "-77.8863105", "-75.5075">, <"-12.5199316", "-11.572… #> $ display_name "Lima, Lima Metropolitana, Lima, Perú", "Lima, Perú", "Lima, Lima Metropolitana, Lima, Perú", "القاهرة, 11519, مصر", "القاهرة, مصر", "القاهرة, مص… #> $ class "boundary", "boundary", "boundary", "place", "place", "boundary" #> $ type "administrative", "administrative", "administrative", "city", "city", "administrative" #> $ importance 0.7830015, 0.6119761, 0.5934835, 0.6960286, 0.6960286, 0.4835559 #> $ icon "https://nominatim.openstreetmap.org/ui/mapicons/poi_boundary_administrative.p.20.png", "https://nominatim.openstreetmap.org/ui/mapicons/poi_boun… ``` To directly specify specific API parameters for a given `method` you can use the `custom_query` parameter. For example, [the Nominatim (OSM) geocoder has a 'polygon_geojson' argument](https://nominatim.org/release-docs/develop/api/Details/#parameters) that can be used to return GeoJSON geometry content. To pass this parameter you can insert it with a named list using the `custom_query` argument: ```r cairo_geo <- geo("Cairo, Egypt", method = "osm", full_results = TRUE, custom_query = list(polygon_geojson = 1), verbose = TRUE ) #> #> Number of Unique Addresses: 1 #> Passing 1 address to the Nominatim single address geocoder #> #> Number of Unique Addresses: 1 #> Querying API URL: https://nominatim.openstreetmap.org/search #> Passing the following parameters to the API: #> q : "Cairo, Egypt" #> limit : "1" #> polygon_geojson : "1" #> format : "json" #> HTTP Status Code: 200 #> Query completed in: 0.2 seconds #> Total query time (including sleep): 1 seconds #> #> Query completed in: 1 seconds glimpse(cairo_geo) #> Rows: 1 #> Columns: 15 #> $ address "Cairo, Egypt" #> $ lat 30.04439 #> $ long 31.23573 #> $ place_id 338723021 #> $ licence "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright" #> $ osm_type "node" #> $ osm_id 271613766 #> $ boundingbox <"29.8843879", "30.2043879", "31.0757257", "31.3957257"> #> $ display_name "القاهرة, 11519, مصر" #> $ class "place" #> $ type "city" #> $ importance 0.6960286 #> $ icon "https://nominatim.openstreetmap.org/ui/mapicons/poi_place_city.p.20.png" #> $ geojson.type "Point" #> $ geojson.coordinates <31.23573, 30.04439> ``` To test a query without sending any data to a geocoding service, you can use `no_query = TRUE` (NA results are returned). ```r noquery1 <- geo(c("Vancouver, Canada", "Las Vegas, NV"), no_query = TRUE, method = "arcgis" ) #> #> Number of Unique Addresses: 2 #> Passing 2 addresses to the ArcGIS single address geocoder #> #> Number of Unique Addresses: 1 #> Querying API URL: https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates #> Passing the following parameters to the API: #> SingleLine : "Vancouver, Canada" #> maxLocations : "1" #> f : "json" #> #> Number of Unique Addresses: 1 #> Querying API URL: https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates #> Passing the following parameters to the API: #> SingleLine : "Las Vegas, NV" #> maxLocations : "1" #> f : "json" #> Query completed in: 0 seconds ``` |address | lat| long| |:-----------------|---:|----:| |Vancouver, Canada | NA| NA| |Las Vegas, NV | NA| NA| Additional usage notes for the `geocode()`, `geo()`, `reverse_geocode()`, and `reverse_geo()` functions: - Set `quiet = TRUE` to silence console logs displayed by default (how many inputs were submitted, to what geocoding service, and the elapsed time). - Use the `progress_bar` argument to control if a progress bar is displayed. - The `verbose`, `quiet`, and `progress_bar` arguments can be set globally with `options`. For instance `options(tidygeocoder.verbose = TRUE)` will set verbose to `TRUE` for all queries by default. - To customize the API endpoint, use the `api_options` or `api_url` arguments. See `?geo` or `?reverse_geo` for details. - If not specified manually, the `min_time` argument will default to a value based on the maximum query rate of the given geocoding service. If you are using a local Nominatim server or have a commercial geocoder plan that has less restrictive usage limits then you can manually set `min_time` to a lower value (such as 0). - The `mode` argument can be used to specify whether the batch geocoding or single address/coordinate geocoding should be used. By default batch geocoding will be used if available when more than one address or coordinate is provided (with some noted exceptions for slower batch geocoding services). - The `return_addresses` and `return_coords` parameters (for forward and reverse geocoding respectively) can be used to toggle whether the input addresses or coordinates are returned. Setting these parameters to `FALSE` is necessary to use batch geocoding if `limit` is greater than 1 or NULL. - For the `reverse_geocode()` and `geocode()` functions, the `return_input` argument can be used to toggle if the input dataset is included in the returned dataframe. - For use with the [memoise package](https://memoise.r-lib.org/), you may need to use character values when specifying input data columns in the `geocode()` and `reverse_geocode()` functions. See [#154](https://github.com/jessecambon/tidygeocoder/pull/154) for details.