---
title: "Web APIs"
author: "JJB + Course"
date: "04/05/2019"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
---

# HTTP Requests

## Example: Downloading a File

```{r download-file-enr}
# install.packages("readxl")

# URL of file to retrieve
url = "http://dmi.illinois.edu/stuenr/class/enrsp19.xls"

# Save this file as ...
destfile = "enrsp19.xls"

# Download the file; mode = "wb" keeps the binary .xls intact on Windows
download.file(url = url, destfile = destfile, mode = "wb")
```

```{r read-file}
# Read the file into R
enrollsp19 = readxl::read_excel(destfile, skip = 4)

# Adding skip = 4 to the read statement allows the
# proper headers to be set for the data, as it skips
# the initial four rows that contain no information.
head(enrollsp19)
```

### Exercise: Automatically download all of the Spring semester files

- http://dmi.illinois.edu/stuenr/class/enrsp19.xls
- http://dmi.illinois.edu/stuenr/class/enrsp18.xls
- http://dmi.illinois.edu/stuenr/class/enrsp17.xls
- http://dmi.illinois.edu/stuenr/class/enrsp05.xls

```{r}
# URL of file to retrieve
url_base = "http://dmi.illinois.edu/stuenr/class/"

# Save this file as ...
destfile_name = "enrsp"
destfile_ext = ".xls"
years = 10:19

format_url = paste0(
  url_base, destfile_name, years, destfile_ext
)
format_url

destfile_full = paste0(
  destfile_name, years, destfile_ext
)
destfile_full
```

```{r}
# Download the files; a vector of URLs requires method = "libcurl",
# and mode = "wb" keeps the binary .xls files intact on Windows
download.file(url = format_url, destfile = destfile_full,
              method = "libcurl", mode = "wb")
```

```{r}
```

# REST

## Example: Retrieving a website

```{r httr-demo}
# install.packages("httr")
library("httr")

# Form a GET request to obtain the STAT department website
web_page = GET("https://stat.illinois.edu/")

# Check HTTP status code
status_code(web_page)

# Retrieve body of response
content(web_page, "text")
```

## Example: URL Encoding

```{r url-encoding}
# Encoding a URL
URLencode("https://www.google.com/search?q=URL decoded")

# Decoding the URL
URLdecode("https://www.google.com/search?q=URL%20encoded")

# Encoding special characters (the backslash must be escaped as "\\";
# reserved = TRUE also encodes reserved characters such as + # $ &)
URLencode("+ * - \\ ^ # $ % &", reserved = TRUE)

# Decoding the encoded special characters recovers the original string
URLdecode(URLencode("+ * - \\ ^ # $ % &", reserved = TRUE))
```

## Example: Querying Google

```{r example-google-query}
# Provide a query term
query_terms = list(q = "uiuc stat department")

# Form a GET request to search google
google_search = GET("https://www.google.com/search", query = query_terms)
google_search
```

We can see the header content using:

```{r check-headers}
headers(google_search)
```

However, we are much more interested in manipulating the text information.
```{r body-content}
# Retrieve body of response as text
content(google_search, "text")

# Default is an xml document
content(google_search)
```

## Exercise: GET request

Send a GET request to <https://httpbin.org/get>.

Check:

- Status
- Header
- Body

```{r}
library("httr")

httpbin_get = GET("https://httpbin.org/get")
httpbin_get

status_code(httpbin_get)
content(httpbin_get)
headers(httpbin_get)
```

# JSON

## Parsing JSON

Sample JSON string:

```{r parse-json-txt}
# install.packages("jsonlite")
library("jsonlite")

my_profile = fromJSON('{
  "name": "Your Name",
  "occupation": "Student",
  "eyecolor": "hazel",
  "salary": -14000,
  "starbucks": true,
  "music": {
    "band": "Johnny Cash",
    "song": "Ring of Fire"
  },
  "required": ["name", "occupation"]
}')
```

Recall, variables in a list can be accessed using `$`. For example, we can retrieve `music` using `my_profile$music`.

```{r json-list-form}
# This is a list with another list embedded (e.g. music)
my_profile

# Extracting the value from the embedded list gives:
my_profile$music$band
```

## Example: Query GitHub's API

```{r gh-api-call}
# Specify base URL
base_url = "https://api.github.com"

# Specify user
gh_user = "tidyverse"

# Create a request for information
endpoint_resource = paste0("/users/", gh_user, "/repos")

# Make a url
url = paste0(base_url, endpoint_resource)

# Form a GET request to obtain a list of repos on GitHub
gh_get_repos = GET(url)
```

## Example: Manipulating Results from GitHub

Once we have the response, we need to be able to access the body contents. To do so, the `content()` function is useful. By default, `content()` tries to coerce the body automatically into an _R_ object. Sometimes this works as intended; in practice, it often does not. As an example, the data in this case is returned as a `list` instead of a `data.frame`:

```{r}
gh_list_repos = content(gh_get_repos)
gh_list_repos
```

To avoid the "automatic" conversion done by `httr`, we directly specify it to return text.
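The difference between the two conversions can be seen offline. The sketch below uses a small, made-up JSON string (`repos_json` and its fields are illustrative assumptions, not real GitHub output). Parsing with `simplifyVector = FALSE` gives a nested list, roughly the shape that `content()` returned above, while the `jsonlite` default simplifies an array of records into a `data.frame`.

```{r json-simplify-sketch}
library("jsonlite")

# Hypothetical payload: two records shaped loosely like repository entries
repos_json = '[
  {"name": "alpha", "stars": 10, "fork": false},
  {"name": "beta",  "stars": 25, "fork": true}
]'

# No simplification: a list holding one list per record
repos_list = fromJSON(repos_json, simplifyVector = FALSE)
class(repos_list)
length(repos_list)

# Default simplification: an array of records becomes a data.frame
repos_df = fromJSON(repos_json)
is.data.frame(repos_df)
repos_df$name
```

Simplification depends on the records sharing a regular structure, so checking the result before relying on it is a reasonable habit.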
From the text response, we aim to convert it into a `data.frame`:

```{r json-data-frame}
# May generate a list OR a data frame
gh_data_repos = fromJSON(content(gh_get_repos, "text"))
gh_data_repos
```

To make sure the conversion is okay, consider checking the underlying structure. In particular, you may wish to verify that it is a data frame:

```{r}
is.data.frame(gh_data_repos)
```

Moreover, you may want to look at the individual classes.

```{r check-struct}
str(gh_data_repos)
```

### Exercise: GitHub's Emoji Codes

Retrieve a list of all emojis supported on GitHub based on the details provided in:

https://developer.github.com/v3/emojis/

```{r gh-api-emoji}
### API Call
base_url_gh = "https://api.github.com"
endpoint = "/emojis"

full_call = paste0(base_url_gh, endpoint)
full_call
```

```{r}
getting_emoji_data = GET(full_call)
getting_emoji_data
```

```{r}
content(getting_emoji_data)
```

```{r}
jsonlite::fromJSON(content(getting_emoji_data, "text"))
```

### Exercise: Game of Thrones Houses

Obtain the list of houses from the Ice and Fire API:

```{r gh-api-got}
### API Call
got_houses = GET("https://www.anapioficeandfire.com/api/houses")
got_houses

content(got_houses)
```

# Authorization

## Example: CUMTD

The Champaign-Urbana Mass Transit District (CUMTD) bus service provides an API that requires a _key_ to access data. The _key_ acts as a way to authorize users to download data and apply _rate control_.

You can request your key from:

The `query` parameter in `GET()` allows for a key to be supplied.

```{r cu-mtd, eval = FALSE}
# Prompt for the key so it is never stored in the document
key = rstudioapi::askForPassword("Please enter the API Key")

# Request stop information; the key travels as a query parameter
cumtd_stop = GET("https://developer.cumtd.com/api/v2.2/json/getstop",
                 query = list(key = key, stop_id = "it"))
```
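The prompt above only works inside an interactive RStudio session. A minimal sketch of one alternative, assuming the key has been stored in an environment variable: the variable name `CUMTD_API_KEY` and the helper `build_query()` are hypothetical, not part of the CUMTD API or `httr`. The helper reads the key with `Sys.getenv()` and assembles the same `query` list passed to `GET()` above.

```{r cu-mtd-env-key}
# Hypothetical helper: build the query list for a CUMTD stop lookup.
# Stops early with a clear message when no key is available.
build_query = function(stop_id, key = Sys.getenv("CUMTD_API_KEY")) {
  if (!nzchar(key)) {
    stop("No API key found; set the CUMTD_API_KEY environment variable.")
  }
  list(key = key, stop_id = stop_id)
}

# Demonstration with a placeholder key (not a real credential)
demo_query = build_query(stop_id = "it", key = "placeholder-key")
str(demo_query)

# With a real key, the list would be passed to the request, e.g.:
# cumtd_stop = httr::GET("https://developer.cumtd.com/api/v2.2/json/getstop",
#                        query = build_query("it"))
# httr::stop_for_status(cumtd_stop)
```

Keeping the key out of the source file means the document can be shared or committed without exposing the credential.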