---
title: "Functionals"
author: "JJB + Course"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
---

# Functionals

### Example: Functional calling another function

```{r}
call_func = function(x, f) {
  # call the function `f` with data `x`
  f(x)
}

x = c(-2, 0.3, 1.2, 4.8)
call_func(x, mean)
call_func(x, min)
```

## Example: Functional in Action

```{r}
set.seed(111)
replicate(3, runif(5))
rep(runif(5), 3)
```

### Exercise: Creating a simulation

Use the `replicate` function to sample 10 observations from a normal distribution (`?rnorm`) 5 times.

```{r}
set.seed(999999999)
?replicate
replicate(n = 5, expr = {
  rnorm(10)
})
```

# Functionals in Practice

## Example: Specify Missingness

```{r}
set.seed(191)
n = 100
my_df = data.frame(col1 = sample(-1:10, n, replace = TRUE),
                   col2 = sample(-1:10, n, replace = TRUE),
                   col3 = sample(-1:10, n, replace = TRUE),
                   col4 = sample(-1:10, n, replace = TRUE))
my_df

my_df$col1[my_df$col1 == -1] = NA
my_df$col2[my_df$col2 == -1] = NA
my_df$col3[my_df$col3 == -1] = NA
my_df$col4[my_df$col4 == -1] = NA
my_df
```

## Example: Functionize it!
```{r}
# Action repeated consistently
code_missing = function(x) {
  x[x == -1] = NA
  x
}

# Apply behavior to data
my_df$col1 = code_missing(my_df$col1)
my_df$col2 = code_missing(my_df$col2)
my_df$col3 = code_missing(my_df$col3)
my_df$col4 = code_missing(my_df$col4)
```

## Example: Repeating a Behavior

```{r}
# Action repeated consistently
code_missing = function(x) {
  x[x == -1] = NA
  x
}

# Apply uniformly the action to columns
for(i in seq_len(ncol(my_df))) {
  my_df[, i] = code_missing(my_df[, i])
}
```

## Example: Vectorization

```{r}
x = 1:4
x^2 # f(x) = x^2
```

## Example: Most commonly used functionals in _R_

| Function | Description | Output |
|----------|-------------|--------|
| `lapply` | Apply a Function over a List or Vector | `list` |
| `sapply` | Apply a Function over a List or Vector | `vector`, `matrix`, `array`, `list` |
| `apply`  | Apply Functions Over Array Margins | `matrix`, `array` |
| `mapply` | Apply a Function to Multiple List or Vector Arguments | `vector`, `matrix`, `array`, `list` |

## Example Functional

```{r}
x = 1:4

# Define function
square = function(x) {x^2}

# List Output
lapply(x, FUN = square)

# Vector / Matrix Output
sapply(x, FUN = square)

# Force output to be a list
sapply(x, FUN = square, simplify = FALSE)
```

### Exercise: Determine Variable Data Types

Using a functional, determine the data type of each variable in the `mtcars` data frame with the `class()` function.
```{r}
class(mtcars$mpg)
class(mtcars$cyl)
```

```{r}
# This is generating a vector of character classes
sapply(mtcars, FUN = class)
sapply(mtcars, FUN = mean)
```

```{r}
lapply(mtcars, FUN = class)
```

## Example: Functional as a Loop

```{r}
lapply_func = function(x, f) {
  out = vector('list', length(x))
  for(i in seq_along(out)) {
    # `[[` stores the full result of `f` in element i,
    # even when that result has length greater than one
    out[[i]] = f(x[[i]])
  }
  out
}

lapply_func(x, square)
```

### Example: Emphasized Loop

```{r}
means = vector("double", ncol(trees))
for(i in seq_along(trees)) {
  means[[i]] = mean(trees[[i]], na.rm = TRUE)
}

sds = vector("double", ncol(trees))
for(i in seq_along(trees)) {
  sds[[i]] = sd(trees[[i]], na.rm = TRUE)
}

means
sds
```

cf. [Hadley Wickham's](https://twitter.com/hadleywickham) talk on ["Managing many models with _R_"](https://www.youtube.com/watch?v=rz3_FDVt9eg#t=19m55s) at the Edinburgh R User Group.

## Example: Emphasize Action

```{r}
means = sapply(trees, FUN = mean)
sds = sapply(trees, FUN = sd)

means
sds
```

### Exercise: Handling Missing Values

Rewrite the following loop to use a functional to code missing values.

```{r}
for(i in seq_len(ncol(my_df))) {
  my_df[, i] = code_missing(my_df[, i])
}
```

```{r}
set.seed(191)
n = 100
my_df = data.frame(col1 = sample(-1:10, n, replace = TRUE),
                   col2 = sample(-1:10, n, replace = TRUE),
                   col3 = sample(-1:10, n, replace = TRUE),
                   col4 = sample(-1:10, n, replace = TRUE))
my_df

my_df$col1[my_df$col1 == -1] = NA
my_df$col2[my_df$col2 == -1] = NA
my_df$col3[my_df$col3 == -1] = NA
my_df$col4[my_df$col4 == -1] = NA

# Action repeated consistently
code_missing = function(x) {
  x[x == -1] = NA
  x
}
```

```{r}
# sapply() simplifies the result to a matrix, not a data.frame
my_df2 = sapply(my_df, FUN = code_missing)
my_df2
class(my_df2)
```

```{r}
# Copy first: assigning into `[]` keeps the data.frame structure
my_df_lapply = my_df
my_df_lapply[] = lapply(my_df, FUN = code_missing)
my_df_lapply
```

### Example: `apply()` on nD Structures

```{r}
x = matrix(1:12, nrow = 3)

# Margin = 1 -> Take the summation of each row.
apply(x, FUN = sum, MARGIN = 1)

# Equivalent to performing a row sum.
rowSums(x)

# Margin 2 -> Take the summation of each column
apply(x, FUN = sum, MARGIN = 2)

# Equivalent to performing the column summation.
colSums(x)
```

What happens if we use both margins on a matrix?

```{r}
apply(x, FUN = sum, MARGIN = c(1,2))
```

### Exercise: 3D Array Maximums

Obtain the maximum values in a 3D array for values across the 2nd dimension.

```{r}
my_array = array(seq_len(24), dim = c(2, 3, 4))
my_array
```

```{r}
apply(my_array, MARGIN = c(1, 3), FUN = max)
```

### Example: Tabulating Data

```{r}
# Creating sample data
val = 1:10
group = factor(
  rep(c("control", "treat"), each = 5)
)

# Sum values by group
tapply(X = val, INDEX = group, FUN = sum)
```

```{r}
# Finding the median for
# each species in a data.frame
tapply(
  X = iris$Sepal.Width,
  INDEX = iris$Species,
  FUN = median)
```

```{r}
# dplyr approach
library("dplyr")
iris %>%
  group_by(Species) %>%
  summarise(
    spec_median = median(Sepal.Width)
  )
```

### Example: Functions as Data

```{r}
stat_funs = list(min = min, median = median, mean = mean, sd = sd, max = max)
stat_funs

# Apply a Function over a List or Vector
version_one = sapply(stat_funs,
                     FUN = function(x, data) sapply(data, x),
                     data = trees)
version_one

# Apply a Function to Multiple Lists/Vectors
version_two = mapply(sapply, stat_funs, MoreArgs = list(X = trees))

all.equal(version_one, version_two)

version_one
version_two
```

### Exercise: Functionals with Data

- Use `summary()` on three data sets:

```{r}
data_combined = list(PlantGrowth, rock, mtcars)
```

- Compute the quantiles of:

```{r}
sim_data = list(normal_nums = rnorm(100),
                uniform_nums = runif(50))
```

See `?quantile`

# An Odyssey in Purrr

## Example: Type Stable

```{r}
library("purrr")

# Map output to list
map(mtcars, mean)

# Using base R
lapply(mtcars, FUN = mean)

# Map to double vector
map_dbl(mtcars, mean)
```

Type stability is a nice feature: the type of the output is known in advance and does not depend on the input.

```{r}
# Base R type-stable map
vapply(mtcars, FUN = mean, FUN.VALUE = numeric(1))

# Avoid using type-unstable map
sapply(mtcars, FUN =
       mean)
```

### Exercise: Downloading and Loading Excel Spreadsheets

Recall that in the Web API lecture we downloaded DMI's spreadsheets.

```{r cache = TRUE}
# URL of file to retrieve
url_base = "http://dmi.illinois.edu/stuenr/class/"

dir.create("data", showWarnings = FALSE)

# Save this file as ...
destfile_name = "enrsp"
destfile_ext = ".xls"

years = 10:19

# Create URL
format_url = paste0(
  url_base, destfile_name, years, destfile_ext
)

# Make destination file
destfile_full = paste0(
  destfile_name, years, destfile_ext
)

# Download the files
# method = "libcurl" accepts vectors of URLs and destinations;
# mode = "wb" keeps the binary .xls files intact on Windows
download.file(url = format_url,
              destfile = file.path("data", destfile_full),
              method = "libcurl",
              mode = "wb")
```

From here, we're interested in reading them into _R_. To do so, we need the path to each file.

```{r}
# "\\.xls$" matches only file names that end in .xls
my_data = list.files(path = "data", pattern = "\\.xls$", full.names = TRUE)
my_data
```

If we wanted to read in _one_ file, we would use:

```{r}
library("readxl")
enrollment_sp = readxl::read_excel(my_data[1], skip = 4)
```

How can we read in _all_ files together?

```{r}
read_dmi_abstract = function(x) {
  dmi_abstract = readxl::read_excel(x, skip = 4)
  # Record which spreadsheet each row came from
  dmi_abstract$spreadsheet = x
  return(dmi_abstract)
}

library("purrr")
map_dfr(my_data, read_dmi_abstract)
```
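
The same read-and-combine pattern can be tried without a network connection or Excel files. The chunk below is a minimal sketch using only base _R_: it swaps `map_dfr()` for `lapply()` plus `do.call(rbind, ...)`, and swaps the spreadsheets for small CSV files written to a temporary directory. The names `tmp_dir`, `demo_files`, `read_demo`, and the `demo*.csv` files are hypothetical and exist only for this illustration.

```{r}
# Hypothetical stand-in for the downloaded spreadsheets:
# three small CSV files written to a temporary directory
tmp_dir = file.path(tempdir(), "functional_demo")
dir.create(tmp_dir, showWarnings = FALSE)

years = 10:12
for(yr in years) {
  write.csv(
    data.frame(year = yr, enrolled = yr * 100),
    file = file.path(tmp_dir, paste0("demo", yr, ".csv")),
    row.names = FALSE
  )
}

# Locate the files; "\\.csv$" matches only names ending in .csv
demo_files = list.files(path = tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Reader that tags each row with the file it came from
read_demo = function(path) {
  out = read.csv(path)
  out$source_file = basename(path)
  out
}

# lapply() returns one data frame per file;
# do.call(rbind, ...) stacks them row-wise into a single data frame
combined = do.call(rbind, lapply(demo_files, read_demo))
combined
```

This assumes every file has the same columns, since `rbind()` on data frames requires matching column names.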