---
title: "Derived Variables"
author: "JJB + Course"
date: "02/08/2019"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Derived Variables

## Example: Classification

```{r sample-data}
# Set seed
set.seed(188)

# Generate data
sim_data = data.frame(
  x = c(runif(25, 0, .75), runif(25, 0.75, 1)),
  y = c(runif(25, 0, .5), runif(25, 0.75, 1)),
  group = c(rep("A", 25), rep("B", 25))
)
```

```{r show-sample}
library(ggplot2)

ggplot(data = sim_data) +
  aes(x, y, color = group) +
  geom_point() +
  theme_bw() +
  labs(title = "Observing Data",
       subtitle = "Randomly Generated Uniform Groups of Data",
       color = "Groups",
       caption = "STAT 385 @ UIUC")
```

```{r apply-decision-rule}
ggplot(data = sim_data) +
  aes(x, y, color = group) +
  geom_point() +
  geom_vline(xintercept = .75, color = "orange") +
  theme_bw() +
  labs(title = "Crafting a Decision Rule",
       subtitle = "Classifying with a rule of 0.75",
       color = "Groups",
       caption = "STAT 385 @ UIUC")
```

## Example: Blood Pressure

A data frame containing the blood pressure data:

```{r construct-bp-data}
## Construct BP data
bp_data = data.frame(
  Subject_ID = c("S005", "S130", "S023", "S098", "S035", "S007", "S104"),
  Sex = c("Male", "Female", "Male", "Male", "Male", "Female", "Female"),
  Systolic = c(110, 141, 125, 168, 115, 122, 135)
)

## View data.frame
bp_data
```

After discussing how to classify the data with a 'lookup' table, add the resulting labels as a new column.

```{r augment-data-frame-with-type}
## Add column to data.frame
bp_data$BP_Type = c("Normal", "Stage 2", "Elevated", "Stage 2",
                    "Normal", "Elevated", "Stage 1")
```

## Example: Creating New Variables

```{r ways-to-add-variables}
# Classified Data to be added...
classified_data = c("Normal", "Stage 2", "Elevated", "Stage 2",
                    "Normal", "Elevated", "Stage 1")

# Each statement below is an alternative way to add the same column.

# Constructing the variable with $
bp_data$BP_Type = classified_data

# Recreating the data.frame with a new column
# (if BP_Type already exists, the new copy is renamed BP_Type.1)
bp_data = data.frame(bp_data, BP_Type = classified_data)

# Add a new variable to data.frame using transform()
bp_data = transform(bp_data, BP_Type = classified_data)

# Add a new variable to data.frame using within()
bp_data = within(bp_data, {
  BP_Type = classified_data
})
```

# if-else

## Example: if-else bank account

```{r if-bank-acct}
prob_opening_savings_acct = 0.91 # about 91%

if (prob_opening_savings_acct > 0.80) {
  message("Target this user!")
} else {
  message("Save our money and don't bother sending flyers.")
}
```

## Example: if-else

```{r example-if-else}
x = -2L

if (x < 0L) {
  -1 * x
} else {
  x
}
```

What happens if `x` is positive? Negative? Zero?

## Example: Classifying

Classify `x` by comparing it against pre-set values.

```{r example-classify}
x = "Jerry"

# && is the scalar "and" intended for use inside if()
if (x != "Jerry" && x != "Elaine" && x != "George") {
  message("Soup for you!")
} else {
  message("No soup for you!")
}
```

An alternative way to write the `if` condition above uses `%in%`.

```{r example-classify-in}
if (!(x %in% c("Jerry", "Elaine", "George"))) {
  message("Soup for you!")
} else {
  message("No soup for you!")
}
```

### Exercise: Determining the parity of a number

Using the modulus operator, determine whether `x` is odd or even. Output the result using the `message()` function:

- "x is even"
- "x is odd"

```{r ex-if-number-parity}
x = 1

# 1 modulus 2
1 %% 2

# 2 modulus 2
2 %% 2

3 %% 2
4 %% 2
```

```{r}
test_case = 18

if (test_case %% 2 == 0) {
  message("The number ", test_case, " is divisible by 2")
  message("The value ", test_case, " is therefore EVEN.")
} else {
  message("The number ", test_case, " is not divisible by 2")
  message("The value ", test_case, " is therefore ODD.")
}
```

## Example: if-else-if-else Absolute Value

We could split the absolute value logic into three cases.
However, the second case is redundant: when `x` is 0, both `-1 * x` and `x` already evaluate to 0.

```{r three-case-abs}
if (x < 0) {
  -1 * x
} else if (x == 0) {
  0
} else {
  x
}
```

## Example: if-else-if-else Discriminant

```{r discriminant-ex}
a = 3; b = 2; c = -1

discriminant = b^2 - 4*a*c

if (discriminant > 0) {
  message("two real roots")
} else if (discriminant == 0) {
  message("one real root")
} else {
  message("two complex roots")
}
```

### Exercise: Grade Scale

Write an `if-else if-else` statement for the traditional grade scale.

```{r ex-grade-scale}
grade_pct = 82

# Cut-offs are inclusive: exactly 90 earns an "A"
if (grade_pct >= 90) {
  "A"
} else if (grade_pct >= 80) {
  "B"
} else if (grade_pct >= 70) {
  "C"
} else if (grade_pct >= 60) {
  "D"
} else {
  "F"
}
```

```{r ex-grade-scale-if-else}
# The same logic written as nested if-else statements
if (grade_pct >= 90) {
  "A"
} else {
  if (grade_pct >= 80) {
    "B"
  } else {
    if (grade_pct >= 70) {
      "C"
    } else {
      if (grade_pct >= 60) {
        "D"
      } else {
        "F"
      }
    }
  }
}
```

## Example: Classifying Temperature

```{r classify-temp}
temperature = c(92, 93, 81, 70, 68)

temperature > 76

ifelse(temperature > 76, "hot", "cold")
```

The condition of a regular `if` must be a single value.

```{r if-with-a-vector-error, error = TRUE}
# Try if() with a vector...
if (temperature > 76) {
  "hot"
} else {
  "cold"
}
```

In R versions before 4.2.0 this did not error. Instead, R issued a warning and checked only the first element of the vector. Since R 4.2.0, a condition with length greater than one is an error.

```{r understanding-vector-access-in-if}
temperature[1] > 76
```

How can `if-else` be used with vectors then? By **reducing** a vector to a single _logical_ value.

`any()`: checks for _at least one_ `TRUE`, i.e., it acts as an "OR" (`|`).

```{r any-logical-reduction}
any(temperature > 76)
```

`all()`: checks that _every_ value is `TRUE`, i.e., it acts as an "AND" (`&`).

```{r all-logical-reduction}
all(temperature > 76)

all(temperature > 67)
temperature > 67
```

## Example: Vectorized If-Else

```{r vectorized-bp}
# Cut-offs assume the systolic categories
# < 120, 120-129, 130-139, and >= 140
bp_data$BP_Type =
  ifelse(bp_data$Systolic < 120, "Normal",
    ifelse(bp_data$Systolic < 130, "Elevated",
      ifelse(bp_data$Systolic < 140, "Stage 1", "Stage 2")
    )
  )
```

## Exercise: Recoding "M" and "F"

Recode the values `"M"` to `"Male"` and `"F"` to `"Female"`.

```{r re-code-ifelse}
x = c("M", "M", "F", "M", "F", "F")
x

# This is the comparison check
x == "M"

# Here we use a vectorized ifelse to re-classify the subjects' sex.
recode_x = ifelse(x == "M", "Male", "Female")
recode_x
```

## Example: Switch with recoding values

`switch()` classifies _one_ value, similar to the non-vectorized `if-else` check. We could reimplement the last recoding exercise, for a single element, to clearly express the logic as:

```{r}
x = c("M", "M", "F", "M", "F", "F")
x

# switch() works on one value, so only the first element is recoded
switch(x[1],
       "M" = "Male",
       "F" = "Female")
```

## Exercise: Switch to determine OS

Write a `switch()` statement that determines what operating system _R_ is running on and "updates" the names to what users would be more accustomed to.

**Note:** `sys_name` gives:

- `"Windows"`
- `"Darwin"`
- `"Linux"`
- `"SunOS"`

Change these names to: "Microsoft", "Apple", "FOSS", "Sun/Oracle"

```{r os-switch}
sys_name = Sys.info()[["sysname"]]

### Your code here...
```
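
One possible approach to this exercise is sketched below. It is not the official solution: the helper name `relabel_os()` and the `"Unknown"` fallback are assumptions added for illustration. The sketch relies on `switch()` treating a final unnamed argument as the default when no name matches.

```{r os-switch-sketch}
# Hypothetical helper (not part of the original exercise): map a system
# name to a friendlier label, with "Unknown" as an assumed fallback.
relabel_os = function(sys_name) {
  switch(sys_name,
         "Windows" = "Microsoft",
         "Darwin"  = "Apple",
         "Linux"   = "FOSS",
         "SunOS"   = "Sun/Oracle",
         "Unknown")  # unnamed last argument acts as the default
}

# Relabel the system R is currently running on
relabel_os(Sys.info()[["sysname"]])

# Relabel a fixed value
relabel_os("Darwin")
```

Wrapping the `switch()` in a function keeps the mapping in one place and makes it easy to check individual system names without changing machines.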