---
title: "Transforming Data"
author: "JJB + Course"
date: "02/04/2019"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
---

# Functions

A function is a piece of code that performs a specified task. It may or may not depend on parameters, and it may or may not return one or more values.

## Example: Add

Here we create a function with _two_ parameters that adds together
the values passed in. 

```{r my-add-func}
add = function(x, y) {
  return(x + y)
}

add(1, 3)
```

## Example: Hello World!

Consider a common task...

```{r say-hello}
message("Hello World!")
```

How could we always repeat this task _without_ needing to retype the code elsewhere?

Idea: Use a function to **describe** a recipe.

```{r say-hello-consistently}
say_hello_world = function() { 
  message("Hello World!")
}

say_hello_world()
```

## Example: Generic Code to Specific Routine

A generic _R_ script produces values, but nothing says what they mean:

```{r hidden-meaning}
set.seed(1115)

sample(6, size = 1)

sample(6, size = 1)

sample(6, size = 1)
```

Adding a name to the routine...

```{r roll-a-die}
roll_die = function(num_sides) {
    roll = sample(num_sides, size = 1)
    return(roll)
}

set.seed(1115)
roll_die(6)
```

What happens if we forget to specify a `num_sides` value? E.g. what is `roll_die()`?
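One way to explore the question is to catch the error instead of letting it stop the document. The sketch below repeats the `roll_die()` definition so the chunk stands alone and wraps the call in `tryCatch()`. Storing the message in `result` is an illustrative choice, not part of the original example.

```{r roll-die-missing-arg}
# Same definition as above, repeated so this chunk is self-contained
roll_die = function(num_sides) {
    roll = sample(num_sides, size = 1)
    return(roll)
}

# num_sides has no default, so using it inside the function raises an error;
# tryCatch() captures the error message instead of halting
result = tryCatch(
  roll_die(),
  error = function(e) conditionMessage(e)
)

result
```

The captured message should point at `num_sides` being missing with no default, which motivates supplying one.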

Making the function receive default settings...

```{r roll-die-default}
roll_die_default = function(num_sides = 6) {
    roll = sample(num_sides, size = 1)
    return(roll)
}

set.seed(1115)
roll_die_default()

set.seed(1115)
roll_die_default(6)
```

Generalizing to _n_ rolls:

```{r generalized-die-roll}
roll_n_die = function(num_rolls, num_sides = 6) {
    rolls = sample(num_sides, size = num_rolls,
                   replace = TRUE)
    return(rolls)
}

set.seed(1115)
roll_n_die(3, 6)
```
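Because `num_sides` has a default value, `roll_n_die()` can be called in more than one way. The following sketch repeats the definition so it runs on its own and shows a call that relies on the default next to a call with named arguments. The name `d20_rolls` is introduced here purely for illustration.

```{r roll-n-die-usage}
roll_n_die = function(num_rolls, num_sides = 6) {
    rolls = sample(num_sides, size = num_rolls,
                   replace = TRUE)
    return(rolls)
}

# Rely on the default of six sides
roll_n_die(3)

# Name the arguments, in any order, to roll a 20-sided die five times
d20_rolls = roll_n_die(num_sides = 20, num_rolls = 5)
d20_rolls
```

Naming the arguments makes the call readable even when the order differs from the function definition.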

## Exercise: Transforming a Workflow

Clean up the following code by implementing a function that:

1. Generates data from a normal distribution
2. Applies the mean normalization 

```{r make-me-a-func}
set.seed(325)

x = rnorm(10)
y = rnorm(10)

x_nmu = (x - mean(x)) / (max(x) - min(x))
x_nmu
y_nmu = (y - mean(y)) / (max(y) - min(y))
y_nmu
```


Let's take a little look:


```{r}
set.seed(325)

x = rnorm(10)
y = rnorm(10)
z = rnorm(10)

x_nmu = (x - mean(x)) / (max(x) - min(x))
x_nmu
y_nmu = (y - mean(y)) / (max(y) - min(y))
y_nmu

# Each line must use its own vector; copy-pasting makes it easy to leave mean(y) here
z_nmu = (z - mean(z)) / (max(z) - min(z))
z_nmu
```

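```{r}
set.seed(325)

mean_normalization = function(n) {
  # Generate n draws from a standard normal distribution
  x = rnorm(n)
  # Center by the mean and scale by the range
  x_nmu = (x - mean(x)) / (max(x) - min(x))
  # Return explicitly: ending on an assignment returns the value invisibly,
  # so nothing would print when the function is called
  return(x_nmu)
}

mean_normalization(10)
```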
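A possible variant separates the two steps, so that the normalization can be applied to any existing vector. This is only a sketch: `normalize_mean()` is a hypothetical helper name, not something defined elsewhere in these notes.

```{r normalize-vector-sketch}
# Hypothetical helper: normalize a vector that already exists
normalize_mean = function(v) {
  (v - mean(v)) / (max(v) - min(v))
}

set.seed(325)
x = rnorm(10)
x_nmu = normalize_mean(x)

# Mean-normalized values are centered at 0 and span a range of 1
# (both up to floating-point error)
mean(x_nmu)
diff(range(x_nmu))
```

Keeping generation and normalization apart means the same helper works for `x`, `y`, and `z` without copying the formula.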


# Classes and Objects

## Example: Vector Types

```{r view-vectors}
# Vector of numeric elements
w = c(9.5, -3.14, 88.9999, 12.0)
     # ^     ^      ^        ^  decimals

# Vector of integer elements
x = c(1L, 2L, 3L, 4L)

# Vector of logical elements
y = c(TRUE, FALSE, FALSE, TRUE)

# Vector of character elements
z = c("a", "b", "c", "d")
```
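To confirm what each vector holds, `class()` can be applied to all four. The sketch below redefines the vectors so it is self-contained. Collecting the results in `vec_classes` is an illustrative choice.

```{r check-vector-classes}
w = c(9.5, -3.14, 88.9999, 12.0)
x = c(1L, 2L, 3L, 4L)
y = c(TRUE, FALSE, FALSE, TRUE)
z = c("a", "b", "c", "d")

# class() reports the kind of element each vector holds
vec_classes = c(w = class(w), x = class(x), y = class(y), z = class(z))
vec_classes
```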

## Example: Creating a Data Frame by Hand

```{r viewing-heights}
subject_heights = data.frame(
  id     = c(1, 2, 3, 55),
  sex    = c("M", "F", "F", "M"),
  height = c(6.1, 5.5, 5.2, 5.9)
)
```

## Example: Determine Class and Structure


```{r looking-into-data}
class(subject_heights)
str(subject_heights)
```

## Exercise: Running `str()` and `class()` on `id`

```{r}
id = c(1, 2, 3, 55)

class(id)

str(id)

```

```{r}
id_int = c(1L, 2L, 3L, 55L)

class(id_int)
str(id_int)

```
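The two versions of `id` print the same numbers but are stored differently. A short sketch of how that difference shows up, using only base _R_ functions:

```{r compare-id-types}
id = c(1, 2, 3, 55)
id_int = c(1L, 2L, 3L, 55L)

# The stored types differ
typeof(id)       # "double"
typeof(id_int)   # "integer"

# The values compare equal element by element
all(id == id_int)

# identical() also checks the type, so the two objects are not identical
identical(id, id_int)
```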


# Vectorization 

## Example: Vectorization and Elements

Simultaneously calculating multiple points.

```{r vectorized-addition}
x = c(1, 2, 3, 4)
y = c(5, 6, 7, 8)
z = x + y
z
```
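For comparison, the same sum can be written one element at a time. The loop below is a sketch meant only to show what the vectorized `x + y` does in a single step. `z_loop` is a name introduced for this illustration.

```{r vectorized-vs-loop}
x = c(1, 2, 3, 4)
y = c(5, 6, 7, 8)

# Element-by-element version of x + y
z_loop = numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] = x[i] + y[i]
}

# Both approaches give the same result
identical(z_loop, x + y)
```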

## Example: Vectorized Binary Operators

_R_ has multiple vectorized **binary** operators built in to speed up calculations.

```{r example-of-ops}
x = c(1, 2, 3, 4)
y = c(5, 6, 7, 8)
x + y             # Addition
x - y             # Subtraction
x * y             # Multiplication       
x / y             # Division
x ^ y             # Exponentiation
x %/% y           # Integer Division
x %% y            # Modulus
```


### Aside: Modulus

The _modulus_ operator computes the remainder of a division. For a dividend $a$ and a positive divisor $q$, write $a = nq + r$ with $0 \le r < q$; then

$$a \bmod q = r$$

```{r mod-ex}
12 %% 7                 # a = n*q + r => 12 = 1*7 + 5

outer(9:1, 2:9, `%%`)   # Remainder table: rows are a in 9:1, columns are q in 2:9
```
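The relation `a = n*q + r` from the comment above can be checked directly by pairing `%%` with integer division, `%/%`. This is a small sketch. The negative-dividend lines are included because _R_ gives the remainder the sign of the divisor, which may differ from other languages.

```{r mod-identity}
a = 12
q = 7

n = a %/% q   # integer quotient: 1
r = a %% q    # remainder: 5

# The pieces rebuild the original value
n * q + r == a

# With a negative dividend, the remainder takes the sign of the divisor
(-5) %/% 3    # -2
(-5) %% 3     # 1
```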


## Example: Recycling

Handling length "mis-matches"...

```{r recycle-process}
a = c(1, 2, 3, 4)
length(a)

b = c(5, 6, 7)
length(b)

a + b
```
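When the lengths differ, _R_ repeats the shorter vector until it matches the longer one. Here 4 is not a multiple of 3, so a warning is emitted as well. The sketch below performs the recycling by hand with `rep()`. `b_recycled` is a name introduced only for this illustration.

```{r recycle-by-hand}
a = c(1, 2, 3, 4)
b = c(5, 6, 7)

# Repeat b until it is as long as a: 5, 6, 7, 5
b_recycled = rep(b, length.out = length(a))
b_recycled

# Same values as a + b, but without the length warning
a + b_recycled
identical(a + b_recycled, suppressWarnings(a + b))
```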

## Example: Recycling - Round 2

What happens if the length of the longer vector is an exact multiple of the length of the shorter vector?

```{r expansion-shorter}
c(1, 2, 3, 4) + c(-1, 1) 
```

## Exercise: Determining Scalars

Explain what happens when we add a single value to a vector.

```{r whats-a-scalar}
a = 2 
x = c(1, 2, 3, 4) 
x + a
```

### Exercise: Recycle a value for a Confidence Interval

```{r}
p_hat = 0.6

n = 110

z_crit = qnorm(0.975)


p_hat + c(-1, 1) * z_crit * sqrt(p_hat * (1-p_hat) / n)

```
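The interval above follows the usual normal-approximation form $\hat{p} \pm z \sqrt{\hat{p}(1 - \hat{p}) / n}$. As a sketch, the same steps can be wrapped in a function. `prop_ci()` and its `level` argument are hypothetical names introduced here, not part of the exercise.

```{r prop-ci-sketch}
# Hypothetical helper; the name and arguments are illustrative
prop_ci = function(p_hat, n, level = 0.95) {
  # Two-sided critical value, e.g. qnorm(0.975) for level = 0.95
  z_crit = qnorm(1 - (1 - level) / 2)
  std_err = sqrt(p_hat * (1 - p_hat) / n)
  # c(-1, 1) is recycled against the single margin value: lower, then upper
  p_hat + c(-1, 1) * z_crit * std_err
}

ci = prop_ci(0.6, 110)
ci
```

The `c(-1, 1)` term is what turns one margin of error into both endpoints in a single expression.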


## Example: Everything is a Vector

```{r etia}
a = 2
length(a)

a_vec = c(2)
length(a_vec)
```

```{r eq-check}
identical(a, a_vec)
```


# Subsets

Selecting a smaller amount of data.

## Example: Positional Indexes

```{r ex-vector}
ex_vec = c(5, 3, -2, 42)
```

## Example: Retrieving a Single Value

```{r retrieve-first}
ex_vec = c(5, 3, -2, 42)

# Retrieve first element
ex_vec[1]

# Retrieve fourth element
ex_vec[4]

# Retrieve the nth (last) element
last_pos = length(ex_vec)
ex_vec[last_pos]
```

## Example: Retrieve Multiple Values

```{r retrieve-seq}
ex_vec = c(5, 3, -2, 42)

ex_vec[c(2, 3)]

ex_vec[2:3]
```

## Example: Retrieve Multiple Values by Removing Indices

```{r neg-seq}
ex_vec = c(5, 3, -2, 42)

ex_vec[-c(1, 4)]
```

## Example: Named Access Retrieval


```{r named-access}
# Create example vector
ex_vec = c(5, 3, -2, 42)

# Set the element names
names(ex_vec) = c(
   "a", "b", "c", "d"
)

# Select element "b"
ex_vec["b"]

# Retrieve the element names
names(ex_vec)
```


## Example: Generating Indices

There are _many_ ways to create the positional indices for each 
vector.

```{r sample-index-creation}
# Construct an example
# vector
ex_vec = c(5, 3, -2, 42)

# Create indices
1:length(ex_vec)

seq(1, length(ex_vec))

seq_len(length(ex_vec))

seq_along(ex_vec)
```


## Exercise: Positional Index Methods

Using each of the sequence methods above, create index sequences for the following vectors. Do all approaches give the same result?

```{r}
int_vec = c(8L, -2L, 5L, 0L)
empty_vec = numeric(0)
```


```{r}

# Filled vectors

1:length(int_vec)

```

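```{r}
# An empty vector
empty_vec

# Yields 1 0 rather than an empty sequence
1:length(empty_vec)

length(empty_vec)

# Index 0 selects nothing
empty_vec[0]

# Index 1 is out of bounds, so the result is NA
empty_vec[1]
```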


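```{r}
# 1:length() counts down from 1 to 0 when the vector is empty
1:length(empty_vec)

length(empty_vec)

1:0
c(1, 0)

# seq_len() and seq_along() return an empty sequence instead
seq_len(length(empty_vec))

seq_along(empty_vec)
```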
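The difference matters most inside loops. The sketch below counts how many times a loop body runs over an empty vector with each approach. The counter names are introduced only for this illustration.

```{r empty-loop-pitfall}
empty_vec = numeric(0)

# 1:length() expands to c(1, 0), so the body runs for i = 1 and i = 0
count_colon = 0
for (i in 1:length(empty_vec)) {
  count_colon = count_colon + 1
}

# seq_along() is empty, so the body never runs
count_along = 0
for (i in seq_along(empty_vec)) {
  count_along = count_along + 1
}

count_colon   # 2
count_along   # 0
```

For this reason `seq_along()` or `seq_len()` is generally the safer choice when the vector might be empty.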