AE 17: Exam 02 Review

Published

November 29, 2023

Important

Go to the course GitHub organization and locate your ae-17 repo to get started.

Render, commit, and push your responses to GitHub by the end of class. The responses are due in your GitHub repo no later than Saturday, December 2 at 11:59pm.

Note: This in-class review is not exhaustive. Use lecture notes, application exercises, labs, and homework for a comprehensive exam review.

Packages

library(tidyverse)
library(tidymodels)
library(knitr)

Part 1: Logistic regression

Data

The data for this analysis is about credit card customers. It can be found in the file credit.csv. The following variables are in the data set:

income: Income in $1,000’s
limit: Credit limit
rating: Credit rating
cards: Number of credit cards
age: Age in years
education: Number of years of education
own: A factor with levels No and Yes indicating whether the individual owns their home
student: A factor with levels No and Yes indicating whether the individual was a student
married: A factor with levels No and Yes indicating whether the individual was married
region: A factor with levels South, East, and West indicating the region of the US the individual is from
balance: Average credit card balance in $.

The objective of this analysis is to predict whether a person has maxed out their credit card, i.e., had $0 average card balance.

credit <- read_csv("data/credit.csv") |>
  mutate(maxed = factor(if_else(balance == 0, 1, 0)))

Exercise 1

Why is logistic regression the best modeling approach for this analysis?

Fill in the blanks. In logistic regression, we
- use log-odds to …
- use odds to …
- use probabilities to …
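
As a reminder of how the three scales relate, here is a minimal base R sketch. The log-odds values are made up for illustration and do not come from credit.csv; the sketch only shows the conversions, not what each scale is used for.

```r
# Hypothetical predicted log-odds for three customers (illustration only)
log_odds <- c(-2, 0, 1.5)

# Odds are the exponentiated log-odds
odds <- exp(log_odds)

# Probabilities follow from the odds: p = odds / (1 + odds)
prob <- odds / (1 + odds)

# plogis() is base R's inverse logit and should give the same probabilities
prob_check <- plogis(log_odds)

# qlogis() is the logit; it maps probabilities back to log-odds
log_odds_back <- qlogis(prob)
```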

Exercise 2

Split the data into 80% training and 20% test sets. Use seed 210.

set.seed(210)
credit_split <- initial_split(credit, prop = 0.8)
credit_train <- training(credit_split)
credit_test <- testing(credit_split)

Specify the logistic regression model. Call it credit_spec.

# add code here
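
One possible specification is sketched below, assuming the model is fit with base R's glm() engine (the exercise does not name an engine, so that choice is an assumption).

```r
library(tidymodels)

# Logistic regression specification; the "glm" engine is an assumed choice
credit_spec <- logistic_reg() |>
  set_engine("glm")
```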

Exercise 3

Create a recipe called credit_rec that does the following:

Predict maxed using income, age, and student.
Mean center the quantitative predictors.
Create dummy variables where needed and drop any zero variance variables.

credit_rec <- recipe(maxed ~ income + age + student, 
                     data = credit_train)

# add recipe steps
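
The sketch below shows one way the three steps could be chained. So that it runs on its own, it uses a small simulated data frame, toy_train, as a hypothetical stand-in for credit_train; the names toy_train, toy_rec, and toy_baked are not part of the exercise.

```r
library(tidymodels)

# Simulated stand-in for credit_train (values are made up)
set.seed(210)
toy_train <- tibble(
  maxed   = factor(rep(c(0, 1), times = 20)),
  income  = runif(40, 10, 150),
  age     = sample(20:80, 40, replace = TRUE),
  student = factor(rep(c("No", "Yes", "No", "No"), times = 10))
)

toy_rec <- recipe(maxed ~ income + age + student, data = toy_train) |>
  step_center(all_numeric_predictors()) |>  # mean center quantitative predictors
  step_dummy(all_nominal_predictors()) |>   # dummy variables for factor predictors
  step_zv(all_predictors())                 # drop zero-variance predictors

# Apply the recipe to the training data to inspect the processed columns
toy_baked <- toy_rec |>
  prep() |>
  bake(new_data = NULL)
```

After baking, the quantitative predictors should have mean zero and student should be replaced by an indicator column.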

Exercise 4

Create the workflow that brings together the model specification and recipe. Call it credit_wflow.

# add code here

Exercise 5

Conduct 5-fold cross validation. Use seed 210 to create the folds.

# add code here

Then, summarize the metrics from your CV resamples.

# add code here
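
A minimal sketch of the cross-validation pattern follows. It assembles a workflow, creates 5 folds, fits the resamples, and summarizes the metrics. The data frame toy_train is simulated so the code is self-contained; it and the other toy_ names are hypothetical stand-ins for the objects built in the earlier exercises.

```r
library(tidymodels)

# Simulated stand-in for credit_train (values are made up)
set.seed(210)
toy_train <- tibble(
  income  = runif(100, 10, 150),
  age     = sample(20:80, 100, replace = TRUE),
  student = factor(sample(c("No", "Yes"), 100, replace = TRUE)),
  maxed   = factor(rbinom(100, 1, plogis(-0.02 * (income - 80))))
)

toy_rec <- recipe(maxed ~ income + age + student, data = toy_train) |>
  step_center(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())

# Workflow = recipe + model specification
toy_wflow <- workflow() |>
  add_recipe(toy_rec) |>
  add_model(logistic_reg() |> set_engine("glm"))

# Create the 5 folds from the training data
set.seed(210)
toy_folds <- vfold_cv(toy_train, v = 5)

# Fit the workflow on each resample, then average the metrics across folds
toy_cv <- fit_resamples(toy_wflow, resamples = toy_folds)
toy_metrics <- collect_metrics(toy_cv)
toy_metrics
```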

Exercise 6

Create a workflow for another model with a new recipe that includes the variable region along with all the variables and recipe steps from Exercise 3. Conduct cross validation, then select a model between the two.

# add code here

Exercise 7

Fit the model you selected in the previous exercise to the entire training set.

# add code here

Exercise 8

Now let’s evaluate the performance of the selected model using the testing data.

Create a confusion matrix using a cutoff probability of 0.5.

# add code here

What is the sensitivity? What does it mean in the context of the data?
What is the specificity? What does it mean in the context of the data?
What is the false positive rate? What does it mean in the context of the data?
What is the false negative rate? What does it mean in the context of the data?
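
The four rates can be computed directly from the cells of a confusion matrix. The counts below are hypothetical and are not results from credit.csv; "positive" is assumed to mean maxed = 1.

```r
# Hypothetical confusion-matrix counts (illustration only)
tp <- 15  # predicted maxed, actually maxed
fn <- 5   # predicted not maxed, actually maxed
fp <- 4   # predicted maxed, actually not maxed
tn <- 56  # predicted not maxed, actually not maxed

sensitivity         <- tp / (tp + fn)  # true positive rate
specificity         <- tn / (tn + fp)  # true negative rate
false_positive_rate <- fp / (fp + tn)  # equals 1 - specificity
false_negative_rate <- fn / (fn + tp)  # equals 1 - sensitivity
```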

Exercise 9

Produce the ROC curve.

# add code here

Describe how you can use this curve to select a cutoff probability (rather than just going with 0.5).
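
As one illustration, the sketch below builds an ROC curve from hypothetical predictions and picks the threshold that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is only one possible criterion and is not prescribed by this exercise; the data frame toy_pred is made up. Because maxed is coded with levels 0 and 1, the event of interest is assumed to be the second level.

```r
library(tidymodels)

# Hypothetical test-set predictions (illustration only)
toy_pred <- tibble(
  maxed   = factor(c(1, 1, 1, 0, 0, 1, 0, 0), levels = c(0, 1)),
  .pred_1 = c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
)

# Sensitivity and specificity at each candidate threshold
toy_roc <- roc_curve(toy_pred, truth = maxed, .pred_1,
                     event_level = "second")

# Youden's J for each threshold; keep the row(s) with the largest value
toy_best <- toy_roc |>
  mutate(j = sensitivity + specificity - 1) |>
  slice_max(j, n = 1)

# autoplot(toy_roc) draws the curve itself
```

In practice the choice of cutoff also depends on the relative cost of false positives and false negatives, so a criterion that weights the two errors equally may not be appropriate.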

Exercise 10

Questions about checking conditions for logistic regression:

Do we assess conditions on the training or testing set?
Why do we not consider categorical predictors when checking linearity?
Why do we not need to check constant variance for logistic regression?

Submission

Important

To submit the AE:

Render the document to produce the PDF with all of your work from today’s class.
Push all your work to your ae-17 repo on GitHub. (You do not submit AEs on Gradescope.)