library(tidyverse)
library(tidymodels)
library(knitr)
library(openintro)
AE 09: Feature engineering with recipes
Peer-to-peer lender
Go to the course GitHub organization and locate your ae-09 repo to get started.
Render, commit, and push your responses to GitHub by the end of class. The responses are due in your GitHub repo no later than Thursday, October 12 at 11:59pm.
Packages + data
The data for this AE is from the loan50 data set in the openintro R package. We will focus on the following variables:
Predictors
- annual_income: Annual income (in US dollars)
- debt_to_income: Debt-to-income ratio, i.e., the percentage of a borrower’s total debt divided by their total income
- verified_income: Whether the borrower’s income source and amount have been verified (Not Verified, Source Verified, Verified)
Response
- interest_rate: Interest rate for the loan (0-100)
Analysis goal
The goals of this analysis are to (1) build a recipe for fitting a linear regression model on the training data that has the following features:
- annual_income rescaled to thousands of dollars
- Mean-centered quantitative variables
- Indicator (dummy) variables for the categorical predictor
- Interaction term between rescaled annual_income and verified_income
and (2) use prep() and bake() to check the recipe.
Test/train split
Fill in the code to split the data into 90% training, 10% testing.
Remove #| eval: false from the code chunk.
set.seed(123)
loan_split <- initial_split(loan50, prop = _____)
loan_train <- training(_____)
loan_test <- _____(loan_split)
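For reference, the same split pattern can be sketched on a different data set. This is only an illustration, assuming the built-in mtcars data and an arbitrary 75/25 proportion; the object names (cars_split, cars_train, cars_test) are hypothetical and not part of the AE.

```r
library(rsample)  # loaded automatically by tidymodels

# Toy illustration on the built-in mtcars data (NOT the AE data)
set.seed(123)

# initial_split() records which rows go to training vs. testing
cars_split <- initial_split(mtcars, prop = 0.75)

# training() and testing() extract the two pieces from the split object
cars_train <- training(cars_split)
cars_test  <- testing(cars_split)

# Row counts of the two pieces add up to the original data
nrow(cars_train)
nrow(cars_test)
```

Setting the seed before the split makes the random assignment of rows reproducible across renders.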
Build a recipe
- Use step_mutate() to create a new variable annual_income_th that is annual_income rescaled to thousands of dollars
- Use step_center() to mean-center quantitative variables
- Use step_dummy() to create indicator variables for the categorical predictor
- Use step_interact() to create an interaction between annual_income_th and verified_income
Remove #| eval: false from the code chunk.
# use original variables when specifying recipe
loan_rec <- recipe(interest_rate ~ annual_income + debt_to_income + verified_income,
                   data = loan_train) |>
  # add recipe steps

loan_rec
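The four steps listed above can be sketched on made-up data to show how they chain together. This is a minimal illustration, not the AE solution: the toy data frame and its columns (y, income, group) are invented, and the step_rm() call that drops the original income column is an assumption about how you might avoid keeping both versions of the variable.

```r
library(recipes)  # loaded automatically by tidymodels

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  y      = c(10, 12, 9, 15, 11, 14),
  income = c(40000, 55000, 38000, 90000, 62000, 75000),
  group  = factor(c("a", "b", "a", "c", "b", "c"))
)

toy_rec <- recipe(y ~ income + group, data = toy) |>
  # rescale income to thousands in a new column
  step_mutate(income_th = income / 1000) |>
  # assumption: drop the original column so only the rescaled one remains
  step_rm(income) |>
  # mean-center the numeric predictors
  step_center(all_numeric_predictors()) |>
  # indicator columns for the categorical predictor (group_b, group_c)
  step_dummy(all_nominal_predictors()) |>
  # interaction between rescaled income and each indicator column
  step_interact(terms = ~ income_th:starts_with("group_"))

# Estimate the recipe on the toy data and look at the resulting columns
toy_baked <- bake(prep(toy_rec), new_data = NULL)
names(toy_baked)
```

The order matters here: the dummy step comes before the interaction step so that step_interact() can refer to the indicator columns by their generated names.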
Check recipe using prep() and bake()
Remove #| eval: false from the code chunk.
# determine required parameters to be estimated
loan_rec_trained <- prep(loan_rec)

# apply recipe computations to data
bake(loan_rec_trained, loan_train) |>
  glimpse()
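To see what prep() and bake() each contribute, here is a small sketch on invented data: prep() estimates the quantities a recipe needs (here, a mean) from the training data, and bake() applies those stored estimates to whatever data it is given. The data frames train and new and the object names are hypothetical.

```r
library(recipes)  # loaded automatically by tidymodels

# Hypothetical toy data, invented for illustration only
train <- data.frame(y = c(1, 2, 3, 4), x = c(10, 20, 30, 40))
new   <- data.frame(y = c(5, 6),       x = c(50, 60))

rec <- recipe(y ~ x, data = train) |>
  step_center(all_numeric_predictors())

# prep() estimates mean(x) = 25 from the training data
rec_trained <- prep(rec)

# new_data = NULL returns the processed training data: x = -15, -5, 5, 15
bake(rec_trained, new_data = NULL)

# new data is centered with the TRAINING mean: x = 25, 35
bake(rec_trained, new_data = new)
```

Note that the new data are centered using the mean from the training data, not their own mean, which is the behavior you want when applying a recipe to a test set.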
To submit the AE
- Render the document to produce the PDF with all of your work from today’s class.
- Push all your work to your ae-09 repo on GitHub. (You do not submit AEs on Gradescope.)