Feature engineering

Prof. Maria Tackett

Oct 09, 2023

min	median	max	iqr
5.31	9.93	26.3	5.755

term	estimate	std.error	statistic	p.value
(Intercept)	9.484	0.989	9.586	0.000
debt_inc_cent	0.691	0.685	1.009	0.319
verified_incomeSource Verified	2.157	1.418	1.522	0.135
verified_incomeVerified	7.181	1.870	3.840	0.000
annual_income_th_cent	-0.007	0.020	-0.341	0.735
verified_incomeSource Verified:annual_income_th_cent	-0.016	0.026	-0.643	0.523
verified_incomeVerified:annual_income_th_cent	-0.032	0.033	-0.979	0.333

Understanding the model

$\begin{aligned} \widehat{\text{interest\_rate}} = {} & 9.484 + 0.691 \times \text{debt\_inc\_cent} \\ & - 0.007 \times \text{annual\_income\_th\_cent} \\ & + 2.157 \times \text{SourceVerified} + 7.181 \times \text{Verified} \\ & - 0.016 \times \text{annual\_income\_th\_cent} \times \text{SourceVerified} \\ & - 0.032 \times \text{annual\_income\_th\_cent} \times \text{Verified} \end{aligned}$
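
To see how the indicator and interaction terms combine, the fitted equation can be evaluated directly. The sketch below is an illustration only, written in Python rather than the course's R workflow. It hard-codes the rounded estimates from the coefficient table, and the function name `predict_interest_rate` and the dictionary keys are hypothetical names chosen for this example.

```python
# Rounded estimates copied from the coefficient table (3 decimal places).
# The dictionary keys are illustrative labels, not names from any library.
COEF = {
    "intercept": 9.484,
    "debt_inc_cent": 0.691,
    "annual_income_th_cent": -0.007,
    "source_verified": 2.157,
    "verified": 7.181,
    "source_verified:annual_income_th_cent": -0.016,
    "verified:annual_income_th_cent": -0.032,
}

LEVELS = ("Not Verified", "Source Verified", "Verified")


def predict_interest_rate(debt_inc_cent, annual_income_th_cent, verified_income):
    """Evaluate the fitted equation for one applicant.

    Both numeric inputs are assumed to be mean-centered already;
    "Not Verified" is the baseline level of verified_income.
    """
    if verified_income not in LEVELS:
        raise ValueError(f"verified_income must be one of {LEVELS}")

    # Indicator (dummy) variables for the two non-baseline levels.
    source_verified = 1.0 if verified_income == "Source Verified" else 0.0
    verified = 1.0 if verified_income == "Verified" else 0.0

    return (
        COEF["intercept"]
        + COEF["debt_inc_cent"] * debt_inc_cent
        + COEF["annual_income_th_cent"] * annual_income_th_cent
        + COEF["source_verified"] * source_verified
        + COEF["verified"] * verified
        # Interaction terms: the income slope shifts by verification level.
        + COEF["source_verified:annual_income_th_cent"]
        * annual_income_th_cent * source_verified
        + COEF["verified:annual_income_th_cent"]
        * annual_income_th_cent * verified
    )


if __name__ == "__main__":
    # Applicants at the mean debt-to-income ratio and mean income (both centered at 0).
    for level in LEVELS:
        print(level, round(predict_interest_rate(0.0, 0.0, level), 3))
```

Because the numeric predictors are centered, setting both to 0 isolates the intercept plus the shift for each verification level. Varying `annual_income_th_cent` while holding the level fixed shows the level-specific income slope.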

What is $p$, the number of predictor terms in the model?
Write the equation of the model to predict interest rate for applicants with Not Verified income.
Write the equation of the model to predict interest rate for applicants with Verified income.
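
One way to work through these questions, using the rounded estimates from the coefficient table (a sketch, not an official solution): the model has six terms besides the intercept, so $p = 6$. For Not Verified applicants both indicators equal 0, so every term involving SourceVerified or Verified drops out. For Verified applicants, Verified $= 1$ and SourceVerified $= 0$, so the Verified main effect adds to the intercept and the Verified interaction adds to the income slope.

$\begin{aligned} \text{Not Verified:} \quad \widehat{\text{interest\_rate}} = {} & 9.484 + 0.691 \times \text{debt\_inc\_cent} - 0.007 \times \text{annual\_income\_th\_cent} \\ \text{Verified:} \quad \widehat{\text{interest\_rate}} = {} & (9.484 + 7.181) + 0.691 \times \text{debt\_inc\_cent} + (-0.007 - 0.032) \times \text{annual\_income\_th\_cent} \\ = {} & 16.665 + 0.691 \times \text{debt\_inc\_cent} - 0.039 \times \text{annual\_income\_th\_cent} \end{aligned}$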

