Nov 06, 2023
Lab this week: Work on project

Project draft due in your GitHub repo at 9am on:

- November 14 (Tuesday labs)
- November 16 (Thursday labs)

We will do peer review in lab those days.

Team Feedback #1 due Friday, November 10 at 11:59pm.
Let’s take a look at one of the models from Lab 06 using flipper length and species to predict the odds a penguin is large (has a body mass above average).
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -39.151 | 7.149 | -5.477 | 0.000 |
| speciesChinstrap | -1.870 | 0.580 | -3.221 | 0.001 |
| speciesGentoo | 0.512 | 0.843 | 0.607 | 0.544 |
| flipper_length_mm | 0.195 | 0.037 | 5.295 | 0.000 |
Interpret the coefficient of `flipper_length_mm` in terms of the odds a penguin is large.

Interpret the coefficient of `speciesChinstrap` in terms of the odds a penguin is large.
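As a hint for checking your interpretations: exponentiating a logistic regression coefficient gives the multiplicative change in the odds. A quick sketch using the estimates above:

# Holding species constant, each additional mm of flipper length
# multiplies the odds a penguin is large by exp(0.195)
exp(0.195)   # approximately 1.22, i.e., about a 22% increase in odds

# Holding flipper length constant, Chinstrap penguins' odds are
# exp(-1.870) times those of Adelie penguins (the baseline level)
exp(-1.870)  # approximately 0.15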
`openintro::email`
These data represent incoming emails for the first three months of 2012 for an email account.
- `spam`: Indicator for whether the email was spam.
- Other variables: `to_multiple`, `from`, `cc`, `sent_email`, `time`, `image`, `attach`, `dollar`, `winner`, `inherit`, `viagra`, `password`, `num_char`, `line_breaks`, `format`, `re_subj`, `exclaim_subj`, `urgent_subj`, `exclaim_mess`, `number`.

See the openintro package documentation for more detailed information on these variables.
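A minimal sketch for loading and inspecting the data, assuming the openintro and tidyverse packages are installed:

library(tidyverse)
library(openintro)

glimpse(email)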
library(tidymodels)

# Fix random numbers by setting the seed
# Enables analysis to be reproducible when random numbers are used
set.seed(1109)

# Put 75% of the data into the training set
email_split <- initial_split(email, prop = 0.75)

# Create data frames for the two sets
email_train <- training(email_split)
email_test <- testing(email_split)
The sample is unbalanced with respect to `spam`.
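One way to see the imbalance, as a quick sketch:

# Proportion of spam (1) vs. not spam (0) in the training set
email_train |>
  count(spam) |>
  mutate(prop = n / sum(n))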
1. Create a recipe for feature engineering steps to be applied to the training data (see the recipe sketch below)
2. Fit the model to the training data after these steps have been applied
3. Using the model estimates from the training data, predict outcomes for the test data
4. Evaluate the performance of the model on the test data
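A sketch of a recipe consistent with the six preprocessing steps in the workflow printout further below; the exact arguments (which variables are removed, where the numeric predictors are cut) are assumptions inferred from that printout and the coefficient names:

email_rec <- recipe(spam ~ ., data = email_train) |>
  # drop predictors not used in the model (assumed from the coefficient table)
  step_rm(from, sent_email) |>
  # extract day of week and month from the timestamp, then drop the raw timestamp
  step_date(time, features = c("dow", "month")) |>
  step_rm(time) |>
  # bin selected numeric predictors at 0 and 1
  step_cut(cc, attach, dollar, breaks = c(0, 1)) |>
  # dummy code all nominal predictors
  step_dummy(all_nominal_predictors()) |>
  # drop zero-variance variables (those with only a single value)
  step_zv(all_predictors())

# List the variables and roles the recipe sees, producing the summary below
summary(email_rec)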
# A tibble: 21 × 4
variable type role source
<chr> <list> <chr> <chr>
1 to_multiple <chr [3]> predictor original
2 from <chr [3]> predictor original
3 cc <chr [2]> predictor original
4 sent_email <chr [3]> predictor original
5 time <chr [1]> predictor original
6 image <chr [2]> predictor original
7 attach <chr [2]> predictor original
8 dollar <chr [2]> predictor original
9 winner <chr [3]> predictor original
10 inherit <chr [2]> predictor original
11 viagra <chr [2]> predictor original
12 password <chr [2]> predictor original
13 num_char <chr [2]> predictor original
14 line_breaks <chr [2]> predictor original
15 format <chr [3]> predictor original
16 re_subj <chr [3]> predictor original
17 exclaim_subj <chr [2]> predictor original
18 urgent_subj <chr [3]> predictor original
19 exclaim_mess <chr [2]> predictor original
20 number <chr [3]> predictor original
21 spam <chr [3]> outcome original
`step_zv()` removes zero-variance variables, i.e., variables that contain only a single value.
Remember: Workflows bring together models and recipes so that they can be easily applied to both the training and test data.
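A sketch of how this workflow might be assembled, assuming the recipe above is named email_rec:

email_wflow <- workflow() |>
  add_model(logistic_reg()) |>
  add_recipe(email_rec)

email_wflow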
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: logistic_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
6 Recipe Steps
• step_rm()
• step_date()
• step_rm()
• step_cut()
• step_dummy()
• step_zv()
── Model ───────────────────────────────────────────────────────────────────────
Logistic Regression Model Specification (classification)
Computational engine: glm
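Fitting the workflow to the training data and tidying the coefficient estimates might look like this (the object names are assumptions):

email_fit <- email_wflow |>
  fit(data = email_train)

tidy(email_fit)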
# A tibble: 27 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -1.24 0.274 -4.51 6.43e- 6
2 image -1.36 0.679 -2.00 4.59e- 2
3 inherit 0.352 0.185 1.90 5.69e- 2
4 viagra 1.96 40.6 0.0482 9.62e- 1
5 password -0.941 0.387 -2.43 1.51e- 2
6 num_char 0.0572 0.0257 2.23 2.58e- 2
7 line_breaks -0.00554 0.00147 -3.77 1.66e- 4
8 exclaim_subj -0.245 0.303 -0.807 4.19e- 1
9 exclaim_mess 0.00916 0.00195 4.69 2.67e- 6
10 to_multiple_X1 -2.91 0.388 -7.50 6.37e-14
11 cc_X.1.68. -0.105 0.446 -0.236 8.14e- 1
12 attach_X.1.21. 2.33 0.385 6.06 1.37e- 9
13 dollar_X.1.64. 0.0136 0.241 0.0565 9.55e- 1
14 winner_yes 2.46 0.480 5.12 3.02e- 7
15 format_X1 -1.02 0.173 -5.88 4.07e- 9
16 re_subj_X1 -2.93 0.436 -6.72 1.81e-11
17 urgent_subj_X1 4.37 1.25 3.51 4.54e- 4
18 number_small -0.728 0.178 -4.08 4.45e- 5
19 number_big 0.261 0.255 1.03 3.05e- 1
20 time_dow_Mon 0.123 0.320 0.386 7.00e- 1
21 time_dow_Tue 0.309 0.294 1.05 2.94e- 1
22 time_dow_Wed -0.133 0.297 -0.447 6.55e- 1
23 time_dow_Thu 0.104 0.303 0.343 7.32e- 1
24 time_dow_Fri 0.280 0.292 0.960 3.37e- 1
25 time_dow_Sat 0.439 0.323 1.36 1.74e- 1
26 time_month_Feb 1.06 0.192 5.54 3.06e- 8
27 time_month_Mar 0.575 0.198 2.91 3.60e- 3
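Predicted class probabilities for the test data can then be generated and bound back onto the test set, as a sketch:

email_pred <- predict(email_fit, email_test, type = "prob") |>
  bind_cols(email_test)

email_pred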
# A tibble: 981 × 23
.pred_0 .pred_1 spam to_multiple from cc sent_email time
<dbl> <dbl> <fct> <fct> <fct> <int> <fct> <dttm>
1 0.921 0.0786 0 0 1 0 0 2012-01-01 01:16:41
2 0.961 0.0391 0 0 1 0 0 2012-01-01 05:00:01
3 0.999 0.000988 0 0 1 1 1 2012-01-01 14:38:32
4 0.999 0.000591 0 0 1 1 1 2012-01-01 18:40:14
5 0.991 0.00878 0 0 1 0 0 2012-01-02 00:42:16
6 0.910 0.0902 0 0 1 0 0 2012-01-01 21:05:45
7 1.00 0.000108 0 1 1 3 0 2012-01-02 08:41:11
8 0.975 0.0248 0 0 1 0 0 2012-01-02 20:07:17
9 0.952 0.0477 0 0 1 0 0 2012-01-02 23:31:03
10 0.992 0.00819 0 1 1 0 0 2012-01-03 08:36:16
# ℹ 971 more rows
# ℹ 15 more variables: image <dbl>, attach <dbl>, dollar <dbl>, winner <fct>,
# inherit <dbl>, viagra <dbl>, password <dbl>, num_char <dbl>,
# line_breaks <int>, format <fct>, re_subj <fct>, exclaim_subj <dbl>,
# urgent_subj <fct>, exclaim_mess <dbl>, number <fct>
Which of the following 10 emails will be misclassified?
# A tibble: 10 × 3
.pred_0 .pred_1 spam
<dbl> <dbl> <fct>
1 0.0750 0.925 0
2 0.110 0.890 0
3 0.116 0.884 1
4 0.127 0.873 1
5 0.170 0.830 1
6 0.189 0.811 1
7 0.204 0.796 1
8 0.208 0.792 1
9 0.224 0.776 1
10 0.295 0.705 1
| | Email is spam | Email is not spam |
|---|---|---|
| Email classified as spam | True positive | False positive (Type 1 error) |
| Email classified as not spam | False negative (Type 2 error) | True negative |
False negative rate (Type 2 error rate) = FN / (TP + FN)

False positive rate (Type 1 error rate) = FP / (FP + TN)
| | Email is spam | Email is not spam |
|---|---|---|
| Email classified as spam | True positive | False positive (Type 1 error) |
| Email classified as not spam | False negative (Type 2 error) | True negative |
Sensitivity = TP / (TP + FN)

Specificity = TN / (FP + TN)
If you were designing a spam filter, would you want sensitivity and specificity to be high or low? What are the trade-offs associated with each decision?
The receiver operating characteristic (ROC) curve plots the true positive rate (sensitivity) vs. the false positive rate (1 - specificity).
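The curve's coordinates can be computed with yardstick's roc_curve(); because spam = 1 is the second factor level, event_level = "second" is needed. A sketch:

email_pred |>
  roc_curve(
    truth = spam,
    .pred_1,
    event_level = "second"
  )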
# A tibble: 978 × 3
.threshold specificity sensitivity
<dbl> <dbl> <dbl>
1 -Inf 0 1
2 3.36e-10 0 1
3 2.27e- 9 0.00226 1
4 8.69e- 7 0.00339 1
5 9.89e- 7 0.00452 1
6 1.43e- 6 0.00565 1
7 9.16e- 6 0.00678 1
8 1.03e- 5 0.00791 1
9 2.58e- 5 0.00904 1
10 3.35e- 5 0.0102 1
# ℹ 968 more rows
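The corresponding AUC, computed the same way as a sketch:

email_pred |>
  roc_auc(
    truth = spam,
    .pred_1,
    event_level = "second"
  )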
# A tibble: 1 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.836
The area under the curve (AUC) can be used to assess how well the logistic model fits the data:

- AUC = 0.5: the model is a very bad fit (no better than a coin flip)
- AUC close to 1: the model is a good fit
Suppose we decide to label an email as spam if the model predicts the probability of spam to be more than 0.5.
| | Email is not spam | Email is spam |
|---|---|---|
| Email classified as not spam | 877 | 82 |
| Email classified as spam | 8 | 14 |
cutoff_prob <- 0.5

email_pred |>
  mutate(
    # classify as spam when the predicted probability meets the cutoff
    spam_pred = as_factor(if_else(.pred_1 >= cutoff_prob, 1, 0)),
    # relabel observed and predicted classes for a readable table
    spam = if_else(spam == 1, "Email is spam", "Email is not spam"),
    spam_pred = if_else(spam_pred == 1, "Email classified as spam", "Email classified as not spam")
  ) |>
  count(spam_pred, spam) |>
  pivot_wider(names_from = spam, values_from = n) |>
  kable(col.names = c("", "Email is not spam", "Email is spam"))
Cross-tabulation of observed and predicted classes:
Suppose we decide to label an email as spam if the model predicts the probability of spam to be more than 0.25.
| | Email is not spam | Email is spam |
|---|---|---|
| Email classified as not spam | 830 | 52 |
| Email classified as spam | 55 | 44 |
cutoff_prob <- 0.25
email_pred |>
mutate(
spam_pred = as_factor(if_else(.pred_1 >= cutoff_prob, 1, 0)),
spam = if_else(spam == 1, "Email is spam", "Email is not spam"),
spam_pred = if_else(spam_pred == 1, "Email classified as spam", "Email classified as not spam")
) |>
count(spam_pred, spam) |>
pivot_wider(names_from = spam, values_from = n) |>
kable(col.names = c("", "Email is not spam", "Email is spam"))
Suppose we decide to label an email as spam if the model predicts the probability of spam to be more than 0.75.
| | Email is not spam | Email is spam |
|---|---|---|
| Email classified as not spam | 883 | 89 |
| Email classified as spam | 2 | 7 |
cutoff_prob <- 0.75
email_pred |>
mutate(
spam_pred = as_factor(if_else(.pred_1 >= cutoff_prob, 1, 0)),
spam = if_else(spam == 1, "Email is spam", "Email is not spam"),
spam_pred = if_else(spam_pred == 1, "Email classified as spam", "Email classified as not spam")
) |>
count(spam_pred, spam) |>
pivot_wider(names_from = spam, values_from = n) |>
kable(col.names = c("", "Email is not spam", "Email is spam"))
Use the ROC curve to determine the best cutoff probability
# A tibble: 10 × 3
.threshold specificity sensitivity
<dbl> <dbl> <dbl>
1 0.0769 0.736 0.792
2 0.0770 0.736 0.781
3 0.0780 0.737 0.781
4 0.0785 0.737 0.771
5 0.0786 0.738 0.771
6 0.0787 0.739 0.771
7 0.0789 0.739 0.760
8 0.0802 0.740 0.760
9 0.0802 0.741 0.760
10 0.0805 0.742 0.760
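One common heuristic, offered here as an assumption rather than the method prescribed above, is to pick the threshold that maximizes Youden's J (sensitivity + specificity - 1):

email_pred |>
  roc_curve(truth = spam, .pred_1, event_level = "second") |>
  # Youden's J balances sensitivity and specificity equally
  mutate(j = sensitivity + specificity - 1) |>
  slice_max(j, n = 1)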