AE 16: Conditions for logistic regression

Published

November 15, 2023

Important

Go to the course GitHub organization and locate your ae-16 repo to get started.Render, commit, and push your responses to GitHub by the end of class.

Render, commit, and push your responses to GitHub by the end of class. The responses are due in your GitHub repo no later than Saturday, November 18 at 11:59pm.

Packages

library(tidyverse)
library(tidymodels)
library(knitr)
library(NHANES)

Today’s data is the nhanes_adult data set derived from the NHANES data set in the NHANES R package. It contains information for U.S. adults (age 18+) who participated in the 2009 - 2010 and 2011 - 2012 years of the National Health and Nutrition Examination Survey (NHANES). You can find details about how participants are selected on the CDC website.

We will use the following variables:

  • HealthGen: Self-reported rating of participant’s health in general. Excellent, Vgood, Good, Fair, or Poor.

  • Age: Age at time of screening (in years). Participants 80 or older were recorded as 80.

  • PhysActive: Participant does moderate to vigorous-intensity sports, fitness or recreational activities.

nhanes_adult <- read_csv("data/nhanes-adult.csv")

The goal of this AE is to check the conditions for a model that uses age and physical activity to understand the odds an adult in the United States rates their health as “excellent”.

Tip

Click here to see notes on conditions for logistic regression.

  1. Create a new variable called HealthExcellent that takes the value 1 if HealthGen = “Excellent” and 0 otherwise. Make a table showing the distribution of HealthExcellent.
# add code
  1. Calculate the empirical logit of HealthExcellent == 1 for each level of PhysActive. Then describe what the empirical logit means in the context of the data.
# add code
  1. Check the model conditions for a logistic regression model that uses age and physical activity to understand the odds an adult in the United States rates their health as “excellent”.

    State whether each condition is satisfied and briefly explain your response.
    • Linearity

    • Randomness

    • Independence

Tip

You can make the plots to check linearity “manually” using ggplot or using the functions from the Stat2Data package.

If you use the Stat2Data package, you need to add library(Stat2Data) to the load-packages code chunk at the top of the document. You may also need to install the package by running the code below in the console.

install.packages("Stat2Data")

Submission

Important

To submit the AE:

  • Render the document to produce the PDF with all of your work from today’s class.
  • Push all your work to your ae-16 repo on GitHub. (You do not submit AEs on Gradescope).