STA 210 - Fall 2023 – Cross validation application

Announcements

See Ed Discussion for upcoming events and internship opportunities
Statistics Experience due Mon, Nov 20 at 11:59pm
Prof. Tackett's office hours: Fridays 1:30-3:30pm for the rest of the semester
Final project starts in lab this week: start thinking about the data your team wants to use

Mid-semester survey

Thank you to everyone who filled out the mid-semester survey!

Aspects of class most helpful for learning

Application exercises
Lectures
Discussing content with others

Things to do in class to better support learning

Zooming out more / reminder of the big picture
Taking time to finish AEs (perhaps do some of this in lab)
More conceptual questions on assignments, specifically HW

Things you do that help with learning

Attend office hours!
Review course materials
Lots of practice: review AEs, HW, labs

Mid-semester survey

Why we do in-class exams

Opportunity to demonstrate understanding of concepts and how they apply in practice
- This is what will make you stand out as a statistician/data scientist!
In-class exams provide the most “level” playing field to demonstrate conceptual understanding, given all the online resources now available
Lots of other opportunities to demonstrate application skills through labs, HW, final project, and take-home portion of exam

Statistician of the day: Felicity Enders

Dr. Felicity Enders received her PhD from the Johns Hopkins Bloomberg School of Public Health and is a Professor of Biostatistics at the Mayo Clinic. With close to 200 publications, she has worked closely with clinicians, with a particular focus on women’s health and psychology. Across the medical spectrum, Dr. Enders has collaborated on advanced statistical modeling for clinical trials.

She is also passionate about biostatistics education and works to dissolve the hidden curriculum for research, particularly the statistical knowledge needed by non-statisticians.

Source: hardin47.github.io/CURV/scholars/enders

Felicity Enders

Dr. Enders was a statistician on an interdisciplinary research team that used logistic regression to identify demographic, clinical, and laboratory variables associated with the presence (or absence) of advanced fibrosis, with the aim of creating a scoring system that clinicians could use.
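As a rough illustration of how a logistic regression can become a clinical score, the R sketch below fits a model to simulated data and treats the fitted linear predictor (the log-odds) as the score. The data frame `patients`, the variables `age`, `bmi`, and `platelets`, and all coefficients are hypothetical; they are not the predictors or estimates from the fibrosis study.

```r
# Hypothetical, simulated data: these variables are illustrative only,
# not the variables or coefficients from the NAFLD fibrosis study
set.seed(210)
n <- 300
patients <- data.frame(
  age = round(rnorm(n, mean = 50, sd = 10)),
  bmi = rnorm(n, mean = 30, sd = 5),
  platelets = rnorm(n, mean = 250, sd = 60)
)

# Simulate the binary outcome (1 = advanced fibrosis) from a known logistic model
true_logit <- -2.5 + 0.05 * patients$age + 0.06 * patients$bmi - 0.01 * patients$platelets
patients$fibrosis <- rbinom(n, size = 1, prob = plogis(true_logit))

# Fit the logistic regression model
fit <- glm(fibrosis ~ age + bmi + platelets, data = patients, family = binomial)

# Score = fitted linear predictor (log-odds); prob = predicted probability
patients$score <- predict(fit, type = "link")
patients$prob <- predict(fit, type = "response")

round(coef(fit), 3)
head(patients[, c("score", "prob")])
```

Because the score is on the log-odds scale, `plogis(score)` recovers each patient's predicted probability of advanced fibrosis, so a single number summarizes all of the predictors.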

“Data from each of the 4 countries were randomly separated into 2/3 and 1/3 of patients for model building and model validation, respectively. Hence, data on 480 patients were used to build a model, whereas data on 253 patients were used to validate the model.”
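The quoted design splits each country's patients separately, so every country contributes about 2/3 of its patients to model building. A minimal base-R sketch of that kind of within-group split is below; the helper `split_by_group()`, the column names, and the country sizes are hypothetical. In tidymodels, `initial_split()` with `prop = 2/3` and `strata = country` is a comparable stratified split.

```r
# Hypothetical patients from 4 countries; sizes are made up for illustration
patients <- data.frame(
  id = 1:420,
  country = rep(c("A", "B", "C", "D"), times = c(150, 120, 90, 60))
)

# Hypothetical helper: within each level of `group`, randomly assign
# round(prop * n) rows to model building and the rest to validation
split_by_group <- function(data, group, prop = 2/3) {
  rows_by_group <- split(seq_len(nrow(data)), data[[group]])
  build_rows <- unlist(lapply(rows_by_group, function(rows) {
    rows[sample.int(length(rows), size = round(prop * length(rows)))]
  }))
  list(
    build = data[build_rows, , drop = FALSE],
    validate = data[-build_rows, , drop = FALSE]
  )
}

set.seed(210)
parts <- split_by_group(patients, "country")
table(parts$build$country)   # A = 100, B = 80, C = 60, D = 40
nrow(parts$validate)         # 140
```

With these made-up sizes, each country contributes exactly 2/3 of its patients: 280 patients for model building and 140 for validation, with no patient in both sets.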

“…cross-validation was used with 20 subgroups, so that at most 5% of the data under consideration was excluded at any one time. By employing cross-validation, the possibility of an unusually positive or negative validation subset could be assessed.”
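The arithmetic behind “at most 5%” is that with 20 subgroups, each fold holds about 1/20 = 5% of the data. The R sketch below runs 20-fold cross-validation for a logistic regression on simulated data and records the error on each held-out fold; the variables `x1`, `x2`, `y` and the 0.5 classification threshold are assumptions for illustration, not details from the study. In tidymodels, `vfold_cv(data, v = 20)` creates the same kind of folds.

```r
# Simulated data (hypothetical): two predictors and a binary outcome
set.seed(210)
n <- 480
patients <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
patients$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * patients$x1 - 0.8 * patients$x2))

k <- 20
# Assign each row to one of k folds of equal size, in random order
folds <- sample(rep(seq_len(k), length.out = n))

fold_error <- numeric(k)
for (i in seq_len(k)) {
  train <- patients[folds != i, ]
  test  <- patients[folds == i, ]
  fit <- glm(y ~ x1 + x2, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  # Misclassification rate on the held-out fold, classifying at 0.5
  fold_error[i] <- mean((prob > 0.5) != test$y)
}

max(table(folds)) / n   # largest share of data held out at once: 0.05
mean(fold_error)        # overall cross-validated error
range(fold_error)       # spread across folds
```

The spread in `fold_error` is what lets you check for an unusually positive or negative validation subset: a single split can land on one of those by chance, while 20 folds show the whole range.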

Angulo, Paul, et al. “The NAFLD fibrosis score: a noninvasive system that identifies liver fibrosis in patients with NAFLD.” Hepatology 45.4 (2007): 846-854.

Topics

Cross validation application exercise

Computational setup

# load packages
library(tidyverse)
library(tidymodels)
library(patchwork)
library(knitr)
library(kableExtra)
library(countdown)
library(rms)

# set default theme and larger font size for ggplot2
ggplot2::theme_set(ggplot2::theme_bw(base_size = 20))
