SLR: Randomization test for the slope

Prof. Maria Tackett

Sep 13, 2022

term	estimate	std.error	statistic	p.value
(Intercept)	116652.33	53302.46	2.19	0.03
area	159.48	18.17	8.78	0.00

Hypothesis testing framework

Start with a null hypothesis, $H_{0}$, that represents the status quo
Set an alternative hypothesis, $H_{A}$, that represents the research question, i.e., the claim we’re testing
Conduct a hypothesis test under the assumption that the null hypothesis is true and calculate a p-value (the probability of getting the observed or a more extreme outcome, given that the null hypothesis is true)
- if the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, stick with the null hypothesis
- if they do, then reject the null hypothesis in favor of the alternative
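
The framework above can be sketched for a slope with a randomization (permutation) test. This is a minimal illustration, not the course's own pipeline: the data are synthetic stand-ins for house area and price, and the names `slope`, `permutation_p_value`, `reps`, and `seed` are hypothetical. Shuffling the response breaks any link between the two variables, so the shuffled slopes approximate what we would see if $H_{0}: β_{1} = 0$ were true.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, reps=1000, seed=1):
    """Two-sided p-value for H0: beta_1 = 0, built by shuffling y."""
    rng = random.Random(seed)
    observed = slope(x, y)
    y_perm = list(y)  # copy so the original response is left untouched
    extreme = 0
    for _ in range(reps):
        rng.shuffle(y_perm)  # simulate "no relationship" between x and y
        if abs(slope(x, y_perm)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Synthetic data (an assumption for illustration, not the lecture's data)
data_rng = random.Random(42)
area = [data_rng.uniform(1000, 4000) for _ in range(100)]
price = [100_000 + 150 * a + data_rng.gauss(0, 150_000) for a in area]

observed_slope, p_value = permutation_p_value(area, price)
print(f"observed slope: {observed_slope:.2f}, p-value: {p_value:.3f}")
```

A small p-value means few shuffled slopes were as extreme as the observed one, which is the evidence used to reject $H_{0}$ in favor of $H_{A}$.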

Concluding the hypothesis test

Is the observed slope of $\hat{β}_{1} = 159$ (or an even more extreme slope) a likely outcome under the null hypothesis that $β_{1} = 0$? What does this mean for our original question: “Do the data provide sufficient evidence that $β_{1}$ (the true slope for the population) is different from 0?”

Mathematical models for inference

term	estimate	std.error	statistic	p.value
(Intercept)	116652.325	53302.463	2.188	0.031
area	159.483	18.171	8.777	0.000
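
As a quick check on how the columns relate, the `statistic` reported for the slope is consistent with dividing the estimate by its standard error, assuming the usual null value of 0:

$\begin{aligned} t &= \frac{\hat{β}_{1} - 0}{SE(\hat{β}_{1})} \\ &= \frac{159.483}{18.171} \\ &\approx 8.777 \end{aligned}$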

Mathematical representation of the model

$\begin{aligned} Y &= \text{Model} + \text{Error} \\ &= f(X) + ϵ \\ &= μ_{Y | X} + ϵ \\ &= β_{0} + β_{1} X + ϵ \end{aligned}$

where the errors are independent and normally distributed:

independent: Knowing the error term for one observation doesn’t tell you anything about the error term for another observation
normally distributed: $ϵ \sim N (0, σ_{ϵ}^{2})$
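
One way to make these assumptions concrete is to simulate from the model. The sketch below uses made-up parameter values ($β_{0}$ = 100,000, $β_{1}$ = 150, $σ_{ϵ}$ = 50,000) chosen only for illustration, and the names `simulate` and `conditional_mean` are hypothetical. Each error is drawn independently from $N(0, σ_{ϵ}^{2})$, so the average response at a given $X$ should land close to $β_{0} + β_{1} X$.

```python
import random


def simulate(x_values, beta_0, beta_1, sigma, seed=0):
    """Draw Y = beta_0 + beta_1 * X + error with independent N(0, sigma^2) errors."""
    rng = random.Random(seed)
    return [beta_0 + beta_1 * x + rng.gauss(0, sigma) for x in x_values]


def conditional_mean(x_values, y_values, x):
    """Average of the simulated responses at a single value of X."""
    ys = [yi for xi, yi in zip(x_values, y_values) if xi == x]
    return sum(ys) / len(ys)


# Illustrative (assumed) parameter values, not estimates from the lecture
BETA_0, BETA_1, SIGMA = 100_000, 150, 50_000

x_values = [1000, 2000, 3000] * 2000  # 2000 simulated observations per X
y_values = simulate(x_values, BETA_0, BETA_1, SIGMA)

for x in (1000, 2000, 3000):
    simulated = conditional_mean(x_values, y_values, x)
    theoretical = BETA_0 + BETA_1 * x
    print(f"X = {x}: simulated mean {simulated:.0f}, model mean {theoretical}")
```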

Mathematical representation, visualized

$Y | X \sim N (β_{0} + β_{1} X, σ_{ϵ}^{2})$

Graph reproduced from *Beyond Multiple Linear Regression*.

Mean: $β_{0} + β_{1} X$ , the predicted value based on the regression model
Variance: $σ_{ϵ}^{2}$ , constant across the range of $X$
- How do we estimate $σ_{ϵ}^{2}$ ?
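
One standard answer is the regression standard error, $\hat{σ}_{ϵ} = \sqrt{\sum e_{i}^{2} / (n - 2)}$, where $e_{i}$ are the residuals and $n - 2$ reflects the two estimated coefficients. The sketch below computes it on a tiny made-up data set; the helper names `fit_slr` and `regression_standard_error` are hypothetical.

```python
import math


def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def regression_standard_error(x, y):
    """Estimate sigma_epsilon as sqrt(sum of squared residuals / (n - 2))."""
    b0, b1 = fit_slr(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))


# Tiny made-up data set, for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
sigma_hat = regression_standard_error(x, y)
print(f"estimated sigma: {sigma_hat:.4f}")
```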

term	estimate	std.error	statistic	p.value
(Intercept)	116652.33	53302.46	2.19	0.03
area	159.48	18.17	8.78	0.00
