class: center, middle, inverse, title-slide

.title[
# Generalized linear models
]
.subtitle[
## Logistic regression
]
.author[
### Dr Nemanja Vaci
]
.date[
### 2024-02-15
]

---

## Press record

---

## First meeting practice

---

## Linear regression

Gaussian distribution: `\(y_{i} \sim \mathcal{N}(\mu_{i},\sigma)\)` <br/> <br/>
`\(\mu_{i} = \alpha+\beta*x_i\)` <br/><br/>

<img src="Week2LogisticModel_files/figure-html/unnamed-chunk-1-1.png" width="100%" style="display: block; margin: auto;" />

---

## Exponential family

.center[
<img src="Exp5.png" width="70%"> <br/>
]

???
We can use any of these distributions for the outcome in a generalized linear model<br/>
Poisson - counts with no upper bound <br/>
Gamma - always positive, time to an event with multiple components (e.g. age of onset of cancer) <br/><br/>
Many of these distributions converge to the Normal: the Binomial if the probability is far from the edges, the Poisson and Gamma if the mean is large - remember the central limit theorem<br/><br/>
https://en.wikipedia.org/wiki/Exponential_family<br/><br/>
https://i.stack.imgur.com/HgpO4.jpg <br/><br/>
Check out the amazing lectures by Richard McElreath (this lecture draws on his ideas extensively): [youtube channel](https://www.youtube.com/channel/UCNJK6_DZvcMqNSzQdEkzvzA)

---

## Binomial distribution

Distribution: <br/>

`$$y_i \sim Binomial(n,p)$$` <br/><br/>

The number of successes in n independent Bernoulli trials follows a Binomial distribution:<br/><br/><br/>

Probability mass function: `\(P(X=x)=\binom{n}{x}p^x(1-p)^{n-x}\)` <br/><br/>

**.red[Mean]**: `\(\mu=n*p\)` <br/><br/>

**.green[Variance]**: `\(\sigma^2=n*p(1-p)\)`

???
Number of successes in a sequence of n independent trials. The probability of seeing __x__ successes in __n__ tries. <br/>

---

## Logistic regression

`$$p=\alpha+\beta*x_i$$`

--

What would happen if we modelled the probability using linear regression?<br/><br/>

--

The linear function is unbounded - it does not have to result in a number between 0 and 1 <br/><br/>
Errors are not necessarily normal <br/><br/>
Mean and SD are related <br/>

--

<br/>
Map the probabilities from [0,1] onto an unbounded scale: <br/>

`$$f(p) =log(\frac{p}{1-p})$$`
`$$logit(p_i)=\alpha+\beta*x_i$$`

???
Our outcome y is a count that follows a Binomial distribution.<br/>
What we would like to estimate/model is the probability (p)<br/>
Mean and variance are related: [link](https://www.researchgate.net/publication/275248713/figure/fig1/AS:282627547385872@1444395066350/Mean-variance-relationships-for-different-models-A-normal-model-assumes-that-the_W640.jpg)

---

## Probabilities

<img src="Week2LogisticModel_files/figure-html/unnamed-chunk-2-1.png" width="100%" />

---

## First step: odds

`$$Odds =\frac{p_{i}}{1-p_{i}}$$`

<img src="Week2LogisticModel_files/figure-html/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto;" />

???
Odds - the likelihood of an outcome: the probability that the event occurs versus the probability that it does not occur. <br/> <br/>
Odds ratios are often used interchangeably with __odds__, but in research they usually denote a ratio of two different odds: the odds of event A happening when B is present versus the odds of A happening in the absence of B (e.g. a medication).
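
---

## Odds in a few lines of R

A minimal numeric sketch (not part of the original materials; the probability values are arbitrary) of how probabilities map onto odds:


```r
p <- c(0.10, 0.25, 0.50, 0.75, 0.90)  # arbitrary probabilities
odds <- p/(1 - p)                     # odds: positive, unbounded above
round(odds, 2)                        # approximately 0.11 0.33 1.00 3.00 9.00
```

Odds of 1 correspond to p = 0.5; the odds grow towards infinity as p approaches 1, which motivates the next step.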

---

## Second step: log odds

`$$logOdds =log(\frac{p_{i}}{1-p_{i}})$$`

<img src="Week2LogisticModel_files/figure-html/unnamed-chunk-4-1.png" width="100%" style="display: block; margin: auto;" />

???
The odds are always positive and grow exponentially, so we take the log to unbound them towards negative values and make the scale linear <br/>

---

## What is the model doing: log odds space?

<img src="image3.png" width="100%">

---

## What is the model doing: probability space?

<img src="image2a.png" width="100%">

???
We are squeezing the logit values into the probability space

---

## Relation between log odds and probabilities

.center[
<img src="image4.png" width="50%">
]

---

## Logistic regression in R

We return to our crying babies who are motivated to start crawling.

<img src="Week2LogisticModel_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />

---

## Logistic regression in R: coefficients

Predict crawling success as a linear function of age


```r
glm1 <- glm(Crawl ~ Age, data = Babies, family = binomial(link = 'logit'))
glm1$coefficients
```

```
## (Intercept)         Age 
##  -1.3307752   0.1194777
```

<br/><br/>
Three ways to interpret logistic regression:<br/><br/>
1. Logit values<br/>
2. **.red[Odds]**<br/>
3. **.green[Probabilities]**<br/>

---

## Odds ratio space

Odds:


```r
exp(glm1$coefficients)
```

```
## (Intercept)         Age 
##   0.2642723   1.1269081
```

Intercept: When Age is **.red[zero]**, the **.red[odds]** of crawling are 0.26 - that is, 74% (1 - 0.26) **.green[lower]** than the odds of not crawling.

Age: For babies **.red[one month older]**, the odds of crawling are about 13% higher, i.e. a unit increase in Age **.green[increases the odds]** of crawling by a factor of 1.13

???
For every 0.26 babies that crawl, you will have 1 baby that does not crawl <br/><br/>
Or we can invert the odds: 1/0.26 = 3.84 <br/>
At Age 0 (birth), it is 3.84 times more likely for babies not to crawl than to crawl. <br/>
For every 3.84 babies that do not crawl you will have 1 that does. <br/><br/>

---

## Probability space: intercept

Logit to probability space: `$$\frac{1}{1+e^{-(\alpha+\beta*x)}}$$`<br/>

Intercept: Estimated probability of babies starting to crawl when Age is 0 (birth) <br/>


```r
1/(1 + exp(-(-1.33078 + 0.119*0)))
```

```
## [1] 0.2090304
```

---

## Probability space: slope

We need to decide where on the curve to evaluate the change:<br/>


```r
1/(1 + exp(1.33078 - 0.11948*10))
```

```
## [1] 0.4660573
```

```r
arm::invlogit(coef(glm1)[[1]] + coef(glm1)[[2]]*mean(Babies$Age))
```

```
## [1] 0.6562395
```

Estimated probability of babies starting to crawl when Age is 10 (first line) or at the mean Age of the sample (second line)

---

## Interpretation continued


```r
Babies$LogOdds = -1.33078 + 0.11948*Babies$Age
Babies$Odds = exp(Babies$LogOdds)
Babies$Probs = Babies$Odds/(1 + Babies$Odds)
```

<img src="Week2LogisticModel_files/figure-html/unnamed-chunk-12-1.png" width="100%" />

???
This is the connection between the log odds, odds, and probabilities in our data, based on our model.

---

## Predicted probability versus outcome


```r
ggplot(data = Babies, aes(x = Age, y = Probs)) + geom_point(size = 3, col = 'blue') +
  geom_point(data = Babies, aes(x = Age, y = Crawl), size = 3, col = 'red')
```

<img src="Week2LogisticModel_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />

???
The red dots indicate the observed crawling values (1 and 0) - whether a baby is crawling or not at a given age. The blue dots are the probabilities predicted by the model.
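
---

## Shortcut: predicted probabilities with predict()

Instead of transforming the coefficients by hand, we can ask the model for its fitted probabilities directly. A minimal sketch (assuming the `Babies` data and `glm1` from the previous slides; the new column names and the 0.5 cut-off are arbitrary choices for illustration):


```r
Babies$PredProb <- predict(glm1, type = 'response')      # fitted probabilities
Babies$PredCrawl <- ifelse(Babies$PredProb > 0.5, 1, 0)   # classify at 0.5
table(Predicted = Babies$PredCrawl, Observed = Babies$Crawl)
```

Cross-tabulating the classifications against the observed `Crawl` values shows how often the model agrees with the data.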

---

## Relationship between probabilities and odds

Probability: `\(p = \frac{f}{N}\)`<br/>
The expected proportion of N trials in which we will see the event (between 0 and 1) <br/><br/><br/>

Odds of the event: `\(odds = \frac{p}{1-p}\)` (between 0 and positive infinity) <br/>
The probability that the event will occur divided by the probability that it will not occur <br/><br/><br/>

p = 0.80<br/>
odds = `\(\frac{0.80}{0.20}=4\)` - odds are 4 to 1 that the event will occur <br/>

p = 0.20<br/>
odds = `\(\frac{0.20}{0.80}=0.25\)` - odds are 0.25 to 1 that the event will occur <br/><br/><br/>

If a race horse runs 100 races, winning 25 times and losing 75 times, the probability of winning is 25/100 = 0.25 or 25%, but the odds of the horse winning are 25/75 = 0.33, or 1 win to 3 losses <br/>

---

## Logistic regression formula

What is missing in this equation?

`$$log(\frac{p}{1-p})=\alpha+\beta*x_i$$` <br/> <br/>

$$ y = \alpha + \beta * Age + \epsilon $$ <br/><br/>

--

We do not have an estimate of the error: `\(y_i \sim Binomial(n,p)\)`

We do have residuals - an estimate of the fit <br/>

`$$residual_i=-2*(y_i-logit^{-1}(\beta*x_i))$$`

---

## Deviance

Goodness of fit: the deviance of the fitted model with respect to a perfect model <br/><br/>

Null deviance: `\(D=-2*loglik(\beta_0)\)` <br/><br/>

Residual deviance: `\(D=-2*loglik(\beta*x)\)` <br/><br/>


```r
with(glm1, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))
```

```
## [1] 1.751679e-05
```

AIC: `\(-2loglik(\beta*x)+2*npar\)`<br/><br/>

BIC: `\(-2loglik(\beta*x)+log(N)*npar\)`, where N is the number of observations

???
Deviance: the difference in categorising outcomes in comparison to the perfect model<br/>
Likelihood: the probability of the observed results given the parameter estimates <br/>
AIC and BIC are used to compare the models: [link](https://stats.stackexchange.com/a/767)

---

<style type="text/css">
pre {
  max-height: 300px;
  overflow-y: auto;
}

pre[class] {
  max-height: 100px;
}
</style>

<style type="text/css">
.scroll-100 {
  max-height: 100px;
  overflow-y: auto;
  background-color: inherit;
}
</style>

## Model results


```r
summary(glm1)
```

```
## 
## Call:
## glm(formula = Crawl ~ Age, family = binomial(link = "logit"), 
##     data = Babies)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0687  -0.9876   0.5443   0.8979   1.6082  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.33078    0.50868  -2.616 0.008894 ** 
## Age          0.11948    0.03079   3.880 0.000104 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 131.79  on 99  degrees of freedom
## Residual deviance: 113.35  on 98  degrees of freedom
## AIC: 117.35
## 
## Number of Fisher Scoring iterations: 3
```

???
[Wald test](https://stats.stackexchange.com/a/60083)

---

## Link functions and other GLMs

The most common generalized linear models (a sketch of the corresponding R calls follows on the next slide):

- Poisson regression: traffic, goals in a soccer game <br/>
- Negative binomial: the same problems, but deals with overdispersion <br/>
- Gamma regression: reaction times; always positive, right-skewed data <br/>
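
---

## Other GLMs in R

The same `glm()` call covers most of these models through the `family` argument; the negative binomial needs `glm.nb()` from the MASS package. A minimal sketch (the data frame `df` and its variables are hypothetical placeholders, not part of the seminar data):


```r
# Poisson regression: counts with no upper bound
poisMod <- glm(Goals ~ Attack, data = df, family = poisson(link = 'log'))

# Negative binomial regression: counts with overdispersion
nbMod <- MASS::glm.nb(Goals ~ Attack, data = df)

# Gamma regression: positive, right-skewed outcomes such as reaction times
gammaMod <- glm(RT ~ Condition, data = df, family = Gamma(link = 'log'))
```

The interpretation logic is the same as for the logistic model: coefficients live on the link scale and are transformed back to the scale of the outcome.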

---

## Absolutely cool things to note

The question we are asking is whether we can stratify the probability of success in the outcome using our predictor variables <br/><br/>

We are modelling the mean and we do not have information on individual data points - there is no error term <br/><br/>

The scale of the parameters is different! <br/><br/>

Parameters are multiplicative - coefficients interact, especially in the tails of the probability distribution <br/><br/>

It is not easy to compare the models - especially if they differ in more than one predictor - the deviance criterion tells you how well your model fits the mean <br/>

---

class: inverse, middle, center

# Practical aspect

---

## Data

NBA dataset: [Link](https://www.kaggle.com/ionaskel/nba-games-stats-from-2014-to-2018)


```r
basketball <- read.table('Basketball.txt', sep = '\t', header = T)
knitr::kable(head(basketball[, c(5, 13, 18, 31, 34, 43)]), format = 'html')
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:left;"> Home </th>
   <th style="text-align:right;"> X3PointShots </th>
   <th style="text-align:right;"> FreeThrows. </th>
   <th style="text-align:right;"> Opp.3PointShots. </th>
   <th style="text-align:right;"> Opp.FreeThrows. </th>
   <th style="text-align:right;"> Win </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> Away </td>
   <td style="text-align:right;"> 13 </td>
   <td style="text-align:right;"> 0.529 </td>
   <td style="text-align:right;"> 0.308 </td>
   <td style="text-align:right;"> 0.818 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 32 </td>
   <td style="text-align:left;"> Away </td>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 0.867 </td>
   <td style="text-align:right;"> 0.323 </td>
   <td style="text-align:right;"> 0.750 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 83 </td>
   <td style="text-align:left;"> Home </td>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 0.652 </td>
   <td style="text-align:right;"> 0.368 </td>
   <td style="text-align:right;"> 0.769 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 112 </td>
   <td style="text-align:left;"> Home </td>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:right;"> 0.684 </td>
   <td style="text-align:right;"> 0.481 </td>
   <td style="text-align:right;"> 0.941 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 165 </td>
   <td style="text-align:left;"> Away </td>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:right;"> 0.769 </td>
   <td style="text-align:right;"> 0.364 </td>
   <td style="text-align:right;"> 0.652 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 196 </td>
   <td style="text-align:left;"> Away </td>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:right;"> 0.611 </td>
   <td style="text-align:right;"> 0.500 </td>
   <td style="text-align:right;"> 0.650 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>
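
---

## A first logistic model on the NBA data

A minimal sketch of the kind of model built in the practical (the predictors are chosen from the columns shown above purely for illustration and are not necessarily the ones used in the seminar):


```r
nbaGlm <- glm(Win ~ Home + X3PointShots + FreeThrows.,
              data = basketball, family = binomial(link = 'logit'))
summary(nbaGlm)    # coefficients on the logit scale
exp(coef(nbaGlm))  # the same coefficients expressed as odds ratios
```

From here the workflow mirrors the crawling example: interpret the coefficients as logits, odds, or probabilities, and compare candidate models by deviance or AIC.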
"Statistical Rethinking: A Bayesian Course with Examples in R and Stan " by Richard McElreath + check out his youtube channel Jaeger, F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59, 434-446. --- ## Song for the end [Link](https://youtu.be/ZINXFoQMZVs) --- # Thank you for your attention