Generalised Linear Model

This chapter introduces generalised linear models: linear regression, the linear probability model, logistic regression (binomial, ordinal, and multinomial), Poisson regression, and negative binomial regression.

This chapter focuses on application. The estimation procedures and their details are covered in chapters 3 and 4.


Overview

Generalised linear models (GLMs) are a family of models that describe the relationship between a set of explanatory variables \(X_1, X_2, \dots, X_p\) and an outcome variable \(Y\), for individual observations \(t = 1, 2, \dots, n\).

  • Note that these relationships cannot be interpreted causally unless the treatment in question was randomly assigned and meets the condition of independence (definition 5.9). If you are interested in causal estimation, see the next chapter.

Every GLM takes the following form:

\[ g(\mu_t) = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \dots + \beta_p X_{tp} \]

Where \(\mu_t\) is the parameter of interest of \(Y_t\) (which parameter depends on the model) and \(g(\cdot)\) is a link function that allows the GLM to be applied to many different types of \(Y\). \(\beta_0, \dots, \beta_p\) are the parameters of the model that need to be estimated.
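To make the link function concrete: in R, the language used for this chapter's code, each `family` object that can be passed to `glm()` stores its link \(g(\cdot)\) and its inverse. A minimal sketch, assuming the default links for the Gaussian (linear regression), Poisson, and binomial (logistic) families; the list name `fams` is only for illustration:

```r
# Each family object stores its link g() ($linkfun) and inverse link ($linkinv)
fams <- list(linear = gaussian(), poisson = poisson(), logistic = binomial())
sapply(fams, function(f) f$link)   # "identity", "log", "logit"

gaussian()$linkfun(3.2)     # identity link, g(mu) = mu: returns 3.2
poisson()$linkfun(exp(1))   # log link, g(mu) = log(mu): returns 1
binomial()$linkfun(0.5)     # logit link, g(mu) = log(mu / (1 - mu)): returns 0
binomial()$linkinv(0)       # inverse logit maps the linear predictor back: returns 0.5
```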

GLMs have two main purposes. First, they can predict new values of \(Y\) given values of \(X_1, \dots, X_p\). Second, they can describe the correlation between \(X_1, \dots, X_p\) and \(Y\). The choice of model depends on your purpose (prediction or correlation) and on the type of outcome \(Y\) you have:

| Model | When to Use |
|---|---|
| Linear Regression | Continuous \(Y\), for both prediction and correlation. Count \(Y\), for correlation (it can also predict counts, but predictions may be poor, so use Poisson or negative binomial for prediction). Ordinal \(Y\), for correlation (not good for prediction; use ordinal or multinomial logistic). |
| Linear Probability | Binary \(Y\), for correlation (do not use for prediction; use binomial logistic). |
| Binomial Logistic | Binary \(Y\), for prediction; can also be used for correlation (but interpretation is harder than with the linear probability model). |
| Ordinal Logistic | Ordinal \(Y\), for correlation or prediction (though multinomial logistic may predict better, as it is more flexible). |
| Multinomial Logistic | Categorical \(Y\), for correlation and prediction. Ordinal \(Y\), for prediction. |
| Poisson | Count \(Y\), for prediction and correlation. Assumes \(\V Y = \E Y\); if \(\V Y > \E Y\) (overdispersion), use negative binomial. |
| Negative Binomial | Count \(Y\), for prediction and correlation. Allows \(\V Y > \E Y\); if \(\V Y = \E Y\), Poisson may be more efficient. |

Types of outcome variable \(Y\):

| Type | Description | Example |
|---|---|---|
| Continuous | Can take any value within an interval \([a, b]\). | GDP of a country, temperature. |
| Binary | Can only take the value 0 or 1. | Yes/no, true/false, did/didn’t. |
| Ordinal | Can only take a finite set of discrete outcomes \(\{a, b, c\}\), and these outcomes can be ordered. | Strongly disagree, disagree, neutral, agree, strongly agree. |
| Categorical | Can only take a finite set of discrete outcomes \(\{a, b, c\}\), and these outcomes have no natural order. | Country of birth. |
| Count | Can only take a non-negative integer value \(\{0, 1, 2, \dots\}\). | Number of cases of a disease, number of phone calls received at a call centre in an hour. |

The final section, on model specification issues, introduces topics that apply to all of these models: categorical explanatory variables, transformations, and interactions.


Linear Regression Model

The linear regression model relates a continuous outcome variable \(Y\) to a set of explanatory variables \(\set X = \{X_1, \dots, X_p\}\), for observations \(t = 1, \dots, n\):

\[ Y_t = \underbrace{\beta_0 + \beta_1X_{t1} + \beta_2 X_{t2} + \dots + \beta_p X_{tp}}_{\E(Y_t | \set X_t)} + \eps_t \]

Where \(\beta_0\) is the intercept, \(\beta_1, \dots, \beta_p\) are the parameters that explain the relationship between \(X_j\) and \(Y\), and \(\eps_t\) is the error term that accounts for random variation/noise in \(Y\). For more details on the linear model, see chapter 4.

The parameters \(\beta_0, \dots, \beta_p\) need to be estimated with sample data to obtain estimates \(\hat\beta_0, \dots, \hat\beta_p\). The linear model can be estimated with a variety of estimators:
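The list of estimators is not reproduced here. As one hedged illustration, the sketch below fits the model by ordinary least squares (OLS) with base R's `lm()` on simulated data; the sample size, the true parameter values (1, 2, and -0.5), and the normal errors are assumptions chosen for the example, so the estimates should land close to them:

```r
set.seed(42)
n  <- 1000
X1 <- rnorm(n)
X2 <- rnorm(n)
# Assumed true model: beta0 = 1, beta1 = 2, beta2 = -0.5, eps_t ~ N(0, 1)
Y  <- 1 + 2 * X1 - 0.5 * X2 + rnorm(n)
mydata <- data.frame(Y, X1, X2)

# OLS estimates of beta0, beta1, beta2
model <- lm(Y ~ X1 + X2, data = mydata)
coef(model)      # close to 1, 2 and -0.5
summary(model)   # standard errors, t-tests, R-squared
```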

The coefficient estimates are interpreted as follows (proof from theorem 4.1). Remember that these interpretations are not causal unless \(X_j\) has been randomly assigned and meets the condition of independence (definition 1.6):

| | Continuous \(X_{j}\) | Binary \(X_{j}\) |
|---|---|---|
| \(\hat\beta_j\) | For every one-unit increase in \(X_{j}\), there is an expected \(\hat\beta_j\) unit change in \(Y\), holding all other explanatory variables constant. | There is an expected \(\hat\beta_j\) unit difference in \(Y\) between category \(X_{j} = 1\) and category \(X_{j} = 0\), holding all other explanatory variables constant. |
| \(\hat\beta_0\) | When all explanatory variables equal 0, the expected value of \(Y\) is \(\hat\beta_0\). | For category \(X_{j} = 0\), the expected value of \(Y\) is \(\hat\beta_0\) (when all other explanatory variables equal 0). |
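The interpretations in the table can be checked numerically: each \(\hat\beta_j\) equals the change in the model's prediction when \(X_j\) changes by one unit and everything else is held fixed. A sketch using R's built-in `mtcars` data, with `mpg` as a continuous \(Y\), `wt` as a continuous \(X\), and `am` as a binary \(X\) (the model is chosen purely for illustration):

```r
# mpg: continuous Y; wt: continuous X (weight); am: binary X (1 = manual)
model <- lm(mpg ~ wt + am, data = mtcars)
b <- coef(model)

# Continuous X: raise wt by one unit, holding am fixed
base <- data.frame(wt = 3, am = 0)
plus <- data.frame(wt = 4, am = 0)
predict(model, plus) - predict(model, base)                         # equals b["wt"]

# Binary X: compare am = 1 with am = 0, holding wt fixed
predict(model, data.frame(wt = 3, am = 1)) - predict(model, base)   # equals b["am"]

# Intercept: expected Y when every explanatory variable is 0
predict(model, data.frame(wt = 0, am = 0))                          # equals b["(Intercept)"]
```

No car in `mtcars` weighs 0, so the intercept here is an extrapolation; \(\hat\beta_0\) only has a meaningful interpretation when 0 is a plausible value for every explanatory variable.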

For more information on categorical \(X\), interactions, polynomial and logarithmic transformations, see the final section on model specification.


Linear Probability Model

The linear probability model is for binary \(Y\) (Bernoulli distribution). It should generally be used only for interpreting relationships between \(X_j\) and binary \(Y\), not for prediction (use binomial logistic regression instead). The outcome of interest is \(\E(Y_t | \set X_t) = \P (Y_t = 1 | \set X_t)\), which we denote \(\pr_t\):

\[ \pr_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \dots + \beta_p X_{tp} \]

Where \(\beta_0\) is the intercept and \(\beta_1, \dots, \beta_p\) are the parameters that explain the relationship between \(X_j\) and \(Y\). These parameters need to be estimated with sample data, using one of two estimators:
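The estimators themselves are not listed here. Assuming ordinary least squares is one of them, the sketch below fits a linear probability model to a small made-up dataset and shows one reason the model is not used for prediction: its fitted values are not constrained to lie between 0 and 1.

```r
# Made-up data: a binary outcome that switches from 0 to 1 as x increases
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Linear probability model estimated by OLS
lpm <- lm(y ~ x)
coef(lpm)            # slope of about 0.15: +15 percentage points per unit of x
range(fitted(lpm))   # about -0.18 to 1.18, i.e. outside [0, 1]
```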

The coefficient estimates are interpreted as follows (proof from theorem 4.1). Remember that these interpretations are not causal unless \(X_j\) has been randomly assigned and meets the condition of independence (definition 1.6):

| | Continuous \(X_{j}\) | Binary \(X_{j}\) |
|---|---|---|
| \(\hat\beta_j\) | For every one-unit increase in \(X_{j}\), there is an expected \(\hat\beta_j \times 100\) percentage point change in the probability of a unit being in category \(Y_t=1\), holding all other explanatory variables constant. | There is an expected \(\hat\beta_j\times 100\) percentage point difference in the probability of a unit being in category \(Y_t=1\) between category \(X_{j} = 1\) and category \(X_{j} = 0\), holding all other explanatory variables constant. |
| \(\hat\beta_0\) | When all explanatory variables equal 0, the expected probability of a unit being in category \(Y_t=1\) is \(\hat\beta_0 \times 100\) percent. | For category \(X_{j} = 0\), the expected probability of a unit being in category \(Y_t=1\) is \(\hat\beta_0 \times 100\) percent (when all other explanatory variables equal 0). |
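With a single binary \(X_j\) and OLS estimation, these interpretations hold exactly in the sample: \(\hat\beta_0\) is the share of \(Y_t = 1\) in category \(X_j = 0\), and \(\hat\beta_j\) is the difference in shares between the two categories. A sketch with R's built-in `mtcars` data, treating `vs` as the binary outcome and `am` as the binary explanatory variable (an arbitrary choice for illustration):

```r
# Binary Y = vs (engine shape), binary X = am (transmission) from built-in mtcars
lpm <- lm(vs ~ am, data = mtcars)
b   <- coef(lpm)

p0 <- mean(mtcars$vs[mtcars$am == 0])   # share with vs = 1 when am = 0
p1 <- mean(mtcars$vs[mtcars$am == 1])   # share with vs = 1 when am = 1

c(intercept = unname(b["(Intercept)"]), p0 = p0)   # the two values are equal
c(slope = unname(b["am"]), diff = p1 - p0)          # equal; times 100 = percentage points
```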


Binomial Logistic Regression

The binomial logistic regression model is for binary \(Y\) (Bernoulli distribution). The outcome of interest is \(\E(Y_t | \set X_t) = \P (Y_t = 1 | \set X_t)\), which we denote \(\pr_t\). The logistic model applies a link function \(g(\cdot)\), the log-odds or logit, to \(\pr_t\), which ensures that predicted probabilities always lie between 0 and 1. The model is specified as:

\[ \log\left(\frac{\pr_t}{1 - \pr_t}\right) = \beta_0 + \beta_1X_{t1} + \beta_2X_{t2} + \dots + \beta_pX_{tp} \tag{7.1}\]

Exponentiating both sides and solving for \(\pr_t\), we can rewrite the model in terms of \(\pr_t\):

\[ \pr_t = \frac{e^{\beta_0 + \beta_1X_{t1} + \beta_2X_{t2} + \dots + \beta_pX_{tp}}}{1+e^{\beta_0 + \beta_1X_{t1} + \beta_2X_{t2} + \dots + \beta_pX_{tp}}} \]
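As a quick numerical check of this formula, the sketch below applies it to a few hypothetical values of the linear predictor and compares the result with base R's `plogis()` (the logistic distribution function) and `qlogis()` (the logit):

```r
eta <- c(-2, 0, 1.5)               # hypothetical values of beta0 + beta1*X_t1 + ...
p   <- exp(eta) / (1 + exp(eta))   # the formula for pr_t above

all(p > 0 & p < 1)          # TRUE: always a valid probability
all.equal(p, plogis(eta))   # TRUE: plogis() is the same logistic function
all.equal(qlogis(p), eta)   # TRUE: qlogis() takes the log-odds, recovering eta
```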

Where \(\beta_0\) is the intercept and \(\beta_1, \dots, \beta_p\) are the parameters that explain the relationship between \(X_j\) and \(Y\). These parameters need to be estimated with sample data; the model is always estimated with the maximum likelihood estimator, for example with R's glm():

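# Binomial logistic regression, fitted by maximum likelihood
# (mydata is your data frame with a 0/1 outcome Y and explanatory variables X1, X2, X3)
model <- glm(Y ~ X1 + X2 + X3,
             data = mydata,
             family = "binomial")
summary(model)  # coefficient estimates are on the log-odds scale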

Details on the maximum likelihood estimator are provided here.
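A sketch of the whole workflow on simulated data, so that the true parameters are known: the sample size and the values \(\beta_0 = -0.5\), \(\beta_1 = 1\), \(\beta_2 = -0.8\) are assumptions for the example. With 5000 observations, the maximum likelihood estimates should be close to them:

```r
set.seed(1)
n  <- 5000
X1 <- rnorm(n)               # continuous explanatory variable
X2 <- rbinom(n, 1, 0.5)      # binary explanatory variable
# Assumed true log-odds: -0.5 + 1 * X1 - 0.8 * X2; plogis() converts them to pr_t
pr <- plogis(-0.5 + 1 * X1 - 0.8 * X2)
Y  <- rbinom(n, 1, pr)       # Bernoulli outcome
mydata <- data.frame(Y, X1, X2)

model <- glm(Y ~ X1 + X2, data = mydata, family = "binomial")
coef(model)                                # close to -0.5, 1 and -0.8
head(predict(model, type = "response"))    # fitted probabilities, always in (0, 1)
```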

The coefficient estimates themselves (changes in the log-odds) are very difficult to interpret, so instead we interpret odds ratios, which are \(e^{\hat\beta_j}\). Remember that these interpretations are not causal unless \(X_j\) has been randomly assigned and meets the condition of independence (definition 1.6):

The odds of an event \(A\) are the probability of \(A\) occurring divided by the probability of \(A\) not occurring. We can apply the same logic to \(\P(Y_t = 1) = \pr_t\):

\[ \mathrm{odds}_A = \frac{\P(A)}{1 - \P(A)} \quad \implies \quad \mathrm{odds}_{Y_t = 1} = \frac{\pr_t}{1-\pr_t} \]

Exponentiating both sides of eq. 7.1 gives the odds of \(Y_t = 1\) implied by the logistic regression:

\[ \frac{\pr_t}{1-\pr_t} = e^{\beta_0 + \beta_1X_{t1} + \beta_2X_{t2} + \dots + \beta_pX_{tp}} \]

An odds ratio is a ratio of two odds. For events \(A\) and \(B\), the odds ratio is:

\[ OR = \frac{\mathrm{odds}_A}{\mathrm{odds}_B} = \frac{\P(A) / ( 1 - \P(A))}{\P(B) / (1 - \P(B))} \]
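A short arithmetic check of both definitions, with made-up probabilities; `odds()` is a hypothetical helper defined here, not a base R function:

```r
# Hypothetical helper: odds of an event that has probability p
odds <- function(p) p / (1 - p)

odds(0.8)               # 4: the event is four times as likely to occur as not
odds(0.5) / odds(0.2)   # odds ratio for P(A) = 0.5 vs P(B) = 0.2: 1 / 0.25 = 4
```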

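We can apply the same idea to the logistic regression by comparing the odds of \(Y_t = 1\) at \(X_j = x + 1\) with the odds at \(X_j = x\), holding all other explanatory variables constant. Every term except the one involving \(\beta_j\) cancels:

\[ OR = \frac{\mathrm{odds}_{Y_t = 1 | X_j = x+1}}{\mathrm{odds}_{Y_t = 1 | X_j = x}} = \frac{e^{\beta_0 + \dots + \beta_j (x+1) + \dots + \beta_p X_{tp}}}{e^{\beta_0 + \dots + \beta_j x + \dots + \beta_p X_{tp}}} = e^{\beta_j} \]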

Thus, \(e^{\beta_j}\) is the multiplicative change in the odds of event \(Y_t = 1\), for every one unit increase in \(X_j\).
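This result can be checked on a fitted model: the ratio of the model's predicted odds at \(X_j = x + 1\) and at \(X_j = x\) equals \(e^{\hat\beta_j}\), whatever the value of \(x\). A sketch using R's built-in `mtcars` data with the illustrative model `vs ~ mpg + am`; `fitted_odds()` is a hypothetical helper:

```r
model <- glm(vs ~ mpg + am, data = mtcars, family = "binomial")

# Hypothetical helper: odds of vs = 1 implied by the fitted model
fitted_odds <- function(mpg, am) {
  p <- predict(model, newdata = data.frame(mpg = mpg, am = am), type = "response")
  unname(p / (1 - p))
}

# One-unit increase in mpg, holding am fixed, at two different starting points
fitted_odds(21, 0) / fitted_odds(20, 0)
fitted_odds(26, 1) / fitted_odds(25, 1)
exp(coef(model))   # the "mpg" entry equals both ratios above
```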

| | Continuous \(X_{j}\) | Binary \(X_{j}\) |
|---|---|---|
| \(e^{\hat\beta_j}\) | For every one-unit increase in \(X_{j}\), the odds of a unit being in category \(Y_t=1\) are expected to change by a factor of \(e^{\hat\beta_j}\), holding all other explanatory variables constant. | The odds of a unit being in category \(Y_t=1\) are \(e^{\hat\beta_j}\) times higher in category \(X_{j} = 1\) than in category \(X_{j} = 0\), holding all other explanatory variables constant. |
| \(e^{\hat\beta_0}\) | The odds of event \(Y_t = 1\) when all explanatory variables equal 0 are \(e^{\hat\beta_0}\). | The odds of event \(Y_t = 1\) for category \(X_j = 0\) are \(e^{\hat\beta_0}\), when all other explanatory variables equal 0. |
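For a single binary \(X_j\), both rows of the table can be read straight off a 2 × 2 table of counts: \(e^{\hat\beta_0}\) is the sample odds of \(Y_t = 1\) in category \(X_j = 0\), and \(e^{\hat\beta_j}\) is the ratio of the sample odds in the two categories. A sketch with `mtcars`, again using `vs` as \(Y\) and `am` as \(X_j\) purely for illustration:

```r
model <- glm(vs ~ am, data = mtcars, family = "binomial")

tab <- table(am = mtcars$am, vs = mtcars$vs)   # 2 x 2 table of counts
odds_am0 <- tab["0", "1"] / tab["0", "0"]     # sample odds of vs = 1 when am = 0
odds_am1 <- tab["1", "1"] / tab["1", "0"]     # sample odds of vs = 1 when am = 1

exp(coef(model))   # (Intercept) matches odds_am0; am matches the odds ratio below
c(odds_am0 = odds_am0, odds_ratio = odds_am1 / odds_am0)
```

The agreement holds up to `glm()`'s convergence tolerance, because the maximum likelihood estimates are computed iteratively.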


Ordinal Logistic Regression


Multinomial Logistic Regression


Poisson Regression


Negative Binomial Regression


Model Specification Issues