Calculus and Probability

This chapter is a reference for the essentials of Mathematics for Political Science, mainly consisting of calculus. This reference is not meant to teach these concepts. If you want a good math course for political scientists/social scientists, see David Siegel’s youtube channel (scroll through the playlists).

Use the right sidebar for easy navigation.

Derivatives and Partial Derivatives

Derivatives $f'(x)$ calculate the slope of the original function $f(x)$ at a point.

Formal Definition of a Derivative

The slope between two points on a function $y = f(x)$ can be defined as follows:

\[ \frac{\Delta f(x)}{\Delta x} = \frac{f(x_2) - f(x_1)}{x_2 - x_1} = \frac{f(x+h) - f(x)}{x + h -x} = \frac{f(x+h) - f(x)}{h} \] We want the slope at one point, not at two. So we can make the distance between the two points $h$ approach zero:

\[ f'(x) = \lim\limits_{h \rightarrow 0}\frac{f(x+h)-f(x)}{h} \]

This is the formal definition of a derivative, and any derivative can be calculated this way.

Some quick derivative rules (where $c$ is some constant, and $k$ is some constant exponent):

\[ \begin{array}{|c|c|} f(x) & f'(x) \\ \hline c & 0 \\ cx & c \\ cu(x) & c'u(x) \\ x^k & kx^{k-1} \\ u[v(x)] & u'[v(x)] \cdot v'(x) \\ u(x)+v(x) & u'(x) + v'(x) \\ u(x)v(x) & u'(x)v(x) + v'(x)u(x) \\ \frac{u(x)}{v(x)} & \frac{u'(x)v(x) + v'(x)u(x)}{(v(x))^2} \\ \end{array} \qquad \qquad \begin{array}{|c|c|} f(x) & f'(x) \\ \hline e^x & e^x \\ e^{u(x)} & e^{u(x)} \cdot u'(x) \\ c^x & \ln (c) \cdot c^x \\ \ln(x) & \frac{1}{x} \\ \ln(x^k) & \frac{k}{x} \\ \ln(u(x)) & \frac{u'(x)}{u(x)} \\ \sin(x) & \cos(x) \\ \cos(x) & -\sin(x) \end{array} \]

Example 1 (Product and Power Rule)

Let us find the derivative of $f(x) = (x^3)(2x^4)$. There are two ways to do this.

Method 1: Product Rule

Product rule says that $[f(x)g(x)]' = f'(x)g(x) + g'(x)f(x)$. We can define $f(x):=x^3$ and $g(x) := 2x^4$.

Using these definitions, we can find the derivative of our defined $f(x)$ and $g(x)$:

$f'(x) = 3x^2$ using power rule.
$g'(x) = 8x^3$ using the rule $[cf(x)]' = f'(x)$ and power rule.

Using this info, we can now put it into the product rule form and simplify:

\[ \begin{align} f'(x) & = (3x^2)(2x^4) + (8x^3)(x^3) && \text{(product rule)} \\ & = 6x^6 + 8x^6 && \text{(multiply with properties of exponents)} \\ & = 14x^6 \end{align} \]

Method 2: Power Rule

Instead of using product rule (which I did more as a demonstration), we can just simplify our original expression and use power rule:

\[ \begin{align} f(x) & = (x^3)(2x^4) \\ f(x) & = 2x^7 && \text{(multiply with property of exponents)} \\ f'(x) & = 2[x^7]' && ([cf(x)]' = cf'(x)) \\ f'(x) & = 2(7x^6) && \text{(power rule)} \\ f'(x) & = 14x^6 && \text{(multiply out)} \end{align} \]

Example 2 (Chain Rule)

Let us find the derivative of $f(x) = (3x^2 + 5x - 7)^6$.

This is a composite function, so we can use chain rule $[f(g(x))]' = f'(g(x)) \cdot g'(x)$. Let us define $f(x) := x^6$ and $g(x) := 3x^2 + 5x - 7$.

We can quickly determine that:

$f'(x) = 6x^5$ by power rule.
$g'(x) = 6x + 5$ by power rule and sum rule.

Now, let us put it into the form of chain rule and simplify:

\[ \begin{align} f'(x) & = 6(3x^2 + 5x - 7)^5 \cdot (6x + 5) && \text{(chain rule)} \\ f'(x) & = 36x(3x^2 + 5x - 7)^5 + 30(3x^2 + 5x - 7)^5 && \text{(multiply out)} \end{align} \]

Partial derivatives are derivatives in functions with multiple variables $x_1, \dots, x_n$, but we only care about one of these variables $x_j$. We denote them with the partial sign $\frac{\partial y}{\partial x_j}$. Essentially, we just pretend the other variables are constants (treat them like you would the number 5), and take the derivative as normal.

Partial Derivatives Example

Let us find the partial derivative in respect to $x$ of function $f(x,z) = 5x^2z^3 + 2x + 3z +5$.

Since we are taking the partial derivative in respect to $x$, we can pretend that $z$ is just some constant, like the number 5:

\[ \begin{align} \frac{\partial f(x,z)}{\partial x} & = \frac{\partial}{\partial x}[5x^2z^3 + 2x + 3z +5]\\ & = \frac{\partial}{\partial x}[5x^2z^3] + \frac{\partial}{\partial x}[2x] + \frac{\partial}{\partial x}[3z] + \frac{\partial}{\partial x}[5] && \text{(by sum rule)} \\ & = \frac{\partial}{\partial x}[5x^2z^3] + \frac{\partial}{\partial x}[2x] + 0 + 0 && \text{(derivative of constants is 0)} \\ & = \frac{\partial}{\partial x}[5x^2z^3] + 2 && ([cx]' = c)\\ & = 5z^3 \frac{\partial}{\partial x}[x^2] + 2&& ([cf(x)]' = cf'(x)) \\ & = 5z^3[2x] + 2 && \text{(power rule)} \\ & = 10z^3x + 2 && \text{(multiply out)} \end{align} \]

The Gradient

Unconstrained Optimisation

Maxima and Minima are extrema of a function $f(x)$. A local maximum/minimum is some $x$ value that corresponds with the maximum/minimum $f(x)$ value in its neighbourhood (in nearby $x$ values). A global maximum/minimum is some $x$ value that corresponds with the maximum/minimum $f(x)$ value within the entire domain of the function.

Maxima and Minima (both global and local) always occur at a stationary point. A stationary point is defined as a point where the first derivative $f'(x) = 0$. Maxima occur at a stationary point when a function is concave, i.e. the second derivative $f''(x) < 0$. Minima occur at a stationary point where a function is convex, i.e. the second derivative $f''(x) > 0$.

Thus, to optimise a function (find the $x$ value that produces the minima/maximima), our procedure is the following:

Find $f'(x)$. Then, set $f'(x) = 0$ (first order condition). Solve for the $x$ values that makes $f'(x) = 0$ true (there can be multiple). Let us denote those $x$ values as $x_s$.
Find $f''(x)$. Now, plug in our solved $x_s$’s into our second derivative $f''(x_s)$. If it is negative, we have a maxima. If it is positive, we have a minima.
If we are interested in the actual function value at the extrema, we can plug them back into the original function, $f(x_s)$.

If we want to find the maximum within some sub-domain of the function $[a, b]$, we should also plug in $f(a)$ and $f(b)$ into the original function to see if these edge-cases of the sub-domain are higher/lower than our calculated maxima/minima.

Example of Unconstrained Maximisation

Constrained Optimisation

Indefinite and Definite Integrals

In evaluating indefinite integrals, we always include a $+C$ (integration constant). Common rules include:

\[ \begin{array}{|c|c|} f(x) & \int f(x) = F(x) \\ \hline c & cx + C\\ c u(x) & c \cdot \int u(x)dx \\ u(x) + v(x) & \int u(x)dx + \int v(x)dx \\ x^k & \frac{x^{k+1}}{k+1} + C \\ [u(x)]^k u'(x) & \frac{1}{k + 1}[u(x)]^{k+1} + C \end{array} \qquad \qquad \begin{array}{|c|c|} f(x) & \int f(x) = F(x) \\ \hline x^{-1} & \ln (x) + C \\ \frac{u'(x)}{u(x)} & \ln(f(x)) + C \\ e^x & e^x + C\\ e^{u(x)}u'(x) & e^{u(x)} + C \\ \cos(x) & \sin(x) + C \\ \end{array} \]

Example

Let us find the integral of $f(x) = e^xe^{e^x} + 3x^2$.

First, we know that $\int [u(x) v(x)]dx = \int u(x)dx + \int v(x)dx$. Thus, we can split our above into $u(x) := e^e^{e^x}$ and $v(x) := 3x^2$.

First, let us find $\int u(x)dx$. We know the rule from above $\int [e^{u(x)} u'(x)]dx = e^{u(x)} + C$.

We can rearrange our original part into the rule form as $\int e^{e^x}e^xdx$. Since this now fits the rule, we know $u(x)dx = e^{ex} + C $.

Now, let us find $\int v(x)dx$. Using reverse power rule $\int x^k dx = \frac{x^{k+1}}{k+1} + C$, we can quickly deduce that $\int v(x)dx = 3(\frac{1}{3}x^3) = x^3 + C$.

Thus, putting the two back together, we get:

\[ \int \left[e^x e^{e^x} + 3x^2 \right]dx = e^{e^x} + x^3 + C \]

There are two more complex ways to solve integrals as well:

\[ \begin{align} & \int u[v(x)]v'(x)dx = U(v(x)) + C && \text{integration by substitution} \\ & \int u(x) v'(x)dx = u(x) v(x) - \int u'(x) v(x)dx && \text{integration by parts} \end{align} \]

Example of Integration by Substitution

Let us find the integral of $f(x) = (x+4)^5$.

We know integration by substitution says:

\[ \int u[v(x)]v'(x)dx = U(v(x)) + C \]

For integration by substitution, we want to find some $u(x)$ and $v(x)$ we can define to rearrange our original equation, where $\int u(x)dx$ is not too difficult to find.

For this example, we could define $u(x) := x^5$ and $v(x) := x +4$. These substitutions will make the $u[v(x)]$ look like $(x+4)^5$. However, if we take a look at the integration by substitution, we will note that our left hand side is in the form $u[v(x)] v'(x)$. Luckily, our originally definitions still work, since $v'(x) = 1$.

Now, we have this form, we can use the integration by substitution formula, which tells us the integra will be $U(v(x)) + C$.

We see that we have to find $U(x) = \int u(x)dx$. We know $u(x) := x^5$, so we can use reverse power rule $\int x^k dx = \frac{x^{k+1}}{k+1} + C$, and see this is $\frac{1}{6}x^6 + C$.

Now, we need to fund $U[v(x)]$, so we insert $v(x) = x + 4$ inside to get our final answer:

\[ \int (x+4)^5 dx = \frac{1}{6}(x+4)^2 + C \]

Example of Integration by Parts

Let us find the integral of $f(x) = xe^xdx$.

We know integration by parts says:

\[ \int u(x) v'(x)dx = u(x) v(x) - \int u'(x) v(x)dx \]

For integration by parts, we want to find some $u(x)$ and $v'(x)$ we can define to rearrange our original equation, where $\int u'(x)v(x)dx$ is not too difficult to find.

We can define $u(x) := x$, and $v'(x) := e^x$. This means our left side of the rule $u(x)v'(x) = xe^x$ as we need it to be.

Now, we know to solve this by integration by parts, we need to find $\int u'(x)v(x)$. First, we know $u'(x) = 1$. Thus, we need to find $\int 1v(x)$. We know $\int e^x = e^x$, so $\int u'(x)v(x) = e^x$.

Now, we have all the parts we need to put into our integraion by parts formula for the answer:

\[ \int xe^x dx = x \cdot e^x - e^x + C \]

Definite integrals are notated $\int_a^b f(x)$. They denote the area under the curve $f(x)$ between points $a$ and $b$. The Fundamental Theorem of Calculus allows us to find the value of definite integrals.

\[ \int\limits_a^b f(x)dx = F(b) - F(a) \]

This is a useful property for calculating the area under distributions, which are also probabilities.

Basic Set Theory

A set is a collection of distinct objects (generally numbers). We notate a set with a capital letter $A$, and an element/object within a set with a lowercase $a$. The most common sets you will see include:

$\mathbb R$: The set of all real numbers (not imaginary)
$\mathbb Z$: The set of all integers {…, -1, 0, 1, 2, …}.
$\mathbb N$: The set of all natural numbers {1,2,3,…}.
$\omega$ The set of all possible outcomes of some event. This depends on the event, but is quite widely used in probability.
$\varnothing$: An empty set with no elements.

We can define sets by their elements. Below is set $A$ with 3 elements $a, b, c$:

\[ A = \{a, b, c \} \]

We can also define sets by some interval/ranges/conditions. Below, we define set $A$ as some elements $x$ such that ($:$) $x$ is in between -1 and 3, and $x$ is in all real numbers $\mathbb R$.

\[ A = \{x : -1 < x < 3, x \in \mathbb R\} \]

An element $a$ in set $A$ is denoted $a \in A$. If element $a$ is not in set $A$, we say $a \notin A$. You will also see the notation $a \in [c, d]$ or $a \in (c, d)$, which represents element $a$ is in the interval between $c$ and $d$.

The intersection between two sets $A$ and $B$, denoted $A \cap B$, is a new set of only the elements that are both in $A$ and $B$ at the same time. The union between two sets $A$ and $B$, denoted $A \cup B$, is a set of elements that are either in $A$ or $B$ or both. It is the combination of elements of both sets.

The notation $A \subset B$ means $A$ is a subset of $B$. This means that all elements of $A$ are present in $B$ as well, and $A$ has less elements than $B$ does.

You will also see the set complement $A^c$, which refers to all elements that are not in set $A$ but are a part of the set of possible elements (which will depend on the context).

Basics of Probability

An experiment is some process that has several different possible outcomes. The sample space $\Omega$ of the experiment is the set of all possible outcomes of the experiment.

An event $A$ is some subset of outcomes of the sample space, $A \in \Omega$. The probability of event $A$ occuring, that we notate as $P(A)$, is the chance of some subset of outcomes $A$ occuring in relation to the set of all outcomes $\Omega$.

Let us have two events $A$ and $B$. The following probabilities are:

$P(A \cap B)$: the probability that $A$ and $B$ both happen. This is called a join probability.
$P(A \cup B)$: the probability that $A$ or $B$ occurs - at least one occurs.
$P(A^c)$: the probability that not-$A$ occurs - so the probability that anything other than $A$ occurs. By definition, $P(A^c) = 1 - P(A)$.

Probabilities have a few axioms:

For any event $A$, there must be a non-negative probability: $P(A) ≥ 0$.
The probability of all possible outcomes (sample space) is 1: $P(\Omega) = 1$.

If we have mutually exclusive events (i.e. if $A$ occurs than $B$ cannot occur at the same time), then the probability either $A$ or $B$ occuring is:

\[ P(A \cup B) = P(A)+Pr(B), \quad \text{more than 2 events}: P\left(\bigcup\limits_{i=1}^n A_i\right) = \sum\limits_{i=1}^n P(A_i) \]

If we have independent events (i.e. if $A$ occurs, that does not affect the probability of $B$ occuring), then the probability that both $A$ and $B$ occuring is:

\[ P (A \cap B) = P(A) P(B), \quad \text{more than 2 events}: P\left(\bigcap\limits_{i=1}^n A_i \right) = \prod\limits_{i=1}^n P(A_i) \]

Bayes’ Rule

Conditional Probability is the probability of one event, given another has already occured. The conditional probability of $A$ given $B$ has occured is denoted $P(A|B)$:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

The law of Total Probability helps us calculate the probability of an event $A$, by considering different mutually exclusive events that partition the sample space $\Omega$.

For example, perhaps you want to calculate the overall smoking rate $A$. Assuming all humans can be divided into male or female, and no one can be both male and female, you can calculate the total smoking rate of the population, if you know the conditional probabilities of smoking for males and females, and their proportion in the population.

\[ P(A) = P(A|B_M) P(B_M) + P(A|B_F)P(B_F) \]

Where $B_M$ is male and $B_F$ is female.

Furthermore, we know that events $B$ and $B^c$ completely partition the sample space $\Omega$. Thus, we can also write the law of total probability as:

\[ P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) \]

Using the definition of conditional probability and the law of total probability, we can create Bayes’ Rule, one of the most important rules in statistics and probability:

\[ \begin{align} P(A|B) & = \frac{P(B|A)P(A)}{P(B)} \\ & = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)} && \text{by law of total probability} \end{align} \]

Deriving Bayes’ Rule

Let us start off with the definition of conditional probability:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

Let us solve for $P(A \cap B)$ to get:

\[ P(A\cap B) = P(A|B)Pr(B) \]

Now, let us take the opposite conditional probability:

\[ P(B|A) = \frac{P(B \cap A)}{P(A)} \]

Let us solve for $P(B \cap A)$:

\[ P(B\cap A) = P(B|A) P(A) \]

By the commutative property of sets, we know that:

\[ P(A \cap B) = P(B \cap A) \]

Thus, the following must also be true.

\[ P(A|B)P(B) = P(B|A)P(A) \]

Now, let us solve for $P(A|B)$:

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} \]

Thus, we have proved Bayes’ Rule. We can plug in the law of total probability below to expand.