The provided sources detail the measure-theoretic foundations of modern probability theory, a rigorous mathematical framework established primarily by Andrey Kolmogorov in 1933. This framework replaced heuristic approaches to probability to resolve paradoxes associated with continuous and infinite sample spaces.
Here is a brief explanation of the core concepts covered in the texts:
- Kolmogorov's Axioms: Probability is formalized using a probability space $(\Omega, \mathcal{F}, P)$. Here, $\Omega$ is the sample space (all possible outcomes), $\mathcal{F}$ is a $\sigma$-algebra (a collection of subsets of $\Omega$, called events, that contains $\Omega$ and is closed under complements and countable unions), and $P$ is a probability measure on $\mathcal{F}$. $P$ must satisfy three axioms: non-negativity ($P(A) \ge 0$), normalization ($P(\Omega) = 1$), and countable additivity (for pairwise disjoint events $A_1, A_2, \dots$, $P(\bigcup_n A_n) = \sum_n P(A_n)$).
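- Carathéodory's Extension Theorem: This theorem is the standard tool for rigorously assigning probabilities to subsets of an uncountable space, such as intervals on the real line. It guarantees that a "pre-measure" defined on a ring of simple sets (for example, finite disjoint unions of half-open intervals $(a, b]$, measured by total length) extends to a measure on the $\sigma$-algebra generated by that ring, and that the extension is unique when the pre-measure is $\sigma$-finite. This is the mechanism used to construct the Lebesgue measure.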
- Lebesgue Integration: In this framework, random variables are formally defined as measurable functions mapping $\Omega$ to the real numbers, and the expectation (average) of a random variable is its Lebesgue integral with respect to $P$, $E[X] = \int_\Omega X \, dP$. The introductory Riemann integral partitions the domain and fails for highly discontinuous functions (such as the indicator of the rationals); the Lebesgue integral instead partitions the function's range and measures the set on which each range of values is taken. This treats discrete sums and continuous integrals as cases of one construction and yields limit theorems (e.g., Monotone Convergence, Dominated Convergence) that justify swapping limits and integrals under explicit hypotheses: a monotone sequence of non-negative functions, or a sequence dominated by an integrable function.
- Conditional Expectation: Traditional conditional probability, defined as a ratio $P(A|B) = P(A \cap B)/P(B)$, breaks down if the conditioning event $B$ has probability zero. Measure theory solves this via the Radon-Nikodym theorem, redefining the conditional expectation $E[X|\mathcal{G}]$ of an integrable $X$ as a $\mathcal{G}$-measurable random variable, unique up to almost-sure equality, that satisfies $\int_G E[X|\mathcal{G}] \, dP = \int_G X \, dP$ for every $G \in \mathcal{G}$. Geometrically, when $X \in L^2$, it is the orthogonal projection of $X$ onto the subspace of $\mathcal{G}$-measurable square-integrable variables, i.e., the best mean-square prediction of $X$ given the information $\mathcal{G}$.
- The One-Way Fubini Property: When modeling a continuum of independent random variables (e.g., a continuum of economic agents facing idiosyncratic risks), joint measurability with respect to the usual product $\sigma$-algebra fails except in trivial cases. To integrate these processes properly, researchers utilize a "one-way Fubini extension," an extended probability space that restricts the order of integration but still allows individual independent shocks to cancel out at the macroscopic level.
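
Several of the ideas above (the axioms, expectation computed by partitioning the range, and conditional expectation as a projection) can be checked by hand on a finite sample space. The following is a minimal sketch, not taken from the sources: the fair-die model, the choice of $X(\omega) = (\omega - 3)^2$, the odd/even partition, and all function names are illustrative assumptions. On a finite space with positive-probability blocks, conditional expectation reduces to block averages, so no Radon-Nikodym machinery is needed here.

```python
from fractions import Fraction

# Hypothetical toy model (an assumption for illustration): one fair die roll.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(A) for an event A given as a set of outcomes (finite additivity)."""
    return sum((P[w] for w in event), Fraction(0))

def X(w):
    """An illustrative random variable; it takes some values more than once."""
    return (w - 3) ** 2

def expectation(f):
    """E[f] as a sum over outcomes (partitioning the domain)."""
    return sum((f(w) * P[w] for w in omega), Fraction(0))

def expectation_by_range(f):
    """E[f] computed by partitioning the range:
    group outcomes by the value f takes, then sum value * P(f == value)."""
    values = {f(w) for w in omega}
    return sum(
        (v * prob({w for w in omega if f(w) == v}) for v in values),
        Fraction(0),
    )

# Information G: we only learn whether the roll is odd or even.
partition = [{1, 3, 5}, {2, 4, 6}]

def cond_exp(f, blocks):
    """E[f | G] for G generated by a finite partition whose blocks all have
    positive probability. The result is constant on each block."""
    table = {}
    for block in blocks:
        avg = sum((f(w) * P[w] for w in block), Fraction(0)) / prob(block)
        for w in block:
            table[w] = avg
    return lambda w: table[w]

Y = cond_exp(X, partition)

# Normalization: P(Omega) = 1.
print(prob(set(omega)))          # 1
# Both ways of computing the expectation agree.
print(expectation(X))            # 19/6
print(expectation_by_range(X))   # 19/6
# Defining property: X and E[X|G] integrate to the same value on each block,
# which is also the orthogonality of the residual X - E[X|G] to indicators.
for block in partition:
    residual = sum(((X(w) - Y(w)) * P[w] for w in block), Fraction(0))
    print(sorted(block), Y(min(block)), residual)
```

Exact rational arithmetic via `fractions.Fraction` is used so that the equalities hold exactly and are not subject to floating-point rounding. The zero-probability conditioning problem mentioned above does not arise in this finite example; it appears only once blocks of probability zero are allowed.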