# Rademacher's theorem

This is a set of notes I prepared for a seminar. The talk abstract was as follows:

Every set is nearly a finite union of intervals, every function is nearly continuous, every convergent sequence of functions is nearly uniformly convergent, every pancake is nearly a waffle - but is every function nearly differentiable? Come see how Lipschitz continuity can make the world a better place.

## 1. Rademacher’s Theorem

Recall that an **outer measure** on $X$ is a nonnegative
extended-real-valued function $\mu$ on
$\mathscr{P}(X)$ with $\mu(\varnothing) = 0$ that is monotone and countably subadditive. Following
**Carathéodory’s criterion**, we shall call
a set $A \subseteq X$ **measurable** if

$$\mu(E) = \mu(E \cap A) + \mu(E \cap A^c)$$

for all $E \subseteq X$. The outer measure of a countable disjoint union
of measurable sets is the sum of the outer measures of the individual sets.
Henceforth, we shall drop **outer** and call any such $\mu$
a **measure**.

A measure $\mu$ on a topological space $X$ is **Borel regular**
if all Borel sets are measurable and every subset of $X$ is contained
in a Borel set of the same measure. The standard measure on
$\mathbb{R}^n$ is the **$n$-dimensional Lebesgue measure**
$\mathscr{L}^n$, which is the unique translation-invariant Borel regular
measure on $\mathbb{R}^n$ with the normalization $\mathscr{L}^n([0,1]^n) = 1$.

There are a number of theorems that establish the $\mathscr{L}^n$-almost
everywhere differentiability of a function $f:\mathbb{R}^n \to \mathbb{R}^m$. These
theorems are typically proved using techniques from **geometric measure
theory**, which studies the geometric properties of subsets of Euclidean
spaces and measures on them. Today’s talk will focus on **Rademacher’s
theorem**, which establishes the almost-everywhere differentiability of
Lipschitz functions:

**Definition.** A function $f:\mathbb{R}^n \to \mathbb{R}^m$ is **Lipschitz** if there exists a nonnegative real number $M$ such that

$$|f(x)-f(y)| \leq M|x-y|$$

for all $x,y \in \mathbb{R}^n$. The **Lipschitz constant** $\operatorname{Lip} f$ is the infimum of all such $M$.

**Theorem 1** (Rademacher, 1919). If $f:\mathbb{R}^n \to \mathbb{R}^m$ is Lipschitz, then $f$ is differentiable $\mathscr{L}^n$-almost everywhere on $\mathbb{R}^n$.
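As a quick numerical illustration (a sketch, not part of the original talk), the absolute value function on $\mathbb{R}$ is Lipschitz with constant $1$ and fails to be differentiable only at the origin, a set of measure zero. The helper names `lipschitz_estimate` and `one_sided_quotients` and the sampling grid are assumptions made for this example.

```python
import itertools


def lipschitz_estimate(f, points):
    """Largest difference quotient |f(x) - f(y)| / |x - y| over sampled pairs.

    This is only a lower bound for Lip f, since it inspects finitely many pairs.
    """
    best = 0.0
    for x, y in itertools.combinations(points, 2):
        best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best


def one_sided_quotients(f, x, h):
    """Right and left difference quotients of f at x with step h > 0."""
    right = (f(x + h) - f(x)) / h
    left = (f(x - h) - f(x)) / (-h)
    return right, left


# Sample grid on [-1, 1]; the grid is an arbitrary choice for illustration.
grid = [k / 50 for k in range(-50, 51)]
lip_abs = lipschitz_estimate(abs, grid)

# At 0 the one-sided quotients of |x| disagree, so the derivative does not exist there.
right_at_zero, left_at_zero = one_sided_quotients(abs, 0.0, 1e-6)
print(lip_abs, right_at_zero, left_at_zero)
```

The sampled quotients never exceed $1$, while the two one-sided quotients at the origin have opposite signs.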

## 2. Bounded Variations

The one-dimensional Rademacher’s theorem is a special case of differentiation theory on $\mathbb{R}$, which is a standard real-analysis topic. Let us briefly review the relevant facts.

Recall that a curve $\gamma$ in $\mathbb{R}^2$ with a continuous parametrization
$z(t) = (x(t),y(t))$ on $[a,b]$ is **rectifiable** if there exists
a nonnegative real number $M$ such that

$$\sum_{j=1}^N |z(t_j) - z(t_{j-1})| \leq M$$

for every partition $a = t_0 < \cdots < t_N = b$. The **length** of
a rectifiable curve $\gamma$ is the supremum of all such sums, or
equivalently the infimum of all such $M$. It is well-known that
$\gamma$ is rectifiable if and only if $x(t)$ and $y(t)$ are of
bounded variation, which we define as follows:

**Definition.** $F:[a,b] \to \mathbb{R}$ is of **bounded variation** if there exists a nonnegative real number $M$ such that

$$\sum_{j=1}^N |F(t_j) - F(t_{j-1})| \leq M$$

for every partition $a = t_0 < \cdots < t_N = b$. The **total variation** $T_F([a,b])$ of $F$ is the supremum of all such sums, or equivalently the infimum of all such $M$.
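To make the definition concrete, here is a small sketch that evaluates the partition sums for $F = \sin$ on $[0,2\pi]$, whose total variation is $4$ (it rises by $1$, falls by $2$, and rises by $1$ again). The function names `variation_sum` and `uniform_partition` are hypothetical helpers introduced for this example.

```python
import math


def variation_sum(F, partition):
    """Sum of |F(t_j) - F(t_{j-1})| over consecutive points of a partition."""
    return sum(abs(F(t1) - F(t0)) for t0, t1 in zip(partition, partition[1:]))


def uniform_partition(a, b, n):
    """Partition a = t_0 < ... < t_n = b into n equal pieces."""
    return [a + (b - a) * j / n for j in range(n + 1)]


# A coarse partition underestimates the total variation; a fine one approaches it.
coarse = variation_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 3))
fine = variation_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 4000))
print(coarse, fine)
```

Each partition sum is a lower bound for $T_F([a,b])$, so the value for the fine partition can only be closer to the supremum than the coarse one.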

As it turns out, functions of bounded variation are differentiable almost everywhere:

**Lemma 2.** If $F:[a,b] \to \mathbb{R}$ is of bounded variation, then $F$ is differentiable $\mathscr{L}^1$-almost everywhere on $[a,b]$.

For a proof, see [SS05] Chapter 3, Theorem 3.4 or [Fol99] Theorem 3.27. Rudin, of course, leaves it as an exercise: see [Rud87] Exercise 13 in Chapter 7 if you’d like to try it yourself.

To connect the above lemma to Lipschitz functions, we need the notion of bounded variation on $\mathbb{R}$:

**Definition.** $F:\mathbb{R} \to \mathbb{R}$ is of **bounded variation** if there exists a nonnegative real number $M$ such that

$$T_F([a,b]) \leq M$$

for every interval $[a,b] \subseteq \mathbb{R}$.

A simple modification of the proof yields the following:

**Corollary 3.** If $F:\mathbb{R} \to \mathbb{R}$ is of bounded variation, then $F$ is differentiable $\mathscr{L}^1$-almost everywhere on $\mathbb{R}$.

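A Lipschitz function need not be of bounded variation on all of $\mathbb{R}$ (consider $f(x) = x$), but it is of bounded variation on every bounded interval, since $T_f([a,b]) \leq (\operatorname{Lip} f)(b-a)$. Applying Lemma 2 on $[-k,k]$ for each $k \in \mathbb{N}$ and taking the countable union of the exceptional null sets, we now conclude:

**Corollary 4** (One-dimensional Rademacher's theorem). If $f:\mathbb{R} \to \mathbb{R}$ is Lipschitz, then $f$ is differentiable $\mathscr{L}^1$-almost everywhere on $\mathbb{R}$.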
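The link between the Lipschitz condition and variation can be checked numerically. The sketch below uses a hypothetical example, $f(x) = |\sin x| + x/2$, which is Lipschitz with constant at most $M = 3/2$, and verifies that its partition sums over $[-k,k]$ stay below $M \cdot 2k$. The function and the choice of intervals are assumptions made for illustration.

```python
import math


def variation_sum(F, partition):
    """Sum of |F(t_j) - F(t_{j-1})| over consecutive points of a partition."""
    return sum(abs(F(t1) - F(t0)) for t0, t1 in zip(partition, partition[1:]))


def f(x):
    """A Lipschitz function with Lip f <= 1.5 (|sin| contributes 1, x/2 contributes 1/2)."""
    return abs(math.sin(x)) + 0.5 * x


M = 1.5
results = []
for k in (1, 10, 100):
    a, b = -float(k), float(k)
    partition = [a + (b - a) * j / 2000 for j in range(2001)]
    # Each entry records (k, partition sum on [-k, k], the bound M * (b - a)).
    results.append((k, variation_sum(f, partition), M * (b - a)))

for k, variation, bound in results:
    print(k, variation, bound)
```

The bound $M(b-a)$ depends on the length of the interval, which is why differentiability is obtained one bounded interval at a time.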

## 3. Hausdorff Measures

We shall also need a way of assigning $m$-dimensional measures
on subsets of $\mathbb{R}^n$, when $m<n$. To this end, we recall that
the **diameter** of a set $E \subseteq \mathbb{R}^n$ is

$$\operatorname{diam}(E) = \sup_{x,y \in E} |x-y|.$$

Recall also that the **distance** between two subsets $E$ and $F$ of
$\mathbb{R}^n$ is

$$\operatorname{dist}(E,F) = \inf_{\substack{x \in E \\ y \in F}} |x-y|.$$
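For finite point sets both quantities reduce to a maximum or minimum over finitely many pairs, which the following sketch computes. The sets `square` and `shifted` are made-up examples, and the function names mirror the notation above.

```python
import itertools
import math


def diam(E):
    """Diameter of a finite point set: the largest pairwise distance."""
    return max((math.dist(x, y) for x, y in itertools.combinations(E, 2)), default=0.0)


def dist(E, F):
    """Distance between two finite point sets: the smallest cross distance."""
    return min(math.dist(x, y) for x in E for y in F)


# Corners of the unit square, and two points to its right.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(3.0, 0.0), (4.0, 0.0)]

print(diam(square), dist(square, shifted))
```

The diameter of the square's corner set is the length of its diagonal, and the distance to the shifted set is attained between $(1,0)$ and $(3,0)$.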

Before we give a definition of the Hausdorff measure, we review a particularly constructive way of defining the Lebesgue measure. The crucial ingredient is the following

**Lemma 5** (Whitney decomposition theorem). Every open set $O$ in $\mathbb{R}^n$ can be decomposed into a countable union of cubes whose interiors are disjoint. Furthermore, we can choose the cubes $(Q_j)_{j=1}^\infty$ satisfying

$$\operatorname{diam}(Q_j) \leq \operatorname{dist}(Q_j,O^c) \leq 4\operatorname{diam}(Q_j).$$

The proof of the first statement is easy and can be found in [SS05], Chapter 1, Theorem 1.4. The full proof can be found in [Ste70], Chapter VI §1.2.

We now use the Whitney decomposition theorem to construct the Lebesgue measure. Here we denote by $|Q_j|$ the standard $n$-dimensional volume of the cube $Q_j$.

**Definition** (The Lebesgue measure). The **$n$-dimensional Lebesgue measure** $\mathscr{L}^n$ on $\mathbb{R}^n$ is defined for every subset $E \subseteq \mathbb{R}^n$ by

$$ \mathscr{L}^n(E) = \inf_{O \supseteq E} \inf_{O = \bigcup Q_j} \sum_{j=1}^\infty |Q_j| $$

where the first infimum is taken over all open supersets of $E$ and the second infimum over all Whitney decompositions of $O$.

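In words, we approximate the measure of each set by the measure of its open supersets, whose measure is in turn approximated by their Whitney decompositions. That all Borel sets are $\mathscr{L}^n$-measurable is a consequence of the following lemma. Borel regularity then follows because $\mathscr{L}^n(E)$ is by definition the infimum of the measures of the open sets containing $E$, so that a suitable countable intersection of such open sets is a Borel set containing $E$ with the same measure.

**Lemma 6** (Metric outer measures are Borel). All Borel subsets of $\mathbb{R}^n$ are measurable with respect to a measure $\mu$ on $\mathbb{R}^n$ if and only if

$$ \mu(E \cup F) = \mu(E) + \mu(F) $$

for all subsets $E$ and $F$ of $\mathbb{R}^n$ with $\operatorname{dist}(E,F) > 0$.

The proof of the “only if” part is easy. For a proof of the “if” part, see [SS05], Chapter 6, Theorem 1.2 or [Fol99], Proposition 11.16.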

We now return to the task of assigning $m$-dimensional measures on the subsets of $\mathbb{R}^n$. The idea, due to Hausdorff, is to approximate the measure by $m$-dimensional balls:

**Definition.** Fix $m \in \mathbb{N}$, and let $\omega_m$ be the $m$-dimensional Lebesgue measure of the closed unit ball in $\mathbb{R}^m$. The **$m$-dimensional Hausdorff measure** $\mathscr{H}^m$ is defined for every subset $E \subseteq \mathbb{R}^n$ by

$$ \mathscr{H}^m(E) = \lim_{\delta \to 0} \inf_{\substack{E \subseteq \bigcup S_j \\ \operatorname{diam}(S_j) \leq \delta}} \sum_{j=1}^\infty \omega_m \left(\frac{\operatorname{diam}(S_j)}{2} \right)^m, $$

where the infimum is taken over all countable covers $(S_j)_{j=1}^\infty$ of $E$ by sets of diameter at most $\delta$.

Here we have chosen the normalization that will give us the identity $\mathscr{H}^n = \mathscr{L}^n$. Note that $\mathscr{H}^0$ is the counting measure. The one-dimensional Hausdorff measure $\mathscr{H}^1$ of a rectifiable curve is its length. In general, the $m$-dimensional Hausdorff measure of an $m$-manifold coincides with its surface measure, but we shall not pursue the idea in this talk.
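The effect of the normalization and of the exponent $m$ can be seen on the unit segment $[0,1]$. The sketch below evaluates the covering sum for one particular family of covers, $k$ intervals of length $1/k$, so it only gives an upper bound for the infimum in the definition. The names `omega`, `cover_sum`, and `segment_cover` are hypothetical helpers. The formula $\omega_m = \pi^{m/2}/\Gamma(m/2+1)$ is the standard volume of the unit ball.

```python
import math


def omega(m):
    """Volume of the closed unit ball in R^m: pi^(m/2) / Gamma(m/2 + 1)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def cover_sum(m, diameters):
    """Sum of omega_m * (diam/2)^m over the sets of a cover."""
    return sum(omega(m) * (d / 2) ** m for d in diameters)


def segment_cover(k):
    """Diameters of k intervals of length 1/k covering the unit segment [0, 1]."""
    return [1.0 / k] * k


# With m = 1 the sums stay at the length of the segment; with m = 2 they shrink to 0.
one_dim = [cover_sum(1, segment_cover(k)) for k in (1, 10, 100, 1000)]
two_dim = [cover_sum(2, segment_cover(k)) for k in (1, 10, 100, 1000)]
print(one_dim)
print(two_dim)
```

Since $\omega_1 = 2$, each interval of length $1/k$ contributes exactly $1/k$ when $m = 1$, whereas for $m = 2$ the total is $\pi/(4k)$, consistent with a segment having zero two-dimensional measure.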

## 4. Proof of Rademacher’s Theorem

We are now ready to present a proof of Rademacher’s theorem. The proof is taken from pages 101-102 of [Mat95].

Let $f:\mathbb{R}^n \to \mathbb{R}^m$ be a Lipschitz function. Since the differentiability of
$f$ is equivalent to the differentiability of the coordinate
functions $f_1,\ldots,f_m$, we may assume without loss of generality that
$m=1$. Recall that the **directional derivative** of $f$ at $x \in \mathbb{R}^n$
in the direction of $u \in \mathbb{S}^{n-1}$ is

$$D_u f(x) = \lim_{t \to 0} \frac{f(x+tu)-f(x)}{t},$$

provided that the limit exists. We contend that, for each fixed $u \in \mathbb{S}^{n-1}$, $D_u f(x)$ exists for $\mathscr{L}^n$-almost every $x \in \mathbb{R}^n$.

To prove the assertion, we let $B_u$ denote the collection of points $x \in \mathbb{R}^n$ at which $D_u f(x)$ does not exist and observe that

$$B_u = \left\lbrace x \in \mathbb{R}^n : \limsup_{t \to 0} \frac{f(x+tu)-f(x)}{t} - \liminf_{t \to 0} \frac{f(x+tu)-f(x)}{t} > 0 \right\rbrace.$$

Since the difference quotient is continuous at $t \neq 0$, the limit superior and the limit inferior thereof are measurable, whereby $B_u$ is measurable. Applying the one-dimensional Rademacher’s theorem to $t \mapsto f(x+tu)$, we find that

$$\mathscr{H}^1(B_u \cap \lbrace x + tu : t \in \mathbb{R}\rbrace) = 0$$

for each $x \in \mathbb{R}^n$. We now set $\chi_{B_u}$ to be the characteristic function of $B_u$ and write

$$\mathscr{L}^n(B_u) = \int_{\mathbb{R}^n} \chi_{B_u}(x) \, dx.$$

By the Fubini-Tonelli theorem, we may integrate along each line parallel to the unit vector $u$ and then integrate over the set of all such lines to conclude that $\mathscr{L}^n(B_u) = 0$.

Recall that the **gradient** of $f$ at $x \in \mathbb{R}^n$ is given by
the row vector

$$\nabla f(x) = \begin{bmatrix} D_1 f(x) & \cdots & D_n f(x) \end{bmatrix},$$

where $D_j f(x) = D_{e_j} f(x)$ are the partial derivatives in the standard basis $\lbrace e_1,\ldots, e_n \rbrace$ for $\mathbb{R}^n$. We claim that, for each $u \in \mathbb{S}^{n-1}$,

$$D_u f(x) = \nabla f(x) \cdot u$$

at $\mathscr{L}^n$-almost every $x \in \mathbb{R}^n$.

To this end, we fix $\varphi \in \mathscr{C}^\infty_c(\mathbb{R}^n)$ and observe that

$$\begin{align*} \int_{\mathbb{R}^n} \frac{f(x+hu)-f(x)}{h} \varphi(x) \, dx &= \frac{1}{h} \left( \int_{\mathbb{R}^n} f(x+hu) \varphi(x) \, dx - \int_{\mathbb{R}^n} f(x)\varphi(x) \, dx \right) \\ &= \frac{1}{h} \left( \int_{\mathbb{R}^n} f(x) \varphi(x-hu) \, dx - \int_{\mathbb{R}^n} f(x)\varphi(x) \, dx \right) \\ &= -\int_{\mathbb{R}^n} \frac{\varphi(x)-\varphi(x-hu)}{h} f(x) \, dx \end{align*}$$

for each $h \neq 0$. By the Lipschitz condition, we have the bound

$$\left| \frac{f(x+hu)-f(x)}{h} \varphi(x) \right| \leq \frac{\operatorname{Lip} f |hu|}{|h|} |\varphi(x)| \leq \operatorname{Lip} f |\varphi (x)|,$$

whence by the dominated convergence theorem

$$\begin{align*} \lim_{h \to 0} \int_{\mathbb{R}^n} \frac{f(x+hu)-f(x)}{h} \varphi(x) \, dx &= \int_{\mathbb{R}^n} \lim_{h \to 0} \frac{f(x+hu)-f(x)}{h} \varphi(x) \, dx \\ &= \int_{\mathbb{R}^n} D_u f(x) \varphi(x) \, dx \end{align*}$$

and

$$\begin{align*} \lim_{h \to 0} \int_{\mathbb{R}^n} \frac{f(x+hu)-f(x)}{h} \varphi(x) \, dx &= -\lim_{h \to 0} \int_{\mathbb{R}^n} \frac{\varphi(x)-\varphi(x-hu)}{h} f(x) \, dx \\ &= -\int_{\mathbb{R}^n} \lim_{h \to 0} \frac{\varphi(x)-\varphi(x-hu)}{h} f(x) \, dx \\ &= -\int_{\mathbb{R}^n} f(x) D_u \varphi(x) \, dx. \end{align*}$$

Since $\varphi(x)$ is $\mathscr{C}^1$ everywhere, we have

$$D_u\varphi(x) = \nabla \varphi(x) \cdot u$$

for all $x \in \mathbb{R}^n$. It thus follows that

$$\begin{align*} \int_{\mathbb{R}^n} D_u f(x) \varphi(x) \, dx &= -\int_{\mathbb{R}^n} f(x) D_u \varphi(x) \, dx \\ &= -\int_{\mathbb{R}^n} f(x) \left(\nabla \varphi(x) \cdot u \right) \, dx \\ &= -\int_{\mathbb{R}^n} f(x) \left(\sum_{j=1}^n D_j \varphi(x) u_j \right) \, dx \\ &= - \sum_{j=1}^n u_j \int_{\mathbb{R}^n} f(x) D_j \varphi(x) \, dx \\ &= \sum_{j=1}^n u_j \int_{\mathbb{R}^n} D_j f(x) \varphi(x) \, dx \\ &= \int_{\mathbb{R}^n} \left( \sum_{j=1}^n D_j f(x) u_j \right) \varphi(x) \, dx \\ &= \int_{\mathbb{R}^n} \nabla f(x) \cdot u \varphi(x) \, dx. \end{align*}$$

Since the identity holds for all $\varphi \in \mathscr{C}^\infty_c(\mathbb{R}^n)$, we conclude that

$$D_uf(x) = \nabla f(x) \cdot u$$

at $\mathscr{L}^n$-almost every $x \in \mathbb{R}^n$.
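The integration-by-parts identity $\int D_u f \, \varphi = -\int f \, D_u \varphi$ at the heart of this step can be sanity-checked in one dimension. The sketch below takes $f(x) = |x|$ and a shifted bump function for $\varphi$. The specific bump, the shift by $0.3$, and the midpoint rule are all assumptions chosen for illustration, and the two integrals agree only up to quadrature error.

```python
import math


def bump(t):
    """Smooth function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def bump_prime(t):
    """Derivative of bump, computed by the chain rule."""
    if abs(t) >= 1.0:
        return 0.0
    return bump(t) * (-2.0 * t) / (1.0 - t * t) ** 2


def phi(x):
    """Test function: the bump shifted so that it is not symmetric about 0."""
    return bump(x - 0.3)


def phi_prime(x):
    return bump_prime(x - 0.3)


def f(x):
    return abs(x)


def f_prime(x):
    """Derivative of |x| away from 0; the value at 0 does not affect the integral."""
    return 1.0 if x > 0 else -1.0


def midpoint_integral(g, a, b, n=30000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (j + 0.5) * h) for j in range(n))


# The support of phi is [-0.7, 1.3], so integrating over [-1, 2] captures everything.
lhs = midpoint_integral(lambda x: f_prime(x) * phi(x), -1.0, 2.0)
rhs = -midpoint_integral(lambda x: f(x) * phi_prime(x), -1.0, 2.0)
print(lhs, rhs)
```

The single point where $|x|$ fails to be differentiable has measure zero and so is invisible to both integrals.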

We now recall that $f$ is **differentiable** at $x \in \mathbb{R}^n$ if there
exists a matrix $Df(x)$ such that

$$\lim_{|v| \to 0} \frac{|f(x+v) - f(x) - Df(x) \cdot v|}{|v|} = 0.$$

We are now left with the task of establishing the $\mathscr{L}^n$-almost everywhere differentiability of $f$. Of course, we will have

$$Df(x) = \nabla f(x)$$

whenever the left-hand side exists.

Since $\mathbb{S}^{n-1}$ is compact, it is separable, and so we can find a countable dense subset $\lbrace u_1,u_2,\ldots \rbrace$ of $\mathbb{S}^{n-1}$. For each $j$, we let $A_j$ be the collection of $x \in \mathbb{R}^n$ at which $\nabla f(x)$ and $D_{u_j} f(x)$ exist and $D_{u_j} f(x) = \nabla f(x) \cdot u_j$. We set

$$A = \bigcap_{j=1}^\infty A_j$$

and observe that

$$\begin{align*} \mathscr{L}^n(\mathbb{R}^n \smallsetminus A) &= \mathscr{L}^n \left( \mathbb{R}^n \smallsetminus \bigcap_{j=1}^\infty A_j \right) \\ &= \mathscr{L}^n \left( \mathbb{R}^n \cap \left( \bigcap_{j=1}^\infty A_j \right)^c \right) \\ &= \mathscr{L}^n \left( \mathbb{R}^n \cap \bigcup_{j=1}^\infty A_j^c \right) \\ &\leq \mathscr{L}^n \left( \bigcup_{j=1}^\infty A_j^c \right) \\ &\leq \sum_{j=1}^\infty \mathscr{L}^n(A_j^c) \\ &= 0. \end{align*}$$

Our final claim is that $f$ is differentiable on $A$. To show this, we set, for every $x \in A$, $u \in \mathbb{S}^{n-1}$, and $h > 0$,

$$Q(x,u,h) = \frac{f(x+hu)-f(x)}{h} - \nabla f(x) \cdot u.$$

Given a fixed $x_0 \in A$, it suffices to show that

$$\lim_{h \to 0} Q(x_0,u,h) = 0$$

uniformly on $\mathbb{S}^{n-1}$. We first note that

$$\begin{align*} & |Q(x_0,u,h)-Q(x_0,u',h)| \\ =& \left|\frac{f(x_0+hu)-f(x_0+hu')}{h} - \nabla f(x_0) \cdot (u-u') \right| \\ \leq& \frac{\operatorname{Lip} f \, |h||u-u'|}{|h|} + |\nabla f(x_0)| |u-u'| \\ \leq& 2 (\operatorname{Lip} f) |u-u'|. \end{align*}$$

The last step uses $|\nabla f(x_0)| \leq \operatorname{Lip} f$: since $x_0 \in A$, we have $|\nabla f(x_0) \cdot u_j| = |D_{u_j} f(x_0)| \leq \operatorname{Lip} f$ for every $j$, and the $u_j$ are dense in $\mathbb{S}^{n-1}$.

Fix $\varepsilon>0$. By the generalized Heine-Borel theorem, we can cover $\mathbb{S}^{n-1}$ with finitely many $\varepsilon$-balls. Since $\lbrace u_1,u_2,\ldots \rbrace$ is dense in $\mathbb{S}^{n-1}$, we can find $N \in \mathbb{N}$ such that the $\varepsilon$-balls centered at $u_1,\ldots,u_N$ cover $\mathbb{S}^{n-1}$. By definition of $A$, we have $Q(x_0,u_j,h) \to 0$ as $h \to 0$ regardless of $j$. Therefore, we can find $\delta>0$ such that $|Q(x_0,u_j,h)| < \varepsilon$ for $0 < h < \delta$ and $1 \leq j \leq N$.

Now, for each $u \in \mathbb{S}^{n-1}$ and $0 < h < \delta$, we can find $1 \leq j \leq N$ such that $|u-u_j| < \varepsilon$. We then have

$$\begin{align*} |Q(x_0,u,h)| &\leq |Q(x_0,u,h)-Q(x_0,u_j,h)| + |Q(x_0,u_j,h)| \\ &< 2 (\operatorname{Lip} f) |u-u_j| + \varepsilon \\ &< (2 \operatorname{Lip} f + 1) \varepsilon, \end{align*}$$

which proves the claim. This completes the proof of Rademacher’s theorem.
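The uniform convergence of $Q(x_0,u,h)$ over the sphere can be observed numerically at a point of differentiability. The sketch below uses a hypothetical Lipschitz function $f(x,y) = |x| + \sin y$ on $\mathbb{R}^2$ at the point $x_0 = (1,0)$, where $\nabla f(x_0) = (1,1)$, and samples $360$ directions on the unit circle. The function, the point, and the sampling are assumptions made for this illustration.

```python
import math


def f(x, y):
    """A Lipschitz function on R^2 (hypothetical example): |x| + sin(y)."""
    return abs(x) + math.sin(y)


def Q(x0, grad, u, h):
    """Difference quotient of f at x0 in direction u minus the linear prediction grad . u."""
    quotient = (f(x0[0] + h * u[0], x0[1] + h * u[1]) - f(*x0)) / h
    return quotient - (grad[0] * u[0] + grad[1] * u[1])


x0 = (1.0, 0.0)
grad = (1.0, 1.0)  # gradient of f at x0: (sign(1), cos(0))

# Sample directions on the unit circle, one per degree.
directions = [
    (math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
    for k in range(360)
]

# Supremum of |Q| over the sampled directions, for shrinking step sizes h.
sup_errors = [
    max(abs(Q(x0, grad, u, h)) for u in directions)
    for h in (1e-1, 1e-2, 1e-3)
]
print(sup_errors)
```

The supremum over directions shrinks as $h$ decreases, which is the uniform statement that differentiability at $x_0$ requires.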

## References

- [Fed69] Herbert Federer, *Geometric Measure Theory*, Springer-Verlag, 1969.
- [Fol99] Gerald B. Folland, *Real Analysis: Modern Techniques and Their Applications*, second ed., John Wiley & Sons, 1999.
- [Mat95] Pertti Mattila, *Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability*, Cambridge University Press, 1995.
- [Mor09] Frank Morgan, *Geometric Measure Theory: A Beginner’s Guide*, fourth ed., Academic Press, 2009.
- [Rud87] Walter Rudin, *Real and Complex Analysis*, third ed., McGraw-Hill, 1987.
- [SS05] Elias M. Stein and Rami Shakarchi, *Real Analysis: Measure Theory, Integration, and Hilbert Spaces*, Princeton University Press, 2005.
- [Ste70] Elias M. Stein, *Singular Integrals and Differentiability Properties of Functions*, Princeton University Press, 1970.