Fourier analysis on boolean hypercubes

March 16, 2014

In this post, we present an introduction to the analysis of boolean functions, focusing on developing the basic theory of Fourier analysis on boolean hypercubes.

The reader is assumed to be familiar with basic real analysis, functional analysis, probability theory, and group theory.

1. Introduction

One of the most frequently used tools in mathematics is the Fourier transform, which decomposes a given function into a superposition of more basic functions, often symmetric in nature. Classical Fourier analysis provides us with two versions of the Fourier transform: the Fourier coefficient operator that produces the Fourier coefficients

$$\hat{f}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-i n \pi} \, dx$$

of a $2\pi$-periodic function $f:[-\pi, \pi] \to \mathbb{C}$, and the Fourier transform operator that produces the Fourier transform

$$\hat{f}(\xi) = \int_{-\infty}^\infty f(x) e^{-2 \pi i x \xi} \, dx$$

of a non-periodic function $f:\mathbb{R} \to \mathbb{R}$. These are invertible, in the sense that we have the Fourier series

$$f(x) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e^{inx}$$

and the inverse Fourier transform

$$f(x) = \int_{-\infty}^\infty \hat{f}(\xi) e^{2 \pi i \xi x} \, d\xi.$$

Furthermore, both versions satisfy useful $L^2$-identities: Parseval’s theorem

$$\sum_{n \in \mathbb{Z}}|\hat{f}(n)|^2 = \frac{1}{2\pi} \int_{-\infty}^\infty |f(x)|^2 \, dx$$

and Plancherel’s theorem

$$\int_{-\infty}^\infty |f(x)|^2 \, dx = \int_{-\infty}^\infty |\hat{f}(\xi)| \, d\xi.$$

In what sense are these two versions instantiations of one unified concept, the Fourier transform? To make the connections clearer, we transport $2\pi$-periodic functions on $\mathbb{R}$ onto the circle group $\mathbb{T} = \lbrace z \in \mathbb{C} : |z| = 1\rbrace$ via the change-of-variables map $x \mapsto e^{ix}$. With the usual multiplication operation as group operation, $\mathbb{T}$ is an abelian group. Furthermore, the restriction of the Euclidean topology on $\mathbb{T}$ turns it into a compact topological group, i.e., the group operation map $G \times G \to G$ and the inverse map $G \to G$ are continuous. Carrying over the one-dimensional Lebesgue measure onto $\mathbb{T}$, we end up with a Haar measure $\sigma$ on $\mathbb{T}$, viz., $\sigma(z E) = \sigma(E)$ for each $z \in \mathbb{T}$ and every $\mu$-measurable subset of $\mathbb{T}$. Compare this with $\mathbb{R}$, a locally compact abelian group on which the Lebesgue measure is a Haar measure.

A standard result in the theory of locally compact abelian (LCA) groups is that each LCA group carries a unique Haar measure, up to a multiplicative constant. With this measure $\mu$ on a given LCA group $G$, we develop Fourier analysis on $G$ as follows. Firstly, we consolidate the continuous characters of $G$, which are continuous functions $\chi: G \to \mathbb{C}$ such that $\chi(gh) = \chi(g)\chi(h)$ for all $g, h \in G$. The collection of all continuous characters of $G$ is called the Pontryagin dual group of $G$, denoted by $\hat{G}$. Note, for example, that $\hat{T} = \lbrace e^{inx}: n \in \mathbb{Z}\rbrace$ and $\hat{R} = \lbrace e^{2 \pi i x \xi}: \xi \in \mathbb{R}\rbrace$.

The Fourier transform is then defined to be the integral transform $\mathscr{F}:L^1(G) \to \mathscr{C}_0(\hat{G})$ that sends $f \in L^1(G)$ to

$$(\mathscr{F}f)(\xi) = \hat{f}(\xi) = \int_G f(x) \overline{\xi(x)} \, dx.$$

With this definition, we can endow $\hat{G}$ with an appropriate Haar measure, called the dual measure that renders the Fourier inversion formula

$$f(x) = \int_{\hat{G}} \hat{f}(\xi) \xi(x) \, d\xi$$

and Plancherel’s theorem

$$\int_G |f(x)|^2 \, dx = \int_{\hat{G}} |\hat{f}(\xi)|^2 \, d\xi$$

true. Both are established through the Pontryagin duality theorem, which states that the Pontryagin dual of the Pontryagin dual of any LCA group $G$ is naturally isomorphic to $G$.

This is the abstract theory of the Fourier transform that unites the two versions of the Fourier transform studied in classical Fourier analysis.

The goal of this post is to study another concrete instantiation of the abstract theory. Specifically, we develop Fourier analysis on the Boolean hypercube, which is a space of bit sequences, i.e., sequences whose terms are selected from a two-element set. As the name suggests, the Boolean hypercube is ubiquitous in theoretical computer science. It also serves as an abstract model for Bernoulli trials in probability theory, and as an oft-used counterexample in metric geometry.

2. Fourier analysis on the infinite Boolean hypercube

Formally, we take the $N$-dimensional Boolean hypercube to be the $N$-fold direct product

$$\mathbb{Z_{\mathnormal{2}}}^N = \lbrace x = (x_1,\ldots,x_N): x_i \in \mathbb{Z_{\mathnormal{2}}}\rbrace$$

of the cyclic group of order 2 $\mathbb{Z}_2$ with the normalized counting measure $\mathbb{P}_N = 2^{-N}\sharp$. For the purposes of developing Fourier analysis on Boolean hypercubes, it is useful to consider the infinite Boolean hypercube

$$\mathbb{Z_{\mathnormal{2}}}^\infty = \lbrace x = (x_n)^\infty_{n=1} : x_i \in \mathbb{Z_{\mathnormal{2}}}\rbrace,$$

which is the $\mathbb{N}$-fold direct product of $\mathbb{Z}_2$ with the probability measure given by the following construction:

Lemma 2.1. Let $\lbrace (\Omega_j, \Sigma_j, \mathbb{P}_j\rbrace_{j \in \mathbb{N}}$ be a collection of probability spaces, and define a cylinder set $E$ as a subset of $\prod_j X_j$ of the form
$$\lbrace x = (x_j): x_j \in E_j \in \mathcal{M_{\mathnormal{j}}} \text{ and } E_j = X_j \text{ for almost all } j\rbrace.$$
There exists a product measure $\mathbb{P}$ on $\Omega = \prod_j \Omega_j$ such that $\mathbb{P}(E) = \prod_j \mathbb{P}_j(E_j)$ for all cylinder sets $E$.

Proof. It is enough to show that $\mathbb{P}$ is countably additive for cylinder sets that are disjoint unions of cylinder sets. The Carathéodory extension theorem takes care of the rest.

Let $E = \prod_{j=1}^\infty E^j$ be a cylinder set and suppose that $E$ can be written as the disjoint union of a countable collection $\lbrace E_n = \prod^\infty_{j=1} E^j_n\rbrace_{n \in \mathbb{N}}$ of cylinder sets. We assume without loss of generality that $E \neq \prod_{j=1}^\infty \Omega_j$ and find $k \in \mathbb{N}$ such that $E^j = \Omega_j$ for all $j > k$ and that $E^k \neq \Omega_j$. Taking unions if necessary, we conclude from this assumption that $E^j_n = \Omega_j$ for each $j > k$ and every $n \in \mathbb{N}$.

Observe that

$$\begin{align*} \mathbb{P}(E) &= \mathbb{P_{\mathnormal{1}}}(E^1)\cdots\mathbb{P_{\mathnormal{k}}}(E^k) \cdot \prod^\infty_{j=k+1} \mathbb{P_{\mathnormal{j}}}(E^j) \\ &= \mathbb{P_{\mathnormal{1}}}(E^1) \cdots \mathbb{P_{\mathnormal{k}}}(E^k). \end{align*}$$

By the argument for the $k$-fold product case, we have

$$\begin{align*} \mathbb{P}(E) &= \sum_{n=1}^\infty \mathbb{P_{\mathnormal{1}}}(E^1_n) \cdots \mathbb{P_{\mathnormal{k}}}(E^k_n) \\ &= \sum_{n=1}^\infty \mathbb{P_{\mathnormal{1}}}(E^1) \cdots \mathbb{P_{\mathnormal{k}}}(E^k_n) \cdot \prod_{j=k+1}^\infty \mathbb{P_{\mathnormal{j}}}(E^j_n) \end{align*}$$

and we have countable additivity. $\square$

We now consider $\mathbb{Z}_2$ to be the multiplicative group $\lbrace -1, 1\rbrace$ and define

$$\mathfrak{r}_n(x) = x_n$$

for each $x \in \mathbb{Z_{\mathnormal{2}}}^\infty$ and every $n \in \mathbb{N}$. We see at once that $(\mathfrak{r})_{n=1}^\infty$ are identically distributed random variables on $\mathbb{Z}_2^\infty$. Indeed, it is instructive to think of them as the score assignment functions in the following game: at each coin flip, the player gains a point if the coin lands on heads and loses a point if the coin lands on tails. The partial sums $S_N = \mathfrak{r}_1 + \cdots + \mathfrak{r}_N$ represent the total score after the $N$th coin flip.

We now prove that $(\mathfrak{r_{\mathnormal{n}}})_{n=1}^\infty$ are independent, thereby showing that they are an example of a Rademacher sequence.

Definition 2.2. A Rademacher sequence is a sequence $(\varepsilon_n)_{n=1}^\infty$ of independent, identically distributed random variables on some probability space $(\Omega, \Sigma, \mathbb{P})$ with the distribution
$$\mathbb{P}(\varepsilon_n = 1) = \mathbb{P}(\varepsilon_n = -1) = \frac{1}{2}.$$

The independence of $(\mathfrak{r_{\mathnormal{n}}})_{n=1}^\infty$ is a consequence of the following lemma:

Lemma 2.3. Let $(X, \mathbb{P})$ be the product of a collection $\lbrace (X_n, \mathbb{P}_n)\rbrace_{n \in \mathbb{N}}$ of probability spaces. If a sequence of random variables $(r_n)_{n=1}^\infty$ admits random variables $F_n: X_n \to \mathbb{R}$ such that $f_n(x) = F_n(x_n)$ for all $n \in \mathbb{N}$ and $x \in X$, then $(f_n)_{n=1}^\infty$ is mutually independent.

Proof. Let $(B_n)_{n=1}^\infty$ be a sequence of Borel subsets of $\mathbb{R}$ and let $E_n = \lbrace x : f_n(x) \in B_n\rbrace$ and $E_n’ = \lbrace x_n: F_n(x_n) \in B_n\rbrace$. Then $E_n = \lbrace x: x_n \ni E_n’\rbrace$, so that $E_n$ is a cylinder set with $\mathbb{P}(E_n) = \mathbb{P}_n(E_n’)$. By the construction of the infinite product measure $\mathbb{P}$, we see that

$$\begin{align*} \mathbb{P}\left(\bigcap_{n=1}^N E_n \right) &= \prod_{n=1}^N \mathbb{P_{\mathnormal{n}}}(E_n’) \\ &= \prod_{n=1}^N \mathbb{P}(E_n), \end{align*}$$

whence sending $N \to \infty$ yields the desired result. $\square$

Observe that $(\mathfrak{r_{\mathnormal{n}}})^\infty_{n=1}$ is not an orthonormal basis of $L_2(\mathbb{Z_{\mathnormal{2}}}^\infty)$. Indeed,

$$\langle \mathfrak{r_{\mathnormal{n}}}, 1 \rangle_{L_2(\mathbb{Z_{\mathnormal{2}}}^\infty)} = \mathbb{E}[\mathfrak{r_{\mathnormal{n}}}] = 0$$

for all $n \in \mathbb{N}$. In particular, this implies that $(\mathfrak{r_{\mathnormal{n}}})^\infty_{n=1}$ cannot serve as a Fourier basis of $L_2(\mathbb{Z_{\mathnormal{2}}}^\infty)$.

To remedy this issue, we collect all finite products of our Rademacher sequence, labeling them as follows. We set $\mathfrak{w_{\mathnormal{0}}}(t) = 1$ and set, for each $n = 2^{k_1} + \cdots + 2^{k_l}$ with $k_1 > \cdots > k_l$,

$\mathfrak{w_{\mathnormal{n}}}(t) = \prod_{i=1}^l \mathfrak{r_{\mathnormal{k_i}}}(t).$

The resulting sequence $(\mathfrak{w_{\mathnormal{n}}})^\infty_{n=0}$ is the sequence of Walsh functions, an orthonormal basis of $L_2(\mathbb{Z_{\mathnormal{2}}}^\infty)$.

We remark that the term Walsh functions in literature typically refers to an orthonormal basis $(w_n)^\infty_{n=0}$ in $L_2([0,1])$ closely related to $(\mathfrak{w_{\mathnormal{n}}})^\infty_{n=0}$. To construct $(w_n)^\infty_{n=0}$, we recall that each real number in $[0,1]$ admits a unique binary expansion, so long as we adopt a convention to accept either the repeated 1’s at the end or the finitary counterparts thereof via repeated 0’s at the end, but not both. Then there is an injective mapping $D:[0,1] \to \mathbb{Z_{\mathnormal{2}}}^\infty$, where $N = \mathbb{Z_{\mathnormal{2}}}^\infty \smallsetminus D([0,1])$ is a countable set.

We now set

$$r_n(t) = \mathfrak{r_{\mathnormal{n}}}(D(t)),$$

so that $r_n$ is a random variable on $[0,1]$, modulo a countable set. Observe that

$$r_1(t) = \begin{cases} 1 & \text{ if } 0 < t < 1/2; \\ -1 & \text{ if } 1/2 < t < 1. \end{cases}$$

Furthermore, if we take the 1-periodic extension of $r_1$ onto $\mathbb{R}$, then

$$r_n(t) = r_1(2^{n-1}t)$$

on $[0,1]$. The functions $r_n$ are referred to as the Rademacher functions, and the products

$$w_n(t) = \prod_{i=1}^l r_{k_i}(t).$$

are referred to as the Walsh functions.

We show that the collection $\mathscr{W}$ Walsh functions form an orthonormal basis of $L_2([0,1])$. To this end, we first check that $\mathscr{W}$ is an orthonormal set. Since $r_n(t)^2 = 1$ for all $n$ and $t$, each Walsh function is evidently of norm one. That $r_n^2 = 1$ allows us to write the product of two Walsh functions as the product of distinct Rademacher functions, whence

$$\mathbb{E}[r_{k_1} \cdots r_{k_l}] = \prod_{i=1}^l \mathbb{E}[r_{k_i}] = 0$$

by the independence of Rademacher functions. It follows that $\mathscr{W}$ is an orthonormal set.

Let us now check that $\mathscr{W}$ spans all of $L_2([0,1])$. Note first that $\operatorname{span} \mathscr{W}$ contains indicator functions over the dyadic intervals $[\frac{i}{2^j}, \frac{i+1}{2^j})$. Therefore, every $f \in \mathscr{W}^\perp = (\operatorname{span} \mathscr{W})^\perp$ must satisfy the equality

$$\int_{i/2^j}^{(i+1)/2^j} f(t) \, dt = 0$$

for all $i$ and $j$. Since every subinterval of $[0,1)$ can be written as a disjoint countable union of dyadic intervals, it follows that

$$\int_a^b f(t) \, dt = 0$$

for all $0 \leq a < b < 1$. Now, we invoke the Lebesgue differentiation theorem to conclude that

$$0 = \lim_{h \to 0} \frac{1}{h} \int_{t-h}^{t+h} f(s) \, ds = f(t).$$

Since $(\mathfrak{w_{\mathnormal{n}}})^\infty_{n=0}$ form the Pontryagin dual $\widehat{\mathbb{Z_{\mathnormal{2}}}^\infty}$ of the infinite Boolean hypercube $\mathbb{Z}_2^\infty$, we see that Fourier analysis on the infinite Boolean hypercube is essentially the same as that of Fourier analysis on $[0,1]$, i.e., the Fourier series. Analogously, the 1-periodic extensions of Walsh functions admit an identification between Fourier analysis on the infinite Boolean hypercube and Fourier analysis on $\mathbb{R}$, i.e., the one-dimensional Fourier transform.

3. Fourier analysis on Boolean hypercubes

Let us know restrict our attention to the Boolean $N$-cube, the $N$-fold product $\mathbb{Z}_2^N$ of $\mathbb{Z}_2 = \lbrace -1,1\rbrace$. In this case, we do not need to invoke the full machinery of Pontryagin duality to establish the basic tools of Fourier analysis.

For notational convenience, we write $[N]$ to denote the set $\lbrace 1,\ldots,N\rbrace$. For each $S \subseteq $[N]$, we have the Walsh function

$$\chi_S(x_1,\ldots,x_N) = \prod_{n \in S} x_n.$$

The independence of Rademacher funtions implies that $\mathscr{W}_N = \lbrace \chi_S : S \subseteq [N]\rbrace$ is an orthonormal set in $L_2(\mathbb{Z}_2^N)$. Since $\dim L_2(\mathbb{Z}_2^N) = 2^N = |\mathscr{W}_N|$, we see that $\mathscr{W}_N$ is an orthonormal basis of $\mathscr{W}_N$.

The theory of finite-dimensional inner product spaces now guarantees that the Fourier–Walsh expansion

$$f(x) = \sum_{S \subseteq [N]} \hat{f}(S) \chi_S(x)$$

is unique, and that the Parseval identity

$$\langle f, f \rangle_{L_2(\mathbb{Z_{\mathnormal{2}}}^N)} = \sum_{S \subseteq [N]} \hat{f}(S)^2$$

is valid for all functions $f:\mathbb{Z}_2^N \to \mathbb{R}$.

Likewise, simple computations show that the foundational Fourier-analytic results such as the convolution theorem

$$\widehat{f \ast g}(S) = \hat{f}(S)\hat{g}(S)$$

and the Hausdorff–Young inequality

$$\lVert \hat{f}\rVert_{L_q(\mathbb{Z_{\mathnormal{2}}}^N)} \leq \lVert f\rVert_{L_p(\mathbb{Z_{\mathnormal{2}}}^N)}$$

for $1 \leq p \leq 2$ and $1/p + 1/q = 1$ continue to hold.

See Chapter 1 of Ryan O’Donnell, Analysis of Boolean Functions for more details on Fourier analysis on Boolean hypercubes. The rest of the O’Donnell book contains a fascinating theory and applications of functions on Boolean hypercubes whose range is restricted to $\lbrace -1, 1\rbrace$.