Lebesgue's Proof of the Weierstrass Approximation Theorem

 Lebesgue Stamp

The Weierstrass Approximation Theorem states that a real continuous function on an interval can be uniformly approximated as close as desired by polynomials. That is, given a continuous real function \(f: [a,b] \rightarrow \mathbb{R}\) and an \(\varepsilon > 0\), there is a polynomial \(p(x)\) such that \(|f(x)-p(x)| < \varepsilon\) for all \(x \in [a,b]\). Weierstrass first proved this theorem in a fruitful but unintuitive way in 1885.[1] Lebesgue's proof of 1898, presented here, takes a natural approach that Euclid would have appreciated.[2] Weierstrass proved the theorem towards the end of his career when he was 70 years old; Lebesgue's proof was in his first published paper at the age of 23. Lebesgue stitches together three principles:

1) A continuous function on an interval can be uniformly approximated by a polygonal line.

2) A polygonal line can be represented as a constant plus a sum of functions of the form \(a|x-b|\).

3) The function \(|x|\) can be uniformly approximated by polynomials on an interval.

Approximation by a Polygonal Line
 delta-epsilon box in continuity

A function is continuous at point \(x_1\) when for every \(\varepsilon > 0\), there is a \(\delta > 0\) such that \(|f(x)-f(x_1)|<\varepsilon\) whenever \(|x-x_1| < \delta\). That is, given \(\varepsilon\), there is a \(\delta\) such that the graph of \(f(x)\) is in the \(2\delta \times 2\varepsilon\) box centered at \((x_1,f(x_1))\), as pictured. In fact, a continuous function in an interval is uniformly continuous there, meaning that for a given \(\varepsilon > 0\), the same \(\delta\) can be used for every value \(x\) in the interval. So partition the interval \([a,b]\) into abscissae \(x_0 = a, x_1, x_2, \cdots, x_n = b\), where the \(x_i\) are exactly \(\delta\) apart, except possibly for the last pair, which may be less than \(\delta\) apart. The abscissae at the bottom of the box in the image, for example, might be \(x_0, x_1, x_2\), and there are a series of similar boxes to the right. The polygonal line approximating \(f(x)\) throughout interval \([a,b]\) will be the one connecting points \((x_0,f(x_0)), (x_1,f(x_1)), \cdots, (x_n,f(x_n))\), the first two segments of which are shown here. The graph of the function and polygonal segments are all in the boxes, so the ordinates of the function and the polygonal line can be no further apart that \(2\varepsilon\) for every \(x\). This shows that the polygonal line uniformly approximates the function \(f(x)\) and proves point (1).

Linear Splines and the Plus Function
 Pierre Bezier
Pierre Bézier (1910-1999)

Physical splines are quite old, strips that bend under carefully calibrated pressure to form smooth curves. The term was adapted more recently in computer assisted design to denote a series of piecewise polynomials stitched together to approximate a function or interpolate a set of data points. The name of Pierre Bézier will forever be associated with splines in the computer field, based on his work in the 1960s on what are now called Bézier curves. Bézier worked for Renault in France, and adapted the idea to using computers to help design automobile bodies. Bézier curves took the world by storm, being used, for example, to render fonts (TrueType), images (Adobe Illustrator), and digital documents of all kinds (Postscript). There too the idea is older, having roots in the 19th century in the work of Hermite and others (even Lagrange around 1800). In fact, Bézier curves are based on the Bernstein basis polynomials introduced by Sergei Bernstein in 1912 in his proof of the Weierstrass Approximation Theorem.

 Linear spline

A polygonal line is nothing but a linear spline, that is, a series of line segments stitched together at hinge points (knots) to make a continuous function.[3] Linear splines can be compounded from the plus function, a kind of primitive linear spline:
\[ P(x) := (x)_+ :=
\begin{cases}
0, & \; \text{if } x < 0\\
x, & \; \text{if } x \geq 0.
\end{cases}
\]
Given the linear spline \(f(x)\) shown here with hinge points \(P_0 = (1,1), \; P_1 = (2,2), \; P_2 = (3,0)\), for example, it is possible to calculate values \(c_i\) such that:
\begin{align*}
f(x) &= c_0 + c_1 \cdot P(x-1) + c_2 \cdot P(x-2) + c_3 \cdot P(x-3).\tag{1}
\end{align*}
The \(c_i\) are calculated as follows. First off, \(c_0 = 1\) guarantees that the right side of (1) agrees with \(f(x)\) for \(x \leq 1\) no matter the values \(c_1, \; c_2, \; c_3\). That's because \(P(x-1) = 0\) for \(x \leq 1\) and similarly for the other two terms on the right side of (1), so put \(c_0 = 1\). The segment between \(P_0\) and \(P_1\), if extended to an infinite straight line, has equation \(g_1(x) = x\). As before, \(c_2\) and \(c_3\) are irrelevant for \( x \leq 2\), what matters is the behavior for \(1 \leq x \leq 2\), and in this interval \(P(x-1) = x-1\), so (1) becomes:
\begin{align*}
f(x) &= c_0 + c_1 \cdot P(x-1), \;\; 1 \leq x \leq 2\\
&= 1 + c_1 \cdot (x-1)\\
&= (1 - c_1) + c_1x.\tag{2}
\end{align*}
The value of \(c_1\) making \(f(x) = g_1(x) = x\) for \(1 \leq x \leq 2\) in (2) is \(c_1 = 1\), so that too is fixed. In like fashion:
\begin{align*}
f(x) &= c_0 + c_1 \cdot P(x-1) + c_2 \cdot P(x-2), \;\; 2 \leq x \leq 3\\
&= 1 + (x-1) + c_2 \cdot (x-2)\\
&= (-2c_2) + (1+c_2)x.\tag{3}
\end{align*}
The segment between \(P_1\) and \(P_2\), if extended to an infinite straight line, has equation \(g_2(x) = -2x + 6\). The assignment making \(f(x)\) in (3) equal to \(g_2(x)\) for \(2 \leq x \leq 3\) is \(c_2 = -3.\) Continuing the calculation one more step produces \(c_3 = 2\), so (1) becomes:
\begin{align*}
f(x) &= 1 + 1 \cdot P(x-1) - 3 \cdot P(x-2) + 2 \cdot P(x-3).\tag{1'}
\end{align*}
In like way, any linear spline can be written as:
\begin{align*}
f(x) &= c_0 + \sum_{i=0}^{n-1} c_{i+1}P(x-x_i),\tag{4}
\end{align*}
where the hinge points are \(x_0, \; x_1, \; \cdots, \; x_{n-1}\). It does not matter what (4) does outside the interval \([x_0, \; x_{n-1}]\) for present purposes, considering that the target function \(f(x)\) is being approximated in that interval. Since
\[ x + |x| =
\begin{cases}
0, & \; \text{if } x < 0\\
2x, & \; \text{if } x \geq 0,
\end{cases}
\]
\begin{align*}
P(x) = \frac{x+|x|}{2}.\tag{5}
\end{align*}
Plugging (5) into (1') and simplifying:
\begin{align*}
f(x) = \frac{1}{2} + \frac{1}{2}|x-1| -\frac{3}{2}|x-2|+|x-3|.\tag{6}
\end{align*}
Such calculations can be executed for any polygonal line on an interval, proving point (2).

Approximating \(|x|\) by Polynomials

The next step is to uniformly approximate \(|x|\) by polynomials in \([-1,1]\). Lebesgue's approach was to note that:
\begin{align*}
|x| = \sqrt{x^2} = \sqrt{1-(1-x^2)} = \sqrt{1-z}\tag{7},
\end{align*}
where \(z = 1-x^2\), then expand (7) by Newton's generalized binomial theorem, giving rise to:
\begin{align*}
\sqrt{1-z} &= 1 - \frac{1}{2} z - \frac{1}{8}z^{2} - \frac{1}{16}z^{3} - \frac{5}{128}z^{4} - \frac{7}{256}z^{5} - \cdots\\
\therefore |x| &= 1 - \frac{1}{2} (1-x^2) - \frac{1}{8}(1-x^2)^{2} - \frac{1}{16}(1-x^2)^{3} - \cdots
\end{align*}
Jean Dieudonné took a different tack, producing a sequence of monotonically increasing polynomials \(u_n(t)\) such that for any \(t\) with \(0 \leq t \leq 1\), \(u_n(t) \rightarrow \sqrt{t}\).[4] The polynomials are defined recursively:
\begin{align*}
u_0(t) &= 0\\
u_{n+1}(t) &= u_n(t) + \tfrac{1}{2}(t-u_n^2(t)), \;\; n \geq 0.
\end{align*}
The proof is by induction. Fix \(t\) and suppose \(u_{n}(t) \leq \sqrt{t}\) for a given \(n\). Then:
\begin{align*}
\sqrt{t} - u_{n+1}(t) &= \sqrt{t} - u_n(t) - \tfrac{1}{2}\left(t-u_n^2(t)\right)\\[0.3em]
&= \left(\sqrt{t} - u_n(t)\right) - \tfrac{1}{2}\left(\sqrt{t} - u_n(t)\right)\left(\sqrt{t} + u_n(t)\right)\\[0.3em]
&= {\color{red}{\left(\sqrt{t} - u_n(t)\right)}}\left(1 - \tfrac{1}{2}\left(\sqrt{t} + u_n(t)\right)\right)\tag{8}.
\end{align*}
The red factor in (8) is positive by the inductive hypothesis; the task is to prove that the other factor is positive as well. That is, the following is needed:
\begin{align*}
0 &\leq 1 - \tfrac{1}{2}\left(\sqrt{t} + u_n(t)\right)\\[0.3em]
\sqrt{t} + u_n(t) &\leq 2.\tag{9}
\end{align*}
The inductive hypothesis \(u_{n}(t) \leq \sqrt{t}\) implies \(\sqrt{t}+u_{n}(t) \leq 2\sqrt{t} \leq 2\), the upper bound of 2 due to \(t \leq 1\). Therefore (9) is true and so both factors on the right side of (8) are positive and the left side is too. That is, \(u_{n+1}(t) \leq \sqrt{t}\). The induction is complete, proving that \(u_n(t) \leq \sqrt{t}\) for all \(n\).

 Polynomials approaching |t|

For a fixed \(t, \; \{u_n(t)\}_n\) is a positive monotonically increasing sequence of real numbers less than \(\sqrt{t}\), so \(u_n(t) \rightarrow \xi\), where \(0 < \xi \leq \sqrt{t}\). Because of the recursive definition of \(u_n(t)\):

\begin{align*}
\xi &= \xi + \tfrac{1}{2}\left(t-\xi^2\right)\\[0.3em]
\therefore \xi^2 &= t\\[0.3em]
\xi &= \sqrt{t}\\[0.3em]
u_n(t) &\rightarrow \sqrt{t}.\tag{10}
\end{align*}

Dini's Theorem says that if a monotone sequence of continuous functions converges pointwise on an interval and if the limit function is also continuous, then the convergence is uniform. These conditions apply here, so the convergence in (10) is uniform. If \(v_n(t) = u_n(t^2)\). Then \(v_n(t) \rightarrow \sqrt{t^2} = |t|\) for \(0 \leq t \leq 1\). This proves point (3).

The restriction of the domain to \([-1, \; 1]\) is entirely arbitrary, since \(t\) can be stretched to produce polynomials uniformly approaching \(|t|\) for \(-k \leq t \leq k\) for any positive real number \(k\). To see this, define \(v_n(t) := u_n(t/k)\) for \(-k \leq t \leq k\). Then \(t/k \in [0, \; 1]\), so:
\begin{align*}
u_n(t/k) &\rightarrow \sqrt{t}, \;\; -k \leq t \leq k\\
\therefore k \cdot u_n(t/k) &\rightarrow k \cdot \sqrt{t}\\
k \cdot v_n(t) &\rightarrow k \cdot \sqrt{t}\\
v_n(t) &\rightarrow \sqrt{t}.
\end{align*}

Stitching it Together

The final step is to put these pieces together. It's been established that a given continuous function can be uniformly approximated on an interval by a polygonal line expressible as a sum of absolute values like \(f(x)\) in (6). What remains is to show that such a sum can be uniformly approximated by polynomials. Take the polygonal line in (6) for example, assumed to approximate the original function in \([0, \; 4]\). Given \(\varepsilon > 0\), choose polynomial \(p(x)\) such that \(|p(x) - |x|| < 2\varepsilon / 9\) in \([-3, \; 3]\) and put:
\begin{align*}
q(x) = \frac{1}{2} + \frac{1}{2} p(x-1) - \frac{3}{2} p(x-2) + p(x-3),
\end{align*}
where 1, 2, 3 are the hinge points and the coefficients are those in \(f(x)\). Then:
\begin{align*}
|f(x) - q(x)| &= \left|f(x) - \left[\frac{1}{2} + \frac{1}{2}p(x-1) - \frac{3}{2}p(x-2) + p(x-3)\right]\right|\\[0.7em]
&= \left|\frac{1}{2}\Big{(}p(x-1) - |x-1|\Big{)} - \frac{3}{2}\Big{(}p(x-2) - |x-2|\Big{)} + \Big{(}p(x-3) - |x-3|\Big{)}\right|\\[0.7em]
&\leq \frac{1}{2}\Big{|}p(x-1) - |x-1|\Big{|} + \frac{3}{2}\Big{|}p(x-2) - |x-2|\Big{|} + \Big{|}p(x-3) - |x-3|\Big{|}\\[0.7em]
& \leq \frac{1}{2} \cdot \frac{2\varepsilon}{9} + \frac{3}{2} \cdot \frac{2\varepsilon}{9} + \frac{2\varepsilon}{9}\\[0.7em]
&= \frac{\varepsilon}{9} + \frac{3\varepsilon}{9} + \frac{2\varepsilon}{9}\\[0.7em]
& < \varepsilon.
\end{align*}
This method can be used on any function with the same form as \(f(x)\). The fudge factor \(2/9 \varepsilon\) will differ based on the number of hinge points and the coefficients, but otherwise this example captures the general argument proving that there is a polynomial as uniformly close to a polygonal line as desired. This completes Lebesgue's proof.

Summing Up

A big theorem, a beautiful proof. The importance of this result became recognized immediately and was very much in the air in 1885, Runge all but proving it at the same time as Weierstrass. Allan Pinkus helpfully reprises the many early proofs[5], culminating in Sergei Bernstein's probabilistic demonstration (!) in 1912. Many of these proofs were quite different, employing techniques that lived on: Weierstrass (convolution integrals), Lebesgue (linear splines), Bernstein (Bézier curves). These proofs arose organically from the analysis practiced in the late nineteenth century. The theorem was an end-point, but also marked the beginning of a new wave of generalization, becoming a foundation stone of functional analysis as it started to develop in the early twentieth century. As a functional analyst would have it: the polynomials are dense in \(C[0,1]\), the space of continuous real functions on \([0,1]\) with sup norm. This proof of Lebesgue stands out as being perhaps the most direct and accessible, deeply rooted in Euclidean-type thinking, but also forward-looking in its engineering sensibility. All that together with surpassing intuition and elegance.

Mike Bertrand

August 9, 2024


^ 1. "Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen Veränderlichen", by K. Weierstrass, Sitzungsberichte der Akademie zu Berlin (1885), pp. 633-639 and 789-805. The History of Approximation Theory site has a comprehensive list of historic papers in this area with links, including Weierstrass's paper and those of Lebesgue and Bernstein. It appears this paper's importance was recognized right away, with French translations forthcoming in short order: "Sur la possibilité d'une représentation analytique des fonctions dites arbitraires d'une variable réelle", by K. Weierstrass, Journal de Mathématiques Pures et Appliquées 2 (1886), pp. 105-113 and pp. 115-138.

^ 2. "Sur l'approximation des fonctions", by H. Lebesgue, Bulletin des Sciences Mathématiques 22 (1898), pp. 278-287. The invocation of Euclid is not fortuitous. Euclid XII.2 (circles are to one another as the squares on their diameters) proceeds by inscribing a nested series of regular polygons inside a circle, perhaps the first extant example of the still older method of exhaustion, le méthode des anciens as the French put it. This proposition essentially proves the existence of \(\pi\) as four times the ratio of the area of a circle to the square of its diameter and is based on Euclid XII.1 — similar polygons inscribed in circles are to one another as the squares on their diameters. Euclid continues to use exhaustion in Book XII to prove theorems about pyramids, cones, and spheres, as in XII.18 — spheres are to one another as the cubes on their respective diameters.

^ 3. I've followed "The Weierstrass Approximation Theorems", by Allan Pinkus, Surveys in Approximation Theory 1 (2005), pp. 18-21. Pinkus does a thorough survey of the proofs of this theorem and its companion for trigonometric polynomials up to 1913, including those by Weierstrass, Lebesgue, Bernstein, Fejér, Landau, and Mittag-Leffler.

^ 4. Foundations of Modern Analysis, by Jean Dieudonné, Academic Press (1960, 1969), p. 137. Thanks to Allan Pinkus for the reference.

^ 5. See footnote 3 for Pinkus's survey.