In 1949, Selberg and Erdős proved the Prime Number Theorem in an elementary way based on an inequality Selberg had proved shortly before, "elementary" meaning without the machinery of complex analysis.[1] This was notable because the theorem had first been proven by Hadamard and de la Vallée Poussin in 1896 using complex analysis and there had been doubts that an elementary proof was possible. The Prime Number Theorem states that:
\[\lim_{x \to \infty} \; \frac{\pi(x)}{x/\log{x}} = 1, \hspace{8pt} \text{ where } \pi(x) = \text{ #primes } \leq x.\]
It seemed unreasonable that such advanced methods far from basic number theory were necessary to prove a theorem about prime numbers. Selberg closed the gap in 1949 and Tatuzawa and Iseki[1'] gave a compact proof of Selberg's Inequality in 1951, a derivation I propose to follow in this article.[2]
In what follows, \(x\) denotes a real number greater than or equal to one, \(p\) is a prime, and other lower case letters are positive integers. \(\log{}\) is the natural logarithm. \(\lfloor x \rfloor\) is the greatest integer less than or equal to \(x,\) so \(\lfloor x \rfloor \leq x < \lfloor x \rfloor + 1.\) Also:
\begin{align}
\sum_{p \leq x} &= \text{ sum over all primes } p \leq x,\\[1em]
\sum_{n \leq x} &= \text{ sum over all integers } n, \text{ where } 1 \leq n \leq \lfloor x \rfloor,\\[1em]
\sum_{ij \leq x} &= \text{ sum over all integers } i, j, \text{ where } i \geq 1, \; j\geq 1, \; ij \leq x,\\[1em]
\sum_{d|n} &= \text{ sum over all positive integers } d \text{ dividing } n.
\end{align}
\(O(\cdot)\) is the Bachmann–Landau Big O notation used to compare the asymptotic growth rates of real functions: \(f(x) = O(g(x))\) means there are constants \(K\) and \(x_0\) such that \(|f(x)| \leq K g(x)\) for all \(x \geq x_0.\) For example, \(f(x) = O(1)\) means that \(f(x)\) is bounded, and \(f(x) = O(x)\) means that \(|f(x)| \leq Kx\) for large \(x.\) It is true that \(\log{x} = O(x)\) because \(\log{x} / x \leq 1/e\) for \(x \geq 1,\) so \( K = 1/e, x_0 = 1\) can be used in the definition.
If \(f_1(x) = O(g(x))\) and \(f_2(x) = O(g(x)),\) then \(f_1(x) + f_2(x)= O(g(x))\) by using the larger of the \(K\)'s and of the \(x_0\)'s. Another easily proven principle is that if \(f(x) = O(g(x)),\) then \(xf(x) = O(xg(x)).\) Statements like \(f(x) = k(x) + O(g(x))\) mean that \(f(x) = k(x) + f_1(x),\) where \(f_1(x) = O(g(x)).\) There are any number of such formulas and I strongly recommend becoming well-acquainted with big O manipulations before proceeding. \(f(x) = O(g(x))\) looks like an equality, but it is really an inequality, a comparison of asymptotic growth rates.
These key functions will play a role:
1) The Möbius function \(\mu(n).\)
\[\mu(n) :=
\left\{
\begin{alignat}{1}\;
&(-1)^k \text{ if } n = p_1p_2 \cdots p_k, \text{ where the } p_i \text{ are all different}\\
&0 \ \text{ if } p^2 | n \text{ for any prime } p.
\end{alignat}\right.\]
In other words, if \(n = p_1^{\alpha_1}p_2^{\alpha_2} \cdots p_k^{\alpha_k},\) then \(\mu(n) = 0\) if any of the \(\alpha_i > 1.\) Otherwise \(n\) is a product of distinct primes and \(\mu(n) = 1\) if there are an even number of primes and \(\mu(n) = -1\) if there are an odd number of primes. Since 1 is a product of zero primes, this definition implies that \(\mu(1) = 1.\) Here are the first few values of \(\mu(n):\)
\(n\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
\(\mu(n)\) | 1 | -1 | -1 | 0 | -1 | 1 | -1 | 0 | 0 | 1 | -1 | 0 |
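These values are easy to reproduce with a short program. Here is a minimal sketch in Python; the helper name `mobius` is mine, and the factoring is naive trial division:

```python
def mobius(n):
    """Compute the Moebius function mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n
            result = -result      # one more distinct prime factor
        p += 1
    if n > 1:
        result = -result          # a leftover prime factor
    return result

print([mobius(n) for n in range(1, 13)])
# → [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
```

Trial division is plenty at this scale; a sieve would be the right tool for tabulating \(\mu\) up to a large bound.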
2) The von Mangoldt function \(\Lambda(n).\)
\[\Lambda(n) :=
\left\{
\begin{alignat}{1}\;
&\log{p} \text{ when } n = p^k \text{ for some } k \geq 1\\
&0 \text { otherwise}.
\end{alignat}\right.\]
In other words, \(\Lambda(n)\) is non-zero only when \(n\) is a power of a prime, and then it is the logarithm of that prime. Note this makes \(\Lambda(1) = 0.\)
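As with \(\mu,\) a naive Python sketch makes the definition concrete (the name `von_mangoldt` is mine):

```python
from math import log

def von_mangoldt(n):
    """Return log p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while n % p != 0:   # find the smallest prime factor of n
        p += 1
    while n % p == 0:   # strip it out; n was a prime power iff nothing remains
        n //= p
    return log(p) if n == 1 else 0.0

print([round(von_mangoldt(n), 4) for n in range(1, 13)])
# → [0.0, 0.6931, 1.0986, 0.6931, 1.6094, 0.0, 1.9459, 0.6931, 1.0986, 0.0, 2.3979, 0.0]
```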
3) Chebyshev's \(\theta(x)\). This sometimes appears as \(\vartheta(x),\) as it does in Tatuzawa and Iseki's paper.
\[\theta(x) := \sum_{p \leq x} \log{p}.\]
For example, \(\theta(12.5) = \log{2} + \log{3} + \log{5} + \log{7} + \log{11} = 7.7450.\)
4) Chebyshev's \(\psi(x).\)
\begin{align}
\psi(x) & := \sum_{n \leq x} \Lambda(n)\\[1em]
&= \sum_{k=1}^\infty \theta\left(x^{1/k}\right)\\[1em]
&= \theta(x) + \theta\left(x^{1/2}\right) + \theta\left(x^{1/3}\right) + \ldots
\end{align}
Note that the sum in the second expression is finite, because for fixed \(x,\) once \(k\) is large enough, \(x^{1/k} < 2\) and so \(\theta(x^{1/k}) = 0.\) To see that the first and second expressions are equivalent, consider that:
\begin{align}
\psi(x) & := \sum_{n \leq x} \Lambda(n)\\[1em]
&= \sum_{p^k \leq x} \log{p}\\[1em]
&= \sum_{k=1}^\infty \; \sum_{p \leq x^{1/k}} \log{p}\\[1em]
&= \sum_{k=1}^\infty \theta\left(x^{1/k}\right).\\
\end{align}
The second expression follows from the first because all the values of \(n\) that are not powers of a prime can be ignored since then \(\Lambda(n) = 0.\) And all the values of \(n\) that are a prime power \(p^k\) have \(\Lambda(n) = \log{p}.\) The second and third expressions are really double sums: for every prime \(p\) and every \(k = 1, 2, 3, \ldots\) such that \(p \leq x^{1/k},\) add in a copy of \(\log{p}.\) \(\theta(x)\) is the sum of the logs of the primes \(p\) such that \(p \leq x,\) \(\theta(x^{1/2})\) is the sum of the logs of the primes \(p\) such that \(p^2 \leq x,\) and so on — they add up to the sum on the last line.
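The equivalence is easy to spot-check numerically. A brute-force Python sketch, with naive helpers `is_prime`, `theta`, and `psi` of my own naming:

```python
from math import log

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def theta(x):
    """Chebyshev theta: sum of log p over primes p <= x."""
    return sum(log(p) for p in range(2, int(x) + 1) if is_prime(p))

def psi(x):
    """Chebyshev psi: sum of log p over prime powers p^k <= x."""
    total = 0.0
    for p in range(2, int(x) + 1):
        if is_prime(p):
            pk = p
            while pk <= x:
                total += log(p)   # one copy of log p for each power p^k <= x
                pk *= p
    return total

x = 100.0
# k runs until x^(1/k) < 2, i.e. 2^k > x; k <= 7 is plenty here
assert abs(psi(x) - sum(theta(x ** (1.0 / k)) for k in range(1, 8))) < 1e-9
print(round(theta(12.5), 4))   # → 7.745, matching the example above
```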
Start with \(T(x) := \log{(\lfloor x \rfloor !)} = \sum_{n \leq x} \log{n},\) defined by Chebyshev in his epochal article of 1850, Mémoire sur les nombres premiers.[3] Chebyshev proved that:
\begin{align}
T(x) &= \displaystyle\sum_{m=1}^\infty \psi\left(\frac{x}{m}\right)\\[1em]
& < \tfrac{1}{2} \log{2\pi} + x(\log{x}-1) + \tfrac{1}{2} \log{x} + \tfrac{1}{12}\\[1em]
&= x \log{x} - x + \tfrac{1}{2} \log{x} + C\\[1em]
&= x \log{x} - x + O(\log{x}).\tag{1}
\end{align}
He also showed that:
\begin{align}
\psi(x) & < \tfrac{6}{5} Ax + \tfrac{5}{4 \log{6}} \log^2{x} + \tfrac{5}{4} \log{x} + 1, \quad \text{where } A = \log{\left(2^{1/2}\, 3^{1/3}\, 5^{1/5}\, 30^{-1/30}\right)} = 0.92129\ldots\\[1em]
&= O(x).\tag{2}
\end{align}
Establishing the third bound requires:
Theorem. \( \quad \log{n} = \displaystyle \sum_{j|n} \Lambda(j) = \sum_{ij = n} \Lambda(j).\)\((2')\)
Proof. Let \(n = p_1^{\alpha_1}p_2^{\alpha_2} \cdots p_k^{\alpha_k}.\) Since \(\Lambda\) is supported on prime powers, it suffices to consider the divisors that are powers of the \(p_j,\) up to the highest power dividing \(n,\) when evaluating the sum. There are \(\alpha_1\) such powers of \(p_1,\) namely \(p_1, p_1^2, \ldots, p_1^{\alpha_1},\) and \(\Lambda\) is \(\log{p_1}\) for each of them. That is, the contribution to the sum from these divisors is \(\alpha_1 \log{p_1}.\) The same holds for the other \(p_j,\) so adding them all up results in:
\[\sum_{d|n} \Lambda(d) = \alpha_1 \log{p_1} + \alpha_2 \log{p_2} + \ldots + \alpha_k \log{p_k}.\]
But:
\begin{align}
\log{n} &= \log{\left(p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}\right)}\\[1em]
&= \log{\left(p_1^{\alpha_1}\right)} + \log{\left(p_2^{\alpha_2}\right)} + \ldots + \log{ \left(p_k^{\alpha_k}\right)}\\[1em]
&= \alpha_1 \log{p_1} + \alpha_2 \log{p_2} + \ldots + \alpha_k \log{p_k}.
\end{align}
The second sum in the theorem's statement is just an alternative way of writing the first sum. QED.
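The theorem invites a numerical spot-check; here is one in Python, with a throwaway `von_mangoldt` helper of my own:

```python
from math import log, isclose

def von_mangoldt(n):
    """Return log p if n = p^k for a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while n % p != 0:
        p += 1
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

# log n should equal the sum of Lambda(d) over the divisors d of n
for n in range(1, 200):
    divisor_sum = sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)
    assert isclose(log(n), divisor_sum, abs_tol=1e-9)
print("checked (2') for n = 1..199")
```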
\begin{align}
\therefore T(x) &= \sum_{n \leq x} \log{n}\\[1em]
&= \sum_{n \leq x} \sum_{ij = n} \Lambda(j)\\[1em]
&= \sum_{ij \leq x} \Lambda(j) \quad (\color{red}{*})\\[1em]
&= \sum_{j=1}^{\lfloor x \rfloor} \sum_{i=1}^{\left\lfloor x/j \right\rfloor} \Lambda(j) \quad (\color{red}{**})\\[1em]
&= \sum_{j \leq x} \Lambda(j) \left\lfloor \frac{x}{j} \right\rfloor. \quad (\color{red}{{*}{*}{*}})\\[1em]
\end{align}
To explain the last two steps, note that \((\color{red}{*})\) is really a double sum and one way to traverse it is to walk \(j\) from 1 to \(\lfloor x \rfloor,\) for each such \(j\) walking \(i\) from 1 to \(\lfloor x/j \rfloor.\) This is \((\color{red}{**}).\) The image illustrates the situation for \(x = 6\) in the \(ij\) plane. Starting with the horizontal line \(j=1,\) walk \(i\) from \(i=1\) to \(i=6;\) that is, visit the lowest row of lattice points from left to right — there are six of them. Then for \(j=2,\) visit the three lattice points along \(j=2\) from left to right, and continue up the graph. For each lattice point visited, add in a copy of \(\Lambda(j).\) This leads naturally to \((\color{red}{{*}{*}{*}}),\) since the inner sum of \((\color{red}{**})\) is just \(\lfloor x/j \rfloor\) copies of \(\Lambda(j).\)
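Since \(T(x) = \log{(\lfloor x \rfloor !)},\) the final identity can be checked directly. A Python sketch (helper names are mine); it uses the standard fact that \(\lfloor x/j \rfloor = \lfloor \lfloor x \rfloor / j \rfloor\) for integer \(j:\)

```python
from math import log, isclose, factorial

def von_mangoldt(n):
    """Return log p if n = p^k for a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while n % p != 0:
        p += 1
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

x = 50.7
T = log(factorial(int(x)))   # T(x) = log(floor(x)!)
# floor(x/j) = floor(floor(x)/j) for integer j, so integer division suffices
weighted = sum(von_mangoldt(j) * (int(x) // j) for j in range(1, int(x) + 1))
assert isclose(T, weighted, rel_tol=1e-9)
print(round(T, 4), round(weighted, 4))
```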
The way is now clear to derive the third bound. There is the simple fact that:
\[\frac{x}{n} = \left\lfloor \frac{x}{n} \right\rfloor + \xi_n, \quad 0 \leq \xi_n < 1,\]
where \(\xi_n\) depends on \(n.\) Together with (1) and (2) this leads to:
\begin{align}
\sum_{n \leq x} \Lambda(n) \frac{x}{n} &= \sum_{n \leq x} \Lambda(n) \left\lfloor \frac{x}{n} \right\rfloor + \sum_{n \leq x} \Lambda(n) \, \xi_n\\[1em]
&= T(x) + \sum_{n \leq x} \Lambda(n) \, \xi_n.
\end{align}
Since \(0 \leq \xi_n < 1,\) the last sum is non-negative and at most \(\psi(x),\) so:
\begin{align}
\sum_{n \leq x} \Lambda(n) \frac{x}{n} &= T(x) + O(\psi(x))\\[1em]
&= (x \log{x} - x + O(\log{x})) + O(x).
\end{align}
This implies the third bound:
\[\sum_{n \leq x} \Lambda(n) \frac{x}{n} = x \log{x} + O(x).\tag{3}\]
Two results are necessary before proceeding to Tatuzawa-Iseki.
Theorem. \(\quad \displaystyle \sum_{d|n} \mu(d) = 0\) unless \(n = 1,\) in which case the sum equals 1. \((4)\)
Proof. Let \(k = \) the number of different primes dividing \(n.\) Suppose \(k = 3,\) for example, so \(n = p^\alpha q^\beta r^\gamma,\) where \(\alpha, \beta, \gamma \geq 1.\) The only divisors necessary to consider are \(\{1, p, q, r, pq, pr, qr, pqr\}\) because all the others have a prime square divisor, so:
\begin{align}
\sum_{d|n} \mu(d) &= \mu(1) + [\underbrace{\mu(p) + \mu(q) + \mu(r)}_{\large \text{one prime}}] + [\underbrace{\mu(pq) + \mu(pr) + \mu(qr)}_{\large \text{two primes}}] + \underbrace{\mu(pqr)}_{\large \text{three primes}}\\[1em]
&= \color{red}{1} \cdot (+1) + \color{red}{3} \cdot (-1) + \color{red}{3} \cdot (+1) + \color{red}{1} \cdot (-1)\\[1em]
&= 0.
\end{align}
In moving from left to right in the middle line, the number of primes in a factor increases by one in each grouping, hence the sign changes. The coefficients are just the number of ways of choosing \(j\) things from three things, namely \({3}\choose{j}.\) So the red coefficients can be read off from Pascal's Triangle. The same argument obtains whenever \(k\) is odd because of the symmetry of the triangle. When \(k = 5,\) for example, the weighted coefficients are:
\[1 -5 +10 -10 +5 -1,\]
which sum to zero. Symmetry doesn't apply when \(k\) is even. When \(k = 6,\) the weighted coefficients are:
\[1 -6 +15 -20 +15 -6 +1.\]
That they add to zero follows from:
\begin{align}
0 &= (1 - 1)^k\\[1em]
&= 1 + {{k}\choose{1}} 1^{k-1} (-1)^{1} + {{k}\choose{2}} 1^{k-2} (-1)^{2} + {{k}\choose{3}} 1^{k-3} (-1)^{3} + \cdots\\[1em]
&= 1 - {{k}\choose{1}} + {{k}\choose{2}} - {{k}\choose{3}} + \cdots.
\end{align}
This calculation actually works for odd \(k\) as well. QED.
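A quick machine check of (4) over a range of \(n,\) reusing a naive `mobius` helper of my own naming:

```python
def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

# the divisor sum of mu is 1 at n = 1 and 0 everywhere else
for n in range(1, 300):
    s = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
print("checked (4) for n = 1..299")
```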
The second result needed is:
\[\Lambda(n) = \sum_{d|n} \mu(d) \log{\frac{n}{d}},\tag{5}\]
which follows from (2') by the Möbius inversion formula. On to Tatuzawa-Iseki.
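First, though, a quick numerical spot-check of (5), with the same throwaway helpers as before (names mine):

```python
from math import log, isclose

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    if n < 2:
        return 0.0
    p = 2
    while n % p != 0:
        p += 1
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

# Lambda(n) = sum over divisors d of n of mu(d) * log(n/d)
for n in range(1, 200):
    inv = sum(mobius(d) * log(n / d) for d in range(1, n + 1) if n % d == 0)
    assert isclose(von_mangoldt(n), inv, abs_tol=1e-9)
print("checked (5) for n = 1..199")
```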
Let \(F(x)\) and \(G(x)\) be any two functions defined for \(x \geq 1\) and such that:
\[G(x) = \sum_{n \leq x}F\left(\frac{x}{n}\right) \log{x}.\tag{6}\]
Then:
\begin{align}
J(x) & := \sum_{k \leq x} \mu(k) G\left( \frac{x}{k}\right)\\[1em]
&= \sum_{k \leq x} \mu(k) \sum_{j \leq x/k} F\left(\frac{x}{jk}\right) \log{\left(\frac{x}{k}\right)}\\[1em]
&= \sum_{jk \leq x} \mu(k) \log{\left(\frac{x}{k}\right)} F\left(\frac{x}{jk}\right).
\end{align}
The second line results from expanding \(G\) in terms of \(F.\) At first glance, it looks like the inner sum has been lost in the third line, but not so. The third line is a double sum just like the second line, traversing, for every \(k,\) every \(j\) with \(jk \leq x.\) Putting \(n = jk\) in the third line, \(k\) traverses all divisors of \(n,\) so continue as follows:
\begin{align}
J(x) &= \sum_{n \leq x} F\left(\frac{x}{n}\right) \sum_{k|n} \mu(k) \log{\left(\frac{x}{k}\right)}\\[1em]
&= \sum_{n \leq x} F\left(\frac{x}{n}\right) \sum_{k|n} \mu(k) \cdot \left(\log{\left(\frac{x}{n}\right)} + \log{\left(\frac{n}{k}\right)}\right).\\[1em]
\end{align}
The second line here simply breaks the log of a product into the sum of logs in the inner sum. The next step is to break the inner sum into two:
\begin{align}
J(x) &= \sum_{n \leq x}F\left(\frac{x}{n}\right) \log{\left(\frac{x}{n}\right)} \sum_{k|n} \mu(k) + \sum_{n \leq x}F\left(\frac{x}{n}\right) \sum_{k|n} \mu(k) \log{\left(\frac{n}{k}\right)}\\[1em]
&= F(x) \log{x} + \sum_{n \leq x}F\left(\frac{x}{n}\right) \Lambda(n).\\[1em]
\end{align}
Nice! Let's see why the second line follows. In the first sum on the first line, \(\sum_{k|n} \mu(k)\) is almost always zero by (4). The only time it's not zero is when \(n = 1,\) and in that case \(\sum_{k|n} \mu(k) = 1\) and the entire sum reduces to \(F(x) \log{x}.\) In the second sum on the first line, the inner sum is exactly \(\Lambda(n)\) by virtue of (5). The initial and final expressions for \(J(x)\) state the Tatuzawa-Iseki identity for functions \(F\) and \(G\) related by (6):
\[\sum_{k \leq x} \mu(k) G\left( \frac{x}{k}\right) = F(x) \log{x} + \sum_{n \leq x}F\left(\frac{x}{n}\right) \Lambda(n).\tag{7}\]
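The identity (7) holds for any \(F,\) so it can be sanity-checked numerically with an arbitrary choice. In the sketch below, \(F(x) = \sqrt{x}\) is just a convenient test function and all names are mine:

```python
from math import log, sqrt, isclose

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    if n < 2:
        return 0.0
    p = 2
    while n % p != 0:
        p += 1
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

def F(x):
    return sqrt(x)   # an arbitrary test function defined for x >= 1

def G(x):
    # G as defined by (6)
    return log(x) * sum(F(x / n) for n in range(1, int(x) + 1))

x = 30.0
lhs = sum(mobius(k) * G(x / k) for k in range(1, int(x) + 1))
rhs = F(x) * log(x) + sum(F(x / n) * von_mangoldt(n) for n in range(1, int(x) + 1))
assert isclose(lhs, rhs, rel_tol=1e-9)
print(round(lhs, 6), round(rhs, 6))
```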
More properly, the Tatuzawa-Iseki identity implies the Selberg inequality. Let's repeat the Tatuzawa-Iseki identity, where \(F\) is a given function defined for \(x \geq 1:\)
\begin{align}
G(x) &= \sum_{n \leq x}F\left(\frac{x}{n}\right) \log{x}.\tag{6'}\\[1em]
\sum_{k \leq x} \mu(k) G\left( \frac{x}{k}\right) &= F(x) \log{x} + \sum_{n \leq x}F\left(\frac{x}{n}\right) \Lambda(n).\tag{7'}
\end{align}
Tatuzawa-Iseki says that for any functions \(F\) and \(G\) defined for \(x \geq 1\) and related by (6'), the identity (7') holds. The approach now is to choose a specific \(F\) that moves us toward the goal, writing down its corresponding \(G\) defined by (6'):
\begin{align}
F(x) &= \psi(x) - x + \gamma + 1\tag{8}\\[1em]
G(x) &= \sum_{n \leq x} \left(\psi \left(\frac{x}{n} \right) - \frac{x}{n} + \gamma + 1 \right) \log{x}.\tag{9}
\end{align}
\(\gamma = 0.5772156649\ldots\) is Euler's constant. \(F\) is defined for \(x \geq 1\) and so is its associated \(G.\) Break (9) out into three sums:
\begin{align}
G(x) &= \sum_{n \leq x} \psi \left(\frac{x}{n} \right) \log{x} - \sum_{n \leq x} \frac{x}{n} \log{x} + (\gamma + 1) \sum_{n \leq x} \log{x}.\tag{10}
\end{align}
Consider the three sums in (10) in order.
\begin{align}
\sum_{n \leq x} \psi \left(\frac{x}{n} \right) \log{x} &= \log{x} \sum_{n \leq x} \psi \left(\frac{x}{n} \right)\\[1em]
&= \log{x} (x \log{x} - x + O(\log{x}))\\[1em]
&= x \log^2{x} - x \log{x} + O(\log^2{x}),\tag{10.1}
\end{align}
where the second line follows from (1). The second sum in (10) requires:
Theorem. \(\quad \displaystyle \sum_{n \leq x} \frac{1}{n} = \log{x} + \gamma + O\left(\frac{1}{x}\right).\)
Proof. Let \(K_n = \displaystyle \sum_{j=1}^n \frac{1}{j} - \log{n}\) for any integer \(n \geq 1.\) Then:
\begin{align}
K_n - K_{n+1} &= \left(\sum_{j=1}^n \frac{1}{j} - \log{n}\right) - \left(\sum_{j=1}^{n+1} \frac{1}{j} - \log{(n+1)}\right)\\[1em]
&= \log\left(1+\frac1n\right)-\frac1{n+1}\\[1em]
&= \int_0^{1/n}\frac{x}{(1+x)^2}\,dx\\[1em]
& < \int_0^{1/n}x\,dx\\[1em]
&= \frac1{2n^2}\\[1em]
& < \frac1{2n-1}-\frac1{2n+1}.
\end{align}
Fix \(n\) and let \(m > n.\) Then:
\begin{align}
K_n - K_{n+1} & < \frac1{2n-1}-\frac1{2n+1}\\[1em]
K_{n+1} - K_{n+2} & < \frac1{2n+1}-\frac1{2n+3}\\
& \vdots\\
& \vdots\\
K_{m-1} - K_m & < \frac1{2m-3}-\frac1{2m-1}\\[1em]
\therefore K_n - K_m & < \frac1{2n-1}-\frac1{2m-1},
\end{align}
where the last line results from adding all the earlier ones, most of the terms cancelling. Rearranging and substituting in for \(K_m:\)
\begin{align}
K_n & < \frac1{2n-1} + \left(K_m - \frac1{2m-1}\right)\\[1em]
&= \frac1{2n-1} + \underbrace{ \sum_{j=1}^m \frac1{j} - \log{m} }_{\large \text{goes to } \gamma} - \underbrace{\frac1{2m-1}}_{\large \text{goes to } 0}.
\end{align}
Let \(m\) go to infinity. Then
\begin{align}
K_n & \leq \frac1{2n-1} + \gamma\\[1em]
\text{i.e., } \sum_{j=1}^n \frac{1}{j} & \leq \log{n} + \gamma + \frac1{2n-1}.\tag{A}
\end{align}
(A) holds for integers \(n,\) but it is also valid if \(n\) is replaced with any real \(x \geq 2.\) For suppose \(n \leq x < n+1\) with \(n \geq 2\) and consider the adjustment in (A). The sum is the same for \(x\) as for \(n.\) The \(\log{}\) term increases and the fraction on the right decreases when \(n\) is replaced by \(x,\) but the \(\log{}\) increases faster than the fraction decreases, as is seen by looking at the derivatives:
\begin{align}
\frac{d}{dx}(\log{x}) &= \frac1{x},\\[1em]
\frac{d}{dx} \left(\frac1{2x-1}\right) &= -\frac{2}{(2x-1)^2},
\end{align}
and \(\frac1{x} \geq \frac{2}{(2x-1)^2}\) whenever \((2x-1)^2 \geq 2x,\) which holds for \(x \geq 2.\) The remaining range \(1 \leq x < 2\) is checked directly: there the sum is 1, while the right side of (A) is at least \(\gamma + \min_{1 \leq x \leq 2} \left( \log{x} + \frac1{2x-1} \right) > 0.577 + 0.88 > 1.\)
In the other direction, the differences \(K_n - K_{n+1}\) are positive, so \(K_n\) decreases to its limit \(\gamma\) and \(K_n \geq \gamma.\) With \(n = \lfloor x \rfloor,\) so that \(n \leq x < n+1:\)
\[\sum_{j=1}^{n} \frac1{j} \geq \log{n} + \gamma \geq \log{x} - \log{\left(1+\frac1{n}\right)} + \gamma \geq \log{x} + \gamma - \frac1{n}.\]
Both \(\frac1{n}\) and \(\frac1{2x-1}\) are \(O\left(\frac1{x}\right),\) so the upper and lower bounds combine into:
\begin{align}
\sum_{n \leq x} \frac{1}{n} &= \log{x} + \gamma + O\left(\frac1{x}\right). \quad \mathbf{QED.}\tag{B}
\end{align}
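The theorem is easy to watch in action numerically; `GAMMA` below is a truncation of Euler's constant:

```python
from math import log

GAMMA = 0.5772156649015329   # Euler's constant, truncated

for x in (10.0, 100.0, 1000.0):
    s = sum(1.0 / n for n in range(1, int(x) + 1))
    err = s - log(x) - GAMMA
    print(x, round(err, 6))   # the error shrinks roughly like 1/(2x)
    assert 0 < err < 1.0 / x
```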
Multiplying (B) by \(x \log{x}\) results in:
\[\sum_{n \leq x} \frac{x}{n} \log{x} = x \log^2{x} + \gamma \; x \log{x} + O(\log{x}).\tag{10.2}\]
Hat tip to robjohn at Mathematics Stack Exchange for the proof of this theorem; I'd been pulling my hair out. For the third sum in (10), proceed as follows:
\begin{align}
(\gamma + 1) \sum_{n \leq x} \log{x} &= (\gamma + 1) \log{x} \sum_{n \leq x} 1\\[1em]
&= (\gamma + 1) \log{x} \cdot \lfloor x \rfloor\\[1em]
&= (\gamma + 1) x \log{x} + (\gamma + 1) ( \lfloor x \rfloor - x ) \log{x}\\[1em]
&= (\gamma + 1) x \log{x} + O(\log{x}), \; \text{ since } |\lfloor x \rfloor - x| < 1.\tag{10.3}
\end{align}
Now \(G(x)\) can be bounded. Proceeding with (10) and applying the three bounds just derived:
\begin{align}
G(x) &= \sum_{n \leq x} \psi \left(\frac{x}{n} \right) \log{x} - \sum_{n \leq x} \frac{x}{n} \log{x} + (\gamma + 1) \sum_{n \leq x} \log{x}\\[1em]
&= x \log^2{x} - x \log{x} + O(\log^2{x})\\[1em]
&\quad - \left(x \log^2{x} + \gamma \; x \log{x} + O(\log{x})\right)\\[1em]
&\quad + (\gamma + 1) x \log{x} + O(\log{x})\\[1em]
&= O\left(\log^2{x}\right) \quad (\text{the } x \log^2{x} \text{ and } x \log{x} \text{ terms cancel exactly})\\[1em]
&= O\left(\sqrt{x}\right), \; \text{ since } \log^2{x} = O\left(\sqrt{x}\right).
\end{align}
Applying this bound on \(G\) and the Tatuzawa-Iseki identity (7'):
\begin{align}
I(x) &:= F(x) \log{x} + \sum_{n \leq x}F\left(\frac{x}{n}\right) \Lambda(n)\tag{11}\\[1em]
&= \sum_{k \leq x} \mu(k) G\left( \frac{x}{k}\right)\\[1em]
&= O\left(\sum_{k \leq x} \left|G\left( \frac{x}{k}\right)\right|\right), \; \text{ since } |\mu(k)| \leq 1\\[1em]
&= O\left(\sum_{k \leq x} \sqrt{\frac{x}{k}} \right)\\[1em]
&= O\left( \sqrt{x} \cdot \sum_{k \leq x} \sqrt{\frac{1}{k}} \right).
\end{align}
The sum on the last line can be bounded by an integral, as suggested by the graph, where \(n \leq x < n + 1:\)
\begin{align}
\sum_{k=2}^n \frac1{\sqrt{k}} & < \int_1^n \frac{du}{\sqrt{u}}\\[1em]
\therefore \sum_{k=1}^n \frac1{\sqrt{k}} & < 1 + \int_1^n \frac{du}{\sqrt{u}}\\[1em]
\therefore \sum_{k \leq x} \frac1{\sqrt{k}} & < 1 + \int_1^{\lfloor x \rfloor} \frac{du}{\sqrt{u}}\\[1em]
& \leq 1 + \int_1^{x} \frac{du}{\sqrt{u}}\\[1em]
&= 1 + (2\sqrt{x} - 2)\\[1em]
&= (2\sqrt{x} -1) < 2 \sqrt{x} = O(\sqrt{x}).\\[1em]
\end{align}
It follows that \(I(x) = O(\sqrt{x} \cdot \sqrt{x}) = O(x).\)
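The integral bound is easy to confirm numerically:

```python
# sum over k <= x of 1/sqrt(k), versus the bound 2*sqrt(x) from the integral comparison
for x in (10.0, 100.0, 10000.0):
    s = sum(1.0 / k ** 0.5 for k in range(1, int(x) + 1))
    print(round(s, 3), "<", round(2.0 * x ** 0.5, 3))
    assert s < 2.0 * x ** 0.5
```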
Plugging \(F(x) = \psi(x) - x + \gamma + 1\) into (11):
\[(\psi(x) - x + \gamma + 1) \cdot \log{x} + \sum_{n \leq x} \left(\psi\left(\frac{x}{n}\right) - \frac{x}{n} + \gamma + 1\right) \cdot \Lambda(n) = O(x).\]
Rearranging:
\begin{align}
\psi(x)\log{x} + \sum_{n \leq x} \psi\left(\frac{x}{n}\right) \Lambda(n) &= x \log{x} - (\gamma + 1)\log{x} + \sum_{n \leq x} \frac{x}{n}\Lambda(n) - (\gamma+1)\sum_{n \leq x}\Lambda(n) + O(x)\\[1em]
&= x \log{x} + O(\log{x}) + (x \log{x} + O(x)) + O(x) + O(x)\\[1em]
&= 2 x \log{x} + O(x).
\end{align}
In passing to the middle line, the third term on the right is replaced by virtue of (3), and the fourth by the definition of \(\psi(x)\) together with (2).
This proves a variant of Selberg's inequality[4] sufficient to prove the Prime Number Theorem:
\[\psi(x)\log{x} + \sum_{n \leq x} \psi\left(\frac{x}{n}\right) \Lambda(n) = 2 x \log{x} + O(x).\]
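As a closing sanity check, the left side can be computed numerically. With an \(O(x)\) error term the ratio to \(x \log{x}\) should drift toward 2 only slowly; a Python sketch (names mine), using prefix sums of \(\Lambda\) so that \(\psi\) is cheap to evaluate:

```python
from math import log

def von_mangoldt(n):
    """Return log p if n = p^k for a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while n % p != 0:
        p += 1
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

N = 2000
lam = [0.0] + [von_mangoldt(n) for n in range(1, N + 1)]
prefix = [0.0]
for v in lam[1:]:
    prefix.append(prefix[-1] + v)    # prefix[m] = psi(m)

def psi(y):
    return prefix[int(y)]

x = float(N)
lhs = psi(x) * log(x) + sum(psi(x / n) * lam[n] for n in range(1, N + 1))
print(round(lhs / (x * log(x)), 4))  # Selberg predicts this tends to 2 as x grows
```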
Mike Bertrand
June 6, 2023
^ 1. "An Elementary Proof of the Prime-Number Theorem", by Atle Selberg, Annals of Mathematics, Second Series, Vol. 50, No. 2 (Apr., 1949), pp. 305-313.
^ 1'. I've tried to learn more about Tikao Tatuzawa and Kanesiroo Iseki without much success and could not find an image of either of them. It seems both had distinguished careers.
^ 2. "On Selberg's Elementary Proof of the Prime-Number Theorem", by Tikao Tatuzawa and Kanesiroo Iseki, Proc. Japan Acad., 27 (1951) pp. 340-342. This three page article is very dense. A great aid in understanding it is "A Motivated Account of an Elementary Proof of the Prime Number Theorem", by Norman Levinson, The American Mathematical Monthly, Vol. 76, No. 3 (Mar., 1969), pp. 225-245 — see §1 - §4.
^ 3. "Mémoire sur les nombres premiers", by Pafnuty Chebyshev, Journal de mathématiques pures et appliquées, 1re série, tome 17 (1852), pp. 366-390. The paper was originally read to the St. Petersburg Academy of Sciences in September 1850. See my article on Chebyshev's Mémoire, which follows it in detail.
^ 4. Selberg's formula differs slightly; he proved:
\[\theta(x)\log{x} + \sum_{p \leq x} \log{p} \cdot \theta\left(\frac{x}{p}\right) = 2 x \log{x} + O(x).\]
The two formulations are equivalent and either can be used to prove the Prime Number Theorem.