### Spectral theory, part I: The moment problem and the spectral measure

Let’s briefly recall where we are and what we are about to do. If $\mu$ is a measure supported on the real line and the $n$-th orthonormal polynomial is denoted by $p_n$, then, as we have seen before, there are positive numbers $a_n$ and real numbers $b_n$ such that

$xp_n(x) = a_{n+1} p_{n+1}(x) + b_n p_n(x) + a_n p_{n-1}(x),$

$p_{-1}(x) = 0, p_0(x) = 1$,

which is called the three-term recursion. I then asked: given positive numbers $a_n$ and real numbers $b_n$, can we construct a measure such that the recurrence coefficients obtained from its three-term recursion match these parameters? In other words, last time we studied the mapping

$\textnormal{measure} \mapsto \textnormal{recurrence coefficients}$,

and now we want to study the inverse mapping

$\textnormal{recurrence coefficients} \mapsto \textnormal{measure}$.

I want to show three approaches for constructing measures from the recurrence coefficients. This post describes the first one in detail; it is adapted from the ancient tome [Freud].

1. The moments and Heine’s formula. I mentioned the moment problem briefly in an earlier post, and now we take a more thorough look. If $\mu$ is a measure supported on the real line, the quantities

$m_k(\mu) = m_k = \int x^k d\mu(x)$

are called the moments. We always assume that they are finite; otherwise there are no orthogonal polynomials. If we define the quantities

$D_n(m_0, \dots, m_{2n}) = D_n := \begin{vmatrix} m_0 & m_1 & \dots & m_n\\ m_1 & m_2 & \dots & m_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ m_n & m_{n+1} & \dots & m_{2n} \end{vmatrix}$

it is easy to show that $D_n$ must be positive for all $n$. This is true because if $\Pi_n(x) = \sum_{k=0}^{n} a_k x^k$ is a nonzero polynomial, then

$\int \Pi_n(x)^2 d\mu(x) = \sum_{k,l=0}^{n} a_k a_l m_{k+l} > 0$,

which means that the quadratic form with matrix $(m_{k+l})_{k,l=0}^{n}$ is positive definite, and so, by Sylvester’s criterion, $D_n > 0$.

Determinants of the form $D_n$ are called Hankel determinants, and there is an extensive literature for them.
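To make the positivity of the Hankel determinants concrete, here is a small numerical check. The choice of measure is mine, for illustration only: the Lebesgue measure on $[0,1]$, whose moments are $m_k = 1/(k+1)$, so the Hankel matrices are exactly the Hilbert matrices.

```python
import numpy as np

# Moments of Lebesgue measure on [0, 1]: m_k = \int x^k dx = 1/(k+1).
def m(k):
    return 1.0 / (k + 1)

def hankel_det(n):
    """Hankel determinant D_n built from m_0, ..., m_{2n}."""
    H = np.array([[m(i + j) for j in range(n + 1)] for i in range(n + 1)])
    return np.linalg.det(H)

# D_n > 0 for every n, as the theory predicts (the values shrink very
# fast -- Hilbert matrices are famously ill-conditioned -- but stay positive).
for n in range(6):
    assert hankel_det(n) > 0
```

The determinants decay rapidly, which already hints that recovering a measure from its moments is a numerically delicate business.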

The moments can be used, for example, to give a determinantal formula for the orthogonal polynomials. If

$\widetilde{p_n}(x) := \begin{vmatrix} m_0 & m_1 & \dots & m_{n-1} & 1\\ m_1 & m_2 & \dots & m_{n} & x \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ m_n & m_{n+1} & \dots & m_{2n-1} & x^n \end{vmatrix}$,

it can be seen (by integrating elementwise) that

$\int x^k \widetilde{p_n}(x) d\mu(x) = \begin{cases} 0, & k < n, \\ D_n, & k = n. \end{cases}$

This means that $\widetilde{p_n}(x)$ is orthogonal to $1, x, \dots, x^{n-1}$, therefore it is a constant multiple of the $n$-th orthonormal polynomial $p_n(x)$, so we can write $p_n(x) = \alpha \widetilde{p_n}(x)$. (Note that the leading coefficient of $\widetilde{p_n}$ is $D_{n-1}$.) To calculate $\alpha$, we have

$1 = \int p_n(x)^2 d\mu(x) = \int (\alpha D_{n-1} x^n + \Pi_{n-1}(x))\, \alpha \widetilde{p_n}(x)\, d\mu(x)$

$= \alpha^2 D_{n-1} D_n$,

so, choosing $p_n$ to have positive leading coefficient, we get

$p_n(x) = \frac{1}{\sqrt{D_{n-1} D_n}} \widetilde{p_n}(x)$.

This is called Heine’s formula. For us, it is useful not because it gives a nice closed-form expression of the orthogonal polynomials, but because it provides some intuition for solving the moment problem.
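As a sanity check of Heine’s formula, here is a short numerical sketch. The setup is my own illustrative choice: the measure $dx$ on $[-1,1]$, whose orthonormal polynomials are the normalized Legendre polynomials, e.g. $p_2(x) = \sqrt{5/2}\,(3x^2-1)/2$.

```python
import numpy as np

# Moments of dx on [-1, 1]: m_k = 2/(k+1) for even k, 0 for odd k.
def m(k):
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

def D(n):
    # Hankel determinant D_n.
    return np.linalg.det(np.array([[m(i + j) for j in range(n + 1)]
                                   for i in range(n + 1)]))

def p_tilde(n, x):
    # Determinantal polynomial: row i is (m_i, m_{i+1}, ..., m_{i+n-1}, x^i).
    A = np.empty((n + 1, n + 1))
    for i in range(n + 1):
        A[i, :n] = [m(i + j) for j in range(n)]
        A[i, n] = x ** i
    return np.linalg.det(A)

def p(n, x):
    # Heine's formula: p_n = p~_n / sqrt(D_{n-1} D_n).
    return p_tilde(n, x) / np.sqrt(D(n - 1) * D(n))

# Compare with the orthonormal Legendre polynomials at a test point.
x = 0.5
assert abs(p(1, x) - np.sqrt(3 / 2) * x) < 1e-10
assert abs(p(2, x) - np.sqrt(5 / 2) * (3 * x**2 - 1) / 2) < 1e-10
```

The determinant evaluation is wasteful (one determinant per point), but it follows the formula literally, which is the point of the check.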

2. Solving the moment problem. We saw that if $\mu$ is a Borel measure on the real line with finite moments, then $D_n > 0$, where $D_n$ was defined in the previous section. Now suppose that we are given a sequence of numbers $\{ m_k \}_{k=0}^{\infty}$ such that $D_n = D_n(m_0, \dots, m_{2n}) > 0$. Our goal is to construct a measure $\mu$ such that its moments coincide with the previously given numbers.

Before I construct the measure satisfying our conditions, a remark. If there is a measure $\mu$ solving the moment problem, the integral of every polynomial $\Pi_n(x) = \sum_{k=0}^{n} a_k x^k$ can be expressed in terms of the moments. Indeed, we have

$\int \Pi_n(x) d\mu(x) = \sum_{k=0}^{n} a_k m_k$!

Therefore, before even having a measure $\mu$ which solves the moment problem, we can talk about the “integral” of a polynomial! This situation is a bit strange, and to handle it, I introduce the notation

$I(\Pi_n) := \sum_{k=0}^{n} a_k m_k$.

Since $D_n > 0$ for all $n$, the linear functional $I$ is positive definite (by the same Sylvester argument as before): $I(\Pi^2) > 0$ for every nonzero polynomial $\Pi$. Often I will refer to $I(\Pi_n)$ as the “integral” of $\Pi_n$. There is no underlying measure, hence the quotes. $I$ also goes under the name moment functional, and I will refer to it as such later. Now let’s solve the moment problem!

First, take a look at the polynomials $\widetilde{p_n}$ defined previously as

$\widetilde{p_n}(x) := \begin{vmatrix} m_0 & m_1 & \dots & m_{n-1} & 1\\ m_1 & m_2 & \dots & m_{n} & x \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ m_n & m_{n+1} & \dots & m_{2n-1} & x^n \end{vmatrix}$.

Our key observation is

$I(x^k \widetilde{p_n}(x)) = \begin{cases} 0, & k < n, \\ D_n, & k = n. \end{cases}$

This property translates to orthogonality when there is in fact an underlying measure. It also shows that the definition of $\widetilde{p_n}$ was hinted at by Heine’s formula: our goal was to “imitate” orthogonality.

Denote the zeros of $\widetilde{p_n}(x)$ by $x_{1,n}, \dots, x_{n,n}$. These zeros are real and simple. Indeed, check the proof of Theorem 1 in an earlier post: it uses nothing but orthogonality! In short, if $x_1, \dots, x_k$ with $k < n$ were the points where $\widetilde{p_n}$ changes sign, then, since $\prod_{l=1}^{k}(x - x_l)$ has degree less than $n$,

$I(\widetilde{p_n}(x) \prod_{l=1}^{k} (x-x_l)) = 0,$

which is a contradiction, since $\widetilde{p_n}(x) \prod_{l=1}^{k} (x-x_l)$ does not change sign and is not identically zero! (And $I$ is positive definite.) Therefore we can suppose that

$x_{1,n} < x_{2,n} < \dots < x_{n,n}$.

The plan is to construct approximating measures $\mu_n$ which converge (or at least have a subsequence converging) weakly to a measure $\mu$ with the prescribed moments. To proceed, define the Lagrange interpolation polynomials

$l_{k,n}(x) = \frac{\widetilde{p_n}(x)}{\widetilde{p_n}'(x_{k,n}) (x-x_{k,n})}$.

$l_{k,n}$ is of degree $n-1$ and it satisfies

$l_{k,n}(x_{l,n}) = \begin{cases} 0, & l \neq k, \\ 1, & l = k. \end{cases}$

This way, every polynomial $\Pi(x)$ of degree at most $n-1$ can be written as $\Pi(x) = \sum_{k=1}^{n} \Pi(x_{k,n}) l_{k,n}(x)$, therefore its “integral” would be

$I(\Pi(x)) = \sum_{k=1}^{n} \Pi(x_{k,n}) I(l_{k,n})$.

Define the numbers $\lambda_{k,n}$ as

$\lambda_{k,n} := I(l_{k,n}(x))$.

This way, we have $I(\Pi(x)) = \sum_{k=1}^{n} \Pi(x_{k,n}) \lambda_{k,n}$ for every polynomial $\Pi$ of degree at most $n-1$. (In fact, writing $\Pi = \widetilde{p_n} q + r$ with $\deg r \leq n-1$, the same formula holds for every $\Pi$ of degree at most $2n-1$, since $I(\widetilde{p_n} q) = 0$ and $\Pi(x_{k,n}) = r(x_{k,n})$.) Now we can define our approximating measure as

$\mu_n = \sum_{k=1}^{n} \lambda_{k,n} \delta_{x_{k,n}}$,

which has cumulative distribution function

$F_n(x) = \sum_{x_{k,n} < x} \lambda_{k,n}$.

It is important to show that $\mu_n$ is a positive and finite measure. The weights $\lambda_{k,n}$ are positive, which can be shown by noting that

$\lambda_{k,n} = I(l_{k,n}(x)) = I(l_{k,n}(x)^2) > 0$.

Indeed, the polynomial $l_{k,n}(x) - l_{k,n}(x)^2$ vanishes at every node $x_{l,n}$, hence it is divisible by $\widetilde{p_n}$, and so $I(l_{k,n}(x) - l_{k,n}(x)^2) = I(\widetilde{p_n}(x) \Pi_{n-2}(x)) = 0$, where $\Pi_{n-2}(x)$ is some polynomial of degree $n-2$. The finiteness of $\mu_n$ can be seen from

$\mu_n(\mathbb{R}) = \sum_{k=1}^{n} \lambda_{k,n} = I(1) = m_0 < \infty$.
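To see the construction in action, here is a numerical sketch; it is an illustration, not part of the proof, and the moment sequence is my own choice (the moments of $dx$ on $[-1,1]$). We build $\widetilde{p_4}$, find its zeros, solve for the weights $\lambda_{k,4}$, and check that $\mu_4$ is a positive measure of total mass $m_0$ which reproduces the moments.

```python
import numpy as np

# Given moment sequence: moments of dx on [-1, 1].
def m(k):
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

n = 4

def p_tilde(x):
    # Determinantal polynomial: row i is (m_i, ..., m_{i+n-1}, x^i).
    A = np.empty((n + 1, n + 1))
    for i in range(n + 1):
        A[i, :n] = [m(i + j) for j in range(n)]
        A[i, n] = x ** i
    return np.linalg.det(A)

# Recover the coefficients of the degree-n polynomial from n+1 samples,
# then its zeros x_{1,n} < ... < x_{n,n}.
xs = np.linspace(-1, 1, n + 1)
coeffs = np.polyfit(xs, [p_tilde(x) for x in xs], n)
roots = np.roots(coeffs)
assert np.allclose(np.imag(roots), 0)      # the zeros are real ...
nodes = np.sort(np.real(roots))
assert np.all(np.diff(nodes) > 1e-8)       # ... and simple

# Weights: the quadrature reproduces m_0, ..., m_{n-1}, so solve the
# Vandermonde system  sum_k lambda_k x_k^j = m_j,  j = 0, ..., n-1.
V = np.vander(nodes, increasing=True).T
lam = np.linalg.solve(V, np.array([m(j) for j in range(n)]))

assert np.all(lam > 0)                     # positive weights
assert abs(lam.sum() - m(0)) < 1e-10       # mu_n(R) = m_0
for j in range(2 * n):                     # exact up to degree 2n-1
    assert abs(np.dot(lam, nodes ** j) - m(j)) < 1e-7
```

For this particular moment sequence the nodes and weights come out as the classical 4-point Gauss–Legendre rule, which is no accident: the construction above is precisely Gaussian quadrature.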

The positivity and finiteness of $\mu_n$ mean that we can apply Helly’s selection principle. It can be found in many forms, but the most useful one (at least for our purpose) is in [Ismail].

Theorem 1. (Helly’s selection principle) Let $\{ \mu_n \}_{n=0}^{\infty}$ be a sequence of positive measures for which $\mu_n(\mathbb{R}) \leq M < \infty$ for some $M$. (That is, the measures are uniformly finite.) Then there is a subsequence $\{ \mu_{n_j} \}_{j=0}^{\infty}$ which converges weakly to a positive measure $\mu$. Moreover, if for every $n$ the moments $m_k(\mu_n)$ exist for all $k \in \{0,1,\dots\}$, then the moments of $\mu$ exist and $\lim_{j \to \infty} m_k(\mu_{n_j}) = m_k(\mu)$. Furthermore, if $\{\mu_n\}_{n=0}^{\infty}$ does not converge weakly, then there are at least two subsequences converging to different limits. $\Box$

Applying this theorem to $\mu_n$, we obtain a subsequence $\mu_{n_j}$ which converges weakly to a measure $\mu$. Since $\int x^k d\mu_n(x) = m_k$ for $n > k$, we have

$\int x^k d\mu(x) = m_k$.

It means that $\mu$ solves our moment problem! To summarize, we have just proved the following theorem.

Theorem 2. (Solution of the moment problem) If $\{ m_k \}_{k=0}^{\infty}$ is a sequence of numbers for which $D_n(m_0, \dots, m_{2n}) > 0$ for all $n \in \{0,1,2,\dots\}$, there exists a measure $\mu$ such that

$m_k = \int x^k d\mu(x)$

holds for all $k \in \{0,1,2, \dots\}$. $\Box$

Before using the solution of the moment problem to construct measures from recurrence coefficients of orthogonal polynomials, I conclude this section with two remarks.

a. If you have a sharp eye, you have noticed that we basically defined a quadrature process and proved that its weak limit (or the weak limit of a subsequence) is a measure with the desired properties. Why did we use the zeros of some mysterious polynomial $\widetilde{p_n}$ defined by determinants? The reason lies in the line of computation

$I(l_{k,n}(x) - l_{k,n}(x)^2) = I(\widetilde{p_n}(x) \Pi_{n-2}) = 0$,

which was used to prove the positivity of our quadrature weights. Had we chosen the nodes any other way, this would not work!

b. The measure obtained by solving the moment problem is not necessarily unique! (See the last sentence of Helly’s selection principle.) There are conditions that guarantee uniqueness, and we shall see one while discussing the second approach to the spectral theorem. Also, there is an article by Barry Simon titled The classical moment problem as a self-adjoint finite difference operator (a monstrous 107 pages) which describes uniqueness in detail.

3. Constructing measures from recurrence coefficients. Now we reap the fruits of the previous work and construct, from a prescribed set of numbers, a measure whose orthogonal polynomials have exactly these numbers as their recurrence coefficients. Suppose that we are given two real sequences

$\{ a_n \}_{n=0}^{\infty}, \{ b_n \}_{n=0}^{\infty}, \quad a_n > 0$.

From these numbers we can construct polynomials $p_n$ with the recursion

$xp_n(x) = a_{n+1} p_{n+1}(x) + b_n p_n(x) + a_n p_{n-1}(x)$,

$p_{-1}(x) = 0, p_0(x) = 1$.

Keep in mind that this time $p_n$ is not related to any measure! Rearranging the terms, this can be written as

$p_{n+1}(x) = \frac{x-b_n}{a_{n+1}}p_n(x) - \frac{a_n}{a_{n+1}}p_{n-1}(x)$,

from which, applying an induction argument, it is clear that $p_n(x) = (a_1 a_2 \dots a_n)^{-1} x^n + \text{lower order terms}$. (Note that $a_0$ never enters the recursion, since it multiplies $p_{-1}(x) = 0$.) To simplify things, I introduce the notation

$\gamma_n := \frac{1}{a_1 a_2 \dots a_n}$.

Our plan is to construct a moment functional $I$, show that it is positive definite, then obtain the desired measure by solving the moment problem. Let $I(x^0) = I(1) = 1$ (any positive value would do, but this choice makes the $p_n$ orthonormal), and suppose that $I(1), I(x), \dots, I(x^n)$ are defined. Then we can define $I(x^{n+1})$ as

$I(x^{n+1}) := I(x^{n+1} - \gamma_{n+1}^{-1} p_{n+1}(x))$.

There is no problem with this definition: the polynomial $x^{n+1} - \gamma_{n+1}^{-1} p_{n+1}(x)$ is of degree at most $n$, so the right-hand side involves only values of $I$ that are already defined. (Equivalently, the definition forces $I(p_{n+1}(x)) = 0$.) An induction argument on $m$ shows that

$I(x^m p_n(x)) = 0, \quad m < n$.

Indeed, for $m = 0$, the definition of $I(x^n)$ gives $I(\gamma_n^{-1} p_n(x)) = I(x^n) - I(x^n - \gamma_n^{-1} p_n(x)) = 0$, hence $I(p_n(x)) = 0$. For other $m$, we have, via the recurrence relation,

$I(x^m p_{n}(x)) = a_{n+1} I(x^{m-1} p_{n+1}(x)) + b_n I(x^{m-1} p_n(x)) + a_n I(x^{m-1} p_{n-1}(x)) = 0$.

This shows that orthogonality is “encoded” onto the functional $I$. We only need to show the positive definiteness before we can apply our solution of the moment problem.
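Before proving positive definiteness, here is a short numerical sketch of what the recursion produces. The recurrence data below is a standard example supplied by me for illustration (it is not from the post): $b_n = 0$, $a_1 = 1/\sqrt{2}$, $a_n = 1/2$ for $n \geq 2$ are the recurrence coefficients of the arcsine (Chebyshev) measure $d\mu = dx/(\pi\sqrt{1-x^2})$ on $[-1,1]$. We run the recursion, check the leading coefficient, and verify orthonormality against the known measure by Gauss–Chebyshev quadrature.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Illustrative recurrence data: b_n = 0 for all n, a_1 = 1/sqrt(2),
# a_n = 1/2 for n >= 2 (the Chebyshev / arcsine measure on [-1, 1]).
def a(n): return 1 / np.sqrt(2) if n == 1 else 0.5
def b(n): return 0.0

def polys_from_recurrence(N):
    """p_0, ..., p_N as coefficient arrays (lowest degree first), via
    p_{n+1} = ((x - b_n) p_n - a_n p_{n-1}) / a_{n+1}."""
    ps = [np.array([1.0])]                 # p_0 = 1
    prev = np.array([0.0])                 # p_{-1} = 0
    for n in range(N):
        pn = ps[-1]
        nxt = P.polysub(P.polymulx(pn),
                        P.polyadd(b(n) * pn, a(n) * prev)) / a(n + 1)
        prev = pn
        ps.append(nxt)
    return ps

ps = polys_from_recurrence(5)

# Leading coefficient of p_n is 1/(a_1 ... a_n); a_0 never enters the
# recursion because it multiplies p_{-1} = 0.
lead = 1.0
for k in range(1, 4):
    lead /= a(k)
assert abs(ps[3][-1] - lead) < 1e-12

# Orthonormality against the underlying measure, via Gauss-Chebyshev
# quadrature: integral of f dmu ~ (1/N) * sum of f(cos((2k-1)pi/(2N))).
N = 16
nodes = np.cos((2 * np.arange(1, N + 1) - 1) * np.pi / (2 * N))
def integral(coeffs):
    return np.mean(P.polyval(nodes, coeffs))

assert abs(integral(P.polymul(ps[2], ps[3]))) < 1e-12      # orthogonal
assert abs(integral(P.polymul(ps[3], ps[3])) - 1) < 1e-12  # normalized
```

For this data the recursion reproduces $p_n = \sqrt{2}\, T_n$ for $n \geq 1$, with $T_n$ the classical Chebyshev polynomials, which is exactly what the known theory predicts.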

Positive definiteness of $I$. It is enough to show that for every nonzero polynomial $\Pi_n$ of degree $n$, we have $I(\Pi_n(x)^2) > 0$. Indeed, there is a well-known fact which says the following.

Lemma 1. If $\Pi(x)$ is a nonnegative nonzero polynomial, then there are two polynomials $P(x)$ and $Q(x)$ such that $\Pi(x) = P(x)^2 + Q(x)^2$. $\Box$

The proof of this lemma can be found, for example, in [Freud].

To prove the positivity, first notice that by repeated application of the recurrence formula, we have

$I(x^n p_n(x)) = a_n I(x^{n-1} p_{n-1}(x)) = \dots = a_n a_{n-1} \dots a_1 \, I(1) > 0$.

It follows that

$I(p_n^2(x)) = I((\gamma_n x^n + \text{lower order terms}) p_n(x)) = \gamma_n I(x^n p_n(x)) > 0$.

Now, if $\Pi_n(x) = \sum_{k=0}^{n} c_k p_k(x)$ is an arbitrary nonzero polynomial, written in the basis $p_0, \dots, p_n$ (the letter $c$ avoids a clash with the recurrence coefficients $a_k$), then, by “orthogonality”, we have

$I(\Pi_n(x)^2) = \sum_{k=0}^{n} c_k^2 I(p_k(x)^2) > 0$,

and finally it follows from Lemma 1 that $I$ is positive definite, therefore we can apply Theorem 2. To summarize, we have just proved the following theorem.

Theorem 3. (Favard’s theorem) If $\{ a_n \}_{n=0}^{\infty}, a_n > 0$ is a sequence of positive real numbers and $\{ b_n \}_{n=0}^{\infty}$ is a sequence of real numbers, there exists a measure $\mu$ for which the orthonormal polynomials satisfy

$xp_n(x) = a_{n+1} p_{n+1}(x) + b_n p_n(x) + a_n p_{n-1}(x),$

$p_{-1}(x) = 0, p_0(x) = 1$.

This measure is called the spectral measure. However, the measure $\mu$ is not necessarily unique. $\Box$
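Favard’s theorem can also be exercised numerically. The sketch below uses the same illustrative data as before (my choice, not from the post): $b_n = 0$, $a_1 = 1/\sqrt{2}$, $a_n = 1/2$ for $n \geq 2$, the coefficients of the arcsine measure, whose moments are known in closed form: $m_{2j} = \binom{2j}{j}/4^j$ and $m_{2j+1} = 0$. We generate the moments $m_k = I(x^k)$ directly from the recursive definition of $I$, compare them with the exact values, and check that the Hankel determinants are positive, which is exactly the hypothesis of Theorem 2.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from math import comb

# Illustrative recurrence data: b_n = 0, a_1 = 1/sqrt(2), a_n = 1/2 for
# n >= 2 -- the coefficients of the arcsine measure dx/(pi*sqrt(1-x^2)).
def a(n): return 1 / np.sqrt(2) if n == 1 else 0.5

moments = [1.0]                                # normalization I(1) = 1
pn, prev = np.array([1.0]), np.array([0.0])    # p_0 and p_{-1}
for n in range(10):
    # p_{n+1} = (x p_n - a_n p_{n-1}) / a_{n+1}   (here b_n = 0)
    pn, prev = P.polysub(P.polymulx(pn), a(n) * prev) / a(n + 1), pn
    # I(x^{n+1}) is defined by forcing I(p_{n+1}) = 0:
    moments.append(-np.dot(pn[:-1], moments) / pn[-1])

# The recursively defined moments match the arcsine moments...
for k in range(11):
    exact = comb(k, k // 2) / 2 ** k if k % 2 == 0 else 0.0
    assert abs(moments[k] - exact) < 1e-12

# ...and the Hankel determinants D_n are positive, so Theorem 2 applies.
for n in range(5):
    H = np.array([[moments[i + j] for j in range(n + 1)]
                  for i in range(n + 1)])
    assert np.linalg.det(H) > 0
```

This is the whole pipeline of the post in miniature: recurrence coefficients determine a moment functional, the moment functional is positive definite, and the moment problem machinery then hands us a measure.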

4. Concluding remarks. Before I end this post, let me summarize in a few words what we just did. Given a sequence of numbers $\{ m_k \}_{k=0}^{\infty}$, we constructed a functional $I$ acting on the set of polynomials. Then we constructed polynomials $\widetilde{p_n}$ imitating the orthogonality property. That is, if we imagine that the functional $I$ is obtained from a measure by integration, the polynomials $\widetilde{p_n}$ are the orthogonal polynomials with respect to that measure (apart from a constant multiplier). We then used the zeros of these polynomials to construct approximating measures, and we saw that the weak limit of these measures (or a weak limit of a subsequence of these measures) solved our moment problem.
Then, given two real sequences $\{ a_n \}_{n=0}^{\infty}, a_n > 0$ and $\{ b_n \}_{n=0}^{\infty}$, we constructed a measure $\mu$ by solving a moment problem such that the coefficients in the three-term recursion for orthonormal polynomials coincided with the previously given sequences.

Although we did what we wanted to (which was Favard’s theorem), there are a few remaining questions.
a. Is there an explicit example where the solution of the moment problem is not unique?
b. What can we say about the support of the spectral measure? For example, when is it compact?
c. What does the name “spectral measure” have to do with the classic spectral theorem about diagonalizing Hermitian matrices?
d. Can we construct the spectral measure straight from the recurrence coefficients?

The next part will describe a second approach in detail, which will answer some of these questions.

References.
[Freud] Géza Freud, Orthogonal Polynomials, Pergamon Press, 1971.
[Ismail] Mourad E. H. Ismail, Classical and Quantum Orthogonal Polynomials in One Variable, Cambridge University Press, Cambridge, 2005.