A Proofs
A.1 Proof of Theorem 3.1
We let \(X_t = W_t + \mu\), where \(\mu\) is a finite constant (i.e. \(|\mu| < \infty\)) and \((W_t)\) is a strong white noise process with mean zero, variance \(\sigma^2\), and finite fourth moment (i.e. \(\mathbb{E} [W_t^4] < \infty\)).
Next, we consider the sample autocovariance function computed on \((X_t)\), i.e.
\[ \hat \gamma \left( h \right) = \frac{1}{n}\sum\limits_{t = 1}^{n - h} {\left( {{X_t} - \bar X} \right)\left( {{X_{t + h}} - \bar X} \right)}. \]
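For concreteness, the following minimal sketch (assuming Python with numpy; the function name `sample_autocov` is ours and purely illustrative) computes \(\hat \gamma \left( h \right)\) exactly as defined above, i.e. using the overall sample mean \(\bar X\) and the \(1/n\) normalization:

```python
import numpy as np

def sample_autocov(x, h):
    """Sample autocovariance at lag h, with 1/n normalization
    and the overall sample mean, as in the definition above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x_bar = x.mean()
    return np.sum((x[:n - h] - x_bar) * (x[h:] - x_bar)) / n

# Example: for white noise with sigma^2 = 1, gamma_hat(0) should be close to 1
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(sample_autocov(x, 0), sample_autocov(x, 5))
```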
From this definition, it is clear that \(\hat \gamma \left( 0 \right)\) and \(\hat \gamma \left( h \right)\) (with \(h > 0\)) are two statistics involving sums of different lengths. As we will see, this prevents us from directly applying the multivariate central limit theorem to the vector \([ \hat \gamma \left( 0 \right) \;\;\; \hat \gamma \left( h \right) ]^T\). However, the lag \(h\) is fixed and therefore the difference in the number of terms of the two sums is asymptotically negligible. We therefore define a new statistic
\[\tilde{\gamma} \left( h \right) = \frac{1}{n}\sum\limits_{t = 1}^{n} {\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)}, \]
which, as we will see, is easier to work with, and we then show that \(\hat \gamma \left( h \right)\) and \(\tilde{\gamma} \left( h \right)\) are asymptotically equivalent in the sense that:
\[ n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)] = o_p(1). \]
If this result holds, \(\tilde{\gamma} \left( h \right)\) and \(\hat \gamma \left( h \right)\) have the same asymptotic distribution, so it is sufficient to derive the asymptotic distribution of \(\tilde{\gamma} \left( h \right)\). Before continuing the proof of Theorem 3.1, we therefore first state and prove the following lemma:
Lemma A1: Let
\[ X_t = \mu + \sum\limits_{j = -\infty}^{\infty} \psi_j W_{t-j}, \] where \((W_t)\) is a strong white noise process with variance \(\sigma^2\) and the coefficients satisfy \(\sum_{j = -\infty}^{\infty} |\psi_j| < \infty\). Then, we have
\[ n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)] = o_p(1). \]
Proof: By Markov's inequality, we have
\[ \mathbb{P}\left( |n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)]| \geq \epsilon \right) \leq \frac{\mathbb{E}|n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)]|}{\epsilon}, \] for any \(\epsilon > 0\). Thus, it is enough to show that
\[\mathop {\lim }\limits_{n \to \infty } \; \mathbb{E} \left[|n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)]|\right] = 0\]
to prove Lemma A1. By the definitions of \(\tilde{\gamma} \left( h \right)\) and \(\hat \gamma \left( h \right)\), we have
\[ \begin{aligned} n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)] &= \frac{1}{\sqrt{n}} \sum_{t = n-h+1}^{n}(X_t - \mu)(X_{t+h} - \mu) \\ &+ \frac{1}{\sqrt{n}} \sum_{t = 1}^{n-h}\left[(X_t - \mu)(X_{t+h} - \mu) - (X_t - \bar{X})(X_{t+h} - \bar{X})\right]\\ &= \frac{1}{\sqrt{n}} \sum_{t = n-h+1}^{n}(X_t - \mu)(X_{t+h} - \mu) + \frac{1}{\sqrt{n}} \sum_{t = 1}^{n-h}\left[(\bar{X} - \mu)(X_t + X_{t+h} - \mu - \bar{X})\right]\\ &= \frac{1}{\sqrt{n}} \sum_{t = n-h+1}^{n} (X_t - \mu)(X_{t+h} - \mu) + \frac{1}{\sqrt{n}} (\bar{X} - \mu)\sum_{t = 1}^{n-h}(X_t + X_{t+h} - \mu - \bar{X})\\ &= \frac{1}{\sqrt{n}} \sum_{t = n-h+1}^{n} (X_t - \mu)(X_{t+h} - \mu) + \frac{1}{\sqrt{n}} (\bar{X} - \mu)\left[\sum_{t = 1+h}^{n-h}X_t - (n-h)\mu + h\bar{X}\right]\\ &= \frac{1}{\sqrt{n}} \sum_{t = n-h+1}^{n} (X_t - \mu)(X_{t+h} - \mu) + \frac{1}{\sqrt{n}} (\bar{X} - \mu)\left[\sum_{t = 1+h}^{n-h}(X_t - \mu) - h(\mu - \bar{X})\right]\\ &= \frac{1}{\sqrt{n}} \sum_{t = n-h+1}^{n} (X_t - \mu)(X_{t+h} - \mu) + \frac{1}{\sqrt{n}} (\bar{X} - \mu)\sum_{t = 1+h}^{n-h}(X_t - \mu) + \frac{h}{\sqrt{n}} (\bar{X} - \mu)^2, \end{aligned} \] where \(\bar{X} = \frac{1}{n}\sum_{t=1}^n X_t = \mu + \frac{1}{n}\sum_{t=1}^n\sum_{j=-\infty}^{\infty} \psi_j W_{t-j} = \mu + \frac{1}{n} \sum_{j = -\infty}^{\infty} \sum_{t=1}^n \psi_j W_{t-j}\).
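Before bounding each term, one can check numerically that the decomposition above is an exact algebraic identity rather than an approximation. A minimal sketch (assuming Python with numpy; the variable names are ours) reproduces both sides for a simulated series of length \(n + h\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, mu = 200, 3, 2.0
w = rng.normal(size=n + h)           # strong white noise, sigma = 1
x = mu + w                           # X_t = mu + W_t
x_bar = x[:n].mean()                 # bar(X) uses the first n observations

gamma_tilde = np.sum((x[:n] - mu) * (x[h:n + h] - mu)) / n
gamma_hat = np.sum((x[:n - h] - x_bar) * (x[h:n] - x_bar)) / n

lhs = np.sqrt(n) * (gamma_tilde - gamma_hat)
rhs = (np.sum((x[n - h:n] - mu) * (x[n:n + h] - mu)) / np.sqrt(n)
       + (x_bar - mu) * np.sum(x[h:n - h] - mu) / np.sqrt(n)
       + h * (x_bar - mu) ** 2 / np.sqrt(n))
print(np.isclose(lhs, rhs))          # True: the decomposition is exact
```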
Applying the triangle inequality and taking expectations, we obtain \[ \begin{aligned} \mathbb{E}\left[\left|n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)]\right|\right] &\leq \frac{1}{\sqrt{n}} \sum_{t = n-h+1}^{n} \mathbb{E}\left[\left|(X_t - \mu) \, (X_{t+h} - \mu)\right|\right]\\ &+ \frac{1}{\sqrt{n}} \mathbb{E} \left[\left|(\bar{X} - \mu) \, \sum_{t = 1+h}^{n-h}(X_t - \mu)\right|\right] + \frac{h}{\sqrt{n}}\mathbb{E} \left[ (\bar{X} - \mu)^2 \right]. \end{aligned} \]
Next, we consider each term of the above inequality. For the first term, note that \(X_t - \mu = \sum_{j = -\infty}^{\infty} \psi_j W_{t-j}\) and that \(\mathbb{E}[W_iW_j] \neq 0\) only if \(i = j\), so that \(\mathbb{E}\left[(X_t - \mu)^2\right] = \sigma^2 \sum_{j = -\infty}^{\infty} \psi_j^2\). Hence, by the Cauchy–Schwarz inequality, we have
\[ \mathbb{E}\left[|(X_t - \mu)(X_{t+h} - \mu)|\right] \leq \sqrt{\mathbb{E}\left[|(X_t - \mu)|^2\right] \mathbb{E}\left[|(X_{t+h} - \mu)|^2\right]} = \sigma^2 \sum_{i = -\infty}^{\infty}\psi_i^2. \]
Next, we consider the third term, since the corresponding bound will also be used for the second term. Since \(\mathbb{E}\left[(X_{t_1} - \mu)(X_{t_2} - \mu)\right] = \sigma^2 \sum_{j = -\infty}^{\infty} \psi_j \psi_{j + t_2 - t_1}\), we obtain
\[\mathbb{E}\left[(\bar{X} - \mu)^2\right] = \frac{1}{n^2} \sum_{t_1 = 1}^{n} \sum_{t_2 = 1}^{n} \mathbb{E}\left[(X_{t_1} - \mu)(X_{t_2} - \mu)\right] \leq \frac{\sigma^2}{n^2} \sum_{t_1 = 1}^{n} \sum_{k = -\infty}^{\infty} \sum_{j = -\infty}^{\infty} |\psi_j| \, |\psi_{j+k}| = \frac{\sigma^2}{n} \left(\sum_{i = -\infty}^{\infty}|\psi_i|\right)^2.\]
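To illustrate this bound (the following example is ours and is not needed for the proof), take \(\psi_0 = \psi_1 = 1\) and \(\psi_j = 0\) otherwise, so that \(\bar{X} - \mu = \frac{1}{n}\sum_{k} W_k \sum_{t=1}^{n} \psi_{t-k}\) and a direct computation gives
\[
\mathbb{E}\left[(\bar{X} - \mu)^2\right] = \frac{\sigma^2}{n^2}\sum_{k = -\infty}^{\infty}\left(\sum_{t=1}^{n}\psi_{t-k}\right)^2 = \frac{(4n - 2)\,\sigma^2}{n^2} \leq \frac{4\sigma^2}{n} = \frac{\sigma^2}{n}\left(\sum_{i = -\infty}^{\infty}|\psi_i|\right)^2,
\]
so the bound is essentially tight in this case.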
Similarly, using the Cauchy–Schwarz inequality and the bound just obtained, for the second term we have
\[\begin{aligned} \mathbb{E}\left[\left|(\bar{X} - \mu) \sum_{t = 1+h}^{n-h}(X_t - \mu)\right|\right] &\leq \sqrt{\mathbb{E}\left[(\bar{X} - \mu)^2\right] \, \mathbb{E}\left[\Big|\sum_{t = 1+h}^{n-h}(X_t - \mu)\Big|^2\right]}\\ &= \sqrt{\mathbb{E}\left[(\bar{X} - \mu)^2\right] \, \mathbb{E}\left[\sum_{t = 1+h}^{n-h}\left(X_t - \mu \right)^2 + \sum_{t_1 \neq t_2}(X_{t_1} - \mu)(X_{t_2} - \mu) \right]}\\ &\leq \sqrt{\frac{\sigma^2}{n} \left(\sum_{i = -\infty}^{\infty}|\psi_i|\right)^2 \cdot (n-2h)\,\sigma^2 \left( \sum_{j = -\infty}^{\infty} |\psi_j| \right)^2}\\ &= \sqrt{\frac{n-2h}{n}}\,\sigma^2 \left(\sum_{i = -\infty}^{\infty}|\psi_i| \right)^2. \end{aligned} \]
Combining the above results we obtain
\[\begin{aligned} \mathbb{E}\left[\left|n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)]\right|\right] &\leq \frac{h}{\sqrt{n}} \sigma^2 \sum_{i = -\infty}^{\infty}\psi_i^2 + \sqrt{\frac{n-2h}{n^2}}\,\sigma^2 \left(\sum_{i = -\infty}^{\infty}|\psi_i| \right)^2 + \frac{h}{n\sqrt{n}}\,\sigma^2 \left(\sum_{i = -\infty}^{\infty}|\psi_i|\right)^2\\ &\leq \frac{1}{\sqrt{n}} \left(h + \sqrt{\frac{n-2h}{n}} + \frac{h}{n}\right) \sigma^2 \left(\sum_{i = -\infty}^{\infty}|\psi_i|\right)^2. \end{aligned} \]
Taking the limit in \(n\), we have
\[\mathop {\lim }\limits_{n \to \infty } \; \mathbb{E} \left[\left|n^{\frac{1}{2}}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)]\right|\right] \leq \sigma^2 \left(\sum_{i = -\infty}^{\infty}|\psi_i|\right)^2 \mathop {\lim }\limits_{n \to \infty } \; \frac{1}{\sqrt{n}}\left(h + \sqrt{\frac{n-2h}{n}} + \frac{h}{n}\right) = 0. \]
We can therefore conclude that
\[\sqrt{n}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)] = o_p(1),\]
which concludes the proof of Lemma A1. \(\;\;\;\;\;\;\;\; \blacksquare\)
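As a numerical illustration of Lemma A1 (a sketch assuming Python with numpy, in the simple case \(X_t = \mu + W_t\); it is not part of the formal proof), the mean absolute value of \(\sqrt{n}[\tilde{\gamma} \left( h \right) - \hat \gamma \left( h \right)]\) shrinks roughly like \(1/\sqrt{n}\):

```python
import numpy as np

def scaled_diff(n, h, mu, rng):
    """One draw of sqrt(n) * (gamma_tilde(h) - gamma_hat(h)) for X_t = mu + W_t."""
    x = mu + rng.normal(size=n + h)
    x_bar = x[:n].mean()
    gamma_tilde = np.sum((x[:n] - mu) * (x[h:n + h] - mu)) / n
    gamma_hat = np.sum((x[:n - h] - x_bar) * (x[h:n] - x_bar)) / n
    return np.sqrt(n) * (gamma_tilde - gamma_hat)

rng = np.random.default_rng(2)
for n in (100, 1000, 10000):
    diffs = [scaled_diff(n, h=2, mu=2.0, rng=rng) for _ in range(2000)]
    print(n, np.mean(np.abs(diffs)))   # decreases roughly like 1 / sqrt(n)
```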
Returning to the proof of Theorem 3.1, the bivariate process \((Y_t)\), where \(Y_t = \left[ \left( {{X_t} - \mu} \right)^2 \;\; \left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right) \right]^T\), is strictly stationary and \(h\)-dependent (terms more than \(h\) apart are independent since \((W_t)\) is iid). We can therefore apply the multivariate central limit theorem for \(m\)-dependent sequences to the vector \([ \tilde \gamma \left( 0 \right) \;\;\; \tilde \gamma \left( h \right) ]^T\), and we obtain
\[\begin{aligned} \sqrt{n}\left\{ \begin{bmatrix} \tilde{\gamma} \left( 0 \right) \\ \tilde{\gamma} \left( h \right) \end{bmatrix} - \mathbb{E}\begin{bmatrix} \tilde{\gamma} \left( 0 \right) \\ \tilde{\gamma} \left( h \right) \end{bmatrix} \right\} &= \frac{1}{\sqrt{n}}\begin{bmatrix} \sum\limits_{t = 1}^{n}(X_t - \mu)^2 - n\mathbb{E}\left[ \tilde{\gamma} \left( 0 \right) \right]\\ \sum\limits_{t = 1}^{n}\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right) - n\mathbb{E}\left[ \tilde{\gamma} \left( h \right) \right] \end{bmatrix} \\ & \overset{\mathcal{D}}{\to} \mathcal{N}\left(0, \; \lim_{n \to \infty} n \, \text{var} \left(\begin{bmatrix} \tilde{\gamma} \left( 0 \right) \\ \tilde{\gamma} \left( h \right) \end{bmatrix} \right)\right), \end{aligned} \]
where the limiting covariance matrix is computed below.
Moreover, by the Cauchy–Schwarz inequality and since \(\text{var}(X_t) = \sigma^2 < \infty\), we have
\[ \mathbb{E}\left[\left|\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)\right|\right] \leq \sqrt{\mathbb{E}\left[\left( {{X_t} - \mu} \right)^2\right] \, \mathbb{E}\left[\left( {{X_{t + h}} - \mu} \right)^2\right]} = \sigma^2 < \infty, \]
so all the expectations considered below are well defined. Since \((W_t)\) is iid with mean zero, we have
\[\begin{aligned} \mathbb{E}[\tilde{\gamma} \left( h \right)] &= \frac{1}{n}\mathbb{E}\left[\sum\limits_{t = 1}^{n} {\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)}\right] = \frac{1}{n}\sum\limits_{t = 1}^{n} \mathbb{E}\left[\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)\right]\\ &= \begin{cases} \sigma^2, & \text{for } h = 0,\\ 0, & \text{for } h \neq 0, \end{cases} \end{aligned} \]
since, for \(h \neq 0\), \(X_t\) and \(X_{t+h}\) are independent, so that \(\mathbb{E}\left[\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)\right] = \mathbb{E}\left[ {{X_t} - \mu} \right]\mathbb{E}\left[ {{X_{t + h}} - \mu} \right] = 0\).
Next, we consider the variance of \(\tilde{\gamma} \left( h \right)\) when \(h \neq 0\). Since \(\mathbb{E}[\tilde{\gamma} \left( h \right)] = 0\) in this case, we have
\[ \begin{aligned} \text{var}[\tilde{\gamma} \left( h \right)] &= \frac{1}{n^2}\mathbb{E}\left\{\left[\sum\limits_{t = 1}^{n} {\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)}\right]^2\right\}\\ &= \frac{1}{n^2}\mathbb{E}\left\{\left[\sum\limits_{i = 1}^{n} {\left( {{X_i} - \mu} \right)\left( {{X_{i + h}} - \mu} \right)}\right] \left[\sum\limits_{j = 1}^{n} {\left( {{X_j} - \mu} \right)\left( {{X_{j + h}} - \mu} \right)}\right]\right\}\\ &= \frac{1}{n^2}\mathbb{E}\left[\sum\limits_{i = 1}^{n}\sum\limits_{j = 1}^{n} {\left( {{X_i} - \mu} \right)\left( {{X_{i + h}} - \mu} \right)}{\left( {{X_j} - \mu} \right)\left( {{X_{j + h}} - \mu} \right)}\right]. \end{aligned} \]
By the Cauchy–Schwarz inequality and the finite fourth moment assumption, all the expectations appearing in this double sum are finite. Moreover, since \((W_t)\) is a strong white noise process and \(h \neq 0\), we have
\[ \mathbb{E}\left[{\left( {{X_i} - \mu} \right)\left( {{X_{i + h}} - \mu} \right)}{\left( {{X_j} - \mu} \right)\left( {{X_{j + h}} - \mu} \right)}\right] \neq 0 \] only when \(i = j\).
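To make the last claim explicit (a short expansion of the argument, using only the strong white noise assumption, \(\mathbb{E}[W_t] = 0\), and \(h \neq 0\)), write \(X_t - \mu = W_t\) and note that, for \(i \neq j\), the four indices \(i\), \(i+h\), \(j\), \(j+h\) can coincide in at most one pair, so that
\[
\mathbb{E}\left[W_i W_{i+h} W_j W_{j+h}\right] =
\begin{cases}
\mathbb{E}[W_i] \, \mathbb{E}[W_{i+h}^2] \, \mathbb{E}[W_{i+2h}] = 0, & \text{if } j = i + h,\\
\mathbb{E}[W_{i-h}] \, \mathbb{E}[W_i^2] \, \mathbb{E}[W_{i+h}] = 0, & \text{if } j = i - h,\\
\mathbb{E}[W_i] \, \mathbb{E}[W_{i+h}] \, \mathbb{E}[W_j] \, \mathbb{E}[W_{j+h}] = 0, & \text{otherwise}.
\end{cases}
\]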
Therefore, we obtain
\[\begin{aligned} \text{var}[\tilde{\gamma} \left( h \right)] &= \frac{1}{n^2}\sum\limits_{i = 1}^{n} \mathbb{E}\left[ {\left( {{X_i} - \mu} \right)^2\left( {{X_{i + h}} - \mu} \right)^2}\right]\\ &= \frac{1}{n^2}\sum\limits_{i = 1}^{n} \mathbb{E}\left[\left( {{X_i} - \mu} \right)^2\right]\mathbb{E}\left[\left( {{X_{i + h}} - \mu} \right)^2\right] = \frac{1}{n}\sigma^4. \end{aligned} \]
Similarly, for \(h = 0\), we have
\[ \begin{aligned} \text{var}[\tilde{\gamma} \left( 0 \right)] &= \frac{1}{n^2}\mathbb{E}\left\{\left[\sum\limits_{t = 1}^{n} {\left( {{X_t} - \mu} \right)^2}\right]^2\right\} - \frac{1}{n^2}\left[\mathbb{E}\sum\limits_{t = 1}^{n} {\left( {{X_t} - \mu} \right)^2}\right]^2\\ &= \frac{1}{n^2}\left[n \, \mathbb{E}\left[W_t^4\right] + n(n-1)\sigma^4\right] - \frac{1}{n^2}\left(n \sigma^2\right)^2 = \frac{1}{n}\left(\mathbb{E}\left[W_t^4\right] - \sigma^4\right), \end{aligned} \]
which reduces to \(2\sigma^4/n\) when, for example, \((W_t)\) is Gaussian.
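As a quick Monte Carlo check of these two variance formulas (a sketch assuming Python with numpy; the uniform noise is our choice, made to illustrate a non-Gaussian case where \(\mathbb{E}[W_t^4] \neq 3\sigma^4\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, h, reps = 2000, 1, 5000
a = np.sqrt(3.0)                       # W_t ~ Uniform(-a, a): sigma^2 = 1, E[W^4] = 9/5

g0, gh = [], []
for _ in range(reps):
    w = rng.uniform(-a, a, size=n + h)
    g0.append(np.sum(w[:n] ** 2) / n)              # gamma_tilde(0) with mu known
    gh.append(np.sum(w[:n] * w[h:n + h]) / n)      # gamma_tilde(h)

print(n * np.var(g0), 9 / 5 - 1)       # approx E[W^4] - sigma^4 = 0.8
print(n * np.var(gh), 1.0)             # approx sigma^4 = 1
```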
Next, we consider the covariance between \(\tilde{\gamma} \left( 0 \right)\) and \(\tilde{\gamma} \left( h \right)\), for \(h \neq 0\), and we obtain
\[ \begin{aligned} \text{cov}[\tilde{\gamma} \left( 0 \right), \tilde{\gamma} \left( h \right)] &= \mathbb{E}[\tilde{\gamma} \left( 0 \right) \tilde{\gamma} \left( h \right)] - \mathbb{E}[\tilde{\gamma} \left( 0 \right)] \, \mathbb{E}[\tilde{\gamma} \left( h \right)] = \mathbb{E}[\tilde{\gamma} \left( 0 \right) \tilde{\gamma} \left( h \right)]\\ &= \frac{1}{n^2} \, \mathbb{E}\left[\left[\sum\limits_{s = 1}^{n} {\left( {{X_s} - \mu} \right)^2}\right]\left[\sum\limits_{t = 1}^{n} {\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)}\right]\right] = 0, \end{aligned} \]
since each term \(\mathbb{E}\left[\left( {{X_s} - \mu} \right)^2\left( {{X_t} - \mu} \right)\left( {{X_{t + h}} - \mu} \right)\right] = \mathbb{E}\left[W_s^2 W_t W_{t+h}\right]\) vanishes: for \(h \neq 0\), at least one of \(W_t\) and \(W_{t+h}\) is independent of the remaining factors and has mean zero.
Therefore, by Lemma A1 and Slutsky's theorem, we have
\[ \begin{aligned} \sqrt{n}\left\{ \begin{bmatrix} \hat{\gamma} \left( 0 \right) \\ \hat{\gamma} \left( h \right) \end{bmatrix} - \begin{bmatrix} \sigma^2 \\ 0 \end{bmatrix} \right\} &= \sqrt{n}\left\{ \begin{bmatrix} \tilde{\gamma} \left( 0 \right) \\ \tilde{\gamma} \left( h \right) \end{bmatrix} - \begin{bmatrix} \sigma^2 \\ 0 \end{bmatrix} \right\} + \underbrace{\sqrt{n}\left\{ \begin{bmatrix} \hat{\gamma} \left( 0 \right) \\ \hat{\gamma} \left( h \right) \end{bmatrix} - \begin{bmatrix} \tilde{\gamma} \left( 0 \right) \\ \tilde{\gamma} \left( h \right) \end{bmatrix} \right\}}_{\overset{p}{\to} 0}\\ &\overset{\mathcal{D}}{\to} \mathcal{N}\left(0, \begin{bmatrix} \mathbb{E}\left[W_t^4\right] - \sigma^4 & 0\\ 0 & \sigma^4 \end{bmatrix} \right). \end{aligned} \]
Next, we define the function \(g\left( \begin{bmatrix} a \\ b \end{bmatrix} \right) = b/a\), where \(a \neq 0\). For this function it is clear that
\[ \nabla g\left( \begin{bmatrix} a \\ b \end{bmatrix} \right) = \begin{bmatrix} -\frac{b}{a^2} \\ \frac{1}{a} \end{bmatrix}^{T} , \]
and thus using the Delta method, we have for \(h \neq 0\)
\[ \begin{aligned} \sqrt{n}\hat{\rho}(h) = \sqrt{n}\left\{g\left( \begin{bmatrix} \hat{\gamma} \left( 0 \right) \\ \hat{\gamma} \left( h \right) \end{bmatrix} \right) - {\mu_g} \right\} &\overset{\mathcal{D}}{\to} \mathcal{N}\left(0, \sigma_r^2 \right), \end{aligned} \]
where
\[ \begin{aligned} {\mu_g} &= g\left(\begin{bmatrix} \sigma^2 \\ 0 \end{bmatrix} \right) = 0,\\ \sigma_r^2 &= \nabla g\left(\begin{bmatrix} \sigma^2 \\ 0 \end{bmatrix} \right) \begin{bmatrix} \mathbb{E}\left[W_t^4\right] - \sigma^4 & 0\\ 0 & \sigma^4 \end{bmatrix} \nabla g\left(\begin{bmatrix} \sigma^2 \\ 0 \end{bmatrix} \right)^{T} = \begin{bmatrix} 0 & \sigma^{-2} \end{bmatrix} \begin{bmatrix} \mathbb{E}\left[W_t^4\right] - \sigma^4 & 0\\ 0 & \sigma^4 \end{bmatrix} \begin{bmatrix} 0 \\ \sigma^{-2} \end{bmatrix} = 1. \end{aligned} \]
Thus, we have
\[ \sqrt{n}\hat{\rho}(h) \overset{\mathcal{D}}{\to} \mathcal{N}\left(0, 1 \right), \]
which concludes the proof of Theorem 3.1. \(\;\;\;\;\;\;\;\; \blacksquare\)
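As an empirical check of this result (a sketch assuming Python with numpy; the exponential noise is our choice of a non-Gaussian strong white noise plus a constant mean), the simulated mean and variance of \(\sqrt{n}\hat{\rho}(1)\) are indeed close to 0 and 1:

```python
import numpy as np

def rho_hat(x, h):
    """Sample autocorrelation at lag h, built from the sample autocovariances."""
    n = len(x)
    x_bar = x.mean()
    g0 = np.sum((x - x_bar) ** 2) / n
    gh = np.sum((x[:n - h] - x_bar) * (x[h:] - x_bar)) / n
    return gh / g0

rng = np.random.default_rng(4)
n, reps = 1000, 5000
# iid exponential observations: a strong white noise (shifted by its mean) with finite fourth moment
z = np.array([np.sqrt(n) * rho_hat(rng.exponential(size=n), h=1) for _ in range(reps)])
print(z.mean(), z.var())   # close to 0 and 1, respectively
```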
A.2 Proof of Theorem 4.1
Consider:
\[{\left( {{X_t} - m} \right)^2} = {\left[ {\left( {{X_t} - E\left[ {{X_t}|{\Omega _t}} \right]} \right) + \left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right)} \right]^2}\]
where \(m = m\left( {{X_{1}}, \cdots ,{X_{t}}} \right)\) and \({\Omega _t} = \left( {{X_{1}}, \cdots ,{X_{t}}} \right)\).
Therefore we can write
\[{\left( {{X_t} - m} \right)^2} = {\left( {{X_t} - E\left[ {{X_t}|{\Omega _t}} \right]} \right)^2} + {\left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right)^2} + 2\left( {{X_t} - E\left[ {{X_t}|{\Omega _t}} \right]} \right)\left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right).\]
Focusing only on the last term (and dropping the constant 2), we have
\[\underbrace {\left( {{X_t} - E\left[ {{X_t}|{\Omega _t}} \right]} \right)}_{ = {\varepsilon _t}}\left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right).\]
At this point, let us study the value of \(E\left[ {{\varepsilon _t}|{\Omega _t}} \right]\) (the reason for this will become apparent in the next steps of the proof):
\[E\left[ {{\varepsilon _t}|{\Omega _t}} \right] = E\left[ {{X_t} - E\left[ {{X_t}|{\Omega _t}} \right]|{\Omega _t}} \right] = E\left[ {{X_t}|{\Omega _t}} \right] - E\left[ {{X_t}|{\Omega _t}} \right] = 0\]
Given this, we now consider the law of total expectation (i.e. \(E[X] = E[E[X|Y]]\)), which allows us to rewrite the expectation of this last term as follows:
\[E\left[ {{\varepsilon _t}\left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right)} \right] = E\left[ {E\left[ {{\varepsilon _t}\left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right)|{\Omega _t}} \right]} \right] = E\left[ {\underbrace {E\left[ {{\varepsilon _t}|{\Omega _t}} \right]}_{ = 0}\left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right)} \right] = 0\]
Since the expectation of this last term is zero, taking expectations on both sides of the expansion above yields
\[E\left[{\left( {{X_t} - m} \right)^2}\right] = E\left[{\left( {{X_t} - E\left[ {{X_t}|{\Omega _t}} \right]} \right)^2}\right] + E\left[{\left( {E\left[ {{X_t}|{\Omega _t}} \right] - m} \right)^2}\right].\]
The first term on the right-hand side is non-negative and does not depend on \(m\), so it suffices to minimize the second term, which is non-negative and equal to zero for \(m = E\left[ {{X_t}|{\Omega _t}} \right]\). This choice of \(m\) therefore minimizes the entire expression, which concludes the proof. \(\;\;\;\;\;\;\;\; \blacksquare\)
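As a simple numerical illustration of this result (a sketch assuming Python with numpy; the additive model below is our own toy example and not the setting of the theorem), the conditional mean attains the smallest mean squared error among a few competing predictors:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200_000
y = rng.normal(size=N)              # conditioning information (plays the role of Omega)
x = y + rng.normal(size=N)          # target: E[X | Y] = Y

predictors = {
    "conditional mean E[X|Y] = Y": y,
    "shrunk predictor 0.8 * Y":    0.8 * y,
    "biased predictor Y + 0.5":    y + 0.5,
}
for name, m in predictors.items():
    print(f"{name:30s} MSE = {np.mean((x - m) ** 2):.3f}")
# The conditional mean attains the smallest MSE (equal to the noise variance, here 1).
```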
A.3 Proof of Theorem 4.2
We begin by expanding the squared norm, which gives
\[\begin{equation*} \left\| {X_0 - \hat{X}} \right\|_2^2 = {X_0^T}X_0 + \hat{X}^T \hat{X} - 2X_0^T \hat{X}. \end{equation*}\]
Therefore, letting \(C \equiv \mathbb{E}\left[ \mathbb{E}_0 \left[ \left\| {X_0 - \hat{X}} \right\|_2^2\right] \right]\), we obtain
\[\begin{align*} C &= \mathbb{E}\left[ \mathbb{E}_0 \left[{X_0^T}X_0\right] \right] + \mathbb{E}\left[ \mathbb{E}_0 \left[\hat{X}^T \hat{X}\right] \right] - 2 \mathbb{E}\left[ \mathbb{E}_0 \left[X_0^T \hat{X}\right] \right]\\ &= \mathbb{E}_0 \left[{X_0^T}X_0\right] + \mathbb{E}\left[ \hat{X}^T \hat{X}\right] - 2 \mathbb{E}\left[ \mathbb{E}_0^T \left[X_0\right] \hat{X} \right]. \end{align*}\]
Next, we let \({C^*} \equiv \mathbb{E}\left[ {\left\| {X - \hat X} \right\|_2^2} \right]\), which can be expressed as follows:
\[\begin{equation*} \mathbb{E}\left[ \left\| {X - \hat{X}} \right\|_2^2\right] = \mathbb{E} \left[{X^T}X\right] + \mathbb{E}\left[ \hat{X}^T \hat{X}\right] - 2 \mathbb{E}\left[ X^T \hat{X} \right]. \end{equation*}\]
Since \(X\) and \(X_0\) have the same distribution, we have \(\mathbb{E}[X] = \mathbb{E}_0 [X_0]\) and \(\mathbb{E}[X^T X] = \mathbb{E}_0 [X_0^T X_0]\). By taking the difference between \(C\) and \(C^*\), we obtain
\[\begin{align*} C - {C^*} &= 2 \mathbb{E}\left[ X^T \hat{X} \right] - 2 \mathbb{E}\left[ \mathbb{E}_0^T \left[X_0\right] \hat{X} \right] = 2 \mathbb{E}\left[ \left(X - \mathbb{E}_0 \left[X_0\right]\right)^T \hat{X} \right]\\ &=2 \mathbb{E}\left[ \left(X - \mathbb{E} \left[X\right]\right)^T \hat{X} \right] = 2 \mathbb{E}\left[ \text{tr} \left(\left(X - \mathbb{E} \left[X\right]\right)^T \hat{X}\right) \right]\\ &= 2 \mathbb{E}\left[ \text{tr} \left(\hat{X} \left(X - \mathbb{E} \left[X\right]\right)^T \right) \right] = 2 \, \text{tr} \left( \mathbb{E}\left[ \hat{X} \left(X - \mathbb{E} \left[X\right]\right)^T \right] \right)\\ &= 2 \, \text{tr} \left( \text{cov}\left( X - \mathbb{E} \left[X\right], \hat{X} \right)\right) + 2 \, \text{tr} \left( \mathbb{E}\left[ X - \mathbb{E} \left[X\right] \right] \mathbb{E}^T[\hat{X}]\right)\\ &= 2 \, \text{tr} \left( \text{cov}\left( X, \hat{X} \right)\right) + 2 \, \text{tr} \left(\left( \mathbb{E}\left[ X \right] - \mathbb{E} \left[X\right] \right) \mathbb{E}^T[\hat{X}]\right)\\ &= 2 \, \text{tr} \left( \text{cov}\left( X, \hat{X} \right)\right). \end{align*}\]
Thus, we have
\[\begin{equation*} C = \mathbb{E}\left[ {\left\| {X - \hat X} \right\|_2^2} \right] + 2 \, \text{tr} \left( \text{cov}\left( X, \hat{X} \right)\right), \end{equation*}\]
which concludes the proof. \(\;\;\;\;\;\;\;\; \blacksquare\)
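As a numerical illustration of this identity (a sketch assuming Python with numpy; the fixed-mean Gaussian model and the linear smoother \(\hat X = HX\) are our own choices, for which \(\text{tr}(\text{cov}(X, \hat X)) = \sigma^2 \, \text{tr}(H)\)):

```python
import numpy as np

rng = np.random.default_rng(6)
p, k, sigma, reps = 20, 5, 1.0, 20000
theta = rng.normal(size=p)                      # fixed (unknown) mean vector
Z = rng.normal(size=(p, k))                     # fixed design matrix
H = Z @ np.linalg.solve(Z.T @ Z, Z.T)           # linear smoother: X_hat = H X

c_out, c_in = [], []
for _ in range(reps):
    x = theta + sigma * rng.normal(size=p)      # training sample X
    x0 = theta + sigma * rng.normal(size=p)     # independent copy X_0 with the same distribution
    x_hat = H @ x                               # prediction built from X only
    c_out.append(np.sum((x0 - x_hat) ** 2))     # out-of-sample error (C, once averaged)
    c_in.append(np.sum((x - x_hat) ** 2))       # in-sample error (C*, once averaged)

penalty = 2 * sigma**2 * np.trace(H)            # 2 tr(cov(X, X_hat)) = 2 sigma^2 tr(H) here
print(np.mean(c_out), np.mean(c_in) + penalty)  # both sides of the identity nearly coincide
```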