Modern Stochastics: Theory and Applications

Bias reduction of a conditional maximum likelihood estimator for a Gaussian second-order moving average model
Volume 8, Issue 4 (2021), pp. 435–463
Fumiaki Honda   Takeshi Kurosawa  

https://doi.org/10.15559/21-VMSTA187
Pub. online: 4 August 2021      Type: Research Article      Open Access

Received
8 December 2020
Revised
8 July 2021
Accepted
8 July 2021
Published
4 August 2021

Abstract

In this study, we consider a bias reduction of the conditional maximum likelihood estimators for the unknown parameters of a Gaussian second-order moving average (MA(2)) model. In many cases, we use the maximum likelihood estimator because the estimator is consistent. However, when the sample size n is small, the error is large because the estimator has a bias of $O({n^{-1}})$. Furthermore, the exact form of the maximum likelihood estimator for moving average models is somewhat complicated even for Gaussian models. We therefore sometimes rely on simpler maximum likelihood estimation methods. As one such method, we focus on the conditional maximum likelihood estimator and examine its bias for a Gaussian MA(2) model. Moreover, we propose new estimators for the unknown parameters of the Gaussian MA(2) model based on the bias of the conditional maximum likelihood estimators. By performing simulations, we investigate properties of this bias, as well as the asymptotic variance of the conditional maximum likelihood estimators for the unknown parameters. Finally, we confirm the validity of the new estimators through this simulation study.

1 Introduction

Estimators of unknown parameters should be consistent. Consistency is ensured when large samples are available, but the estimators may be biased when the sample size is small. In recent years, various computational methods have been developed to compute estimates of unknown parameters. However, an analytical expression for the bias tells us how the bias depends on the unknown parameters, that is, how the bias changes with their values. Analytical evaluations of the bias for classes of nonlinear estimators in models with i.i.d. samples have been conducted for many years. Tanaka (1983) [14] provided asymptotic expansions of the least squares estimator for the first-order autoregressive process AR(1) and computed its bias. Tanaka (1984) [15] also gave asymptotic expansions of the maximum likelihood estimators for autoregressive moving average (ARMA) models, including AR(1), AR(2), MA(1), and MA(2), and computed their biases. Cordeiro and Klein (1994) [8] derived the bias of the maximum likelihood estimators for ARMA models in a different way, although the result for MA(2) was not shown. Cheang and Reinsel (2000) [7] developed a way to reduce the bias of AR models using restricted maximum likelihood estimation.
Practically, we often rely on conditional maximum likelihood estimation to reduce the computational cost of maximum likelihood estimation and to predict an unobserved variable, namely the next value of the observed data (see Section 2 for the definition of the conditional maximum likelihood estimation). Conditional maximum likelihood estimation is often referred to as quasi-maximum likelihood estimation (QMLE). Statistical properties of the conditional maximum likelihood estimation have been discussed in the literature (see, for example, [3] and [4] by Bao and Ullah). Giummolè and Vidoni (2010) [9] showed the bias of the conditional maximum likelihood estimator for a Gaussian MA(1) model in the process of obtaining improved coverage probabilities for ARMA models. However, their expression for the bias of the estimator for a Gaussian first-order moving average (MA(1)) model was slightly strange. Hence, Kurosawa, Noguchi, and Honda (2017) [12] corrected the bias and deduced a simple expression for it using a method by Barndorff-Nielsen and Cox (1994) [5]. We should also note the recent remarkable results by Y. Bao (2016) [1] and (2018) [2]. We discuss his results in Remark 3.4 below.
In this study, we derive the bias of the conditional maximum likelihood estimators of the unknown parameters for a Gaussian second-order moving average (MA(2)) model, following the method in [12]. In Section 2, we introduce the Gaussian MA(2) model and the conditional maximum likelihood function. In Section 3, we derive both the bias and the mean squared errors (MSEs) of the conditional maximum likelihood estimators for the Gaussian MA(2) model, and then propose new estimators based on the $O({n^{-1}})$ term in the bias of the conditional maximum likelihood estimators. Moreover, we show that the proposed estimators are less biased and have lower MSEs than the conditional maximum likelihood estimators. In Section 4, we conduct a simple simulation study to verify our results. Furthermore, we apply our method to U.S. GNP data as an illustrative example in Section 6.

2 A Gaussian MA(2) model and the conditional maximum likelihood estimator

Let $\{{Y_{t}}\}$ be a Gaussian MA(1) model (see, e.g., [6, 10]) defined by
(1)
\[ {Y_{t}}=\mu +{\varepsilon _{t}}+\rho {\varepsilon _{t-1}},\hspace{1em}{\varepsilon _{t}}\stackrel{\mathrm{i}.\mathrm{i}.\mathrm{d}.}{\sim }N(0,{\sigma ^{2}})\hspace{2em}(t\ge 1),\]
where $|\rho |<1$. Kurosawa, Noguchi, and Honda (2017) [12] computed the bias of the conditional maximum likelihood estimator under the condition that
(2)
\[ {\varepsilon _{0}}=0\]
for the Gaussian MA(1) model. Assumption (2) is a useful condition not only for an estimation problem but also for a prediction problem, since ${\varepsilon _{T}}$ ($T\ge 1$) can be written as a linear combination of ${Y_{1}},\dots ,{Y_{T}}$. Then, the best linear unbiased estimator ${\widehat{Y}_{T+h}}$ of ${Y_{T+h}}$ ($h>0$) given $S=\{{Y_{1}}={y_{1}},\dots ,{Y_{T}}={y_{T}}\}$ is expressed as a finite linear combination of ${\varepsilon _{1}},\dots ,{\varepsilon _{T}}$ (see, e.g., [12]). They gave the following:
Theorem 2.1 ([12]).
The bias of the conditional maximum likelihood estimators of the unknown parameters given (2) for the Gaussian MA(1) model is
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[\widehat{\mu }-\mu ]& \displaystyle =& \displaystyle o({n^{-1}}),\\ {} \displaystyle {E_{}}[\widehat{\sigma }-\sigma ]& \displaystyle =& \displaystyle -\frac{5\sigma }{4n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[\widehat{\rho }-\rho ]& \displaystyle =& \displaystyle \frac{2\rho -1}{n}+o({n^{-1}}).\end{array}\]
In this study, we consider a Gaussian MA(2) model defined by
(3)
\[ {Y_{t}}=\mu +{\varepsilon _{t}}+{\rho _{1}}{\varepsilon _{t-1}}+{\rho _{2}}{\varepsilon _{t-2}}\hspace{2em}(t\ge 1),\]
where ${\varepsilon _{t}}\stackrel{\mathrm{i}.\mathrm{i}.\mathrm{d}.}{\sim }N(0,{\sigma ^{2}})$ and $\theta ={(\mu ,\sigma ,{\rho _{1}},{\rho _{2}})^{\top }}$ is the vector of unknown parameters. Although the MA(2) model is stationary regardless of the values of ${\rho _{1}}$ and ${\rho _{2}}$, we assume invertibility, which means
\[ {\rho _{1}}-{\rho _{2}}<1,\hspace{1em}{\rho _{1}}+{\rho _{2}}>-1,\hspace{1em}-1<{\rho _{2}}<1,\]
to identify the model uniquely; otherwise, the likelihood function takes the same value at different parameter points.
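As a small illustration (a sketch of our own, not part of the original derivation), the invertibility region above can be checked numerically; the function name is_invertible_ma2 below is hypothetical.

```python
def is_invertible_ma2(rho1, rho2):
    """Check the invertibility (identifiability) region of the Gaussian MA(2) model:
    rho1 - rho2 < 1,  rho1 + rho2 > -1,  -1 < rho2 < 1."""
    return (rho1 - rho2 < 1) and (rho1 + rho2 > -1) and (-1 < rho2 < 1)

# Parameter settings used later in the simulation study all lie inside the region.
print(is_invertible_ma2(0.25, -0.25))   # True
print(is_invertible_ma2(0.15, 0.55))    # True
print(is_invertible_ma2(2.5, 0.5))      # False: rho1 - rho2 = 2 > 1
```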
If we consider maximum likelihood estimation with ${Y_{1}},\dots ,{Y_{n}}$, then the likelihood function involves an infinite number of εs. To avoid this problem, we work with the conditional likelihood function given
(4)
\[ {\varepsilon _{0}}={\varepsilon _{-1}}=0\]
for the Gaussian MA(2) model. In this case, ${Y_{1}},\dots ,{Y_{n}}$ can be expressed using a finite number of εs. The conditional log-likelihood function for the Gaussian MA(2) model given (4) is expressed as
(5)
\[ \mathcal{L}(\theta ;y)=-\frac{n}{2}\log (2\pi )-\frac{n}{2}\log ({\sigma ^{2}})-{\sum \limits_{t=1}^{n}}\frac{{\{{\varepsilon _{t}}(\theta ;y)\}^{2}}}{2{\sigma ^{2}}}.\]
The likelihood function under (4) is referred to as the conditional likelihood function (see [11, p. 653] and [17]). We use the following lemma to compute the bias of the estimators of the unknown parameters in the Gaussian MA(2) model.
Lemma 2.2.
Let $Y={({Y_{1}},\dots ,{Y_{n}})^{\top }}$ be a vector of random variables generated by a Gaussian MA(2) model in (3). Assume that (4) holds. Then, we have
(6)
\[ {\varepsilon _{t}}(\theta ;Y)={\varepsilon _{t}}={\sum \limits_{k=0}^{t-1}}\left({\sum \limits_{l=0}^{t-k-1}}{\lambda _{1}^{l}}{\lambda _{2}^{t-k-l-1}}\right)({Y_{k+1}}-\mu )\hspace{2em}(t\ge 1),\]
where
(7)
\[ {\lambda _{1}}=\frac{-{\rho _{1}}+\Delta }{2},\hspace{1em}{\lambda _{2}}=\frac{-{\rho _{1}}-\Delta }{2},\hspace{1em}\Delta =\sqrt{{\rho _{1}^{2}}-4{\rho _{2}}}.\]
The proof is given in the Appendix. We know that
\[\begin{aligned}{}{\varepsilon _{t}}& =\frac{1}{(1-{\lambda _{1}}L)(1-{\lambda _{2}}L)}({Y_{t}}-\mu )={\sum \limits_{k=0}^{\infty }}{\sum \limits_{l=0}^{k}}{\lambda _{1}^{l}}{\lambda _{2}^{k-l}}({Y_{t-k}}-\mu )\\ {} & ={\sum \limits_{k=-\infty }^{t-1}}\left({\sum \limits_{l=0}^{t-k-1}}{\lambda _{1}^{l}}{\lambda _{2}^{t-k-l-1}}\right)({Y_{k+1}}-\mu )\end{aligned}\]
since the process is invertible. The lemma suggests that the coefficients of ${Y_{t}}\hspace{2.5pt}(t\le 0)$ are zero when ${\varepsilon _{0}}={\varepsilon _{-1}}=0$.
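As a numerical illustration (our own sketch; the function names are hypothetical), the errors ${\varepsilon _{t}}(\theta ;y)$ can be computed either by the defining recursion ${\varepsilon _{t}}=({y_{t}}-\mu )-{\rho _{1}}{\varepsilon _{t-1}}-{\rho _{2}}{\varepsilon _{t-2}}$ with ${\varepsilon _{0}}={\varepsilon _{-1}}=0$, or by the closed form (6)–(7) of Lemma 2.2; the two agree, and either can then be plugged into the conditional log-likelihood (5).

```python
import numpy as np

def eps_recursive(y, mu, rho1, rho2):
    """epsilon_t under eps_0 = eps_{-1} = 0, via the MA(2) recursion."""
    eps = np.zeros(len(y) + 2)            # eps[0], eps[1] play the role of eps_{-1}, eps_0
    for t in range(len(y)):
        eps[t + 2] = (y[t] - mu) - rho1 * eps[t + 1] - rho2 * eps[t]
    return eps[2:]

def eps_closed_form(y, mu, rho1, rho2):
    """epsilon_t via Lemma 2.2, with lambda_1, lambda_2 as in (7)."""
    delta = np.sqrt(complex(rho1 ** 2 - 4 * rho2))
    lam1, lam2 = (-rho1 + delta) / 2, (-rho1 - delta) / 2
    n = len(y)
    eps = np.zeros(n, dtype=complex)
    for t in range(1, n + 1):
        for k in range(t):                # k = 0, ..., t-1 as in (6)
            phi = sum(lam1 ** l * lam2 ** (t - k - l - 1) for l in range(t - k))
            eps[t - 1] += phi * (y[k] - mu)
    return eps.real

def cond_loglik(theta, y):
    """Conditional log-likelihood (5) given eps_0 = eps_{-1} = 0."""
    mu, sigma, rho1, rho2 = theta
    eps = eps_recursive(y, mu, rho1, rho2)
    n = len(y)
    return -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma ** 2) - np.sum(eps ** 2) / (2 * sigma ** 2)

rng = np.random.default_rng(0)
y = rng.normal(size=8)
print(np.allclose(eps_recursive(y, 0.0, 0.3, 0.2), eps_closed_form(y, 0.0, 0.3, 0.2)))  # True
print(cond_loglik((0.0, 1.0, 0.3, 0.2), y))
```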
Since the conditional likelihood function is expressed as a function of the independent samples ${\varepsilon _{1}},\dots ,{\varepsilon _{n}}$, we can apply the following lemma from [5] to the conditional log-likelihood function. We apply Lemma 2.3 to the i.i.d. random variables ε, not to Y. Higher-order derivatives of the conditional log-likelihood function (5) are required to obtain the bias and the MSEs of the maximum likelihood estimators, and Lemma 2.2 will be used for these calculations. An asymptotic expansion of the bias of the maximum likelihood estimator is given by the following:
Lemma 2.3 (See Barndorff-Nielsen and Cox (1994) [5, p. 150]).
Let $\theta ={({\theta _{1}},\dots ,{\theta _{d}})^{\top }}$ be a vector of unknown parameters for a random variable Z, and $\widehat{\theta }={({\widehat{\theta }_{1}},\dots ,{\widehat{\theta }_{d}})^{\top }}$ be a vector of the maximum likelihood estimators of θ for a vector of random samples $Z={({Z_{1}},\dots ,{Z_{n}})^{\top }}$. Then, the bias of ${\widehat{\theta }_{r}}$ $(1\le r\le d)$ is given by
\[\begin{aligned}{}& {E_{Z}}[{\widehat{\theta }_{r}}-{\theta _{r}}]=\frac{1}{2}{\sum \limits_{s=1}^{d}}{\sum \limits_{t=1}^{d}}{\sum \limits_{u=1}^{d}}{i^{{\theta _{r}}{\theta _{s}}}}{i^{{\theta _{t}}{\theta _{u}}}}({\nu _{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}+2{\nu _{{\theta _{s}}{\theta _{t}},{\theta _{u}}}})+O({n^{-3/2}})\\ {} & \hspace{2em}(r=1,\dots ,d),\end{aligned}\]
where $\mathcal{L}(\theta ;Z)$ is the log-likelihood function for $Z={({Z_{1}},\dots ,{Z_{n}})^{\top }}$,
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {l_{{\theta _{s}}}}& \displaystyle =& \displaystyle \frac{\partial \mathcal{L}(\theta ;Z)}{\partial {\theta _{s}}},\hspace{1em}{l_{{\theta _{s}}{\theta _{t}}}}=\frac{{\partial ^{2}}\mathcal{L}(\theta ;Z)}{\partial {\theta _{s}}\partial {\theta _{t}}},\hspace{1em}{l_{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}=\frac{{\partial ^{3}}\mathcal{L}(\theta ;Z)}{\partial {\theta _{s}}\partial {\theta _{t}}\partial {\theta _{u}}},\\ {} \displaystyle {i^{{\theta _{r}}{\theta _{s}}}}& \displaystyle =& \displaystyle {\left({I_{n}}{(\theta )^{-1}}\right)_{r,s}},\hspace{1em}{\nu _{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}={E_{Z}}[{l_{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}],\hspace{1em}{\nu _{{\theta _{s}}{\theta _{t}},{\theta _{u}}}}={E_{Z}}[{l_{{\theta _{s}}{\theta _{t}}}}{l_{{\theta _{u}}}}],\end{array}\]
and ${I_{n}}(\theta )$ is the Fisher information matrix for Z.

3 Main results

In this section, we compute the biases of the conditional maximum likelihood estimators of the unknown parameters of the Gaussian MA(2) model, and we also propose new estimators for these parameters. Before presenting the results on the bias, we consider the MSEs. The MSEs appear as the diagonal elements of the covariance matrix, which satisfies
\[ {E_{}}[(\widehat{\theta }-\theta ){(\widehat{\theta }-\theta )^{\top }}]={I_{n}}{(\theta )^{-1}}+o({n^{-1}}),\]
where ${I_{n}}(\theta )$ is the Fisher information matrix. The expression can be simplified by applying the asymptotic property
\[ \underset{n\to \infty }{\lim }n{I_{n}}{(\theta )^{-1}}=J(\theta ).\]
Therefore,
(8)
\[ {E_{}}[(\widehat{\theta }-\theta ){(\widehat{\theta }-\theta )^{\top }}]=\frac{J(\theta )}{n}+o({n^{-1}}).\]
Theorem 3.1.
The elements of the asymptotic covariance matrix of the conditional maximum likelihood estimators of the unknown parameters under (4) for the Gaussian MA(2) model in (3) are given by ${E_{}}[(\widehat{\mu }-\mu )(\widehat{\sigma }-\sigma )]={E_{}}[(\widehat{\mu }-\mu )({\widehat{\rho }_{1}}-{\rho _{1}})]={E_{}}[(\widehat{\mu }-\mu )({\widehat{\rho }_{2}}-{\rho _{2}})]$ $={E_{}}[(\widehat{\sigma }-\sigma )({\widehat{\rho }_{1}}-{\rho _{1}})]$ $={E_{}}[(\widehat{\sigma }-\sigma )({\widehat{\rho }_{2}}-{\rho _{2}})]=o({n^{-1}})$ and
\[\begin{array}{l}\displaystyle {E_{}}[{(\widehat{\mu }-\mu )^{2}}]=\frac{{\sigma ^{2}}{({\rho _{1}}+{\rho _{2}}+1)^{2}}}{n}+o({n^{-1}}),\hspace{1em}{E_{}}[{(\widehat{\sigma }-\sigma )^{2}}]=\frac{{\sigma ^{2}}}{2n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[{({\widehat{\rho }_{1}}-{\rho _{1}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}),\hspace{1em}{E_{}}[{({\widehat{\rho }_{2}}-{\rho _{2}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[({\widehat{\rho }_{1}}-{\rho _{1}})({\widehat{\rho }_{2}}-{\rho _{2}})]=\frac{{\rho _{1}}(1-{\rho _{2}})}{n}+o({n^{-1}}).\end{array}\]
The proof is given in Section 5.1. By applying Lemma 2.3 to (5), we obtain the bias of the conditional maximum likelihood estimators.
Theorem 3.2.
The bias of the conditional maximum likelihood estimators of the unknown parameters under (4) for the Gaussian MA(2) model in (3) is given by
(9)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[\widehat{\mu }-\mu ]& \displaystyle =& \displaystyle o({n^{-1}}),\end{array}\]
(10)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[\widehat{\sigma }-\sigma ]& \displaystyle =& \displaystyle -\frac{7\sigma }{4n}+o({n^{-1}}),\end{array}\]
(11)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[{\widehat{\rho }_{1}}-{\rho _{1}}]& \displaystyle =& \displaystyle \frac{{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}}),\end{array}\]
(12)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[{\widehat{\rho }_{2}}-{\rho _{2}}]& \displaystyle =& \displaystyle \frac{3{\rho _{2}}-1}{n}+o({n^{-1}}).\end{array}\]
The proof is given in Section 5.2. We observe that the bias of the conditional maximum likelihood estimators for the Gaussian MA(2) model is the same as that of the full maximum likelihood estimators for a Gaussian MA(2) model (see Tanaka (1984) [15]). Although (10) looks different from the result by Tanaka, we can deduce the same result for the bias of ${\hat{\sigma }^{2}}$ (see (14)).
Remark 3.3.
We note that MA$(2)$ reduces to MA$(1)$ if we put ${\rho _{2}}=0$ in (3). However, the bias for $\rho (={\rho _{1}})$ in Theorem 2.1 is not recovered by simply putting ${\rho _{2}}=0$ in (11). The results in Theorem 3.2 with ${\rho _{2}}=0$ are obtained from a solution of the maximum likelihood estimation in four dimensions, whereas the bias for MA(1) in Theorem 2.1 should be regarded as the bias of the MA(2) model estimated under the constraint ${\rho _{2}}=0$. This means that Theorem 3.2 is not a result derived under an assumed algebraic relationship among the unknown parameters, such as ${\rho _{2}}=0$, ${\rho _{2}}=1$, or ${\Delta ^{2}}={\rho _{1}^{2}}-4{\rho _{2}}=0$; namely, we do not assume any algebraic relationship among the unknown parameters in advance. Thus, $\Delta \ne 0$, which is equivalent to ${\lambda _{1}}\ne {\lambda _{2}}$, is used in the proof of Theorem 3.2 and in the propositions and lemmas in the appendix, except for Lemma 2.2.
Remark 3.4.
We recently found the notable results by Y. Bao (2016) [1], although we proved Theorem 3.2 independently. The abstract of [1] states that the bias of the conditional Gaussian likelihood estimator with nonnormal errors is derived. The gap between the “Gaussian” likelihood and the “nonnormal errors” implies that he used the likelihood function (5) even if the errors follow a nonnormal distribution. For the calculation of the bias, the values of the skewness ${\gamma _{1}}$ and the kurtosis ${\gamma _{2}}$ are required. Namely, he derived the bias by regarding the likelihood function as the Gaussian likelihood function without imposing the conditions ${\gamma _{1}}=0$ and ${\gamma _{2}}=3$ of a normal distribution. He gave the bias of various models including MA(2) in a matrix representation, which was originally studied by Cordeiro and Klein (1994) [8], while we use the roots of the characteristic function for the derivation of the bias. We purely focus on the Gaussian MA(2) model with the conditional likelihood function and evaluate the corrected bias. Furthermore, we propose new estimators below based on the corrected bias and discuss the corrected estimators under a pure Gaussian MA(2) model in detail using the (estimated) bias and the MSE in the simulation study.
Using (10), (11), and (12), we propose the following new estimators for the Gaussian MA(2) model:
(13)
\[ \widetilde{\sigma }=\widehat{\sigma }+\frac{7\widehat{\sigma }}{4n},\hspace{2em}{\widetilde{\rho }_{1}}={\widehat{\rho }_{1}}-\frac{{\widehat{\rho }_{1}}+{\widehat{\rho }_{2}}-1}{n},\hspace{2em}{\widetilde{\rho }_{2}}={\widehat{\rho }_{2}}-\frac{3{\widehat{\rho }_{2}}-1}{n}.\]
As we can see, the proposed estimators are asymptotically equal to the usual estimators. We consider the MSEs of the new estimators:
(14)
\[ {E_{}}[\widehat{\sigma }]=\left(1-\frac{7}{4n}\right)\sigma +o({n^{-1}}),\hspace{1em}{E_{}}[{\widehat{\sigma }^{2}}]=\left(1-\frac{3}{n}\right){\sigma ^{2}}+o({n^{-1}}).\]
Thus, we have
\[ {E_{}}[{(\widetilde{\sigma }-\sigma )^{2}}]=\frac{{\sigma ^{2}}}{2n}+o({n^{-1}}).\]
In other words, the MSEs of $\widetilde{\sigma }$ and $\widehat{\sigma }$ are asymptotically the same. Similarly, for ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$ we have
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[{\widehat{\rho }_{1}^{2}}]& \displaystyle =& \displaystyle {\rho _{1}^{2}}+\frac{1-{\rho _{2}^{2}}}{n}+2{\rho _{1}}\frac{{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[{\widehat{\rho }_{2}^{2}}]& \displaystyle =& \displaystyle {\rho _{2}^{2}}+\frac{1-{\rho _{2}^{2}}}{n}+2{\rho _{2}}\frac{3{\rho _{2}}-1}{n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[{\widehat{\rho }_{1}}{\widehat{\rho }_{2}}]& \displaystyle =& \displaystyle {\rho _{1}}{\rho _{2}}+{\rho _{2}}\frac{3{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}}).\end{array}\]
Therefore, we have
\[ {E_{}}[{({\widetilde{\rho }_{1}}-{\rho _{1}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}),\hspace{1em}{E_{}}[{({\widetilde{\rho }_{2}}-{\rho _{2}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}).\]
In other words, the MSEs of ${\widetilde{\rho }_{1}}$ and ${\widehat{\rho }_{1}}$ are asymptotically the same, as are the MSEs of ${\widetilde{\rho }_{2}}$ and ${\widehat{\rho }_{2}}$.
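For reference, applying the correction (13) to the conditional maximum likelihood estimates is a one-line post-processing step; the following small sketch (our own, with a hypothetical function name) implements it.

```python
def bias_corrected_estimates(sigma_hat, rho1_hat, rho2_hat, n):
    """Apply the O(1/n) bias correction (13) to the conditional MLEs."""
    sigma_tilde = sigma_hat + 7 * sigma_hat / (4 * n)
    rho1_tilde = rho1_hat - (rho1_hat + rho2_hat - 1) / n
    rho2_tilde = rho2_hat - (3 * rho2_hat - 1) / n
    return sigma_tilde, rho1_tilde, rho2_tilde

# Example with hypothetical conditional MLEs and n = 50.
print(bias_corrected_estimates(0.95, 0.20, -0.30, 50))
```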

4 Simulation study

In this section, we conduct a simulation study in order to verify Theorems 3.1 and 3.2 and to evaluate the validity of the new estimators. Let $\mu =1$ and $\sigma =1$. For fixed n and fixed $({\rho _{1}},{\rho _{2}})$, we generate $y={({y_{1}},\dots ,{y_{n}})^{\top }}$ 30,000 times from the Gaussian MA(2) model. For each y, we calculate the vector of the conditional maximum likelihood estimators $\widehat{\theta }={(\widehat{\mu },\widehat{\sigma },{\widehat{\rho }_{1}},{\widehat{\rho }_{2}})^{\top }}$. Using the 30,000 replications, we calculate the estimated bias and MSEs by Monte Carlo simulation. In Subsections 4.2 and 4.3, we also compute the full maximum likelihood estimators ${\mathbf{\hat{\theta }}^{\mathrm{MLE}}}$ to compare them with our estimators.
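A minimal, self-contained sketch of this Monte Carlo experiment (our own code, not the authors'; the function names are hypothetical, and far fewer than 30,000 replications are used to keep the runtime short) is as follows. It compares the estimated bias and MSEs with the $O({n^{-1}})$ terms of Theorems 3.1 and 3.2.

```python
import numpy as np
from scipy.optimize import minimize

def simulate_ma2(n, mu, sigma, rho1, rho2, rng):
    """Generate Y_1, ..., Y_n from the Gaussian MA(2) model (3) with eps_0 = eps_{-1} = 0."""
    eps = np.concatenate(([0.0, 0.0], rng.normal(0.0, sigma, n)))
    return mu + eps[2:] + rho1 * eps[1:-1] + rho2 * eps[:-2]

def cmle(y):
    """Conditional MLE: maximize the conditional log-likelihood (5) numerically."""
    def neg_loglik(theta):
        mu, log_sigma, rho1, rho2 = theta
        sigma = np.exp(log_sigma)                    # keep sigma > 0
        eps = np.zeros(len(y) + 2)
        for t in range(len(y)):
            eps[t + 2] = (y[t] - mu) - rho1 * eps[t + 1] - rho2 * eps[t]
        return len(y) * np.log(sigma) + np.sum(eps[2:] ** 2) / (2 * sigma ** 2)
    res = minimize(neg_loglik, x0=[np.mean(y), np.log(np.std(y)), 0.0, 0.0],
                   method="Nelder-Mead")
    mu, log_sigma, rho1, rho2 = res.x
    return mu, np.exp(log_sigma), rho1, rho2

rng = np.random.default_rng(1)
n, mu, sigma, rho1, rho2 = 50, 1.0, 1.0, 0.25, -0.25
true = np.array([mu, sigma, rho1, rho2])
est = np.array([cmle(simulate_ma2(n, mu, sigma, rho1, rho2, rng)) for _ in range(200)])
print("estimated bias:", est.mean(axis=0) - true)
print("O(1/n) bias   :", [0.0, -7 * sigma / (4 * n), (rho1 + rho2 - 1) / n, (3 * rho2 - 1) / n])
print("estimated MSE :", ((est - true) ** 2).mean(axis=0))
print("J(theta)/n    :", [sigma**2 * (1 + rho1 + rho2)**2 / n, sigma**2 / (2 * n),
                          (1 - rho2**2) / n, (1 - rho2**2) / n])
```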

4.1 Evaluation of asymptotic variances

We evaluate how much the MSEs of the conditional maximum likelihood estimators change depending on the true values of the unknown parameters. Table 1 shows, for each unknown parameter, the estimated MSE of the conditional maximum likelihood estimator and the value of $J(\theta )/n$ obtained in Theorem 3.1.
Table 1.
Comparisons of the estimated MSEs and $J(\theta )/n$ for each unknown parameter (upper: the estimated MSE, lower: $J(\theta )/n$)
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ $({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$
n $\widehat{\mu }$ $\widehat{\sigma }$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ n $\widehat{\mu }$ $\widehat{\sigma }$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$
50 0.02046 0.01133 0.03102 0.03684 50 0.01412 0.01146 0.04705 0.02783
0.02000 0.01000 0.01875 0.01875 0.01312 0.01000 0.01304 0.01304
100 0.01020 0.00526 0.01136 0.01233 100 0.00691 0.00578 0.01566 0.00944
0.01000 0.00500 0.00938 0.00938 0.00656 0.00500 0.00652 0.00652
150 0.00675 0.00346 0.00706 0.00739 150 0.00453 0.00386 0.00921 0.00564
0.00667 0.00333 0.00625 0.00625 0.00437 0.00333 0.00435 0.00435
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$
n $\widehat{\mu }$ $\widehat{\sigma }$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ n $\widehat{\mu }$ $\widehat{\sigma }$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$
50 0.00786 0.01148 0.03108 0.03334 50 0.05849 0.01099 0.01943 0.02612
0.00720 0.01000 0.01395 0.01395 0.05780 0.01000 0.01395 0.01395
100 0.00379 0.00526 0.01051 0.01123 100 0.02907 0.00520 0.00814 0.00937
0.00360 0.00500 0.00698 0.00698 0.02890 0.00500 0.00698 0.00698
150 0.00248 0.00346 0.00597 0.00620 150 0.01932 0.00343 0.00511 0.00559
0.00240 0.00333 0.00465 0.00465 0.01927 0.00333 0.00465 0.00465
We conducted simulations under four settings. The top-left table checks the performance when the true parameters lie well inside the invertibility region, whereas the top-right table corresponds to parameters close to the boundary of the invertibility condition. The two bottom tables are designed to check the symmetry with respect to ${\rho _{2}}$. Table 1 clearly shows, for all the settings, that the estimated MSEs decrease as the sample size n increases. The estimated MSE of $\widehat{\sigma }$ does not depend on the values of ${\rho _{1}}$ and ${\rho _{2}}$, which coincides with the fact that $J(\theta )/n$ for $\widehat{\sigma }$ in Theorem 3.1 does not include ${\rho _{1}}$ and ${\rho _{2}}$. Since $J(\theta )/n$ for ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$ depends on the value of ${\rho _{2}}$ but is independent of the value of ${\rho _{1}}$, we expect the estimated MSEs of ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$ at $({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ and $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ to be close, but the results show different values for the small sample size $n=50$. This may be the influence of the $o({n^{-1}})$ term. To verify this influence, we compare n times the estimated MSEs with $J(\theta )$ at $({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ and $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ for $n=50$ and $n=1000$ in Table 2.
Table 2.
Comparisons of n times the estimated MSEs and $J(\theta )$ of $n=50$ and $n=1000$
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$
$\widehat{\mu }$ $\widehat{\sigma }$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$
$n\times \text{the estimated MSE}$ $n=50$ 0.39307 0.57401 1.55412 1.66715
$n=1000$ 0.36061 0.51210 0.72833 0.71769
$J(\theta )$ 0.36000 0.50000 0.69750 0.69750
$({\rho _{1}},{\rho _{2}})=(0.15,0.55)$
$\widehat{\mu }$ $\widehat{\sigma }$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$
$n\times \text{the estimated MSE}$ $n=50$ 2.92462 0.54941 0.97164 1.30611
$n=1000$ 2.88205 0.51144 0.69979 0.71517
$J(\theta )$ 2.89000 0.50000 0.69750 0.69750
Generally, n times the estimated MSEs converges to $J(\theta )$ as n becomes large. The results show that the estimated MSE of ${\widehat{\rho }_{1}}$ is close to that of ${\widehat{\rho }_{2}}$ for $n=1000$. Next, Table 3 presents the behavior of the estimated MSEs for ${\rho _{1}}=0.25$ and ${\rho _{2}}=-0.25$ when the sample size is small.
Table 3.
Behavior of the estimated MSEs when the sample size is small
n $\widehat{\mu }$ $\widehat{\sigma }$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$
10 0.14485 0.09400 0.41133 0.36819
11 0.12914 0.08262 0.34886 0.33276
12 0.11453 0.07419 0.29700 0.29798
13 0.10000 0.06659 0.26646 0.27767
14 0.08955 0.06034 0.23022 0.25188
15 0.08262 0.05502 0.21125 0.23577
16 0.07426 0.05051 0.18974 0.21314
17 0.06889 0.04616 0.17351 0.19723
18 0.06413 0.04295 0.15869 0.18569
19 0.05969 0.04028 0.14624 0.17146
20 0.05529 0.03762 0.13552 0.16067
The estimated MSE becomes smaller as the sample size becomes larger.

4.2 Estimator of σ

We express the bias of $\widehat{\sigma }$ as
\[ {E_{}}[\widehat{\sigma }-\sigma ]={e_{1}}+{e_{2}},\hspace{1em}{e_{1}}=-\frac{7\sigma }{4n},\hspace{1em}{e_{2}}=o({n^{-1}}),\]
which implies that
  • ${e_{1}}+{e_{2}}$ is the bias of the conditional maximum likelihood estimator,
  • ${e_{2}}$ is the bias of the conditional maximum likelihood estimator without the term $O({n^{-1}})$.
Table 4.
Evaluation of the estimated bias of $\widehat{\sigma }$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$
n ${e_{1}}+{e_{2}}$ ${e_{2}}$ n ${e_{1}}+{e_{2}}$ ${e_{2}}$
50 $-0.03579$ $-0.00079$ 50 $-0.02814$ 0.00686
100 $-0.01635$ 0.00115 100 $-0.01322$ 0.00428
150 $-0.01035$ 0.00132 150 $-0.00825$ 0.00342
The bias of $\widehat{\sigma }$ does not depend on the values of ${\rho _{1}}$ and ${\rho _{2}}$, which coincides with (10). Moreover, $|{e_{2}}|$ is smaller than $|{e_{1}}+{e_{2}}|$ because the $O({n^{-1}})$ term is removed. Next, we compare the bias and MSEs of $\widehat{\sigma }$ and the proposed estimator $\widetilde{\sigma }$ for ${\rho _{1}}=0.25$ and ${\rho _{2}}=-0.25$. We also compute the full maximum likelihood estimator ${\widehat{\sigma }^{\mathrm{MLE}}}$.
Table 5.
Comparison of the bias and MSEs of $\widehat{\sigma }$ and $\widetilde{\sigma }$
Bias MSE
n $\widehat{\sigma }$ ${\widehat{\sigma }^{\mathrm{MLE}}}$ $\widetilde{\sigma }$ n $\widehat{\sigma }$ ${\widehat{\sigma }^{\mathrm{MLE}}}$ $\widetilde{\sigma }$
50 $-0.03579$ $-0.04193$ $-0.00204$ 50 0.01133 0.01195 0.01077
100 $-0.01635$ $-0.01820$ 0.00087 100 0.00526 0.00531 0.00517
150 $-0.01035$ $-0.01142$ 0.00120 150 0.00346 0.00347 0.00343
The estimated bias of $\widetilde{\sigma }$ is smaller in absolute value than those of $\widehat{\sigma }$ and ${\widehat{\sigma }^{\mathrm{MLE}}}$. On the other hand, there is little difference among the three estimated MSEs. This result is certainly in accordance with the discussion in Section 3.

4.3 Estimators of ${\rho _{1}}$ and ${\rho _{2}}$

We express the bias of ${\widehat{\theta }_{i}}$ $(i=3,4)$ as
\[ {E_{}}[{\widehat{\theta }_{i}}-{\theta _{i}}]={e_{1}}+{e_{2}},\hspace{1em}{e_{1}}=O({n^{-1}}),\hspace{1em}{e_{2}}=o({n^{-1}}),\]
where ${e_{1}}=({\rho _{1}}+{\rho _{2}}-1)/n$ for $i=3$ and ${e_{1}}=(3{\rho _{2}}-1)/n$ for $i=4$. Then, ${e_{1}}+{e_{2}}$ and ${e_{2}}$ denote the bias of the conditional maximum likelihood estimator and the bias of the conditional maximum likelihood estimator without the $O({n^{-1}})$ term, respectively.
Table 6.
Evaluation of the estimated bias of ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ $({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$
${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$
n ${e_{1}}+{e_{2}}$ ${e_{2}}$ ${e_{1}}+{e_{2}}$ ${e_{2}}$ n ${e_{1}}+{e_{2}}$ ${e_{2}}$ ${e_{1}}+{e_{2}}$ ${e_{2}}$
50 $-0.03591$ $-0.01591$ $-0.04913$ $-0.01413$ 50 $-0.13333$ $-0.10953$ $-0.00976$ 0.04564
100 $-0.01319$ $-0.00319$ $-0.01969$ $-0.00219$ 100 $-0.07309$ $-0.06119$ 0.01046 0.03816
150 $-0.00847$ $-0.00181$ $-0.01271$ $-0.00104$ 150 $-0.05414$ $-0.04621$ 0.01191 0.03037
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$
${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{2}}$
n ${e_{1}}+{e_{2}}$ ${e_{2}}$ ${e_{1}}+{e_{2}}$ ${e_{2}}$ n ${e_{1}}+{e_{2}}$ ${e_{2}}$ ${e_{1}}+{e_{2}}$ ${e_{2}}$
50 $-0.06690$ $-0.03890$ $-0.06690$ $-0.01390$ 50 $-0.00862$ $-0.00262$ 0.00769 $-0.00531$
100 $-0.02493$ $-0.01093$ $-0.02765$ $-0.00115$ 100 $-0.00305$ $-0.00005$ 0.00247 $-0.00403$
150 $-0.01475$ $-0.00542$ $-0.01663$ 0.00104 150 $-0.00250$ $-0.00050$ 0.00095 $-0.00338$
Except for the case where ${\rho _{1}}$ and ${\rho _{2}}$ are close to the boundary of the invertibility condition, $({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$, $|{e_{2}}|$ is smaller than $|{e_{1}}+{e_{2}}|$. The bias of ${\widehat{\rho }_{1}}$ depends on the values of both ${\rho _{1}}$ and ${\rho _{2}}$, which coincides with (11). The bias of ${\widehat{\rho }_{2}}$ depends on the value of ${\rho _{2}}$ but not on ${\rho _{1}}$, which coincides with (12). Next, we compare the bias of ${\widehat{\rho }_{1}}$ with that of ${\widetilde{\rho }_{1}}$, and the bias of ${\widehat{\rho }_{2}}$ with that of ${\widetilde{\rho }_{2}}$. We also compute the full maximum likelihood estimators ${\hat{\rho }_{i}^{\mathrm{MLE}}}$.
Table 7.
Comparisons of estimated bias for the estimator of ${\rho _{1}}$ and ${\rho _{2}}$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 −0.03591 −0.03166 −0.01421 −0.04913 −0.05512 −0.01118
100 −0.01319 −0.01087 −0.00286 −0.01969 −0.02115 −0.00160
150 −0.00847 −0.00716 −0.00166 −0.01271 −0.01338 −0.00079
$({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 −0.13333 −0.08952 −0.10667 −0.00976 −0.05341 0.04623
100 −0.07309 −0.03134 −0.06057 0.01046 −0.01752 0.03784
150 −0.05414 −0.01796 −0.04592 0.01191 −0.01095 0.03014
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 −0.06690 −0.06346 −0.03622 −0.06690 −0.09484 −0.00988
100 −0.02493 −0.01971 −0.01041 −0.02765 −0.03755 −0.00032
150 −0.01475 −0.01109 −0.00521 −0.01663 −0.02199 0.00137
$({\rho _{1}},{\rho _{2}})=(0.15,0.55)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 −0.00862 −0.00733 −0.00260 0.00769 0.02588 −0.00577
100 −0.00305 −0.00265 −0.00004 0.00247 0.00854 −0.00410
150 −0.00250 −0.00223 −0.00049 0.00095 0.00473 −0.00340
The biases of the proposed estimators ${\widetilde{\rho }_{1}}$ and ${\widetilde{\rho }_{2}}$ are smaller in absolute value than those of ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$, respectively, except for the case mentioned above. Moreover, the correction of ${\widetilde{\rho }_{1}}$ and ${\widetilde{\rho }_{2}}$ depends on the values of both ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$. Next, we compare the MSEs of ${\widehat{\rho }_{1}}$ and ${\widetilde{\rho }_{1}}$, and of ${\widehat{\rho }_{2}}$ and ${\widetilde{\rho }_{2}}$.
Table 8.
Comparisons of the estimated MSEs for the estimators of ${\rho _{1}}$ and ${\rho _{2}}$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 0.03102 0.03332 0.02822 0.03684 0.04005 0.03055
100 0.01136 0.01157 0.01090 0.01233 0.01264 0.01124
150 0.00706 0.00711 0.00686 0.00739 0.00748 0.00695
$({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 0.04705 0.03905 0.03866 0.02783 0.03581 0.02665
100 0.01566 0.01050 0.01364 0.00944 0.01016 0.01021
150 0.00921 0.00579 0.00826 0.00564 0.00569 0.00619
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 0.03108 0.03492 0.02635 0.03334 0.04263 0.02561
100 0.01051 0.01124 0.00972 0.01123 0.01288 0.00985
150 0.00597 0.00613 0.00568 0.00620 0.00668 0.00569
$({\rho _{1}},{\rho _{2}})=(0.15,0.55)$
n ${\widehat{\rho }_{1}}$ ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{1}}$ ${\widehat{\rho }_{2}}$ ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ ${\widetilde{\rho }_{2}}$
50 0.01943 0.01993 0.01848 0.02612 0.03126 0.02306
100 0.00814 0.00818 0.00795 0.00937 0.00993 0.00883
150 0.00511 0.00513 0.00503 0.00559 0.00577 0.00538
There is no difference between the estimated MSEs of ${\widehat{\rho }_{i}}$ and ${\widetilde{\rho }_{i}}$ $(i=1,2)$, which certainly coincides with the discussion in Section 3.

5 Proof of theorems

We prove our main Theorems 3.1 and 3.2 using the lemmas and propositions in Appendix A.

5.1 Proof of Theorem 3.1

The asymptotic covariance matrix is given by the inverse of the Fisher information matrix, and the components of the matrix are obtained from the expectations of the second derivatives of the conditional log-likelihood function. The expectations of the components are given in Proposition A.4, and the Fisher information matrix is therefore
\[ {I_{n}}(\theta )=\left[\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{i_{\mu \mu }}& 0& 0& 0\\ {} 0& {i_{\sigma \sigma }}& 0& 0\\ {} 0& 0& {i_{{\rho _{1}}{\rho _{1}}}}& {i_{{\rho _{1}}{\rho _{2}}}}\\ {} 0& 0& {i_{{\rho _{1}}{\rho _{2}}}}& {i_{{\rho _{2}}{\rho _{2}}}}\end{array}\right],\]
where
\[\begin{array}{l}\displaystyle {i_{\mu \mu }}=-{E_{}}[{l_{\mu \mu }}]=\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\hspace{1em}{i_{\sigma \sigma }}=-{E_{}}[{l_{\sigma \sigma }}]=\frac{2n}{{\sigma ^{2}}},\\ {} \displaystyle {i_{{\rho _{1}}{\rho _{1}}}}=-{E_{}}[{l_{{\rho _{1}}{\rho _{1}}}}]={\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(k))^{2}},\\ {} \displaystyle {i_{{\rho _{2}}{\rho _{2}}}}\hspace{-0.1667em}=\hspace{-0.1667em}-{E_{}}[{l_{{\rho _{2}}{\rho _{2}}}}]\hspace{-0.1667em}=\hspace{-0.1667em}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{({\varphi _{1}}(k))^{2}},\hspace{1em}{i_{{\rho _{1}}{\rho _{2}}}}\hspace{-0.1667em}=\hspace{-0.1667em}-{E_{}}[{l_{{\rho _{1}}{\rho _{2}}}}]={\sum \limits_{t=1}^{n}}{\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1).\end{array}\]
The functions ${\varphi _{1}}$ and ${d_{t}}$ are defined in (A.3) and (A.5), respectively. Thus, the inverse matrix is given by
(15)
\[ {I_{n}}{(\theta )^{-1}}=\left[\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{i^{\mu \mu }}& 0& 0& 0\\ {} 0& {i^{\sigma \sigma }}& 0& 0\\ {} 0& 0& {i^{{\rho _{1}}{\rho _{1}}}}& {i^{{\rho _{1}}{\rho _{2}}}}\\ {} 0& 0& {i^{{\rho _{1}}{\rho _{2}}}}& {i^{{\rho _{2}}{\rho _{2}}}}\end{array}\right],\]
where
\[\begin{array}{l}\displaystyle {i^{\mu \mu }}=\frac{1}{{i_{\mu \mu }}}=\frac{{\sigma ^{2}}}{{\textstyle\textstyle\sum _{t=1}^{n}}{d_{t-1}^{2}}},\hspace{1em}{i^{\sigma \sigma }}=\frac{1}{{i_{\sigma \sigma }}}=\frac{{\sigma ^{2}}}{2n},\\ {} \displaystyle {i^{{\rho _{1}}{\rho _{1}}}}={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}},\hspace{1em}{i^{{\rho _{1}}{\rho _{2}}}}=-{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}},\hspace{1em}{i^{{\rho _{2}}{\rho _{2}}}}={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}},\\ {} \displaystyle M={i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}-{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}.\end{array}\]
We shall compute the limiting values for ${i^{\mu \mu }}$, ${i^{\sigma \sigma }}$, ${i^{{\rho _{1}}{\rho _{1}}}}$, ${i^{{\rho _{1}}{\rho _{2}}}}$, and ${i^{{\rho _{2}}{\rho _{2}}}}$. Using the different expression (B.4) for ${\varphi _{1}}$, we have
(16)
\[ {d_{t-1}}={\sum \limits_{k=1}^{t}}{\varphi _{1}}(k)=\frac{1}{1+{\rho _{1}}+{\rho _{2}}}+\frac{{\rho _{2}}{\varphi _{1}}(t)-{\varphi _{1}}(t+1)}{1+{\rho _{1}}+{\rho _{2}}}.\]
We consider the summation of ${d_{t-1}^{2}}$ from $t=1$ to n to obtain ${i^{\mu \mu }}$. The second term in ${d_{t-1}}$ does not contribute to the leading order of the summation as $n\to \infty $. Therefore,
\[ {i^{\mu \mu }}=\frac{{\sigma ^{2}}{(1+{\rho _{1}}+{\rho _{2}})^{2}}}{n}+o({n^{-1}}).\]
Similarly, we have
\[ {\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(k))^{2}}=\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}^{2}}-{\lambda _{1}^{2t}}}{1-{\lambda _{1}^{2}}}-2\frac{{\rho _{2}}-{\rho _{2}^{t}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}^{2}}-{\lambda _{2}^{2t}}}{1-{\lambda _{2}^{2}}}\right)\]
and
\[ {\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1)=\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}}-{\lambda _{1}^{2t-1}}}{1-{\lambda _{1}^{2}}}+{\rho _{1}}\frac{1-{\rho _{2}^{t-1}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}}-{\lambda _{2}^{2t-1}}}{1-{\lambda _{2}^{2}}}\right),\]
and then
\[\begin{aligned}{}{i_{{\rho _{1}}{\rho _{1}}}}& ={i_{{\rho _{2}}{\rho _{2}}}}=\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}^{2}}}{1-{\lambda _{1}^{2}}}-2\frac{{\rho _{2}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}^{2}}}{1-{\lambda _{2}^{2}}}\right)n+o(n)\\ {} & =\frac{1+{\rho _{2}}}{(1-{\rho _{1}^{2}}+2{\rho _{2}}+{\rho _{2}^{2}})(1-{\rho _{2}})}n+o(n)\end{aligned}\]
and
\[\begin{aligned}{}{i_{{\rho _{1}}{\rho _{2}}}}& =\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}}}{1-{\lambda _{1}^{2}}}+\frac{{\rho _{1}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}}}{1-{\lambda _{2}^{2}}}\right)n+o(n)\\ {} & =-\frac{{\rho _{1}}}{(1-{\rho _{1}^{2}}+2{\rho _{2}}+{\rho _{2}^{2}})(1-{\rho _{2}})}n+o(n).\end{aligned}\]
Thus,
\[ M={i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}-{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}=\frac{1}{(1-{\rho _{1}^{2}}+2{\rho _{2}}+{\rho _{2}^{2}}){(1-{\rho _{2}})^{2}}}{n^{2}}+o({n^{2}}).\]
Therefore, we obtain
(17)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {i^{{\rho _{1}}{\rho _{1}}}}& \displaystyle =& \displaystyle {i^{{\rho _{2}}{\rho _{2}}}}=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}),\end{array}\]
(18)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {i^{{\rho _{1}}{\rho _{2}}}}& \displaystyle =& \displaystyle \frac{{\rho _{1}}(1-{\rho _{2}})}{n}+o({n^{-1}}).\end{array}\]
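The limits (17) and (18) can be checked numerically. The following sketch (our own, with hypothetical function names) builds the $({\rho _{1}},{\rho _{2}})$ block of ${I_{n}}(\theta )$ from the finite sums above, inverts it, and compares $n{i^{{\rho _{1}}{\rho _{1}}}}$ and $n{i^{{\rho _{1}}{\rho _{2}}}}$ with $1-{\rho _{2}^{2}}$ and ${\rho _{1}}(1-{\rho _{2}})$.

```python
import numpy as np

def phi1(k, rho1, rho2):
    """phi_1(k) = sum_{l=0}^{k-1} lambda_1^l lambda_2^{k-1-l}, with lambda_1, lambda_2 as in (7)."""
    delta = np.sqrt(complex(rho1 ** 2 - 4 * rho2))
    lam1, lam2 = (-rho1 + delta) / 2, (-rho1 - delta) / 2
    return sum(lam1 ** l * lam2 ** (k - 1 - l) for l in range(k)).real

def rho_block(n, rho1, rho2):
    """The (rho1, rho2) block of I_n(theta), using the finite sums of Section 5.1."""
    phi = np.array([phi1(k, rho1, rho2) for k in range(1, n + 1)])           # phi[k-1] = phi_1(k)
    i11 = sum(np.sum(phi[:t - 1] ** 2) for t in range(1, n + 1))             # sum_{k=1}^{t-1}
    i22 = sum(np.sum(phi[:t - 2] ** 2) for t in range(2, n + 1))             # sum_{k=1}^{t-2}
    i12 = sum(np.sum(phi[1:t - 1] * phi[0:t - 2]) for t in range(3, n + 1))  # sum_{k=2}^{t-1}
    return np.array([[i11, i12], [i12, i22]])

n, rho1, rho2 = 1000, 0.25, -0.25
inv = np.linalg.inv(rho_block(n, rho1, rho2))
print(n * inv[0, 0], 1 - rho2 ** 2)          # both close to 1 - rho2^2
print(n * inv[0, 1], rho1 * (1 - rho2))      # both close to rho1 (1 - rho2)
```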

5.2 Proof of Theorem 3.2

Eq. (9) is trivial by Lemma 2.3 and Proposition A.6. We show (10) using Lemma A.7. The lemma can be reduced to
(19)
\[ {E_{}}[{\widehat{\theta }_{r}}-{\theta _{r}}]=\frac{1}{2}{i^{{\theta _{r}}{\theta _{r}}}}{\sum \limits_{t=1}^{d}}{\sum \limits_{u=1}^{d}}{i^{{\theta _{t}}{\theta _{u}}}}({\nu _{{\theta _{r}}{\theta _{t}}{\theta _{u}}}}+2{\nu _{{\theta _{r}}{\theta _{t}},{\theta _{u}}}})+O({n^{-3/2}})\]
if ${i^{{\theta _{r}}{\theta _{s}}}}=0$ for all $s\ne r$. We see that ${i^{\sigma \mu }}={i^{\sigma {\rho _{1}}}}={i^{\sigma {\rho _{2}}}}=0$ from (15), and then we can apply (19) to the bias of $\hat{\sigma }$. The components of the sum in (19) are
(20)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{\mu \mu }}({\nu _{\sigma \mu \mu }}+2{\nu _{\sigma \mu ,\mu }})=\frac{{\sigma ^{2}}}{{\textstyle\textstyle\sum _{t=1}^{n}}{d_{t-1}^{2}}}\left(-\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}}\right)=-\frac{2}{\sigma },\end{array}\]
(21)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{\sigma \sigma }}({\nu _{\sigma \sigma \sigma }}+2{\nu _{\sigma \sigma ,\sigma }})=\frac{{\sigma ^{2}}}{2n}\left(-\frac{2n}{{\sigma ^{3}}}\right)=-\frac{1}{\sigma },\end{array}\]
(22)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{{\rho _{1}}{\rho _{1}}}}({\nu _{\sigma {\rho _{1}}{\rho _{1}}}}+2{\nu _{\sigma {\rho _{1}},{\rho _{1}}}})={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}}\left(-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{1}}}}\right)=-\frac{2{i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}}{M\sigma },\end{array}\]
(23)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{{\rho _{2}}{\rho _{2}}}}({\nu _{\sigma {\rho _{2}}{\rho _{2}}}}+2{\nu _{\sigma {\rho _{2}},{\rho _{2}}}})={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}}\left(-\frac{2}{\sigma }{i_{{\rho _{2}}{\rho _{2}}}}\right)=-\frac{2{i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}}{M\sigma },\end{array}\]
(24)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{{\rho _{1}}{\rho _{2}}}}({\nu _{\sigma {\rho _{1}}{\rho _{2}}}}+2{\nu _{\sigma {\rho _{1}},{\rho _{2}}}})+{i^{{\rho _{2}}{\rho _{1}}}}({\nu _{\sigma {\rho _{2}}{\rho _{1}}}}+2{\nu _{\sigma {\rho _{2}},{\rho _{1}}}})\\ {} & \displaystyle =& \displaystyle 2{i^{{\rho _{1}}{\rho _{2}}}}({\nu _{\sigma {\rho _{1}}{\rho _{2}}}}+{\nu _{\sigma {\rho _{1}},{\rho _{2}}}}+{\nu _{\sigma {\rho _{2}},{\rho _{1}}}})\\ {} & \displaystyle =& \displaystyle -2{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}}\left(-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{2}}}}\right)=\frac{4{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}}{M\sigma }.\end{array}\]
The summation of (22), (23), and (24) is
\[ -\frac{4{i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}}{M\sigma }+\frac{4{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}}{M\sigma }=-\frac{4}{\sigma }.\]
The sum of (20), (21), and the above is $-7/\sigma $; multiplying by $\frac{1}{2}{i^{\sigma \sigma }}={\sigma ^{2}}/(4n)$ as in (19), we obtain (10).
Next, we show (11) and (12) using Proposition A.8. The ten components of the equation in Lemma 2.3 that are used for the calculation of the biases of ${\rho _{1}}$ and ${\rho _{2}}$ are obtained by substituting the results of Proposition A.8:
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{\mu \mu }}({\nu _{{\rho _{1}}\mu \mu }}+2{\nu _{{\rho _{1}}\mu ,\mu }})=\frac{-2{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k)}{{\sum \limits_{s=1}^{n}}{d_{s-1}^{2}}},\\ {} & & \displaystyle {i^{\sigma \sigma }}({\nu _{{\rho _{1}}\sigma \sigma }}+2{\nu _{{\rho _{1}}\sigma ,\sigma }})=0,\\ {} & & \displaystyle {i^{{\rho _{1}}{\rho _{1}}}}({\nu _{{\rho _{1}}{\rho _{1}}{\rho _{1}}}}+2{\nu _{{\rho _{1}}{\rho _{1}},{\rho _{1}}}})={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{1,t}}+2{T_{0,0,t}}),\\ {} & & \displaystyle 2{i^{{\rho _{1}}{\rho _{2}}}}({\nu _{{\rho _{1}}{\rho _{1}}{\rho _{2}}}}+{\nu _{{\rho _{1}}{\rho _{1}},{\rho _{2}}}}+{\nu _{{\rho _{1}}{\rho _{2}},{\rho _{1}}}})=-2{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{2,t}}+{T_{0,1,t}}+{T_{1,0,t}}),\\ {} & & \displaystyle {i^{{\rho _{2}}{\rho _{2}}}}({\nu _{{\rho _{1}}{\rho _{2}}{\rho _{2}}}}+2{\nu _{{\rho _{1}}{\rho _{2}},{\rho _{2}}}})={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}}{\sum \limits_{t=1}^{n}}({S_{3,t}}+2{T_{1,1,t}}),\end{array}\]
and
\[\begin{aligned}{}& {i^{\mu \mu }}({\nu _{{\rho _{2}}\mu \mu }}+2{\nu _{{\rho _{2}}\mu ,\mu }})=\frac{-2{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k)}{{\sum \limits_{s=1}^{n}}{d_{s-1}^{2}}},\\ {} & {i^{\sigma \sigma }}({\nu _{{\rho _{2}}\sigma \sigma }}+2{\nu _{{\rho _{2}}\sigma ,\sigma }})=0,\\ {} & {i^{{\rho _{1}}{\rho _{1}}}}({\nu _{{\rho _{2}}{\rho _{1}}{\rho _{1}}}}+2{\nu _{{\rho _{2}}{\rho _{1}},{\rho _{1}}}})={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{0,t-1}}+2{T_{1,0,t}}),\\ {} & 2{i^{{\rho _{1}}{\rho _{2}}}}({\nu _{{\rho _{1}}{\rho _{2}}{\rho _{2}}}}+{\nu _{{\rho _{1}}{\rho _{2}},{\rho _{2}}}}+{\nu _{{\rho _{2}}{\rho _{2}},{\rho _{1}}}})\hspace{-0.1667em}=\hspace{-0.1667em}-2{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{1,t-1}}+{T_{1,1,t}}+{T_{0,0,t-1}}),\\ {} & {i^{{\rho _{2}}{\rho _{2}}}}({\nu _{{\rho _{2}}{\rho _{2}}{\rho _{2}}}}+2{\nu _{{\rho _{2}}{\rho _{2}},{\rho _{2}}}})={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}}{\sum \limits_{t=1}^{n}}({S_{2,t-1}}+2{T_{0,1,t}}),\end{aligned}\]
where ${S_{p,q}}$ and ${T_{p,q,t}}$ are defined in (A.26) and (A.27), respectively. We now need the limiting values of the right-hand sides of the above equations, and we evaluate the sum of the first five equations and that of the remaining five equations. Let U and V denote the sum of the first five equations and that of the remaining five equations, respectively. We see that
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k)& \displaystyle =& \displaystyle \frac{1}{{({\rho _{1}}+{\rho _{2}}+1)^{3}}}n+o(n),\\ {} \displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k)& \displaystyle =& \displaystyle \frac{1}{{({\rho _{1}}+{\rho _{2}}+1)^{3}}}n+o(n)\end{array}\]
by (16). As for the other summations, the values are built from ${S_{p,q}}$ and ${T_{p,q,t}}$. The components of ${S_{p,q}}$ and ${T_{p,q,t}}$ are given by (B.4) and (B.5), which are sums of geometric type. We consider the summations of ${S_{p,q}}$ and ${T_{p,q,t}}$ with respect to t from 1 to n as $n\to \infty $. Since the components are expressed as geometric-type sums, the following limiting evaluations are useful:
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t}}{\alpha ^{k}}=\frac{\alpha }{1-\alpha }n+o(n),& & \displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t}}(k+1){\alpha ^{k+1}}=\frac{{\alpha ^{2}}(2-\alpha )}{{(1-\alpha )^{2}}}n+o(n),\\ {} \displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t}}{\sum \limits_{m=1}^{t-k}}{\alpha ^{k}}{\beta ^{m}}& \displaystyle =& \displaystyle \frac{\alpha \beta }{(1-\alpha )(1-\beta )}n+o(n)\end{array}\]
for any $|\alpha |<1$ and $|\beta |<1$. Then, the summations of ${S_{p,t}}$ and ${T_{p,q,t}}$ with respect to t are
\[\begin{aligned}{}{\sum \limits_{t=1}^{n}}{S_{1,t}}& =\frac{2{\rho _{1}}({\rho _{2}}+1)}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{2,t}}& =\frac{2\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{3,t}}& =\frac{2{\rho _{1}}\{-{\rho _{1}^{2}}+2{\rho _{2}}({\rho _{2}}+1)\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{0,t-1}}& =\frac{2\{1+{\rho _{2}}(2-{\rho _{1}^{2}}+{\rho _{2}})\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{1,t-1}}& =\frac{2{\rho _{1}}({\rho _{2}}+1)}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{2,t-1}}& =\frac{2\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{0,0,t}}& =\frac{-2{\rho _{1}}({\rho _{2}}+1)}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{0,1,t}}& =\frac{-2\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{1,0,t}}& =\frac{{\rho _{1}^{2}}+{({\rho _{2}}+1)^{2}}}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{1,1,t}}& =\frac{{\rho _{1}}(1+{\rho _{1}^{2}}-2{\rho _{2}}-3{\rho _{2}^{2}})}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{0,0,t-1}}& =\frac{-2{\rho _{1}}({\rho _{2}}+1)}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n).\end{aligned}\]
Now, we are ready to use Lemma 2.3. Using the above evaluations, we have
\[\begin{aligned}{}U& =\frac{-2}{{\rho _{1}}+{\rho _{2}}+1}+\frac{4{\rho _{1}}{({\rho _{2}}+1)^{2}}-2{\rho _{1}}\{{\rho _{1}^{2}}+{({\rho _{2}}+1)^{2}}\}}{{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}+o(1),\\ {} V& =\frac{-2}{{\rho _{1}}+{\rho _{2}}+1}+\frac{4{\rho _{1}}({\rho _{2}}+1)\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}+2{\rho _{1}^{2}}(1+{\rho _{1}^{2}}-2{\rho _{2}}-3{\rho _{2}^{2}})}{(1-{\rho _{2}}){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}\\ {} & \hspace{1em}+o(1).\end{aligned}\]
By (17) and (18), the biases of the unknown parameters ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$ are
\[ {E_{}}[{\widehat{\rho }_{1}}-{\rho _{1}}]=\frac{1}{2}{i^{{\rho _{1}}{\rho _{1}}}}U+\frac{1}{2}{i^{{\rho _{1}}{\rho _{2}}}}V=\frac{{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}})\]
and
\[ {E_{}}[{\widehat{\rho }_{2}}-{\rho _{2}}]=\frac{1}{2}{i^{{\rho _{1}}{\rho _{2}}}}U+\frac{1}{2}{i^{{\rho _{2}}{\rho _{2}}}}V=\frac{3{\rho _{2}}-1}{n}+o({n^{-1}}).\]
Therefore, we obtain (11) and (12).

6 Practical examples

We provide a practical example using the quarterly U.S. GNP from $1947(1)$ to $2002(3)$, $n=223$ observations. The original data, provided by the Federal Reserve Bank of St. Louis [16], are used in adjusted form as a good example of an MA(2) model in [13], where the data are transformed into the GNP growth rate from ${X_{t}}$ by
(25)
\[ {Y_{t}}=\nabla \log ({X_{t}}),\]
where ${X_{t}}$ is the U.S. GNP. The adjusted data, which differ from the original series, can be obtained from the website of the author of the book [13].
We do not know the true values of the unknown parameters in the MA(2) model, but we assume that the GNP growth rate follows
(26)
\[ {Y_{t}}=0.0083+{\varepsilon _{t}}+0.3028{\varepsilon _{t-1}}+0.2036{\varepsilon _{t-2}},\]
where the coefficients are calculated by (full) maximum likelihood estimation using $n=223-1=222$ observations (one observation is lost when taking the difference in (25)). We then examine whether the bias of the estimates based on the model with the last 20 samples is reduced by our method.
Table 9.
MLE and conditional MLE with a correction for U.S. GNP using MA$(2)$
μ σ ${\rho _{1}}$ ${\rho _{2}}$
TRUE ($n=222$) 0.0083 0.0094 0.3028 0.2036
MLE ($n=20$) 0.0071 0.0060 0.1446 0.1370
QMLE ($n=20$) 0.0072 0.0060 0.1564 0.1321
corrected MLE – 0.0065 0.1824 0.1680
corrected QMLE – 0.0065 0.1939 0.1639
Bias for MLE −0.0012 −0.0035 −0.1582 −0.0666
Bias for QMLE −0.0012 −0.0035 −0.1464 −0.0714
Bias for corrected MLE – −0.0029 −0.1204 −0.0356
Bias for corrected QMLE – −0.0029 −0.1089 −0.0397
For an unknown parameter $\theta \hspace{2.5pt}(\theta =\mu ,\sigma ,{\rho _{1}},{\rho _{2}})$, MLE and QMLE in Table 9 correspond to the maximum likelihood estimate ${\widehat{\theta }^{\text{MLE}}}$ and the conditional maximum likelihood estimate (quasi-maximum likelihood estimate) $\widehat{\theta }$ using $n=20$ observations. The corrected MLE and corrected QMLE correspond to the result by Tanaka (1984) [15] and to $\tilde{\theta }$ defined in (13), respectively. The dashes (–) in the table for μ indicate that no correction is required for μ. The four bias rows at the bottom of the table correspond to MLE, QMLE, corrected MLE, and corrected QMLE. We note that the corrected MLE is expected to be the best when we use $n=20$ observations because it is based on the full maximum likelihood estimation. Both corrections work well; namely, the biases with respect to the true model (26) become smaller than those of the models without the corrections.
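A rough sketch of how the QMLE and corrected QMLE rows of Table 9 could be reproduced is given below. It is our own illustration: the GNP series is not bundled here, so a simulated stand-in generated from (26) is used in place of the actual 223 quarterly observations from [13, 16], and the conditional MLE is obtained by numerically maximizing (5).

```python
import numpy as np
from scipy.optimize import minimize

def cond_mle(y):
    """Conditional (quasi-)MLE for the Gaussian MA(2) model, maximizing (5) numerically."""
    def neg_loglik(theta):
        mu, log_sigma, rho1, rho2 = theta
        sigma = np.exp(log_sigma)
        eps = np.zeros(len(y) + 2)                       # eps_{-1} = eps_0 = 0
        for t in range(len(y)):
            eps[t + 2] = (y[t] - mu) - rho1 * eps[t + 1] - rho2 * eps[t]
        return len(y) * np.log(sigma) + np.sum(eps[2:] ** 2) / (2 * sigma ** 2)
    res = minimize(neg_loglik, x0=[np.mean(y), np.log(np.std(y)), 0.0, 0.0],
                   method="Nelder-Mead")
    mu, log_sigma, rho1, rho2 = res.x
    return mu, np.exp(log_sigma), rho1, rho2

# Stand-in for the real data: replace `y` by diff(log(X_t)) computed from the quarterly
# U.S. GNP series of [13, 16]; here a series of length 222 is simulated from (26).
rng = np.random.default_rng(2)
eps = rng.normal(0.0, 0.0094, 224)
y = 0.0083 + eps[2:] + 0.3028 * eps[1:-1] + 0.2036 * eps[:-2]

n = 20
mu_hat, sigma_hat, rho1_hat, rho2_hat = cond_mle(y[-n:])     # QMLE on the last 20 samples
sigma_til = sigma_hat + 7 * sigma_hat / (4 * n)              # correction (13)
rho1_til = rho1_hat - (rho1_hat + rho2_hat - 1) / n
rho2_til = rho2_hat - (3 * rho2_hat - 1) / n
print("QMLE          :", mu_hat, sigma_hat, rho1_hat, rho2_hat)
print("corrected QMLE:", sigma_til, rho1_til, rho2_til)
```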

A Lemmas

We show the derivatives of the conditional log-likelihood function and expectations. Let
(A.1)
\[ {\lambda _{1}}+{\lambda _{2}}=-{\rho _{1}},\hspace{1em}{\lambda _{1}}-{\lambda _{2}}=\Delta ,\hspace{1em}{\lambda _{1}}{\lambda _{2}}={\rho _{2}}\]
and
(A.2)
\[ {\lambda _{i}^{2}}+{\rho _{1}}{\lambda _{i}}+{\rho _{2}}=0\]
for $i=1,2$. Furthermore, we define the following functions:
(A.3)
\[ {\varphi _{1}}(k)={\sum \limits_{l=0}^{k-1}}{\lambda _{1}^{l}}{\lambda _{2}^{k-1-l}},\]
(A.4)
\[ {\varphi _{2}}(k)=-2\frac{\partial {\varphi _{1}}(k)}{\partial {\rho _{1}}},\]
and
(A.5)
\[ {d_{t}}={\sum \limits_{k=1}^{t+1}}{\varphi _{1}}(k).\]
These functions reduce the calculation costs. We list the lemmas and propositions in the following. Their proofs are provided in Appendix B, except for Lemmas A.3 and A.5, which merely list the derivatives of the log-likelihood function and are easily obtained by applying the chain rule to (A.6) and (A.7).
The first derivatives of the conditional log-likelihood function (5) for the Gaussian $\mathrm{MA}(2)$ model are given by
(A.6)
\[ {l_{\mu }}=\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}}{\varepsilon _{t}},\hspace{1em}{l_{\sigma }}=-\frac{n}{\sigma }+\frac{1}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}^{2}},\]
(A.7)
\[ {l_{{\rho _{1}}}}=-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}},\hspace{1em}{l_{{\rho _{2}}}}=-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}.\]
Lemma A.1.
The first derivatives of ${\varepsilon _{t}}$ in (6) are given by
(A.8)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{\partial {\varepsilon _{t}}}{\partial \mu }& \displaystyle =& \displaystyle -{d_{t-1}},\end{array}\]
(A.9)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}& \displaystyle =& \displaystyle -{\sum \limits_{k=1}^{t-1}}{\varphi _{1}}(k){\varepsilon _{t-k}}=-{\sum \limits_{k=1}^{t-1}}{\varphi _{1}}(t-k){\varepsilon _{k}},\end{array}\]
(A.10)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}& \displaystyle =& \displaystyle \frac{\partial {\varepsilon _{t-1}}}{\partial {\rho _{1}}}=-{\sum \limits_{k=1}^{t-2}}{\varphi _{1}}(k){\varepsilon _{t-k-1}}=-{\sum \limits_{k=1}^{t-2}}{\varphi _{1}}(t-k-1){\varepsilon _{k}}.\end{array}\]
Lemma A.1 implies that $\partial {\varepsilon _{t}}/\partial {\rho _{1}}$ does not depend on ${\varepsilon _{t}}$, but is expressed as a linear combination of ${\varepsilon _{1}},\dots ,{\varepsilon _{t-1}}$. Furthermore, $\partial {\varepsilon _{t}}/\partial {\rho _{2}}$ does not depend on ${\varepsilon _{t-1}}$ and ${\varepsilon _{t}}$, but depends on ${\varepsilon _{1}},\dots ,{\varepsilon _{t-2}}$.
Lemma A.2.
The second derivatives of ${\varepsilon _{t}}$ in (6) are given by
(A.11)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial \mu \partial {\rho _{1}}}& \displaystyle =& \displaystyle \frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}=-{\sum \limits_{k=1}^{t-1}}{\varphi _{1}}(k){d_{t-k-1}},\end{array}\]
(A.12)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial \mu \partial {\rho _{2}}}& \displaystyle =& \displaystyle \frac{{\partial ^{2}}{\varepsilon _{t-1}}}{\partial \mu \partial {\rho _{1}}}=\frac{\partial {d_{t-1}}}{\partial {\rho _{2}}}=-{\sum \limits_{k=1}^{t-2}}{\varphi _{1}}(k){d_{t-k-2}},\end{array}\]
(A.13)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}^{2}}}& \displaystyle =& \displaystyle {\sum \limits_{k=1}^{t-2}}{\varphi _{2}}(k+1){\varepsilon _{t-k-1}}={\sum \limits_{k=1}^{t-2}}{\varphi _{2}}(t-k){\varepsilon _{k}},\end{array}\]
(A.14)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial {\rho _{2}}}& \displaystyle =& \displaystyle \frac{{\partial ^{2}}{\varepsilon _{t-1}}}{\partial {\rho _{1}^{2}}}={\sum \limits_{k=1}^{t-3}}{\varphi _{2}}(k+1){\varepsilon _{t-k-2}}={\sum \limits_{k=1}^{t-3}}{\varphi _{2}}(t-k-1){\varepsilon _{k}},\end{array}\]
(A.15)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{2}^{2}}}& \displaystyle =& \displaystyle \frac{{\partial ^{2}}{\varepsilon _{t-2}}}{\partial {\rho _{1}^{2}}}={\sum \limits_{k=1}^{t-4}}{\varphi _{2}}(k+1){\varepsilon _{t-k-3}}={\sum \limits_{k=1}^{t-4}}{\varphi _{2}}(t-k-2){\varepsilon _{k}}.\hspace{1em}\end{array}\]
Similarly, ${\partial ^{2}}{\varepsilon _{t}}/\partial {\rho _{1}^{2}}$ is expressed as a linear combination of ${\varepsilon _{1}},\dots ,{\varepsilon _{t-2}}$, ${\partial ^{2}}{\varepsilon _{t}}/\partial {\rho _{1}}\partial {\rho _{2}}$ is a linear combination of ${\varepsilon _{1}},\dots ,{\varepsilon _{t-3}}$, and ${\partial ^{2}}{\varepsilon _{t}}/\partial {\rho _{2}^{2}}$ is that of ${\varepsilon _{1}},\dots ,{\varepsilon _{t-4}}$.
Lemma A.3.
The second derivatives of the conditional log-likelihood function (5) for the Gaussian $\mathrm{MA}(2)$ model are given by
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {l_{\mu \mu }}& \displaystyle =& \displaystyle -\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\hspace{1em}{l_{\mu \sigma }}=-\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{d_{t-1}}{\varepsilon _{t}},\hspace{1em}{l_{\mu {\rho _{1}}}}=\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left({\varepsilon _{t}}\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}+{d_{t-1}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\right),\\ {} \displaystyle {l_{\mu {\rho _{2}}}}& \displaystyle =& \displaystyle \frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left({\varepsilon _{t}}\frac{\partial {d_{t-1}}}{\partial {\rho _{2}}}+{d_{t-1}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\right),\hspace{1em}{l_{\sigma \sigma }}=\frac{n}{{\sigma ^{2}}}-\frac{3}{{\sigma ^{4}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}^{2}},\hspace{1em}{l_{\sigma {\rho _{1}}}}=\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}},\\ {} \displaystyle {l_{\sigma {\rho _{2}}}}& \displaystyle =& \displaystyle \frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}},\hspace{1em}{l_{{\rho _{1}}{\rho _{1}}}}=-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left\{{\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\right)^{2}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}^{2}}}\right\},\\ {} \displaystyle {l_{{\rho _{1}}{\rho _{2}}}}& \displaystyle =& \displaystyle -\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial {\rho _{2}}}\right),\hspace{1em}{l_{{\rho _{2}}{\rho _{2}}}}=-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left\{{\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\right)^{2}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{2}^{2}}}\right\}.\end{array}\]
Since $\partial {\varepsilon _{t}}/\partial {\rho _{1}}$ is a linear combination of ${\varepsilon _{k}}$ ($k=1,\dots ,t-1$) and $\partial {\varepsilon _{t}}/\partial {\rho _{2}}$ is a linear combination of ${\varepsilon _{l}}$ ($l=1,\dots ,t-2$) by Lemma A.1, their expectations are equal to 0. Lemma A.3 follows easily from the first derivatives (A.6) and (A.7) together with the chain rule.
Proposition A.4.
The expectations of the second derivatives of the conditional log-likelihood function (5) for the Gaussian $\mathrm{MA}(2)$ model are given by
(A.16)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[{l_{\mu \mu }}]& \displaystyle =& \displaystyle -\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\hspace{1em}{E_{}}[{l_{\sigma \sigma }}]=-\frac{2n}{{\sigma ^{2}}},\\ {} \displaystyle {E_{}}[{l_{\mu \sigma }}]& \displaystyle =& \displaystyle {E_{}}[{l_{\mu {\rho _{1}}}}]={E_{}}[{l_{\mu {\rho _{2}}}}]={E_{}}[{l_{\sigma {\rho _{1}}}}]={E_{}}[{l_{\sigma {\rho _{2}}}}]=0,\\ {} \displaystyle {E_{}}[{l_{{\rho _{1}}{\rho _{1}}}}]& \displaystyle =& \displaystyle -{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(k))^{2}},\\ {} \displaystyle {E_{}}[{l_{{\rho _{1}}{\rho _{2}}}}]& \displaystyle =& \displaystyle -{\sum \limits_{t=1}^{n}}{\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1),\\ {} \displaystyle {E_{}}[{l_{{\rho _{2}}{\rho _{2}}}}]& \displaystyle =& \displaystyle -{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{({\varphi _{1}}(k))^{2}}.\end{array}\]
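To make Proposition A.4 concrete, here is a small worked evaluation (ours, for illustration only) for $n=3$. By (B.4) and (B.3), ${\varphi _{1}}(1)=1$ and ${\varphi _{1}}(2)=-{\rho _{1}}$, so
\[ {E_{}}[{l_{{\rho _{1}}{\rho _{1}}}}]=-{\sum \limits_{t=1}^{3}}{\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(k))^{2}}=-(2+{\rho _{1}^{2}}),\hspace{1em}{E_{}}[{l_{{\rho _{1}}{\rho _{2}}}}]=-{\varphi _{1}}(2){\varphi _{1}}(1)={\rho _{1}},\hspace{1em}{E_{}}[{l_{{\rho _{2}}{\rho _{2}}}}]=-{({\varphi _{1}}(1))^{2}}=-1.\]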
Lemma A.5.
The third derivatives of the conditional log-likelihood function (5) for the Gaussian $\mathrm{MA}(2)$ model are given by
\[\begin{aligned}{}{l_{\mu \mu \mu }}& =0,\hspace{1em}{l_{\mu \sigma \sigma }}=\frac{6}{{\sigma ^{4}}}{\sum \limits_{t=1}^{n}}{d_{t-1}}{\varepsilon _{t}},\hspace{1em}{l_{\mu {\rho _{1}}{\rho _{1}}}}=\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left({d_{t-1}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}^{2}}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{d_{t-1}}}{\partial {\rho _{1}^{2}}}+2\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}\right),\\ {} {l_{\mu {\rho _{2}}{\rho _{2}}}}& =\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left({d_{t-1}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{2}^{2}}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{d_{t-1}}}{\partial {\rho _{2}^{2}}}+2\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\frac{\partial {d_{t-1}}}{\partial {\rho _{2}}}\right),\\ {} {l_{\mu {\rho _{1}}{\rho _{2}}}}& =\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left(\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}+\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{\partial {d_{t-1}}}{\partial {\rho _{2}}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{d_{t-1}}}{\partial {\rho _{1}}\partial {\rho _{2}}}+{d_{t-1}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial {\rho _{2}}}\right),\end{aligned}\]
\[\begin{aligned}{}{l_{\sigma \mu \mu }}& =\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\hspace{1em}{l_{\sigma \sigma \sigma }}=-\frac{2n}{{\sigma ^{3}}}+\frac{12}{{\sigma ^{5}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}^{2}},\hspace{1em}{l_{\sigma {\rho _{1}}{\rho _{1}}}}=\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}\left\{{\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\right)^{2}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}^{2}}}\right\},\\ {} {l_{\sigma {\rho _{2}}{\rho _{2}}}}& =\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}\left\{{\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\right)^{2}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{2}^{2}}}\right\},\hspace{1em}{l_{\sigma {\rho _{1}}{\rho _{2}}}}=\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}+{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial {\rho _{2}}}\right),\end{aligned}\]
\[\begin{aligned}{}{l_{{\rho _{1}}\mu \mu }}& =-\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}}\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}=\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k),\\ {} {l_{{\rho _{1}}\sigma \sigma }}& =-\frac{6}{{\sigma ^{4}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}},\hspace{1em}{l_{{\rho _{1}}{\rho _{1}}{\rho _{1}}}}=-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left(3\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}^{2}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}+{\varepsilon _{t}}\frac{{\partial ^{3}}{\varepsilon _{t}}}{\partial {\rho _{1}^{3}}}\right),\\ {} {l_{{\rho _{1}}{\rho _{2}}{\rho _{2}}}}& =-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{2}^{2}}}+2\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial {\rho _{2}}}+{\varepsilon _{t}}\frac{{\partial ^{3}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial {\rho _{2}^{2}}}\right),\\ {} {l_{{\rho _{1}}{\rho _{1}}{\rho _{2}}}}& =-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}^{2}}}+2\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial {\rho _{2}}}+{\varepsilon _{t}}\frac{{\partial ^{3}}{\varepsilon _{t}}}{\partial {\rho _{1}^{2}}\partial {\rho _{2}}}\right),\end{aligned}\]
\[\begin{aligned}{}{l_{{\rho _{2}}\mu \mu }}& =-\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}}\frac{\partial {d_{t-1}}}{\partial {\rho _{2}}}=\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k),\\ {} {l_{{\rho _{2}}\sigma \sigma }}& =-\frac{6}{{\sigma ^{4}}}{\sum \limits_{t=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}},\hspace{1em}{l_{{\rho _{2}}{\rho _{2}}{\rho _{2}}}}=-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}\left(3\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{2}^{2}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}+{\varepsilon _{t}}\frac{{\partial ^{3}}{\varepsilon _{t}}}{\partial {\rho _{2}^{3}}}\right).\end{aligned}\]
The third derivatives in Lemma A.5 are easily obtained from Lemma A.3 and the chain rule. The following three propositions collect the expectations of the third derivatives and of the products of the first and second derivatives of the conditional log-likelihood function (5). The three propositions correspond to the parameter μ, to σ, and to the pair $({\rho _{1}},{\rho _{2}})$, respectively.
Proposition A.6.
The expectations of the third derivatives and the products of the first and second derivatives of the conditional log-likelihood function (5) for the Gaussian $\mathrm{MA}(2)$ model are given by
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\nu _{\mu \mu \mu }}& \displaystyle =& \displaystyle {\nu _{\mu \sigma \sigma }}={\nu _{\mu {\rho _{1}}{\rho _{1}}}}={\nu _{\mu {\rho _{2}}{\rho _{2}}}}={\nu _{\mu {\rho _{1}}{\rho _{2}}}}=0,\\ {} \displaystyle {\nu _{\mu \mu ,\mu }}& \displaystyle =& \displaystyle {\nu _{\mu \sigma ,\sigma }}={\nu _{\mu {\rho _{1}},{\rho _{1}}}}={\nu _{\mu {\rho _{1}},{\rho _{2}}}}={\nu _{\mu {\rho _{2}},{\rho _{1}}}}={\nu _{\mu {\rho _{2}},{\rho _{2}}}}=0.\end{array}\]
Proposition A.7.
The expectations of the third derivatives and the products of the first and second derivatives of the conditional log-likelihood function (5) for the Gaussian $\mathrm{MA}(2)$ model are given by
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\nu _{\sigma \mu \mu }}& \displaystyle =& \displaystyle \frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\hspace{1em}{\nu _{\sigma \sigma \sigma }}=\frac{10n}{{\sigma ^{3}}},\\ {} \displaystyle {\nu _{\sigma {\rho _{1}}{\rho _{1}}}}& \displaystyle =& \displaystyle \frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{1}}}},\hspace{1em}{\nu _{\sigma {\rho _{2}}{\rho _{2}}}}=\frac{2}{\sigma }{i_{{\rho _{2}}{\rho _{2}}}},\hspace{1em}{\nu _{\sigma {\rho _{1}}{\rho _{2}}}}=\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{2}}}},\\ {} \displaystyle {\nu _{\sigma \mu ,\mu }}& \displaystyle =& \displaystyle -\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\hspace{1em}{\nu _{\sigma \sigma ,\sigma }}=-\frac{6n}{{\sigma ^{3}}},\hspace{1em}{\nu _{\sigma {\rho _{1}},{\rho _{1}}}}=-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{1}}}},\\ {} \displaystyle {\nu _{\sigma {\rho _{2}},{\rho _{2}}}}& \displaystyle =& \displaystyle -\frac{2}{\sigma }{i_{{\rho _{2}}{\rho _{2}}}},\hspace{1em}{\nu _{\sigma {\rho _{1}},{\rho _{2}}}}=-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{2}}}},\hspace{1em}{\nu _{\sigma {\rho _{2}},{\rho _{1}}}}=-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{2}}}}.\end{array}\]
Proposition A.8.
The expectations of the third derivatives and the products of the first and second derivatives of the conditional log-likelihood function (5) for the Gaussian $\mathrm{MA}(2)$ model are given by
(A.17)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}\mu \mu }}=\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k),\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}\mu \mu }}=\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k),\end{array}\]
(A.18)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}\sigma \sigma }}=0,\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}\sigma \sigma }}=0,\end{array}\]
(A.19)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}{\rho _{1}}{\rho _{1}}}}=3{\sum \limits_{t=1}^{n}}{S_{1,t}},\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}{\rho _{1}}{\rho _{1}}}}={\sum \limits_{t=1}^{n}}({S_{0,t-1}}+2{S_{2,t}}),\end{array}\]
(A.20)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}{\rho _{2}}{\rho _{2}}}}={\sum \limits_{t=1}^{n}}({S_{3,t}}+2{S_{1,t-1}}),\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}{\rho _{2}}{\rho _{2}}}}=3{\sum \limits_{t=1}^{n}}{S_{2,t-1}},\end{array}\]
(A.21)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}\mu ,\mu }}\hspace{-0.1667em}=\hspace{-0.1667em}-\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k),\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}\mu ,\mu }}\hspace{-0.1667em}=\hspace{-0.1667em}-\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k),\end{array}\]
(A.22)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}\sigma ,\sigma }}=0,\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}\sigma ,\sigma }}=0,\end{array}\]
(A.23)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}{\rho _{1}},{\rho _{1}}}}={\sum \limits_{t=1}^{n}}({T_{0,0,t}}-{S_{1,t}}),\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}{\rho _{1}},{\rho _{1}}}}={\sum \limits_{t=1}^{n}}({T_{1,0,t}}-{S_{2,t}}),\end{array}\]
(A.24)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}{\rho _{2}},{\rho _{2}}}}={\sum \limits_{t=1}^{n}}({T_{1,1,t}}-{S_{1,t-1}}),\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}{\rho _{2}},{\rho _{2}}}}={\sum \limits_{t=1}^{n}}({T_{0,1,t-1}}-{S_{2,t-1}}),\end{array}\]
(A.25)
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\nu _{{\rho _{1}}{\rho _{1}},{\rho _{2}}}}={\sum \limits_{t=1}^{n}}({T_{0,1,t}}-{S_{0,t-1}}),\hspace{2em}& & \displaystyle {\nu _{{\rho _{2}}{\rho _{2}},{\rho _{1}}}}={\sum \limits_{t=1}^{n}}({T_{0,0,t-1}}-{S_{3,t}}),\end{array}\]
where
(A.26)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {S_{p,t}}& \displaystyle =& \displaystyle {\sum \limits_{k=1}^{t-p-1}}{\varphi _{1}}(k+p){\varphi _{2}}(k+1),\end{array}\]
(A.27)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {T_{p,q,t}}& \displaystyle =& \displaystyle -{\sum \limits_{k=1}^{t-p-1}}{\sum \limits_{m=1}^{t-k-1-p-q}}{\varphi _{1}}(k){\varphi _{1}}(k+m+p+q){\varphi _{1}}(m)\\ {} & & \displaystyle \hspace{2em}-{\sum \limits_{k=1}^{t-1}}{\sum \limits_{m=1}^{t-k-q-1}}{\varphi _{1}}(k){\varphi _{1}}(k+m-p+q){\varphi _{1}}(m).\end{array}\]
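For readers implementing the bias formulas, the following Python sketch is a direct transcription of ${S_{p,t}}$ and ${T_{p,q,t}}$ in (A.26)–(A.27), together with ${\varphi _{1}}$ and ${\varphi _{2}}$ from (B.4)–(B.5) in Appendix B; the function names and the illustrative values in the final line are ours and not part of the original paper.

```python
import numpy as np

def _roots(rho1, rho2):
    delta = np.sqrt(complex(rho1 ** 2 - 4 * rho2))   # Delta = sqrt(rho1^2 - 4*rho2)
    return (-rho1 + delta) / 2, (-rho1 - delta) / 2, delta

def phi1(k, rho1, rho2):
    lam1, lam2, delta = _roots(rho1, rho2)
    return ((lam1 ** k - lam2 ** k) / delta).real     # (B.4)

def phi2(k, rho1, rho2):
    lam1, lam2, delta = _roots(rho1, rho2)
    return (2 / delta ** 2 * (rho1 * phi1(k, rho1, rho2) + k * (lam1 ** k + lam2 ** k))).real  # (B.5)

def S(p, t, rho1, rho2):
    # S_{p,t} in (A.26)
    return sum(phi1(k + p, rho1, rho2) * phi2(k + 1, rho1, rho2) for k in range(1, t - p))

def T(p, q, t, rho1, rho2):
    # T_{p,q,t} in (A.27): two triple-phi_1 sums, both entering with a minus sign
    s1 = sum(phi1(k, rho1, rho2) * phi1(k + m + p + q, rho1, rho2) * phi1(m, rho1, rho2)
             for k in range(1, t - p) for m in range(1, t - k - p - q))
    s2 = sum(phi1(k, rho1, rho2) * phi1(k + m - p + q, rho1, rho2) * phi1(m, rho1, rho2)
             for k in range(1, t) for m in range(1, t - k - q))
    return -s1 - s2

print(S(1, 10, 0.4, 0.3), T(0, 0, 10, 0.4, 0.3))      # illustrative call only
```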

B Proof of lemmas and propositions

B.1 Proof of Lemma 2.2

We show that the solution of
\[ \left\{\begin{array}{l@{\hskip10.0pt}l}{\varepsilon _{t}}={Y_{t}}-\mu -{\rho _{1}}{\varepsilon _{t-1}}-{\rho _{2}}{\varepsilon _{t-2}}\hspace{1em}& \hspace{1em}(t\ge 1),\\ {} {\varepsilon _{-1}}={\varepsilon _{0}}=0\hspace{1em}& \hspace{1em}\end{array}\right.\]
coincides with (6) by mathematical induction. For $t=-1$ and $t=0$, both sides of (6) equal 0, so (6) holds for $t=-1$, 0.
We assume that
\[\begin{aligned}{}{\varepsilon _{s-2}}& ={\sum \limits_{k=0}^{s-3}}\left({\sum \limits_{l=0}^{s-k-3}}{\lambda _{1}^{l}}{\lambda _{2}^{s-k-l-3}}\right)({Y_{k+1}}-\mu ),\\ {} {\varepsilon _{s-1}}& ={\sum \limits_{k=0}^{s-2}}\left({\sum \limits_{l=0}^{s-k-2}}{\lambda _{1}^{l}}{\lambda _{2}^{s-k-l-2}}\right)({Y_{k+1}}-\mu )\end{aligned}\]
hold for a given $s\ge 1$. Then, by (A.1) and (A.2), we have
\[\begin{aligned}{}{\varepsilon _{s}}& =({Y_{s}}-\mu )-{\rho _{1}}{\varepsilon _{s-1}}-{\rho _{2}}{\varepsilon _{s-2}}\\ {} & =({Y_{s}}-\mu )-{\rho _{1}}{\sum \limits_{k=0}^{s-2}}\left({\sum \limits_{l=0}^{s-k-2}}{\lambda _{1}^{l}}{\lambda _{2}^{s-k-l-2}}\right)({Y_{k+1}}-\mu )\\ {} & \hspace{2em}-{\rho _{2}}{\sum \limits_{k=0}^{s-2}}\left({\sum \limits_{l=0}^{s-k-3}}{\lambda _{1}^{l}}{\lambda _{2}^{s-k-l-3}}\right)({Y_{k+1}}-\mu )\\ {} & =({Y_{s}}-\mu )+{\sum \limits_{k=0}^{s-2}}\left\{{\sum \limits_{l=0}^{s-k-3}}{\lambda _{1}^{l}}{\lambda _{2}^{s-k-l-1}}-{\rho _{1}}{\lambda _{1}^{s-k-2}}\right\}({Y_{k+1}}-\mu )\\ {} & =({Y_{s}}-\mu )+{\sum \limits_{k=0}^{s-2}}\left({\sum \limits_{l=0}^{s-k-1}}{\lambda _{1}^{l}}{\lambda _{2}^{s-k-l-1}}\right)({Y_{k+1}}-\mu ).\end{aligned}\]
Therefore, (6) holds for all $t\ge -1$.  □
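As a small check of the resulting formula (our own illustration, not part of the original proof), for $t=2$ the closed form gives
\[ {\varepsilon _{2}}={\sum \limits_{k=0}^{1}}\left({\sum \limits_{l=0}^{1-k}}{\lambda _{1}^{l}}{\lambda _{2}^{1-k-l}}\right)({Y_{k+1}}-\mu )=({\lambda _{1}}+{\lambda _{2}})({Y_{1}}-\mu )+({Y_{2}}-\mu )=({Y_{2}}-\mu )-{\rho _{1}}({Y_{1}}-\mu ),\]
since ${\lambda _{1}}+{\lambda _{2}}={\varphi _{1}}(2)=-{\rho _{1}}$ by (B.4) and (B.3) below, and the recursion returns the same value, ${\varepsilon _{2}}=({Y_{2}}-\mu )-{\rho _{1}}{\varepsilon _{1}}-{\rho _{2}}{\varepsilon _{0}}=({Y_{2}}-\mu )-{\rho _{1}}({Y_{1}}-\mu )$.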

B.2 Proof of Lemma A.1

First, we show (A.8). We see that ${\varepsilon _{t}}$ can be written using ${\varphi _{1}}$ as
(B.1)
\[ {\varepsilon _{t}}={\sum \limits_{k=0}^{t-1}}{\varphi _{1}}(t-k)({Y_{k+1}}-\mu ).\]
Thus,
\[ \frac{\partial {\varepsilon _{t}}}{\partial \mu }=-{\sum \limits_{k=0}^{t-1}}{\varphi _{1}}(t-k).\]
Therefore, (A.8) follows by (A.5).
Next, we show (A.9). We have, by (A.4) and (B.1),
(B.2)
\[\begin{aligned}{}-2\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}& ={\sum \limits_{k=0}^{t-1}}{\varphi _{2}}(t-k)({Y_{k+1}}-\mu )\\ {} & ={\sum \limits_{k=0}^{t-1}}{\varphi _{2}}(t-k)({\varepsilon _{k+1}}+{\rho _{1}}{\varepsilon _{k}}+{\rho _{2}}{\varepsilon _{k-1}})\\ {} & ={\sum \limits_{k=1}^{t-2}}({\varphi _{2}}(t-k+1)+{\rho _{1}}{\varphi _{2}}(t-k)+{\rho _{2}}{\varphi _{2}}(t-k-1)){\varepsilon _{k}}\\ {} & \hspace{2em}+({\rho _{1}}{\varphi _{2}}(1)+{\varphi _{2}}(2)){\varepsilon _{t-1}}+{\varphi _{2}}(1){\varepsilon _{t}}\end{aligned}\]
since ${\varepsilon _{-1}}={\varepsilon _{0}}=0$. We note that ${\varphi _{2}}(2)=-2(\frac{\partial {\lambda _{1}}}{\partial {\rho _{1}}}+\frac{\partial {\lambda _{2}}}{\partial {\rho _{1}}})=2$ and ${\varphi _{2}}(1)=0$. Thus, ${\varepsilon _{t}}$ does not appear in the linear combination (B.2). We know by (A.2) that
(B.3)
\[ {\varphi _{1}}(k+1)+{\rho _{1}}{\varphi _{1}}(k)+{\rho _{2}}{\varphi _{1}}(k-1)=0.\]
Therefore,
\[ {\varphi _{2}}(k+1)+{\rho _{1}}{\varphi _{2}}(k)+{\rho _{2}}{\varphi _{2}}(k-1)=2{\varphi _{1}}(k),\]
and then the coefficient of ${\varepsilon _{k}}\hspace{2.5pt}(1\le k\le t-1)$ in (B.2) is $2{\varphi _{1}}(t-k)$, which gives (A.9).
Finally, we prove (A.10). By (B.3), we have
\[ \frac{\partial {\varphi _{1}}(k+1)}{\partial {\rho _{2}}}+{\rho _{1}}\frac{\partial {\varphi _{1}}(k)}{\partial {\rho _{2}}}+{\rho _{2}}\frac{\partial {\varphi _{1}}(k-1)}{\partial {\rho _{2}}}=-{\varphi _{1}}(k-1),\]
and then
\[\begin{aligned}{}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}& ={\sum \limits_{k=1}^{t-2}}\left(\frac{\partial {\varphi _{1}}(t-k+1)}{\partial {\rho _{2}}}+{\rho _{1}}\frac{\partial {\varphi _{1}}(t-k)}{\partial {\rho _{2}}}+{\rho _{2}}\frac{\partial {\varphi _{1}}(t-k-1)}{\partial {\rho _{2}}}\right){\varepsilon _{k}}\\ {} & \hspace{1em}+\left({\rho _{1}}\frac{\partial {\varphi _{1}}(1)}{\partial {\rho _{2}}}+\frac{\partial {\varphi _{1}}(2)}{\partial {\rho _{2}}}\right){\varepsilon _{t-1}}+\frac{\partial {\varphi _{1}}(1)}{\partial {\rho _{2}}}{\varepsilon _{t}}=-{\sum \limits_{k=1}^{t-2}}{\varphi _{1}}(t-k-1){\varepsilon _{k}}\end{aligned}\]
in the same way as in (B.2). This proves (A.10).

B.3 Proof of Lemma A.2

First, we show (A.11). Eqs. (A.8) and (A.9) yield
\[ \frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}=-\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{1}}\partial \mu }={\sum \limits_{k=1}^{t-1}}{\varphi _{1}}(k)\frac{\partial {\varepsilon _{t-k}}}{\partial \mu }=-{\sum \limits_{k=1}^{t-1}}{\varphi _{1}}(k){d_{t-k-1}}.\]
Applying the same argument to (A.10), we have
\[ \frac{\partial {d_{t-1}}}{\partial {\rho _{2}}}=-\frac{{\partial ^{2}}{\varepsilon _{t}}}{\partial {\rho _{2}}\partial \mu }=-\frac{\partial }{\partial \mu }\frac{\partial {\varepsilon _{t-1}}}{\partial {\rho _{1}}}=\frac{\partial {d_{t-2}}}{\partial {\rho _{1}}}.\]
Then, we obtain (A.12).
Next, we show (A.13). First, we express ${\varphi _{2}}$ in a different form. By (A.1), we have
\[ \frac{\partial \Delta }{\partial {\rho _{1}}}=\frac{{\rho _{1}}}{\sqrt{{\rho _{1}^{2}}-4{\rho _{2}}}}=\frac{{\rho _{1}}}{\Delta },\hspace{1em}\frac{\partial {\lambda _{1}}}{\partial {\rho _{1}}}=\frac{-\Delta +{\rho _{1}}}{2\Delta }=-\frac{{\lambda _{1}}}{\Delta },\hspace{1em}\frac{\partial {\lambda _{2}}}{\partial {\rho _{1}}}=-\frac{\Delta +{\rho _{1}}}{2\Delta }=\frac{{\lambda _{2}}}{\Delta }.\]
We note that we do not assume any algebraic relationship among the unknown parameters $(\mu ,\sigma ,{\rho _{1}},{\rho _{2}})$, such as ${\rho _{1}^{2}}-4{\rho _{2}}=0$ (that is, $\Delta =0$) or $\mu =0$; such a constraint would lead to a different problem. Using these derivatives, we have
(B.4)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\varphi _{1}}(k)& \displaystyle =& \displaystyle \frac{{\lambda _{1}^{k}}-{\lambda _{2}^{k}}}{\Delta },\end{array}\]
(B.5)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\varphi _{2}}(k)& \displaystyle =& \displaystyle \frac{2}{{\Delta ^{2}}}\left\{{\rho _{1}}{\varphi _{1}}(k)+k({\lambda _{1}^{k}}+{\lambda _{2}^{k}})\right\}.\end{array}\]
This implies
\[ 2{\sum \limits_{l=1}^{k}}{\varphi _{1}}(l){\varphi _{1}}(k+1-l)={\varphi _{2}}(k+1)\]
and then, together with (A.4), we get
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle \frac{\partial }{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}& \displaystyle =& \displaystyle -{\sum \limits_{k=1}^{t-1}}\left\{\frac{\partial {\varphi _{1}}(k)}{\partial {\rho _{1}}}{\varepsilon _{t-k}}-{\varphi _{1}}(k){\sum \limits_{l=1}^{t-k-1}}{\varphi _{1}}(l){\varepsilon _{t-k-l}}\right\}\\ {} & \displaystyle =& \displaystyle {\sum \limits_{s=1}^{t-2}}\left\{-\frac{\partial {\varphi _{1}}(s+1)}{\partial {\rho _{1}}}+{\sum \limits_{k=1}^{s}}{\varphi _{1}}(k){\varphi _{1}}(s+1-k)\right\}{\varepsilon _{t-s-1}}\\ {} & \displaystyle =& \displaystyle {\sum \limits_{s=1}^{t-2}}{\varphi _{2}}(s+1){\varepsilon _{t-s-1}}.\end{array}\]
This gives (A.13), and we also obtain (A.14) and (A.15) by (A.10).
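The convolution identity $2{\textstyle\sum _{l=1}^{k}}{\varphi _{1}}(l){\varphi _{1}}(k+1-l)={\varphi _{2}}(k+1)$ used above is also easy to check numerically. The following sketch (ours, with arbitrary illustrative parameter values) evaluates both sides directly from (B.4) and (B.5).

```python
import cmath

rho1, rho2 = 0.4, 0.3                                    # arbitrary test values
delta = cmath.sqrt(rho1 ** 2 - 4 * rho2)                 # Delta = sqrt(rho1^2 - 4*rho2)
lam1, lam2 = (-rho1 + delta) / 2, (-rho1 - delta) / 2

def phi1(k):
    return (lam1 ** k - lam2 ** k) / delta               # (B.4)

def phi2(k):
    return 2 / delta ** 2 * (rho1 * phi1(k) + k * (lam1 ** k + lam2 ** k))  # (B.5)

for k in range(1, 10):
    conv = 2 * sum(phi1(l) * phi1(k + 1 - l) for l in range(1, k + 1))
    assert abs(conv - phi2(k + 1)) < 1e-12               # both sides of the identity agree
```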

B.4 Proof of Proposition A.4

We give details only for ${E_{}}[{l_{{\rho _{1}}{\rho _{2}}}}]$; the remaining equations in (A.16) follow immediately from Lemma A.3. Indeed, $\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}$ in ${l_{\mu {\rho _{1}}}}$ is not a function of ${\varepsilon _{s}}\hspace{2.5pt}(1\le s\le t)$, and although ${l_{\sigma {\rho _{1}}}}$ contains $\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}$, the latter is a linear combination of ${\varepsilon _{1}},\dots ,{\varepsilon _{t-1}}$ by (A.9), so the expectation of ${\varepsilon _{t}}{\varepsilon _{s}}\hspace{2.5pt}(1\le s\le t-1)$ vanishes. We therefore compute
\[\begin{aligned}{}{E_{}}[{l_{{\rho _{1}}{\rho _{2}}}}]& =-\frac{1}{{\sigma ^{2}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\right]\\ {} & =-\frac{1}{{\sigma ^{2}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{\sum \limits_{m=1}^{t-2}}{\varphi _{1}}(k){\varepsilon _{t-k}}{\varphi _{1}}(m){\varepsilon _{t-m-1}}\right]\\ {} & =-\frac{1}{{\sigma ^{2}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1){\varepsilon _{t-k}^{2}}\right]=-{\sum \limits_{t=1}^{n}}{\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1).\end{aligned}\]

B.5 Proof of Proposition A.6

The third derivatives appearing in the first line of Proposition A.6 are linear functions of the ${\varepsilon _{k}}$. Thus, their expectations vanish because ${E_{}}[{\varepsilon _{k}}]=0$. For the expectations in the second line, every term in the product of the first and second derivatives of the log-likelihood function contains the ${\varepsilon _{k}}$ through an odd (first or third) total power, so these expectations also vanish, as in
\[\begin{aligned}{}{\nu _{\mu \mu ,\mu }}& ={E_{}}[{l_{\mu \mu }}{l_{\mu }}]=-\frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{d_{t-1}^{2}}{\varepsilon _{s}}{d_{s-1}}\right]=0,\\ {} {\nu _{\mu \sigma ,\sigma }}& =-\frac{2}{{\sigma ^{6}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{d_{t-1}}{\varepsilon _{t}}{\varepsilon _{s}^{2}}\right]=0.\end{aligned}\]
We also see that the quantities inside the following expectations are products of three linear combinations of the ${\varepsilon _{k}}$, so each term again contains an odd (first or third) power of some ${\varepsilon _{k}}$. Therefore, we have
\[\begin{aligned}{}{\nu _{\mu {\rho _{1}},{\rho _{1}}}}& =-\frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s}}}{\partial {\rho _{1}}}+{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{d_{t-1}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s}}}{\partial {\rho _{1}}}\right]=0,\\ {} {\nu _{\mu {\rho _{1}},{\rho _{2}}}}& ={\nu _{\mu {\rho _{2}},{\rho _{1}}}}={\nu _{\mu {\rho _{2}},{\rho _{2}}}}=0.\end{aligned}\]

B.6 Proof of Proposition A.7

Unlike in Proposition A.6, the products of the first and second derivatives of the log-likelihood function involve the ${\varepsilon _{k}}$ through even (second or fourth) powers. The expectations are obtained by carefully collecting these powers of ${\varepsilon _{k}}$, as in
\[\begin{aligned}{}{\nu _{\sigma \mu ,\mu }}& =-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{d_{t-1}}{\varepsilon _{t}}{d_{s-1}}{\varepsilon _{s}}\right]=-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}}{\varepsilon _{t}^{2}}\right]=-\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\\ {} {\nu _{\sigma \sigma ,\sigma }}& =-\frac{{n^{2}}}{{\sigma ^{3}}}+\frac{n\cdot n{\sigma ^{2}}}{{\sigma ^{5}}}+\frac{3n\cdot n{\sigma ^{2}}}{{\sigma ^{5}}}-\frac{3}{{\sigma ^{7}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}^{2}}{\varepsilon _{s}^{2}}\right]\\ {} & =\frac{3{n^{2}}}{{\sigma ^{3}}}-\frac{3}{{\sigma ^{3}}}n(n+2)=-\frac{6n}{{\sigma ^{3}}}.\end{aligned}\]
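For completeness, the fourth-moment evaluation used in the last display follows from ${E_{}}[{\varepsilon _{t}^{4}}]=3{\sigma ^{4}}$ and the independence of the Gaussian errors:
\[ {E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}^{2}}{\varepsilon _{s}^{2}}\right]=n(n-1){\sigma ^{4}}+3n{\sigma ^{4}}=n(n+2){\sigma ^{4}}.\]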
Similarly, the following expectations are determined by the second and fourth moments. The resulting values are expressed using ${i_{{\rho _{1}}{\rho _{1}}}}$, ${i_{{\rho _{1}}{\rho _{2}}}}$, and ${i_{{\rho _{2}}{\rho _{2}}}}$, which are components of the Fisher information matrix.
\[\begin{aligned}{}{\nu _{\sigma {\rho _{1}},{\rho _{1}}}}& =-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s}}}{\partial {\rho _{1}}}\right]=-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\varepsilon _{t}^{2}}{\left(\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\right)^{2}}\right]\\ {} & =-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{\sum \limits_{m=1}^{t-1}}{\varepsilon _{t}^{2}}{\varphi _{1}}(t-k){\varepsilon _{k}}{\varphi _{1}}(t-m){\varepsilon _{m}}\right]\\ {} & =-\frac{2}{{\sigma ^{5}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(t-k))^{2}}{\sigma ^{4}}\\ {} & =-\frac{2}{\sigma }{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(k))^{2}}=-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{1}}}},\\ {} {\nu _{\sigma {\rho _{2}},{\rho _{2}}}}& =-\frac{2}{\sigma }{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{({\varphi _{1}}(k))^{2}}=-\frac{2}{\sigma }{i_{{\rho _{2}}{\rho _{2}}}},\end{aligned}\]
\[\begin{aligned}{}{\nu _{\sigma {\rho _{1}},{\rho _{2}}}}& =-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s}}}{\partial {\rho _{2}}}\right]=-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\varepsilon _{t}^{2}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}\right]\\ {} & =-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{\sum \limits_{m=1}^{t-2}}{\varepsilon _{t}^{2}}{\varphi _{1}}(t-k){\varepsilon _{k}}{\varphi _{1}}(t-m-1){\varepsilon _{m}}\right]\\ {} & =-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{\varepsilon _{t}^{2}}{\varphi _{1}}(t-k){\varphi _{1}}(t-k-1){\varepsilon _{k}^{2}}\right]\\ {} & =-\frac{2}{\sigma }{\sum \limits_{t=1}^{n}}{\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1)=-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{2}}}},\\ {} {\nu _{\sigma {\rho _{2}},{\rho _{1}}}}& =-\frac{2}{{\sigma ^{5}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{2}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s}}}{\partial {\rho _{1}}}\right]=-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{2}}}}.\end{aligned}\]

B.7 Proof of Proposition A.8

Eqs. (A.17) and (A.18) are trivial from the expressions of ${l_{{\rho _{1}}\mu \mu }}$, ${l_{{\rho _{2}}\mu \mu }}$, ${l_{{\rho _{1}}\sigma \sigma }}$, and ${l_{{\rho _{2}}\sigma \sigma }}$. First, we show (A.21). The expectation ${\nu _{{\rho _{1}}\mu ,\mu }}={E_{}}[{l_{{\rho _{1}}\mu }}{l_{\mu }}]$ is expressed as
(B.6)
\[ {\nu _{{\rho _{1}}\mu ,\mu }}=\frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}{d_{s-1}}{\varepsilon _{s}}\right]+\frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{d_{t-1}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}{d_{s-1}}{\varepsilon _{s}}\right].\]
The first term of the right-hand side in (B.6) is given by
\[\begin{aligned}{}\frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}{d_{s-1}}{\varepsilon _{s}}\right]& =\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}}\frac{\partial {d_{t-1}}}{\partial {\rho _{1}}}\\ {} & =-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k).\end{aligned}\]
Similarly, the second term in (B.6) is given by
\[\begin{aligned}{}& \frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{d_{t-1}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}{d_{s-1}}{\varepsilon _{s}}\right]\\ {} & \hspace{1em}=-\frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{\varphi _{1}}(t-k){\varepsilon _{k}}{d_{s-1}}{\varepsilon _{s}}\right]\\ {} & \hspace{1em}=-\frac{1}{{\sigma ^{4}}}{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{k-1}}{\varphi _{1}}(t-k){\varepsilon _{k}^{2}}\right]=-\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k).\end{aligned}\]
Hence, we obtain
\[ {\nu _{{\rho _{1}}\mu ,\mu }}=-\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k).\]
By the same process, we have
\[ {\nu _{{\rho _{2}}\mu ,\mu }}=-\frac{2}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k).\]
Eq. (A.22) is trivial. Next, we show (B.7), (B.9), and (B.10) to prove (A.19), (A.20), (A.23), (A.24), and (A.25). We have
(B.7)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}\left[\frac{{\partial ^{2}}{\varepsilon _{t-p+1}}}{\partial {\rho _{1}^{2}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}\right]& \displaystyle =& \displaystyle -{E_{}}\left[{\sum \limits_{k=1}^{t-p-1}}{\sum \limits_{m=1}^{t-1}}{\varphi _{2}}(t-p-k+1){\varepsilon _{k}}{\varphi _{1}}(t-m){\varepsilon _{m}}\right]\\ {} & \displaystyle =& \displaystyle -{\sigma ^{2}}{\sum \limits_{k=1}^{t-p-1}}{\varphi _{1}}(t-k){\varphi _{2}}(t-p-k+1)\\ {} & \displaystyle =& \displaystyle -{\sigma ^{2}}{\sum \limits_{k=1}^{t-p-1}}{\varphi _{1}}(k+p){\varphi _{2}}(k+1)=-{\sigma ^{2}}{S_{p,t}}.\end{array}\]
Moreover, we have
(B.8)
\[\begin{aligned}{}& {E_{}}\left[{\sum \limits_{s=1}^{n}}\frac{\partial {\varepsilon _{t-p}}}{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s-q}}}{\partial {\rho _{1}}}\right]\\ {} =& -{E_{}}\left[{\sum \limits_{s=1}^{n}}{\sum \limits_{k=1}^{t-p-1}}{\sum \limits_{m=1}^{t-1}}{\sum \limits_{l=1}^{s-q-1}}{\varphi _{1}}(t-p-k){\varepsilon _{k}}{\varphi _{1}}(t-m){\varepsilon _{m}}{\varepsilon _{s}}{\varphi _{1}}(s-q-l){\varepsilon _{l}}\right]\end{aligned}\]
by (A.9) for $p,q\ge 0$. We consider the right-hand side of (B.8). Fix k and m. Since $1\le l\le s-q-1$, we have $l<s$, so s cannot equal l. Hence a summand has a nonzero expectation only when $s=k$ and $l=m$, or when $l=k$ and $s=m$. When $s=k$ and $l=m$, we have
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {E_{}}\left[{\sum \limits_{k=1}^{t-p-1}}{\sum \limits_{m=1}^{k-q-1}}{\varphi _{1}}(t-p-k){\varphi _{1}}(t-m){\varphi _{1}}(k-q-m){\varepsilon _{k}^{2}}{\varepsilon _{m}^{2}}\right]\\ {} & \displaystyle =& \displaystyle {\sigma ^{4}}{\sum \limits_{k=1}^{t-p-1}}{\sum \limits_{m=1}^{k-q-1}}{\varphi _{1}}(t-p-k){\varphi _{1}}(t-k+q+m){\varphi _{1}}(m)\\ {} & \displaystyle =& \displaystyle {\sigma ^{4}}{\sum \limits_{k=1}^{t-p-1}}{\sum \limits_{m=1}^{t-k-1-p-q}}{\varphi _{1}}(k){\varphi _{1}}(k+m+p+q){\varphi _{1}}(m).\end{array}\]
On the other hand, when $l=k$ and $s=m$, we have
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {E_{}}\left[{\sum \limits_{m=1}^{t-1}}{\sum \limits_{k=1}^{m-q-1}}{\varphi _{1}}(t-p-k){\varphi _{1}}(t-m){\varphi _{1}}(m-q-k){\varepsilon _{m}^{2}}{\varepsilon _{k}^{2}}\right]\\ {} & \displaystyle =& \displaystyle {\sigma ^{4}}{\sum \limits_{m=1}^{t-1}}{\sum \limits_{k=1}^{m-q-1}}{\varphi _{1}}(t-m){\varphi _{1}}(t-m+k-p+q){\varphi _{1}}(k)\\ {} & \displaystyle =& \displaystyle {\sigma ^{4}}{\sum \limits_{k=1}^{t-1}}{\sum \limits_{m=1}^{t-k-q-1}}{\varphi _{1}}(k){\varphi _{1}}(k+m-p+q){\varphi _{1}}(m).\end{array}\]
Therefore,
(B.9)
\[ {E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}\frac{\partial {\varepsilon _{t-p}}}{\partial {\rho _{1}}}\frac{\partial {\varepsilon _{t}}}{\partial {\rho _{1}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s-q}}}{\partial {\rho _{1}}}\right]={\sigma ^{4}}{\sum \limits_{t=1}^{n}}{T_{p,q,t}}.\]
Moreover, we consider the following expectation:
\[\begin{aligned}{}& {E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t-p}}}{\partial {\rho _{1}^{2}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s-q}}}{\partial {\rho _{1}}}\right]\\ {} & =-{E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\sum \limits_{k=1}^{t-p-2}}{\sum \limits_{m=1}^{s-q-1}}{\varepsilon _{t}}{\varphi _{2}}(t-p-k){\varepsilon _{k}}{\varepsilon _{s}}{\varphi _{1}}(s-q-m){\varepsilon _{m}}\right].\end{aligned}\]
Similarly, we fix t and k. Then a summand has a nonzero expectation only when $s=t$ and $m=k$.
\[\begin{aligned}{}& {E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-p-2}}{\varphi _{2}}(t-p-k){\varphi _{1}}(t-q-k){\varepsilon _{t}^{2}}{\varepsilon _{k}^{2}}\right]\\ {} & ={\sigma ^{4}}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-p-2}}{\varphi _{1}}(k+p-q+1){\varphi _{2}}(k+1).\end{aligned}\]
Therefore,
(B.10)
\[ {E_{}}\left[{\sum \limits_{t=1}^{n}}{\sum \limits_{s=1}^{n}}{\varepsilon _{t}}\frac{{\partial ^{2}}{\varepsilon _{t-p}}}{\partial {\rho _{1}^{2}}}{\varepsilon _{s}}\frac{\partial {\varepsilon _{s-q}}}{\partial {\rho _{1}}}\right]=-{\sigma ^{4}}{\sum \limits_{t=1}^{n}}{S_{p-q+1,t-q}}.\]
This completes the proof of Proposition A.8 by combining (B.7), (B.9), and (B.10).

Acknowledgement

The authors would like to thank the anonymous referees for their helpful suggestions and comments, which included not only technical remarks but also practical insights. Based on their comments, we revised many equations and added Remark 3.3 and Section 6; the paper was substantially improved as a result. The authors also thank Dr. Tao Zou (ANU) for his advice on the data generation process we use and for helpful discussions. This work was supported by JSPS KAKENHI Grant Number JP18K11200.

References

[1] Bao, Y.: Finite-sample bias of the conditional Gaussian maximum likelihood estimator in ARMA models. Adv. Econom. 36, 207–244 (2016)
[2] Bao, Y.: A general result on the estimation bias of ARMA models. J. Stat. Plan. Inference 197, 107–125 (2018). MR3799026. https://doi.org/10.1016/j.jspi.2018.01.001
[3] Bao, Y., Ullah, A.: The second-order bias and mean squared error of estimators in time-series models. J. Econom. 140, 650–669 (2007). MR2408921. https://doi.org/10.1016/j.jeconom.2006.07.007
[4] Bao, Y., Ullah, A.: Expectation of quadratic forms in normal and nonnormal variables with applications. J. Stat. Plan. Inference 140, 1193–1205 (2010). MR2581122. https://doi.org/10.1016/j.jspi.2009.11.002
[5] Barndorff-Nielsen, O.E., Cox, D.R.: Inference and Asymptotics. Chapman & Hall/CRC (1994). MR1317097. https://doi.org/10.1007/978-1-4899-3210-5
[6] Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods. Springer (1991). MR2839251
[7] Cheang, W.K., Reinsel, G.C.: Bias reduction of autoregressive estimates in time series regression model through restricted maximum likelihood. J. Am. Stat. Assoc. 95, 1173–1184 (2000). MR1804241. https://doi.org/10.2307/2669758
[8] Cordeiro, G.M., Klein, R.: Bias correction in ARMA models. Stat. Probab. Lett. 19, 169–176 (1994). MR1278646. https://doi.org/10.1016/0167-7152(94)90100-7
[9] Giummolé, F., Vidoni, P.: Improved prediction limits for a general class of Gaussian models. J. Time Ser. Anal. 31, 483–493 (2010). MR2732602. https://doi.org/10.1111/j.1467-9892.2010.00680.x
[10] Hamilton, J.D.: Time Series Analysis. Princeton University Press (1994). MR1278033
[11] Hillmer, S.C., Tiao, G.C.: Likelihood function of stationary multiple autoregressive moving average models. J. Am. Stat. Assoc. 74(367), 652–660 (1979). MR0548261
[12] Kurosawa, T., Noguchi, K., Honda, F.: Bias reduction of the maximum likelihood estimator for a conditional Gaussian MA(1) model. Commun. Stat., Theory Methods 31(17), 8588–8602 (2017). MR3680779. https://doi.org/10.1080/03610926.2016.1185119
[13] Shumway, R.H., Stoffer, D.S.: Time Series Analysis and Its Applications: With R Examples, 4th edn. Springer (2017)
[14] Tanaka, K.: Asymptotic expansions associated with the AR(1) model with unknown mean. Econometrica 51, 1221–1231 (1983). MR0710228. https://doi.org/10.2307/1912060
[15] Tanaka, K.: An asymptotic expansion associated with the maximum likelihood estimators in ARMA models. J. R. Stat. Soc. B 46, 58–67 (1984). MR0745216
[16] The Federal Reserve Bank of St. Louis. http://research.stlouisfed.org/
[17] Wilson, G.T.: The estimation of parameters in multivariate time series models. J. R. Stat. Soc. 35(367), 76–85 (1973). MR0339421