1 Introduction
Estimators of unknown parameters must be consistent, and consistency is ensured when the sample size is large; when the sample size is small, however, the estimators may be biased. In recent years, several computational methods have been developed to compute estimates of unknown parameters. An analytical expression for the bias, on the other hand, reveals the relationship between the unknown parameters and the bias, that is, how the bias changes depending on the unknown parameters. Analytical evaluations of the bias for classes of nonlinear estimators in models with i.i.d. samples have been conducted for many years. Tanaka (1983) [14] provided asymptotic expansions of the least squares estimator for the first-order autoregressive process AR(1) and computed its bias. Tanaka (1984) [15] also gave asymptotic expansions of the maximum likelihood estimators for autoregressive moving average (ARMA) models, including AR(1), AR(2), MA(1), and MA(2), and computed their biases. Cordeiro and Klein (1994) [8] derived the bias of the maximum likelihood estimators for ARMA models in another way, although the result for MA(2) was not shown. Cheang and Reinsel (2000) [7] developed a way to reduce the bias of AR models using restricted maximum likelihood estimation.
In practice, we often rely on conditional maximum likelihood estimation to reduce the computational cost of maximum likelihood estimation and to predict an unobserved variable, namely the next value of the observed data (see Section 2 for the definition of conditional maximum likelihood estimation). Conditional maximum likelihood estimation is often referred to as quasi-maximum likelihood estimation (QMLE). Statistical properties of conditional maximum likelihood estimation have been discussed in the literature (see, for example, [3] and [4] by Bao and Ullah). Giummolè and Vidoni (2010) [9] showed the bias of the conditional maximum likelihood estimator for a Gaussian MA(1) model in the course of obtaining improved coverage probabilities for ARMA models. However, their expression for the bias of the estimator for a Gaussian first-order moving average (MA(1)) model appeared questionable. Hence, Kurosawa, Noguchi, and Honda (2017) [12] corrected the bias and deduced a simple expression for it using a method by Barndorff-Nielsen and Cox (1994) [5]. We should also note the recent remarkable results by Y. Bao (2016) [1] and (2018) [2]; we discuss them in Remark 3.4 below.
In this study, we derive the bias of the conditional maximum likelihood estimators of the unknown parameters for a Gaussian second-order moving average (MA(2)) model, following the method in [12]. In Section 2, we introduce the Gaussian MA(2) model and the conditional maximum likelihood function. In Section 3, we derive both the bias and the mean squared errors (MSEs) of the conditional maximum likelihood estimators for the Gaussian MA(2) model, and then propose new estimators based on the $O({n^{-1}})$ term in the bias of the conditional maximum likelihood estimators. Moreover, we show that the proposed estimators are less biased and do not have larger MSEs than the conditional maximum likelihood estimators. In Section 4, we conduct a simple simulation study to verify our results, and the proofs of the main theorems are given in Section 5. Furthermore, we apply our method to the U.S. GNP as an illustrative example in Section 6.
2 A Gaussian MA(2) model and the conditional maximum likelihood estimator
Let $\{{Y_{t}}\}$ be a Gaussian MA(1) model (see, e.g., [6, 10]) defined by
(1)
\[ {Y_{t}}=\mu +{\varepsilon _{t}}+\rho {\varepsilon _{t-1}},\hspace{1em}{\varepsilon _{t}}\stackrel{\mathrm{i}.\mathrm{i}.\mathrm{d}.}{\sim }N(0,{\sigma ^{2}})\hspace{2em}(t\ge 1),\]
where $|\rho |<1$. Kurosawa, Noguchi, and Honda (2017) [12] computed the bias of the conditional maximum likelihood estimator under the condition that
(2)
\[ {\varepsilon _{0}}=0\]
for the Gaussian MA(1) model. Assumption (2) is a useful condition not only for the estimation problem but also for a prediction problem, since ${\varepsilon _{T}}$ ($T\ge 1$) can then be written as a linear combination of ${Y_{1}},\dots ,{Y_{T}}$. Then, the best linear unbiased estimator ${\widehat{Y}_{T+h}}$ of ${Y_{T+h}}$ ($h>0$) given $S=\{{Y_{1}}={y_{1}},\dots ,{Y_{T}}={y_{T}}\}$ can be described by a finite linear combination of ${\varepsilon _{1}},\dots ,{\varepsilon _{T}}$ (see, e.g., [12]). They gave the following:
Theorem 2.1 ([12]).
The bias of the conditional maximum likelihood estimators of the unknown parameters given (2) for the Gaussian MA(1) model is
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[\widehat{\mu }-\mu ]& \displaystyle =& \displaystyle o({n^{-1}}),\\ {} \displaystyle {E_{}}[\widehat{\sigma }-\sigma ]& \displaystyle =& \displaystyle -\frac{5\sigma }{4n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[\widehat{\rho }-\rho ]& \displaystyle =& \displaystyle \frac{2\rho -1}{n}+o({n^{-1}}).\end{array}\]
In this study, we consider a Gaussian MA(2) model defined by
(3)
\[ {Y_{t}}=\mu +{\varepsilon _{t}}+{\rho _{1}}{\varepsilon _{t-1}}+{\rho _{2}}{\varepsilon _{t-2}}\hspace{2em}(t\ge 1),\]
where ${\varepsilon _{t}}\stackrel{\mathrm{i}.\mathrm{i}.\mathrm{d}.}{\sim }N(0,{\sigma ^{2}})$ and $\theta ={(\mu ,\sigma ,{\rho _{1}},{\rho _{2}})^{\top }}$ is the vector of unknown parameters. Although the MA(2) model is stationary regardless of the values of ${\rho _{1}}$ and ${\rho _{2}}$, we assume invertibility, that is, the roots ${\lambda _{1}}$ and ${\lambda _{2}}$ in the factorization $1+{\rho _{1}}L+{\rho _{2}}{L^{2}}=(1-{\lambda _{1}}L)(1-{\lambda _{2}}L)$ satisfy $|{\lambda _{1}}|<1$ and $|{\lambda _{2}}|<1$, in order to identify the model uniquely. Otherwise, the likelihood function takes the same value at different parameter points.
If we carry out the maximum likelihood estimation with ${Y_{1}},\dots ,{Y_{n}}$, the likelihood function involves infinitely many εs. To avoid this problem, we maximize the conditional likelihood function given
(4)
\[ {\varepsilon _{0}}={\varepsilon _{-1}}=0\]
for the Gaussian MA(2) model. In this case, ${Y_{1}},\dots ,{Y_{n}}$ can be expressed using only finitely many εs. The conditional log-likelihood function for the Gaussian MA(2) model given (4), expressed with finitely many εs, is
(5)
\[ \mathcal{L}(\theta ;y)=-\frac{n}{2}\log (2\pi )-\frac{n}{2}\log ({\sigma ^{2}})-{\sum \limits_{t=1}^{n}}\frac{{\{{\varepsilon _{t}}(\theta ;y)\}^{2}}}{2{\sigma ^{2}}}.\]
The likelihood function under (4) is referred to as the conditional likelihood function (see [11, p. 653] and [17]). We use the following lemma to compute the bias of the estimators of the unknown parameters in the Gaussian MA(2) model.
Lemma 2.2.
Let $Y={({Y_{1}},\dots ,{Y_{n}})^{\top }}$ be a vector of random variables generated by a Gaussian MA(2) model in (3). Assume that (4) holds. Then, we have
where
The proof is given in the Appendix. We know that
\[\begin{aligned}{}{\varepsilon _{t}}& =\frac{1}{(1-{\lambda _{1}}L)(1-{\lambda _{2}}L)}({Y_{t}}-\mu )={\sum \limits_{k=0}^{\infty }}{\sum \limits_{l=0}^{k}}{\lambda _{1}^{l}}{\lambda _{2}^{k-l}}({Y_{t-k}}-\mu )\\ {} & ={\sum \limits_{k=-\infty }^{t-1}}\left({\sum \limits_{l=0}^{t-k-1}}{\lambda _{1}^{l}}{\lambda _{2}^{t-k-l-1}}\right)({Y_{k+1}}-\mu )\end{aligned}\]
since the process is invertible. The lemma shows that the coefficients of ${Y_{t}}\hspace{2.5pt}(t\le 0)$ vanish when ${\varepsilon _{0}}={\varepsilon _{-1}}=0$. Since the conditional likelihood function is a function of the independent samples ${\varepsilon _{1}},\dots ,{\varepsilon _{n}}$, we can apply the following lemma from [5] to the conditional log-likelihood function; note that we apply Lemma 2.3 to the i.i.d. random variables ε, not to Y. Higher-order derivatives of the conditional log-likelihood function (5) are required to obtain the bias and the MSEs of the maximum likelihood estimators, and Lemma 2.2 is used in those calculations. An asymptotic expansion of the bias of the maximum likelihood estimator is given by the following:
Lemma 2.3 (See Barndorff-Nielsen and Cox (1994) [5, p. 150]).
Let $\theta ={({\theta _{1}},\dots ,{\theta _{d}})^{\top }}$ be a vector of unknown parameters for a random variable Z, and $\widehat{\theta }={({\widehat{\theta }_{1}},\dots ,{\widehat{\theta }_{d}})^{\top }}$ be a vector of the maximum likelihood estimators of θ for a vector of random samples $Z={({Z_{1}},\dots ,{Z_{n}})^{\top }}$. Then, the bias of ${\widehat{\theta }_{r}}$ $(1\le r\le d)$ is given by
\[\begin{aligned}{}& {E_{Z}}[{\widehat{\theta }_{r}}-{\theta _{r}}]=\frac{1}{2}{\sum \limits_{s=1}^{d}}{\sum \limits_{t=1}^{d}}{\sum \limits_{u=1}^{d}}{i^{{\theta _{r}}{\theta _{s}}}}{i^{{\theta _{t}}{\theta _{u}}}}({\nu _{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}+2{\nu _{{\theta _{s}}{\theta _{t}},{\theta _{u}}}})+O({n^{-3/2}})\\ {} & \hspace{2em}(r=1,\dots ,d),\end{aligned}\]
where $\mathcal{L}(\theta ;Z)$ is the log-likelihood function for $Z={({Z_{1}},\dots ,{Z_{n}})^{\top }}$,
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {l_{{\theta _{s}}}}& \displaystyle =& \displaystyle \frac{\partial \mathcal{L}(\theta ;Z)}{\partial {\theta _{s}}},\hspace{1em}{l_{{\theta _{s}}{\theta _{t}}}}=\frac{{\partial ^{2}}\mathcal{L}(\theta ;Z)}{\partial {\theta _{s}}\partial {\theta _{t}}},\hspace{1em}{l_{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}=\frac{{\partial ^{3}}\mathcal{L}(\theta ;Z)}{\partial {\theta _{s}}\partial {\theta _{t}}\partial {\theta _{u}}},\\ {} \displaystyle {i^{{\theta _{r}}{\theta _{s}}}}& \displaystyle =& \displaystyle {\left({I_{n}}{(\theta )^{-1}}\right)_{r,s}},\hspace{1em}{\nu _{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}={E_{Z}}[{l_{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}],\hspace{1em}{\nu _{{\theta _{s}}{\theta _{t}},{\theta _{u}}}}={E_{Z}}[{l_{{\theta _{s}}{\theta _{t}}}}{l_{{\theta _{u}}}}],\end{array}\]
and ${I_{n}}(\theta )$ is the Fisher information matrix for Z.
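Once the arrays ${i^{{\theta _{r}}{\theta _{s}}}}$, ${\nu _{{\theta _{s}}{\theta _{t}}{\theta _{u}}}}$, and ${\nu _{{\theta _{s}}{\theta _{t}},{\theta _{u}}}}$ are available, the leading bias term of Lemma 2.3 is a finite tensor contraction. The following is a minimal Python sketch of that contraction; the function and array names are ours, and the inputs are assumed to be supplied by the user.

```python
import numpy as np

def first_order_bias(inv_info, nu3, nu21):
    """O(n^{-1}) bias term of Lemma 2.3.

    inv_info : (d, d) array with entries i^{theta_r theta_s}
    nu3      : (d, d, d) array with entries nu_{theta_s theta_t theta_u}
    nu21     : (d, d, d) array with entries nu_{theta_s theta_t, theta_u}
    Returns the d-vector whose r-th entry is
    (1/2) * sum_{s,t,u} i^{theta_r theta_s} i^{theta_t theta_u}
                        (nu_{theta_s theta_t theta_u} + 2 nu_{theta_s theta_t, theta_u}).
    """
    return 0.5 * np.einsum("rs,tu,stu->r", inv_info, inv_info, nu3 + 2.0 * nu21)
```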
3 Main results
In this section, we compute the biases of the conditional maximum likelihood estimators of the unknown parameters of the Gaussian MA(2) model and also propose new estimators for these parameters. Before obtaining the results on the bias, we examine the MSEs. The MSEs appear as the diagonal elements of the covariance matrix
\[ {E_{}}[(\widehat{\theta }-\theta ){(\widehat{\theta }-\theta )^{\top }}]={I_{n}}{(\theta )^{-1}}+o({n^{-1}}),\]
where ${I_{n}}(\theta )$ is the Fisher information matrix. The inverse matrix can be simplified by applying asymptotic properties. Therefore, we obtain the following theorem.
Theorem 3.1.
The elements of the asymptotic covariance matrix of the conditional maximum likelihood estimators of the unknown parameters under (4) for the Gaussian MA(2) model in (3) are given by ${E_{}}[(\widehat{\mu }-\mu )(\widehat{\sigma }-\sigma )]={E_{}}[(\widehat{\mu }-\mu )({\widehat{\rho }_{1}}-{\rho _{1}})]={E_{}}[(\widehat{\mu }-\mu )({\widehat{\rho }_{2}}-{\rho _{2}})]$ $={E_{}}[(\widehat{\sigma }-\sigma )({\widehat{\rho }_{1}}-{\rho _{1}})]$ $={E_{}}[(\widehat{\sigma }-\sigma )({\widehat{\rho }_{2}}-{\rho _{2}})]=o({n^{-1}})$ and
\[\begin{array}{l}\displaystyle {E_{}}[{(\widehat{\mu }-\mu )^{2}}]=\frac{{\sigma ^{2}}{({\rho _{1}}+{\rho _{2}}+1)^{2}}}{n}+o({n^{-1}}),\hspace{1em}{E_{}}[{(\widehat{\sigma }-\sigma )^{2}}]=\frac{{\sigma ^{2}}}{2n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[{({\widehat{\rho }_{1}}-{\rho _{1}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}),\hspace{1em}{E_{}}[{({\widehat{\rho }_{2}}-{\rho _{2}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[({\widehat{\rho }_{1}}-{\rho _{1}})({\widehat{\rho }_{2}}-{\rho _{2}})]=\frac{{\rho _{1}}(1-{\rho _{2}})}{n}+o({n^{-1}}).\end{array}\]
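For concreteness, the leading $O({n^{-1}})$ terms in Theorem 3.1 can be evaluated numerically; a minimal sketch (the function name is ours) that reproduces, for example, the values of $J(\theta )/n$ reported later in Table 1 for $({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$, $\sigma =1$, $n=50$:

```python
def leading_mse(sigma, rho1, rho2, n):
    """Leading O(1/n) terms of the MSEs and covariance in Theorem 3.1."""
    return {
        "mu": sigma**2 * (rho1 + rho2 + 1.0)**2 / n,
        "sigma": sigma**2 / (2.0 * n),
        "rho1": (1.0 - rho2**2) / n,
        "rho2": (1.0 - rho2**2) / n,
        "rho1_rho2": rho1 * (1.0 - rho2) / n,   # asymptotic covariance of rho1_hat, rho2_hat
    }

# (rho1, rho2) = (0.25, -0.25), sigma = 1, n = 50 gives
# mu: 0.02, sigma: 0.01, rho1: 0.01875, rho2: 0.01875
print(leading_mse(1.0, 0.25, -0.25, 50))
```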
The proof is given in Section 5.1. By applying Lemma 2.3 to (5), we obtain the bias of the conditional maximum likelihood estimators.
Theorem 3.2.
The bias of the conditional maximum likelihood estimators of the unknown parameters under (4) for the Gaussian MA(2) model in (3) is given by
(9)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[\widehat{\mu }-\mu ]& \displaystyle =& \displaystyle o({n^{-1}}),\end{array}\]
(10)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[\widehat{\sigma }-\sigma ]& \displaystyle =& \displaystyle -\frac{7\sigma }{4n}+o({n^{-1}}),\end{array}\]
(11)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[{\widehat{\rho }_{1}}-{\rho _{1}}]& \displaystyle =& \displaystyle \frac{{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}}),\end{array}\]
(12)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[{\widehat{\rho }_{2}}-{\rho _{2}}]& \displaystyle =& \displaystyle \frac{3{\rho _{2}}-1}{n}+o({n^{-1}}).\end{array}\]
The proof is given in Section 5.2. We observe that the bias of the conditional maximum likelihood estimators for the Gaussian MA(2) model is the same as that of the full maximum likelihood estimators for a Gaussian MA(2) model (see Tanaka (1984) [15]). Although (10) looks different from the result by Tanaka, we can deduce the same result for the bias of ${\hat{\sigma }^{2}}$ (see (14)).
Remark 3.3.
We note that MA$(2)$ reduces to MA$(1)$ if we put ${\rho _{2}}=0$ in (3). However, the bias for $\rho (={\rho _{1}})$ in Theorem 2.1 is not recovered by simply putting ${\rho _{2}}=0$ in (11). The results in Theorem 3.2 with ${\rho _{2}}=0$ are obtained from a solution of the maximum likelihood estimation in four dimensions, whereas the bias for MA(1) in Theorem 2.1 corresponds to the bias for MA(2) under the constraint ${\rho _{2}}=0$. In other words, Theorem 3.2 does not assume any algebraic relationships among the unknown parameters, such as ${\rho _{2}}=0$, ${\rho _{2}}=1$, or $\Delta ={\rho _{1}^{2}}-4{\rho _{2}}=0$, in advance. Thus, $\Delta \ne 0$, which is equivalent to ${\lambda _{1}}\ne {\lambda _{2}}$, is used in the proof of Theorem 3.2 and in the propositions and lemmas in the Appendix, except for Lemma 2.2.
Remark 3.4.
We recently found the notable results by Y. Bao (2016) [1], although we proved Theorem 3.2 independently. The abstract of [1] states that the bias of the conditional Gaussian likelihood estimator with nonnormal errors is derived. The combination of "Gaussian" and "nonnormal errors" implies that he used the likelihood function (5) even when the errors follow a nonnormal distribution. The calculation of the bias then requires the values of the skewness ${\gamma _{1}}$ and the kurtosis ${\gamma _{2}}$; that is, he derived the bias by treating the likelihood as the Gaussian likelihood function without imposing the conditions ${\gamma _{1}}=0$ and ${\gamma _{2}}=3$ of a normal distribution. He gave the bias for various models, including MA(2), in a matrix representation originally studied by Cordeiro and Klein (1994) [8], whereas we use the roots of the characteristic function to derive the bias. We focus purely on the Gaussian MA(2) model with the conditional likelihood function and evaluate the corrected bias. Furthermore, we propose new estimators below based on the corrected bias and discuss them in detail under a pure Gaussian MA(2) model using the (estimated) bias and the MSE in the simulation study.
Using (10), (11), and (12), we propose the following new estimators for the Gaussian MA(2) model:
(13)
\[ \widetilde{\sigma }=\widehat{\sigma }+\frac{7\widehat{\sigma }}{4n},\hspace{2em}{\widetilde{\rho }_{1}}={\widehat{\rho }_{1}}-\frac{{\widehat{\rho }_{1}}+{\widehat{\rho }_{2}}-1}{n},\hspace{2em}{\widetilde{\rho }_{2}}={\widehat{\rho }_{2}}-\frac{3{\widehat{\rho }_{2}}-1}{n}.\]
As we can see, the proposed estimators are asymptotically equal to the usual estimators. We consider the MSEs of the new estimators:
(14)
\[ {E_{}}[\widehat{\sigma }]=\left(1-\frac{7}{4n}\right)\sigma +o({n^{-1}}),\hspace{1em}{E_{}}[{\widehat{\sigma }^{2}}]=\left(1-\frac{3}{n}\right){\sigma ^{2}}+o({n^{-1}}).\]
Thus, we have
\[ {E_{}}[{(\widetilde{\sigma }-\sigma )^{2}}]=\frac{{\sigma ^{2}}}{2n}+o({n^{-1}}).\]
In other words, the MSEs of $\widetilde{\sigma }$ and $\widehat{\sigma }$ are asymptotically the same. Similarly, we have
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {E_{}}[{\widehat{\rho }_{1}^{2}}]& \displaystyle =& \displaystyle {\rho _{1}^{2}}+\frac{1-{\rho _{2}^{2}}}{n}+2{\rho _{1}}\frac{{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[{\widehat{\rho }_{2}^{2}}]& \displaystyle =& \displaystyle {\rho _{2}^{2}}+\frac{1-{\rho _{2}^{2}}}{n}+2{\rho _{2}}\frac{3{\rho _{2}}-1}{n}+o({n^{-1}}),\\ {} \displaystyle {E_{}}[{\widehat{\rho }_{1}}{\widehat{\rho }_{2}}]& \displaystyle =& \displaystyle {\rho _{1}}{\rho _{2}}+{\rho _{2}}\frac{3{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}}).\end{array}\]
Therefore, we have
\[ {E_{}}[{({\widetilde{\rho }_{1}}-{\rho _{1}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}),\hspace{1em}{E_{}}[{({\widetilde{\rho }_{2}}-{\rho _{2}})^{2}}]=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}}).\]
In other words, the MSEs of ${\widetilde{\rho }_{1}}$ and ${\widehat{\rho }_{1}}$ are asymptotically the same, as are the MSEs of ${\widetilde{\rho }_{2}}$ and ${\widehat{\rho }_{2}}$.
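A minimal Python sketch of applying the corrections in (13) to given conditional maximum likelihood estimates (the function name is ours):

```python
def bias_corrected(sigma_hat, rho1_hat, rho2_hat, n):
    """Bias-corrected estimators (13) for the Gaussian MA(2) model."""
    sigma_tilde = sigma_hat + 7.0 * sigma_hat / (4.0 * n)
    rho1_tilde = rho1_hat - (rho1_hat + rho2_hat - 1.0) / n
    rho2_tilde = rho2_hat - (3.0 * rho2_hat - 1.0) / n
    return sigma_tilde, rho1_tilde, rho2_tilde
```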
4 Simulation study
In this section, we conduct a simulation study in order to verify Theorems 3.1 and 3.2 and to evaluate the validity of the new estimators. Let $\mu =1$ and $\sigma =1$. For fixed n and fixed $({\rho _{1}},{\rho _{2}})$, we generate $y={({y_{1}},\dots ,{y_{n}})^{\top }}$ 30,000 times from the Gaussian MA(2) model. For each y, we calculate the vector of the conditional maximum likelihood estimators $\widehat{\theta }={(\widehat{\mu },\widehat{\sigma },{\widehat{\rho }_{1}},{\widehat{\rho }_{2}})^{\top }}$. Using the 30,000 replications, we compute the estimated bias and MSEs by Monte Carlo simulation. In Subsections 4.2 and 4.3, we also compute the full maximum likelihood estimators ${\mathbf{\hat{\theta }}^{\mathrm{MLE}}}$ to compare them with our estimators.
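A minimal sketch of one such Monte Carlo experiment, in which the conditional MLE is obtained by numerically maximizing (5) with SciPy; the number of replications, the starting values, the optimizer, and the choice of setting the initial errors to zero in the data generation (in line with (4)) are our assumptions, not specifications taken from the paper:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate_ma2(mu, sigma, rho1, rho2, n):
    # generate Y_1,...,Y_n from (3); the initial errors are set to zero as in (4)
    eps = np.concatenate(([0.0, 0.0], rng.normal(0.0, sigma, n)))
    return mu + eps[2:] + rho1 * eps[1:-1] + rho2 * eps[:-2]

def neg_cond_loglik(theta, y):
    # negative of the conditional log-likelihood (5) under eps_0 = eps_{-1} = 0
    mu, sigma, rho1, rho2 = theta
    if sigma <= 0:
        return np.inf
    n = len(y)
    eps = np.zeros(n + 2)
    for t in range(n):
        eps[t + 2] = y[t] - mu - rho1 * eps[t + 1] - rho2 * eps[t]
    return 0.5 * n * np.log(2.0 * np.pi * sigma**2) + 0.5 * np.sum(eps[2:] ** 2) / sigma**2

def conditional_mle(y):
    x0 = np.array([np.mean(y), np.std(y), 0.0, 0.0])
    res = minimize(neg_cond_loglik, x0, args=(y,), method="Nelder-Mead")
    return res.x  # (mu_hat, sigma_hat, rho1_hat, rho2_hat)

true = np.array([1.0, 1.0, 0.25, -0.25])    # (mu, sigma, rho1, rho2)
n, reps = 50, 1000                          # the paper uses 30,000 replications
est = np.array([conditional_mle(simulate_ma2(*true, n)) for _ in range(reps)])
print("bias:", est.mean(axis=0) - true)
print("MSE: ", ((est - true) ** 2).mean(axis=0))
```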
4.1 Evaluation of asymptotic variances
We evaluate how much the MSEs of the conditional maximum likelihood estimators change depending on the true values of the unknown parameters. Table 1 shows, for each unknown parameter, the estimated MSE of the conditional maximum likelihood estimator and the corresponding value of $J(\theta )/n$, where $J(\theta )/n$ denotes the leading term of the MSE obtained in Theorem 3.1.
Table 1.
Comparisons of the estimated MSEs and $J(\theta )/n$ for each unknown parameter (upper: the estimated MSE, lower: $J(\theta )/n$)
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ | $({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$ | |||||||||
n | $\widehat{\mu }$ | $\widehat{\sigma }$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | n | $\widehat{\mu }$ | $\widehat{\sigma }$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | |
50 | 0.02046 | 0.01133 | 0.03102 | 0.03684 | 50 | 0.01412 | 0.01146 | 0.04705 | 0.02783 | |
0.02000 | 0.01000 | 0.01875 | 0.01875 | 0.01312 | 0.01000 | 0.01304 | 0.01304 | |||
100 | 0.01020 | 0.00526 | 0.01136 | 0.01233 | 100 | 0.00691 | 0.00578 | 0.01566 | 0.00944 | |
0.01000 | 0.00500 | 0.00938 | 0.00938 | 0.00656 | 0.00500 | 0.00652 | 0.00652 | |||
150 | 0.00675 | 0.00346 | 0.00706 | 0.00739 | 150 | 0.00453 | 0.00386 | 0.00921 | 0.00564 | |
0.00667 | 0.00333 | 0.00625 | 0.00625 | 0.00437 | 0.00333 | 0.00435 | 0.00435 | |||
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ | $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ | |||||||||
n | $\widehat{\mu }$ | $\widehat{\sigma }$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | n | $\widehat{\mu }$ | $\widehat{\sigma }$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | |
50 | 0.00786 | 0.01148 | 0.03108 | 0.03334 | 50 | 0.05849 | 0.01099 | 0.01943 | 0.02612 | |
0.00720 | 0.01000 | 0.01395 | 0.01395 | 0.05780 | 0.01000 | 0.01395 | 0.01395 | |||
100 | 0.00379 | 0.00526 | 0.01051 | 0.01123 | 100 | 0.02907 | 0.00520 | 0.00814 | 0.00937 | |
0.00360 | 0.00500 | 0.00698 | 0.00698 | 0.02890 | 0.00500 | 0.00698 | 0.00698 | |||
150 | 0.00248 | 0.00346 | 0.00597 | 0.00620 | 150 | 0.01932 | 0.00343 | 0.00511 | 0.00559 | |
0.00240 | 0.00333 | 0.00465 | 0.00465 | 0.01927 | 0.00333 | 0.00465 | 0.00465 |
We conducted simulations under four settings. The top-left setting checks the performance when the true parameters lie well inside the invertibility region, whereas the top-right setting is close to the boundary of the invertibility region. The two bottom settings are designed to check the symmetry with respect to ${\rho _{2}}$. Table 1 clearly shows, for all the settings, that the estimated MSEs decrease as the sample size n increases. The estimated MSE of $\widehat{\sigma }$ does not depend on the values of ${\rho _{1}}$ and ${\rho _{2}}$, which coincides with the fact that $J(\theta )/n$ for $\widehat{\sigma }$ in Theorem 3.1 does not involve ${\rho _{1}}$ or ${\rho _{2}}$. Since $J(\theta )/n$ for ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$ depends on the value of ${\rho _{2}}$ but not on ${\rho _{1}}$, we expect the estimated MSEs of ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$ at $({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ and $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ to be close, but the results show different values for the small sample size $n=50$. This discrepancy may be the influence of the $o({n^{-1}})$ term. To examine this influence, Table 2 compares n times the estimated MSEs with $J(\theta )$ at $({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ and $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ for $n=50$ and $n=1000$.
Table 2.
Comparisons of n times the estimated MSEs and $J(\theta )$ of $n=50$ and $n=1000$
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ | |||||
$\widehat{\mu }$ | $\widehat{\sigma }$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ||
$n\times \text{the estimated MSE}$ | $n=50$ | 0.39307 | 0.57401 | 1.55412 | 1.66715 |
$n=1000$ | 0.36061 | 0.51210 | 0.72833 | 0.71769 | |
$J(\theta )$ | 0.36000 | 0.50000 | 0.69750 | 0.69750 | |
$({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ | |||||
$\widehat{\mu }$ | $\widehat{\sigma }$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ||
$n\times \text{the estimated MSE}$ | $n=50$ | 2.92462 | 0.54941 | 0.97164 | 1.30611 |
$n=1000$ | 2.88205 | 0.51144 | 0.69979 | 0.71517 | |
$J(\theta )$ | 2.89000 | 0.50000 | 0.69750 | 0.69750 |
In general, n times the estimated MSEs converge to $J(\theta )$ as n becomes large. The results show that the estimated MSE of ${\widehat{\rho }_{1}}$ is close to that of ${\widehat{\rho }_{2}}$ for $n=1000$. Next, Table 3 presents the behavior of the estimated MSEs for ${\rho _{1}}=0.25$ and ${\rho _{2}}=-0.25$ when the sample size is small.
Table 3.
Behavior of the estimated MSEs when the sample size is small
n | $\widehat{\mu }$ | $\widehat{\sigma }$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ |
10 | 0.14485 | 0.09400 | 0.41133 | 0.36819 |
11 | 0.12914 | 0.08262 | 0.34886 | 0.33276 |
12 | 0.11453 | 0.07419 | 0.29700 | 0.29798 |
13 | 0.10000 | 0.06659 | 0.26646 | 0.27767 |
14 | 0.08955 | 0.06034 | 0.23022 | 0.25188 |
15 | 0.08262 | 0.05502 | 0.21125 | 0.23577 |
16 | 0.07426 | 0.05051 | 0.18974 | 0.21314 |
17 | 0.06889 | 0.04616 | 0.17351 | 0.19723 |
18 | 0.06413 | 0.04295 | 0.15869 | 0.18569 |
19 | 0.05969 | 0.04028 | 0.14624 | 0.17146 |
20 | 0.05529 | 0.03762 | 0.13552 | 0.16067 |
The estimated MSE becomes smaller as the sample size becomes larger.
4.2 Estimator of σ
We express the bias of $\widehat{\sigma }$ as
\[ {E_{}}[\widehat{\sigma }-\sigma ]={e_{1}}+{e_{2}},\hspace{1em}{e_{1}}=-\frac{7\sigma }{4n}=O({n^{-1}}),\hspace{1em}{e_{2}}=o({n^{-1}}),\]
so that ${e_{1}}+{e_{2}}$ and ${e_{2}}$ denote the bias of the conditional maximum likelihood estimator and the bias without the $O({n^{-1}})$ term, respectively. Table 4 evaluates these two quantities.
Table 4.
Evaluation of the estimated bias of $\widehat{\sigma }$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ | $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ | |||||
n | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | n | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | |
50 | $-0.03579$ | $-0.00079$ | 50 | $-0.02814$ | 0.00686 | |
100 | $-0.01635$ | 0.00115 | 100 | $-0.01322$ | 0.00428 | |
150 | $-0.01035$ | 0.00132 | 150 | $-0.00825$ | 0.00342 |
The estimated bias of $\widehat{\sigma }$ does not depend on the values of ${\rho _{1}}$ and ${\rho _{2}}$, which coincides with (10). Moreover, $|{e_{2}}|$ is smaller than $|{e_{1}}+{e_{2}}|$ because the $O({n^{-1}})$ term has been removed. Next, we compare the bias and MSEs of $\widehat{\sigma }$ and the proposed estimator $\widetilde{\sigma }$ for ${\rho _{1}}=0.25$ and ${\rho _{2}}=-0.25$. We also compute the full maximum likelihood estimator ${\widehat{\sigma }^{\mathrm{MLE}}}$.
Table 5.
Comparison of the bias and MSEs of $\widehat{\sigma }$ and $\widetilde{\sigma }$
Bias | MSE | |||||||
n | $\widehat{\sigma }$ | ${\widehat{\sigma }^{\mathrm{MLE}}}$ | $\widetilde{\sigma }$ | n | $\widehat{\sigma }$ | ${\widehat{\sigma }^{\mathrm{MLE}}}$ | $\widetilde{\sigma }$ | |
50 | $-0.03579$ | $-0.04193$ | $-0.00204$ | 50 | 0.01133 | 0.01195 | 0.01077 | |
100 | $-0.01635$ | $-0.01820$ | 0.00087 | 100 | 0.00526 | 0.00531 | 0.00517 | |
150 | $-0.01035$ | $-0.01142$ | 0.00120 | 150 | 0.00346 | 0.00347 | 0.00343 |
The estimated bias of $\widetilde{\sigma }$ is smaller in absolute value than those of $\widehat{\sigma }$ and ${\widehat{\sigma }^{\mathrm{MLE}}}$. On the other hand, there is little difference among the three estimated MSEs. This is certainly in accordance with the discussion in Section 3.
4.3 Estimators of ${\rho _{1}}$ and ${\rho _{2}}$
We express the bias of ${\widehat{\theta }_{i}}$ $(i=3,4)$ as
\[ {E_{}}[{\widehat{\theta }_{i}}-{\theta _{i}}]={e_{1}}+{e_{2}},\hspace{1em}{e_{1}}=O({n^{-1}}),\hspace{1em}{e_{2}}=o({n^{-1}}),\]
where ${e_{1}}=({\rho _{1}}+{\rho _{2}}-1)/n$ for $i=3$ and ${e_{1}}=(3{\rho _{2}}-1)/n$ for $i=4$. Then ${e_{1}}+{e_{2}}$ and ${e_{2}}$ denote the bias of the conditional maximum likelihood estimator and the bias without the $O({n^{-1}})$ term, respectively.
Table 6.
Evaluation of the estimated bias of ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ | $({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$ | |||||||||
${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | |||||||
n | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | n | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | |
50 | $-0.03591$ | $-0.01591$ | $-0.04913$ | $-0.01413$ | 50 | $-0.13333$ | $-0.10953$ | $-0.00976$ | 0.04564 | |
100 | $-0.01319$ | $-0.00319$ | $-0.01969$ | $-0.00219$ | 100 | $-0.07309$ | $-0.06119$ | 0.01046 | 0.03816 | |
150 | $-0.00847$ | $-0.00181$ | $-0.01271$ | $-0.00104$ | 150 | $-0.05414$ | $-0.04621$ | 0.01191 | 0.03037 | |
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ | $({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ | |||||||||
${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | |||||||
n | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | n | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | ${e_{1}}+{e_{2}}$ | ${e_{2}}$ | |
50 | $-0.06690$ | $-0.03890$ | $-0.06690$ | $-0.01390$ | 50 | $-0.00862$ | $-0.00262$ | 0.00769 | $-0.00531$ | |
100 | $-0.02493$ | $-0.01093$ | $-0.02765$ | $-0.00115$ | 100 | $-0.00305$ | $-0.00005$ | 0.00247 | $-0.00403$ | |
150 | $-0.01475$ | $-0.00542$ | $-0.01663$ | 0.00104 | 150 | $-0.00250$ | $-0.00050$ | 0.00095 | $-0.00338$ |
Except for the case where ${\rho _{1}}$ and ${\rho _{2}}$ are close to the invertibility boundary, $({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$, $|{e_{2}}|$ is smaller than $|{e_{1}}+{e_{2}}|$. The bias of ${\widehat{\rho }_{1}}$ depends on the values of both ${\rho _{1}}$ and ${\rho _{2}}$, while the bias of ${\widehat{\rho }_{2}}$ depends on the value of ${\rho _{2}}$ but not on ${\rho _{1}}$, which coincides with (12). Next, we compare the bias between ${\widehat{\rho }_{1}}$ and ${\widetilde{\rho }_{1}}$ and between ${\widehat{\rho }_{2}}$ and ${\widetilde{\rho }_{2}}$. We also compute the full maximum likelihood estimators ${\hat{\rho }_{i}^{\mathrm{MLE}}}$.
Table 7.
Comparisons of estimated bias for the estimator of ${\rho _{1}}$ and ${\rho _{2}}$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | −0.03591 | −0.03166 | −0.01421 | −0.04913 | −0.05512 | −0.01118 |
100 | −0.01319 | −0.01087 | −0.00286 | −0.01969 | −0.02115 | −0.00160 |
150 | −0.00847 | −0.00716 | −0.00166 | −0.01271 | −0.01338 | −0.00079 |
$({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | −0.13333 | −0.08952 | −0.10667 | −0.00976 | −0.05341 | 0.04623 |
100 | −0.07309 | −0.03134 | −0.06057 | 0.01046 | −0.01752 | 0.03784 |
150 | −0.05414 | −0.01796 | −0.04592 | 0.01191 | −0.01095 | 0.03014 |
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | −0.06690 | −0.06346 | −0.03622 | −0.06690 | −0.09484 | −0.00988 |
100 | −0.02493 | −0.01971 | −0.01041 | −0.02765 | −0.03755 | −0.00032 |
150 | −0.01475 | −0.01109 | −0.00521 | −0.01663 | −0.02199 | 0.00137 |
$({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | −0.00862 | −0.00733 | −0.00260 | 0.00769 | 0.02588 | −0.00577 |
100 | −0.00305 | −0.00265 | −0.00004 | 0.00247 | 0.00854 | −0.00410 |
150 | −0.00250 | −0.00223 | −0.00049 | 0.00095 | 0.00473 | −0.00340 |
The biases of the proposed estimators ${\widetilde{\rho }_{1}}$ and ${\widetilde{\rho }_{2}}$ are smaller than those of ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$, respectively, except for the boundary case mentioned above. Moreover, the corrections applied in ${\widetilde{\rho }_{1}}$ and ${\widetilde{\rho }_{2}}$ depend on the values of both ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$. Next, we compare the MSEs between ${\widehat{\rho }_{1}}$ and ${\widetilde{\rho }_{1}}$, and between ${\widehat{\rho }_{2}}$ and ${\widetilde{\rho }_{2}}$.
Table 8.
Comparisons of the estimated MSEs for the estimators of ${\rho _{1}}$ and ${\rho _{2}}$
$({\rho _{1}},{\rho _{2}})=(0.25,-0.25)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | 0.03102 | 0.03332 | 0.02822 | 0.03684 | 0.04005 | 0.03055 |
100 | 0.01136 | 0.01157 | 0.01090 | 0.01233 | 0.01264 | 0.01124 |
150 | 0.00706 | 0.00711 | 0.00686 | 0.00739 | 0.00748 | 0.00695 |
$({\rho _{1}},{\rho _{2}})=(0.40,-0.59)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | 0.04705 | 0.03905 | 0.03866 | 0.02783 | 0.03581 | 0.02665 |
100 | 0.01566 | 0.01050 | 0.01364 | 0.00944 | 0.01016 | 0.01021 |
150 | 0.00921 | 0.00579 | 0.00826 | 0.00564 | 0.00569 | 0.00619 |
$({\rho _{1}},{\rho _{2}})=(0.15,-0.55)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | 0.03108 | 0.03492 | 0.02635 | 0.03334 | 0.04263 | 0.02561 |
100 | 0.01051 | 0.01124 | 0.00972 | 0.01123 | 0.01288 | 0.00985 |
150 | 0.00597 | 0.00613 | 0.00568 | 0.00620 | 0.00668 | 0.00569 |
$({\rho _{1}},{\rho _{2}})=(0.15,0.55)$ | ||||||
n | ${\widehat{\rho }_{1}}$ | ${\widehat{\rho }_{1}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{1}}$ | ${\widehat{\rho }_{2}}$ | ${\widehat{\rho }_{2}^{\mathrm{MLE}}}$ | ${\widetilde{\rho }_{2}}$ |
50 | 0.01943 | 0.01993 | 0.01848 | 0.02612 | 0.03126 | 0.02306 |
100 | 0.00814 | 0.00818 | 0.00795 | 0.00937 | 0.00993 | 0.00883 |
150 | 0.00511 | 0.00513 | 0.00503 | 0.00559 | 0.00577 | 0.00538 |
There is little difference between the estimated MSEs of ${\widehat{\rho }_{i}}$ and ${\widetilde{\rho }_{i}}$ $(i=1,2)$, which is certainly in accordance with the discussion in Section 3.
5 Proof of theorems
5.1 Proof of Theorem 3.1
The asymptotic covariance matrix is given by the inverse of the Fisher information matrix, and the components of that matrix are obtained as the expectations of the second derivatives of the log-likelihood function. The expectations of the components are given in Proposition A.4, and then the Fisher information matrix is
\[ {I_{n}}(\theta )=\left[\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{i_{\mu \mu }}& 0& 0& 0\\ {} 0& {i_{\sigma \sigma }}& 0& 0\\ {} 0& 0& {i_{{\rho _{1}}{\rho _{1}}}}& {i_{{\rho _{1}}{\rho _{2}}}}\\ {} 0& 0& {i_{{\rho _{1}}{\rho _{2}}}}& {i_{{\rho _{2}}{\rho _{2}}}}\end{array}\right],\]
where
\[\begin{array}{l}\displaystyle {i_{\mu \mu }}=-{E_{}}[{l_{\mu \mu }}]=\frac{1}{{\sigma ^{2}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}},\hspace{1em}{i_{\sigma \sigma }}=-{E_{}}[{l_{\sigma \sigma }}]=\frac{2n}{{\sigma ^{2}}},\\ {} \displaystyle {i_{{\rho _{1}}{\rho _{1}}}}=-{E_{}}[{l_{{\rho _{1}}{\rho _{1}}}}]={\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(k))^{2}},\\ {} \displaystyle {i_{{\rho _{2}}{\rho _{2}}}}\hspace{-0.1667em}=\hspace{-0.1667em}-{E_{}}[{l_{{\rho _{2}}{\rho _{2}}}}]\hspace{-0.1667em}=\hspace{-0.1667em}{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{({\varphi _{1}}(k))^{2}},\hspace{1em}{i_{{\rho _{1}}{\rho _{2}}}}\hspace{-0.1667em}=\hspace{-0.1667em}-{E_{}}[{l_{{\rho _{1}}{\rho _{2}}}}]={\sum \limits_{t=1}^{n}}{\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1).\end{array}\]
The functions ${\varphi _{1}}$ and ${d_{t}}$ are defined in (A.3) and (A.5), respectively. Thus, the inverse matrix is given by
(15)
\[ {I_{n}}{(\theta )^{-1}}=\left[\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{i^{\mu \mu }}& 0& 0& 0\\ {} 0& {i^{\sigma \sigma }}& 0& 0\\ {} 0& 0& {i^{{\rho _{1}}{\rho _{1}}}}& {i^{{\rho _{1}}{\rho _{2}}}}\\ {} 0& 0& {i^{{\rho _{1}}{\rho _{2}}}}& {i^{{\rho _{2}}{\rho _{2}}}}\end{array}\right],\]
where
\[\begin{array}{l}\displaystyle {i^{\mu \mu }}=\frac{1}{{i_{\mu \mu }}}=\frac{{\sigma ^{2}}}{{\textstyle\textstyle\sum _{t=1}^{n}}{d_{t-1}^{2}}},\hspace{1em}{i^{\sigma \sigma }}=\frac{1}{{i_{\sigma \sigma }}}=\frac{{\sigma ^{2}}}{2n},\\ {} \displaystyle {i^{{\rho _{1}}{\rho _{1}}}}={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}},\hspace{1em}{i^{{\rho _{1}}{\rho _{2}}}}=-{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}},\hspace{1em}{i^{{\rho _{2}}{\rho _{2}}}}={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}},\\ {} \displaystyle M={i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}-{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}.\end{array}\]
We shall compute the limiting values of ${i^{\mu \mu }}$, ${i^{\sigma \sigma }}$, ${i^{{\rho _{1}}{\rho _{1}}}}$, ${i^{{\rho _{1}}{\rho _{2}}}}$, and ${i^{{\rho _{2}}{\rho _{2}}}}$. Using the different expression (B.4) for ${\varphi _{1}}$, we have
(16)
\[ {d_{t-1}}={\sum \limits_{k=1}^{t}}{\varphi _{1}}(k)=\frac{1}{1+{\rho _{1}}+{\rho _{2}}}+\frac{{\rho _{2}}{\varphi _{1}}(t)-{\varphi _{1}}(t+1)}{1+{\rho _{1}}+{\rho _{2}}}.\]
We consider the summation of ${d_{t-1}^{2}}$ from $t=1$ to n to get ${i^{\mu \mu }}$. The second term in ${d_{t-1}}$ does not contribute to the leading term of the summation as $n\to \infty $. Therefore,
\[ {\sum \limits_{t=1}^{n}}{d_{t-1}^{2}}=\frac{n}{{(1+{\rho _{1}}+{\rho _{2}})^{2}}}+o(n),\hspace{1em}\text{and hence}\hspace{1em}{i^{\mu \mu }}=\frac{{\sigma ^{2}}{(1+{\rho _{1}}+{\rho _{2}})^{2}}}{n}+o({n^{-1}}).\]
Similarly, we have
\[ {\sum \limits_{k=1}^{t-1}}{({\varphi _{1}}(k))^{2}}=\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}^{2}}-{\lambda _{1}^{2t}}}{1-{\lambda _{1}^{2}}}-2\frac{{\rho _{2}}-{\rho _{2}^{t}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}^{2}}-{\lambda _{2}^{2t}}}{1-{\lambda _{2}^{2}}}\right)\]
and
\[ {\sum \limits_{k=2}^{t-1}}{\varphi _{1}}(k){\varphi _{1}}(k-1)=\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}^{2}}-{\lambda _{1}^{2t-1}}}{1-{\lambda _{1}^{2}}}+{\rho _{1}}\frac{1-{\rho _{2}^{t}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}^{2}}-{\lambda _{2}^{2t-1}}}{1-{\lambda _{2}^{2}}}\right),\]
and then
\[\begin{aligned}{}{i_{{\rho _{1}}{\rho _{1}}}}& ={i_{{\rho _{2}}{\rho _{2}}}}=\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}^{2}}}{1-{\lambda _{1}^{2}}}-2\frac{{\rho _{2}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}^{2}}}{1-{\lambda _{2}^{2}}}\right)n+o(n)\\ {} & =\frac{1+{\rho _{2}}}{(1-{\rho _{1}^{2}}+2{\rho _{2}}+{\rho _{2}^{2}})(1-{\rho _{2}})}n+o(n)\end{aligned}\]
and
\[\begin{aligned}{}{i_{{\rho _{1}}{\rho _{2}}}}& =\frac{1}{{\Delta ^{2}}}\left(\frac{{\lambda _{1}}}{1-{\lambda _{1}^{2}}}+\frac{{\rho _{1}}}{1-{\rho _{2}}}+\frac{{\lambda _{2}}}{1-{\lambda _{2}^{2}}}\right)n+o(n)\\ {} & =-\frac{{\rho _{1}}}{(1-{\rho _{1}^{2}}+2{\rho _{2}}+{\rho _{2}^{2}})(1-{\rho _{2}})}n+o(n).\end{aligned}\]
Thus,
\[ M={i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}-{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}=\frac{1}{(1-{\rho _{1}^{2}}+2{\rho _{2}}+{\rho _{2}^{2}}){(1-{\rho _{2}})^{2}}}{n^{2}}+o({n^{2}}).\]
Therefore, we obtain
(17)
\[ {i^{{\rho _{1}}{\rho _{1}}}}={i^{{\rho _{2}}{\rho _{2}}}}=\frac{1-{\rho _{2}^{2}}}{n}+o({n^{-1}})\]
and
(18)
\[ {i^{{\rho _{1}}{\rho _{2}}}}=\frac{{\rho _{1}}(1-{\rho _{2}})}{n}+o({n^{-1}}),\]
which, together with the expressions for ${i^{\mu \mu }}$ and ${i^{\sigma \sigma }}$ above, yields Theorem 3.1.
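These limits can be checked numerically. A minimal sketch, assuming the explicit form ${\varphi _{1}}(k)=({\lambda _{1}^{k}}-{\lambda _{2}^{k}})/({\lambda _{1}}-{\lambda _{2}})$, which is consistent with the series expansions above but is our reconstruction of (A.3) since the Appendix is not reproduced here:

```python
import numpy as np

rho1, rho2, n = 0.25, -0.25, 2000
lam1, lam2 = np.roots([1.0, rho1, rho2])     # lam1 + lam2 = -rho1, lam1 * lam2 = rho2

def phi1(k):
    return (lam1**k - lam2**k) / (lam1 - lam2)

p = phi1(np.arange(1, n))                    # phi1(1), ..., phi1(n-1)
i11 = np.sum(np.cumsum(p**2))                # sum_t sum_{k=1}^{t-1} phi1(k)^2
i22 = np.sum(np.cumsum(p**2)[:-1])           # sum_t sum_{k=1}^{t-2} phi1(k)^2
i12 = np.sum(np.cumsum(p[1:] * p[:-1]))      # sum_t sum_{k=2}^{t-1} phi1(k) phi1(k-1)
M = i11 * i22 - i12**2

print(n * i22 / M, 1 - rho2**2)              # i^{rho1 rho1} * n  ~  1 - rho2^2
print(-n * i12 / M, rho1 * (1 - rho2))       # i^{rho1 rho2} * n  ~  rho1 (1 - rho2)
```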
5.2 Proof of Theorem 3.2
Eq. (9) is trivial by Lemma 2.3 and Proposition A.6. We show (10) using Lemma A.7. The lemma can be reduced to
(19)
\[ {E_{}}[{\widehat{\theta }_{r}}-{\theta _{r}}]=\frac{1}{2}{i^{{\theta _{r}}{\theta _{r}}}}{\sum \limits_{t=1}^{d}}{\sum \limits_{u=1}^{d}}{i^{{\theta _{t}}{\theta _{u}}}}({\nu _{{\theta _{r}}{\theta _{t}}{\theta _{u}}}}+2{\nu _{{\theta _{r}}{\theta _{t}},{\theta _{u}}}})+O({n^{-3/2}})\]
if ${i^{{\theta _{r}}{\theta _{s}}}}=0$ for all $s\ne r$. We see ${i^{\sigma \mu }}={i^{\sigma {\rho _{1}}}}={i^{\sigma {\rho _{2}}}}=0$ from (15), and then we can apply (19) to the bias of $\hat{\sigma }$. The components in the sum (19) are
(20)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{\mu \mu }}({\nu _{\sigma \mu \mu }}+2{\nu _{\sigma \mu ,\mu }})=\frac{{\sigma ^{2}}}{{\textstyle\textstyle\sum _{t=1}^{n}}{d_{t-1}^{2}}}\left(-\frac{2}{{\sigma ^{3}}}{\sum \limits_{t=1}^{n}}{d_{t-1}^{2}}\right)=-\frac{2}{\sigma },\end{array}\]
(21)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{\sigma \sigma }}({\nu _{\sigma \sigma \sigma }}+2{\nu _{\sigma \sigma ,\sigma }})=\frac{{\sigma ^{2}}}{2n}\left(-\frac{2n}{{\sigma ^{3}}}\right)=-\frac{1}{\sigma },\end{array}\]
(22)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{{\rho _{1}}{\rho _{1}}}}({\nu _{\sigma {\rho _{1}}{\rho _{1}}}}+2{\nu _{\sigma {\rho _{1}},{\rho _{1}}}})={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}}\left(-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{1}}}}\right)=-\frac{2{i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}}{M\sigma },\end{array}\]
(23)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{{\rho _{2}}{\rho _{2}}}}({\nu _{\sigma {\rho _{2}}{\rho _{2}}}}+2{\nu _{\sigma {\rho _{2}},{\rho _{2}}}})={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}}\left(-\frac{2}{\sigma }{i_{{\rho _{2}}{\rho _{2}}}}\right)=-\frac{2{i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}}{M\sigma },\end{array}\]
(24)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{{\rho _{1}}{\rho _{2}}}}({\nu _{\sigma {\rho _{1}}{\rho _{2}}}}+2{\nu _{\sigma {\rho _{1}},{\rho _{2}}}})+{i^{{\rho _{2}}{\rho _{1}}}}({\nu _{\sigma {\rho _{2}}{\rho _{1}}}}+2{\nu _{\sigma {\rho _{2}},{\rho _{1}}}})\\ {} & \displaystyle =& \displaystyle 2{i^{{\rho _{1}}{\rho _{2}}}}({\nu _{\sigma {\rho _{1}}{\rho _{2}}}}+{\nu _{\sigma {\rho _{1}},{\rho _{2}}}}+{\nu _{\sigma {\rho _{2}},{\rho _{1}}}})\\ {} & \displaystyle =& \displaystyle -2{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}}\left(-\frac{2}{\sigma }{i_{{\rho _{1}}{\rho _{2}}}}\right)=\frac{4{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}}{M\sigma }.\end{array}\]
The summation of (22), (23), and (24) is
\[ -\frac{4{i_{{\rho _{1}}{\rho _{1}}}}{i_{{\rho _{2}}{\rho _{2}}}}}{M\sigma }+\frac{4{({i_{{\rho _{1}}{\rho _{2}}}})^{2}}}{M\sigma }=-\frac{4}{\sigma }.\]
By adding (20) and (21) to the above, we obtain (10).

Next, we show (11) and (12) using Proposition A.8. The 10 components of the equation in Lemma 2.3 that are used for the calculation of the biases of ${\rho _{1}}$ and ${\rho _{2}}$ are given by merely substituting the results of Proposition A.8:
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle {i^{\mu \mu }}({\nu _{{\rho _{1}}\mu \mu }}+2{\nu _{{\rho _{1}}\mu ,\mu }})=\frac{-2{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k)}{{\sum \limits_{s=1}^{n}}{d_{s-1}^{2}}},\\ {} & & \displaystyle {i^{\sigma \sigma }}({\nu _{{\rho _{1}}\sigma \sigma }}+2{\nu _{{\rho _{1}}\sigma ,\sigma }})=0,\\ {} & & \displaystyle {i^{{\rho _{1}}{\rho _{1}}}}({\nu _{{\rho _{1}}{\rho _{1}}{\rho _{1}}}}+2{\nu _{{\rho _{1}}{\rho _{1}},{\rho _{1}}}})={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{1,t}}+2{T_{0,0,t}}),\\ {} & & \displaystyle 2{i^{{\rho _{1}}{\rho _{2}}}}({\nu _{{\rho _{1}}{\rho _{1}}{\rho _{2}}}}+{\nu _{{\rho _{1}}{\rho _{1}},{\rho _{2}}}}+{\nu _{{\rho _{1}}{\rho _{2}},{\rho _{1}}}})=-2{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{2,t}}+{T_{0,1,t}}+{T_{1,0,t}}),\\ {} & & \displaystyle {i^{{\rho _{2}}{\rho _{2}}}}({\nu _{{\rho _{1}}{\rho _{2}}{\rho _{2}}}}+2{\nu _{{\rho _{1}}{\rho _{2}},{\rho _{2}}}})={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}}{\sum \limits_{t=1}^{n}}({S_{3,t}}+2{T_{1,1,t}}),\end{array}\]
and
\[\begin{aligned}{}& {i^{\mu \mu }}({\nu _{{\rho _{2}}\mu \mu }}+2{\nu _{{\rho _{2}}\mu ,\mu }})=\frac{-2{\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k)}{{\sum \limits_{s=1}^{n}}{d_{s-1}^{2}}},\\ {} & {i^{\sigma \sigma }}({\nu _{{\rho _{2}}\sigma \sigma }}+2{\nu _{{\rho _{2}}\sigma ,\sigma }})=0,\\ {} & {i^{{\rho _{1}}{\rho _{1}}}}({\nu _{{\rho _{2}}{\rho _{1}}{\rho _{1}}}}+2{\nu _{{\rho _{2}}{\rho _{1}},{\rho _{1}}}})={M^{-1}}{i_{{\rho _{2}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{0,t-1}}+2{T_{1,0,t}}),\\ {} & 2{i^{{\rho _{1}}{\rho _{2}}}}({\nu _{{\rho _{1}}{\rho _{2}}{\rho _{2}}}}+{\nu _{{\rho _{1}}{\rho _{2}},{\rho _{2}}}}+{\nu _{{\rho _{2}}{\rho _{2}},{\rho _{1}}}})\hspace{-0.1667em}=\hspace{-0.1667em}-2{M^{-1}}{i_{{\rho _{1}}{\rho _{2}}}}{\sum \limits_{t=1}^{n}}({S_{1,t-1}}+{T_{1,1,t}}+{T_{0,0,t-1}}),\\ {} & {i^{{\rho _{2}}{\rho _{2}}}}({\nu _{{\rho _{2}}{\rho _{2}}{\rho _{2}}}}+2{\nu _{{\rho _{2}}{\rho _{2}},{\rho _{2}}}})={M^{-1}}{i_{{\rho _{1}}{\rho _{1}}}}{\sum \limits_{t=1}^{n}}({S_{2,t-1}}+2{T_{0,1,t}}),\end{aligned}\]
where ${S_{p,q}}$ and ${T_{p,q,t}}$ are defined in (A.26) and (A.27), respectively. Now, we want to know the limiting values of the right-hand side of the above equations and to evaluate the sum of the first 5 equations and the remaining 5 equations. Let U and V be the sum of the first 5 equations and that of the remaining 5 equations, respectively. We see that
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-1}}{d_{t-1}}{d_{t-k-1}}{\varphi _{1}}(k)& \displaystyle =& \displaystyle \frac{1}{{({\rho _{1}}+{\rho _{2}}+1)^{3}}}n+o(n),\\ {} \displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t-2}}{d_{t-1}}{d_{t-k-2}}{\varphi _{1}}(k)& \displaystyle =& \displaystyle \frac{1}{{({\rho _{1}}+{\rho _{2}}+1)^{3}}}n+o(n)\end{array}\]
by (16). As regards the other summations, the quantities involved are built from ${S_{p,q}}$ and ${T_{p,q,t}}$, whose components are given by (B.4) and (B.5) and form geometric-type series. We consider the summations of ${S_{p,q}}$ and ${T_{p,q,t}}$ with respect to t from 1 to n as $n\to \infty $. Since the components are geometric-type series, the following limiting evaluations are useful:
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t}}{\alpha ^{k}}=\frac{\alpha }{1-\alpha }n+o(n),& & \displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t}}(k+1){\alpha ^{k+1}}=\frac{{\alpha ^{2}}(2-\alpha )}{{(1-\alpha )^{2}}}n+o(n),\\ {} \displaystyle {\sum \limits_{t=1}^{n}}{\sum \limits_{k=1}^{t}}{\sum \limits_{m=1}^{t-k}}{\alpha ^{k}}{\beta ^{m}}& \displaystyle =& \displaystyle \frac{\alpha \beta }{(1-\alpha )(1-\beta )}n+o(n)\end{array}\]
for any $|\alpha |<1$ and $|\beta |<1$. Then, the summations of ${S_{p,t}}$ and ${T_{p,q,t}}$ with respect to t are
\[\begin{aligned}{}{\sum \limits_{t=1}^{n}}{S_{1,t}}& =\frac{2{\rho _{1}}({\rho _{2}}+1)}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{2,t}}& =\frac{2\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{3,t}}& =\frac{2{\rho _{1}}\{-{\rho _{1}^{2}}+2{\rho _{2}}({\rho _{2}}+1)\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{0,t-1}}& =\frac{2\{1+{\rho _{2}}(2-{\rho _{1}^{2}}+{\rho _{2}})\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{1,t-1}}& =\frac{2{\rho _{1}}({\rho _{2}}+1)}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{S_{2,t-1}}& =\frac{2\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{0,0,t}}& =\frac{-2{\rho _{1}}({\rho _{2}}+1)}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{0,1,t}}& =\frac{-2\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{1,0,t}}& =\frac{{\rho _{1}^{2}}+{({\rho _{2}}+1)^{2}}}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{1,1,t}}& =\frac{{\rho _{1}}(1+{\rho _{1}^{2}}-2{\rho _{2}}-3{\rho _{2}^{2}})}{{({\rho _{2}}-1)^{2}}{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n),\\ {} {\sum \limits_{t=1}^{n}}{T_{0,0,t-1}}& =\frac{-2{\rho _{1}}({\rho _{2}}+1)}{({\rho _{2}}-1){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}n+o(n).\end{aligned}\]
Now, we are ready to use Lemma 2.3. Using the above evaluations, we have
\[\begin{aligned}{}U& =\frac{-2}{{\rho _{1}}+{\rho _{2}}+1}+\frac{4{\rho _{1}}{({\rho _{2}}+1)^{2}}-2{\rho _{1}}\{{\rho _{1}^{2}}+{({\rho _{2}}+1)^{2}}\}}{{\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}+o(1),\\ {} V& =\frac{-2}{{\rho _{1}}+{\rho _{2}}+1}+\frac{4{\rho _{1}}({\rho _{2}}+1)\{{\rho _{1}^{2}}-{\rho _{2}}{({\rho _{2}}+1)^{2}}\}+2{\rho _{1}^{2}}(1+{\rho _{1}^{2}}-2{\rho _{2}}-3{\rho _{2}^{2}})}{(1-{\rho _{2}}){\{{\rho _{1}^{2}}-{({\rho _{2}}+1)^{2}}\}^{2}}}\\ {} & \hspace{1em}+o(1).\end{aligned}\]
By (17) and (18), the biases of the unknown parameters ${\widehat{\rho }_{1}}$ and ${\widehat{\rho }_{2}}$ are
\[ {E_{}}[{\widehat{\rho }_{1}}-{\rho _{1}}]=\frac{1}{2}{i^{{\rho _{1}}{\rho _{1}}}}U+\frac{1}{2}{i^{{\rho _{1}}{\rho _{2}}}}V=\frac{{\rho _{1}}+{\rho _{2}}-1}{n}+o({n^{-1}})\]
and
\[ {E_{}}[{\widehat{\rho }_{2}}-{\rho _{2}}]=\frac{1}{2}{i^{{\rho _{1}}{\rho _{2}}}}U+\frac{1}{2}{i^{{\rho _{2}}{\rho _{2}}}}V=\frac{3{\rho _{2}}-1}{n}+o({n^{-1}}).\]
Therefore, we obtain (11) and (12).
6 Practical examples
We provide a practical example using quarterly U.S. GNP from $1947(1)$ to $2002(3)$, $n=223$ observations. The original data, provided by the Federal Reserve Bank of St. Louis [16], are introduced, after adjustment, as a good example for MA(2) in [13] once they are transformed to the GNP growth rate
(25)
\[ {Y_{t}}=\log {X_{t}}-\log {X_{t-1}},\]
where ${X_{t}}$ is the U.S. GNP. The adjusted data, which differ from the original series, can be obtained from the web site of the author of the book.
We do not know the true values of the unknown parameters of the MA(2) model, but we assume that the GNP rate follows
(26)
\[ {Y_{t}}=0.0083+{\varepsilon _{t}}+0.3028\hspace{0.1667em}{\varepsilon _{t-1}}+0.2036\hspace{0.1667em}{\varepsilon _{t-2}},\hspace{1em}{\varepsilon _{t}}\stackrel{\mathrm{i}.\mathrm{i}.\mathrm{d}.}{\sim }N(0,{0.0094^{2}}),\]
where the coefficients are calculated by the (full) maximum likelihood estimation using $n=223-1=222$ observations (one observation is lost by taking the difference in (25)). We then confirm whether the bias of the estimates based on the last 20 samples is reduced by our method.
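A sketch of the workflow of this section, assuming the adjusted series has been saved locally; the file name, the use of statsmodels for the full maximum likelihood fits, and the QMLE values plugged in below (copied from Table 9) are our assumptions, not part of the original analysis:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA   # exact (full) Gaussian MLE

gnp = np.loadtxt("gnp_adjusted.txt")    # hypothetical file with the 223 quarterly GNP values
y = np.diff(np.log(gnp))                # GNP growth rate as in (25); leaves n = 222 points

full_fit = ARIMA(y, order=(0, 0, 2), trend="c").fit()        # reference ("TRUE") fit, n = 222
last_fit = ARIMA(y[-20:], order=(0, 0, 2), trend="c").fit()  # full MLE on the last 20 points
print(full_fit.params)                  # constant, MA(1) and MA(2) coefficients, variance

# The QMLE column is obtained by maximizing the conditional likelihood (5) on y[-20:]
# (e.g. with the SciPy sketch in Section 4); the corrected QMLE applies (13) with n = 20.
n = 20
sigma_hat, rho1_hat, rho2_hat = 0.0060, 0.1564, 0.1321       # QMLE row of Table 9
sigma_t = sigma_hat * (1.0 + 7.0 / (4.0 * n))
rho1_t = rho1_hat - (rho1_hat + rho2_hat - 1.0) / n
rho2_t = rho2_hat - (3.0 * rho2_hat - 1.0) / n
print(sigma_t, rho1_t, rho2_t)          # compare with the corrected QMLE row of Table 9
```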
Table 9.
MLE and conditional MLE with a correction for U.S. GNP using MA$(2)$
μ | σ | ${\rho _{1}}$ | ${\rho _{2}}$ | |
TRUE ($n=222$) | 0.0083 | 0.0094 | 0.3028 | 0.2036 |
MLE ($n=20$) | 0.0071 | 0.0060 | 0.1446 | 0.1370 |
QMLE ($n=20$) | 0.0072 | 0.0060 | 0.1564 | 0.1321 |
corrected MLE | – | 0.0065 | 0.1824 | 0.1680 |
corrected QMLE | – | 0.0065 | 0.1939 | 0.1639 |
Bias for MLE | −0.0012 | −0.0035 | −0.1582 | −0.0666 |
Bias for QMLE | −0.0012 | −0.0035 | −0.1464 | −0.0714 |
Bias for corrected MLE | – | −0.0029 | −0.1204 | −0.0356 |
Bias for corrected QMLE | – | −0.0029 | −0.1089 | −0.0397 |
For an unknown parameter $\theta \hspace{2.5pt}(\theta =\mu ,\sigma ,{\rho _{1}},{\rho _{2}})$, MLE and QMLE in Table 9 correspond to the maximum likelihood estimate ${\widehat{\theta }^{\text{MLE}}}$ and the conditional maximum likelihood estimate (quasi-maximum likelihood estimate) $\widehat{\theta }$ using $n=20$ observations. The corrected MLE and corrected QMLE correspond to the result by Tanaka (1984) [15] and to $\tilde{\theta }$ defined in (13), respectively. The dashes (–) in the rows for μ indicate that no correction is required for μ. The last four rows of the table give the biases for MLE, QMLE, corrected MLE, and corrected QMLE, measured against the model (26). We note that the corrected MLE is expected to perform best when $n=20$ observations are used, because it is based on the full maximum likelihood estimation. Both corrections work well; namely, the biases with respect to the true model (26) become smaller than those of the estimators without the corrections.