Modern Stochastics: Theory and Applications

This article has corrections.
Asymptotic normality of corrected estimator in Cox proportional hazards model with measurement error
Volume 1, Issue 1 (2014), pp. 13–32
C. Chimisov, A. Kukush

https://doi.org/10.15559/vmsta-2014.1.1.3
Pub. online: 27 June 2014      Type: Research Article      Open Access

Received
16 March 2014
Revised
29 May 2014
Accepted
5 June 2014
Published
27 June 2014

Abstract

The Cox proportional hazards model is considered. In Kukush et al. (2011), Journal of Statistical Research, Vol. 45, No. 2, 77–94, simultaneous estimators $\lambda _{n}(\cdot )$ and $\beta _{n}$ of the baseline hazard rate $\lambda (\cdot )$ and the regression parameter β are studied. The estimators maximize an objective function that corrects the log-likelihood for measurement errors and censoring. The parameter sets for $\lambda (\cdot )$ and β are convex compact sets in $C[0,\tau ]$ and ${\mathbb{R}}^{k}$, respectively. In the present paper the asymptotic normality of $\beta _{n}$ and of linear functionals of $\lambda _{n}(\cdot )$ is established. The results remain valid for the model without measurement errors. A way to compute the estimators, based on the fact that $\lambda _{n}(\cdot )$ is a linear spline, is discussed.

1 Introduction

We deal with the Cox proportional hazards model, in which a lifetime $T\ge 0$ has the following intensity function:
(1.1)
\[\lambda (t|X;\lambda ,\beta )=\lambda (t)\exp \big({\beta }^{\mathsf{T}}X\big),\hspace{1em}t\ge 0.\]
Here we say that a positive random variable ξ has intensity function $\tilde{\lambda }(\cdot )$ if
\[\tilde{\lambda }(t)=\underset{h\to 0_{+}}{\lim }{h}^{-1}\mathbf{P}\{t\le \xi <t+h|\hspace{2.5pt}\xi \ge t\},\hspace{1em}t\ge 0.\]
In (1.1) the covariate X is a random vector distributed in ${\mathbb{R}}^{k}$, $\lambda (\cdot )\in \varTheta _{\lambda }\subset C[0,\tau ]$ is the baseline hazard function, and β is a parameter from $\varTheta _{\beta }\subset {\mathbb{R}}^{k}$. We observe only the censored value $Y:=\min \{T,C\}$, where the censor C is distributed in $[0,\tau ]$. The survival function of C, $G_{C}(u)=1-F_{C}(u)$, is unknown, but τ is known. The censorship indicator $\varDelta :=\mathbb{I}_{\{T\le C\}}$ is observed as well. X is not observed directly; instead, a surrogate $W=X+U$ is observed, where U has a known and finite moment generating function $M_{U}(\beta ):=\mathbf{E}{e}^{{\beta }^{\mathsf{T}}U}$. Here E stands for expectation. The couple $(T,X)$, the censor C, and the measurement error U are stochastically independent. We mention that measurement error models have recently become quite popular; e.g., in [9] an autoregressive model with measurement error was studied.
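For intuition, the observation scheme above can be simulated. The sketch below assumes a constant baseline hazard, standard normal covariates, uniform censoring on $[0,\tau ]$ and Gaussian errors U; none of these choices is prescribed by the paper — they are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, beta0, lam0=1.0, tau=1.0, sigma_u=0.3):
    """Draw (Y, Delta, W) from the Cox model with additive measurement
    error.  The constant baseline hazard lam0, Gaussian covariates and
    Gaussian errors U are illustrative assumptions only."""
    k = len(beta0)
    X = rng.normal(size=(n, k))                     # unobserved covariates
    rate = lam0 * np.exp(X @ beta0)                 # hazard of T given X
    T = rng.exponential(1.0 / rate)                 # lifetimes
    C = rng.uniform(0.0, tau, size=n)               # censors on [0, tau]
    Y = np.minimum(T, C)                            # observed censored value
    Delta = (T <= C).astype(int)                    # censorship indicator
    W = X + rng.normal(scale=sigma_u, size=(n, k))  # surrogate W = X + U
    return Y, Delta, W

Y, Delta, W = simulate(1000, beta0=np.array([0.5, -0.2]))
```

Only the triple $(Y_{i},\varDelta _{i},W_{i})$ returned here is available to the estimation procedure; X, T, C, U stay hidden, as in the model.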
Consider independent copies of the model $(X_{i},T_{i},C_{i},Y_{i},\varDelta _{i})$, $i=1,\dots ,n$. Based on $(Y_{i},\varDelta _{i},W_{i})$, $i=1,\dots ,n$, we estimate the true values of β and $\lambda (\cdot )$, which we denote by $\beta _{0}$ and $\lambda _{0}(\cdot )$, respectively. The latter is estimated on $[0,\tau ]$ only.
Many papers deal with the estimation of $\beta _{0}$ and of the cumulative hazard $\varLambda (t)={\int _{0}^{t}}\lambda (s)\hspace{0.1667em}\mathrm{d}s$. In [1] general ideas based on partial likelihood are presented. The same model with measurement errors is considered in [4], where, based on the Corrected Score method, consistent and asymptotically normal estimators of the regression parameter and of the cumulative hazard function are constructed. Another approach is proposed in [6], where doubly censored data are considered without measurement error; there the cumulative hazard is estimated, and strong consistency and asymptotic normality of maximum likelihood estimators are proven. However, sometimes it is necessary to know the behaviour of the baseline hazard function $\lambda (\cdot )$ itself, not of the cumulative hazard (see [10]). Our model is presented in [2] and [5], where the baseline hazard function is assumed to belong to a parametric space, while we consider $\lambda (\cdot )$ from a compact set in $C[0,\tau ]$.
If the values of $X_{i}$ were observed without measurement error, we could use the Maximum Likelihood Estimator (MLE), which maximizes the log-likelihood function
\[\tilde{Q}_{n}(\lambda ,\beta ):=\frac{1}{n}\sum \limits_{i=1}^{n}\tilde{q}(Y_{i},\varDelta _{i},X_{i};\lambda ,\beta ),\]
where
\[\tilde{q}(Y,\varDelta ,X;\lambda ,\beta )=\varDelta \big(\log \lambda (Y)+{\beta }^{\mathsf{T}}X\big)-{e}^{{\beta }^{\mathsf{T}}X}{\int _{0}^{Y}}\hspace{-0.1667em}\lambda (u)\hspace{0.1667em}\mathrm{d}u.\]
Since $X_{i}$ is contaminated, we have to correct the objective function for measurement error. Following the suggestion of Augustin [2], we construct a new objective function q such that
\[\mathbf{E}\big[q(Y_{i},\varDelta _{i},W_{i};\lambda ,\beta )|\hspace{2.5pt}Y_{i},\varDelta _{i},X_{i}\big]=\tilde{q}(Y_{i},\varDelta _{i},X_{i};\lambda ,\beta )\hspace{1em}\text{a.s.}\]
Then the corrected log-likelihood function is
(1.2)
\[Q_{n}(\lambda ,\beta ):=\frac{1}{n}\sum \limits_{i=1}^{n}q(Y_{i},\varDelta _{i},W_{i};\lambda ,\beta ),\]
where
(1.3)
\[q(Y,\varDelta ,W;\lambda ,\beta )=\varDelta \big(\log \lambda (Y)+{\beta }^{\mathsf{T}}W\big)-\frac{{e}^{{\beta }^{\mathsf{T}}W}}{M_{U}(\beta )}{\int _{0}^{Y}}\hspace{-0.1667em}\lambda (u)\hspace{0.1667em}\mathrm{d}u.\]
As an estimator of the true parameters $(\lambda _{0},\beta _{0})$, we use a couple $(\lambda _{n},\beta _{n})$ that maximizes (1.2).
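As a sketch, the corrected objective (1.2) with summands (1.3) can be evaluated numerically once a distribution of U is fixed. Below U is assumed Gaussian, $U\sim N(0,\Sigma _{U})$, so that $M_{U}(\beta )=\exp ({\beta }^{\mathsf{T}}\Sigma _{U}\beta /2)$ — an illustrative assumption, not part of the model.

```python
import numpy as np

def Q_n(lam, beta, Y, Delta, W, Sigma_U):
    """Corrected log-likelihood (1.2) with q from (1.3), assuming Gaussian
    errors U ~ N(0, Sigma_U), so that M_U(beta) = exp(beta' Sigma_U beta / 2).
    lam must be a vectorized callable baseline hazard."""
    beta = np.asarray(beta, dtype=float)
    M_U = np.exp(0.5 * beta @ Sigma_U @ beta)
    total = 0.0
    for y, d, w in zip(Y, Delta, W):
        u = np.linspace(0.0, y, 201)
        v = lam(u)
        # trapezoid rule for the integral of lam over [0, y]
        Lam_y = float(np.sum((v[1:] + v[:-1]) * np.diff(u)) / 2.0)
        total += d * (np.log(v[-1]) + beta @ w) \
                 - np.exp(beta @ w) / M_U * Lam_y
    return total / len(Y)

lam = lambda u: 1.0 + 0.0 * u   # constant baseline hazard, vectorized
val = Q_n(lam, [0.0], [0.5, 1.0], [1, 0],
          [np.array([0.0]), np.array([0.0])], np.array([[0.04]]))
```

For this toy input ($\beta =0$, constant hazard 1) the trapezoid rule is exact and the value equals the negative mean of Y, i.e. $-0.75$.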
Introduce further assumptions.
  • (i) $\varTheta _{\lambda }=\{f:[0,\tau ]\to \mathbb{R}|\hspace{0.1667em}f(t)\ge a,\hspace{2.5pt}\forall t\in [0,\tau ],\hspace{2.5pt}f(0)\le A,\hspace{2.5pt}|\hspace{0.1667em}f(t)-f(s)|\le L|t-s|,\hspace{2.5pt}\forall t,s\in [0,\tau ]\}$, where $a>0,\hspace{2.5pt}A>a$ and $L>0$ are fixed constants.
  • (ii) $\varTheta _{\beta }$ is a compact and convex set in ${\mathbb{R}}^{k}$.
  • (iii) $\mathbf{E}U=0$ and for some $\varepsilon >0$,
    \[\mathbf{E}\big[{e}^{2D\| U\| }\big]<\infty \hspace{1em}\text{where}\hspace{2.5pt}D:=\underset{\beta \in \varTheta _{\beta }}{\max }\| \beta \| +\varepsilon .\]
  • (iv) $\mathbf{E}[{e}^{2D\| X\| }]<\infty $ where $D>0$ is defined in (iii).
  • (v) τ is the right endpoint of the distribution of C, i.e., $\mathbf{P}\{C>\tau \}=0$ and for all $\varepsilon >0$, $\mathbf{P}\{C>\tau -\varepsilon \}>0$.
  • (vi) The covariance matrix of random vector X is positive definite.
  • (vii) $\beta _{0}$ is an interior point of $\varTheta _{\beta }$.
  • (viii) $\lambda _{0}\in {\varTheta _{\lambda }^{\varepsilon }}$ for some $\varepsilon >0$, where ${\varTheta _{\lambda }^{\varepsilon }}:=\{f:[0,\tau ]\to \mathbb{R}|\hspace{0.1667em}f(t)\ge a+\varepsilon ,\hspace{2.5pt}\forall t\in [0,\tau ],\hspace{2.5pt}f(0)\le A-\varepsilon ,\hspace{2.5pt}|\hspace{0.1667em}f(t)-f(s)|\le (L-\varepsilon )|t-s|,\hspace{2.5pt}\forall t,s\in [0,\tau ]\}$.
  • (ix) $\mathbf{P}\{C>0\}=1$.
Remark.
Assumptions (i) to (ix) allow us to consider the model without measurement error: one just has to set $U_{i}=0$ and $M_{U}(\beta )=1$. All results of the article remain valid in this case.
In [7] the strong consistency of $(\lambda _{n},\beta _{n})$ is proven, and the rate of convergence is presented. Our goal is to establish the asymptotic normality of $\beta _{n}$ and $\lambda _{n}$. The paper is organised as follows. Section 2 states the main results on the asymptotic normality. Section 3 suggests a procedure for computing the estimates. Section 4 proves the stochastic boundedness results. Section 5 proves auxiliary results, Section 6 gives the proof of the main result, and Section 7 concludes.
For a sequence of random variables $\{x_{n}\}$, the notation $x_{n}=O_{p}(1)$ means that $\{x_{n}\}$ is stochastically bounded. We assume that the censor C has a pdf $f_{C}$ (a technical assumption that can easily be avoided). According to [7], Section 3, the conditional density of $(Y,\varDelta )$ given X at the point $(\lambda _{0},\beta _{0})$ equals
(1.4)
\[f(y,\delta |X)={f_{T}^{\delta }}(y|X){G_{T}^{1-\delta }}(y|X){f_{C}^{1-\delta }}(y){G_{C}^{\delta }}(y),\]
where $f_{T}$ is the conditional pdf of T given X and $G_{T}$ is the conditional survival function:
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle f_{T}(t|X)=\lambda (t|X;\lambda _{0},\beta _{0})\exp \bigg(-{\int _{0}^{t}}\lambda (s|X;\lambda _{0},\beta _{0})\hspace{0.1667em}\mathrm{d}s\bigg),\\{} & \displaystyle G_{T}(t|X)=\exp \bigg(-{\int _{0}^{t}}\lambda (s|X;\lambda _{0},\beta _{0})\hspace{0.1667em}\mathrm{d}s\bigg).\end{array}\]
Let Z be a normed linear space. For a function $f:\hspace{2.5pt}Z\to \mathbb{R}$ we denote by ${f}^{(n)}(x_{0})$ its n-th Fréchet derivative at a point $x_{0}\in Z$. ${f}^{(n)}(x_{0})$ is an n-linear form, and for $h_{1},\dots ,h_{n}\in Z$ we denote by $\langle {f}^{(n)}(x_{0}),(h_{1},\dots ,h_{n})\rangle $ the action of ${f}^{(n)}(x_{0})$. If $h_{1}=\cdots =h_{n}$, we simply write $\langle {f}^{(n)}(x_{0}),{(h_{1})}^{n}\rangle $ where it does not cause ambiguity. If a functional F acts on a product space $Z_{1}\times Z_{2}$, then elements of this space are denoted by $(h_{1},h_{2})\in Z_{1}\times Z_{2}$, and $\langle F,(h_{1},h_{2})\rangle $ stands for the action of F on $(h_{1},h_{2})$. For $x,y\in Z$, the following set is called the interval connecting x and y:
\[[x,y]=\big\{\alpha x+(1-\alpha )y|\hspace{2.5pt}\alpha \in [0,1]\big\}.\]

2 Main result

We introduce some more notation. Let
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle a(u)=\mathbf{E}\big[X{e}^{{\beta _{0}^{\mathsf{T}}}X}G_{T}(u|X)\big],\hspace{2em}b(u)=\mathbf{E}\big[{e}^{{\beta _{0}^{\mathsf{T}}}X}G_{T}(u|X)\big],\\{} & \displaystyle p(u,X)=\exp \big({\beta _{0}^{\mathsf{T}}}X\big)G_{T}(u|X),\\{} & \displaystyle T(u)=\mathbf{E}\big[X{X}^{\mathsf{T}}p(u,X)\big]\mathbf{E}\big[p(u,X)\big]-\mathbf{E}\big[Xp(u,X)\big]\mathbf{E}\big[{X}^{\mathsf{T}}p(u,X)\big].\end{array}\]
Denote
\[A=\mathbf{E}\bigg[X{X}^{\mathsf{T}}\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\lambda _{0}(u)\hspace{0.1667em}\mathrm{d}u\bigg],\hspace{2em}M={\int _{0}^{\tau }}T(u)K(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u,\]
where $K(u)=\frac{\lambda _{0}(u)}{b(u)}$. Also introduce a sequence of random vectors
\[\xi _{n}:=\sum \limits_{i=1}^{n}\zeta _{i},\]
with i.i.d. summands
\[\zeta _{i}=-\frac{\varDelta _{i}a(Y_{i})}{b(Y_{i})}+\frac{\exp ({\beta _{0}^{\mathsf{T}}}W_{i})}{M_{U}(\beta _{0})}{\int _{0}^{Y_{i}}}a(u)K(u)\hspace{0.1667em}\mathrm{d}u+\frac{\partial q}{\partial \beta }(Y_{i},\varDelta _{i},W_{i},\beta _{0},\lambda _{0}).\]
Let $\varSigma _{\beta }=4\mathrm{Cov}(\zeta _{1})$, $m(\varphi _{\lambda })={\int _{0}^{\tau }}\hspace{-0.1667em}\varphi _{\lambda }(u)a(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u$, and ${\sigma _{\varphi }^{2}}=4\mathrm{Var}[\langle {q^{\prime }}(Y,\varDelta ,W,\lambda _{0},\beta _{0}),\varphi \rangle ]$ with $\varphi =(\varphi _{\lambda },\varphi _{\beta })\in C[0,\tau ]\times {\mathbb{R}}^{k}$.
Theorem 1.
Assume conditions (i) to (ix). Then M is invertible and
(2.1)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \sqrt{n}(\beta _{n}-\beta _{0})\stackrel{d}{\to }N_{k}\big(0,{M}^{-1}\varSigma _{\beta }{M}^{-1}\big).\end{array}\]
Moreover, for any Lipschitz continuous function f on $[0,\tau ]$,
(2.2)
\[\sqrt{n}{\int _{0}^{\tau }}\hspace{-0.1667em}(\lambda _{n}-\lambda _{0})(u)f(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\stackrel{d}{\to }N\big(0,{\sigma _{\varphi }^{2}}(f)\big)\]
where ${\sigma _{\varphi }^{2}}(f)={\sigma _{\varphi }^{2}}$ with $\varphi =(\varphi _{\lambda },\varphi _{\beta })$, $\varphi _{\beta }=-{A}^{-1}m(\varphi _{\lambda })$, and $\varphi _{\lambda }$ is the unique solution to the Fredholm integral equation
(2.3)
\[\frac{\varphi _{\lambda }(u)}{K(u)}-{a}^{\mathsf{T}}(u){A}^{-1}m(\varphi _{\lambda })=f(u).\]
Corollary 2.
Let $0<\varepsilon <\tau $. Assume that $\frac{1}{G_{C}}$ is Lipschitz continuous on $[0,\tau -\varepsilon ]$. Under conditions (i) to (ix), for any Lipschitz continuous function f on $[0,\tau ]$ with support in $[0,\tau -\varepsilon ]$,
(2.4)
\[\sqrt{n}{\int _{0}^{\tau -\varepsilon }}\hspace{-0.1667em}(\lambda _{n}-\lambda _{0})(u)f(u)\hspace{0.1667em}\mathrm{d}u\stackrel{d}{\to }N\big(0,{\sigma _{\varphi }^{2}}(f)\big)\]
where ${\sigma _{\varphi }^{2}}(f)={\sigma _{\varphi }^{2}}$ with $\varphi =(\varphi _{\lambda },\varphi _{\beta })$, $\varphi _{\beta }=-{A}^{-1}m(\varphi _{\lambda })$, and $\varphi _{\lambda }$ is the unique solution to the Fredholm integral equation
\[\frac{\varphi _{\lambda }(u)}{K(u)}-{a}^{\mathsf{T}}(u){A}^{-1}m(\varphi _{\lambda })=\frac{f(u)}{G_{C}(u)}.\]
Here, by definition, $\frac{f(\tau )}{G_{C}(\tau )}=0$.
Note that the corollary follows immediately from the theorem once f is replaced by $\frac{f}{G_{C}}$.
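Equation (2.3) has a degenerate (rank-k) kernel, so it reduces to a $k\times k$ linear system for the vector $m(\varphi _{\lambda })$: writing $\varphi _{\lambda }(u)=K(u)\,(f(u)+{a}^{\mathsf{T}}(u){A}^{-1}m)$ and substituting into the definition of m gives $(I-B{A}^{-1})m=c$ with $B={\int _{0}^{\tau }}Ka{a}^{\mathsf{T}}G_{C}\hspace{0.1667em}\mathrm{d}u$ and $c={\int _{0}^{\tau }}KfaG_{C}\hspace{0.1667em}\mathrm{d}u$. A numerical sketch of this reduction follows; all input functions are hypothetical stand-ins, not estimates from data.

```python
import numpy as np

def solve_fredholm(K, a, A, GC, f, tau, n=2001):
    """Solve phi(u)/K(u) - a(u)' A^{-1} m(phi) = f(u), where
    m(phi) = int_0^tau phi(u) a(u) GC(u) du.  The kernel is degenerate,
    so m solves a k x k linear system.  All inputs are illustrative
    vectorized callables; a(u) must return an array of shape (k, n)."""
    u = np.linspace(0.0, tau, n)
    du = np.diff(u)
    Ku, au, GCu, fu = K(u), a(u), GC(u), f(u)
    Ainv = np.linalg.inv(A)
    k = A.shape[0]

    def integ(g):                       # trapezoid rule along the last axis
        return np.sum((g[..., 1:] + g[..., :-1]) * du / 2.0, axis=-1)

    c = integ(Ku * fu * GCu * au)                            # shape (k,)
    B = integ(Ku * GCu * au[:, None, :] * au[None, :, :])    # shape (k, k)
    m = np.linalg.solve(np.eye(k) - B @ Ainv, c)
    phi = Ku * (fu + (Ainv @ m) @ au)                        # phi on the grid
    return u, phi

u_grid, phi = solve_fredholm(
    K=lambda u: np.ones_like(u),
    a=lambda u: np.ones((1, u.size)),
    A=np.array([[2.0]]),
    GC=lambda u: np.ones_like(u),
    f=lambda u: np.ones_like(u),
    tau=1.0,
)
```

For the constant toy input above the equation reads $\varphi -m/2=1$ with $m={\int _{0}^{1}}\varphi \hspace{0.1667em}\mathrm{d}u$, whose solution is $\varphi \equiv 2$.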

3 Computation of estimators

Since $\varTheta _{\lambda }$ is infinite-dimensional, computation of $(\lambda _{n},\beta _{n})$ is not a parametric problem in the general setting. We refer to the ideas of I.J. Schoenberg [11]. We will show that the maximum of (1.2) is attained on a linear spline with nodes located at the points $Y_{i}$, $i=1,\dots ,n$, and at some other points that can be calculated.
Let $i_{1},\dots ,i_{n}\in \{1,\dots ,n\}$ be a numbering such that $Y_{i_{1}}\le \cdots \le Y_{i_{n}}$, i.e., $(Y_{i_{1}},\dots ,Y_{i_{n}})$ is the variational series of $(Y_{1},\dots ,Y_{n})$. Alongside $(\lambda _{n},\beta _{n})$ we consider $(\overline{\lambda }_{n},\beta _{n})$, where $\overline{\lambda }_{n}$ is the following function. We set $\overline{\lambda }_{n}(Y_{i_{k}})=\lambda _{n}(Y_{i_{k}})$, $k=1,\dots ,n$. For each interval $[Y_{i_{k}},Y_{i_{k+1}}]$, $k=1,\dots ,n-1$, perform the following procedure. Draw the straight lines
(3.1)
\[{L_{i_{k}}^{1}}(t)=\lambda _{n}(Y_{i_{k}})+L(Y_{i_{k}}-t)\]
and
(3.2)
\[{L_{i_{k}}^{2}}(t)=\lambda _{n}(Y_{i_{k+1}})+L(t-Y_{i_{k+1}}),\]
where L is defined in (i).
Denote by $B_{i_{k}}$ the point of intersection of the lines ${L_{i_{k}}^{1}}(t)$ and ${L_{i_{k}}^{2}}(t)$, and set $B_{i_{0}}:=0$, $B_{i_{n}}:=\tau $, $Y_{i_{0}}:=0$, $Y_{i_{n+1}}:=\tau $. We set
(3.3)
\[\overline{\lambda }_{n}(t)=\left\{\begin{array}{l@{\hskip10.0pt}l}\max \{{L_{i_{k}}^{1}}(t),a\}& \text{if}\hspace{2.5pt}t\in [Y_{i_{k}},B_{i_{k}}],\\{} \max \{{L_{i_{k}}^{2}}(t),a\}& \text{if}\hspace{2.5pt}t\in [B_{i_{k}},Y_{i_{k+1}}].\end{array}\right.\]
Note that $\lambda _{n}\ge \overline{\lambda }_{n}$ on $[0,\tau ]$: indeed, $\lambda _{n}\in \varTheta _{\lambda }$ is Lipschitz with constant L, bounded below by a, and coincides with $\overline{\lambda }_{n}$ at the nodes. Then
\[{\int _{Y_{i_{k}}}^{Y_{i_{k+1}}}}\hspace{-0.1667em}\lambda _{n}(u)\hspace{0.1667em}\mathrm{d}u\ge {\int _{Y_{i_{k}}}^{Y_{i_{k+1}}}}\hspace{-0.1667em}\overline{\lambda }_{n}(u)\hspace{0.1667em}\mathrm{d}u.\]
Thus, one can easily see that
\[Q_{n}(\lambda _{n},\beta _{n})\le Q_{n}(\overline{\lambda }_{n},\beta _{n})\]
Since $(\lambda _{n},\beta _{n})$ maximizes $Q_{n}$ and $\overline{\lambda }_{n}\in \varTheta _{\lambda }$, the inequality is in fact an equality, implying $\lambda _{n}=\overline{\lambda }_{n}$, and we conclude with the following statement.
Theorem 3.
Under conditions (i) and (ii), the function $\lambda _{n}$ that maximizes $Q_{n}$ is the linear spline constructed in (3.3).
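For illustration, the spline (3.3), extended linearly beyond the extreme nodes, can be evaluated directly from the node values: between two nodes it equals $\max \{{L}^{1}(t),{L}^{2}(t),a\}$, since ${L}^{1}$ decreases and ${L}^{2}$ increases in t. The node values and constants below are hypothetical.

```python
import numpy as np

def lam_bar(t, Ys, node_vals, L, a):
    """Evaluate the minimal spline (3.3) at points t, given sorted node
    times Ys and node values node_vals = lambda_n(Y_{i_k}); L and a are
    the constants from condition (i)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.empty_like(t)
    for j, tj in enumerate(t):
        if tj <= Ys[0]:                    # linear extension left of Y_{i_1}
            out[j] = max(node_vals[0] + L * (tj - Ys[0]), a)
        elif tj >= Ys[-1]:                 # linear extension right of Y_{i_n}
            out[j] = max(node_vals[-1] + L * (Ys[-1] - tj), a)
        else:
            k = np.searchsorted(Ys, tj, side="right") - 1
            l1 = node_vals[k] + L * (Ys[k] - tj)             # line (3.1)
            l2 = node_vals[k + 1] + L * (tj - Ys[k + 1])     # line (3.2)
            out[j] = max(l1, l2, a)        # L1 decreases, L2 increases in t
    return out

grid_vals = lam_bar(np.array([0.2, 0.3, 0.5, 0.8]),
                    Ys=np.array([0.2, 0.8]),
                    node_vals=np.array([1.0, 1.0]),
                    L=2.0, a=0.5)
```

In this toy example the two lines dip from value 1 at the nodes down to 0.4 at the midpoint, so the floor a = 0.5 is active there.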
The maximization in (3.3) makes computation of $(\lambda _{n},\beta _{n})$ inconvenient; thus, we propose to modify the estimators. Since condition (viii) is satisfied and the estimator $(\lambda _{n},\beta _{n})$ is strongly consistent, one can deduce that eventually $\overline{\lambda }_{n}(B_{i_{k}})>a$, and thus, eventually there is no need to take the maximum in (3.3). Therefore, instead of $(\lambda _{n},\beta _{n})$ we propose to consider a couple $(\widehat{\lambda }_{n},\widehat{\beta }_{n})$ with $\widehat{\beta }_{n}\in \varTheta _{\beta }$ that maximizes $Q_{n}$ under the restrictions:
  • (1) $\widehat{\lambda }_{n}(0)\le A$.
  • (2) $\widehat{\lambda }_{n}(Y_{i_{k}})\ge a$, $k=1,\dots ,n$.
  • (3) $\widehat{\lambda }_{n}(Y_{i_{k}})+L(Y_{i_{k}}-Y_{i_{k+1}})\le \widehat{\lambda }_{n}(Y_{i_{k+1}})\le \widehat{\lambda }_{n}(Y_{i_{k}})-L(Y_{i_{k}}-Y_{i_{k+1}})$, $k=1,\dots ,n-1$.
  • (4) $\widehat{\lambda }_{n}(t):=\left\{\begin{array}{l}{L_{i_{k}}^{1}}(t)\hspace{2.5pt}\text{if}\hspace{2.5pt}t\in [Y_{i_{k}},B_{i_{k}}],\\{} {L_{i_{k}}^{2}}(t)\hspace{2.5pt}\text{if}\hspace{2.5pt}t\in [B_{i_{k}},Y_{i_{k+1}}],\end{array}\right.k=1,\dots ,n-1$.
  • (5) $\widehat{\lambda }_{n}(t):=\left\{\begin{array}{l}{L_{i_{0}}^{2}}(t)\hspace{2.5pt}\text{if}\hspace{2.5pt}t\in [0,Y_{i_{1}}],\\{} {L_{i_{n}}^{1}}(t)\hspace{2.5pt}\text{if}\hspace{2.5pt}t\in [Y_{i_{n}},\tau ].\end{array}\right.$
Evaluating $(\widehat{\lambda }_{n},\widehat{\beta }_{n})$ is a parametric problem. We mention that eventually $(\widehat{\lambda }_{n},\widehat{\beta }_{n})=(\lambda _{n},\beta _{n})$. We summarise with the next statement.
Theorem 4.
Assume conditions (i) to (ix). Then the estimator $(\widehat{\lambda }_{n},\widehat{\beta }_{n})$ is strongly consistent, and the statements of Theorem 1 and Corollary 2 hold true for this estimator.

4 Stochastic boundedness of transformed and normalized estimators

Theorem 5.
Assume (i) to (vi). Then
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \sqrt[4]{n}\| \beta _{n}-\beta _{0}\| =O_{p}(1),\\{} & \displaystyle \sqrt{n}{\int _{0}^{\tau }}\hspace{-0.1667em}{\big(\lambda _{n}(u)-\lambda _{0}(u)\big)}^{2}G_{C}(u)\hspace{0.1667em}\mathrm{d}u=O_{p}(1).\end{array}\]
The proof is based on three lemmas. Using integration by parts, one can easily prove the following.
Lemma 6.
For all $u\in [0,\tau ]$
\[{\int _{u}^{\tau }}\big(f_{C}(y)G_{T}(y|X)+f_{T}(y|X)G_{C}(y)\big)\hspace{0.1667em}\mathrm{d}y=G_{T}(u|X)G_{C}(u)=:G(u|X).\]
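Lemma 6 is the differential identity $\mathrm{d}(G_{T}G_{C})=-(f_{T}G_{C}+f_{C}G_{T})\hspace{0.1667em}\mathrm{d}y$ integrated over $[u,\tau ]$, using $G_{C}(\tau )=0$ from (v). A quick numerical sanity check with illustrative distributions ($T\sim \mathrm{Exp}(1)$ conditionally on X, $C\sim \mathrm{Uniform}(0,\tau )$, which are not part of the paper's setup):

```python
import numpy as np

tau = 2.0
f_T = lambda y: np.exp(-y)                  # conditional pdf of T given X
G_T = lambda y: np.exp(-y)                  # conditional survival of T
f_C = lambda y: np.full_like(y, 1.0 / tau)  # pdf of C ~ Uniform(0, tau)
G_C = lambda y: 1.0 - y / tau               # survival of C; G_C(tau) = 0

def check(u, n=20001):
    """Compare both sides of Lemma 6 at the point u."""
    y = np.linspace(u, tau, n)
    g = f_C(y) * G_T(y) + f_T(y) * G_C(y)
    lhs = float(np.sum((g[1:] + g[:-1]) * np.diff(y) / 2.0))  # trapezoid
    rhs = float(G_T(u) * G_C(u))
    return lhs, rhs

lhs, rhs = check(0.5)
```

Both sides agree up to the quadrature error of the trapezoid rule.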
The crucial step of the proof of Theorem 5 is the following.
Lemma 7.
There exists a closed bounded set A with $\mu _{X}(A):=P(X\in A)>0$ such that the identity $({v}^{\mathsf{T}}x-c)I_{A}(x)\equiv 0$ for some $v\in {\mathbb{R}}^{k}$, $c\in \mathbb{R}$ implies $v=0$ and $c=0$.
Proof of Lemma 7.
Denote by M the support of $\mu _{X}$, i.e., M is the minimal closed set with $\mu _{X}(M)=\mu _{X}({\mathbb{R}}^{k})$. Since $\mu _{X}$ is not concentrated on a hyperplane due to condition (vi), there are at least $k+1$ distinct points $m_{1},\dots ,m_{k+1}$ that belong to M and do not lie on a hyperplane. Consider a closed ball $\overline{B}(0,r)$ with radius $r>\max \{\| m_{1}\| ,\dots ,\| m_{k+1}\| \}$. Now one can take $A=M\cap \overline{B}(0,r)$ and verify that A has all the desired properties.  □
Let $A_{n}(\omega )$ be a sequence of assertions (here ω stands for an elementary event). We say that $\{A_{n}\}$ hold eventually if for almost all ω there exists $N_{\omega }$ such that $A_{n}(\omega )$ holds for all $n>N_{\omega }$.
Lemma 8.
Let $\eta _{n}$, $\xi _{n}$ be two sequences of random variables, let $\eta _{n}$ be stochastically bounded, and let eventually $|\xi _{n}|\le |\eta _{n}|$. Then $\xi _{n}$ is stochastically bounded as well.
Proof of Theorem 5.
Step 1. Denote $q_{\infty }(\lambda ,\beta )=\mathbf{E}[q(Y,\varDelta ,W,\lambda ,\beta )]=\mathbf{E}[\tilde{q}(Y,\varDelta ,X,\lambda ,\beta )]$. Let us show that ${(q_{\infty })}^{\prime }$ exists for $(\lambda ,\beta )\in B$ and equals zero at the true point $(\lambda _{0},\beta _{0})$, where B is some open set in $C[0,\tau ]\times {\mathbb{R}}^{k}$ that contains $\varTheta _{\lambda }\times \varTheta _{\beta }$.
Using (iv) one can easily obtain that
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \frac{\partial q_{\infty }}{\partial \beta }(\lambda ,\beta )=\mathbf{E}\bigg[\varDelta X-X\exp \big({\beta }^{\mathsf{T}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\lambda (u)\hspace{0.1667em}\mathrm{d}u\bigg],\\{} & \displaystyle \bigg\langle \frac{\partial q_{\infty }}{\partial \lambda }(\lambda ,\beta ),h\bigg\rangle =\mathbf{E}\bigg[\frac{\varDelta h(Y)}{\lambda (Y)}-\exp \big({\beta }^{\mathsf{T}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}h(u)\hspace{0.1667em}\mathrm{d}u\bigg],\end{array}\]
where $h\in C[0,\tau ]$. Hence, ${(q_{\infty })}^{\prime }$ exists. According to [7], Section 3, $q_{\infty }(\lambda ,\beta )<q_{\infty }(\lambda _{0},\beta _{0})$ for all $(\lambda ,\beta )\ne (\lambda _{0},\beta _{0})$, $(\lambda ,\beta )\in B$. Hence,
\[{(q_{\infty })}^{\prime }(\lambda _{0},\beta _{0})=0.\]
In fact, condition (iv) implies that ${(q_{\infty })}^{\prime\prime }$ and ${(q_{\infty })}^{\prime\prime\prime }$ exist. Hence, the third-order Taylor formula holds:
(4.1)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle q_{\infty }(\lambda _{n},\beta _{n})-q_{\infty }(\lambda _{0},\beta _{0})=& \displaystyle \frac{1}{2}\big\langle {(q_{\infty })}^{\prime\prime }(\lambda _{0},\beta _{0}),{(\lambda _{n}-\lambda _{0},\beta _{n}-\beta _{0})}^{2}\big\rangle \\{} & \displaystyle +\frac{1}{6}\big\langle {(q_{\infty })}^{\prime\prime\prime }(\tilde{\lambda }_{n},\tilde{\beta }_{n}),{(\lambda _{n}-\lambda _{0},\beta _{n}-\beta _{0})}^{3}\big\rangle ,\end{array}\]
where $(\tilde{\lambda }_{n},\tilde{\beta }_{n})$ belongs to the interval $[(\lambda _{n},\beta _{n}),(\lambda _{0},\beta _{0})]$.
Step 2. We transform ${(q_{\infty })}^{\prime\prime }$ and show that $-{(q_{\infty })}^{\prime\prime }(\lambda _{0},\beta _{0})$ is a positive definite operator. We have
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg\langle \frac{{\partial }^{2}q_{\infty }(\lambda _{0},\beta _{0})}{\partial {\lambda }^{2}},(h_{1},h_{2})\bigg\rangle =-\mathbf{E}\bigg[\frac{\varDelta }{{\lambda _{0}^{2}}(Y)}h_{1}(Y)h_{2}(Y)\bigg],\\{} & \displaystyle \frac{{\partial }^{2}q_{\infty }}{\partial {\beta }^{2}}(\lambda _{0},\beta _{0})=-\mathbf{E}\bigg[X{X}^{\mathsf{T}}\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\lambda _{0}(u)\hspace{0.1667em}\mathrm{d}u\bigg],\\{} & \displaystyle \bigg\langle \frac{{\partial }^{2}q_{\infty }(\lambda _{0},\beta _{0})}{\partial \lambda \partial \beta },(h_{\lambda },h_{\beta })\bigg\rangle =-\mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}h_{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg],\end{array}\]
where $h_{1},h_{2},h_{\lambda }\in C[0,\tau ]$, $h_{\beta }\in {\mathbb{R}}^{k}$.
We use (1.4) and Lemma 6 for further transformations:
(4.2)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg\langle \frac{{\partial }^{2}q_{\infty }(\lambda _{0},\beta _{0})}{\partial {\lambda }^{2}},(h_{\lambda },h_{\lambda })\bigg\rangle \\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[\frac{\varDelta }{{\lambda _{0}^{2}}(Y)}{h_{\lambda }^{2}}(Y)\bigg]=\mathbf{E}\bigg({\int _{0}^{\tau }}\frac{{h_{\lambda }^{2}}(u)}{{\lambda _{0}^{2}}(u)}f_{T}(u|X)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg)\\{} & \displaystyle \hspace{1em}=\mathbf{E}\bigg({\int _{0}^{\tau }}\frac{{h_{\lambda }^{2}}(u)}{{\lambda _{0}^{2}}(u)}\lambda _{0}(u)\exp \big({\beta _{0}^{\mathsf{T}}}X\big)\exp \bigg(-{\int _{0}^{u}}\lambda (s|X;\lambda _{0},\beta _{0})\hspace{0.1667em}\mathrm{d}s\bigg)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg)\\{} & \displaystyle \hspace{1em}=\mathbf{E}\bigg({\int _{0}^{\tau }}\frac{{h_{\lambda }^{2}}(u)}{\lambda _{0}(u)}\exp \big({\beta _{0}^{\mathsf{T}}}X\big)G_{T}(u|X)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg).\end{array}\]
Next,
(4.3)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg\langle \frac{{\partial }^{2}q_{\infty }(\lambda _{0},\beta _{0})}{\partial {\beta }^{2}},(h_{\beta },h_{\beta })\bigg\rangle \\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[{\big({h_{\beta }^{\mathsf{T}}}X\big)}^{2}\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\lambda _{0}(u)\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[{\big({h_{\beta }^{\mathsf{T}}}X\big)}^{2}\exp \big({\beta _{0}^{\mathsf{T}}}X\big)\bigg({\int _{0}^{\tau }}\bigg({\int _{0}^{y}}\hspace{-0.1667em}\lambda _{0}(u)\hspace{0.1667em}\mathrm{d}u\hspace{2.5pt}f_{T}(y|X)G_{C}(y)\\{} & \displaystyle \hspace{2em}+{\int _{0}^{y}}\hspace{-0.1667em}\lambda _{0}(u)\hspace{0.1667em}\mathrm{d}u\hspace{2.5pt}f_{C}(y)G_{T}(y|X)\bigg)\hspace{0.1667em}\mathrm{d}y\bigg)\bigg]\\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[{\big({h_{\beta }^{\mathsf{T}}}X\big)}^{2}\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{\tau }}\lambda _{0}(u){\int _{u}^{\tau }}\big(f_{C}(y)G_{T}(y|X)+f_{T}(y|X)G_{C}(y)\big)\hspace{0.1667em}\mathrm{d}y\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[{\big({h_{\beta }^{\mathsf{T}}}X\big)}^{2}\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{\tau }}\lambda _{0}(u)G_{T}(u|X)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg].\end{array}\]
Finally,
(4.4)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg\langle \frac{{\partial }^{2}q_{\infty }(\lambda _{0},\beta _{0})}{\partial \lambda \partial \beta },(h_{\lambda },h_{\beta })\bigg\rangle \\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}h_{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{\tau }}h_{\lambda }(u){\int _{u}^{\tau }}\big(f_{C}(y)G_{T}(y|X)+f_{T}(y|X)G_{C}(y)\big)\hspace{0.1667em}\mathrm{d}y\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{\tau }}h_{\lambda }(u)G_{T}(u|X)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg].\end{array}\]
Hence, from (4.2) to (4.4) it follows that
(4.5)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \big\langle {(q_{\infty })}^{\prime\prime }(\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\big\rangle \\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{\tau }}{\bigg(\big({h_{\beta }^{\mathsf{T}}}X\big)\sqrt{\lambda _{0}(u)G(u|X)}+h_{\lambda }(u)\frac{\sqrt{G(u|X)}}{\sqrt{\lambda _{0}(u)}}\bigg)}^{2}\hspace{0.1667em}\mathrm{d}u\bigg].\end{array}\]
Now, condition (vi) implies that $-{(q_{\infty })}^{\prime\prime }$ is positive definite at $(\lambda _{0},\beta _{0})$, i.e.,
\[\big\langle {(q_{\infty })}^{\prime\prime }(\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\big\rangle =0\hspace{1em}\Longleftrightarrow \hspace{1em}(h_{\lambda },h_{\beta })=(0,0).\]
Indeed, if we assume that $\langle {(q_{\infty })}^{\prime\prime }(\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\rangle =0$ and $(h_{\lambda },h_{\beta })\ne (0,0)$, then (4.5) implies that $h_{\beta }\ne 0$ and ${h_{\beta }^{\mathsf{T}}}X=\mathrm{const}$ a.s. This contradicts (vi).
Step 3. Note that $G(u|X)$ is continuous in X. Denote $G_{0}(u)=\min _{X\in A}G(u|X)$, where A is the set from Lemma 7; then $G_{0}(u)=G(u|X_{0})>0$ for all $u\in [0,\tau )$ and some $X_{0}\in A$. We show that there exist $C>0$ and $\delta >0$ such that, whenever $\max \{\| h_{\beta }{\| }^{2},{\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda }^{2}}(u)G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u\}>0$, it holds that
(4.6)
\[\mathbf{E}\bigg[\frac{-\langle {(\tilde{q})^{\prime\prime }}(Y,\varDelta ,X,\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\rangle }{\max \{\| h_{\beta }{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda }^{2}}(u)G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u\}}\bigg]\ge \delta .\]
Assume first that $\| h_{\beta }{\| }^{2}\ge {\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda }^{2}}(u)G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u$. Jensen's inequality and (4.5) yield
(4.7)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle -\big\langle {(q_{\infty })}^{\prime\prime }(\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\big\rangle \\{} & \displaystyle \hspace{1em}\ge \frac{1}{\tau }\mathbf{E}\bigg[I_{X\in A}\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\bigg({\int _{0}^{\tau }}\big({h_{\beta }^{\mathsf{T}}}X\big)\sqrt{\lambda _{0}(u)G_{0}(u)}+h_{\lambda }(u)\frac{\sqrt{G_{0}(u)}}{\sqrt{\lambda _{0}(u)}}\hspace{0.1667em}\mathrm{d}u\bigg)}^{2}\bigg]\\{} & \displaystyle \hspace{1em}=\frac{1}{\tau }\mathbf{E}\bigg[I_{X\in A}\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\bigg(\big({h_{\beta }^{\mathsf{T}}}X\big){\int _{0}^{\tau }}\sqrt{\lambda _{0}(u)G_{0}(u)}\mathrm{d}u+{\int _{0}^{\tau }}h_{\lambda }(u)\frac{\sqrt{G_{0}(u)}}{\sqrt{\lambda _{0}(u)}}\hspace{0.1667em}\mathrm{d}u\bigg)}^{2}\bigg].\end{array}\]
Denote
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle a_{0}=\underset{X\in A}{\min }\frac{1}{\tau }\exp \big({\beta _{0}^{\mathsf{T}}}X\big),\hspace{2em}a_{1}={\int _{0}^{\tau }}\sqrt{\lambda _{0}(u)G_{0}(u)}\hspace{0.1667em}\mathrm{d}u,\\{} & \displaystyle K_{\lambda }(h_{\beta })=\frac{{\textstyle\int _{0}^{\tau }}h_{\lambda }(u)\frac{\sqrt{G_{0}(u)}}{\sqrt{\lambda _{0}(u)}}\hspace{0.1667em}\mathrm{d}u}{\| h_{\beta }\| }.\end{array}\]
Inequality (4.7) implies that
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathbf{E}\bigg[\frac{-\langle {(\tilde{q})^{\prime\prime }}(Y,\varDelta ,X,\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\rangle }{\max \{\| h_{\beta }{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda }^{2}}(u)G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u\}}\bigg]\ge a_{0}\mathbf{E}\big[I_{X\in A}{\big(\big({\widehat{h}_{\beta }^{\mathsf{T}}}X\big)a_{1}+K_{\lambda }\big)}^{2}\big],\end{array}\]
where $\widehat{h}_{\beta }=\frac{h_{\beta }}{\| h_{\beta }\| }$. Fix $T\in \mathbb{R}$. Equality
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathbf{E}\big[I_{X\in A}{\big(\big({\widehat{h}_{\beta }^{\mathsf{T}}}X\big)a_{1}+T\big)}^{2}\big]=0\end{array}\]
implies that $I_{X\in A}\big(\big({\widehat{h}_{\beta }^{\mathsf{T}}}X\big)+\frac{T}{a_{1}}\big)=0$ a.s., which contradicts the choice of A.
It is easy to see that, for a fixed $\widehat{h}_{\beta }$, the minimum of
\[\mathbf{E}\big[I_{X\in A}{\big(\big({\widehat{h}_{\beta }^{\mathsf{T}}}X\big)a_{1}+T\big)}^{2}\big]\]
is attained at a unique point $T=T(\widehat{h}_{\beta },A)$. Moreover, $T(\widehat{h}_{\beta },A)$ is a continuous function of $\widehat{h}_{\beta }$. Hence, we have
(4.8)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathbf{E}\bigg[\frac{-\langle {(\tilde{q})^{\prime\prime }}(Y,\varDelta ,X,\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\rangle }{\max \{\| h_{\beta }{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda }^{2}}(u)G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u\}}\bigg]\\{} & \displaystyle \hspace{1em}\ge a_{0}\mathbf{E}\big[I_{X\in A}{\big(\big({\widehat{h}_{\beta }^{\mathsf{T}}}X\big)a_{1}+T(\widehat{h}_{\beta },A)\big)}^{2}\big]>0.\end{array}\]
Since $\| \widehat{h}_{\beta }\| =1$, the right-hand side of (4.8) attains its minimum at some point $\widehat{h}_{\beta _{0}}$. Now one can take
\[\delta _{1}=a_{0}\mathbf{E}\big[I_{X\in A}{\big(\big({\widehat{h}_{\beta _{0}}}^{\mathsf{T}}X\big)a_{1}+T(\widehat{h}_{\beta _{0}},A)\big)}^{2}\big]>0.\]
Consider the second case, where the inequality $\| h_{\beta }{\| }^{2}<{\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda }^{2}}(u)G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u$ holds. Transform the right-hand side of (4.5):
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathbf{E}\bigg[\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{\tau }}{\bigg(\big({h_{\beta }^{\mathsf{T}}}X\big)\sqrt{\lambda _{0}(u)G(u|X)}+h_{\lambda }(u)\frac{\sqrt{G(u|X)}}{\sqrt{\lambda _{0}(u)}}\bigg)}^{2}\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle \hspace{1em}\ge \mathbf{E}\bigg[I_{X\in A}\exp \big({\beta _{0}^{\mathsf{T}}}X\big)\bigg({\big({h_{\beta }^{\mathsf{T}}}X\big)}^{2}{\int _{0}^{\tau }}\lambda _{0}(u)G_{0}(u)\hspace{0.1667em}\mathrm{d}u+2{\int _{0}^{\tau }}\big({h_{\beta }^{\mathsf{T}}}X\big)h_{\lambda }(u)G_{0}(u)\hspace{0.1667em}\mathrm{d}u\\{} & \displaystyle \hspace{2em}+{\int _{0}^{\tau }}{h_{\lambda }^{2}}(u)\frac{G_{0}(u)}{\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u\bigg)\bigg].\end{array}\]
Denote $\varPhi ={\int _{0}^{\tau }}{h_{\lambda }^{2}}(u)\frac{G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u$. Then the left-hand side of (4.6) is transformed to
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathbf{E}\bigg[\frac{-\langle {(\tilde{q})^{\prime\prime }}(Y,\varDelta ,X,\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\rangle }{\varPhi }\bigg]\\{} & \displaystyle \hspace{1em}\ge \mathbf{E}\bigg[I_{X\in A}\exp \big({\beta _{0}^{\mathsf{T}}}X\big)\bigg({\big({\tilde{h}_{\beta }^{\mathsf{T}}}X\big)}^{2}a_{2}+\frac{2}{\varPhi }{\int _{0}^{\tau }}\big({h_{\beta }}^{\mathsf{T}}X\big)h_{\lambda }(u)G_{0}(u)\hspace{0.1667em}\mathrm{d}u+C\bigg)\bigg],\end{array}\]
where
\[a_{2}={\int _{0}^{\tau }}\lambda _{0}(u)G(u|X)\hspace{0.1667em}\mathrm{d}u,\hspace{2em}\tilde{h}_{\beta }=\frac{h_{\beta }}{\sqrt{{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda }^{2}}(u)G_{0}(u)}{C\lambda _{0}(u)}\hspace{0.1667em}\mathrm{d}u}}.\]
Jensen’s inequality implies
\[{\varPhi }^{1/2}\ge \sqrt{\frac{1}{\tau C}}\bigg({\int _{0}^{\tau }}|h_{\lambda }(u)|\sqrt{\frac{G_{0}(u)}{\lambda _{0}(u)}}\hspace{0.1667em}\mathrm{d}u\bigg).\]
Since $\sqrt{\varPhi }>\| h_{\beta }\| $, $G_{0}(u)\in [0,1]$ and $\lambda _{0}$ is bounded away from 0, we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \bigg|\frac{{\textstyle\int _{0}^{\tau }}({h_{\beta }}^{\mathsf{T}}X)h_{\lambda }(u)G_{0}(u)\hspace{0.1667em}\mathrm{d}u}{\varPhi }\bigg|& \displaystyle \le \frac{\| h_{\beta }\| }{\sqrt{\varPhi }}\| X\| \bigg|\frac{{\tau }^{1/2}{\textstyle\int _{0}^{\tau }}h_{\lambda }(u)\sqrt{G_{0}(u)}\hspace{0.1667em}\mathrm{d}u}{{\textstyle\int _{0}^{\tau }}|h_{\lambda }(u)|\sqrt{\frac{G_{0}(u)}{\lambda _{0}(u)}}\hspace{0.1667em}\mathrm{d}u}\bigg|\sqrt{C}\\{} & \displaystyle \le \sqrt{C}\| X\| D,\end{array}\]
for some constant $D>0$ that depends only on τ and $\lambda _{0}$. Since $\| \tilde{h}_{\beta }\| <1$, there exist constants $K_{1}>0$ and $K_{2}>0$ such that
\[\mathbf{E}\bigg[\frac{-\langle {(\tilde{q})^{\prime\prime }}(Y,\varDelta ,X,\lambda _{0},\beta _{0}),{(h_{\lambda },h_{\beta })}^{2}\rangle }{\varPhi }\bigg]\ge \tau a_{0}(-K_{1}-\sqrt{C}K_{2}+C).\]
Choosing C large enough, we get (4.6).
Step 4. Now we transform Taylor's expansion (4.1):
(4.9)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle q_{\infty }(\lambda _{n},\beta _{n})-q_{\infty }(\lambda _{0},\beta _{0})\\{} & \displaystyle \hspace{1em}=\mathbf{E}\bigg(\max \bigg\{\| h_{\beta _{n}}{\| }^{2},{\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\bigg\}\bigg[\frac{1}{2}\frac{\langle {(\tilde{q})^{\prime\prime }}(Y,\varDelta ,X,\lambda _{0},\beta _{0}),{(h_{\lambda _{n}},h_{\beta _{n}})}^{2}\rangle }{\max \{\| h_{\beta _{n}}{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\}}\\{} & \displaystyle \hspace{2em}+\frac{1}{6}\frac{\langle {(\tilde{q})^{\prime\prime\prime }}(Y,\varDelta ,X,\tilde{\lambda }_{n},\tilde{\beta }_{n}),{(h_{\lambda _{n}},h_{\beta _{n}})}^{3}\rangle }{\max \{\| h_{\beta _{n}}{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\}}\bigg]\bigg),\end{array}\]
where we denote $h_{\lambda _{n}}=\lambda _{n}-\lambda _{0}$ and $h_{\beta _{n}}=\beta _{n}-\beta _{0}$. Recall that $G_{T}(u|X)\in (0,1]$ for all X, so that $G_{0}(u)\ge K_{3}G_{C}(u)$ for some $K_{3}>0$. One can see that $\frac{{\partial }^{3}q_{\infty }}{\partial {\lambda }^{2}\partial \beta }=0$. Using the same technique as in (4.2)–(4.4) and the assumptions, we get
(4.10)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg\langle \frac{{\partial }^{3}q_{\infty }(\tilde{\lambda }_{n},\tilde{\beta }_{n})}{\partial {\lambda }^{3}},(h_{1},h_{2},h_{3})\bigg\rangle \\{} & \displaystyle \hspace{1em}=\frac{1}{2}\mathbf{E}\bigg[\frac{\varDelta }{{\tilde{\lambda }_{n}^{3}}(Y)}h_{1}(Y)h_{2}(Y)h_{3}(Y)\bigg]\\{} & \displaystyle \hspace{1em}=\frac{1}{2}\mathbf{E}\bigg({\int _{0}^{\tau }}\frac{h_{1}(u)h_{2}(u)h_{3}(u)}{{\tilde{\lambda }_{n}^{2}}(u)}\exp \big({\tilde{\beta }_{n}^{\mathsf{T}}}X\big)G_{T}(u|X)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg)\\{} & \displaystyle \hspace{1em}\le K_{4}\| h_{1}\| {\int _{0}^{\tau }}h_{2}(u)h_{3}(u)G_{0}(u)\hspace{0.1667em}\mathrm{d}u,\end{array}\]
(4.11)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg\langle \frac{{\partial }^{3}q_{\infty }(\tilde{\lambda }_{n},\tilde{\beta }_{n})}{\partial {\beta }^{3}},h_{\beta }\bigg\rangle =-\mathbf{E}\bigg[{\big({h_{\beta }^{\mathsf{T}}}X\big)}^{3}\exp \big({\tilde{\beta }_{n}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\tilde{\lambda }_{n}\hspace{0.1667em}\mathrm{d}u\bigg]\le K_{5}\| h_{\beta }{\| }^{3},\end{array}\]
(4.12)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg\langle \frac{{\partial }^{3}q_{\infty }(\tilde{\lambda }_{n},\tilde{\beta }_{n})}{\partial \lambda \partial {\beta }^{2}},(h_{\lambda },h_{\beta },h_{\beta })\bigg\rangle \\{} & \displaystyle \hspace{1em}=-\mathbf{E}\bigg[{\big({h_{\beta }^{\mathsf{T}}}X\big)}^{2}\exp \big({\tilde{\beta }_{n}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}h_{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg]\le K_{6}\| h_{\beta }{\| }^{2}\| h_{\lambda }\| \end{array}\]
where $K_{4}$ to $K_{6}$ are positive constants. We note that all constants $K_{3}$ to $K_{6}$ depend only on $\varTheta =\varTheta _{\lambda }\times \varTheta _{\beta }$. Kukush et al. [7] prove strong consistency of the estimator $(\lambda _{n},\beta _{n})$, that is $\max _{t\in [0,\tau ]}|\lambda _{n}(t)-\lambda _{0}(t)|\to 0$ and $\beta _{n}\to \beta _{0}$ a.s., as $n\to \infty $. One can conclude that
(4.13)
\[\underset{n\to \infty }{\lim }\mathbf{E}\bigg[\frac{\langle {(\tilde{q})^{\prime\prime\prime }}(Y,\varDelta ,X,\tilde{\lambda }_{n},\tilde{\beta }_{n}),{(h_{\lambda _{n}},h_{\beta _{n}})}^{3}\rangle }{\max \{\| h_{\beta _{n}}{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\}}\bigg]=0\hspace{1em}a.s.\]
Step 5. Set $S_{n}(\lambda ,\beta )=n(Q_{n}(\lambda ,\beta )-q_{\infty }(\lambda ,\beta ))$. Kukush et al. [7] prove that under assumptions (i) to (vi) $\frac{S_{n}(\lambda ,\beta )}{\sqrt{n}}$ converges in distribution in $C(\varTheta )$ to a Gaussian measure. Hence,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle 0& \displaystyle \le \sqrt{n}\big(q_{\infty }(\lambda _{0},\beta _{0})-q_{\infty }(\lambda _{n},\beta _{n})\big)\\{} & \displaystyle \le \sqrt{n}\big(Q_{n}(\lambda _{n},\beta _{n})-q_{\infty }(\lambda _{n},\beta _{n})-Q_{n}(\lambda _{0},\beta _{0})+q_{\infty }(\lambda _{0},\beta _{0})\big)\\{} & \displaystyle \le 2\sqrt{n}\underset{(\lambda ,\beta )\in \varTheta _{\lambda }\times \varTheta _{\beta }}{\sup }\big|Q_{n}(\lambda ,\beta )-q_{\infty }(\lambda ,\beta )\big|=O_{p}(1),\end{array}\]
because $q_{\infty }(\lambda ,\beta )$ and $Q_{n}(\lambda ,\beta )$ attain their maxima at $(\lambda _{0},\beta _{0})$ and $(\lambda _{n},\beta _{n})$, respectively.
Now, (4.9) yields
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \sqrt{n}\max \bigg\{\| h_{\beta _{n}}{\| }^{2},{\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\bigg\}\mathbf{E}\bigg(\bigg[\frac{1}{2}\frac{\langle {(\tilde{q})^{\prime\prime }}(Y,\varDelta ,X,\lambda _{0},\beta _{0}),{(h_{\lambda _{n}},h_{\beta _{n}})}^{2}\rangle }{\max \{\| h_{\beta _{n}}{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\}}\\{} & \displaystyle \hspace{2em}+\frac{1}{6}\frac{\langle {(\tilde{q})^{\prime\prime\prime }}(Y,\varDelta ,X,\tilde{\lambda }_{n},\tilde{\beta }_{n}),{(h_{\lambda _{n}},h_{\beta _{n}})}^{3}\rangle }{\max \{\| h_{\beta _{n}}{\| }^{2},{\textstyle\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\}}\bigg]\bigg)\\{} & \displaystyle \hspace{1em}=\sqrt{n}\big(q_{\infty }(\lambda _{0},\beta _{0})-q_{\infty }(\lambda _{n},\beta _{n})\big)=O_{p}(1).\end{array}\]
Step 6. Equations (4.6), (4.9) and (4.13) imply that eventually
\[\sqrt{n}\max \bigg\{\| h_{\beta _{n}}{\| }^{2},{\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\bigg\}<\frac{\sqrt{n}(q_{\infty }(\lambda _{0},\beta _{0})-q_{\infty }(\lambda _{n},\beta _{n}))}{\delta /3}.\]
Lemma 8 proves that $\sqrt{n}\max \{\| h_{\beta _{n}}{\| }^{2},{\int _{0}^{\tau }}\hspace{-0.1667em}\frac{{h_{\lambda _{n}}^{2}}(u)G_{0}(u)}{C\lambda _{0}}\hspace{0.1667em}\mathrm{d}u\}=O_{p}(1)$. Hence the first equation of Theorem 5 is proved:
\[\sqrt{n}\| \beta _{n}-\beta _{0}{\| }^{2}=\sqrt{n}\| h_{\beta _{n}}{\| }^{2}=O_{p}(1).\]
Finally, $G_{0}(u)\ge K_{3}G_{C}(u)$. Note that $\lambda _{0}$ is bounded away from 0 on $[0,\tau ]$. Hence
\[\sqrt{n}{\int _{0}^{\tau }}\hspace{-0.1667em}{h_{\lambda _{n}}^{2}}(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u=O_{p}(1).\]
Thus, Theorem 5 is proved.  □

5 Auxiliary results

We use the ideas of [3].
Let $\theta _{n}=(\lambda _{n},\beta _{n})$, $\theta _{0}=(\lambda _{0},\beta _{0})$, and $\varTheta =\varTheta _{\lambda }\times \varTheta _{\beta }$. Denote by $\varphi =(\varphi _{\lambda },\varphi _{\beta })$ an admissible shift, i.e., one for which there exists $\delta >0$ with $\theta _{0}\pm \delta \varphi \in \varTheta $. We require that (vii)–(viii) hold. Note that φ can be a random element and depend on n; however, $\| \varphi \| $ should be bounded from above a.s.
Consider the function $f(t)=Q_{n}(\theta _{n}+t(\theta _{0}-\theta _{n}\pm \delta \varphi ))$, $0\le t\le 1$. It is well-defined (due to the convexity of Θ) and attains its maximum at point $t=0$. Therefore, $\langle {Q^{\prime }_{n}}(\theta _{n}),\theta _{0}-\theta _{n}\pm \delta \varphi \rangle \le 0$ and
\[\big|\big\langle {Q^{\prime }_{n}}(\theta _{n}),\varphi \big\rangle \big|\le \frac{1}{\delta }\big\langle {Q^{\prime }_{n}}(\theta _{n}),\varDelta \theta _{n}\big\rangle ,\]
where $\varDelta \theta _{n}:=\theta _{n}-\theta _{0}$.
Taylor’s expansion at point $(\lambda _{0},\beta _{0})$ implies
(5.1)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg|\big\langle {Q^{\prime }_{n}}(\theta _{0}),\varphi \big\rangle +\frac{1}{2}\big\langle {Q^{\prime\prime }_{n}}(\theta _{0}),(\varDelta \theta _{n},\varphi )\big\rangle +\frac{1}{6}\big\langle {Q^{\prime\prime\prime }_{n}}(\tilde{\theta }_{n}),\big(\varDelta {\theta _{n}^{2}},\varphi \big)\big\rangle \bigg|\\{} & \displaystyle \hspace{1em}\le \frac{1}{\delta }\bigg(\big\langle {Q^{\prime }_{n}}(\theta _{0}),\varDelta \theta _{n}\big\rangle +\frac{1}{2}\big\langle {Q^{\prime\prime }_{n}}(\theta _{0}),\varDelta {\theta _{n}^{2}}\big\rangle +\frac{1}{6}\big\langle {Q^{\prime\prime\prime }_{n}}(\widehat{\theta }_{n}),\varDelta {\theta _{n}^{3}}\big\rangle \bigg),\end{array}\]
for some $\widehat{\theta }_{n}$ and $\tilde{\theta }_{n}$ from the segment $[\theta _{0},\theta _{n}]$.
Proposition 9.
Under conditions (i) to (viii) for every admissible shift φ, one has that $\sqrt{n}\langle {Q^{\prime\prime }_{n}}(\theta _{0}),(\varDelta \theta _{n},\varphi )\rangle $ and $\sqrt{n}\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),(\varDelta \theta _{n},\varphi )\rangle $ are stochastically bounded.
Relying on this proposition we will be able to show that $\sqrt{n}\| \beta _{n}-\beta _{0}\| $ and $\sqrt{n}{\int _{0}^{\tau }}\hspace{-0.1667em}(\lambda _{n}-\lambda _{0})(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u$ are stochastically bounded and then prove the asymptotic normality of $\sqrt{n}\langle {Q^{\prime\prime }_{n}}(\theta _{0}),(\varDelta \theta _{n},\varphi )\rangle $.
Denote $\varTheta _{-}=\varTheta -\varTheta $; it is clearly compact and convex. Before proving the proposition, we show the following.
Lemma 10.
Under conditions (i) to (viii), $\sqrt{n}{Q^{\prime }_{n}}(\theta _{0})$ and $\sqrt{n}({Q^{\prime\prime }_{n}}(\theta _{0})-{q^{\prime\prime }_{\infty }}(\theta _{0}))$ converge in distribution in $C(\varTheta _{-})$ and $C({\varTheta _{-}^{2}})$, respectively. Moreover, for all $\theta =(\lambda ,\beta )\in \varTheta $, $\sqrt{n}(\frac{{\partial }^{3}Q_{n}}{\partial {\lambda }^{3}}(\theta )-\frac{{\partial }^{3}q_{\infty }}{\partial {\lambda }^{3}}(\theta ))$ converges in distribution in $C({\varTheta _{-}^{3}})$.
Proof of Lemma 10.
Here only convergence for $\sqrt{n}{Q^{\prime }_{n}}(\theta _{0})$ will be shown, because for $\sqrt{n}({Q^{\prime\prime }_{n}}(\theta _{0})-{q^{\prime\prime }_{\infty }}(\theta _{0}))$ and $\sqrt{n}(\frac{{\partial }^{3}Q_{n}}{\partial {\lambda }^{3}}(\theta )-\frac{{\partial }^{3}q_{\infty }}{\partial {\lambda }^{3}}(\theta ))$ the proof is similar. We note that ${q^{\prime }_{\infty }}(\theta _{0})=0$ and due to conditions (iii)–(iv) we have $\mathbf{E}[\sup _{\beta \in \varTheta _{\beta }}{e}^{2{\beta }^{\mathsf{T}}X}\| X{\| }^{k}]<\infty $ and $\mathbf{E}[\sup _{\beta \in \varTheta _{\beta }}{e}^{2{\beta }^{\mathsf{T}}U}\| U{\| }^{k}]<\infty $, for any $k\in \mathbb{N}$.
For $(\lambda ,\beta )\in \varTheta _{-}$ let
\[g(\lambda ,\beta ,Y,\varDelta ,W)=\big\langle {q^{\prime }}(Y,\varDelta ,W,\lambda _{0},\beta _{0}),(\lambda ,\beta )\big\rangle \]
and
\[\rho \big((\lambda _{1},\beta _{1}),(\lambda _{2},\beta _{2})\big)=\underset{u\in [0,\tau ]}{\sup }|\lambda _{1}(u)-\lambda _{2}(u)|+\| \beta _{1}-\beta _{2}\| .\]
$(\varTheta _{-},\rho )$ is a compact metric space. We denote by $Lip(\rho )$ a subspace of Lipschitz continuous functions on $\varTheta _{-}$ with respect to the metric ρ and by $\| \cdot \| _{\rho }$ the norm induced by ρ, that is for some fixed point $({\lambda }^{\ast },{\beta }^{\ast })\in \varTheta _{-}$ and for all $l\in Lip(\rho )$ we define:
\[\| l\| _{\rho }:=\underset{(\lambda _{1},\beta _{1})\ne (\lambda _{2},\beta _{2})}{\sup }\frac{|l(\lambda _{1},\beta _{1})-l(\lambda _{2},\beta _{2})|}{\rho ((\lambda _{1},\beta _{1}),(\lambda _{2},\beta _{2}))}+l\big({\lambda }^{\ast },{\beta }^{\ast }\big).\]
We apply Theorem 2 from [12]. It states that $\sqrt{n}{Q^{\prime }_{n}}(\theta _{0})$ converges in distribution in $C(\varTheta _{-})$ under the following conditions:
  • (1) $\mathbf{P}(g\in Lip(\rho ))=1$.
  • (2) $\mathbf{E}\| g{\| _{\rho }^{2}}<\infty $.
  • (3) $\int _{{0}^{+}}{H}^{\frac{1}{2}}(\varTheta _{-},u)\hspace{0.1667em}\mathrm{d}u<\infty $, where H is the ε-entropy on $(\varTheta _{-},\rho )$, i.e., $H(\varTheta _{-},u)=\log _{2}N(\varTheta _{-},u)$, where $N(\varTheta _{-},u)$ is the minimal number of balls of diameter not exceeding $2u$ that cover $\varTheta _{-}$.
Let $\varTheta _{\lambda _{-}}=\varTheta _{\lambda }-\varTheta _{\lambda }$ and $\varTheta _{\beta _{-}}=\varTheta _{\beta }-\varTheta _{\beta }$, so that $\varTheta _{-}=\varTheta _{\lambda _{-}}\times \varTheta _{\beta _{-}}$. Consider $\varTheta _{\lambda _{-}}$ and $\varTheta _{\beta _{-}}$ as compact metric spaces with the uniform and Euclidean norms, respectively. Then, since $N(\varTheta _{-},2u)\le N(\varTheta _{\beta _{-}},u)N(\varTheta _{\lambda _{-}},u)$, (3) is equivalent to
  • (3.1) $\int _{{0}^{+}}{H}^{\frac{1}{2}}(\varTheta _{\lambda _{-}},u)\hspace{0.1667em}\mathrm{d}u<\infty $, and
  • (3.2) $\int _{{0}^{+}}{H}^{\frac{1}{2}}(\varTheta _{\beta _{-}},u)\hspace{0.1667em}\mathrm{d}u<\infty $.
Since $\varTheta _{\beta _{-}}\subset {\mathbb{R}}^{k}$, we have $N(\varTheta _{\beta _{-}},u)<C{u}^{-k}$ for some constant $C>0$, and (3.2) is fulfilled. Note that $\varTheta _{\lambda _{-}}$ can be considered as a set of Lipschitz continuous functions that map the compact connected space $[0,\tau ]$ into some interval in $\mathbb{R}$. Lemma 1 from [8] implies
\[H(\varTheta _{\lambda _{-}},u)\ge 1+H(\varTheta _{\lambda _{-}},4u),\]
so that $\varTheta _{\lambda _{-}}$ is of “uniform type” (see [8]). According to Theorem 1 from [8], there exists a constant C such that
\[H(\varTheta _{\lambda _{-}},4L\varepsilon )\le CN\big([0,\tau ],\varepsilon \big).\]
For the space ${\mathbb{R}}^{1}$ we have that $N([0,\tau ],u)<\tilde{C}\frac{1}{u}$ for some constant $\tilde{C}$. Hence (3.1) holds.
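The finiteness of the entropy integral in (3.1) is easy to illustrate numerically. The following sketch, with an arbitrary illustrative constant $\tilde{C}$ (not a value from the paper), integrates ${H}^{1/2}(u)=\sqrt{\log _{2}(\tilde{C}/u)}$ over $[\varepsilon ,1]$ and shows that the value stabilizes as $\varepsilon \to {0}^{+}$:

```python
import numpy as np

# Illustrative check of condition (3.1): with covering number
# N([0, tau], u) <= C_TILDE / u, the entropy is H(u) = log2(C_TILDE / u),
# and the integral of H^{1/2} near 0 is finite.  C_TILDE is an
# arbitrary illustrative constant, not taken from the paper.
C_TILDE = 10.0

def entropy_integral(lower, n=100_000):
    """Trapezoidal approximation of int_{lower}^{1} sqrt(log2(C_TILDE/u)) du."""
    u = np.linspace(lower, 1.0, n)
    y = np.sqrt(np.log2(C_TILDE / u))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)

# The values stabilize as the lower limit shrinks, so the entropy
# integral converges (H grows only logarithmically near 0).
vals = [entropy_integral(10.0 ** (-k)) for k in range(2, 7)]
print(vals)
```

The point is that a logarithmic entropy is far below the $\int {H}^{1/2}<\infty $ threshold, which is why (3.1) holds for the one-dimensional index set $[0,\tau ]$.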
To verify (1) and (2) note that
\[\begin{array}{r@{\hskip0pt}l}\displaystyle g(\lambda ,\beta ,Y,\varDelta ,W)=& \displaystyle \frac{\varDelta \lambda (Y)}{\lambda _{0}(Y)}-\frac{{e}^{{\beta _{0}^{\mathsf{T}}}W}}{M_{U}(\beta _{0})}{\int _{0}^{Y}}\lambda \hspace{0.1667em}\mathrm{d}u+\varDelta {\beta }^{\mathsf{T}}W\\{} & \displaystyle +{\beta }^{\mathsf{T}}\frac{(M_{U}(\beta _{0})W-\mathbf{E}(U{e}^{{\beta _{0}^{\mathsf{T}}}U})){e}^{{\beta _{0}^{\mathsf{T}}}W}}{{M_{U}^{2}}(\beta _{0})}{\int _{0}^{Y}}\lambda _{0}\hspace{0.1667em}\mathrm{d}u,\end{array}\]
and conditions (i)–(ii) imply
\[\underset{(\lambda ,\beta )\in \varTheta _{-}}{\sup }\big\| {g^{\prime }}(\lambda ,\beta ,Y,\varDelta ,W)\big\| <\infty ,\]
where ${g^{\prime }}$ is considered as a bilinear operator on $C[0,\tau ]\times {\mathbb{R}}^{k}$. Hence, condition (1) is fulfilled. Moreover, there exists such a constant $K>0$ that
\[\| g(\lambda ,\beta ,Y,\varDelta ,W)\| _{\rho }<K\big(1+\| W\| +{e}^{D\| W\| }+\| W\| {e}^{D\| W\| }\big)\]
and due to conditions (iii) and (iv), condition (2) is also satisfied. Thus, the lemma is proved.  □
Returning to inequality (5.1), because $\varDelta \theta _{n}$ converges to zero a.s., one can conclude the following.
  • (a) $\sqrt{n}\langle {Q^{\prime }_{n}}(\theta _{0}),\varphi \rangle =O_{p}(1)$ and $\langle \sqrt{n}{Q^{\prime }_{n}}(\theta _{0}),\varDelta \theta _{n}\rangle =o_{p}(1)$, where $o_{p}(1)$ means convergence to zero in probability.
  • (b) $\sqrt{n}(\frac{{\partial }^{3}Q_{n}}{\partial {\lambda }^{3}}(\theta )-\frac{{\partial }^{3}q_{\infty }}{\partial {\lambda }^{3}}(\theta ))$ converges in probability in $C({\varTheta _{-}^{3}})$. Inequality (4.10) implies that $\sqrt{n}\langle \frac{{\partial }^{3}q_{\infty }}{\partial {\lambda }^{3}}(\theta ),(\varDelta {\theta _{n}^{2}},\varphi )\rangle $ is stochastically bounded, so is $\sqrt{n}\langle \frac{{\partial }^{3}Q_{n}}{\partial {\lambda }^{3}}(\theta ),(\varDelta {\theta _{n}^{2}},\varphi )\rangle $.
  • (c) $\sqrt{n}\langle ({Q^{\prime\prime }_{n}}(\theta _{0})-{q^{\prime\prime }_{\infty }}(\theta _{0})),\varDelta {\theta _{n}^{2}}\rangle $ and $\sqrt{n}\langle ({Q^{\prime\prime }_{n}}(\theta _{0})-{q^{\prime\prime }_{\infty }}(\theta _{0})),(\varDelta \theta _{n},\varphi )\rangle $ converge to zero in probability. Note that $\langle \sqrt{n}{Q^{\prime\prime }_{n}}(\theta _{0}),\varDelta {\theta _{n}^{2}}\rangle =O_{p}(1)$ if and only if $\sqrt{n}\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),\varDelta {\theta _{n}^{2}}\rangle =O_{p}(1)$. The latter equality can be easily derived from Theorem 5, formula (4.1) and convergence (4.13).
Proof of Proposition 9.
To prove the first part of the proposition one has to show that
(5.2)
\[\big\langle {Q^{\prime\prime\prime }_{n}}(\widehat{\theta }_{n}),\varDelta {\theta _{n}^{3}}\big\rangle =\frac{O_{p}(1)}{\sqrt{n}}\]
and
(5.3)
\[\big\langle {Q^{\prime\prime\prime }_{n}}(\tilde{\theta }_{n}),\big(\varDelta {\theta _{n}^{2}},\varphi \big)\big\rangle =\frac{O_{p}(1)}{\sqrt{n}}.\]
It is clear that (5.3) yields (5.2). After a series of computations, one can deduce that for some constants $C_{1}>0$, $C_{2}>0$,
(5.4)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg|\bigg\langle \frac{{\partial }^{3}q(Y,\varDelta ,W,\lambda ,\beta )}{\partial {\beta }^{3}},{(h_{\beta })}^{3}\bigg\rangle \bigg|\le C_{1}{e}^{D\| W\| }\| h_{\beta }{\| }^{3}\\{} & \displaystyle \bigg|\bigg\langle \frac{{\partial }^{3}q(Y,\varDelta ,W,\lambda ,\beta )}{\partial \lambda \partial {\beta }^{2}},(h_{\beta },h_{\beta },h_{\lambda })\bigg\rangle \bigg|\le C_{2}{e}^{D\| W\| }\| h_{\beta }{\| }^{2}\| h_{\lambda }\| .\end{array}\]
The expectations of the right-hand sides of the inequalities in (5.4) are finite. Together with $\sqrt{n}\| \beta _{n}-\beta _{0}{\| }^{2}=O_{p}(1)$ and the SLLN, this implies that $\langle \frac{{\partial }^{3}Q_{n}(\tilde{\theta }_{n})}{\partial {\beta }^{3}},(\varDelta {\theta _{n}^{2}},\varphi )\rangle $ and $\langle \frac{{\partial }^{3}Q_{n}(\tilde{\theta }_{n})}{\partial {\beta }^{2}\partial \lambda },(\varDelta {\theta _{n}^{2}},\varphi )\rangle $ are $\frac{O_{p}(1)}{\sqrt{n}}$. Noting that $\frac{{\partial }^{3}Q_{n}(\tilde{\theta }_{n})}{\partial \beta \partial {\lambda }^{2}}=0$, one can conclude that the first part of the proposition will be proven if one shows that
(5.5)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \bigg\langle \frac{{\partial }^{3}Q_{n}(\tilde{\theta }_{n})}{\partial {\lambda }^{3}},\big(\varDelta {\theta _{n}^{2}},\varphi \big)\bigg\rangle & \displaystyle =\bigg\langle \frac{{\partial }^{3}Q_{n}(\tilde{\lambda }_{n})}{\partial {\lambda }^{3}},\big((\lambda _{n}-\lambda _{0}),(\lambda _{n}-\lambda _{0}),\varphi _{\lambda }\big)\bigg\rangle \\{} & \displaystyle =\frac{1}{n}\sum \limits_{i=1}^{n}\varDelta _{i}\frac{{(\lambda _{n}-\lambda _{0})}^{2}(Y_{i})\varphi _{\lambda }(Y_{i})}{{\tilde{\lambda }_{n}^{3}}(Y_{i})}=\frac{O_{p}(1)}{\sqrt{n}}.\end{array}\]
From the definition of admissible shifts, $\varphi _{\lambda }$ belongs to Θ. Since $\tilde{\lambda }_{n}$ is bounded away from zero, there is a constant C such that
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg|\bigg\langle \frac{{\partial }^{3}Q_{n}(\tilde{\theta }_{n})}{\partial {\lambda }^{3}},\big(\varDelta {\theta _{n}^{2}},\varphi \big)\bigg\rangle \bigg|\le C\bigg|\frac{1}{n}\sum \limits_{i=1}^{n}\varDelta _{i}{(\lambda _{n}-\lambda _{0})}^{2}(Y_{i})\bigg|=\frac{O_{p}(1)}{\sqrt{n}},\end{array}\]
where the last equality holds due to conclusion (b). Thus, (5.5) holds. This completes the proof of the first part of the proposition. The second part is easily derived from conclusion (c).  □
Corollary 11.
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \sqrt{n}\| \beta _{n}-\beta _{0}\| =O_{p}(1),\\{} & \displaystyle \sqrt{n}{\int _{0}^{\tau }}\hspace{-0.1667em}(\lambda _{n}-\lambda _{0})(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u=O_{p}(1).\end{array}\]
Proof.
Let $h_{\beta }=\beta _{n}-\beta _{0}$, $h_{\lambda }=\lambda _{n}-\lambda _{0}$. Take some admissible shift $\varphi :=(\varphi _{\lambda },\varphi _{\beta })$. For this shift one has
(5.6)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle -\big\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),(\varDelta \theta _{n},\varphi )\big\rangle =& \displaystyle \mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\big({\varphi _{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\lambda _{0}(u)\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle +\mathbf{E}\bigg[\big({\varphi _{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}h_{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle +\mathbf{E}\bigg[\frac{\varDelta }{{\lambda _{0}^{2}}(Y)}h_{\lambda }(Y)\varphi _{\lambda }(Y)\bigg]\\{} & \displaystyle +\mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\varphi _{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg]\\{} \displaystyle =& \displaystyle O_{p}\bigg(\frac{1}{\sqrt{n}}\bigg).\end{array}\]
The idea is to find such $\varphi _{\lambda }$ that
(5.7)
\[\mathbf{E}\bigg[\big({\varphi _{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}h_{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg]+\mathbf{E}\bigg[\frac{\varDelta }{{\lambda _{0}^{2}}(Y)}h_{\lambda }(Y)\varphi _{\lambda }(Y)\bigg]=0.\]
Then after some calculations (using Lemma 6) one can see that (5.7) is equivalent to
(5.8)
\[\displaystyle {\int _{0}^{\tau }}h_{\lambda }{\varphi _{\beta }^{\mathsf{T}}}a(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u+{\int _{0}^{\tau }}\frac{h_{\lambda }}{\lambda _{0}}\varphi _{\lambda }b(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u=0.\]
One can take
\[\varphi _{\lambda }(u):=-\frac{{\varphi _{\beta }^{\mathsf{T}}}a(u)}{b(u)}\lambda _{0}(u)\]
as a solution to (5.8). Since $G_{T}(u|X)$ is a differentiable function of u, one can conclude that $\varphi _{\lambda }$ is an admissible shift for $\| \varphi _{\beta }\| $ small enough.
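As a quick sanity check, the following sketch verifies numerically that this choice of $\varphi _{\lambda }$ makes the two integrands in (5.8) cancel pointwise. All functions $h_{\lambda }$, $a(u)$, $b(u)$, $\lambda _{0}(u)$, $G_{C}(u)$ and the scalar $\varphi _{\beta }$ (i.e. $k=1$) are arbitrary illustrative stand-ins, not objects computed from the model:

```python
import numpy as np

# Pointwise cancellation in (5.8) for
# phi_lambda(u) = -phi_beta * a(u) * lambda_0(u) / b(u).
# All functions below are arbitrary illustrative stand-ins (k = 1).
u = np.linspace(0.0, 1.0, 2001)      # tau = 1, illustrative
h_lam = np.sin(3.0 * u) + 1.2        # arbitrary h_lambda
a = 0.7 + 0.1 * u                    # stand-in for a(u)
b = 1.5 + 0.2 * u ** 2               # stand-in for b(u) > 0
lam0 = 1.0 + 0.3 * u                 # stand-in baseline hazard
GC = np.exp(-u)                      # stand-in for G_C(u)
phi_beta = 0.4

phi_lam = -phi_beta * a / b * lam0   # the proposed solution to (5.8)

# Integrand of (5.8): the two terms cancel for every u, so the
# integrals sum to zero regardless of h_lambda.
integrand = (h_lam * phi_beta * a * GC
             + h_lam / lam0 * phi_lam * b * GC)
print(float(np.abs(integrand).max()))
```

Because the cancellation is pointwise, it does not depend on the particular direction $h_{\lambda }$, which is exactly what makes this choice of shift useful.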
Equation (5.6) is now equivalent to
(5.9)
\[{\int _{0}^{\tau }}{h_{\beta }^{\mathsf{T}}}T(u)\varphi _{\beta }\frac{\lambda _{0}(u)G_{C}(u)}{b(u)}\hspace{0.1667em}\mathrm{d}u=O_{p}\bigg(\frac{1}{\sqrt{n}}\bigg).\]
Using Hölder’s inequality and condition (vi), one can easily see that $T(u)$ is positive definite. Now let $\tilde{h}_{\beta }=\frac{\beta _{n}-\beta _{0}}{\| \beta _{n}-\beta _{0}\| }$ and take $\varphi _{\beta }=\frac{\tilde{h}_{\beta }}{C_{1}}$, where $C_{1}>0$ is such that $\varphi =(\varphi _{\lambda },\varphi _{\beta })$ is an admissible shift. Then (5.6) can be transformed to
(5.10)
\[\| h_{\beta }\| {\int _{0}^{\tau }}{\tilde{h}_{\beta }^{\mathsf{T}}}T(u)\tilde{h}_{\beta }\frac{\lambda _{0}(u)G_{C}(u)}{b(u)}\hspace{0.1667em}\mathrm{d}u=O_{p}\bigg(\frac{1}{\sqrt{n}}\bigg).\]
Since $\| \tilde{h}_{\beta }\| =1$ and $T(u)$ is positive definite, the left-hand side of (5.10) is greater than $\delta \| h_{\beta }\| $ for some $\delta >0$. By Lemma 8, the first part of the corollary is proved.
If now in (5.6) one takes $\varphi =(\frac{1}{C_{2}},0)$ for large enough $C_{2}>0$, then (5.6) takes the form
\[\mathbf{E}\bigg[\frac{\varDelta }{{\lambda _{0}^{2}}(Y)}h_{\lambda }(Y)\frac{1}{C_{2}}\bigg]+\mathbf{E}\bigg[\big({h_{\beta }}^{\mathsf{T}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\frac{1}{C_{2}}\hspace{0.1667em}\mathrm{d}u\bigg]=O_{p}\bigg(\frac{1}{\sqrt{n}}\bigg).\]
Due to $\sqrt{n}\| \beta _{n}-\beta _{0}\| =O_{p}(1)$, the latter equality implies
\[\mathbf{E}\bigg[\frac{\varDelta }{{\lambda _{0}^{2}}(Y)}h_{\lambda }(Y)\frac{1}{C_{2}}\bigg]=O_{p}\bigg(\frac{1}{\sqrt{n}}\bigg)\]
and the second part of the corollary holds.  □
We present the main result of this section.
Theorem 12.
Under conditions (i) to (ix), for all admissible shifts the following convergence in probability holds
(5.11)
\[\sqrt{n}\big\langle {Q^{\prime }_{n}}(\theta _{0}),\varphi \big\rangle +\frac{1}{2}\sqrt{n}\big\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),(\varDelta \theta _{n},\varphi )\big\rangle \stackrel{P}{\to }0.\]
Moreover, if φ is a non-random admissible shift then $\langle {Q^{\prime }_{n}}(\theta _{0}),\varphi \rangle \stackrel{d}{\to }N(0,{\sigma _{\varphi }^{2}})$, where ${\sigma _{\varphi }^{2}}=4\mathrm{Var}[\langle {q^{\prime }}(Y,\varDelta ,W,\lambda _{0},\beta _{0}),\varphi \rangle ]$, and
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \sqrt{n}\big\langle {Q^{\prime\prime }_{n}}(\theta _{0}),(\varDelta \theta _{n},\varphi )\big\rangle \stackrel{d}{\to }N\big(0,{\sigma _{\varphi }^{2}}\big),\\{} & \displaystyle \sqrt{n}\big\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),(\varDelta \theta _{n},\varphi )\big\rangle \stackrel{d}{\to }N\big(0,{\sigma _{\varphi }^{2}}\big).\end{array}\]
Proof.
Using Corollary 11 and inequality (5.1), one can repeat the proof of Proposition 9, with the remark that stochastic boundedness should be replaced by convergence to zero in probability. We use (4.2)–(4.4) to show $\sqrt{n}\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),\varDelta {\theta _{n}^{2}}\rangle =o_{p}(1)$. Thus, the convergence (5.11) is proved. The rest of the proof is straightforward.  □

6 Proof of Theorem 1

We assume that condition (ix) is satisfied. Thus A is positive definite and, consequently, invertible. Since $T(u)$ is positive definite, M is positive definite as well and therefore, invertible.
Note that due to conditions (vii) and (viii), Theorem 12 is valid for all non-random shifts $\varphi \in {\mathbb{R}}^{k}\times Lip_{1}([0,\tau ])$, where $Lip_{1}([0,\tau ])$ is a class of Lipschitz continuous functions. Take the shift as follows: $\varphi =({\varphi _{\beta }^{\mathsf{T}}}a(u)K(u),\varphi _{\beta })$, where $\varphi _{\beta }$ is a fixed vector of ${\mathbb{R}}^{k}$. For this shift one can rewrite $\langle {Q^{\prime }_{n}}(\theta _{0}),\varphi \rangle $ as
\[\big\langle {Q^{\prime }_{n}}(\theta _{0}),\varphi \big\rangle ={\varphi _{\beta }^{\mathsf{T}}}\xi _{n}.\]
By the CLT applied to $\xi _{n}$, one can see that the limit distribution of $\sqrt{n}\xi _{n}$ is in fact $N_{k}(0,\frac{1}{4}\varSigma _{\beta })$. Note that we have already encountered the shift φ in Corollary 11. In particular, (5.9) yields that $\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),(\varDelta \theta _{n},\varphi )\rangle $ can be rewritten as
\[{\int _{0}^{\tau }}{h_{\beta }^{\mathsf{T}}}T(u)\varphi _{\beta }K(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u={h_{\beta }^{\mathsf{T}}}M\varphi _{\beta }.\]
Theorem 12 and Cramér-Wold’s theorem yield that
\[\sqrt{n}\hspace{0.1667em}{h_{\beta }^{\mathsf{T}}}M\stackrel{d}{\to }N_{k}(0,\varSigma _{\beta }).\]
Since M is invertible, the convergence (2.1) is proved.
Now, for a fixed shift $\varphi _{\lambda }$ take such $\varphi _{\beta }$ that
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\big({\varphi _{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\lambda _{0}(u)\hspace{0.1667em}\mathrm{d}u\bigg]+\mathbf{E}\bigg[\big({h_{\beta }^{\mathsf{T}}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}\varphi _{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg]\\{} & \displaystyle \hspace{1em}={h_{\beta }^{\mathsf{T}}}\big(A\varphi _{\beta }+m(\varphi _{\lambda })\big)=0.\end{array}\]
Hence, $\varphi _{\beta }=-{A}^{-1}m(\varphi _{\lambda })$. From (5.6) it follows that
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle -\big\langle {q^{\prime\prime }_{\infty }}(\theta _{0}),(\varDelta \theta _{n},\varphi )\big\rangle \\{} & \displaystyle \hspace{1em}=\mathbf{E}\bigg[\big({\varphi _{\beta }}^{\mathsf{T}}X\big)\exp \big({\beta _{0}^{\mathsf{T}}}X\big){\int _{0}^{Y}}\hspace{-0.1667em}h_{\lambda }(u)\hspace{0.1667em}\mathrm{d}u\bigg]+\mathbf{E}\bigg[\frac{\varDelta }{{\lambda _{0}^{2}}(Y)}h_{\lambda }(Y)\varphi _{\lambda }(Y)\bigg]\\{} & \displaystyle \hspace{1em}={\int _{0}^{\tau }}\hspace{-0.1667em}h_{\lambda }(u){\varphi _{\beta }^{\mathsf{T}}}a(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u+{\int _{0}^{\tau }}\hspace{-0.1667em}h_{\lambda }(u)\frac{\varphi _{\lambda }}{K(u)}G_{C}(u)\hspace{0.1667em}\mathrm{d}u\\{} & \displaystyle \hspace{1em}={\int _{0}^{\tau }}\hspace{-0.1667em}h_{\lambda }(u)\bigg(-a{(u)}^{\mathsf{T}}\varphi _{\beta }+\frac{\varphi _{\lambda }}{K(u)}\bigg)G_{C}(u)\hspace{0.1667em}\mathrm{d}u.\end{array}\]
In view of Theorem 12 and the remark at the beginning of the proof, in order to show the convergence (2.4), one should show that the equation (2.3) has a Lipschitz continuous solution $\varphi _{\lambda }$. But if $\varphi _{\lambda }$ is a solution to (2.3) then
(6.1)
\[\varphi _{\lambda }(u)=K(u)f(u)+K(u){a}^{\mathsf{T}}(u)C\]
for some constant vector $C\in {\mathbb{R}}^{k}$ and is thus Lipschitz continuous. After substituting (6.1) into (2.3), we obtain
\[{a}^{\mathsf{T}}(u)\bigg[C-{\int _{0}^{\tau }}\hspace{-0.1667em}\big(f(u)+{a}^{\mathsf{T}}(u)C\big){A}^{-1}a(u)K(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg]=0.\]
Let $S={\int _{0}^{\tau }}\hspace{-0.1667em}f(u){A}^{-1}a(u)K(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u$ and $P(u)=\mathbf{E}[X{X}^{\mathsf{T}}\exp ({\beta _{0}^{\mathsf{T}}}X)G_{T}(u|X)]$. We show that it is possible to choose C so that
\[C-{\int _{0}^{\tau }}\hspace{-0.1667em}{a}^{\mathsf{T}}(u)C{A}^{-1}a(u)K(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u=S.\]
After transposing both sides and multiplying by A, we have
\[{C}^{\mathsf{T}}\bigg(A-{\int _{0}^{\tau }}\hspace{-0.1667em}a(u){a}^{\mathsf{T}}(u)K(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u\bigg)={S}^{\mathsf{T}}A.\]
Transformation of $R:=A-{\int _{0}^{\tau }}\hspace{-0.1667em}a(u){a}^{\mathsf{T}}(u)K(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u$ leads to
\[R={\int _{0}^{\tau }}\hspace{-0.1667em}\lambda _{0}(u)\bigg(P(u)-\frac{a(u){a}^{\mathsf{T}}(u)}{b(u)}\bigg)G_{C}(u)\hspace{0.1667em}\mathrm{d}u.\]
In the proof of Corollary 11 it was shown that $P(u)-\frac{a(u){a}^{\mathsf{T}}(u)}{b(u)}=\frac{T(u)}{b(u)}$ is a positive definite matrix. Therefore, R is positive definite and, in particular, invertible. Hence, (2.3) has a unique solution, and convergence (2.4) holds. This completes the proof.
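The construction in the proof is fully explicit, which makes it easy to illustrate numerically. The sketch below treats the scalar case $k=1$ with invented toy choices of $a(u)$, $K(u)$, $G_{C}(u)$, $f(u)$, and A (none derived from the model): it forms $C=AS/R$ as above, builds $\varphi _{\lambda }$ by (6.1), and checks that (2.3) holds with $m(g)={\int _{0}^{\tau }}g(u)a(u)G_{C}(u)\hspace{0.1667em}\mathrm{d}u$, up to quadrature error.

```python
# Toy numerical check (k = 1) of the closed form (6.1) for the Fredholm
# equation (2.3): phi/K(u) - a(u) A^{-1} m(phi) = f(u), where
# m(g) = int_0^tau g(u) a(u) G_C(u) du.
# All concrete choices below (a, K, G_C, f, A, tau) are invented for
# illustration only; they are not the Cox-model quantities of the paper.
import numpy as np

tau = 1.0
u = np.linspace(0.0, tau, 2001)

a = 1.0 + u            # toy a(u)
K = 0.5 + u            # toy K(u), positive on [0, tau]
G_C = np.exp(-u)       # toy censor survival function G_C(u)
f = np.cos(u)          # Lipschitz continuous test function f(u)
A = 2.0                # toy positive "matrix" A (a scalar since k = 1)

def integral(g):
    """Trapezoidal quadrature on the grid u."""
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(u)))

S = integral(f * a * K * G_C) / A    # S from the proof (scalar case)
R = A - integral(a * a * K * G_C)    # R; must be nonzero (here > 0)
C = A * S / R                        # scalar solution of C^T R = S^T A

phi = K * f + K * a * C              # candidate solution (6.1)

# Residual of (2.3); it vanishes up to floating-point error because the
# same quadrature rule is used for every integral.
residual = phi / K - a * integral(phi * a * G_C) / A - f
print("R =", R, " max residual =", np.max(np.abs(residual)))
```

For any other positive choices of K and $G_{C}$ and Lipschitz f, the residual stays at machine-precision level as long as $R\ne 0$, mirroring the invertibility argument for R above.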

7 Conclusion

Here we studied the properties of the Corrected MLE $(\lambda _{n},\beta _{n})$ proposed by Kukush et al. [7] for the Cox proportional hazards model with measurement error. Asymptotic normality was obtained for $\beta _{n}$ and for integral functionals of $\lambda _{n}$. We also presented the estimator $(\widehat{\lambda }_{n},\widehat{\beta }_{n})$, which inherits the asymptotic properties of $(\lambda _{n},\beta _{n})$ and reduces the maximization problem to a parametric one.
In future work we intend to carry out simulation studies for this model.

References

[1] 
Andersen, P.K., Gill, R.D.: Cox’s regression model for counting processes: a large sample study. Ann. Stat. 10(4), 1100–1120 (1982). MR0673646
[2] 
Augustin, T.: An exact corrected log-likelihood function for Cox’s proportional hazards model under measurement error and some extensions. Scand. J. Stat. 31, 43–50 (2004). MR2042597
[3] 
Dorogovtsev, A.Ya., Kukush, A.G.: Asymptotic properties of a nonparametric intensity estimator of a heterogeneous Poisson process. Cybern. Syst. Anal. 32(1), 74–85 (1996). MR1409534
[4] 
Gu, M., Kong, F.H.: Consistent estimation in Cox proportional hazards model with covariate measurement errors. Stat. Sin. 9, 953–969 (1999). MR1744820
[5] 
Hu, P., Tsiatis, A.A., Davidian, M.: Estimating the parameters in the Cox model when covariate variables are measured with error. Biometrics 54(4), 1407–1419 (1998). MR1671598
[6] 
Kim, Y., Kim, B., Jang, W.: Asymptotic properties of the maximum likelihood estimator for the proportional hazards model with doubly censored data. J. Multivar. Anal. 101(6), 1339–1351 (2010). MR2609496
[7] 
Kukush, A.G., Baran, S., Fazekas, I., Usoltseva, E.: Simultaneous estimation of baseline hazard rate and regression parameters in Cox proportional hazards model with measurement error. J. Stat. Res. 45(2), 77–94 (2011). MR2934363
[8] 
Potapov, V.N.: ε-Entropy of compact sets in C and tabulation of continuous functions. Sib. Math. J. 38, 759–771 (1997). MR1474919
[9] 
Pupashenko, D.S., Shklyar, S.V., Kukush, A.G.: Asymptotic properties of Corrected Score estimator in autoregressive model with measurement errors. Teor. Imovir. Mat. Stat. 89, 156–166 (2013)
[10] 
Royston, P.: Estimating a smooth baseline hazard function for the Cox model. Research Report No. 314, Department of Statistical Science, University College, London (2011)
[11] 
Schoenberg, I.J.: Spline functions and the problem of graduation. Proc. Natl. Acad. Sci. USA 52(4), 947–950 (1964). MR0167768
[12] 
Zinn, J.: A note on the central limit theorem in Banach spaces. Ann. Probab. 5, 283–286 (1977). MR0443026
Copyright
© 2014 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Asymptotic normality of estimators; classical measurement error; Corrected Maximum Likelihood Estimator; Cox proportional hazards model; estimator of baseline hazards function

MSC2010
62N02, 62N01, 62J12
