1 Introduction
Survival analysis is a set of statistical methods for analyzing data that represent times to the occurrence of a specified event. It is an important part of mathematical statistics due to its wide range of applications in medicine, reliability theory, and other fields.
The Cox proportional hazards (CPH) model is a semi-parametric regression model used to study the association between the survival time of subjects (the so-called lifetime) and one or more predictor variables (the so-called covariates). One of the main features of survival analysis is the presence of incomplete observations, or censoring: only partial information about the true lifetime is available, e.g., that it exceeds some value in the case of right censoring. The primary quantities of interest are the hazard function and the regression parameter. The hazard function represents the instantaneous rate at which events occur at a particular time, given that the individual has survived up to that time. The vector of regression parameters represents the effects of the covariates on the hazard.
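In symbols, the hazard function of a lifetime T is
\[ \lambda (t)=\underset{h\downarrow 0}{\lim }\frac{\mathsf{P}(t\le T\lt t+h\hspace{2.5pt}|\hspace{2.5pt}T\ge t)}{h},\hspace{1em}t\ge 0,\]
and the cumulative hazard is $\Lambda (t)={\textstyle\int _{0}^{t}}\lambda (u)\hspace{0.1667em}du$.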
D. R. Cox [4] estimated the regression parameter using partial likelihood, without additional assumptions on the baseline hazard rate. Estimators of the cumulative hazard were proposed in [9] and [3]; in these papers, the baseline hazard rate is assumed to belong to a parametric family (piecewise constant on certain intervals), and the covariates are observed without errors.
Nonparametric inference under shape constraints has been an active research field in recent decades. In this framework, the estimated parameters or functions are constrained to satisfy certain shape properties, such as monotonicity, convexity, or log-concavity. The development of nonparametric methods for the estimation of a monotone density started with the pioneering paper of Grenander [7]. The 2018 special issue of Statistical Science was devoted to inference under shape constraints. A review of recent progress in log-concave density estimation is given in [16]. A review of methods for shape-constrained baseline hazard functions in the case of censored data is presented in [8]. A monotone baseline hazard rate in the Cox model is considered in [14] and [5]. A different variation of the shape-constrained Cox model, with an application to breast cancer patients’ survival, is considered in [15].
One should be cautious when making regression inference if the covariates may be measured with errors. It is known that applying a naive methodology may lead to inconsistent estimation, see Wallace [17]. The CPH model with measurement errors is studied, among others, by Kong and Gu [10] and Augustin [1]. Typically, the vector of regression parameters is estimated first, and then an estimator of the cumulative baseline hazard rate is constructed.
In Kukush et al. [11] the baseline hazard rate belongs to a bounded set of nonnegative Lipschitz functions, and is estimated simultaneously with the vector of regression parameters. This approach is further developed in Kukush and Chernova [12], where the baseline hazard rate belongs to an unbounded set of nonnegative Lipschitz functions.
In all the aforementioned papers, the measurement errors are assumed to be independent and identically distributed. In practice, measurement errors can be expected to vary considerably among different subjects. For example, Augustin et al. [2] proposed a regression calibration estimation method for the CPH model under heteroscedastic measurement errors, applied to nutritional data.
In the present paper, we consider a CPH model similar to that of [11] and [12], but with heteroscedastic measurement errors. The paper is organized as follows. Section 2 describes the observation model and states the main assumptions. In Section 3 we construct a simultaneous consistent estimator $({\hat{\lambda }_{n}},{\hat{\beta }_{n}})$ of the baseline hazard rate and the regression parameter in the CPH model with heteroscedastic errors in the covariates, under a bounded parameter set. In Section 4 we do the same for an unbounded parameter set, and Section 5 concludes.
2 Model description
The Cox proportional hazards model introduced in [4] assumes that the lifetime T of a subject with a random vector of covariates X has the following hazard rate at time t:
\[ \lambda (t|\mathbf{X};{\lambda _{0}},{\beta _{0}})={\lambda _{0}}(t)\exp ({\beta _{0}^{\top }}\mathbf{X}),\hspace{1em}t\ge 0.\]
Here, ${\beta _{0}}$ is a regression parameter belonging to ${\Theta _{\beta }}\subset {\mathbb{R}^{k}}$, and ${\lambda _{0}}(\cdot )\in {\Theta _{\lambda }}\subset C[0,\tau ]$ is a baseline hazard function. In the unbounded case, the parameter set ${\Theta _{\lambda }}$ consists of all nonnegative functions with a fixed Lipschitz constant. Instead of the lifetime T, right-censored data $Y:=\min \{T,C\}$ and $\Delta :={I_{\{T\le C\}}}$ are available. The censor C has an unknown distribution concentrated on a given interval $[0,\tau ]$. The pair $(\mathbf{X},T)$ and the random variable C are independent. Assume that, instead of the true covariates, one can only observe the surrogate vector variables
\[ \mathbf{{W_{i}}}=\mathbf{{X_{i}}}+\mathbf{{U_{i}}},\hspace{1em}i=1,2,\dots ,n,\]
where the (not necessarily identically distributed) measurement errors $\mathbf{{U_{i}}},i=1,2,\dots ,n$, are mutually independent centered random vectors that are also independent of the random sequence $(\mathbf{{X_{i}}},{T_{i}},{C_{i}},{Y_{i}},{\Delta _{i}}),i=1,2,\dots ,n$. The moment generating functions ${M_{\mathbf{{U_{i}}}}}(z):=\mathsf{E}\hspace{2.5pt}{e^{{z^{\top }}\mathbf{{U_{i}}}}}$ of the measurement errors $\mathbf{{U_{i}}}$ are assumed to be known. The goal is to estimate ${\beta _{0}}$ and ${\lambda _{0}}$ based on the observations $\left({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}}\right),i=1,\dots ,n$.
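For example, for Gaussian errors (a standard special case, used here only for illustration), the MGF is explicit:
\[ \mathbf{{U_{i}}}\sim N(0,{\Sigma _{i}})\hspace{1em}\Longrightarrow \hspace{1em}{M_{\mathbf{{U_{i}}}}}(z)=\exp \left(\frac{1}{2}{z^{\top }}{\Sigma _{i}}z\right),\hspace{1em}z\in {\mathbb{R}^{k}},\]
so heteroscedasticity enters only through the individual covariance matrices ${\Sigma _{i}}$.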
Introduce the following assumptions.
- (i) ${\Theta _{\lambda }}:=\{\hspace{2.5pt}f:[0,\tau ]\to \mathbb{R}\hspace{2.5pt}|\hspace{0.2778em}f(t)\ge a,\hspace{2.5pt}\forall t\in [0,\tau ],\hspace{0.2778em}f(0)\le A,\hspace{0.2778em}\text{and}\hspace{2.5pt}|f(t)-f(s)|\le L|t-s|,\forall t,s\in [0,\tau ]\hspace{2.5pt}\}$, where a, A and L are fixed positive constants, with $a\lt A$.
- (i’) ${\Theta _{\lambda }}:=\{\hspace{2.5pt}f:[0,\tau ]\to \mathbb{R}\hspace{2.5pt}|\hspace{0.2778em}f(t)\ge 0,\hspace{2.5pt}\forall t\in [0,\tau ],\hspace{0.2778em}\text{and}\hspace{2.5pt}|f(t)-f(s)|\le L|t-s|,\forall t,s\in [0,\tau ]\hspace{2.5pt}\}$, where L is a fixed positive constant.
- (ii) ${\Theta _{\boldsymbol{\beta }}}$ is a compact set in ${\mathbb{R}^{k}}$.
- (iii) There exist positive K and $\varepsilon $ such that, for all $n\ge 1$,
\[ \mathsf{E}\hspace{2.5pt}{e^{2D\| \mathbf{{U_{n}}}\| }}\le K,\hspace{1em}\text{where}\hspace{2.5pt}D:=\underset{\boldsymbol{\beta }\in {\Theta _{\boldsymbol{\beta }}}}{\max }\| \boldsymbol{\beta }\| +\varepsilon .\]
- (iv) $\mathsf{E}\hspace{2.5pt}{e^{2D\| \mathbf{X}\| }}\lt \infty $, with D defined in (iii).
- (v) $\tau \gt 0$ is the right endpoint of the censor’s distribution, i.e. $\mathsf{P}(C\gt \tau )=0$ and, for all $\epsilon \gt 0$, $\mathsf{P}(C\gt \tau -\epsilon )\gt 0$.
- (vi) The matrix $\mathsf{E}\hspace{2.5pt}\mathbf{X}{\mathbf{X}^{\top }}$ of second moments is positive definite.
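For instance, for the Gaussian errors of the example above (a special case, not required by the model), condition (iii) holds whenever the covariance matrices are uniformly bounded: if ${\Sigma _{i}}\le {\sigma ^{2}}{I_{k}}$ for all i, then, writing $\mathbf{{U_{i}}}={\Sigma _{i}^{1/2}}Z$ with $Z\sim N(0,{I_{k}})$,
\[ \mathsf{E}\hspace{2.5pt}{e^{2D\| \mathbf{{U_{i}}}\| }}\le \mathsf{E}\hspace{2.5pt}{e^{2D\sigma \| Z\| }}\lt \infty ,\]
with a bound that does not depend on i.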
The likelihood construction in the presence of censored data is described in [13]. In the case where the covariates are observed without errors, the log-likelihood function is given by
\[\begin{aligned}{}{Q_{n}}(\lambda ,\boldsymbol{\beta })& :=\frac{1}{n}{\sum \limits_{i=1}^{n}}q({Y_{i}},{\Delta _{i}},{\mathbf{X}_{i}};\lambda ,\boldsymbol{\beta }),\hspace{1em}\text{with}\\ {} q({Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}};\lambda ,\boldsymbol{\beta })& :={\Delta _{i}}\cdot (\log \lambda ({Y_{i}})+{\boldsymbol{\beta }^{\top }}\mathbf{{X_{i}}})-\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{X_{i}}}){\int _{0}^{{Y_{i}}}}\lambda (u)du.\end{aligned}\]
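The form of q is the standard right-censored log-likelihood: a subject failing at ${Y_{i}}$ contributes the log-density, a censored subject the log-survival function. Writing $S(t|\mathbf{X})=\exp (-\Lambda (t|\mathbf{X}))$ with $\Lambda (t|\mathbf{X})=\exp ({\boldsymbol{\beta }^{\top }}\mathbf{X}){\textstyle\int _{0}^{t}}\lambda (u)\hspace{0.1667em}du$, and dropping the terms involving the censoring distribution (they do not depend on $(\lambda ,\boldsymbol{\beta })$),
\[ {\Delta _{i}}\log f({Y_{i}}|\mathbf{{X_{i}}})+(1-{\Delta _{i}})\log S({Y_{i}}|\mathbf{{X_{i}}})={\Delta _{i}}\log \lambda ({Y_{i}}|\mathbf{{X_{i}}})-\Lambda ({Y_{i}}|\mathbf{{X_{i}}})=q({Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}};\lambda ,\boldsymbol{\beta }).\]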
To adjust for measurement errors, we use the following corrected objective function, which extends the one proposed by T. Augustin [1] for homoscedastic errors:
\[ {Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta }):=\frac{1}{n}{\sum \limits_{i=1}^{n}}{q^{cor}}({Y_{i}},{\Delta _{i}},{\mathbf{W}_{i}};\lambda ,\boldsymbol{\beta }),\]
with
\[ {q^{cor}}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta }):={\Delta _{i}}\cdot (\log \lambda ({Y_{i}})+{\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})-\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)du.\]
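For concreteness, here is a minimal numerical sketch of ${Q_{n}^{cor}}$, not part of the paper itself: it assumes Gaussian errors (so the MGF is the explicit expression above), represents λ by its values on a grid with piecewise-linear interpolation, and uses illustrative names such as corrected_loglik.

```python
import numpy as np

def corrected_loglik(beta, lam_nodes, t_grid, Y, Delta, W, Sigma):
    """Corrected objective Q_n^cor(lambda, beta) for Gaussian errors.

    lam_nodes -- values of the baseline hazard lambda at the points t_grid
                 (lambda is treated as piecewise linear in between);
    Sigma     -- (n, k, k) array of error covariances, so that
                 M_{U_i}(beta) = exp(beta' Sigma_i beta / 2).
    """
    total = 0.0
    for i in range(len(Y)):
        lam_Yi = np.interp(Y[i], t_grid, lam_nodes)          # lambda(Y_i)
        # cumulative baseline hazard int_0^{Y_i} lambda(u) du (trapezoid rule)
        mask = t_grid <= Y[i]
        tt = np.append(t_grid[mask], Y[i])
        ll = np.append(lam_nodes[mask], lam_Yi)
        cum_haz = np.sum((tt[1:] - tt[:-1]) * (ll[1:] + ll[:-1]) / 2.0)
        log_mgf = 0.5 * beta @ Sigma[i] @ beta               # log M_{U_i}(beta)
        total += (Delta[i] * (np.log(lam_Yi) + beta @ W[i])
                  - np.exp(beta @ W[i] - log_mgf) * cum_haz)
    return total / len(Y)

# Toy data: lambda_0 = 1 on [0, tau], heteroscedastic Gaussian errors.
rng = np.random.default_rng(0)
n, k, tau = 200, 2, 1.0
beta0 = np.array([0.5, -0.3])
X = rng.normal(size=(n, k))
T = rng.exponential(np.exp(-X @ beta0))                      # Lambda_0(t) = t
C = rng.uniform(0.0, tau, size=n)
Y, Delta = np.minimum(T, C), (T <= C).astype(float)
Sigma = np.stack([rng.uniform(0.01, 0.1) * np.eye(k) for _ in range(n)])
U = np.stack([rng.multivariate_normal(np.zeros(k), S) for S in Sigma])
W = X + U
t_grid = np.linspace(0.0, tau, 21)
print(corrected_loglik(beta0, np.ones_like(t_grid), t_grid, Y, Delta, W, Sigma))
```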
We have
\[ \mathsf{E}[{q^{cor}}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}})\hspace{2.5pt}|\hspace{2.5pt}{Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}}]=q({Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}}).\]
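Indeed, since $\mathbf{{W_{i}}}=\mathbf{{X_{i}}}+\mathbf{{U_{i}}}$, where $\mathbf{{U_{i}}}$ is centered and independent of $({Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}})$,
\[ \mathsf{E}[{\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}}\mid {Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}}]={\boldsymbol{\beta }^{\top }}\mathbf{{X_{i}}},\hspace{1em}\mathsf{E}\left[\frac{{e^{{\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}}}}}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}\hspace{0.2778em}\Big|\hspace{0.2778em}{Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}}\right]=\frac{{e^{{\boldsymbol{\beta }^{\top }}\mathbf{{X_{i}}}}}\hspace{0.1667em}\mathsf{E}\hspace{2.5pt}{e^{{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}}}}}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}={e^{{\boldsymbol{\beta }^{\top }}\mathbf{{X_{i}}}}},\]
so the division by ${M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })$ exactly removes the bias caused by the unobserved $\mathbf{{U_{i}}}$.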
Therefore,
\[ \mathsf{E}\hspace{2.5pt}{q^{cor}}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}})=\mathsf{E}\hspace{2.5pt}q({Y_{i}},{\Delta _{i}},\mathbf{{X_{i}}})=\mathsf{E}\hspace{2.5pt}q({Y_{1}},{\Delta _{1}},\mathbf{{X_{1}}}).\]
The latter equality means that $\mathsf{E}[{q^{cor}}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })]$ does not depend on i. Denote
\[ {q_{\infty }}(\lambda ,\boldsymbol{\beta }):=\frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\hspace{2.5pt}{q^{cor}}({Y_{i}},{\Delta _{i}},{\mathbf{W}_{i}};\lambda ,\boldsymbol{\beta })=\mathsf{E}\hspace{2.5pt}{q^{cor}}({Y_{1}},{\Delta _{1}},\mathbf{{W_{1}}};\lambda ,\boldsymbol{\beta }).\]
We will make use of both forms of ${q_{\infty }}$. We define a simultaneous estimator of the baseline hazard rate and the regression parameter, under bounded and unbounded parameter sets, as follows.
Definition 1 (under bounded parameter set).
A Borel function $\left(\hat{\lambda },\hat{\boldsymbol{\beta }}\right)=\left({\hat{\lambda }_{n}},{\hat{\boldsymbol{\beta }}_{n}}\right)$ of observations $({Y_{i}},{\Delta _{i}},{\mathbf{W}_{i}})$, $i=1,\dots ,n$, with values in $\Theta ={\Theta _{\beta }}\times {\Theta _{\lambda }}$, where ${\Theta _{\lambda }}$ is bounded, and such that
(1)
\[ {Q_{n}^{cor}}\left({\hat{\lambda }_{n}},{\hat{\boldsymbol{\beta }}_{n}}\right)=\underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\max }{Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta }),\]
is called a simultaneous estimator of the baseline hazard rate and the regression parameter under the bounded parameter set Θ.
Definition 2 (under unbounded parameter set).
Let $\{{\varepsilon _{n}}\}$ be a fixed sequence of positive numbers such that ${\varepsilon _{n}}\downarrow 0$ as $n\to \infty $. A Borel function $\left(\hat{\lambda },\hat{\boldsymbol{\beta }}\right)=\left({\hat{\lambda }_{n}},{\hat{\boldsymbol{\beta }}_{n}}\right)$ of observations $({Y_{i}},{\Delta _{i}},{\mathbf{W}_{i}})$, $i=1,\dots ,n$, with values in $\Theta ={\Theta _{\beta }}\times {\Theta _{\lambda }}$, where ${\Theta _{\lambda }}$ is unbounded, and such that
(2)
\[ {Q_{n}^{cor}}\left({\hat{\lambda }_{n}},{\hat{\boldsymbol{\beta }}_{n}}\right)\ge \underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }{Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta })-{\varepsilon _{n}},\]
is called a simultaneous estimator of the baseline hazard rate and the regression parameter over the unbounded parameter set Θ.
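In practice, the maximum in Definition 1 can be approximated by discretizing λ on a grid and passing the (linear) Lipschitz and bound constraints from assumption (i) to a generic solver. A sketch, reusing corrected_loglik and the toy data from the earlier snippet (the box for β and all constants are illustrative):

```python
from scipy.optimize import minimize
import numpy as np

m = len(t_grid)
h = t_grid[1] - t_grid[0]
a, A, L_const = 0.05, 5.0, 5.0      # constants a, A, L from assumption (i)

# |lam_{j+1} - lam_j| <= L*h, written as two families of linear inequalities.
cons = [
    {"type": "ineq", "fun": lambda th: L_const * h - np.diff(th[:m])},
    {"type": "ineq", "fun": lambda th: L_const * h + np.diff(th[:m])},
]
# lam(0) <= A and lam >= a (hence lam <= A + L*tau on [0, tau]); beta in a box.
bounds = [(a, A)] + [(a, A + L_const * tau)] * (m - 1) + [(-2.0, 2.0)] * k

neg_Q = lambda th: -corrected_loglik(th[m:], th[:m], t_grid, Y, Delta, W, Sigma)
theta0 = np.concatenate([np.ones(m), np.zeros(k)])
res = minimize(neg_Q, theta0, method="SLSQP", bounds=bounds, constraints=cons)
lam_hat, beta_hat = res.x[:m], res.x[m:]
print("beta_hat =", beta_hat)
```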
3 Simultaneous estimation under bounded parameter set
We extend the result on consistency of a simultaneous estimator of the regression parameter and the baseline hazard rate under a bounded parameter set from Kukush et al. [11] to the case of heteroscedastic measurement errors. In the next section we proceed with a similar result for the unbounded parameter set. The main result of this section is the following theorem.
Theorem 1.
Under (i)–(vi), $\left(\hat{\lambda },\hat{\boldsymbol{\beta }}\right)$ defined in (1) is a strongly consistent estimator of the true parameters $\left({\lambda _{0}},{\boldsymbol{\beta }_{0}}\right)$.
In what follows we rely on the following version of the Strong Law of Large Numbers for not necessarily identically distributed random variables, and on the easy-to-prove Statement 1 below.
Theorem (Kolmogorov’s Strong Law of Large Numbers, Section 10.7 in [6]).
Let ${\{{\xi _{n}}\}_{n\ge 1}}$ be independent random variables with finite variances such that ${\textstyle\sum _{n=1}^{\infty }}\mathrm{Var}({\xi _{n}})/{n^{2}}\lt \infty $. Then
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}\left({\xi _{i}}-\mathsf{E}\hspace{2.5pt}{\xi _{i}}\right)\to 0\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty .\]
Statement 1.
Let ${\{{s_{n}}\}_{n\ge 1}}$ be a real-valued sequence such that $\frac{1}{n}{\textstyle\sum _{i=1}^{n}}{s_{i}}$ is bounded. Then ${\textstyle\sum _{n=1}^{\infty }}{s_{n}}/{n^{2}}$ converges.
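For completeness, Statement 1 follows by Abel’s summation: with ${S_{n}}:={\textstyle\sum _{i=1}^{n}}{s_{i}}$ and $|{S_{n}}|\le Kn$,
\[ {\sum \limits_{n=1}^{N}}\frac{{s_{n}}}{{n^{2}}}={\sum \limits_{n=1}^{N-1}}{S_{n}}\left(\frac{1}{{n^{2}}}-\frac{1}{{(n+1)^{2}}}\right)+\frac{{S_{N}}}{{N^{2}}},\hspace{1em}|{S_{n}}|\left(\frac{1}{{n^{2}}}-\frac{1}{{(n+1)^{2}}}\right)\le \frac{3K}{{n^{2}}},\]
so the series on the right converges absolutely and ${S_{N}}/{N^{2}}\to 0$.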
Remark 1.
By K we will denote any positive deterministic constant the exact value of which is not important. Note that K may change from line to line (or even within one line).
Proof of Theorem 1.
Similarly to the proof of Theorem 1 from [11], one can show the strong consistency of the estimator if
- (a) $\underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }|{Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta })-{q_{\infty }}(\lambda ,\boldsymbol{\beta })|\to 0\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty $;
- (b) ${q_{\infty }}(\lambda ,\boldsymbol{\beta })\le {q_{\infty }}({\lambda _{0}},{\boldsymbol{\beta }_{0}})$, and equality holds if and only if $\lambda ={\lambda _{0}}$, $\boldsymbol{\beta }={\boldsymbol{\beta }_{0}}$.
Denote by $\frac{\partial {q^{cor}}}{\partial \boldsymbol{\beta }}$ the derivative of ${q^{cor}}$ with respect to the vector $\boldsymbol{\beta }$. For a fixed value of $\boldsymbol{\beta }$, consider ${q^{cor}}$ as a function of λ, i.e. ${q^{cor}}(\cdot ,\boldsymbol{\beta }):C[0,\tau ]\to \mathbb{R}$, and denote by $\frac{\partial {q^{cor}}}{\partial \lambda }$ the Fréchet derivative of ${q^{cor}}$ with respect to the function λ; at each point it is a continuous linear functional on $C[0,\tau ]$. For $h\in C[0,\tau ]$, let $\left\langle \frac{\partial {q^{cor}}}{\partial \lambda },h\right\rangle $ denote the action of this functional on h. We have
\[ \left\langle \frac{\partial {q^{cor}}}{\partial \lambda }(Y,\Delta ,\mathbf{W};\lambda ,\boldsymbol{\beta }),h\right\rangle =\frac{\Delta h(Y)}{\lambda (Y)}-\frac{{e^{{\boldsymbol{\beta }^{\top }}\mathbf{W}}}}{{M_{\mathbf{U}}}(\boldsymbol{\beta })}{\int _{0}^{Y}}h(u)\hspace{0.1667em}du,\]
\[ \left|\left|\frac{\partial {q^{cor}}}{\partial \lambda }(Y,\Delta ,\mathbf{W};\lambda ,\boldsymbol{\beta })\right|\right|=\underset{\left|\left|h\right|\right|=1}{\sup }\left\langle \frac{\partial {q^{cor}}}{\partial \lambda }(Y,\Delta ,\mathbf{W};\lambda ,\boldsymbol{\beta }),\hspace{2.5pt}h\right\rangle ,\]
\[ \frac{\partial {q^{cor}}}{\partial \boldsymbol{\beta }}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })={\Delta _{i}}\cdot \mathbf{{W_{i}}}-\frac{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })\mathbf{{W_{i}}}-\mathsf{E}(\mathbf{{U_{i}}}{e^{{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}}}})}{{M_{\mathbf{{U_{i}}}}^{2}}(\boldsymbol{\beta })}\hspace{0.1667em}{e^{{\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}}}}{\int _{0}^{{Y_{i}}}}\lambda (u)\hspace{0.1667em}du.\]
According to [11], in order to verify (a) it suffices to show:
- (a1) ${Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta })-{q_{\infty }}(\lambda ,\boldsymbol{\beta })\to 0\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty ,\hspace{2.5pt}\text{for all}\hspace{2.5pt}(\lambda ,\boldsymbol{\beta })\in \Theta $;
- (a2) there exists a positive constant K such that
(4)
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }\left|\left|\frac{\partial {q^{cor}}}{\partial \boldsymbol{\beta }}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })\right|\right|\le K,\]
(5)
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }\left|\left|\frac{\partial {q^{cor}}}{\partial \lambda }({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })\right|\right|\le K;\]
- (a3) ${q_{\infty }}(\lambda ,\boldsymbol{\beta })$ is continuous in $(\lambda ,\boldsymbol{\beta })$.
To investigate when condition (a1) holds, rewrite
\[\begin{aligned}{}{Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta })=& \frac{1}{n}{\sum \limits_{i=1}^{n}}{\Delta _{i}}\cdot \log \lambda ({Y_{i}})+\frac{1}{n}{\sum \limits_{i=1}^{n}}{\Delta _{i}}{\boldsymbol{\beta }^{\top }}\mathbf{{X_{i}}}+\frac{1}{n}{\sum \limits_{i=1}^{n}}{\Delta _{i}}{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}}-\\ {} & -\frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)\hspace{0.1667em}du.\end{aligned}\]
The first and second summands converge to their expectations by the SLLN. Consider the third summand. It holds that $\mathrm{Var}({\Delta _{i}}{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}})\le \mathsf{E}{({\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}})^{2}}\le K\cdot \mathsf{E}{\left|\left|\mathbf{{U_{i}}}\right|\right|^{2}}$. Then
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathrm{Var}({\Delta _{i}}{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}})\le K\cdot \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\mathbf{{\left|\left|{U_{i}}\right|\right|^{2}}}\]
is bounded due to condition (iii). Therefore, by Statement 1,
\[ {\sum \limits_{i=1}^{\infty }}\frac{\mathrm{Var}({\Delta _{i}}{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}})}{{i^{2}}}\lt \infty \hspace{2.5pt}.\]
SLLN yields
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}{\Delta _{i}}{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}}-\frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}[{\Delta _{i}}{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}}]\to 0\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty .\]
Consider the fourth summand. We have
\[ \exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}}){\int _{0}^{{Y_{i}}}}\lambda (u)du\le K\cdot {e^{(D-\varepsilon )\left|\left|\mathbf{{W_{i}}}\right|\right|}}.\]
By Jensen’s inequality, ${M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })\ge {e^{\mathsf{E}{\boldsymbol{\beta }^{\top }}\mathbf{{U_{i}}}}}=1$. For all $i\ge 1$, using (iii)–(iv), we obtain
\[\begin{aligned}{}& \mathrm{Var}\left(\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)du\right)\le \mathsf{E}{\left(\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)du\right)^{2}}\le \\ {} & \le \frac{K}{{\min _{\beta }}{M_{\mathbf{{U_{i}}}}^{2}}(\boldsymbol{\beta })}\mathsf{E}\left[{e^{2(D-\varepsilon )\left|\left|\mathbf{{X_{i}}}\right|\right|}}\right]\cdot \mathsf{E}\left[{e^{2(D-\varepsilon )\left|\left|\mathbf{{U_{i}}}\right|\right|}}\right]\le K\hspace{2.5pt}\cdot \mathsf{E}\left[{e^{2(D-\varepsilon )\left|\left|\mathbf{{U_{i}}}\right|\right|}}\right].\end{aligned}\]
Then
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathrm{Var}\left(\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)\hspace{0.1667em}du\right)\le K\cdot \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\left[{e^{2(D-\varepsilon )\left|\left|\mathbf{{U_{i}}}\right|\right|}}\right]\le K.\]
Using Statement 1,
(6)
\[ {\sum \limits_{i=1}^{\infty }}\frac{1}{{i^{2}}}\mathrm{Var}\left(\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)\hspace{0.1667em}du\right)\lt \infty ,\]
and Kolmogorov’s SLLN yields
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)\hspace{0.1667em}du-\frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\left[\frac{\exp ({\boldsymbol{\beta }^{\top }}\mathbf{{W_{i}}})}{{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{\int _{0}^{{Y_{i}}}}\lambda (u)\hspace{0.1667em}du\right]\to 0\]
a.s. as $n\to \infty $. Thus, under conditions (i)–(iv), condition (a1) holds.
Next, we verify (a2). We have
\[\begin{aligned}{}\underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }& \left|\left|\frac{\partial {q^{cor}}}{\partial \boldsymbol{\beta }}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })\right|\right|\le \left|\left|\mathbf{{W_{i}}}\right|\right|+\frac{K}{{\min _{\boldsymbol{\beta }}}{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}\left|\left|\mathbf{{W_{i}}}\right|\right|{e^{(D-\varepsilon )\left|\left|\mathbf{{W_{i}}}\right|\right|}}+\\ {} & +\frac{K}{{\min _{\boldsymbol{\beta }}}{M_{\mathbf{{U_{i}}}}^{2}}(\boldsymbol{\beta })}\mathsf{E}\left[\left|\left|\mathbf{{U_{i}}}\right|\right|{e^{(D-\varepsilon )\left|\left|\mathbf{{U_{i}}}\right|\right|}}\right]{e^{(D-\varepsilon )\left|\left|\mathbf{{W_{i}}}\right|\right|}}.\end{aligned}\]
Then
\[\begin{aligned}{}& \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }\left|\left|\frac{\partial {q^{cor}}}{\partial \boldsymbol{\beta }}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })\right|\right|\le \mathsf{E}\left|\left|\mathbf{{X_{1}}}\right|\right|+\frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\left|\left|\mathbf{{U_{i}}}\right|\right|+\\ {} & +K\hspace{0.1667em}\mathsf{E}\left[\left|\left|\mathbf{{X_{1}}}\right|\right|{e^{(D-\varepsilon )\left|\left|\mathbf{{X_{1}}}\right|\right|}}\right]\cdot \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}{e^{(D-\varepsilon )\left|\left|\mathbf{{U_{i}}}\right|\right|}}+K\hspace{0.1667em}\mathsf{E}\left[{e^{(D-\varepsilon )\left|\left|\mathbf{{X_{1}}}\right|\right|}}\right]\cdot \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\left[\left|\left|\mathbf{{U_{i}}}\right|\right|{e^{(D-\varepsilon )\left|\left|\mathbf{{U_{i}}}\right|\right|}}\right].\end{aligned}\]
Conditions (iii)–(iv) imply that $\mathsf{E}\left|\left|\mathbf{{X_{1}}}\right|\right|{e^{(D-\varepsilon )\left|\left|\mathbf{{X_{1}}}\right|\right|}}$ is finite and that $\mathsf{E}\left|\left|\mathbf{{U_{i}}}\right|\right|{e^{(D-\varepsilon )\left|\left|\mathbf{{U_{i}}}\right|\right|}}$ is bounded uniformly in $i\ge 1$. So there exists a positive constant K such that (4) holds.
One can show that
\[ \underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }\left|\left|\frac{\partial {q^{cor}}}{\partial \lambda }({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })\right|\right|\le \frac{1}{a}+\frac{\tau \cdot {e^{D(\left|\left|\mathbf{{X_{i}}}\right|\right|+\left|\left|\mathbf{{U_{i}}}\right|\right|)}}}{{\min _{\boldsymbol{\beta }}}{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })},\]
since $\lambda \ge a$ under (i).
Therefore
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}\underset{(\lambda ,\boldsymbol{\beta })\in \Theta }{\sup }\left|\left|\frac{\partial {q^{cor}}}{\partial \lambda }({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })\right|\right|\le K\left(1+\mathsf{E}{e^{D\left|\left|X\right|\right|}}\frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}{e^{D\left|\left|{U_{i}}\right|\right|}}\right).\]
Conditions (i)–(iv) imply that (5) holds.
(a3) Clearly, ${q^{cor}}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })$ is continuous in $(\lambda ,\boldsymbol{\beta })\in \Theta $, and under conditions (i)–(iv), $\mathsf{E}|{q^{cor}}({Y_{i}},{\Delta _{i}},\mathbf{{W_{i}}};\lambda ,\boldsymbol{\beta })|$ is bounded uniformly in i. Thus, the dominated convergence theorem implies that ${q_{\infty }}(\lambda ,\boldsymbol{\beta })$ is continuous in $(\lambda ,\boldsymbol{\beta })\in \Theta $.
The proof that (b) holds is essentially the same as in [11].
To sum up, under (i)–(vi), the pair $\left(\hat{\lambda },\hat{\boldsymbol{\beta }}\right)$ is a strongly consistent estimator of the true parameters $\left({\lambda _{0}},{\boldsymbol{\beta }_{0}}\right)$. □
4 Simultaneous estimator under unbounded parameter set
Let us now consider baseline hazard functions from the unbounded parameter set defined in (i’). The goal of this section is to extend Theorem 3 from [12] to the case of heteroscedastic measurement errors. The main result of this section follows.
Theorem 2.
Under (i’)–(vi), $\left(\hat{\lambda },\hat{\boldsymbol{\beta }}\right)$ defined in (2) is a strongly consistent estimator of the true parameters $\left({\lambda _{0}},{\boldsymbol{\beta }_{0}}\right)$.
Proof.
We follow the lines of the proof of Theorem 3 from [12]. We say that a relation holds eventually if, almost surely, it is valid for all sample sizes n starting from some random number.
We first show that
(7)
\[ \underset{(\lambda ,\boldsymbol{\beta })\in {\Theta ^{R}}}{\sup }{Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta })\gt \underset{(\lambda ,\boldsymbol{\beta })\in \Theta \setminus {\Theta ^{R}}}{\sup }{Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta })\]
holds eventually, for sufficiently large nonrandom $R\gt ||{\lambda _{0}}||$, where ${\Theta _{\lambda }^{R}}={\Theta _{\lambda }}\cap \bar{B}(0,R)$ and ${\Theta ^{R}}={\Theta _{\lambda }^{R}}\times {\Theta _{\beta }}$. Denote ${D_{1}}={\max _{\boldsymbol{\beta }\in {\Theta _{\beta }}}}\| \boldsymbol{\beta }\| $. We have
\[ \underset{(\lambda ,\boldsymbol{\beta })\in \Theta \setminus {\Theta ^{R}}}{\sup }{Q_{n}^{cor}}(\lambda ,\boldsymbol{\beta })\le {I_{1}}+\underset{\substack{\lambda \in {\Theta _{\lambda }}:\\ {} \lambda (0)\gt R}}{\sup }{I_{2}}+{I_{3}},\]
where
\[\begin{aligned}{}{I_{1}}& =-(R-L\tau )\frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\exp (-{D_{1}}||\mathbf{{W_{i}}}||)\hspace{0.1667em}{Y_{i}}\cdot I({\Delta _{i}}=0)}{\underset{\boldsymbol{\beta }\in {\Theta _{\beta }}}{\max }{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })},\\ {} {I_{2}}& =\ln (\lambda (0)+L\tau )\frac{1}{n}{\sum \limits_{i=1}^{n}}{\Delta _{i}}-(\lambda (0)+L\tau )\frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\exp (-{D_{1}}||\mathbf{{W_{i}}}||)\hspace{0.1667em}{Y_{i}}\cdot I({\Delta _{i}}=1)}{\underset{\boldsymbol{\beta }\in {\Theta _{\beta }}}{\max }{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })},\\ {} {I_{3}}& =\frac{1}{n}\sum \limits_{i:{\Delta _{i}}=1}{D_{1}}||\mathbf{{W_{i}}}||+2L\tau \frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\exp (-{D_{1}}||\mathbf{{W_{i}}}||)\hspace{0.1667em}{Y_{i}}\cdot I({\Delta _{i}}=1)}{\underset{\boldsymbol{\beta }\in {\Theta _{\beta }}}{\max }{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}.\end{aligned}\]
We have
\[\begin{aligned}{}& \mathrm{Var}\left({Y_{i}}{e^{-{D_{1}}\left|\left|\mathbf{{W_{i}}}\right|\right|}}I({\Delta _{i}}=0)\right)\le \mathsf{E}\left({Y_{i}^{2}}{e^{-2{D_{1}}\left|\left|\mathbf{{W_{i}}}\right|\right|}}\right)\le K\cdot \mathsf{E}{e^{-2{D_{1}}\left|\left|\mathbf{{X_{1}}}\right|\right|}}\hspace{0.1667em}\mathsf{E}{e^{2{D_{1}}\left|\left|\mathbf{{U_{i}}}\right|\right|}},\\ {} & \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathrm{Var}\left({Y_{i}}{e^{-{D_{1}}\left|\left|\mathbf{{W_{i}}}\right|\right|}}I({\Delta _{i}}=0)\right)\le K\cdot \mathsf{E}{e^{-2{D_{1}}\left|\left|\mathbf{{X_{1}}}\right|\right|}}\frac{1}{n}{\sum \limits_{i=1}^{n}}\mathsf{E}{e^{2{D_{1}}\left|\left|\mathbf{{U_{i}}}\right|\right|}}\le K.\end{aligned}\]
SLLN yields
\[ {I_{1}}+(R-L\tau )\frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\mathsf{E}[\hspace{2.5pt}{C_{i}}\cdot I({\Delta _{i}}=0)\exp (-{D_{1}}||\mathbf{{W_{i}}}||)\hspace{2.5pt}]}{\underset{\boldsymbol{\beta }\in {\Theta _{\beta }}}{\max }{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}\to 0\]
almost surely as $n\to \infty $. This means that eventually
\[ {I_{1}}\le -(R-L\tau ){D_{2}},\]
where ${D_{2}}\gt 0$. Let
\[ {A_{n}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}{\Delta _{i}},\hspace{2.5pt}\hspace{2.5pt}{B_{n}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\exp (-{D_{1}}||\mathbf{{W_{i}}}||)\hspace{0.1667em}{Y_{i}}}{\underset{\boldsymbol{\beta }\in {\Theta _{\beta }}}{\max }{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}{1_{\{{\Delta _{i}}=1\}}}.\]
Since ${A_{n}}\gt 0$ and ${B_{n}}\gt 0$ eventually, we obtain
\[ {I_{2}}\le \underset{z\gt 0}{\max }({A_{n}}\ln z-z{B_{n}})={A_{n}}\left(\ln \left(\frac{{A_{n}}}{{B_{n}}}\right)-1\right).\]
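The last identity is elementary: the concave function $z\mapsto {A_{n}}\ln z-z{B_{n}}$ has derivative ${A_{n}}/z-{B_{n}}$, which vanishes at ${z^{\ast }}={A_{n}}/{B_{n}}$, and
\[ {A_{n}}\ln {z^{\ast }}-{z^{\ast }}{B_{n}}={A_{n}}\ln \left(\frac{{A_{n}}}{{B_{n}}}\right)-{A_{n}}={A_{n}}\left(\ln \left(\frac{{A_{n}}}{{B_{n}}}\right)-1\right).\]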
Analogously,
\[\begin{aligned}{}& \mathrm{Var}\left({Y_{i}}{e^{-{D_{1}}\left|\left|\mathbf{{W_{i}}}\right|\right|}}I({\Delta _{i}}=1)\right)\le \mathsf{E}\left({Y_{i}^{2}}{e^{-2{D_{1}}\left|\left|\mathbf{{W_{i}}}\right|\right|}}\right)\le K\cdot \mathsf{E}{e^{-2{D_{1}}\left|\left|\mathbf{{X_{1}}}\right|\right|}}\hspace{0.1667em}\mathsf{E}{e^{2{D_{1}}\left|\left|\mathbf{{U_{i}}}\right|\right|}},\\ {} & \frac{1}{n}{\sum \limits_{i=1}^{n}}\mathrm{Var}\left({Y_{i}}{e^{-{D_{1}}\left|\left|\mathbf{{W_{i}}}\right|\right|}}I({\Delta _{i}}=1)\right)\le K.\end{aligned}\]
By SLLN,
\[ {A_{n}}\to \mathsf{P}(\Delta =1)\gt 0,\hspace{1em}{B_{n}}-\frac{1}{n}{\sum \limits_{i=1}^{n}}\frac{\mathsf{E}[\hspace{2.5pt}{T_{i}}\cdot I({\Delta _{i}}=1)\exp (-{D_{1}}||\mathbf{{W_{i}}}||)\hspace{2.5pt}]}{\underset{\boldsymbol{\beta }\in {\Theta _{\beta }}}{\max }{M_{\mathbf{{U_{i}}}}}(\boldsymbol{\beta })}\to 0\]
respectively, almost surely as $n\to \infty $. Hence ${I_{2}}$ is eventually bounded from above by some positive constant ${D_{3}}$. Further, it follows from the strong law of large numbers that ${I_{3}}$ is eventually bounded from above by some positive constant ${D_{4}}$. Hence
\[ \underset{n\to \infty }{\overline{\lim }}\underset{(\lambda ,\beta )\in \Theta \setminus {\Theta ^{R}}}{\sup }{Q_{n}^{cor}}(\lambda ,\beta )\le -(R-L\tau ){D_{2}}+{D_{3}}+{D_{4}}.\]
Note that the constants ${D_{2}}$, ${D_{3}}$ and ${D_{4}}$ introduced above do not depend on $\beta \in {\Theta _{\beta }}$. Letting $R\to +\infty $, we get
\[ \underset{n\to \infty }{\overline{\lim }}\underset{(\lambda ,\beta )\in \Theta \setminus {\Theta ^{R}}}{\sup }{Q_{n}^{cor}}(\lambda ,\beta )\to -\infty ,\hspace{1em}R\to +\infty .\]
This proves that inequality (7) holds eventually for sufficiently large R. Further, one can repeat the reasoning from [12] to show that
\[ ({\hat{\lambda }_{n}}(\omega ),{\hat{\boldsymbol{\beta }}_{n}}(\omega ))\to ({\lambda _{0}},{\boldsymbol{\beta }_{0}}),\hspace{1em}n\to \infty ,\]
for all $\omega \in A$, where $\mathsf{P}(A)=1$. The theorem is proved. □
5 Conclusions
We have proved the strong consistency of a simultaneous estimator of the baseline hazard rate and the regression parameter in the Cox proportional hazards model with heteroscedastic measurement errors in the covariates, assuming that the baseline hazard function is Lipschitz continuous with a fixed constant, for both the bounded and the unbounded parameter sets.