1 Introduction
The best-known semi-parametric model for the analysis of failure time regression data is the proportional hazards (PH) model. Many tests of the PH model from right censored data were given by Cox [5], Moreau et al. [11], Lin [10], Nagelkerke et al. [12], Grambsch and Therneau [6], Quantin et al. [13], and Bagdonavičius et al. [4]. Tests for specified covariates are given by Kvaloy and Neef [9] and Kraus [8].
We consider tests of the proportional hazards assumption for specified covariates or groups of covariates. The class of alternatives is wide: log-hazard rates under different values of covariates may cross, converge, or diverge. The data may be right censored. The limit distribution of the test statistic is derived, the power of the test under approaching alternatives is given, and real data examples are considered.
2 Modeling non-proportional hazards for specified covariates
Let $S(t|z)$, $\lambda (t|z)$ and $\Lambda (t|z)$ be the survival, hazard rate and cumulative hazard functions under an m-dimensional explanatory variable (or covariate) $z={({z_{1}},\dots ,{z_{m}})^{T}}$.
Let us consider the proportional hazards (PH) model
\[ \lambda (t|z)={e^{{\beta ^{T}}z}}\lambda (t),\]
where $\lambda (t)$ is an unknown baseline hazard function and the parameter $\beta ={({\beta _{1}},\dots ,{\beta _{m}})^{T}}$ is m-dimensional.
Under the PH model the ratio of hazard functions under any two different explanatory variables is constant over time, i.e. the hazards are proportional.
Suppose that proportionality of hazard functions with respect to a specified covariate ${z_{j}}$ or a group of covariates ${z_{{j_{1}}}},\dots ,{z_{{j_{k}}}}$ may be violated. Our purpose is to detect such violations, so we seek a wider model which allows not only constant but also time-varying ratios of hazard rates.
We propose the model
(1)
\[ \lambda (t|z)=g\big(z,\Lambda (t),\beta ,{\gamma _{j}}\big)\hspace{0.1667em}\lambda (t),\hspace{1em}\Lambda (t)={\int _{0}^{t}}\lambda (u)du,\]
where
\[ g(z,\Lambda (t),\beta ,{\gamma _{j}})=\frac{{e^{{\beta ^{T}}z+\Lambda (t){e^{{\gamma _{j}}{z_{j}}}}}}}{1+{e^{{\gamma _{j}}{z_{j}}}}[{e^{\Lambda (t){e^{{\gamma _{j}}{z_{j}}}}}}-1]}.\]
$\beta ={({\beta _{1}},\dots ,{\beta _{m}})^{T}}$ and ${\gamma _{j}}$ are unknown regression parameters, $\lambda (t)$ and $\Lambda (t)$ are the unknown baseline hazard and cumulative hazard, respectively. The PH model is the particular case of this model with ${\gamma _{j}}=0$. The model (1) is much wider than the PH model.
Indeed, suppose that two different values ${z^{(1)}}$ and ${z^{(2)}}$ of the covariate vector differ only in the value of the covariate ${z_{j}}$, and denote by $c(t)$ the ratio of the corresponding hazard rates. For the model (1)
\[ c(0)={e^{{\beta _{j}}({z_{j}^{(2)}}-{z_{j}^{(1)}})}},\hspace{2em}c(\infty )={e^{({\beta _{j}}-{\gamma _{j}})({z_{j}^{(2)}}-{z_{j}^{(1)}})}}.\]
So this model gives a large choice of alternatives: the ratio of hazard rates $c(t)$ can vary from any $a\mathrm{>}0$ to any $b\mathrm{>}0$, whereas $c(t)$ is constant under the PH model. The difference of the logarithms of hazard rates under different values of the covariate ${z_{j}}$ is constant under the PH model, so the log-hazard rates are parallel as functions of time, whereas, depending on the values of its parameters, the model (1) also allows these functions to approach one another, to move apart, or to intersect. So the model (1) may help to detect non-proportionality of hazard rates (or, equivalently, non-parallelism of log-hazard functions) in the above mentioned directions; other directions are very rare in real data. We do not discuss application of the model to the analysis of survival regression data (which is the subject of another article); such an analysis could be done if the PH model were rejected. Here the model is used only as a generator of a wide class of alternatives to the PH model.
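These limits are easy to check numerically. The following sketch (plain Python; the parameter values ${\beta _{j}}=1$, ${\gamma _{j}}=0.5$ and the binary covariate are hypothetical choices, not taken from the paper) evaluates the factor g of model (1) for ${z_{j}}\in \{0,1\}$ and compares the hazard ratio at $\Lambda (t)=0$ and at a large value of $\Lambda (t)$ with the limits $c(0)$ and $c(\infty )$:

```python
import math

def g(z_lin, zj, Lam, gamma):
    """Factor g of model (1); z_lin = beta^T z, zj = value of the covariate z_j."""
    a = math.exp(gamma * zj)                 # e^{gamma_j z_j}
    num = math.exp(z_lin + Lam * a)
    den = 1.0 + a * (math.exp(Lam * a) - 1.0)
    return num / den

beta_j, gamma_j = 1.0, 0.5                   # hypothetical parameter values

def c(Lam):
    """Hazard ratio for z_j = 1 versus z_j = 0, other covariates equal."""
    return g(beta_j, 1.0, Lam, gamma_j) / g(0.0, 0.0, Lam, gamma_j)

print(c(0.0))    # c(0)  = e^{beta_j}           = e   ~ 2.71828
print(c(30.0))   # c(oo) = e^{beta_j - gamma_j} = e^{0.5} ~ 1.64872
```

With ${\gamma _{j}}>0$ the ratio decreases from $e^{{\beta _{j}}}$ toward $e^{{\beta _{j}}-{\gamma _{j}}}$; with ${\gamma _{j}}>{\beta _{j}}>0$ the log-hazards would cross.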
3 Test statistic
Let us consider right censored failure time regression data:
\[ ({X_{1}},{\delta _{1}},{z^{(1)}}),\dots ,({X_{n}},{\delta _{n}},{z^{(n)}}),\]
where
\[ {X_{i}}={T_{i}}\wedge {C_{i}},\hspace{2em}{\delta _{i}}={\mathbf{1}_{\{{T_{i}}\le {C_{i}}\}}};\]
${T_{i}}$ are failure times and ${C_{i}}$ are censoring times. ${X_{i}}$ is the observation time of the ith unit; the event ${\delta _{i}}=1$ indicates that ${X_{i}}$ is the failure time ${T_{i}}$, and the event ${\delta _{i}}=0$ indicates that ${X_{i}}$ is the censoring time ${C_{i}}$.
Set
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}\displaystyle {N_{i}}(t)& \displaystyle ={\mathbf{1}_{\{{X_{i}}\le t,{\delta _{i}}=1\}}},& \displaystyle {Y_{i}}(t)& \displaystyle ={\mathbf{1}_{\{{X_{i}}\ge t\}}},\\ {} \displaystyle N(t)& \displaystyle ={\sum \limits_{i=1}^{n}}{N_{i}}(t),& \displaystyle \hspace{1em}Y(t)& \displaystyle ={\sum \limits_{i=1}^{n}}{Y_{i}}(t);\end{array}\]
here ${\mathbf{1}_{A}}$ denotes the indicator of the event A. The processes ${N_{i}}(t)$ and $N(t)$ are right-continuous counting processes showing the numbers of observed failures in the time interval $[0,t]$ for the ith unit and for all n units, respectively. ${Y_{i}}(t)$ and $Y(t)$ are non-increasing left-continuous stochastic processes showing the numbers of units which are still at risk (neither failed nor censored) just prior to t, for the ith unit and for all units, respectively. Suppose that the survival distribution of the ith unit given ${z^{(i)}}$ is absolutely continuous with survival function ${S_{i}}(t)$ and hazard rate ${\lambda _{i}}(t)$.
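On a small artificial data set these processes can be computed directly from the pairs $({X_{i}},{\delta _{i}})$; a minimal sketch (all numbers are hypothetical):

```python
# observed times and censoring indicators for n = 5 artificial units
X = [2.0, 3.0, 3.5, 5.0, 6.0]
delta = [1, 0, 1, 1, 0]

def N(t):
    """Number of observed failures in [0, t] (right-continuous)."""
    return sum(1 for x, d in zip(X, delta) if x <= t and d == 1)

def Y(t):
    """Number of units at risk just prior to t (left-continuous)."""
    return sum(1 for x in X if x >= t)

print(N(4.0), Y(4.0))   # two failures by t = 4; two units still at risk
```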
Suppose that the multiplicative intensities model holds: the compensators of the counting processes ${N_{i}}$ with respect to the history of the observed processes are $\int {Y_{i}}{\lambda _{i}}du$. This is equivalent to the assumption that for any i and t such that $P({X_{i}}\mathrm{>}t)\mathrm{>}0$, and for almost all $s\in [0,t]$,
\[\begin{aligned}{}& \underset{h\downarrow 0}{\lim }\frac{1}{h}P\big\{{T_{i}}\in [s,s+h)|{X_{i}}\ge s,{z^{(i)}}\big\}\\ {} & \hspace{1em}=\underset{h\downarrow 0}{\lim }\frac{1}{h}P\big\{{T_{i}}\in [s,s+h)|{T_{i}}\ge s,{z^{(i)}}\big\}={\lambda _{i}}(s).\end{aligned}\]
It means that for almost all $s\in [0,t]$ the risk of failure just after the time s, given that the unit was neither censored nor failed up to time s, equals the risk of failure just after the time s, given that the unit did not fail up to time s, when censoring does not exist. So censoring has no influence on the risk of failure. Information about the time-to-failure distribution is contained in the points where the counting processes ${N_{i}}$ have jumps; a jump at the point t is possible if ${Y_{i}}(t)=1$. Partial information is given by the points where the counting processes ${N_{i}}$ do not have jumps but the predictable processes ${Y_{i}}$ do.
The multiplicative intensities model holds in the cases of type I, type II, and independent random censoring.
In the parametric case with known λ the unknown finite-dimensional parameter consists of the parameters β and ${\gamma _{j}}$. The component of the parametric score function U corresponding to ${\gamma _{j}}$ has the form
(2)
\[ {U_{j}}(\beta ,{\gamma _{j}})={\sum \limits_{i=1}^{n}}{\int _{0}^{\infty }}\big\{{w_{j}^{(i)}}(u,\Lambda ,\beta ,{\gamma _{j}})-{\tilde{E}_{j}}(u,\Lambda ,\beta ,{\gamma _{j}})\big\}\hspace{0.1667em}d{N_{i}}(u),\]
where
\[\begin{array}{l}\displaystyle {w_{j}^{(i)}}(v,\Lambda ,\beta ,{\gamma _{j}})=\frac{\partial }{\partial {\gamma _{j}}}\log \big\{g({z^{(i)}},\Lambda (v),\beta ,{\gamma _{j}})\big\},\\ {} \displaystyle \tilde{E}(v,\Lambda ,\beta ,{\gamma _{j}})=\frac{{\tilde{S}^{(1)}}(v,\Lambda ,\beta ,{\gamma _{j}})}{{\tilde{S}^{(0)}}(v,\Lambda ,\beta ,{\gamma _{j}})},\\ {} \displaystyle {\tilde{S}^{(0)}}(v,\Lambda ,\beta ,{\gamma _{j}})={\sum \limits_{i=1}^{n}}{Y_{i}}(v)g\big\{{z^{(i)}},\Lambda (v),\beta ,{\gamma _{j}}\big\},\\ {} \displaystyle {\tilde{S}^{(1)}}(v,\Lambda ,\beta ,{\gamma _{j}})={\sum \limits_{i=1}^{n}}{Y_{i}}(v)\hspace{0.1667em}\frac{\partial }{\partial {\gamma _{j}}}g\big\{{z^{(i)}},\Lambda (v),\beta ,{\gamma _{j}}\big\};\end{array}\]
in what follows ${E_{j}}$ denotes the component of E corresponding to the covariate ${z_{j}}$.
We consider the semiparametric case, when the baseline hazard rate λ is unknown. The test statistic is constructed in the following way. In the expression of ${U_{j}}$ the parameter β is replaced by its partial likelihood estimator $\hat{\beta }$ under the Cox model, the parameter ${\gamma _{j}}$ is replaced by 0, and the baseline cumulative intensity Λ is replaced by the Breslow estimator (see Andersen et al. [2])
\[ \hat{\Lambda }(t)={\int _{0}^{t}}\frac{dN(u)}{{S^{(0)}}(u,\hat{\beta })}.\]
The estimator $\hat{\beta }$ satisfies the equation
(3)
\[ {\sum \limits_{i=1}^{n}}{\int _{0}^{\infty }}\big\{{z^{(i)}}-E(t,\hat{\beta })\big\}d{N_{i}}(t)=0,\]
where
\[\begin{array}{l}\displaystyle E(t,\beta )=\frac{{S^{(1)}}(t,\beta )}{{S^{(0)}}(t,\beta )},\\ {} \displaystyle {S^{(0)}}(t,\beta )={\sum \limits_{i=1}^{n}}{Y_{i}}(t){e^{{\beta ^{T}}{z^{(i)}}}},\hspace{2em}{S^{(1)}}(t,\beta )={\sum \limits_{i=1}^{n}}{z^{(i)}}{Y_{i}}(t){e^{{\beta ^{T}}{z^{(i)}}}}.\end{array}\]
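For a one-dimensional covariate the sums ${S^{(0)}}$, ${S^{(1)}}$, the average $E(t,\beta )$ and the left side of the estimating equation (3) can be evaluated directly. A minimal sketch with hypothetical data evaluates the score at $\beta =0$ (a root-finder such as Newton's method would then be used to actually solve (3)):

```python
import math

# artificial right censored sample: observed times, indicators, scalar covariate
X = [1.0, 2.0, 3.0, 4.0]
delta = [1, 1, 0, 1]
z = [0.0, 1.0, 0.0, 1.0]

def S0(t, b): return sum(math.exp(b * zi) for xi, zi in zip(X, z) if xi >= t)
def S1(t, b): return sum(zi * math.exp(b * zi) for xi, zi in zip(X, z) if xi >= t)
def E(t, b): return S1(t, b) / S0(t, b)

def score(b):
    """Left side of the estimating equation (3): sum over observed failures."""
    return sum(zi - E(xi, b) for xi, di, zi in zip(X, delta, z) if di == 1)

print(score(0.0))   # (0 - 1/2) + (1 - 2/3) + (1 - 1) = -1/6
```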
Set $\hat{F}(t)=1-{e^{-\hat{\Lambda }(t)}}$. It is an estimator of the baseline cumulative distribution function. After the above mentioned replacements the following statistic is obtained:
(4)
\[ {\hat{U}_{{\gamma _{j}}}}=-{\sum \limits_{i=1}^{n}}{\int _{0}^{\infty }}\hat{F}(t)\big\{{z_{j}^{(i)}}-{E_{j}}(t,\hat{\beta })\big\}d{N_{i}}(t).\]
Let us consider the asymptotic distribution of the statistic (4) under the PH model. Set
\[\begin{array}{l}\displaystyle {S^{(2)}}(u,\beta )={\sum \limits_{i=1}^{n}}{({z^{(i)}})^{\otimes 2}}{Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}},\\ {} \displaystyle V(u,\beta )={S^{(2)}}(u,\beta )/{S^{(0)}}(u,\beta )-{E^{\otimes 2}}(u,\beta ),\end{array}\]
where ${A^{\otimes 2}}=A{A^{T}}$ for any matrix A. Denote by ${V_{j}}(u,\beta )$ the jth column and by ${V_{jj}}(u,\beta )$ the jth diagonal element of the matrix $V(u,\beta )$. Set
\[\begin{array}{l}\displaystyle {\hat{\Sigma }_{jj}}(t)={n^{-1}}{\int _{0}^{t}}{\hat{F}^{2}}(u){V_{jj}}(u,\hat{\beta })dN(u),\hspace{2em}{\hat{\Sigma }_{j}}(t)={n^{-1}}{\int _{0}^{t}}\hat{F}(u){V_{j}}(u,\hat{\beta })dN(u),\\ {} \displaystyle \hat{\Sigma }(t)={n^{-1}}{\int _{0}^{t}}V(u,\hat{\beta })dN(u).\end{array}\]
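The Breslow estimator, $\hat{F}$ and the $\hat{\Sigma }$ quantities can be assembled from the same risk-set sums. A minimal sketch for a single binary covariate (hypothetical data; for simplicity $\hat{\beta }$ is taken to be 0 instead of actually being fitted, and the squared weight ${\hat{F}^{2}}$ in ${\hat{\Sigma }_{jj}}$ matches the variance expansion used in the proof of Theorem 1):

```python
import math

# hypothetical right censored sample with one binary covariate
X = [1.0, 2.0, 3.0, 4.0]
delta = [1, 1, 0, 1]
z = [0.0, 1.0, 0.0, 1.0]
n, b = len(X), 0.0                       # beta-hat set to 0 for illustration

def S0(t): return sum(math.exp(b * zi) for xi, zi in zip(X, z) if xi >= t)
def S1(t): return sum(zi * math.exp(b * zi) for xi, zi in zip(X, z) if xi >= t)
def E(t): return S1(t) / S0(t)
def V(t): return E(t) * (1.0 - E(t))     # S2/S0 - E^2; S2 = S1 for a binary z

fails = sorted(xi for xi, di in zip(X, delta) if di == 1)

def Lam(t):                              # Breslow estimator of Lambda
    return sum(1.0 / S0(s) for s in fails if s <= t)

def F(t): return 1.0 - math.exp(-Lam(t))

# Sigma-hat quantities (scalars when m = 1), summing over observed failures
Sjj = sum(F(s) ** 2 * V(s) for s in fails) / n
Sj = sum(F(s) * V(s) for s in fails) / n
Sig = sum(V(s) for s in fails) / n

print(Lam(4.0))                          # 1/4 + 1/3 + 1 = 19/12
```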
Assumption a) can be weakened considerably, but we avoid writing complicated formulas for easier reading. Assumption b) simply means that at some finite time moment (perhaps very remote) observations are stopped; this is a usual assumption for asymptotic results in survival analysis. Assumption c) can also be weakened. Assumption d) means that if censoring were absent then units might survive after the moment τ with positive probability. Assumption e) is needed to have a non-degenerate asymptotic distribution of the test statistic; it is the usual assumption needed for asymptotic normality of the maximum partial likelihood estimator $\hat{\beta }$ of the regression parameter β.
Under Assumptions A [3] there exists a neighborhood Θ of β such that, as $n\to \infty $,
(5)
\[\begin{array}{l}\displaystyle \underset{b\in \Theta ,u\in [0,\tau ]}{\sup }\bigg\| \frac{1}{n}{S^{(i)}}(u,b)-{s^{(i)}}(u,b)\bigg\| \stackrel{P}{\to }0,\\ {} \displaystyle \underset{b\in \Theta ,u\in [0,\tau ]}{\sup }\big\| E(u,b)-e(u,b)\big\| \stackrel{P}{\to }0,\\ {} \displaystyle \underset{b\in \Theta ,u\in [0,\tau ]}{\sup }\big\| V(u,b)-v(u,b)\big\| \stackrel{P}{\to }0.\end{array}\]
Assumption A a) may be weakened by assuming that there exist non-random ${s^{(i)}}$ such that the convergences (5) hold.
Convergences (5) and Assumption A d) imply that
\[\begin{array}{l}\displaystyle \underset{b\in \Theta ,t\in [0,\tau ]}{\sup }\bigg\| {\int _{0}^{t}}(\frac{1}{n}{S^{(i)}}(u,b)-{s^{(i)}}(u,b))d\Lambda (u)\bigg\| ={o_{P}}(1),\\ {} \displaystyle \underset{b\in \Theta ,t\in [0,\tau ]}{\sup }\bigg\| {\int _{0}^{t}}(E(u,b)-e(u,b))d\Lambda (u)\bigg\| ={o_{P}}(1),\\ {} \displaystyle \underset{b\in \Theta ,t\in [0,\tau ]}{\sup }\bigg\| {\int _{0}^{t}}(V(u,b)-v(u,b))d\Lambda (u)\bigg\| ={o_{P}}(1).\end{array}\]
Theorem 1.
Under Assumptions A the following weak convergence holds:
\[ {n^{-1/2}}{\hat{U}_{{\gamma _{j}}}}\stackrel{\mathcal{D}}{\to }N(0,{D_{j}}),\]
where
\[ {D_{j}}={\Sigma _{jj}}(\tau )-{\Sigma _{j}^{T}}(\tau ){\Sigma ^{-1}}(\tau ){\Sigma _{j}}(\tau ).\]
Proof.
Under the Cox model,
\[ {N_{i}}(t)={\int _{0}^{t}}{Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u)+{M_{i}}(t),\]
where ${M_{i}}$ are martingales with respect to the history generated by the data. Using this and taking into account that
\[ {\sum \limits_{i=1}^{n}}{z_{j}^{(i)}}{Y_{i}}(u){e^{{\hat{\beta }^{T}}{z^{(i)}}}}-{E_{j}}(u,\hat{\beta }){\sum \limits_{i=1}^{n}}{Y_{i}}(u){e^{{\hat{\beta }^{T}}{z^{(i)}}}}=0,\]
the statistic ${\hat{U}_{{\gamma _{j}}}}$ can be written in the form
\[\begin{aligned}{}{\hat{U}_{{\gamma _{j}}}}& ={\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}{Y_{i}}(u)\big({e^{{\hat{\beta }^{T}}{z^{(i)}}}}-{e^{{\beta ^{T}}{z^{(i)}}}}\big)d\Lambda (u)\\ {} & \hspace{1em}-{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}d{M_{i}}(u).\end{aligned}\]
Assumptions A a)–d) and consistency of the partial likelihood estimator $\hat{\beta }$ imply that
\[\begin{aligned}{}{n^{-1/2}}{\hat{U}_{{\gamma _{j}}}}(t)& ={n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}\\ {} & \hspace{1em}-{E_{j}}(u,\hat{\beta })\big\}{\big({z^{(i)}}\big)^{T}}{Y_{i}}(u){e^{{\hat{\beta }^{T}}{z^{(i)}}}}d\Lambda (u)\hspace{0.1667em}{n^{1/2}}(\hat{\beta }-\beta )\\ {} & \hspace{1em}-{n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}d{M_{i}}(u)+{o_{P}}(1)\end{aligned}\]
uniformly on $[0,\tau ]$. Using the convergences (5) and consistency of the estimator $\hat{\beta }$ we write the first integral in the form
\[\begin{aligned}{}& {n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}{({z^{(i)}})^{T}}{Y_{i}}(u){e^{{\hat{\beta }^{T}}{z^{(i)}}}}d\hat{\Lambda }(u)+{o_{P}}(1)\\ {} & \hspace{1em}={n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}{({z^{(i)}})^{T}}{Y_{i}}(u){e^{{\hat{\beta }^{T}}{z^{(i)}}}}\frac{dN(u)}{{S^{(0)}}(u,\hat{\beta })}+{o_{P}}(1)\\ {} & \hspace{1em}={\hat{\Sigma }_{j}^{T}}(t)+{o_{P}}(1).\end{aligned}\]
Under Assumptions A (now Assumption A e) is crucial) the following expression holds:
\[ {n^{1/2}}(\hat{\beta }-\beta )={\hat{\Sigma }^{-1}}(\tau ){n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{\tau }}\{{z^{(i)}}-E(u,\hat{\beta })\}d{M_{i}}(u)+{o_{p}}(1)\]
uniformly on $[0,\tau ]$ (see Andersen et al. [2], Theorem VII.2.2, where even weaker assumptions instead of a)–d) are used). It implies
\[\begin{aligned}{}{n^{-1/2}}{\hat{U}_{{\gamma _{j}}}}(t)& ={\hat{\Sigma }_{j}^{T}}(t){\hat{\Sigma }^{-1}}(\tau )\hspace{0.1667em}{n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{\tau }}\big\{{z^{(i)}}-E(u,\hat{\beta })\big\}d{M_{i}}(u)\\ {} & \hspace{1em}-{n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}d{M_{i}}(u)+{o_{p}}(1)\\ {} & =:{M^{\ast }}(t)+{o_{p}}(1)\end{aligned}\]
uniformly on $[0,\tau ]$. The predictable variation of the local martingale ${M^{\ast }}(t)$ is
\[\begin{aligned}{}& \big\langle {M^{\ast }}\big\rangle (t)\\ {} & \hspace{1em}={\hat{\Sigma }_{j}^{T}}(t){\hat{\Sigma }^{-1}}(\tau )\hspace{0.1667em}{n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}{\big\{{z^{(i)}}-E(u,\hat{\beta })\big\}^{\otimes 2}}{Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u){\hat{\Sigma }^{-1}}(\tau ){\hat{\Sigma }_{j}}(t)\\ {} & \hspace{2em}-{\hat{\Sigma }_{j}^{T}}(t){\hat{\Sigma }^{-1}}(\tau )\hspace{0.1667em}{n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}\big\{{z^{(i)}}-E(u,\hat{\beta })\big\}\\ {} & \hspace{2em}\times {Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u)\\ {} & \hspace{2em}-{n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}{\big\{{z^{(i)}}-E(u,\hat{\beta })\big\}^{T}}\\ {} & \hspace{2em}\times {Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u){\hat{\Sigma }^{-1}}(\tau ){\hat{\Sigma }_{j}}(t)\\ {} & \hspace{2em}+{n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}{\hat{F}^{2}}(u){\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}^{2}}{Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u).\end{aligned}\]
Note that
\[\begin{aligned}{}& {n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}{\big\{{z^{(i)}}-E(u,\hat{\beta })\big\}^{\otimes 2}}{Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u)=\hat{\Sigma }(t)+{o_{P}}(1),\\ {} & {n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}\big\{{z^{(i)}}-E(u,\hat{\beta })\big\}{Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u)\\ {} & \hspace{1em}={\hat{\Sigma }_{j}}(t)+{o_{P}}(1),\\ {} & {n^{-1}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}{\hat{F}^{2}}(u){\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}^{2}}{Y_{i}}(u){e^{{\beta ^{T}}{z^{(i)}}}}d\Lambda (u)={\hat{\Sigma }_{jj}}(t)+{o_{P}}(1).\end{aligned}\]
So the predictable variation
\[ \langle {M^{\ast }}\rangle (t)={\hat{\Sigma }_{jj}}(t)-{\hat{\Sigma }_{j}^{T}}(t){\hat{\Sigma }^{-1}}(\tau ){\hat{\Sigma }_{j}}(t)+{o_{P}}(1)\stackrel{P}{\to }{\Sigma _{jj}}(t)-{\Sigma _{j}^{T}}(t){\Sigma ^{-1}}(\tau ){\Sigma _{j}}(t)\]
uniformly on $[0,\tau ]$; here ${\Sigma _{jj}}(t)$, ${\Sigma _{j}}(t)$, $\Sigma (\tau )$ are the limits in probability of the random matrices ${\hat{\Sigma }_{jj}}(t)$, ${\hat{\Sigma }_{j}}(t)$, $\hat{\Sigma }(\tau )$. Under Assumptions A, for any $\varepsilon \mathrm{>}0$ the predictable variation (see Andersen et al. [2], Theorem VII.2.2, where even weaker assumptions are used)
\[ \Bigg\langle {n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\big\{{z_{j}^{(i)}}-{E_{j}}(u,\beta )\big\}{\mathbf{1}_{\{\mid {z_{j}^{(i)}}-{E_{j}}(u,\beta )\mid \mathrm{>}\sqrt{n}\varepsilon \}}}d{M_{i}}(u)\Bigg\rangle \stackrel{P}{\to }0\]
and β can be replaced by $\hat{\beta }$ in the expression on the left side because $\hat{\beta }$ is consistent. Similarly,
\[ \Bigg\langle {n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}{\mathbf{1}_{\{\hat{F}(u)\mid {z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\mid \mathrm{>}\sqrt{n}\varepsilon \}}}d{M_{i}}(u)\Bigg\rangle \stackrel{P}{\to }0.\]
Hence, the Lindeberg conditions of the central limit theorem for martingales (see Andersen et al. [2]) are verified for ${M^{\ast }}$. We saw that the predictable variations $\langle {M^{\ast }}\rangle (t)$ converge in probability to non-random non-degenerate limits. So the stochastic process ${n^{-1/2}}{\hat{U}_{{\gamma _{j}}}}(\cdot )$ converges in distribution to a zero mean Gaussian process on $[0,\tau ]$; in particular,
(6)
\[ T=\frac{{\hat{U}_{{\gamma _{j}}}}}{\sqrt{n{\hat{D}_{j}}}}\stackrel{\mathcal{D}}{\to }N(0,1);\]
here
\[ {\hat{D}_{j}}={\hat{\Sigma }_{jj}}(\tau )-{\hat{\Sigma }_{j}^{T}}(\tau ){\hat{\Sigma }^{-1}}(\tau ){\hat{\Sigma }_{j}}(\tau ).\]
 □
The test: the null hypothesis ${H_{0}}:{\gamma _{j}}=0$ is rejected with approximate significance level α if $|T|\mathrm{>}{z_{\alpha /2}}$; here ${z_{\alpha /2}}$ is the upper $\alpha /2$ critical value of the standard normal distribution.
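The whole procedure can be sketched as a small simulation: data are generated under the PH model with one binary covariate, $\hat{\beta }$ is obtained by a few Newton steps on the partial likelihood, and the statistic is computed together with a two-sided p-value. All numerical choices (sample size, parameter values, censoring rate, seed) are hypothetical illustrations, not values from the paper:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 300
z = rng.binomial(1, 0.5, n).astype(float)      # binary covariate (m = 1)
T = rng.exponential(1.0 / np.exp(0.7 * z))     # PH data, baseline rate 1
C = rng.exponential(2.0, n)                    # independent censoring
X, delta = np.minimum(T, C), (T <= C).astype(int)

def S0(t, b): return float(np.sum((X >= t) * np.exp(b * z)))
def S1(t, b): return float(np.sum(z * (X >= t) * np.exp(b * z)))
def E(t, b): return S1(t, b) / S0(t, b)

order = np.argsort(X[delta == 1])
fails = X[delta == 1][order]                   # ordered failure times
zf = z[delta == 1][order]                      # covariates of failing units

# Newton iterations for the partial likelihood estimator beta-hat
b = 0.0
for _ in range(20):
    U = float(np.sum(zf - np.array([E(t, b) for t in fails])))
    I = sum(E(t, b) * (1.0 - E(t, b)) for t in fails)  # V = E(1-E), binary z
    b += U / I

# Breslow estimator and F-hat at the failure times
Lam = np.cumsum([1.0 / S0(t, b) for t in fails])
F = 1.0 - np.exp(-Lam)

# statistic (4) and the scalar variance estimator D-hat
Ev = np.array([E(t, b) for t in fails])
V = Ev * (1.0 - Ev)
U_gamma = -float(np.sum(F * (zf - Ev)))
Sjj, Sj, Sig = np.sum(F**2 * V) / n, np.sum(F * V) / n, np.sum(V) / n
D = Sjj - Sj**2 / Sig
T_stat = U_gamma / math.sqrt(n * D)
p = math.erfc(abs(T_stat) / math.sqrt(2.0))    # two-sided p-value
print(round(T_stat, 3), round(p, 3))
```

Since the data were generated under the PH model, a large |T_stat| should occur only with probability close to α.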
4 Power under approaching alternatives
Let us consider the alternative model (1) and suppose that alternatives are approaching: ${\gamma _{j}}=\frac{c}{\sqrt{n}}$, where $c\ne 0$ is a fixed constant.
So the model
(8)
\[ \lambda (t|z)=\frac{\exp \{{\beta ^{T}}z+\Lambda (t)\exp \{\frac{c}{\sqrt{n}}{z_{j}}\}\}}{1+\exp \{\frac{c}{\sqrt{n}}{z_{j}}\}[\exp \{\Lambda (t)\exp \{\frac{c}{\sqrt{n}}{z_{j}}\}\}-1]}\hspace{0.1667em}\lambda (t),\hspace{1em}\Lambda (t)={\int _{0}^{t}}\lambda (u)du,\]
is considered. Denote by ${\beta _{0}}$ the true value of β under the model (8). The counting process ${N_{i}}(t)$ has the form
\[ {N_{i}}(t)={\int _{0}^{t}}g\big\{{z^{(i)}},\Lambda (u),{\beta _{0}},c/\sqrt{n}\big\}d\Lambda (u)+{M_{i}}(t),\]
where ${M_{i}}$ are local martingales with respect to the history generated by the data. Note that
\[ g\big\{z,\Lambda (u),{\beta _{0}},c/\sqrt{n}\big\}={e^{{\beta _{0}^{T}}z}}\bigg(1-\frac{c}{\sqrt{n}}F(u){z_{j}}+o\bigg(\frac{1}{\sqrt{n}}\bigg)\bigg),\hspace{1em}F(u)=1-{e^{-\Lambda (u)}}.\]
Let us consider the stochastic process
\[ {Q_{n}}(t,\beta )=\frac{1}{n}\big(\ell (t,\beta )-\ell (t,{\beta _{0}})\big)=\frac{1}{n}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\bigg[{(\beta -{\beta _{0}})^{T}}{z^{(i)}}-\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\bigg]d{N_{i}}(u),\]
where
\[ \ell (t,\beta )={\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\big[{\beta ^{T}}{z^{(i)}}-\ln {S^{(0)}}(u,\beta )\big]d{N_{i}}(u)\]
is the log partial likelihood process.
Note that the derivative with respect to β is
\[ {\dot{Q}_{n}}(t,\beta )=\frac{1}{n}\dot{\ell }(t,\beta )=\frac{1}{n}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\big\{{z^{(i)}}-E(u,\beta )\big\}d{N_{i}}(u).\]
Constructing the test statistic we defined $\hat{\beta }$ as a random vector satisfying the estimating equation $\dot{\ell }(\tau ,\beta )=0$. Let us show that $\hat{\beta }$ is a consistent estimator of ${\beta _{0}}$.
Theorem 2.
If Assumptions A are satisfied then the probability that the equation $\dot{\ell }(\beta )=0$ has a unique solution converges to 1, and $\hat{\beta }\stackrel{P}{\to }{\beta _{0}}$.
Proof.
Fix $\beta \in {\mathbf{R}^{m}}$. Using the Doob–Meier decomposition of ${N_{i}}(t)$ write the stochastic process ${Q_{n}}$ in the form
\[\begin{aligned}{}{Q_{n}}(t,\beta )& =\frac{1}{n}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\bigg[{(\beta -{\beta _{0}})^{T}}{z^{(i)}}-\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\bigg]\hspace{0.1667em}d{N_{i}}(u)\\ {} & =\frac{1}{n}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\bigg[{(\beta -{\beta _{0}})^{T}}{z^{(i)}}-\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\bigg]\\ {} & \hspace{1em}\times {Y_{i}}(u){e^{{\beta _{0}^{T}}{z^{(i)}}}}\bigg(1-\frac{c}{\sqrt{n}}F(u){z_{j}^{(i)}}+{o_{P}}\bigg(\frac{1}{\sqrt{n}}\bigg)\bigg)d\Lambda (u)\\ {} & \hspace{1em}+\frac{1}{n}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\bigg[{(\beta -{\beta _{0}})^{T}}{z^{(i)}}-\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\bigg]\hspace{0.1667em}d{M_{i}}(u)\\ {} & ={\int _{0}^{t}}\bigg[{(\beta -{\beta _{0}})^{T}}{S^{(1)}}(u,{\beta _{0}})-\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}{S^{(0)}}(u,{\beta _{0}})\bigg]\hspace{0.1667em}d\Lambda (u)\\ {} & \hspace{1em}+{o_{P}}(1)+\frac{1}{n}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\bigg[{(\beta -{\beta _{0}})^{T}}{z^{(i)}}-\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\bigg]\hspace{0.1667em}d{M_{i}}(u)\\ {} & ={K_{n}}(t,\beta )+{\tilde{M}_{n}}(t)+{o_{P}}(1).\end{aligned}\]
The predictable variation of the martingale ${\tilde{M}_{n}}$ is
\[\begin{aligned}{}\langle {\tilde{M}_{n}}\rangle (t)& =\frac{1}{{n^{2}}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}{\bigg\{{(\beta -{\beta _{0}})^{T}}{z^{(i)}}-\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\bigg\}^{2}}\hspace{0.1667em}{Y_{i}}(u){e^{{\beta _{0}^{T}}{z^{(i)}}}}\\ {} & \hspace{1em}\times \bigg(1-\frac{c}{\sqrt{n}}F(u){z_{j}^{(i)}}+{o_{P}}\bigg(\frac{1}{\sqrt{n}}\bigg)\bigg)d\Lambda (u)\\ {} & =\frac{1}{{n^{2}}}{\int _{0}^{t}}\bigg\{{(\beta -{\beta _{0}})^{T}}{S^{(2)}}(u,{\beta _{0}})(\beta -{\beta _{0}})\\ {} & \hspace{1em}-2{(\beta -{\beta _{0}})^{T}}{S^{(1)}}(u,{\beta _{0}})\ln \frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\\ {} & \hspace{1em}+{S^{(0)}}(u,{\beta _{0}}){\ln ^{2}}\frac{{S^{(0)}}(u,\beta )}{{S^{(0)}}(u,{\beta _{0}})}\bigg\}d\Lambda (u)+{o_{p}}\bigg(\frac{1}{n}\bigg).\end{aligned}\]
So $\langle {\tilde{M}_{n}}\rangle (\tau )\stackrel{P}{\to }0$ and ${K_{n}}(\tau ,\beta )\stackrel{P}{\to }k(\beta )$; here
\[ k(\beta )={\int _{0}^{\tau }}\bigg[{(\beta -{\beta _{0}})^{T}}{s^{(1)}}(u,{\beta _{0}})-\ln \frac{{s^{(0)}}(u,\beta )}{{s^{(0)}}(u,{\beta _{0}})}{s^{(0)}}(u,{\beta _{0}})\bigg]d\Lambda (u),\]
which implies ${Q_{n}}(\tau ,\beta )\stackrel{P}{\to }k(\beta )$ pointwise in β.
Note that
\[\begin{array}{l}\displaystyle \dot{k}(\beta )={\int _{0}^{\tau }}\big[{s^{(1)}}(u,{\beta _{0}})-e(u,\beta ){s^{(0)}}(u,{\beta _{0}})\big]\hspace{0.1667em}d\Lambda (u),\hspace{1em}\dot{k}({\beta _{0}})=0,\\ {} \displaystyle \ddot{k}(\beta )=-{\int _{0}^{\tau }}v(u,\beta ){s^{(0)}}(u,{\beta _{0}})\hspace{0.1667em}d\Lambda (u),\hspace{1em}\ddot{k}({\beta _{0}})=-\Sigma (\tau ).\end{array}\]
So the matrix $\ddot{k}({\beta _{0}})$ is negative definite. The remaining part of the proof coincides with the proof of the analogous theorem for the Cox model (see Theorem VII.2.1 of Andersen et al. [2]), i.e. Andersen’s and Gill’s theorem [3] is applied. This theorem says that if a sequence of concave differentiable stochastic processes ${Q_{n}}(\beta )$ converges pointwise in probability to a real function $k(\beta )$ on a convex open set $E\subset {\mathbf{R}^{m}}$, then: a) the function $k(\beta )$ is concave on E; b) the convergence is uniform in probability on compact subsets of E; c) if the function $k(\beta )$ has a unique maximum at the point ${\beta _{0}}$ then the probability that the equation $\dot{Q}_{n}(\beta )=0$ has a unique root $\hat{\beta }$ in E tends to 1 and $\hat{\beta }\stackrel{P}{\to }{\beta _{0}}$. Assumption A e) was crucial for the application of Andersen’s and Gill’s theorem. □
Note that $\hat{\Lambda }$ is a uniformly consistent estimator of Λ on $[0,\tau ]$:
\[\begin{aligned}{}\hat{\Lambda }(t)& ={\int _{0}^{t}}\frac{dN(u)}{{S^{(0)}}(u,\hat{\beta })}=\Lambda (t)+{\int _{0}^{t}}\bigg[\frac{{S^{(0)}}(u,{\beta _{0}})}{{S^{(0)}}(u,\hat{\beta })}-1\bigg]d\Lambda (u)\\ {} & \hspace{1em}-\frac{c}{\sqrt{n}}{\int _{0}^{t}}F(u)\frac{{S_{j}^{(1)}}(u,{\beta _{0}})}{{S^{(0)}}(u,\hat{\beta })}d\Lambda (u)+{\int _{0}^{t}}\frac{dM(u)}{{S^{(0)}}(u,\hat{\beta })}+{o_{p}}(1)=\Lambda (t)+{o_{p}}(1)\end{aligned}\]
uniformly on $[0,\tau ]$, because $\hat{\beta }$ is consistent and
\[ \underset{t\in [0,\tau ]}{\sup }\bigg|{\int _{0}^{t}}\frac{dM(u)}{{S^{(0)}}(u,\hat{\beta })}\bigg|={o_{P}}(1).\]
Set
\[ {\mu _{j}}(t)=-c{\int _{0}^{t}}{v_{j}}(u,{\beta _{0}}){s^{(0)}}(u,{\beta _{0}})F(u)\hspace{0.1667em}d\Lambda (u).\]
Theorem 3.
If Assumptions A are satisfied then
-
1)\[ \frac{1}{\sqrt{n}}\dot{\ell }(\cdot ,{\beta _{0}})\stackrel{\mathcal{D}}{\to }{\mu _{j}}(\cdot )+Z(\cdot ,{\beta _{0}})\hspace{0.2778em}\textit{on}\hspace{0.2778em}{(D[0,\tau ])^{m}},\]where Z is an m-dimensional Gaussian process with components having independent increments, ${Z_{j}}(0)=0$ a.s. and for all $0\le s\le t\le \tau $
\[ \mathrm{cov}\big({Z_{j}}(s),{Z_{{j^{\prime }}}}(t)\big)={\sigma _{j{j^{\prime }}}}(s);\]
here ${\sigma _{j{j^{\prime }}}}(t)$ are the elements of the matrix $\Sigma (t)$. In particular,
\[ \frac{1}{\sqrt{n}}\dot{\ell }(\tau ,{\beta _{0}})\stackrel{\mathcal{D}}{\to }N\big({\mu _{j}}(\tau ),\Sigma (\tau )\big);\]
-
2) $\hat{\Sigma }(\tau )=-\frac{1}{n}\ddot{\ell }(\hat{\beta })\stackrel{P}{\to }\Sigma (\tau )$,
-
3) $\sqrt{n}(\hat{\beta }-{\beta _{0}})={(-\frac{1}{n}\ddot{\ell }(\hat{\beta }))^{-1}}\frac{1}{\sqrt{n}}\dot{\ell }({\beta _{0}})+{o_{P}}(1)$.
Proof.
Set
\[ {M^{\ast }}(t)=\frac{1}{\sqrt{n}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\big\{{z^{(i)}}-E(u,{\beta _{0}})\big\}d{M_{i}}(u).\]
Using the Doob–Meier decomposition we have
\[\begin{aligned}{}\frac{1}{\sqrt{n}}\dot{\ell }(t,{\beta _{0}})& =\frac{1}{\sqrt{n}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\big\{{z^{(i)}}-E(u,{\beta _{0}})\big\}d{N_{i}}(u)\\ {} & ={M^{\ast }}(t)+\frac{1}{\sqrt{n}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\big\{{z^{(i)}}-E(u,{\beta _{0}})\big\}{Y_{i}}(u){e^{{\beta _{0}^{T}}{z^{(i)}}}}\\ {} & \hspace{1em}\times \bigg(1-\frac{c}{\sqrt{n}}F(u){z_{j}^{(i)}}+{o_{P}}\bigg(\frac{1}{\sqrt{n}}\bigg)\bigg)d\Lambda (u)\\ {} & ={M^{\ast }}(t)-\frac{c}{n}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}{V_{j}}(u,{\beta _{0}}){S^{(0)}}(u,{\beta _{0}})F(u)d\Lambda (u)+{o_{P}}(1)\\ {} & ={M^{\ast }}(t)-c{\int _{0}^{t}}{v_{j}}(u,{\beta _{0}}){s^{(0)}}(u,{\beta _{0}})F(u)d\Lambda (u)+{o_{P}}(1)\\ {} & ={M^{\ast }}(t)+{\mu _{j}}(t)+{o_{P}}(1),\end{aligned}\]
where ${V_{j}}$ is the jth column of the matrix V. The first term converges weakly to $Z(\cdot ,{\beta _{0}})\hspace{0.2778em}\text{on}\hspace{0.2778em}{(D[0,\tau ])^{m}}$ because the limit in probability of its predictable covariation matrix has the same expression as that of the analogous term under the Cox model:
\[\begin{aligned}{}\langle {M^{\ast }}\rangle (t)& =\frac{1}{n}{\int _{0}^{t}}V(u,{\beta _{0}}){S^{(0)}}(u,{\beta _{0}})\hspace{0.1667em}d\Lambda (u)-\frac{c}{{n^{3/2}}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\big\{{z^{(i)}}-E(u,{\beta _{0}})\big\}\\ {} & \hspace{1em}\times {\big\{{z^{(i)}}-E(u,{\beta _{0}})\big\}^{T}}{Y_{i}}(u)\hspace{0.1667em}{e^{{\beta _{0}^{T}}{z^{(i)}}}}\big(F(u){z_{j}^{(i)}}+{o_{P}}(1)\big)d\Lambda (u)\\ {} & =\frac{1}{n}{\int _{0}^{t}}V(u,{\beta _{0}}){S^{(0)}}(u,{\beta _{0}})\hspace{0.1667em}d\Lambda (u)+{o_{P}}(1)\stackrel{P}{\to }\Sigma (t).\end{aligned}\]
Analogously, verification of the Lindeberg condition is done in the same way as in the case of the Cox model. Let us consider the norm of the difference:
\[\begin{aligned}{}\bigg\| -\frac{1}{n}\ddot{\ell }(\hat{\beta })-\Sigma (\tau )\bigg\| & \le \underset{u\in [0,\tau ]}{\sup }\big|V(u,\hat{\beta })-v(u,\hat{\beta })\big|+\underset{u\in [0,\tau ]}{\sup }\big|v(u,\hat{\beta })-v(u,{\beta _{0}})\big|\\ {} & \hspace{1em}+\bigg\| \frac{1}{n}{\int _{0}^{\tau }}v(u,{\beta _{0}})dM(u)\bigg\| \\ {} & \hspace{1em}+\bigg\| {\int _{0}^{\tau }}v(u,{\beta _{0}})\bigg(\frac{1}{n}{S^{(0)}}(u,{\beta _{0}})-{s^{(0)}}(u,{\beta _{0}})\bigg)\hspace{0.1667em}d\Lambda (u)\bigg\| \\ {} & \hspace{1em}+\bigg\| \frac{c}{n\sqrt{n}}{\int _{0}^{\tau }}F(u){\sum \limits_{i=1}^{n}}{z_{j}^{(i)}}{e^{{\beta _{0}^{T}}{z^{(i)}}}}(1+{o_{p}}(1))d\Lambda (u)\bigg\| .\end{aligned}\]
Using the consistency of the estimator $\hat{\beta }$ and the fact that the first four terms have the same structure as the analogous terms in the proof of Theorem VII.2.2 of Andersen et al. [2], we obtain that the first four terms converge in probability to zero. Such convergence of the last term is obvious because the last term multiplied by $\sqrt{n}$ converges to a finite limit in probability. The mean value theorem and consistency of the estimator $\hat{\beta }$ imply
\[ {\dot{\ell }_{j}}(\hat{\beta })-{\dot{\ell }_{j}}({\beta _{0}})={\ddot{\ell }_{j}}({\beta _{j}^{\ast }})(\hat{\beta }-{\beta _{0}})=n\bigg(\frac{1}{n}{\ddot{\ell }_{j}}(\hat{\beta })+{o_{P}}(1)\bigg)(\hat{\beta }-{\beta _{0}});\]
here ${\beta _{j}^{\ast }}$ is a point on the line segment joining the points $\hat{\beta }$ and ${\beta _{0}}$. Since ${\dot{\ell }_{j}}(\hat{\beta })=0$, we obtain
\[ \sqrt{n}(\hat{\beta }-{\beta _{0}})={\bigg(-\frac{1}{n}\ddot{\ell }(\hat{\beta })\bigg)^{-1}}\frac{1}{\sqrt{n}}\dot{\ell }({\beta _{0}})+{o_{P}}(1).\]
 □
Set
\[\begin{aligned}{}d& =c\bigg[{\int _{0}^{\tau }}{F^{2}}(u){v_{jj}}(u,{\beta _{0}}){s^{(0)}}(u,{\beta _{0}})d\Lambda (u)\\ {} & \hspace{1em}-{\int _{0}^{\tau }}F(u){v_{j}^{T}}(u,{\beta _{0}})d\Lambda (u){\Sigma ^{-1}}(\tau ){\int _{0}^{\tau }}F(u){v_{j}}(u,{\beta _{0}}){s^{(0)}}(u,{\beta _{0}})d\Lambda (u)\bigg].\end{aligned}\]
Note that for $m=1$
\[ d/c=\frac{\int {F^{2}}v{s^{(0)}}d\Lambda \int v{s^{(0)}}d\Lambda -{(\int Fv{s^{(0)}}d\Lambda )^{2}}}{\int v{s^{(0)}}d\Lambda }\mathrm{>}0,\]
because F is not a.s. constant on $[0,\tau ]$ (by the Cauchy–Schwarz inequality the numerator is positive unless F is constant a.e.).
Theorem 4.
Under Assumptions A
\[ T\stackrel{\mathcal{D}}{\to }N(\mu ,1),\]
where $\mu =d/\sqrt{{D_{j}}}$ and ${D_{j}}$ is the limit in probability of the random variable ${\hat{D}_{j}}$.
Proof.
Set
\[ \bar{M}(t)={n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}d{M_{i}}(u).\]
Similarly as in the proof of Theorem 1, using the equality ${S_{j}^{(1)}}-{E_{j}}{S^{(0)}}=0$, the consistency of $\hat{\beta }$, Assumptions A, and the uniform consistency of $\hat{F}$ on $[0,\tau ]$, we write the following expression:
\[\begin{aligned}{}{n^{-1/2}}{\hat{U}_{c/\sqrt{n}}}(t)& =-{n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}d{N_{i}}(u)\\ {} & =-\bar{M}(t)-{n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}{e^{{\beta _{0}^{T}}{z^{(i)}}}}\\ {} & \hspace{1em}\times \big\{1-{z_{j}^{(i)}}F(u)\frac{c}{\sqrt{n}}(1+{o_{p}}(1))\big\}{Y_{i}}(u)d\Lambda (u)\\ {} & =-\bar{M}(t)+\frac{c}{n}{\int _{0}^{t}}\hat{F}(u)F(u){V_{jj}}(u,{\beta _{0}}){S^{(0)}}(u,{\beta _{0}})d\Lambda (u)\\ {} & \hspace{1em}+{n^{-1}}{\int _{0}^{t}}\hat{F}(u){V_{j}^{T}}(u,{\beta _{0}}){S^{(0)}}(u,{\beta _{0}})d\Lambda (u)\hspace{0.1667em}{n^{1/2}}(\hat{\beta }-{\beta _{0}})+{o_{P}}(1)\end{aligned}\]
uniformly on $[0,\tau ]$. Applying Theorem 3 write the right side in the form
\[\begin{aligned}{}{n^{-1/2}}{\hat{U}_{c/\sqrt{n}}}(t)& =c\bigg[{\int _{0}^{t}}{F^{2}}(u){v_{jj}}(u,{\beta _{0}}){s^{(0)}}(u,{\beta _{0}})d\Lambda (u)\\ {} & \hspace{1em}-{\int _{0}^{t}}F(u){v_{j}^{T}}(u,{\beta _{0}})d\Lambda (u){\Sigma ^{-1}}(\tau )\\ {} & \hspace{1em}\times {\int _{0}^{\tau }}F(u){v_{j}}(u,{\beta _{0}}){s^{(0)}}(u,{\beta _{0}})d\Lambda (u)\bigg]\\ {} & \hspace{1em}-{n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{t}}\hat{F}(u)\big\{{z_{j}^{(i)}}-{E_{j}}(u,\hat{\beta })\big\}d{M_{i}}(u)\\ {} & \hspace{1em}-{\hat{\Sigma }_{j}^{T}}(t){\hat{\Sigma }^{-1}}(\tau )\hspace{0.1667em}{n^{-1/2}}{\sum \limits_{i=1}^{n}}{\int _{0}^{\tau }}\big\{{z^{(i)}}-E(u,\hat{\beta })\big\}d{M_{i}}(u)+{o_{p}}(1)\end{aligned}\]
uniformly on $[0,\tau ]$. Note that the non-martingale part at $t=\tau $ equals d, and the martingale part of the expression is exactly of the same form as in the case of the PH model and has the same limit distribution as in the latter case. □
The power function of the test against the approaching alternatives is approximately
\[ 1-\Phi ({z_{\frac{\alpha }{2}}}-\mu )+\Phi (-{z_{\frac{\alpha }{2}}}-\mu ),\]
where Φ and ${z_{\frac{\alpha }{2}}}$ are the c.d.f. and the upper $\alpha /2$ critical value of the standard normal law, respectively. If $d\ne 0$ and c is large, then μ is large and the power is near 1.
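The power approximation can be evaluated directly; a minimal sketch ($\alpha =0.05$; the values of μ are hypothetical):

```python
import math

def Phi(x):
    """Standard normal c.d.f. via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def power(mu, z=1.959963985):    # z = upper 0.025 critical value
    """Approximate power of the two-sided test when T ~ N(mu, 1)."""
    return 1.0 - Phi(z - mu) + Phi(-z - mu)

print(round(power(0.0), 3))      # under H0 the power equals the level: 0.05
print(round(power(2.8), 3))      # grows toward 1 as |mu| increases
```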
5 Case of several covariates
The proportional hazards hypothesis for several covariates ${z_{{j_{1}}}},\dots ,{z_{{j_{k}}}}$ (for all covariates ${z_{1}},\dots ,{z_{m}}$, in particular) is tested similarly.
Set $\bar{z}={({z_{{j_{1}}}},\dots ,{z_{{j_{k}}}})^{T}}$, $\gamma ={({\gamma _{{j_{1}}}},\dots ,{\gamma _{{j_{k}}}})^{T}}$. The term ${\gamma _{j}}{z_{j}}$ is replaced by ${\gamma ^{T}}\bar{z}$ in the model (1).
Replacing ${z_{j}^{(i)}}$ by ${\bar{z}^{(i)}}$, ${E_{j}}(t,\hat{\beta })$ by ${({E_{{j_{1}}}}(t,\hat{\beta }),\dots ,{E_{{j_{k}}}}(t,\hat{\beta }))^{T}}$ in the statistic ${\hat{U}_{{\gamma _{j}}}}$ we obtain a statistic denoted by ${\hat{U}_{\gamma }}$. The statistic T has the form (6), where the element ${V_{jj}}(u,\hat{\beta })$ is replaced by the $k\times k$ matrix ${({V_{{j_{l}},{j_{s}}}}(u,\hat{\beta }))}_{k\times k}$, the vector ${V_{j}}(u,\hat{\beta })$ is replaced by the $m\times k$ matrix $({V_{i,{j_{s}}}}(u,\hat{\beta }))$, $i=1,\dots ,m$, $s=1,\dots ,k$ in the definitions of ${\Sigma _{jj}}$, ${\Sigma _{j}}$, and ${\hat{U}_{{\gamma _{j}}}}$ is replaced by ${\hat{U}_{\gamma }}$.
So ${\hat{\Sigma }_{jj}}(t)$, ${\hat{\Sigma }_{j}}(t)$, and $\hat{D}$ become $k\times k$, $m\times k$, and $k\times k$ matrices, respectively, and ${U_{\gamma }}$ becomes a $k\times 1$ vector.
The test statistic satisfies
\[ T={n^{-1}}{\hat{U}_{\gamma }^{T}}{\hat{D}^{-1}}{\hat{U}_{\gamma }}\stackrel{\mathcal{D}}{\to }{\chi ^{2}}(k)\hspace{1em}\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty \]
under ${H_{0}}$.
The hypothesis ${H_{0}}:\gamma =0$ is rejected with approximate significance level α if $T\mathrm{>}{\chi _{\alpha }^{2}}(k)$; here ${\chi _{\alpha }^{2}}(k)$ is the upper α critical value of the chi-squared distribution with k degrees of freedom.
6 Real data analysis
Example 1 (Chemo-radio data, one-dimensional dichotomous covariate).
Stablein and Koutrouvelis [14] studied the well-known two-sample data of the Gastrointestinal Tumor Study Group concerning effects of chemotherapy ($z=0$) and chemotherapy plus radiotherapy ($z=1$) on the survival times of gastric cancer patients.
The number of patients is $n=90$. The data are right-censored. The value of the test statistic T is 3.651, and the p-value is 0.0003. The Cox model is rejected. This result is natural because the Kaplan–Meier estimators of the survival functions of the two patient groups intersect.
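The crossing of the two groups' survival curves can be seen from the Kaplan–Meier estimates. Below is a minimal sketch of the estimator for right-censored data, assuming generic time and censoring-indicator arrays (this is our illustration, not the authors' code):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function from
    right-censored data (events: 1 = failure, 0 = censored).
    Returns a list of (failure time, survival estimate) steps."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    surv = 1.0
    steps = []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)            # risk set size at t-
        d = np.sum((times == t) & (events == 1))  # failures at t
        surv *= 1.0 - d / at_risk
        steps.append((float(t), float(surv)))
    return steps
```

Computing such step functions for the chemotherapy and chemotherapy-plus-radiotherapy groups and plotting them reveals the intersection mentioned above.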
Example 2 (Prisoners data, 7 covariates).
The data are given in [1]. They consist of 432 male inmates who were released from Maryland state prisons in the early 1970s. These men were followed for one year after their release, and the dates of any arrests were recorded. Time is measured by the week of the first arrest after release. There were seven covariates: financial aid after release (FIN; 0 – no, 1 – yes), age in years at the time of release (AGE), race (RACE; 1 – black, 0 – otherwise), full-time work experience before incarceration (WEXP; 1 – yes, 0 – no), marital status (MAR; 1 – married at the time of release, 0 – otherwise), released on parole (PAR; 1 – released on parole, 0 – otherwise), and convictions prior to incarceration (PRI).
The results of testing the hypothesis for all covariates: the value of the test statistic T is 17.58, and the p-value is 0.014. The PH assumption is rejected.
The results for each covariate are given in Table 1. The PH assumption is rejected for the AGE and WEXP covariates.
Example 3 (UIS dataset, 10 covariates).
Let us consider the right-censored UIS data set given in [7].
UIS was a 5-year research project comprising two concurrent randomized trials of residential treatment for drug abuse. The purpose of the study was to compare treatment programs of different planned durations designed to reduce drug abuse and to prevent high-risk HIV behavior. The UIS sought to determine whether alternative residential treatment approaches differ in effectiveness and whether efficacy depends on planned program duration. The time variable is the time to return to drug use (measured from admission). The individuals who did not return to drug use are right-censored. We use the model with 10 covariates (which support the PH assumption) given by Hosmer, Lemeshow and May (2008). The covariates are: AGE (years); Beck depression score (beckt; range 0–54); $\mathrm{NDR}1={((\mathrm{NDR}+1)/10)^{-1}}$; $\mathrm{NDR}2={((\mathrm{NDR}+1)/10)^{-1}}\log ((\mathrm{NDR}+1)/10)$; drug use history at admission (IVHX_3; 1 – recent, 0 – never or previous); RACE (0 – white, 1 – non-white); treatment randomization assignment (TREAT; 0 – short, 1 – long); treatment site (SITE; 0 – A, 1 – B); interaction of age and treatment site (AGEXS); interaction of race and treatment site (RACEXS). Here NDR denotes the number of prior drug treatments (range 0–40). Due to missing data in covariates, the model is based on 575 of the 628 observations.
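The transforms NDR1 and NDR2 of the number of prior drug treatments can be computed directly from the definitions above; this small helper (its name is our own) makes the formulas concrete:

```python
import math

def ndr_transforms(ndr: int):
    """Compute NDR1 = ((NDR+1)/10)^(-1) and
    NDR2 = ((NDR+1)/10)^(-1) * log((NDR+1)/10)
    from the number of prior drug treatments NDR."""
    x = (ndr + 1) / 10.0
    ndr1 = 1.0 / x
    ndr2 = (1.0 / x) * math.log(x)
    return ndr1, ndr2
```

For example, a subject with 9 prior treatments has $x=1$, so NDR1 = 1 and NDR2 = 0.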
The results of testing the hypothesis for all covariates: the value of the test statistic T is 6.781, and the p-value is 0.7460. The PH assumption is not rejected.
The results for each covariate are given in Table 2.
Table 2.
UIS dataset. The values of the test statistics and p-values
Covariate | AGE | beckt | NDR1 | NDR2 | IVHX_3 | RACE |
Stat. | −0.061 | 1.085 | 0.182 | 0.118 | 0.912 | −1.278 |
p-value | 0.952 | 0.278 | 0.856 | 0.906 | 0.362 | 0.201 |
Covariate | TREAT | SITE | AGEXS | RACEXS | Global test |
Stat. | 0.792 | 1.016 | −0.378 | −0.107 | 6.781 |
p-value | 0.429 | 0.309 | 0.705 | 0.915 | 0.746 |
The PH assumption is not rejected for any individual covariate.