1 Introduction
Statistical model. Consider logistic regression with Berkson-type error in the explanatory variable. A single trial is distributed as follows. ${X_{n}^{\mathrm{obs}}}$ is the observed (or assigned) surrogate regressor. The true regressor is $X_{n}={X_{n}^{\mathrm{obs}}}+U_{n}$, where the error $U_{n}\sim N(0,{\tau }^{2})$ is independent of ${X_{n}^{\mathrm{obs}}}$. The response $Y_{n}$ is a binary random variable that attains either 0 or 1 with
\[\operatorname{\mathsf{P}}\big[Y_{n}=1\hspace{0.2222em}\big|\hspace{0.2222em}{X_{n}^{\mathrm{obs}}},X_{n}\big]=\frac{\exp (\beta _{0}+\beta _{1}X_{n})}{1+\exp (\beta _{0}+\beta _{1}X_{n})}.\]
We consider both the functional and the structural model. In the functional model, the ${X_{n}^{\mathrm{obs}}}$ are nonrandom, whereas in the structural model the ${X_{n}^{\mathrm{obs}}}$ are i.i.d. random variables; therefore, in the latter model the triples $({X_{n}^{\mathrm{obs}}},X_{n},Y_{n})$ are i.i.d.
The couples $({X_{n}^{\mathrm{obs}}},Y_{n})$, $n=1,\dots ,N$, are observed. The vector $\vec{\beta }={(\beta _{0},\beta _{1})}^{\top }$ is the parameter of interest.
The error variance ${\tau }^{2}$ can be either known or unknown, and we consider both cases. The conditions for identifiability of the model (or of the parameter $\vec{\beta }$) are presented.
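For concreteness, here is a minimal simulation sketch of the structural variant of this model; the parameter values and the $N(0,1)$ law of ${X_{n}^{\mathrm{obs}}}$ are purely illustrative and not part of the model specification.

```python
import numpy as np

# Illustrative sketch: sampling one dataset from the structural Berkson logistic model.
# The parameter values and the N(0, 1) law of X_n^obs are hypothetical choices.
rng = np.random.default_rng(0)
N = 1000
beta0, beta1, tau = -0.5, 1.2, 0.8          # hypothetical beta_0, beta_1, tau

x_obs = rng.normal(0.0, 1.0, size=N)        # surrogate regressors X_n^obs
u = rng.normal(0.0, tau, size=N)            # Berkson errors U_n ~ N(0, tau^2), independent of X_n^obs
x_true = x_obs + u                          # true regressors X_n = X_n^obs + U_n
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x_true)))
y = rng.binomial(1, p)                      # binary responses Y_n

# Only the couples (x_obs[n], y[n]) are available to the statistician.
```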
Overview. Berkson models of logistic regression and probit regression were set up in Burr [1]. For probit regression, it is shown that the introduction of Berkson-type error is equivalent to augmentation of regression parameters. As a consequence, the Berkson model of probit regression is identifiable if ${\tau }^{2}$ is known and is not identifiable if ${\tau }^{2}$ is not known.
The identifiability of the classical model was studied by Küchenhoff [3]. He assumes that both the regressor and the measurement error are normally distributed. Then univariate logistic regression is identifiable (here ${\tau }^{2}$ can be unknown), and multiple logistic regression is not identifiable. Our results can be proved similarly to [3] if we assume that the distribution of the surrogate regressor ${X}^{\mathrm{obs}}$ has unbounded support.
For classification of errors-in-variables regression models and various estimation methods, see the monograph by Carroll et al. [2].
Identifiability of the statistical model can be used in the proof of consistency of the estimator. For known ${\tau }^{2}$, the strong consistency of the maximum likelihood estimator is obtained by Shklyar [4]. But if ${\tau }^{2}$ is not known, the maximum likelihood estimator seems to be unstable (see discussion in [2] or [3]).
2 Convolution of logistic function with normal density
Consider the function
(1)
\[L_{0}\big(x,{\sigma }^{2}\big)=\operatorname{\mathsf{E}}\frac{\exp (x-\xi )}{1+\exp (x-\xi )},\hspace{1em}\xi \sim N\big(0,{\sigma }^{2}\big),\hspace{2.5pt}x\in \mathbb{R},\hspace{2.5pt}{\sigma }^{2}\ge 0,\]
that is, $L_{0}(x,0)={\mathrm{e}}^{x}/(1+{\mathrm{e}}^{x})$ and
\[L_{0}\big(x,{\sigma }^{2}\big)={\int _{-\infty }^{+\infty }}\frac{{\mathrm{e}}^{x-t}}{1+{\mathrm{e}}^{x-t}}\cdot \frac{1}{\sqrt{2\pi }\sigma }{\mathrm{e}}^{-{t}^{2}/(2{\sigma }^{2})}\hspace{0.1667em}\mathrm{d}t\hspace{1em}\text{for}\hspace{2.5pt}{\sigma }^{2}>0.\]
Denote the derivatives w.r.t. x
(2)
\[L_{k}\big(x,{\sigma }^{2}\big)=\frac{{\partial }^{k}L_{0}(x,{\sigma }^{2})}{\partial {x}^{k}},\hspace{1em}k=1,2,\dots \]
Differentiation of $L_{k}(x,{\sigma }^{2})$ with respect to the second argument is described in Appendix A.
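A minimal numerical sketch of $L_{0}$ and its derivatives $L_{k}$ is given below; the Gauss–Hermite quadrature is an implementation choice made here for illustration, not anything prescribed by the paper. The last line checks numerically the identity $\partial L_{0}/\partial {\sigma }^{2}=\frac{1}{2}L_{2}$, a heat-equation property of convolution with a normal density (cf. the factors $\frac{1}{2}$ in the proof of Lemma 3).

```python
import numpy as np

# Sketch: L_0(x, sigma^2) = E logistic(x - xi), xi ~ N(0, sigma^2), and its x-derivatives,
# approximated by Gauss-Hermite quadrature (an illustrative implementation choice).
nodes, weights = np.polynomial.hermite.hermgauss(80)

def L(k, x, sigma2):
    """k-th derivative of L_0 with respect to x, for k = 0, 1, 2."""
    xi = np.sqrt(2.0 * sigma2) * nodes                # E f(xi) ~ sum_i w_i f(sqrt(2*sigma2)*t_i) / sqrt(pi)
    p = 1.0 / (1.0 + np.exp(-(x - xi)))               # logistic(x - xi)
    integrand = {0: p, 1: p * (1 - p), 2: p * (1 - p) * (1 - 2 * p)}[k]
    return np.sum(weights * integrand) / np.sqrt(np.pi)

x, s2, h = 0.7, 1.3, 1e-5
print(L(0, x, 0.0) - 1.0 / (1.0 + np.exp(-x)))                            # ~ 0: L_0(x, 0) is logistic
print((L(0, x, s2 + h) - L(0, x, s2 - h)) / (2 * h) - 0.5 * L(2, x, s2))  # ~ 0: dL_0/dsigma^2 = L_2/2
```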
The distribution of $Y_{i}$ given ${X_{i}^{\mathrm{obs}}}$ is
(3)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{\mathsf{P}}\big[Y_{i}=1\big|{X_{i}^{\mathrm{obs}}}\big]& \displaystyle =\operatorname{\mathsf{E}}\big[\operatorname{\mathsf{P}}\big[Y_{i}=1\big|{X_{i}^{\mathrm{obs}}},X_{i}\big]\big|{X_{i}^{\mathrm{obs}}}\big]\\{} & \displaystyle =\operatorname{\mathsf{E}}\bigg[\frac{\exp (\beta _{0}+\beta _{1}X_{i})}{1+\exp (\beta _{0}+\beta _{1}X_{i})}\bigg|{X_{i}^{\mathrm{obs}}}\bigg]=L_{0}\big(\beta _{0}+\beta _{1}{X_{i}^{\mathrm{obs}}},\hspace{0.2222em}{\beta _{1}^{2}}{\tau }^{2}\big)\end{array}\]
since $[\beta _{0}+\beta _{1}X_{i}\mid {X_{i}^{\mathrm{obs}}}]\sim N(\beta _{0}+\beta _{1}{X_{i}^{\mathrm{obs}}},\hspace{0.2222em}{\beta _{1}^{2}}{\tau }^{2})$.
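As a quick illustration of identity (3), the following Monte Carlo sketch (with hypothetical parameter values) compares the averaged conditional probability with $L_{0}(\beta _{0}+\beta _{1}{X}^{\mathrm{obs}},\hspace{0.2222em}{\beta _{1}^{2}}{\tau }^{2})$.

```python
import numpy as np

# Monte Carlo check of identity (3); beta0, beta1, tau and x_obs are hypothetical values.
rng = np.random.default_rng(1)
beta0, beta1, tau, x_obs = -0.3, 0.9, 0.7, 1.5

# Left-hand side: average of P[Y=1 | X^obs, X] over the Berkson error U ~ N(0, tau^2).
u = rng.normal(0.0, tau, size=1_000_000)
lhs = np.mean(1.0 / (1.0 + np.exp(-(beta0 + beta1 * (x_obs + u)))))

# Right-hand side: L_0(beta0 + beta1 * x_obs, beta1^2 * tau^2) by Gauss-Hermite quadrature.
t, w = np.polynomial.hermite.hermgauss(80)
m, s2 = beta0 + beta1 * x_obs, (beta1 * tau) ** 2
rhs = np.sum(w / (1.0 + np.exp(-(m - np.sqrt(2.0 * s2) * t)))) / np.sqrt(np.pi)

print(lhs, rhs)   # the two values agree up to Monte Carlo error
```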
3 Identifiability when ${\tau }^{2}$ is known
Theorem 1.
If in the functional model not all ${X}^{\mathrm{obs}}$ are equal, then the model is identifiable.
Proof.
Suppose that for two values of parameters ${\vec{\beta }}^{(1)}=({\beta _{0}^{(1)}},{\beta _{1}^{(1)}})$ and ${\vec{\beta }}^{(2)}=({\beta _{0}^{(2)}},{\beta _{1}^{(2)}})$, ${\vec{\beta }}^{(1)}\ne {\vec{\beta }}^{(2)}$, the distributions of observations are equal. Then for all $i=1,2,\dots ,N$,
\[\begin{array}{l}\displaystyle \operatorname{\mathsf{P}}_{{\vec{\beta }}^{(1)}}(Y_{i}=1)=\operatorname{\mathsf{P}}_{{\vec{\beta }}^{(2)}}(Y_{i}=1),\\{} \displaystyle L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}{X_{i}^{\mathrm{obs}}},{\big({\beta _{1}^{(1)}}\big)}^{2}{\tau }^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}{X_{i}^{\mathrm{obs}}},{\big({\beta _{1}^{(2)}}\big)}^{2}{\tau }^{2}\big).\end{array}\]
However, by Lemma 4.1 from [4] the equation
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x,{\big({\beta _{1}^{(1)}}\big)}^{2}{\tau }^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x,{\big({\beta _{1}^{(2)}}\big)}^{2}{\tau }^{2}\big)\]
has no more than one solution x. Hence, all ${X_{i}^{\mathrm{obs}}}$ are equal, which contradicts the assumption of the theorem.  □
By definition, the degenerate distribution is the distribution concentrated at a single point. For the next theorem, see the proof of Theorem 5.1 in [4].
Theorem 2 ([4]).
If in the structural model the distribution of ${X_{1}^{\mathrm{obs}}}$ is not degenerate, then the parameter $\vec{\beta }$ is identifiable.
4 Identifiability when ${\tau }^{2}$ is unknown
For fixed ${\sigma }^{2}$, the function $L_{0}(x,{\sigma }^{2})$ is a bijection $\mathbb{R}\to (0,\hspace{0.2222em}1)$. Hence, for fixed ${\sigma _{1}^{2}}$ and ${\sigma _{2}^{2}}$, the relation
(4)
\[L_{0}\big(y,{\sigma _{1}^{2}}\big)=L_{0}\big(x,{\sigma _{2}^{2}}\big)\]
defines a bijection $y=y(x)$, $\mathbb{R}\to \mathbb{R}$; see Fig. 1.
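The implicit function in (4) can be computed pointwise by one-dimensional root finding; the sketch below (with illustrative values ${\sigma _{1}^{2}}<{\sigma _{2}^{2}}$) also lets one inspect numerically the sign pattern of its second derivative established in Lemma 3 below.

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: the implicit function y(x) defined by (4), L_0(y, s1) = L_0(x, s2),
# computed by root finding; s1, s2 are illustrative values with s1 < s2.
t, w = np.polynomial.hermite.hermgauss(80)

def L0(x, sigma2):
    return np.sum(w / (1.0 + np.exp(-(x - np.sqrt(2.0 * sigma2) * t)))) / np.sqrt(np.pi)

s1, s2 = 0.5, 4.0

def y(x):
    z = L0(x, s2)
    return brentq(lambda yy: L0(yy, s1) - z, -50.0, 50.0)

# Sign of the second difference of y (same sign as y''); Lemma 3 predicts
# sign(y'') = sign(s2 - s1) * sign(x), i.e. positive for x > 0 here.
for x in (-2.0, -0.5, 0.5, 2.0):
    h = 1e-2
    print(x, np.sign(y(x + h) - 2.0 * y(x) + y(x - h)))
```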
Lemma 3.
For fixed ${\sigma _{1}^{2}}\ge 0$ and ${\sigma _{2}^{2}}\ge 0$, the sign of the second derivative of the implicit function (4) is
\[\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(x).\]
Proof.
Differentiating (4), we get
\[\begin{array}{r@{\hskip0pt}l}\displaystyle L_{1}\big(y,{\sigma _{1}^{2}}\big)\hspace{0.1667em}\mathrm{d}y& \displaystyle =L_{1}\big(x,{\sigma _{2}^{2}}\big)\hspace{0.1667em}\mathrm{d}x;\\{} \displaystyle \frac{\mathrm{d}y}{\mathrm{d}x}& \displaystyle =\frac{L_{1}(x,{\sigma _{2}^{2}})}{L_{1}(y,{\sigma _{1}^{2}})}.\end{array}\]
Then
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}& \displaystyle =\frac{L_{2}(x,{\sigma _{2}^{2}})L_{1}(y,{\sigma _{1}^{2}})-L_{1}(x,{\sigma _{2}^{2}})L_{2}(y,{\sigma _{1}^{2}})\frac{\mathrm{d}y}{\mathrm{d}x}}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\\{} & \displaystyle =\frac{L_{2}(x,{\sigma _{2}^{2}})L_{1}{(y,{\sigma _{1}^{2}})}^{2}-L_{1}{(x,{\sigma _{2}^{2}})}^{2}L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{3}}\\{} & \displaystyle =\bigg(\frac{L_{2}(x,{\sigma _{2}^{2}})}{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}-\frac{L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\bigg)\cdot \frac{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}{L_{1}(y,{\sigma _{1}^{2}})}.\end{array}\]
Thus,
(5)
\[\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}\bigg)=\operatorname{sign}\bigg(\frac{L_{2}(x,{\sigma _{2}^{2}})}{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}-\frac{L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\bigg)\]
because $L_{1}>0$.
Denote by $\mu (z,{\sigma }^{2})$ the solution to the equation $L_{0}(\mu ,{\sigma }^{2})=z$. Note that as $L_{0}(x,{\sigma }^{2})$ is the cdf of a symmetric distribution, $\operatorname{sign}(L_{0}(x,{\sigma }^{2})-0.5)=\operatorname{sign}(x)$. Therefore, $\operatorname{sign}(\mu (z,{\sigma }^{2}))=\operatorname{sign}(z-0.5)$. Find the derivative
\[\frac{\mathrm{d}}{\mathrm{d}v}\bigg(\frac{L_{2}(\mu (z,v),v)}{L_{1}{(\mu (z,v),v)}^{2}}\bigg)\]
for fixed z. By the implicit function theorem,
\[\frac{\partial \mu (z,v)}{\partial v}=-\frac{L_{2}(\mu (z,v),v)}{2L_{1}(\mu (z,v),v)};\]
also,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{\partial }{\partial x}\bigg(\frac{L_{2}(x,v)}{L_{1}{(x,v)}^{2}}\bigg)& \displaystyle =\frac{L_{3}(x,v)L_{1}(x,v)-2L_{2}{(x,v)}^{2}}{L_{1}{(x,v)}^{3}},\\{} \displaystyle \frac{\partial }{\partial v}\bigg(\frac{L_{2}(x,v)}{L_{1}{(x,v)}^{2}}\bigg)& \displaystyle =\frac{L_{4}(x,v)L_{1}(x,v)-2L_{2}(x,v)L_{3}(x,v)}{2L_{1}{(x,v)}^{3}}.\end{array}\]
Then
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{\mathrm{d}}{\mathrm{d}v}\bigg(\frac{L_{2}(\mu (z,v),v)}{L_{1}{(\mu (z,v),v)}^{2}}\bigg)& \displaystyle =-\frac{L_{2}}{2L_{1}}\cdot \frac{L_{3}L_{1}-2{L_{2}^{2}}}{{L_{1}^{3}}}+\frac{L_{4}L_{1}-2L_{2}L_{3}}{2{L_{1}^{3}}}\\{} & \displaystyle =\frac{L_{4}{L_{1}^{2}}-3L_{3}L_{2}L_{1}+2{L_{2}^{3}}}{2{L_{1}^{4}}},\end{array}\]
where $L_{k}$ are evaluated at the point $(\mu (z,v),v)$. By Lemma 10,
\[\operatorname{sign}\big(L_{4}{L_{1}^{2}}-3L_{3}L_{2}L_{1}+2{L_{2}^{3}}\big)=\operatorname{sign}\big(\mu (z,v)\big)=\operatorname{sign}(z-0.5).\]
The function $v\mapsto \frac{L_{2}(\mu (z,v),v)}{L_{1}{(\mu (z,v),v)}^{2}}$ is monotone (it is increasing for $z>0.5$ and decreasing for $z<0.5$). For x and y satisfying (4),
\[x=\mu \big(z,{\sigma _{2}^{2}}\big)\hspace{1em}\text{and}\hspace{1em}y=\mu \big(z,{\sigma _{1}^{2}}\big)\]
with $z=L_{0}(y,{\sigma _{1}^{2}})=L_{0}(x,{\sigma _{2}^{2}})$; note that $\operatorname{sign}(z-0.5)=\operatorname{sign}(x)$. Then
\[\operatorname{sign}\bigg(\frac{L_{2}(x,{\sigma _{2}^{2}})}{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}-\frac{L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(x),\]
and with (5), we obtain the desired equality
\[\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(x).\hspace{1em}\square \]
Lemma 4.
The equation
(6)
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x,\hspace{0.2222em}{\sigma _{1}^{2}}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x,\hspace{0.2222em}{\sigma _{2}^{2}}\big)\]
has no more than three solutions, unless either
(7)
\[{\vec{\beta }}^{(1)}={\vec{\beta }}^{(2)}\hspace{1em}\textit{and}\hspace{1em}{\sigma _{1}^{2}}={\sigma _{2}^{2}}\]
or
(8)
\[{\beta _{1}^{(1)}}={\beta _{1}^{(2)}}=0\hspace{1em}\textit{and}\hspace{1em}L_{0}\big({\beta _{0}^{(1)}},{\sigma _{1}^{2}}\big)=L_{0}\big({\beta _{0}^{(2)}},{\sigma _{2}^{2}}\big).\]
In exceptional cases (7) and (8), equation (6) is an identity.
Proof.
The proof has the following idea: if a twice differentiable function $y(x)$ satisfies (4), then the plot of the function either is a straight line (if ${\sigma _{1}^{2}}={\sigma _{2}^{2}}$) or intersects any straight line at no more than three points.
Consider four cases.
Case 1. ${\sigma _{1}^{2}}={\sigma _{2}^{2}}$. Since the function $L_{0}(z,{\sigma }^{2})$ is strictly increasing in z, Eq. (6) is equivalent to
\[{\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x={\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x.\]
Equation (6) has only one solution if ${\beta _{1}^{(1)}}\ne {\beta _{1}^{(2)}}$; it is an identity if ${\vec{\beta }}^{(1)}={\vec{\beta }}^{(2)}$, and it has no solutions if ${\beta _{1}^{(1)}}={\beta _{1}^{(2)}}$ but ${\beta _{0}^{(1)}}\ne {\beta _{0}^{(2)}}$.
Case 2. ${\beta _{1}^{(2)}}=0$ and ${\beta _{1}^{(1)}}\ne 0$. For any fixed ${\sigma }^{2}$, the function $z\mapsto L_{0}(z,{\sigma }^{2})$ is a bijection $\mathbb{R}\to (0,\hspace{0.1667em}1)$. Denote the inverse function $\mu (Z,{\sigma }^{2})$: $L_{0}(z,{\sigma }^{2})=Z$ if and only if $z=\mu (Z,{\sigma }^{2})$. Equation (6) has a unique solution
\[x=\frac{\mu \big(L_{0}\big({\beta _{0}^{(2)}},{\sigma _{2}^{2}}\big),\hspace{0.2222em}{\sigma _{1}^{2}}\big)-{\beta _{0}^{(1)}}}{{\beta _{1}^{(1)}}}.\]
Case 3. ${\beta _{1}^{(2)}}={\beta _{1}^{(1)}}=0$. Neither side of (6) depends on x. Equation (6) becomes $L_{0}({\beta _{0}^{(1)}},{\sigma _{1}^{2}})=L_{0}({\beta _{0}^{(2)}},{\sigma _{2}^{2}})$. Equation (6) either holds for all x or does not hold for any x.
Case 4. ${\sigma _{1}^{2}}\ne {\sigma _{2}^{2}}$ and ${\beta _{1}^{(2)}}\ne 0$. Make a linear variable substitution: denote $z_{2}={\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x$. Then Eq. (6) becomes
(9)
\[L_{0}\bigg({\beta _{0}^{(1)}}+\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\big(z_{2}-{\beta _{0}^{(2)}}\big),\hspace{0.2222em}{\sigma _{1}^{2}}\bigg)=L_{0}\big(z_{2},\hspace{0.2222em}{\sigma _{2}^{2}}\big).\]
Define the function $z_{1}(z_{2})$ from the equation
\[L_{0}\big(z_{1},{\sigma _{1}^{2}}\big)=L_{0}\big(z_{2},{\sigma _{2}^{2}}\big).\]
The function $z_{1}(z_{2}):\mathbb{R}\to \mathbb{R}$ is implicitly defined by Eq. (4): there the equality holds if and only if $y=z_{1}(x)$. Hence, the function $z_{1}(z_{2})$ satisfies Lemma 3. Equation (9) is equivalent to
(10)
\[z_{1}(z_{2})-{\beta _{0}^{(1)}}-\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\cdot \big(z_{2}-{\beta _{0}^{(2)}}\big)=0.\]
By Lemma 3,
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}}{\mathrm{d}{z_{2}^{2}}}\bigg(z_{1}(z_{2})-{\beta _{0}^{(1)}}-\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\cdot \big(z_{2}-{\beta _{0}^{(2)}}\big)\bigg)\bigg)\\{} & \displaystyle \hspace{1em}=\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}z_{1}(z_{2})}{\mathrm{d}{z_{2}^{2}}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(z_{2}).\end{array}\]
Then the derivative of the left-hand side of (10)
(11)
\[\frac{\mathrm{d}}{\mathrm{d}z_{2}}\bigg(z_{1}(z_{2})-{\beta _{0}^{(1)}}-\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\cdot \big(z_{2}-{\beta _{0}^{(2)}}\big)\bigg)\]
is strictly monotone on both intervals $(-\infty ,\hspace{0.2222em}0]$ and $[0,\hspace{0.2222em}+\infty )$, and hence (11) attains 0 at no more than two points. Then the left-hand side of (10) has no more than three intervals of monotonicity, and Eq. (10) has no more than three solutions. Equation (6) has the same number of solutions. □
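A crude numerical illustration of Lemma 4: for one non-exceptional, hypothetical choice of parameters, count the sign changes of the difference of the two sides of (6) on a grid; by Lemma 4 the equation has at most three roots.

```python
import numpy as np

# Illustration of Lemma 4: count sign changes of the difference of the two sides of (6)
# on a grid, for one hypothetical, non-exceptional parameter choice.
t, w = np.polynomial.hermite.hermgauss(80)

def L0(x, sigma2):
    x = np.asarray(x, dtype=float)[..., None]
    return np.sum(w / (1.0 + np.exp(-(x - np.sqrt(2.0 * sigma2) * t))), axis=-1) / np.sqrt(np.pi)

b0_1, b1_1, s1 = 0.0, 1.0, 0.25    # hypothetical beta^(1) and sigma_1^2
b0_2, b1_2, s2 = 0.1, 0.8, 4.0     # hypothetical beta^(2) and sigma_2^2

x = np.linspace(-15.0, 15.0, 20_001)
d = L0(b0_1 + b1_1 * x, s1) - L0(b0_2 + b1_2 * x, s2)
print(np.sum(np.sign(d[:-1]) != np.sign(d[1:])))   # grid sign changes; Lemma 4 allows at most 3 roots
```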
Theorem 5.
If in the functional model the surrogate regressors ${X_{n}^{\mathrm{obs}}}$ take at least four different values, then the parameters $\vec{\beta }$ and ${\beta _{1}^{2}}{\tau }^{2}$ are identifiable.
Proof.
Suppose that there are two sets of parameters $({\vec{\beta }}^{(1)},{({\tau }^{(1)})}^{2})$ and $({\vec{\beta }}^{(2)},{({\tau }^{(2)})}^{2})$ that, for the given surrogate regressors $\{{X_{n}^{\mathrm{obs}}},\hspace{0.2778em}n=1,\dots ,N\}$, provide the same distribution of $Y_{n}$, $n=1,\dots ,N$. Then for all $n=1,\dots ,N$,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{\mathsf{P}}_{{\vec{\beta }}^{(1)},{({\tau }^{(1)})}^{2}}(Y_{n}=1)& \displaystyle =\operatorname{\mathsf{P}}_{{\vec{\beta }}^{(2)},{({\tau }^{(2)})}^{2}}(Y_{n}=1);\\{} \displaystyle L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}{X_{n}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)& \displaystyle =L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}{X_{n}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big).\end{array}\]
The equation
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x,\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x,\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big)\]
has at least four solutions. Then by Lemma 4 either
\[{\vec{\beta }}^{(1)}={\vec{\beta }}^{(2)}\hspace{1em}\text{and}\hspace{1em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}={\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2},\]
or
(12)
\[{\beta _{1}^{(1)}}={\beta _{1}^{(2)}}=0\hspace{1em}\text{and}\hspace{1em}L_{0}\big({\beta _{0}^{(1)}},\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)=L_{0}\big({\beta _{0}^{(2)}},\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big).\]
In the latter alternative,
\[{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}={\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}=0\hspace{1em}\text{and}\hspace{1em}{\beta _{0}^{(1)}}={\beta _{0}^{(2)}}\]
since $L_{0}(b_{0},0)=\frac{1}{1+{\mathrm{e}}^{-b_{0}}}$ is a strictly increasing function in $b_{0}$. □
Theorem 6.
If in the structural model the distribution of ${X_{1}^{\mathrm{obs}}}$ is not concentrated at three (or fewer) points, then the parameters $\vec{\beta }$ and ${\beta _{1}^{2}}{\tau }^{2}$ are identifiable.
Proof.
Suppose that there are two sets of parameters $({\vec{\beta }}^{(1)},{({\tau }^{(1)})}^{2})$ and $({\vec{\beta }}^{(2)},{({\tau }^{(2)})}^{2})$ for which the same bivariate distribution of $({X_{1}^{\mathrm{obs}}},Y_{1})$ is obtained. The random variable $\operatorname{\mathsf{P}}[Y_{1}=1\mid {X_{1}^{\mathrm{obs}}}]$ satisfies Eq. (3) almost surely for each set of parameters. Hence, the equality
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}{X_{1}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}{X_{1}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big)\]
holds almost surely. The rest of the proof is the same as in Theorem 5. □