1 Introduction
Statistical model. Consider logistic regression with Berkson-type error in the explanatory variable. A single trial is distributed as follows. ${X_{n}^{\mathrm{obs}}}$ is the observed (or assigned) surrogate regressor. The true regressor is $X_{n}={X_{n}^{\mathrm{obs}}}+U_{n}$, where the error $U_{n}\sim N(0,{\tau }^{2})$ is independent of ${X_{n}^{\mathrm{obs}}}$. The response $Y_{n}$ is a binary random variable that attains either 0 or 1 with
\[\operatorname{\mathsf{P}}\big[Y_{n}=1\hspace{0.2222em}\big|\hspace{0.2222em}{X_{n}^{\mathrm{obs}}},X_{n}\big]=\frac{\exp (\beta _{0}+\beta _{1}X_{n})}{1+\exp (\beta _{0}+\beta _{1}X_{n})}.\]
We consider both the functional and the structural model. In the functional model, the ${X_{n}^{\mathrm{obs}}}$ are nonrandom, whereas in the structural model the ${X_{n}^{\mathrm{obs}}}$ are i.i.d. random variables; therefore, in the latter model the triples $({X_{n}^{\mathrm{obs}}},X_{n},Y_{n})$ are i.i.d.
The couples $({X_{n}^{\mathrm{obs}}},Y_{n})$, $n=1,\dots ,N$, are observed. The vector $\vec{\beta }={(\beta _{0},\beta _{1})}^{\top }$ is the parameter of interest.
The error variance ${\tau }^{2}$ can be either known or unknown, and we consider both cases. The conditions for identifiability of the model (or of the parameter $\vec{\beta }$) are presented.
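For concreteness, here is a minimal simulation sketch of the structural variant of this model; the parameter values and the $N(0,1)$ law of ${X_{n}^{\mathrm{obs}}}$ are purely illustrative and not part of the model specification.

```python
import numpy as np

# Illustrative sketch: sampling one dataset from the structural Berkson logistic model.
# The parameter values and the N(0, 1) law of X_n^obs are hypothetical choices.
rng = np.random.default_rng(0)
N = 1000
beta0, beta1, tau = -0.5, 1.2, 0.8          # hypothetical beta_0, beta_1, tau

x_obs = rng.normal(0.0, 1.0, size=N)        # surrogate regressors X_n^obs
u = rng.normal(0.0, tau, size=N)            # Berkson errors U_n ~ N(0, tau^2), independent of X_n^obs
x_true = x_obs + u                          # true regressors X_n = X_n^obs + U_n
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x_true)))
y = rng.binomial(1, p)                      # binary responses Y_n

# Only the couples (x_obs[n], y[n]) are available to the statistician.
```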
Overview. Berkson models of logistic regression and probit regression were set up in Burr [1]. For probit regression, it is shown that the introduction of Berkson-type error is equivalent to augmentation of regression parameters. As a consequence, the Berkson model of probit regression is identifiable if ${\tau }^{2}$ is known and is not identifiable if ${\tau }^{2}$ is not known.
The identifiability of the classical model was studied by Küchenhoff [3]. He assumes that both the regressor and the measurement error are normally distributed. Then univariate logistic regression is identifiable (here ${\tau }^{2}$ can be unknown), and multiple logistic regression is not identifiable. Our results can be proved similarly to [3] if we assume that the distribution of the surrogate regressor ${X}^{\mathrm{obs}}$ has unbounded support.
For classification of errors-in-variables regression models and various estimation methods, see the monograph by Carroll et al. [2].
Identifiability of the statistical model can be used in the proof of consistency of the estimator. For known ${\tau }^{2}$, the strong consistency of the maximum likelihood estimator is obtained by Shklyar [4]. But if ${\tau }^{2}$ is not known, the maximum likelihood estimator seems to be unstable (see discussion in [2] or [3]).
2 Convolution of logistic function with normal density
Consider the function
(1)
\[L_{0}\big(x,{\sigma }^{2}\big)=\operatorname{\mathsf{E}}\frac{\exp (x-\xi )}{1+\exp (x-\xi )},\hspace{1em}\xi \sim N\big(0,{\sigma }^{2}\big),\hspace{2.5pt}x\in \mathbb{R},\hspace{2.5pt}{\sigma }^{2}\ge 0,\]
that is, $L_{0}(x,0)={\mathrm{e}}^{x}/(1+{\mathrm{e}}^{x})$ and
\[L_{0}\big(x,{\sigma }^{2}\big)={\int _{-\infty }^{+\infty }}\frac{{\mathrm{e}}^{x-t}}{1+{\mathrm{e}}^{x-t}}\cdot \frac{1}{\sqrt{2\pi }\sigma }{\mathrm{e}}^{-{t}^{2}/(2{\sigma }^{2})}\hspace{0.1667em}\mathrm{d}t\hspace{1em}\text{for}\hspace{2.5pt}{\sigma }^{2}>0.\]
Denote the derivatives w.r.t. x
(2)
\[L_{k}\big(x,{\sigma }^{2}\big)=\frac{{\partial }^{k}L_{0}(x,{\sigma }^{2})}{\partial {x}^{k}},\hspace{1em}k=1,2,\dots \]
Differentiation of $L_{k}(x,{\sigma }^{2})$ with respect to the second argument is described in Appendix A.
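A minimal numerical sketch of $L_{0}$ and its derivatives $L_{k}$ is given below; the Gauss–Hermite quadrature is an implementation choice made here for illustration, not anything prescribed by the paper. The last line checks numerically the identity $\partial L_{0}/\partial {\sigma }^{2}=\frac{1}{2}L_{2}$, a heat-equation property of convolution with a normal density (cf. the factors $\frac{1}{2}$ in the proof of Lemma 3).

```python
import numpy as np

# Sketch: L_0(x, sigma^2) = E logistic(x - xi), xi ~ N(0, sigma^2), and its x-derivatives,
# approximated by Gauss-Hermite quadrature (an illustrative implementation choice).
nodes, weights = np.polynomial.hermite.hermgauss(80)

def L(k, x, sigma2):
    """k-th derivative of L_0 with respect to x, for k = 0, 1, 2."""
    xi = np.sqrt(2.0 * sigma2) * nodes                # E f(xi) ~ sum_i w_i f(sqrt(2*sigma2)*t_i) / sqrt(pi)
    p = 1.0 / (1.0 + np.exp(-(x - xi)))               # logistic(x - xi)
    integrand = {0: p, 1: p * (1 - p), 2: p * (1 - p) * (1 - 2 * p)}[k]
    return np.sum(weights * integrand) / np.sqrt(np.pi)

x, s2, h = 0.7, 1.3, 1e-5
print(L(0, x, 0.0) - 1.0 / (1.0 + np.exp(-x)))                            # ~ 0: L_0(x, 0) is logistic
print((L(0, x, s2 + h) - L(0, x, s2 - h)) / (2 * h) - 0.5 * L(2, x, s2))  # ~ 0: dL_0/dsigma^2 = L_2/2
```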
The distribution of $Y_{i}$ given ${X_{i}^{\mathrm{obs}}}$ is
(3)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{\mathsf{P}}\big[Y_{i}=1\big|{X_{i}^{\mathrm{obs}}}\big]& \displaystyle =\operatorname{\mathsf{E}}\big[\operatorname{\mathsf{P}}\big[Y_{i}=1\big|{X_{i}^{\mathrm{obs}}},X_{i}\big]\big|{X_{i}^{\mathrm{obs}}}\big]\\{} & \displaystyle =\operatorname{\mathsf{E}}\bigg[\frac{\exp (\beta _{0}+\beta _{1}X_{i})}{1+\exp (\beta _{0}+\beta _{1}X_{i})}\bigg|{X_{i}^{\mathrm{obs}}}\bigg]=L_{0}\big(\beta _{0}+\beta _{1}{X_{i}^{\mathrm{obs}}},\hspace{0.2222em}{\beta _{1}^{2}}{\tau }^{2}\big)\end{array}\]
since $[\beta _{0}+\beta _{1}X_{i}\mid {X_{i}^{\mathrm{obs}}}]\sim N(\beta _{0}+\beta _{1}{X_{i}^{\mathrm{obs}}},\hspace{0.2222em}{\beta _{1}^{2}}{\tau }^{2})$.
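As a quick illustration of identity (3), the following Monte Carlo sketch (with hypothetical parameter values) compares the averaged conditional probability with $L_{0}(\beta _{0}+\beta _{1}{X}^{\mathrm{obs}},\hspace{0.2222em}{\beta _{1}^{2}}{\tau }^{2})$.

```python
import numpy as np

# Monte Carlo check of identity (3); beta0, beta1, tau and x_obs are hypothetical values.
rng = np.random.default_rng(1)
beta0, beta1, tau, x_obs = -0.3, 0.9, 0.7, 1.5

# Left-hand side: average of P[Y=1 | X^obs, X] over the Berkson error U ~ N(0, tau^2).
u = rng.normal(0.0, tau, size=1_000_000)
lhs = np.mean(1.0 / (1.0 + np.exp(-(beta0 + beta1 * (x_obs + u)))))

# Right-hand side: L_0(beta0 + beta1 * x_obs, beta1^2 * tau^2) by Gauss-Hermite quadrature.
t, w = np.polynomial.hermite.hermgauss(80)
m, s2 = beta0 + beta1 * x_obs, (beta1 * tau) ** 2
rhs = np.sum(w / (1.0 + np.exp(-(m - np.sqrt(2.0 * s2) * t)))) / np.sqrt(np.pi)

print(lhs, rhs)   # the two values agree up to Monte Carlo error
```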
3 Identifiability when ${\tau }^{2}$ is known
Theorem 1.
If in the functional model not all ${X}^{\mathrm{obs}}$ are equal, then the model is identifiable.
Proof.
Suppose that for two values of parameters ${\vec{\beta }}^{(1)}=({\beta _{0}^{(1)}},{\beta _{1}^{(1)}})$ and ${\vec{\beta }}^{(2)}=({\beta _{0}^{(2)}},{\beta _{1}^{(2)}})$, ${\vec{\beta }}^{(1)}\ne {\vec{\beta }}^{(2)}$, the distributions of observations are equal. Then for all $i=1,2,\dots ,N$,
\[\begin{array}{l}\displaystyle \operatorname{\mathsf{P}}_{{\vec{\beta }}^{(1)}}(Y_{i}=1)=\operatorname{\mathsf{P}}_{{\vec{\beta }}^{(2)}}(Y_{i}=1),\\{} \displaystyle L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}{X_{i}^{\mathrm{obs}}},{\big({\beta _{1}^{(1)}}\big)}^{2}{\tau }^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}{X_{i}^{\mathrm{obs}}},{\big({\beta _{1}^{(2)}}\big)}^{2}{\tau }^{2}\big).\end{array}\]
However, by Lemma 4.1 from [4] the equation
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x,{\big({\beta _{1}^{(1)}}\big)}^{2}{\tau }^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x,{\big({\beta _{1}^{(2)}}\big)}^{2}{\tau }^{2}\big)\]
has no more than one solution x. Hence, all ${X_{i}^{\mathrm{obs}}}$ are equal, which contradicts the assumption of the theorem.  □
By definition, the degenerate distribution is the distribution concentrated at a single point. For the next theorem, see the proof of Theorem 5.1 in [4].
Theorem 2 ([4]).
If in the structural model the distribution of ${X_{1}^{\mathrm{obs}}}$ is not degenerate, then the parameter $\vec{\beta }$ is identifiable.
4 Identifiability when ${\tau }^{2}$ is unknown
For fixed ${\sigma }^{2}$, the function $L_{0}(x,{\sigma }^{2})$ is a bijection $\mathbb{R}\to (0,\hspace{0.2222em}1)$. Hence, for fixed ${\sigma _{1}^{2}}$ and ${\sigma _{2}^{2}}$, the relation
(4)
\[L_{0}\big(y,{\sigma _{1}^{2}}\big)=L_{0}\big(x,{\sigma _{2}^{2}}\big)\]
defines a bijection $y=y(x)$, $\mathbb{R}\to \mathbb{R}$; see Fig. 1.
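The implicit function in (4) can be computed pointwise by one-dimensional root finding; the sketch below (with illustrative values ${\sigma _{1}^{2}}<{\sigma _{2}^{2}}$) also lets one inspect numerically the sign pattern of its second derivative established in Lemma 3 below.

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: the implicit function y(x) defined by (4), L_0(y, s1) = L_0(x, s2),
# computed by root finding; s1, s2 are illustrative values with s1 < s2.
t, w = np.polynomial.hermite.hermgauss(80)

def L0(x, sigma2):
    return np.sum(w / (1.0 + np.exp(-(x - np.sqrt(2.0 * sigma2) * t)))) / np.sqrt(np.pi)

s1, s2 = 0.5, 4.0

def y(x):
    z = L0(x, s2)
    return brentq(lambda yy: L0(yy, s1) - z, -50.0, 50.0)

# Sign of the second difference of y (same sign as y''); Lemma 3 predicts
# sign(y'') = sign(s2 - s1) * sign(x), i.e. positive for x > 0 here.
for x in (-2.0, -0.5, 0.5, 2.0):
    h = 1e-2
    print(x, np.sign(y(x + h) - 2.0 * y(x) + y(x - h)))
```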
Lemma 3.
For fixed ${\sigma _{1}^{2}}\ge 0$ and ${\sigma _{2}^{2}}\ge 0$, the sign of the second derivative of the implicit function (4) is
\[\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(x).\]
Proof.
Differentiating (4), we get
\[\begin{array}{r@{\hskip0pt}l}\displaystyle L_{1}\big(y,{\sigma _{1}^{2}}\big)\hspace{0.1667em}\mathrm{d}y& \displaystyle =L_{1}\big(x,{\sigma _{2}^{2}}\big)\hspace{0.1667em}\mathrm{d}x;\\{} \displaystyle \frac{\mathrm{d}y}{\mathrm{d}x}& \displaystyle =\frac{L_{1}(x,{\sigma _{2}^{2}})}{L_{1}(y,{\sigma _{1}^{2}})}.\end{array}\]
Then
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}& \displaystyle =\frac{L_{2}(x,{\sigma _{2}^{2}})L_{1}(y,{\sigma _{1}^{2}})-L_{1}(x,{\sigma _{2}^{2}})L_{2}(y,{\sigma _{1}^{2}})\frac{\mathrm{d}y}{\mathrm{d}x}}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\\{} & \displaystyle =\frac{L_{2}(x,{\sigma _{2}^{2}})L_{1}{(y,{\sigma _{1}^{2}})}^{2}-L_{1}{(x,{\sigma _{2}^{2}})}^{2}L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{3}}\\{} & \displaystyle =\bigg(\frac{L_{2}(x,{\sigma _{2}^{2}})}{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}-\frac{L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\bigg)\cdot \frac{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}{L_{1}(y,{\sigma _{1}^{2}})}.\end{array}\]
Thus,
(5)
\[\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}\bigg)=\operatorname{sign}\bigg(\frac{L_{2}(x,{\sigma _{2}^{2}})}{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}-\frac{L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\bigg)\]
because $L_{1}>0$.
Denote by $\mu (z,{\sigma }^{2})$ the solution to the equation $L_{0}(\mu ,{\sigma }^{2})=z$. Note that as $L_{0}(x,{\sigma }^{2})$ is the cdf of a symmetric distribution, $\operatorname{sign}(L_{0}(x,{\sigma }^{2})-0.5)=\operatorname{sign}(x)$. Therefore, $\operatorname{sign}(\mu (z,{\sigma }^{2}))=\operatorname{sign}(z-0.5)$. Find the derivative
\[\frac{\mathrm{d}}{\mathrm{d}v}\bigg(\frac{L_{2}(\mu (z,v),v)}{L_{1}{(\mu (z,v),v)}^{2}}\bigg)\]
for fixed z. By the implicit function theorem,
\[\frac{\partial \mu (z,v)}{\partial v}=-\frac{L_{2}(\mu (z,v),v)}{2L_{1}(\mu (z,v),v)};\]
also,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{\partial }{\partial x}\bigg(\frac{L_{2}(x,v)}{L_{1}{(x,v)}^{2}}\bigg)& \displaystyle =\frac{L_{3}(x,v)L_{1}(x,v)-2L_{2}{(x,v)}^{2}}{L_{1}{(x,v)}^{3}},\\{} \displaystyle \frac{\partial }{\partial v}\bigg(\frac{L_{2}(x,v)}{L_{1}{(x,v)}^{2}}\bigg)& \displaystyle =\frac{L_{4}(x,v)L_{1}(x,v)-2L_{2}(x,v)L_{3}(x,v)}{2L_{1}{(x,v)}^{3}}.\end{array}\]
Then
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{\mathrm{d}}{\mathrm{d}v}\bigg(\frac{L_{2}(\mu (z,v),v)}{L_{1}{(\mu (z,v),v)}^{2}}\bigg)& \displaystyle =-\frac{L_{2}}{2L_{1}}\cdot \frac{L_{3}L_{1}-2{L_{2}^{2}}}{{L_{1}^{3}}}+\frac{L_{4}L_{1}-2L_{2}L_{3}}{2{L_{1}^{3}}}\\{} & \displaystyle =\frac{L_{4}{L_{1}^{2}}-3L_{3}L_{2}L_{1}+2{L_{2}^{3}}}{2{L_{1}^{4}}},\end{array}\]
where $L_{k}$ are evaluated at the point $(\mu (z,v),v)$. By Lemma 10,
\[\operatorname{sign}\big(L_{4}{L_{1}^{2}}-3L_{3}L_{2}L_{1}+2{L_{2}^{3}}\big)=\operatorname{sign}\big(\mu (z,v)\big)=\operatorname{sign}(z-0.5).\]
The function $v\mapsto \frac{L_{2}(\mu (z,v),v)}{L_{1}{(\mu (z,v),v)}^{2}}$ is monotone (it is increasing for $z>0.5$ and decreasing for $z<0.5$). For x and y satisfying (4),
\[x=\mu \big(z,{\sigma _{2}^{2}}\big)\hspace{1em}\text{and}\hspace{1em}y=\mu \big(z,{\sigma _{1}^{2}}\big)\]
with $z=L_{0}(y,{\sigma _{1}^{2}})=L_{0}(x,{\sigma _{2}^{2}})$; note that $\operatorname{sign}(z-0.5)=\operatorname{sign}(x)$. Then
\[\operatorname{sign}\bigg(\frac{L_{2}(x,{\sigma _{2}^{2}})}{L_{1}{(x,{\sigma _{2}^{2}})}^{2}}-\frac{L_{2}(y,{\sigma _{1}^{2}})}{L_{1}{(y,{\sigma _{1}^{2}})}^{2}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(x),\]
and with (5), we obtain the desired equality
\[\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}y}{\mathrm{d}{x}^{2}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(x).\hspace{1em}\square \]
Lemma 4.
The equation
(6)
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x,\hspace{0.2222em}{\sigma _{1}^{2}}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x,\hspace{0.2222em}{\sigma _{2}^{2}}\big)\]
has no more than three solutions, unless either
(7)
\[{\vec{\beta }}^{(1)}={\vec{\beta }}^{(2)}\hspace{1em}\textit{and}\hspace{1em}{\sigma _{1}^{2}}={\sigma _{2}^{2}}\]
or
(8)
\[{\beta _{1}^{(1)}}={\beta _{1}^{(2)}}=0\hspace{1em}\textit{and}\hspace{1em}L_{0}\big({\beta _{0}^{(1)}},{\sigma _{1}^{2}}\big)=L_{0}\big({\beta _{0}^{(2)}},{\sigma _{2}^{2}}\big).\]
In exceptional cases (7) and (8), equation (6) is an identity.
Proof.
The proof has the following idea: if a twice differentiable function $y(x)$ satisfies (4), then the plot of the function either is a straight line (if ${\sigma _{1}^{2}}={\sigma _{2}^{2}}$) or intersects any straight line at no more than three points.
Consider four cases.
Case 1. ${\sigma _{1}^{2}}={\sigma _{2}^{2}}$. Since the function $L_{0}(z,{\sigma }^{2})$ is strictly increasing in z, Eq. (6) is equivalent to
\[{\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x={\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x.\]
Equation (6) has only one solution if ${\beta _{1}^{(1)}}\ne {\beta _{1}^{(2)}}$; it is an identity if ${\vec{\beta }}^{(1)}={\vec{\beta }}^{(2)}$, and it has no solutions if ${\beta _{1}^{(1)}}={\beta _{1}^{(2)}}$ but ${\beta _{0}^{(1)}}\ne {\beta _{0}^{(2)}}$.
Case 2. ${\beta _{1}^{(2)}}=0$ and ${\beta _{1}^{(1)}}\ne 0$. For any fixed ${\sigma }^{2}$, the function $z\mapsto L_{0}(z,{\sigma }^{2})$ is a bijection $\mathbb{R}\to (0,\hspace{0.1667em}1)$. Denote the inverse function $\mu (Z,{\sigma }^{2})$: $L_{0}(z,{\sigma }^{2})=Z$ if and only if $z=\mu (Z,{\sigma }^{2})$. Equation (6) has a unique solution
\[x=\frac{\mu \big(L_{0}\big({\beta _{0}^{(2)}},{\sigma _{2}^{2}}\big),\hspace{0.2222em}{\sigma _{1}^{2}}\big)-{\beta _{0}^{(1)}}}{{\beta _{1}^{(1)}}}.\]
Case 3. ${\beta _{1}^{(2)}}={\beta _{1}^{(1)}}=0$. Neither side of (6) depends on x. Equation (6) becomes $L_{0}({\beta _{0}^{(1)}},{\sigma _{1}^{2}})=L_{0}({\beta _{0}^{(2)}},{\sigma _{2}^{2}})$. Equation (6) either holds for all x or does not hold for any x.
Case 4. ${\sigma _{1}^{2}}\ne {\sigma _{2}^{2}}$ and ${\beta _{1}^{(2)}}\ne 0$. Make a linear variable substitution: denote $z_{2}={\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x$. Then Eq. (6) becomes
(9)
\[L_{0}\bigg({\beta _{0}^{(1)}}+\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\big(z_{2}-{\beta _{0}^{(2)}}\big),\hspace{0.2222em}{\sigma _{1}^{2}}\bigg)=L_{0}\big(z_{2},\hspace{0.2222em}{\sigma _{2}^{2}}\big).\]
Define the function $z_{1}(z_{2})$ from the equation
\[L_{0}\big(z_{1},{\sigma _{1}^{2}}\big)=L_{0}\big(z_{2},{\sigma _{2}^{2}}\big).\]
The function $z_{1}(z_{2}):\mathbb{R}\to \mathbb{R}$ is implicitly defined by Eq. (4): there the equality holds if and only if $y=z_{1}(x)$. Hence, the function $z_{1}(z_{2})$ satisfies Lemma 3. Equation (9) is equivalent to
(10)
\[z_{1}(z_{2})-{\beta _{0}^{(1)}}-\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\cdot \big(z_{2}-{\beta _{0}^{(2)}}\big)=0.\]
By Lemma 3,
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}}{\mathrm{d}{z_{2}^{2}}}\bigg(z_{1}(z_{2})-{\beta _{0}^{(1)}}-\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\cdot \big(z_{2}-{\beta _{0}^{(2)}}\big)\bigg)\bigg)\\{} & \displaystyle \hspace{1em}=\operatorname{sign}\bigg(\frac{{\mathrm{d}}^{2}z_{1}(z_{2})}{\mathrm{d}{z_{2}^{2}}}\bigg)=\operatorname{sign}\big({\sigma _{2}^{2}}-{\sigma _{1}^{2}}\big)\operatorname{sign}(z_{2}).\end{array}\]
Then the derivative of the left-hand side of (10)
(11)
\[\frac{\mathrm{d}}{\mathrm{d}z_{2}}\bigg(z_{1}(z_{2})-{\beta _{0}^{(1)}}-\frac{{\beta _{1}^{(1)}}}{{\beta _{1}^{(2)}}}\cdot \big(z_{2}-{\beta _{0}^{(2)}}\big)\bigg)\]
is strictly monotone on both intervals $(-\infty ,\hspace{0.2222em}0]$ and $[0,\hspace{0.2222em}+\infty )$, and hence (11) attains 0 at no more than two points. Then the left-hand side of (10) has no more than three intervals of monotonicity, and Eq. (10) has no more than three solutions. Equation (6) has the same number of solutions. □
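A crude numerical illustration of Lemma 4: for one non-exceptional, hypothetical choice of parameters, count the sign changes of the difference of the two sides of (6) on a grid; by Lemma 4 the equation has at most three roots.

```python
import numpy as np

# Illustration of Lemma 4: count sign changes of the difference of the two sides of (6)
# on a grid, for one hypothetical, non-exceptional parameter choice.
t, w = np.polynomial.hermite.hermgauss(80)

def L0(x, sigma2):
    x = np.asarray(x, dtype=float)[..., None]
    return np.sum(w / (1.0 + np.exp(-(x - np.sqrt(2.0 * sigma2) * t))), axis=-1) / np.sqrt(np.pi)

b0_1, b1_1, s1 = 0.0, 1.0, 0.25    # hypothetical beta^(1) and sigma_1^2
b0_2, b1_2, s2 = 0.1, 0.8, 4.0     # hypothetical beta^(2) and sigma_2^2

x = np.linspace(-15.0, 15.0, 20_001)
d = L0(b0_1 + b1_1 * x, s1) - L0(b0_2 + b1_2 * x, s2)
print(np.sum(np.sign(d[:-1]) != np.sign(d[1:])))   # grid sign changes; Lemma 4 allows at most 3 roots
```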
Theorem 5.
If in the functional model the surrogate regressors ${X_{n}^{\mathrm{obs}}}$ take at least four different values, then the parameters $\vec{\beta }$ and ${\beta _{1}^{2}}{\tau }^{2}$ are identifiable.
Proof.
Suppose that there are two sets of parameters $({\vec{\beta }}^{(1)},{({\tau }^{(1)})}^{2})$ and $({\vec{\beta }}^{(2)},{({\tau }^{(2)})}^{2})$ that, for the given surrogate regressors $\{{X_{n}^{\mathrm{obs}}},\hspace{0.2778em}n=1,\dots ,N\}$, provide the same distribution of $Y_{n}$, $n=1,\dots ,N$. Then for all $n=1,\dots ,N$,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{\mathsf{P}}_{{\vec{\beta }}^{(1)},{({\tau }^{(1)})}^{2}}(Y_{n}=1)& \displaystyle =\operatorname{\mathsf{P}}_{{\vec{\beta }}^{(2)},{({\tau }^{(2)})}^{2}}(Y_{n}=1);\\{} \displaystyle L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}{X_{n}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)& \displaystyle =L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}{X_{n}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big).\end{array}\]
The equation
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}x,\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}x,\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big)\]
has at least four solutions. Then by Lemma 4 either
\[{\vec{\beta }}^{(1)}={\vec{\beta }}^{(2)}\hspace{1em}\text{and}\hspace{1em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}={\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2},\]
or
(12)
\[{\beta _{1}^{(1)}}={\beta _{1}^{(2)}}=0\hspace{1em}\text{and}\hspace{1em}L_{0}\big({\beta _{0}^{(1)}},\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)=L_{0}\big({\beta _{0}^{(2)}},\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big).\]
In the latter alternative,
\[{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}={\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}=0\hspace{1em}\text{and}\hspace{1em}{\beta _{0}^{(1)}}={\beta _{0}^{(2)}}\]
since $L_{0}(b_{0},0)=\frac{1}{1+{\mathrm{e}}^{-b_{0}}}$ is a strictly increasing function in $b_{0}$. □
Theorem 6.
If in the structural model the distribution of ${X_{1}^{\mathrm{obs}}}$ is not concentrated at three (or fewer) points, then the parameters $\vec{\beta }$ and ${\beta _{1}^{2}}{\tau }^{2}$ are identifiable.
Proof.
Suppose that there are two sets of parameters $({\vec{\beta }}^{(1)},{({\tau }^{(1)})}^{2})$ and $({\vec{\beta }}^{(2)},{({\tau }^{(2)})}^{2})$ for which the same bivariate distribution of $({X_{1}^{\mathrm{obs}}},Y_{1})$ is obtained. The random variable $\operatorname{\mathsf{P}}[Y_{1}=1\mid {X_{1}^{\mathrm{obs}}}]$ satisfies Eq. (3) almost surely for each set of parameters. Hence, the equality
\[L_{0}\big({\beta _{0}^{(1)}}+{\beta _{1}^{(1)}}{X_{1}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(1)}}\big)}^{2}{\big({\tau }^{(1)}\big)}^{2}\big)=L_{0}\big({\beta _{0}^{(2)}}+{\beta _{1}^{(2)}}{X_{1}^{\mathrm{obs}}},\hspace{0.2222em}{\big({\beta _{1}^{(2)}}\big)}^{2}{\big({\tau }^{(2)}\big)}^{2}\big)\]
holds almost surely. The rest of the proof is the same as in Theorem 5. □