The area under the receiver operating characteristic curve (AUC) is a suitable measure of the quality of classification algorithms. Here we use the theory of U-statistics to derive new confidence intervals for it. The new confidence intervals take into account that only the total sample size used to calculate the AUC can be controlled, while the number of members of the case group and the number of members of the control group are random. We show that the new confidence intervals can be used not only to evaluate the quality of the fitted model, but also to judge the quality of the classification algorithm itself. We also show that two popular confidence intervals for the AUC, namely DeLong’s interval and the Mann–Whitney intervals due to Sen, coincide.
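For orientation, the following minimal sketch computes the empirical AUC as a two-sample U-statistic together with the classical DeLong-type normal-approximation interval. It is not the new interval derived in the paper: it treats the group sizes as fixed, and the function name `auc_delong_ci` and its arguments are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist


def auc_delong_ci(cases, controls, level=0.95):
    """Empirical AUC with a classical DeLong-type confidence interval.

    Assumption: higher scores indicate the case group, and the group
    sizes m and n are treated as fixed (unlike the paper's new intervals).
    """
    x = np.asarray(cases, dtype=float)
    y = np.asarray(controls, dtype=float)
    m, n = len(x), len(y)

    # Kernel psi(x, y) = 1 if x > y, 1/2 if x == y, 0 otherwise.
    psi = (x[:, None] > y[None, :]) + 0.5 * (x[:, None] == y[None, :])

    # The AUC estimate is the mean of the kernel over all case-control pairs.
    auc = psi.mean()

    # Structural components: per-case and per-control averages of the kernel.
    v10 = psi.mean(axis=1)
    v01 = psi.mean(axis=0)

    # DeLong variance estimate from the sample variances of the components.
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n

    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * np.sqrt(var)
    # The interval is not clipped to [0, 1] in this sketch.
    return auc, auc - half, auc + half


auc, lo, hi = auc_delong_ci([3, 4, 5], [1, 2, 3.5])
```

Because the variance estimate requires at least two observations per group, the sketch is only meaningful when both groups contain two or more members.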
In this paper we develop a general framework for quantifying how binary risk factors jointly influence a binary outcome. Our key result is an additive expansion of odds ratios as a sum of marginal effects and interaction terms of varying order. These odds ratio expansions are used to estimate the excess odds ratio, attributable proportion and synergy index for a case-control dataset by means of maximum likelihood from a logistic regression model. The confidence intervals associated with these estimates of joint effects and interaction of risk factors rely on the delta method. Our methodology is illustrated with a large Nordic meta dataset for multiple sclerosis that combines four studies, with a total of 6265 cases and 8401 controls. The dataset includes three risk factors (smoking and two genetic factors) and a number of other confounding variables.
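As an illustration of the two-factor case only, the sketch below computes interaction measures from the coefficients of a logistic model with main effects and a product term. It uses the standard textbook (Rothman-type) definitions with odds ratios in place of relative risks, which may differ in detail from the definitions in the paper; the function name and the coefficient names `b1`, `b2`, `b12` are assumptions.

```python
import math


def interaction_measures(b1, b2, b12):
    """Additive interaction measures for two binary risk factors.

    Assumed model: logit P(Y=1) = b0 + b1*x1 + b2*x2 + b12*x1*x2,
    with the doubly unexposed group (x1 = x2 = 0) as reference.
    """
    or10 = math.exp(b1)             # factor 1 only
    or01 = math.exp(b2)             # factor 2 only
    or11 = math.exp(b1 + b2 + b12)  # both factors

    # Additive expansion: or11 = 1 + (or10 - 1) + (or01 - 1) + interaction.
    interaction = or11 - or10 - or01 + 1.0

    # Attributable proportion due to interaction.
    ap = interaction / or11

    # Synergy index; undefined when the marginal excesses sum to zero.
    denom = (or10 - 1.0) + (or01 - 1.0)
    synergy = (or11 - 1.0) / denom if denom != 0 else math.nan

    return {"interaction": interaction, "ap": ap, "synergy": synergy}


measures = interaction_measures(math.log(2), math.log(3), 0.0)
```

With no product term on the logit scale (`b12 = 0`) the odds ratios multiply, so the additive interaction term is generally nonzero; the two scales measure different notions of interaction.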
We consider the Berkson model of logistic regression with Gaussian and homoscedastic error in the regressor. The measurement error variance can be either known or unknown. We deal with both the functional and the structural cases. Sufficient conditions for identifiability of the regression coefficients are presented.
Conditions for identifiability of the model are studied. In the case where the error variance is known, the regression parameters are identifiable if the distribution of the observed regressor is not concentrated at a single point. In the case where the error variance is not known, the regression parameters are identifiable if the distribution of the observed regressor is not concentrated at three (or fewer) points.
The key analytic tools are relations between the smoothed logistic distribution function and its derivatives.
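To make the smoothed logistic function concrete, the following sketch evaluates the response probability given the observed regressor in a Berkson model by Gauss-Hermite quadrature. The notation is an assumption for illustration: the true regressor is taken as $X = W + U$ with observed $W$ and independent error $U \sim N(0,\sigma^{2})$, and the names `beta0`, `beta1`, `sigma` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def logistic(z):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-z))


def smoothed_logistic_prob(w, beta0, beta1, sigma, n_nodes=40):
    """P(Y = 1 | W = w) = E[logistic(beta0 + beta1 * (w + U))], U ~ N(0, sigma^2).

    Gauss-Hermite quadrature integrates against exp(-t^2), so we substitute
    u = sqrt(2) * sigma * t and divide the weighted sum by sqrt(pi).
    """
    nodes, weights = hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma * nodes
    values = logistic(beta0 + beta1 * (w + u))
    return float(np.sum(weights * values) / np.sqrt(np.pi))


# With sigma = 0 the smoothed function reduces to the ordinary logistic one.
p_plain = smoothed_logistic_prob(0.3, -1.0, 2.0, 0.0)
p_smooth = smoothed_logistic_prob(0.3, -1.0, 2.0, 1.0)
```

Smoothing pulls the probability toward 1/2, which is one way to see why the error variance and the slope can be hard to separate when the observed regressor takes only a few distinct values.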
The Cox proportional hazards model is considered. In Kukush et al. (2011), Journal of Statistical Research, Vol. 45, No. 2, 77–94, simultaneous estimators $\lambda _{n}(\cdot )$ and $\beta _{n}$ of the baseline hazard rate $\lambda (\cdot )$ and the regression parameter $\beta $ are studied. The estimators maximize an objective function that corrects the log-likelihood function for measurement errors and censoring. The parameter sets for $\lambda (\cdot )$ and $\beta $ are convex compact sets in $C[0,\tau ]$ and ${\mathbb{R}}^{k}$, respectively. In the present paper, asymptotic normality of $\beta _{n}$ and of linear functionals of $\lambda _{n}(\cdot )$ is shown. The results are valid as well for a model without measurement errors. A way to compute the estimators is discussed, based on the fact that $\lambda _{n}(\cdot )$ is a linear spline.
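The computational remark can be illustrated by noting that a linear spline is determined by its values at finitely many knots, so linear functionals such as the cumulative hazard reduce to finite sums. The sketch below shows only this representation, not the corrected objective function or the estimator itself; the knot locations, the values and the function names are hypothetical.

```python
import numpy as np


def spline_hazard(t, knots, values):
    """Evaluate a linear-spline baseline hazard by linear interpolation."""
    return np.interp(t, knots, values)


def cumulative_hazard(t, knots, values):
    """Integral of the spline hazard over [0, t].

    Assumes knots are increasing, start at 0, and that 0 <= t <= knots[-1].
    The trapezoid rule is exact here because the integrand is piecewise linear
    and every knot below t is included in the integration grid.
    """
    knots = np.asarray(knots, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.append(knots[knots < t], t)
    lam = np.interp(grid, knots, values)
    return float(np.sum(0.5 * (lam[1:] + lam[:-1]) * np.diff(grid)))


knots = [0.0, 1.0, 2.0]
values = [1.0, 3.0, 1.0]
baseline_survival = np.exp(-cumulative_hazard(1.5, knots, values))
```

Under this representation, optimizing over the hazard amounts to optimizing over the finite vector of knot values, subject to whatever constraints define the parameter set.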