A new confidence interval based on the theory of U-statistics for the area under the curve
Pub. online: 9 October 2025
Type: Research Article
Open Access
Received
8 April 2025
Revised
12 September 2025
Accepted
22 September 2025
Published
9 October 2025
Abstract
The area under the receiver operating characteristic curve (AUC) is a suitable measure of the quality of classification algorithms. Here we use the theory of U-statistics to derive new confidence intervals for it. The new confidence intervals take into account that only the total sample size used to calculate the AUC can be controlled, while the number of members of the case group and the number of members of the control group are random. We show that the new confidence intervals can be used not only to evaluate the quality of the fitted model, but also to judge the quality of the classification algorithm itself. We also show that two popular confidence intervals for the AUC, namely DeLong’s interval and the Mann–Whitney interval due to Sen, coincide.
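For orientation, the classical DeLong interval mentioned in the abstract can be sketched as follows. This is a minimal illustration of the textbook construction, which treats the case and control group sizes as fixed. It is not the new interval proposed in the article and not the contents of AUC_CI.R. The names `kernel` and `delong_interval` and the toy scores are hypothetical, and ties are assumed to count as 1/2.

```python
import math
from statistics import NormalDist, variance


def kernel(case_score, control_score):
    """Mann-Whitney kernel: 1 if the case outranks the control, 1/2 on a tie."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0


def delong_interval(case_scores, control_scores, level=0.95):
    """Return (auc, lower, upper) for a Wald-type DeLong interval.

    Assumption: group sizes m and n are treated as fixed, as in the
    classical construction. The bounds are not clipped to [0, 1].
    """
    m, n = len(case_scores), len(control_scores)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases and two controls")

    # Structural components: average kernel value per case and per control.
    v10 = [sum(kernel(x, y) for y in control_scores) / n for x in case_scores]
    v01 = [sum(kernel(x, y) for x in case_scores) / m for y in control_scores]

    # The AUC estimate is the two-sample U-statistic, i.e. the mean of v10.
    auc = sum(v10) / m

    # DeLong variance estimate: sample variances of the components,
    # each scaled by its own group size.
    var = variance(v10) / m + variance(v01) / n

    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var)
    return auc, auc - half_width, auc + half_width


# Toy data (made up for illustration only).
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.5, 0.3, 0.2]
auc, lower, upper = delong_interval(cases, controls)
print(f"AUC = {auc:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

With samples this small the upper bound can exceed 1, a known limitation of Wald-type intervals.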
Supplementary material
The file AUC_CI.R contains all confidence intervals mentioned in this article: the new ones proposed here and the ones used in the simulation study for comparison. The files Simulation_binormal.R, Simulation_logistic.R, Siumlation_logistic_2_fast.R, Simulation_LASSO.R, Simulation_LASSO_2_fast.R, Simulation_binormal_bias.R and Simulation_logistic_bias.R contain the source code for the simulations reported in this article.
References
Afendras, G., Papadatos, N., Piperigou, V.: On the limiting distribution of sample central moments. Ann. Inst. Stat. Math. 72, 399–425 (2020). MR4067230. https://doi.org/10.1007/s10463-018-0695-4
Billingsley, P.: Convergence of Probability Measures. Wiley (1999). MR1700749. https://doi.org/10.1002/9780470316962
Carpenter, J., Bithell, J.: Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat. Med. 19, 1141–1164 (2000). https://doi.org/10.1002/(SICI)1097-0258(20000515)19:9<1141::AID-SIM479>3.0.CO;2-F
DeLong, E., DeLong, D., Clarke-Pearson, D.: Comparing the areas under two or more operating characteristic curves: A nonparametric approach. Biometrics 44, 837–845 (1988). https://doi.org/10.2307/2531595
Kottas, M., Kuss, O., Zapf, A.: A modified Wald interval for the area under the ROC curve (AUC) in diagnostic case-control studies. BMC Med. Res. Methodol. 14, 26 (2014) (9 pages). https://doi.org/10.1186/1471-2288-14-26
Kowalski, J., Tu, X.: Modern Applied U-Statistics. John Wiley & Sons (2007). MR2368050
LeDell, E., Peterson, M., v. d. Laan, M.: Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electron. J. Stat. 9, 1583–1607 (2015). MR3376118. https://doi.org/10.1214/15-EJS1035
Morgan, F.: Geometric Measure Theory—A Beginner’s Guide. Elsevier (2016). MR3497381
Noma, H., Shinozaki, T., Iba, K., Teramukai, S., Furukawa, T.: Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods. Stat. Med. 40, 5691–5701 (2021). MR4330574. https://doi.org/10.1002/sim.9148
Pepe, M.: The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press (2003). MR2260483
Qin, G., Hotilovac, L.: Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test. Stat. Methods Med. Res. 17, 207–221 (2008). MR2432389. https://doi.org/10.1177/0962280207087173
Sen, P.K.: A note on asymptotically distribution-free confidence bounds $P(X<Y)$, based on two samples. Sankhya 29, 95–102 (1967). MR0226772