The area under the receiver operating characteristic curve (AUC) is a suitable measure of the quality of classification algorithms. Here we use the theory of U-statistics to derive new confidence intervals for it. The new confidence intervals take into account that only the total sample size used to calculate the AUC can be controlled, while the sizes of the case group and of the control group are random. We show that the new confidence intervals can be used not only to evaluate the quality of the fitted model but also to judge the quality of the classification algorithm itself. We also show that two popular confidence intervals for the AUC, namely DeLong's interval and the Mann–Whitney intervals due to Sen, coincide.
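For reference, the classical DeLong interval can be computed from the Mann–Whitney U-statistic and its structural components (placement values). The sketch below assumes fixed group sizes $m$ and $n$ (the conventional setting, not the random-group-size intervals proposed here), at least two cases and two controls, scores where larger values indicate a case, and a normal (Wald) approximation. The function names are hypothetical.

```python
from statistics import NormalDist


def psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the case score beats the control score, 1/2 on a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def auc_delong_ci(cases, controls, level=0.95):
    """AUC and DeLong's Wald-type interval, treating both group sizes as fixed.

    Requires at least two cases and two controls (sample variances use m-1, n-1).
    """
    m, n = len(cases), len(controls)
    # Structural components: average kernel value of each case and each control
    v10 = [sum(psi(x, y) for y in controls) / n for x in cases]
    v01 = [sum(psi(x, y) for x in cases) / m for y in controls]
    auc = sum(v10) / m  # equals U / (m * n)
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = (s10 / m + s01 / n) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return auc, (auc - z * se, auc + z * se)
```

Because this is a plain Wald interval, its endpoints can fall outside $[0, 1]$ for small samples or AUC values near the boundary.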
Quasi-mixing limits of killed symmetric Lévy processes are studied. It is proved that (intrinsic) ultracontractivity of the underlying process implies the existence of its (uniformly) exponential quasi-mixing limits. As a by-product, this implication ensures that the process exhibits (uniformly) exponential quasi-ergodicity and (uniformly) exponential fractional quasi-ergodicity on $L^p$ ($p\ge 1$). Notably, precise rates of convergence and precise limiting equalities are provided; they are determined by the spectral gaps and eigenfunction ratios of the underlying process. Finally, three examples illustrate the theoretical results.
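To illustrate how a spectral gap controls conditioned limits of this kind, the sketch below uses Brownian motion (a symmetric Lévy process) killed on leaving $(0,\pi)$, which is intrinsically ultracontractive. Its Dirichlet eigenpairs are $\varphi_k(y)=\sqrt{2/\pi}\sin(ky)$ and $\lambda_k=k^2/2$. The code evaluates $\mathbb{E}_x[f(X_s)\mid \tau>t]$ from a truncated eigenfunction expansion; for $s$ and $t-s$ both large it approaches the standard quasi-ergodic limit $\int_0^\pi f\varphi_1^2\,dy$, with corrections at most of order $e^{-(\lambda_2-\lambda_1)\min(s,\,t-s)}$. The process, interval, truncation level, and quadrature are assumptions chosen for illustration, not the construction used in the paper.

```python
import math

# Assumed example: Brownian motion with generator (1/2) d^2/dy^2, killed on leaving (0, pi).
L = math.pi
K = 15  # eigenmodes kept in the truncated expansion
N = 2000  # midpoint-rule nodes on (0, pi)
NODES = [(i + 0.5) * L / N for i in range(N)]
DY = L / N


def phi(k: int, y: float) -> float:
    """Dirichlet eigenfunction phi_k(y) = sqrt(2/pi) sin(k y)."""
    return math.sqrt(2 / L) * math.sin(k * y)


def lam(k: int) -> float:
    """Eigenvalue lambda_k = k^2 / 2."""
    return k * k / 2


def mass(j: int) -> float:
    """<phi_j, 1>: 2 sqrt(2/pi) / j for odd j and 0 for even j."""
    return 2 * math.sqrt(2 / L) / j if j % 2 == 1 else 0.0


def inner(f, k: int, j: int) -> float:
    """<phi_k, f phi_j> approximated by the midpoint rule."""
    return sum(phi(k, y) * f(y) * phi(j, y) for y in NODES) * DY


def conditioned_mean(f, x: float, s: float, t: float) -> float:
    """E_x[f(X_s) | tau > t] for 0 <= s <= t via the truncated eigenexpansion.

    Numerator: int p_s(x, y) f(y) P_y(tau > t - s) dy; denominator: P_x(tau > t).
    """
    num = 0.0
    for k in range(1, K + 1):
        for j in range(1, K + 1):
            mj = mass(j)
            if mj == 0.0:
                continue  # even modes carry no mass
            num += (math.exp(-lam(k) * s) * phi(k, x) * inner(f, k, j)
                    * math.exp(-lam(j) * (t - s)) * mj)
    den = sum(math.exp(-lam(j) * t) * phi(j, x) * mass(j) for j in range(1, K + 1))
    return num / den
```

For $f(y)=y^2$, the limit $\int_0^\pi y^2\varphi_1(y)^2\,dy$ equals $\pi^2/3-1/2$, so `conditioned_mean(lambda y: y * y, 0.3, 10.0, 20.0)` should be close to that value, independently of the starting point.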
Finite mixtures with different regression models for different mixture components arise naturally in the statistical analysis of biological and sociological data. In this paper we consider a model of mixtures with varying concentrations, in which the mixing probabilities differ from observation to observation. The modified local linear regression estimator (mLLRE) is used for nonparametric estimation of the unknown regression function of a given mixture component. Asymptotic normality of the mLLRE is proved for the case in which the probability density function of the regressor has jumps. The theoretically optimal bandwidth is derived. Simulations were performed to assess the accuracy of the normal approximation.
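A weighted local linear estimator of this type can be sketched as follows. The sketch assumes two components, the commonly used minimax weights $a_j=e_1^\top\Gamma^{-1}p_j$ with $\Gamma=\frac1N\sum_j p_jp_j^\top$ (where $p_j$ is the vector of mixing probabilities of observation $j$), an Epanechnikov kernel, and a continuous uniform regressor. All function names, the simulated model, and the bandwidth are hypothetical, and the paper's exact mLLRE definition and optimal bandwidth may differ. Since some weights are negative, this is not an ordinary kernel smoother.

```python
import random


def epanechnikov(u: float) -> float:
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def minimax_weights(p1):
    """Weights a_j for component 1 of a two-component mixture (assumed form).

    p1[j] is the probability that observation j comes from component 1.
    """
    n = len(p1)
    g11 = sum(p * p for p in p1) / n
    g12 = sum(p * (1 - p) for p in p1) / n
    g22 = sum((1 - p) ** 2 for p in p1) / n
    det = g11 * g22 - g12 * g12
    # First row of Gamma^{-1} is (g22, -g12) / det
    return [(g22 * p - g12 * (1 - p)) / det for p in p1]


def mllre(x0, xs, ys, a, h):
    """Local linear fit at x0 with weights a_j * K((X_j - x0) / h); returns the intercept."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xj, yj, aj in zip(xs, ys, a):
        d = xj - x0
        w = aj * epanechnikov(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yj
        t1 += w * d * yj
    # Solve the 2x2 weighted normal equations for the intercept b0
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def simulate(n: int, seed: int):
    """Hypothetical model: g1(x) = 2x, g2(x) = 1 - x, mixing probability varying with j."""
    rng = random.Random(seed)
    p1 = [(j + 0.5) / n for j in range(n)]
    xs, ys = [], []
    for p in p1:
        x = rng.random()
        if rng.random() < p:
            y = 2 * x + rng.gauss(0, 0.1)  # component 1
        else:
            y = 1 - x + rng.gauss(0, 0.1)  # component 2
        xs.append(x)
        ys.append(y)
    return p1, xs, ys
```

For example, `p1, xs, ys = simulate(20000, 1)` followed by `mllre(0.5, xs, ys, minimax_weights(p1), 0.1)` estimates $g_1(0.5)=1$. By construction, the weights satisfy $\frac1N\sum_j a_jp_{j1}=1$ and $\frac1N\sum_j a_j(1-p_{j1})=0$, which is what removes the second component in expectation.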
Let $(\xi_k,\eta_k)_{k\ge 1}$ be independent identically distributed random vectors with arbitrarily dependent positive components, and let $T_k:=\xi_1+\cdots+\xi_{k-1}+\eta_k$ for $k\in\mathbb{N}$. The random sequence $(T_k)_{k\ge 1}$ is called a (globally) perturbed random walk. Consider a general branching process generated by $(T_k)_{k\ge 1}$ and let $Y_j(t)$ denote the number of $j$th generation individuals with birth times $\le t$. Assuming that $\operatorname{Var}\xi_1\in(0,\infty)$ and allowing the distribution of $\eta_1$ to be arbitrary, a law of the iterated logarithm (LIL) is proved for $Y_j(t)$. In particular, an LIL for the counting process of $(T_k)_{k\ge 1}$ is obtained. The latter result was previously established by Iksanov, Jedidi and Bouzeffour (2017) under the additional assumption that $\mathbb{E}\eta_1^a<\infty$ for some $a>0$. In this paper, it is shown that this additional assumption is not needed.
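The counts $Y_j(t)$ can be simulated directly from the definitions. Each individual born at time $T$ has children at $T$ plus the points of an independent copy of the perturbed random walk, so $Y_j(t)=\sum_{k:\,T_k\le t}Y_{j-1}^{(k)}(t-T_k)$, where the $Y_{j-1}^{(k)}$ are independent copies and $Y_1(t)$ is the counting process of $(T_k)$. The sketch below is a minimal Monte Carlo simulation of these counts, not of the LIL itself. The sampler interface, the function names, and the example distribution are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# A sampler returns one pair (xi, eta); xi must be positive, and eta may depend on xi.
Sampler = Callable[[], Tuple[float, float]]


def prw_points(t: float, sample: Sampler) -> List[float]:
    """Points T_k = xi_1 + ... + xi_{k-1} + eta_k of one perturbed random walk with T_k <= t."""
    points = []
    s = 0.0  # S_{k-1}: sum of the xi's drawn before step k
    while s <= t:  # T_k >= S_{k-1}, so once S_{k-1} > t no later point can be <= t
        xi, eta = sample()
        if s + eta <= t:
            points.append(s + eta)
        s += xi
    return points


def generation_count(j: int, t: float, sample: Sampler) -> int:
    """Y_j(t): number of j-th generation individuals with birth time <= t."""
    first = prw_points(t, sample)  # birth times of the first generation
    if j == 1:
        return len(first)
    # Each first-generation individual founds an independent copy of the process;
    # every recursive call draws fresh randomness from the sampler.
    return sum(generation_count(j - 1, t - T, sample) for T in first)


def dependent_exponential_sampler(seed: int) -> Sampler:
    """Hypothetical example: xi ~ Exp(1) and eta = xi * U with U ~ Uniform(0, 1)."""
    rng = random.Random(seed)

    def sample() -> Tuple[float, float]:
        xi = rng.expovariate(1.0)
        return xi, xi * rng.random()

    return sample
```

With the deterministic pair $(\xi,\eta)=(1,0.5)$, the first generation is born at $0.5,1.5,\dots$, so $Y_1(10)=10$ and $Y_2(10)=10+9+\cdots+1=55$. This is a convenient check of the recursion before plugging in a random sampler.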