1 Introduction
1.1 Background
The beta distribution is ubiquitous in statistics due to its flexibility. Among many applications, it is used in analyses of uniform order statistics [17], problems in Euclidean geometry [10], general theory of stochastic processes [24], and applied statistical inference; the last category of applications includes time estimation in project management [5], hypothesis testing [34], A/B testing in business [28], modelling in life sciences [31] and others [19, 20].
Unfortunately, importance and simplicity do not go hand in hand. The distribution of $X\sim \mathsf{Beta}(\alpha ,\beta )$ with parameters $\alpha ,\beta >0$ is given by
(1)
\[ \mathbf{P}\{X\leqslant \epsilon \}={\int _{0}^{\epsilon }}\frac{{x^{\alpha -1}}{(1-x)^{\beta -1}}}{B(\alpha ,\beta )}\text{d}x,\hspace{1em}0\leqslant \epsilon \leqslant 1,\]
where $B(\alpha ,\beta )\triangleq {\textstyle\int _{0}^{1}}{x^{\alpha -1}}{(1-x)^{\beta -1}}\text{d}x$ is the normalizing constant; the integral (1), also known as the incomplete beta function [7], is intractable. Thus, there is strong demand for closed-form approximations of tail probabilities, for example, in the context of adaptive Bayesian inference [8], Bayesian nonparametric statistics [4], properties of random matrices [10, 26], and (obviously) large deviation theory [33].
The main goal of this paper is to give accurate closed-form bounds for the beta distribution in the form of sub-gamma type exponential concentration inequalities [3], that is,
(2)
\[ \mathbf{P}\{X-\mathbf{E}[X]<-\epsilon \},\mathbf{P}\{X-\mathbf{E}[X]>\epsilon \}\leqslant \exp \left(-\frac{{\epsilon ^{2}}}{2v+2c\epsilon }\right)\]
for any $\epsilon \geqslant 0$ and two parameters: the variance proxy v and the scale c, both depending on α and β. Such concentration bounds, pioneered by Bernstein [2] and popularized by Hoeffding [14], are capable of modelling both sub-Gaussian and sub-exponential behaviour. Due to this flexibility, bounds of this sort are the workhorse of approximation arguments used in modern statistics [18, 3, 22, 30].
1.2 Contribution
The contributions of this work are as follows:
• Optimal Bernstein-type concentration inequality. We establish an exponential concentration bound (2) with v matching the variance (which is optimal) and, respectively, the optimal value of c, which depends on the ratio of the third and second central moments. This bound is shown to be optimal in the regime of small deviations.
• Useful recursion for central moments of the beta distribution: it is of order 2 with coefficients depending linearly on the moment order. This formula addresses the lack of a simple closed-form expression for higher-order moments. We use it to estimate the moment generating function.
• Implementation and numerical evaluation. Symbolic algebra software has been used to validate the numerical comparison, and a numerical evaluation is provided to illustrate the new bounds. The code snippets are shared in the paper, and the evaluation is also shared as a Colab notebook [27].
1.3 Related work
When judging bounds in the form of (2), it is important to insist on the optimality of the sub-Gaussian behavior. For small deviations ϵ the bound (2) becomes approximately Gaussian with variance v, thus we ideally want $v=\mathbf{Var}[X]$; exponential bounds of this type are considered optimal in the literature [1]. On the other hand, bounds with $v>\mathbf{Var}[X]$ essentially overshoot the variance, leading to unnecessarily wide tails and increased uncertainty in statistical inference.
Bearing this in mind, we review prior suboptimal bounds in the form of (2):
As for the central moment recursion, we note that:
• Folklore methods give some crude bounds; for example, one can express a beta random variable in terms of gamma distributions and utilize their concentration properties. Such techniques do not give optimal exponents.
• Some bounds in the form of (2) can be derived from established inequalities on the incomplete beta function. Using the well-known inequality [6], which depends on a Kullback–Leibler term, unfortunately leads to suboptimal bounds in the regime of small deviations (as seen from the Taylor approximation, we necessarily obtain the variance factor $v=\frac{\alpha +\beta +1}{\alpha +\beta }\cdot \mathbf{Var}[X]$, hence overshooting the variance).
• The work [10] gives bounds with explicit but suboptimal v and c, only valid in the limited range of deviations $\sqrt{\frac{6\alpha }{{(\alpha +\beta )^{2}}}}<\epsilon <\frac{\alpha }{\alpha +\beta }$. The proof relies on specific integral estimates and cannot be sharpened much.
• The work [21] determines the best bound assuming $c=0$ (that is, of sub-Gaussian type). It is not given in closed form, but can be computed numerically as the solution of a transcendental equation. Although the bound is quite sharp in the symmetric case $\alpha =\beta $, it is much worse than our bound when the beta distribution is skewed (the typical case).
• The work [33] obtains suboptimal v and c, shown to be far from the true values by unknown constant factors. With these techniques, it is not possible to obtain the optimal exponent, which is the focus of this work.
Remark 1.
While the current paper was under review, Henzi and Duembgen [13] presented similar tail bounds, obtained by working with integral-derived inequalities. When framed as Bernstein’s inequality, their constant c is worse for the heavier tail (e.g., the right tail in case $\alpha <\beta $), but better on the lighter tail (e.g., $c<0$ for the left tail in case $\alpha <\beta $).
2 Results
2.1 Optimal Bernstein-type concentration bounds
The central result of this work is a Bernstein-type tail bound with numerically optimal parameters. The precise statement is given in the following theorem.
Theorem 1 (Bernstein’s Inequality).
Let $X\sim \mathsf{Beta}(\alpha ,\beta )$. Define the parameters
(3)
\[ \begin{aligned}{}v& \triangleq \frac{\alpha \beta }{{(\alpha +\beta )^{2}}(\alpha +\beta +1)},\\ {} c& \triangleq \frac{2\left(\beta -\alpha \right)}{\left(\alpha +\beta \right)\left(\alpha +\beta +2\right)}.\end{aligned}\]
Then the upper tail of X is bounded as
(4)
\[ \mathbf{P}\left\{X>\mathbf{E}[X]+\epsilon \right\}\leqslant \left\{\begin{array}{l@{\hskip10.0pt}l}\exp \left(-\frac{{\epsilon ^{2}}}{2\left(v+\frac{c\epsilon }{3}\right)}\right),\hspace{1em}& \beta \geqslant \alpha ,\\ {} \exp \left(-\frac{{\epsilon ^{2}}}{2v}\right),\hspace{1em}& \beta <\alpha ,\end{array}\right.\]
and the lower tail of X is bounded as
(5)
\[ \mathbf{P}\left\{X<\mathbf{E}[X]-\epsilon \right\}\leqslant \left\{\begin{array}{l@{\hskip10.0pt}l}\exp \left(-\frac{{\epsilon ^{2}}}{2\left(v+\frac{c\epsilon }{3}\right)}\right),\hspace{1em}& \alpha \geqslant \beta ,\\ {} \exp \left(-\frac{{\epsilon ^{2}}}{2v}\right),\hspace{1em}& \alpha <\beta .\end{array}\right.\]
Remark 2 (Variance and Scale Parameters).
The variance parameter equals $v=\mathbf{Var}[X]$, and the scale parameter equals $c=\frac{\mathbf{E}[{(X-\mathbf{E}[X])^{3}}]}{\mathbf{Var}[X]}$.
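As a quick illustration of Theorem 1 (a minimal sketch, not one of the paper's listings; it assumes SciPy's scipy.stats.beta for the exact tail), the upper-tail bound (4) can be compared with the exact probability:

import numpy as np
from scipy.stats import beta as beta_dist

def bernstein_upper_tail(a, b, eps):
    """Upper-tail bound (4) from Theorem 1 for X ~ Beta(a, b): P{X > E[X] + eps}."""
    v = a * b / ((a + b) ** 2 * (a + b + 1))        # variance parameter, v = Var[X]
    c = 2 * (b - a) / ((a + b) * (a + b + 2))       # scale parameter
    if b >= a:
        return np.exp(-eps ** 2 / (2 * (v + c * eps / 3)))
    return np.exp(-eps ** 2 / (2 * v))

a, b = 2.0, 8.0                                     # a skewed case with E[X] = 0.2
mean = a / (a + b)
for eps in (0.05, 0.1, 0.2):
    exact = beta_dist.sf(mean + eps, a, b)          # exact upper tail
    print(f"eps={eps:.2f}  exact={exact:.3e}  bound={bernstein_upper_tail(a, b, eps):.3e}")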
Remark 3 (Mixed Gaussian-Exponential Behaviour).
For small values of ϵ the bound behaves like a Gaussian with variance $v=\mathbf{Var}[X]$. For larger values of ϵ, the bound is close to an exponential tail with rate $\frac{3}{2c}$.
The result below shows that the parameters c and v are best possible, in the regime of small deviations ϵ, for the exponential moment method.
Theorem 2 (Best Cramér–Chernoff Bound).
The best bound that can be obtained from the Cramér–Chernoff method is
(6)
\[ \mathbf{P}\left\{X>\mathbf{E}[X]+\epsilon \right\}\leqslant {\mathrm{e}^{-\frac{{\epsilon ^{2}}}{2v}+\frac{c{\epsilon ^{3}}}{6{v^{2}}}+O\left({\epsilon ^{4}}\right)}},\]
as $\epsilon \to 0$, with constants c, v as in Theorem 1.
2.2 Handy recurrence for central moments
Our optimal Bernstein inequality is proved using the novel recursion for central moments, presented below in Theorem 3 as a contribution of independent interest. Being of order 2, it is not only efficient to evaluate numerically, but also easy to manipulate algebraically when working with closed-form formulas.
Theorem 3 (Order 2 Recurrence for Central Moments).
For $X\sim \mathsf{Beta}(\alpha ,\beta )$ and any integer order $d\geqslant 2$, the following recurrence relation holds:
(7)
\[ \begin{aligned}{}\mathbf{E}[{(X-\mathbf{E}[X])^{d}}]& =\frac{(d-1)(\beta -\alpha )}{(\alpha +\beta )(\alpha +\beta +d-1)}\cdot \mathbf{E}[{(X-\mathbf{E}[X])^{d-1}}]\\ {} & \hspace{1em}+\frac{(d-1)\alpha \beta }{{(\alpha +\beta )^{2}}(\alpha +\beta +d-1)}\cdot \mathbf{E}[{(X-\mathbf{E}[X])^{d-2}}].\end{aligned}\]
Corollary 2 (P-recursive Property).
The recursion given in Theorem 3, after rearrangements, is of order 2 with coefficients linear in the index d.
The implementation of the recursion, performed in Python’s symbolic algebra package Sympy [23], is presented in Listing 1. Scalability is ensured by memoization, which caches intermediate results to avoid repeated calls.
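For illustration, a minimal sketch of such a memoized SymPy implementation is given below; the actual Listing 1 may differ in details such as caching and simplification strategy.

from functools import lru_cache
import sympy as sp

alpha, beta = sp.symbols('alpha beta', positive=True)

@lru_cache(maxsize=None)            # memoization: each order is computed only once
def central_moment(d):
    """E[(X - E[X])^d] for X ~ Beta(alpha, beta), via the recurrence of Theorem 3."""
    if d == 0:
        return sp.Integer(1)
    if d == 1:
        return sp.Integer(0)
    return sp.simplify(
        (d - 1) * (beta - alpha) / ((alpha + beta) * (alpha + beta + d - 1)) * central_moment(d - 1)
        + (d - 1) * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + d - 1)) * central_moment(d - 2)
    )

print(central_moment(2))            # alpha*beta/((alpha + beta)**2*(alpha + beta + 1))
print(central_moment(4))            # fourth central moment, cf. Table 1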
To demonstrate the algorithm in action, we list some first central moments of the beta distribution in Table 1.
Central Moment | Explicit Formula |
$\mathbf{E}[X]$ | $\frac{\alpha }{\alpha +\beta }$ |
$\mathbf{Var}[X]$ | $\frac{\alpha \beta }{{(\alpha +\beta )^{2}}(\alpha +\beta +1)}$ |
$\textbf{Skew}[X]$ | $\frac{2\left(\beta -\alpha \right)\sqrt{\alpha +\beta +1}}{\sqrt{\alpha \beta }\left(\alpha +\beta +2\right)}$ |
$\textbf{Kurt}[X]$ | $\frac{3\left(\alpha \beta \left(\alpha +\beta +2\right)+2{\left(\alpha -\beta \right)^{2}}\right)\left(\alpha +\beta +1\right)}{\alpha \beta \left(\alpha +\beta +2\right)\left(\alpha +\beta +3\right)}$ |
$\mathbf{E}[{(X-\mathbf{E}[X])^{5}}]/{\sqrt{\mathbf{Var}[X]}^{5}}$ | $\frac{4\left(\beta -\alpha \right){\left(\alpha +\beta +1\right)^{\frac{3}{2}}}\left(3\alpha \beta \left(\alpha +\beta +2\right)+2\alpha \beta \left(\alpha +\beta +3\right)+6{\left(\alpha -\beta \right)^{2}}\right)}{{\alpha ^{\frac{3}{2}}}{\beta ^{\frac{3}{2}}}\left(\alpha +\beta +2\right)\left(\alpha +\beta +3\right)\left(\alpha +\beta +4\right)}$ |
Note that the algorithm addresses the lack of simple derivations for such formulas, as well as the lack of formulas beyond skewness and kurtosis.
Finally, we note that the recurrence implies, by induction, the following important property: when $\alpha \leqslant \beta $ all central moments of X are nonnegative, while for $\alpha >\beta $ the central moments of even order are nonnegative and those of odd order are nonpositive.
2.3 Numerical evaluation
The experiment summarized in Figure 1 (the code shared in [27]) demonstrates the advantage of sub-gamma bounds obtained in Theorem 1 over the optimal sub-Gaussian bounds from prior work [21].
For skewed beta distributions, the bound from this work is more accurate than the sub-Gaussian approximation. The chosen range of parameters covers the cases where the expectation $\mathbf{E}[X]$ is a relatively small number like 0.02 or 0.2, a typical range for many practical Beta models, particularly A/B testing. Note that there is still room for improvement in the regime of larger deviations ϵ, where the bounds could potentially benefit from refining the numeric value of c. The experiment details are shared in the Colab Notebook [27].
3 Preliminaries
3.1 Gaussian hypergeometric function
The Gaussian hypergeometric function is defined as follows [11]:
(8)
\[ {_{2}}{F_{1}}(a,b;c;z)={\sum \limits_{k=0}^{+\infty }}\frac{{(a)_{k}}{(b)_{k}}}{{(c)_{k}}}\cdot \frac{{z^{k}}}{k!},\]
where we use the Pochhammer symbol defined as
(9)
\[ {(x)_{k}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1,\hspace{1em}& k=0,\\ {} x(x+1)\cdots (x+k-1),\hspace{1em}& k>0.\end{array}\right.\]
We call two functions F of this form contiguous when their parameters differ by integers. Gauss considered ${_{2}}{F_{1}}({a^{\prime }},{b^{\prime }};{c^{\prime }};z)$ where ${a^{\prime }}=a\pm 1$, ${b^{\prime }}=b\pm 1$, ${c^{\prime }}=c\pm 1$ and proved that between F and any two of these functions there exists a linear relationship with coefficients linear in z. It follows [29, 15] that F and any two of its contiguous series are linearly dependent, with the coefficients being rational in the parameters and z. For our purpose, we need to express F by the series with increased second argument. The explicit formula comes from [32]:
Lemma 1 (Contiguous Relation).
It holds that
(10)
\[ {_{2}}{F_{1}}(a,b;c;z)=\frac{2b-c+2+(a-b-1)z}{b-c+1}{\hspace{0.1667em}_{2}}{F_{1}}(a,b+1;c;z)+\frac{(b+1)(z-1)}{b-c+1}{\hspace{0.1667em}_{2}}{F_{1}}(a,b+2;c;z).\]
3.2 Beta distribution properties
We use the machinery of hypergeometric functions to establish certain properties of the beta distribution. The first result expresses the central moment in terms of the Gaussian hypergeometric function.
Lemma 2 (Central Beta Moments).
Let $X\sim \mathsf{Beta}(\alpha ,\beta )$, then we have that
(11)
\[ \mathbf{E}[{(X-\mathbf{E}[X])^{d}}]={\left(-\frac{\alpha }{\alpha +\beta }\right)^{d}}{\hspace{2.5pt}_{2}}{F_{1}}\left(\alpha ,-d;\alpha +\beta ;\frac{\alpha +\beta }{\alpha }\right),\]
where ${_{2}}{F_{1}}$ is the Gaussian hypergeometric function. The proof appears in Section 4.1.
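A numerical sanity check of (11) can be done with mpmath (an illustrative sketch, not part of the paper's listings); the hypergeometric series terminates because the second argument is a negative integer:

import mpmath as mp

def central_moment_hyp(a, b, d):
    """E[(X - E[X])^d] via the hypergeometric representation (11)."""
    m = a / (a + b)
    return (-m) ** d * mp.hyp2f1(a, -d, a + b, (a + b) / a)

def central_moment_quad(a, b, d):
    """Reference value by direct numerical integration of the density in (1)."""
    m = a / (a + b)
    dens = lambda x: x ** (a - 1) * (1 - x) ** (b - 1) / mp.beta(a, b)
    return mp.quad(lambda x: (x - m) ** d * dens(x), [0, 1])

a, b = 2.5, 7.0
for d in range(2, 6):
    print(d, central_moment_hyp(a, b, d), central_moment_quad(a, b, d))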
3.3 Cramér–Chernoff method
Below, we review the canonical method of obtaining concentration inequalities from the moment generating function, following the discussion in [3].
Recall that for a given random variable Z we define its moment generating function (MGF) and, respectively, its cumulant generating function as
(12)
\[ \begin{aligned}{}{\phi _{Z}}(t)& \triangleq \mathbf{E}\left[\exp (tZ)\right],\\ {} {\psi _{Z}}(t)& \triangleq \log \mathbf{E}\left[\exp (tZ)\right].\end{aligned}\]
In this notation, Markov’s exponential inequality becomes
(13)
\[ \mathbf{P}\left\{Z\geqslant \epsilon \right\}\leqslant \exp (-t\epsilon )\mathbf{E}\left[\exp (tZ)\right]=\exp (-t\epsilon +{\psi _{Z}}(t)),\]
valid for any ϵ and any nonnegative t, and nontrivial when ${\phi _{Z}}(t)$ is finite. The optimal value of t is determined by the so-called Cramér transform
(14)
\[ {\psi _{Z}^{\ast }}(\epsilon )\triangleq \sup \left\{t\epsilon -{\psi _{Z}}(t):t\geqslant 0\right\},\]
and leads to the Chernoff inequality
(15)
\[ \mathbf{P}\left\{Z\geqslant \epsilon \right\}\leqslant \exp (-{\psi _{Z}^{\ast }}(\epsilon )).\]
3.4 Logarithmic inequalities
It is well known that $\log (1+x)=x-\frac{{x^{2}}}{2}+\frac{{x^{3}}}{3}-\cdots \hspace{0.1667em}$ for $|x|<1$. Below, we present a useful refinement of this expansion:
Lemma 3 (Logarithmic Inequality).
For any $x\geqslant 0$ it holds that
(16)
\[ \log (1+x)\leqslant x-\frac{{x^{2}}}{2\left(1+\frac{2x}{3}\right)}.\]
The bound is illustrated in Figure 2 below, and we see that it matches up to the term $O({x^{4}})$. The proof appears in Section 4.3.
Fig. 2.
The logarithmic inequality from Lemma 3
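A grid-based numerical check of the inequality of Lemma 3 (illustrative only, not a proof) takes a few lines:

import numpy as np

x = np.linspace(0.0, 50.0, 200001)
gap = np.log1p(x) - (x - x ** 2 / (2 * (1 + 2 * x / 3)))
print(gap.max())     # 0.0, attained at x = 0; negative elsewhere, consistent with Lemma 3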
4 Proofs
4.1 Proof of Lemma 2
We know that the raw higher-order moments of X are given by [16]
(17)
\[ \mathbf{E}[{X^{k}}]=\frac{{(\alpha )_{k}}}{{(\alpha +\beta )_{k}}},\hspace{1em}k=0,1,2,\ldots \]
Combining this with the binomial theorem, we obtain
(18)
\[ \begin{aligned}{}\mathbf{E}[{(X-\mathbf{E}[X])^{d}}]& ={\sum \limits_{k=0}^{d}}{(-1)^{d-k}}\left(\genfrac{}{}{0.0pt}{}{d}{k}\right)\mathbf{E}[{X^{k}}]{\left(\frac{\alpha }{\alpha +\beta }\right)^{d-k}}\\ {} & ={\sum \limits_{k=0}^{d}}{(-1)^{d-k}}\left(\genfrac{}{}{0.0pt}{}{d}{k}\right)\frac{{(\alpha )_{k}}}{{(\alpha +\beta )_{k}}}{\left(\frac{\alpha }{\alpha +\beta }\right)^{d-k}}.\end{aligned}\]
Finally, by $\left(\genfrac{}{}{0.0pt}{}{d}{k}\right)={(-1)^{k}}\frac{{(-d)_{k}}}{k!}$ and the definition of the hypergeometric function,
(19)
\[ \begin{aligned}{}\mathbf{E}[{(X-\mathbf{E}[X])^{d}}]& ={(-1)^{d}}{\sum \limits_{k=0}^{d}}\frac{{(-d)_{k}}{(\alpha )_{k}}}{k!{(\alpha +\beta )_{k}}}{\left(\frac{\alpha }{\alpha +\beta }\right)^{d-k}}\\ {} & ={\left(-\frac{\alpha }{\alpha +\beta }\right)^{d}}{\sum \limits_{k=0}^{d}}\frac{{(-d)_{k}}{(\alpha )_{k}}}{k!{(\alpha +\beta )_{k}}}{\left(\frac{\alpha +\beta }{\alpha }\right)^{k}}\\ {} & ={\left(-\frac{\alpha }{\alpha +\beta }\right)^{d}}{\hspace{2.5pt}_{2}}{F_{1}}\left(\alpha ,-d;\alpha +\beta ;\frac{\alpha +\beta }{\alpha }\right),\end{aligned}\]
which finishes the proof.
4.2 Proof of Theorem 3
The goal is to prove the recursion formula (7). One way to accomplish this is to reuse the summation formulas developed in the proof of Lemma 2. Below, we give an argument that uses the properties of hypergeometric functions to better highlight their connection to the beta distribution.
To simplify the notation, we define $a=\alpha $, $b=-d$, $c=\alpha +\beta $, and $z=\frac{\alpha +\beta }{\alpha }$. Define ${\mu _{d}}\triangleq \mathbf{E}[{(X-\mathbf{E}[X])^{d}}]$. Then using Lemma 2 and Lemma 1 we obtain
(20)
\[ \begin{aligned}{}{(-z)^{d}}\cdot {\mu _{d}}& ={\hspace{0.1667em}_{2}}{F_{1}}(a,b;c;z)\\ {} & =\frac{2b-c+2+(a-b-1)z}{b-c+1}{\hspace{0.1667em}_{2}}{F_{1}}(a,b+1;c;z)\\ {} & \hspace{1em}+\frac{(b+1)(z-1)}{b-c+1}{\hspace{0.1667em}_{2}}{F_{1}}(a,b+2;c;z).\end{aligned}\]
In terms of α, β, d we obtain
(21)
\[ \begin{aligned}{}\frac{2b-c+2+(a-b-1)z}{b-c+1}& =(d-1)\cdot \frac{\alpha -\beta }{\alpha (\alpha +\beta +d-1)},\\ {} \frac{(b+1)(z-1)}{b-c+1}& =(d-1)\cdot \frac{\beta }{\alpha (\alpha +\beta +d-1)}.\end{aligned}\]
The computations are done in the SymPy package [23], as shown in Listing 2. Since we have
(22)
\[ \begin{aligned}{}{_{2}}{F_{1}}(a,b+1;c;z)& ={\hspace{2.5pt}_{2}}{F_{1}}(a,-d+1;c;z)={\mu _{d-1}}\cdot {(-z)^{d-1}},\\ {} {_{2}}{F_{1}}(a,b+2;c;z)& ={\hspace{2.5pt}_{2}}{F_{1}}(a,-d+2;c;z)={\mu _{d-2}}\cdot {(-z)^{d-2}},\end{aligned}\]
it follows that
(23)
\[ {z^{2}}{\mu _{d}}=-\frac{z(d-1)(\alpha -\beta )}{\alpha (\alpha +\beta +d-1)}\cdot {\mu _{d-1}}+\frac{(d-1)\beta }{\alpha (\alpha +\beta +d-1)}\cdot {\mu _{d-2}}.\]
Recalling that $z=\frac{\alpha +\beta }{\alpha }$ we finally obtain the claimed recursion (7), which finishes the proof.
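A sketch of the kind of symbolic verification performed by Listing 2 (the actual listing may differ) is to check the simplifications (21) directly:

import sympy as sp

alpha, beta, d = sp.symbols('alpha beta d', positive=True)
a, b, c, z = alpha, -d, alpha + beta, (alpha + beta) / alpha

# Coefficients of the contiguous relation in (20), compared with their simplified forms in (21)
coef1 = (2*b - c + 2 + (a - b - 1)*z) / (b - c + 1)
coef2 = (b + 1) * (z - 1) / (b - c + 1)

print(sp.simplify(coef1 - (d - 1) * (alpha - beta) / (alpha * (alpha + beta + d - 1))))   # 0
print(sp.simplify(coef2 - (d - 1) * beta / (alpha * (alpha + beta + d - 1))))             # 0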
4.3 Proof of Lemma 3
Consider the function $f(x)=\log (1+x)-\Big(x-\frac{{x^{2}}}{2(1+\frac{2x}{3})}\Big)$ for nonnegative x. Using Sympy we find that
\[ {f^{\prime }}(x)=-\frac{{x^{3}}}{(1+x){(3+2x)^{2}}}\leqslant 0,\hspace{1em}x\geqslant 0,\]
as demonstrated in Listing 3. Thus, f is nonincreasing; since $f(0)=0$ we obtain $f(x)\leqslant 0$ as claimed.
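A sketch of the corresponding SymPy computation (the actual Listing 3 may differ) is:

import sympy as sp

x = sp.symbols('x', nonnegative=True)
f = sp.log(1 + x) - (x - x**2 / (2 * (1 + 2*x/3)))

# prints a form equivalent to -x**3/((1 + x)*(3 + 2*x)**2), hence f'(x) <= 0 for x >= 0
print(sp.factor(sp.simplify(sp.diff(f, x))))
print(f.subs(x, 0))     # 0, so f(x) <= 0 for all x >= 0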
4.4 Proof of Theorem 1
If $X\sim \mathsf{Beta}(\alpha ,\beta )$, then for ${X^{\prime }}=1-X$ we obtain $X-\mathbf{E}[X]=\mathbf{E}[{X^{\prime }}]-{X^{\prime }}$ and thus the upper tail of X equals the lower tail of ${X^{\prime }}$, and the lower tail of X equals the upper tail of ${X^{\prime }}$. Furthermore, from (1) it follows that ${X^{\prime }}\sim \mathsf{Beta}(\beta ,\alpha )$. Thus, by exchanging the roles of α and β accordingly, it suffices to prove the theorem for the upper tail.
To facilitate calculations, we introduce the centred version of X, namely $Z\triangleq X-\mathbf{E}[X]$. With the help of the normalized moments ${m_{d}}\triangleq \mathbf{E}[{Z^{d}}]/d!$, we expand the moment generating function of Z into the series
(28)
\[ \phi (t)\triangleq \mathbf{E}\exp (tZ)=1+{\sum \limits_{d=2}^{+\infty }}{m_{d}}{t^{d}},\hspace{1em}t\in \mathbb{R}\]
(which converges everywhere as $|Z|\leqslant 1$), and expand its derivative as
(29)
\[ {\phi ^{\prime }}(t)={\sum \limits_{d=2}^{+\infty }}d{m_{d}}{t^{d-1}},\hspace{1em}t\in \mathbb{R}\]
(everywhere, accordingly).
The proof strategy is as follows: we start by expressing the cumulant generating function as an integral involving the moment generating function and its derivative; then we study series expansions of the functions involved, using the moment recursion; this leads to a tractable upper bound on the integral; finally, the cumulant bound is optimized by the Cramér–Chernoff method. To achieve the promised optimal constants, it is critical to keep all intermediate estimates sharp up to the order of 4. The proof flow is illustrated in Figure 3.
Observe that the cumulant generating function ψ can be expressed in terms of the moment generating function ϕ as follows.
Proof.
This follows by the logarithmic derivative identity ${(\log \phi )^{\prime }}=\frac{{\phi ^{\prime }}}{\phi }$. □
We now present a recursion for coefficients of the series expansions.
The relation between ${\phi ^{\prime }}$ and ϕ is given in the following auxiliary result.
We are now ready to estimate the cumulant generating function.
Claim 3.
For nonnegative t and $c=\frac{\beta -\alpha }{(\alpha +\beta )(\alpha +\beta +2)}$ it holds that
(33)
\[ \begin{array}{r@{\hskip0pt}l@{\hskip10pt}r}\displaystyle \frac{{\phi ^{\prime }}(t)}{\phi (t)}\leqslant \frac{vt}{1-ct},& & \displaystyle \alpha \leqslant \beta \hspace{2.5pt}\textit{and}\hspace{2.5pt}t<\frac{1}{c},\\ {} \displaystyle \phi (t)\leqslant \exp \left(\frac{v{t^{2}}}{2}\right),& & \displaystyle \alpha >\beta .\end{array}\]
Proof of Claim.
The proof splits depending on the relation between α and β. The case $\alpha >\beta $ is a bit easier and leads to sub-Gaussian bounds.
Case $\alpha >\beta $: From Corollary 3 and (31) it follows that ${m_{d}}$ is nonnegative when d is even, and negative otherwise. Thus, for even d we have
(34)
\[ {m_{d}}\leqslant \frac{1}{d}\cdot \frac{\alpha \beta }{\alpha +\beta +d-1}{m_{d-2}}\leqslant \frac{v}{d}\cdot {m_{d-2}}.\]
Repeating this $d/2$ times and combining with ${m_{d}}\leqslant 0$ for odd d, we obtain
(35)
\[ {m_{d}}\leqslant \left\{\begin{array}{l@{\hskip10.0pt}l}\frac{{v^{\frac{d}{2}}}}{d!!},\hspace{1em}& d\hspace{2.5pt}\text{even},\\ {} 0,\hspace{1em}& d\hspace{2.5pt}\text{odd}.\end{array}\right.\]
Using $d!!={2^{d/2}}(d/2)!$ for even d, for $t\geqslant 0$ we obtain
(36)
\[ \phi (t)\leqslant 1+{\sum \limits_{k=1}^{+\infty }}\frac{{\left(\frac{v{t^{2}}}{2}\right)^{k}}}{k!}=\exp \left(\frac{v{t^{2}}}{2}\right),\]
as required.
Case $\alpha \leqslant \beta $: By (28) and (29), for any t and any c (to be determined later)
(37)
\[ (1-ct){\phi ^{\prime }}(t)-vt\phi (t)={\sum \limits_{d=3}^{+\infty }}\left(d{m_{d}}-(d-1)c{m_{d-1}}-v{m_{d-2}}\right){t^{d-1}},\]
where we use the fact that the expansion terms with 1, t, ${t^{2}}$ vanish. Using (31) to eliminate the term with ${m_{d-2}}$, and expressing v in terms of α, β, we obtain
(38)
\[\begin{aligned}{}& d{m_{d}}-(d-1)c{m_{d-1}}-v{m_{d-2}}\\ {} & \hspace{1em}=-\frac{d\left(d-2\right)}{\alpha +\beta +1}{m_{d}}+\frac{\left(1-d\right)\left(\alpha -\beta +c\left({\alpha ^{2}}+2\alpha \beta +\alpha +{\beta ^{2}}+\beta \right)\right)}{{\alpha ^{2}}+2\alpha \beta +\alpha +{\beta ^{2}}+\beta }{m_{d-1}}.\end{aligned}\]
Since we assume $t\geqslant 0$, it suffices to show that (38) is nonpositive for $d\geqslant 3$, for c defined as in Theorem 1.
Since $\alpha \leqslant \beta $, we have that ${m_{d}}\geqslant 0$ for all d by induction in (31). In particular, ${m_{d-2}}\geqslant 0$, and from (31) we obtain the bound
(39)
\[ {m_{d}}\geqslant \frac{\left(\beta -\alpha \right)\left(d-1\right){m_{d-1}}}{d\left(\alpha +\beta \right)\left(\alpha +\beta +d-1\right)}\]
(note that the bound is sharp for $d=3$), which together with (38) yields
(40)
\[\begin{aligned}{}& d{m_{d}}-(d-1)c{m_{d-1}}-v{m_{d-2}}\\ {} & \hspace{1em}\leqslant c\left(1-d\right){m_{d-1}}+\frac{\left(-\alpha d+\alpha +\beta d-\beta \right){m_{d-1}}}{{\alpha ^{2}}+2\alpha \beta +\alpha d-\alpha +{\beta ^{2}}+\beta d-\beta }.\end{aligned}\]
The last expression is linear in c, with negative coefficient (because ${m_{d-1}}\geqslant 0$ and $d\geqslant 3$). Thus, it is nonpositive if and only if the condition
(41)
\[ c\geqslant \frac{\beta -\alpha }{\left(\alpha +\beta \right)\left(\alpha +\beta +d-1\right)}\]
holds for all $d\geqslant 3$. Since $\beta -\alpha \geqslant 0$, this is equivalent to
(42)
\[ c\geqslant \frac{\beta -\alpha }{\left(\alpha +\beta \right)\left(\alpha +\beta +2\right)},\]
and under this condition, (38) is nonpositive. Replacing in this reasoning c with $\frac{c}{2}$, to align with Theorem 1, finishes the proof. The calculations are done in Sympy and shown in Listing 4. □
Proof.
This follows from the previous claim, by integrating the first branch of the bound and, respectively, by taking the logarithm of the second branch. □
Remark 5.
Note that this bound, in case $\alpha \leqslant \beta $, gives $\psi (t)\leqslant \frac{{t^{2}}v}{2}+\frac{(\beta -\alpha ){t^{3}}v}{3(\alpha +\beta )(\alpha +\beta +2)}+O\left({t^{4}}\right)$, and comparing this with the actual expansion $\psi (t)=\frac{{t^{2}}\mathbf{Var}[X]}{2}+\frac{{t^{3}}\mathbf{E}[{(X-\mathbf{E}[X])^{3}}]}{6}+O({t^{4}})$ we find that the bound is sharp up to $O({t^{4}})$.
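This sharpness check can be reproduced symbolically. The sketch below assumes, as indicated in the proof of Claim 4, that the bound is the integral of the first branch of (33), with c as in Claim 3:

import sympy as sp

t, alpha, beta = sp.symbols('t alpha beta', positive=True)
v = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))            # Var[X]
c = (beta - alpha) / ((alpha + beta) * (alpha + beta + 2))               # the c of Claim 3

# Assumed form of Claim 4's bound for alpha <= beta: the integral of v*s/(1 - c*s) over [0, t]
psi_bound = (v / c ** 2) * (-c * t - sp.log(1 - c * t))

# Third central moment from the recurrence (7): mu_3 = 2*c*v
mu3 = 2 * c * v

# The bound and the true cumulant expansion agree up to O(t^4)
diff = sp.series(psi_bound, t, 0, 4).removeO() - (v * t ** 2 / 2 + mu3 * t ** 3 / 6)
print(sp.simplify(diff))   # 0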
Finally, in the next claim, we present the Chernoff inequality.
Proof.
With the help of a computer algebra package, we plug the upper bound on $\psi (t)$ from Claim 4 into the Cramér–Chernoff bound, and find that it is maximized at ${t_{\mathit{best}}}=\frac{\epsilon }{c\epsilon +v}$ when $0 < t < 1/c$. This yields the bound
(45)
\[ \begin{aligned}{}{\psi ^{\ast }}(\epsilon )& \triangleq \sup \{t\epsilon -\psi (t):0\leqslant t\}\\ {} & \geqslant \epsilon {t_{\mathit{best}}}-\psi ({t_{\mathit{best}}})\\ {} & =\frac{v}{{c^{2}}}\cdot \left(\frac{c\epsilon }{v}-\log \left(\frac{c\epsilon }{v}+1\right)\right).\end{aligned}\]
The computations are done in Sympy, as shown in Listing 5. Using (16) with $x=\frac{c\epsilon }{v}$ we finally obtain ${\psi ^{\ast }}(\epsilon )\geqslant \frac{{\epsilon ^{2}}}{2\left(v+\frac{2c\epsilon }{3}\right)}$, which finishes the proof. □
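A sketch of the optimization behind Listing 5 (under the same assumption on the form of Claim 4's bound; the actual listing may differ) is:

import sympy as sp

t, eps, v, c = sp.symbols('t epsilon v c', positive=True)

# Assumed form of Claim 4's bound (integral of v*s/(1 - c*s) over [0, t], valid for 0 <= t < 1/c)
psi_bound = (v / c ** 2) * (-c * t - sp.log(1 - c * t))
print(sp.simplify(sp.diff(psi_bound, t) - v * t / (1 - c * t)))   # 0: correct antiderivative

# Stationary point of the Cramer-Chernoff objective t*eps - psi_bound
t_best = sp.solve(sp.diff(eps * t - psi_bound, t), t)
print(t_best)                                                      # [epsilon/(c*epsilon + v)]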
4.5 Proof of Theorem 2
Fix $\epsilon \geqslant 0$. Denote $Z=X-\mathbf{E}[X]$, and let $\phi (t)=\mathbf{E}[\exp (tZ)]$, $\psi (t)=\log \phi (t)$ be, respectively, the moment and cumulant generating functions of Z. By Jensen’s inequality $\phi (t)\geqslant \exp (t\mathbf{E}[Z])=1$, and so $\psi (t)\geqslant 0$ and $t\epsilon -\psi (t)\leqslant 0$ for $t\leqslant 0$. Since $\psi (0)=0$ we conclude that
(47)
\[ {\psi ^{\ast }}(\epsilon )=\sup \left\{t\epsilon -\psi (t):t\geqslant 0\right\}=\sup \left\{t\epsilon -\psi (t):t\in \mathbb{R}\right\},\]
which shows that the Cramér–Chernoff exponent equals the global maximum of the function $t\mapsto t\epsilon -\psi (t)$ (the Legendre–Fenchel transform), achieved for $t\geqslant 0$.
It is known that the cumulant generating function is convex, and strictly convex for nonconstant random variables [3]. Thus, $\psi (t)$ is strictly convex. As a consequence, the function $t\mapsto t\epsilon -\psi (t)$ is strictly concave. Thus, (47) is maximized at the value t which is the unique solution to
(48)
\[ {\psi ^{\prime }}(t)=\epsilon .\]
The last equation defines $t=t(\epsilon )$ implicitly and does not seem to be solvable with elementary functions. Nevertheless, the Lagrange inversion theorem ensures that t is analytic as a function of ϵ and gives the initial terms of the series expansion. More precisely, if $y=f(x)$ where f is analytic at 0 and $f(0)=0$, ${f^{\prime }}(0)\ne 0$, then x expands into a power series in y around $y=0$, with the coefficient of the term ${y^{k}}$ equal to $\frac{1}{k}$ times the coefficient of ${x^{k-1}}$ in the series expansion of ${\left(x/f(x)\right)^{k}}$ when $k>0$, and 0 when $k=0$; for a reference, see, for example, [9]. Applying this to $y=\epsilon $, $x=t$ and $f={\psi ^{\prime }}$ we obtain
(49)
\[ t=\frac{\epsilon }{v}-\frac{c{\epsilon ^{2}}}{2{v^{2}}}+O({\epsilon ^{3}}),\]
where we do computations in Sympy as shown in Listing 6. Note that we use only up to three terms of the expansion of f, since we are interested in the first two terms for x.
Therefore, we obtain the following exponent in the Chernoff inequality:
(50)
\[ {\psi ^{\ast }}(\epsilon )=\epsilon \cdot t(\epsilon )-\psi (t(\epsilon ))=\frac{{\epsilon ^{2}}}{2v}-\frac{c{\epsilon ^{3}}}{6{v^{2}}}+O({\epsilon ^{4}}),\]
which finishes the proof of the formula in Theorem 2. This expansion matches the expansion of the exponent in Theorem 1 up to $O({\epsilon ^{4}})$, which proves the optimality of the numeric constants.
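A sketch of the expansion that Listing 6 might perform (the actual listing may differ); only the coefficients $\mathbf{Var}[X]=v$ and $\mathbf{E}[{(X-\mathbf{E}[X])^{3}}]=cv$ from Remark 2 are used:

import sympy as sp

eps, v, c, t = sp.symbols('epsilon v c t', positive=True)
a1, a2 = sp.symbols('a1 a2')

# psi(t) up to O(t^4), using Var[X] = v and E[(X - E[X])^3] = c*v (Remark 2)
psi = v * t ** 2 / 2 + c * v * t ** 3 / 6

# Ansatz t(eps) = a1*eps + a2*eps^2 and the stationarity condition psi'(t(eps)) = eps
t_eps = a1 * eps + a2 * eps ** 2
residual = sp.expand(sp.diff(psi, t).subs(t, t_eps) - eps)
sol = sp.solve([residual.coeff(eps, 1), residual.coeff(eps, 2)], [a1, a2], dict=True)[0]
print(sol)                                            # {a1: 1/v, a2: -c/(2*v**2)}

# Chernoff exponent t*eps - psi(t) along t = t(eps), expanded to order eps^3
exponent = sp.expand(eps * t_eps - psi.subs(t, t_eps)).subs(sol)
print(sp.series(exponent, eps, 0, 4))                 # eps**2/(2*v) - c*eps**3/(6*v**2) + O(eps**4)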