1 Introduction
1.1 Background and related work
The binomial distribution $\mathrm{Binom}(n,p)$, which counts the total number of successes within n independent trials each succeeding with probability p, is of historical and fundamental importance for probability theory and applied statistical inference. In particular, it appears in quantitative variants of the central limit theorem [16, 8, 25, 29], and is broadly used in statistical modelling [17, 1, 9, 7, 26] also as a building block of more advanced models [27]; this includes performing A/B tests on conversion rates popular in e-commerce [11].
Despite this large body of work on approximate inference, little is known about the exact higher moments of the binomial distribution. Besides being of natural interest, such formulas are needed to develop various concentration bounds; for example, binomial moment bounds were proven useful for analyzing random projections [23].
While textbooks usually cover only the variance (sometimes also the skewness and kurtosis), there have been only a few research papers discussing formulas for binomial moments of order $d\gt 4$. The first recursion formula for binomial moments appeared in [3] for the special case of $p=\frac{1}{2}$. The case of general p was handled in [13] by means of recursions utilizing Stirling numbers of the first kind. This was subsequently simplified by means of moment generating functions in [15], resulting in a more compact formula involving Stirling numbers of the second kind. Very recently, a recursion-free derivation of raw moments was presented in [19]. Overall, the common idea is to view the moments as (more or less explicit) polynomials in n and p and group terms, in order to simplify the formula or establish other desired properties such as nonnegativity of terms or boundedness of recursion depth (as in the recent result on Beta moments [24]).
The discussed approaches still do not offer a satisfactory answer, as the formulas are not handy enough to be directly applicable. The author of the most general formula in [15] did not manage to obtain non-naive bounds on the binomial moments: the bound $O((nq)^{d})$ with $q=1-p$ on the d-th central moment [15], valid for $p\lt \frac{1}{2}$, is trivial because the centered binomial random variable is bounded between $-pn$ and $qn$ – no extra formulas are needed; this bound is far from the true behaviour $O((nq)^{d/2})$ when $nq\to \infty $ (obtained by the Central Limit Theorem). The main formula in [15] is actually a mixture of positive and negative contributions, which makes its numerical convergence problematic and its theoretical analysis very difficult (as seen above). Moreover, none of the prior works fully exploits the symmetry, and they produce rather complicated formulas in terms of p; one could expect simpler expressions for the central moments with the variance $n{\sigma ^{2}}=np(1-p)$ as a variable. Lastly, the discussed prior works are rather scarce in their presentation of related works and techniques; in particular, they seem to have overlooked that the formulas featuring Stirling numbers follow more easily from the established approach of factorial moments [14, 2].
1.2 Summary of contributions
Addressing the aforementioned issues with approaches in prior works, this paper offers the following novel contributions to computing binomial moments:
• link to factorial moments, which simplifies the approach from prior works,
• variance-formula for equivalent yet simpler expressions in ${\sigma ^{2}}=p(1-p)$,
• algorithms with implementation to compute the variance-formula,
• stable formula as an explicit sum with positive terms,
• asymptotically sharp bounds on binomial moments as an application.
In summary, when compared to prior works, these results bring a broader scope of techniques and lead to an arguably handier formula; another added value is the contributed algorithm and its Python implementation.1
1.3 Organization
The remainder of the paper is organized as follows: the necessary background is given in Section 2, the results are presented in Section 3, the proofs appear in Section 4, the sharp asymptotics are discussed in Section 5, and Section 6 concludes the work. The Python implementation is presented in Appendix A.
2 Preliminaries
2.2 Moments
Let d be a positive integer. The raw moment of order d of a random variable S is defined as $\mathbb{E}[{S^{d}}]$, while the central moment of order d of S equals $\mathbb{E}[{(S-\mathbb{E}[S])^{d}}]$. We also use the factorial moment, defined as $\mathbb{E}[{S^{\underline{d}}}]$ where ${x^{\underline{d}}}=x(x-1)\cdots (x-(d-1))$ is called the d-th falling factorial [12].
2.3 Special numbers
To state the results, we need Stirling numbers of the second kind. The symbol $\left\{\genfrac{}{}{0.0pt}{}{n}{k}\right\}$ stands for the number of ways of partitioning an n-element set into k nonempty subsets. We also need multinomial coefficients, defined as $\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)=\frac{d!}{{d_{1}}!\cdots {d_{k}}!}$ when ${\textstyle\sum _{i=1}^{k}}{d_{i}}=d$ and ${\min _{i}}{d_{i}}\geqslant 0$, and 0 otherwise; they extend the binomial coefficients. By the multinomial theorem, we have ${({x_{1}}+\cdots +{x_{n}})^{d}}={\textstyle\sum _{{d_{1}},\dots ,{d_{n}}}}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{n}}}\right){x_{1}^{{d_{1}}}}\cdots {x_{n}^{{d_{n}}}}$, extending the binomial formula.
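Both quantities are straightforward to compute exactly. The following minimal sketch (in Python; the helper names are ours and not part of the paper's implementation) uses the standard recurrence $\left\{\genfrac{}{}{0.0pt}{}{n}{k}\right\}=k\left\{\genfrac{}{}{0.0pt}{}{n-1}{k}\right\}+\left\{\genfrac{}{}{0.0pt}{}{n-1}{k-1}\right\}$ for the Stirling numbers and the factorial definition of the multinomial coefficients, including the zero convention above.

```python
from functools import lru_cache
from math import factorial, prod

@lru_cache(maxsize=None)
def stirling2(n, k):
    # {n brace k}: partitions of an n-element set into k nonempty subsets,
    # via the recurrence {n,k} = k*{n-1,k} + {n-1,k-1}
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def multinomial(d, parts):
    # d! / (d_1! ... d_k!) when the parts are nonnegative and sum to d, else 0
    if any(di < 0 for di in parts) or sum(parts) != d:
        return 0
    return factorial(d) // prod(factorial(di) for di in parts)
```

For instance, `stirling2(4, 2)` returns 7 and `multinomial(4, (2, 2))` returns 6.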
2.4 Polynomials
To work out the desired polynomial formulas, we need some standard algebraic notation. By $\mathbb{Z}[{x_{1}},\dots ,{x_{k}}]$ we denote polynomials with integer coefficients in variables ${x_{1}},\dots ,{x_{k}}$. A polynomial is symmetric if it is invariant under interchanging variables, and antisymmetric when it gets negated under the exchange of any two variables. The fundamental theorem of symmetric polynomials states that any symmetric polynomial from $\mathbb{Z}[{x_{1}},\dots ,{x_{k}}]$ can be written as a polynomial, with integer coefficients, in the elementary symmetric functions ${s_{j}}({x_{1}},\dots ,{x_{k}})={\textstyle\sum _{1\leqslant {i_{1}}\lt \cdots \lt {i_{j}}\leqslant k}}{x_{{i_{1}}}}\cdots {x_{{i_{j}}}}$ for $j=1,\dots ,k$. Moreover, antisymmetric polynomials can be written as a product of a symmetric polynomial and Vandermonde’s determinant ${\textstyle\prod _{1\leqslant i\lt j\leqslant k}}({x_{i}}-{x_{j}})$ (see, for example, [22, 28]).
3 Results
Below we discuss the contributions in more detail, deferring the proofs to later sections. We denote $S\sim \mathrm{Binom}(n,p)$ and fix a positive integer d.
3.1 Raw binomial moments and factorial moments
The first result is the derivation of the closed-form formula for raw binomial moments,
\[ \mathbb{E}[{S^{d}}]={\sum \limits_{k=0}^{d}}\left\{\genfrac{}{}{0.0pt}{}{d}{k}\right\}{n^{\underline{k}}}{p^{k}}.\]
While this formula appears in prior works [3, 15, 13], the arguments developed here are simpler, as they avoid the machinery of generating functions. The two alternative proof techniques are, respectively: a) linking raw and factorial moments, and b) a direct counting argument.
The two proofs appear respectively in Section 4.1 and Section 4.2. Below, in Table 1, we list the explicit expressions for the first 10 moments.
Table 1.
Formulas for Raw Binomial Moments
d | $\mathbb{E}[{S^{d}}],\hspace{1em}S\sim \mathrm{Binom}(n,p)$ |
1 | $pn$ |
2 | ${p^{2}}{n^{\underline{2}}}+pn$ |
3 | ${p^{3}}{n^{\underline{3}}}+3{p^{2}}{n^{\underline{2}}}+pn$ |
4 | ${p^{4}}{n^{\underline{4}}}+6{p^{3}}{n^{\underline{3}}}+7{p^{2}}{n^{\underline{2}}}+pn$ |
5 | ${p^{5}}{n^{\underline{5}}}+10{p^{4}}{n^{\underline{4}}}+25{p^{3}}{n^{\underline{3}}}+15{p^{2}}{n^{\underline{2}}}+pn$ |
6 | ${p^{6}}{n^{\underline{6}}}+15{p^{5}}{n^{\underline{5}}}+65{p^{4}}{n^{\underline{4}}}+90{p^{3}}{n^{\underline{3}}}+31{p^{2}}{n^{\underline{2}}}+pn$ |
7 | ${p^{7}}{n^{\underline{7}}}+21{p^{6}}{n^{\underline{6}}}+140{p^{5}}{n^{\underline{5}}}+350{p^{4}}{n^{\underline{4}}}+301{p^{3}}{n^{\underline{3}}}+63{p^{2}}{n^{\underline{2}}}+pn$ |
8 | ${p^{8}}{n^{\underline{8}}}+28{p^{7}}{n^{\underline{7}}}+266{p^{6}}{n^{\underline{6}}}+1050{p^{5}}{n^{\underline{5}}}+1701{p^{4}}{n^{\underline{4}}}+966{p^{3}}{n^{\underline{3}}}+127{p^{2}}{n^{\underline{2}}}+pn$ |
9 | ${p^{9}}{n^{\underline{9}}}+36{p^{8}}{n^{\underline{8}}}+462{p^{7}}{n^{\underline{7}}}+2646{p^{6}}{n^{\underline{6}}}+6951{p^{5}}{n^{\underline{5}}}+7770{p^{4}}{n^{\underline{4}}}+3025{p^{3}}{n^{\underline{3}}}+255{p^{2}}{n^{\underline{2}}}+pn$ |
10 | ${p^{10}}{n^{\underline{10}}}+45{p^{9}}{n^{\underline{9}}}+750{p^{8}}{n^{\underline{8}}}+5880{p^{7}}{n^{\underline{7}}}+22827{p^{6}}{n^{\underline{6}}}+42525{p^{5}}{n^{\underline{5}}}+34105{p^{4}}{n^{\underline{4}}}+9330{p^{3}}{n^{\underline{3}}}+511{p^{2}}{n^{\underline{2}}}+pn$ |
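The rows of Table 1 can be verified numerically. The sketch below (helper names are ours; this is not the paper's Appendix A code) compares the Stirling-number formula $\mathbb{E}[{S^{d}}]={\textstyle\sum _{k}}\left\{\genfrac{}{}{0.0pt}{}{d}{k}\right\}{n^{\underline{k}}}{p^{k}}$ against the exact expectation computed directly from the probability mass function, using exact rational arithmetic.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def stirling2(n, k):
    # Stirling numbers of the second kind via the standard recurrence
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def falling(n, k):
    # falling factorial n(n-1)...(n-k+1)
    out = 1
    for i in range(k):
        out *= n - i
    return out

def raw_moment_formula(n, p, d):
    # E[S^d] = sum_k {d brace k} n^(falling k) p^k, as in Table 1
    return sum(stirling2(d, k) * falling(n, k) * p**k for k in range(d + 1))

def raw_moment_direct(n, p, d):
    # exact E[S^d] summed over the Binom(n, p) mass function
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) * s**d for s in range(n + 1))

p = Fraction(1, 3)
assert all(raw_moment_formula(10, p, d) == raw_moment_direct(10, p, d)
           for d in range(1, 11))
```

Exact `Fraction` arithmetic rules out floating-point discrepancies, so the check confirms the polynomial identities themselves.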
3.2 Central binomial moments
Symmetric structure. While in prior works the formulas are derived in terms of p, we go beyond that by exploiting the symmetry and showing that the formulas can be written in terms of ${\sigma ^{2}}=p(1-p)$ (the variance of a single trial), which makes them much simpler. The following theorem proves what can be conjectured by inspecting the known formulas for small-order moments.
Theorem 2 (Existence of Variance-Based Formula).
For $S\sim \mathrm{Binom}(n,p)$ the moment $\mathbb{E}[{(S-\mathbb{E}[S])^{d}}]$ is a symmetric polynomial in p and q when d is even, and antisymmetric when d is odd. In particular, denoting ${\sigma ^{2}}\triangleq pq$, the moment can be written as a polynomial in n and ${\sigma ^{2}}$ when d is even, and as $(q-p)$ times such a polynomial when d is odd.
Table 2 illustrates this result, providing explicit moments of order $d=2,\dots ,10$. The usefulness of the formula in Theorem 2 lies in its simplicity when compared to the representation in terms of p alone. The result is intuitive, but not straightforward to prove; two arguments are given, based on a) the theory of symmetric functions, see Section 4.3, and b) the novel combinatorial formula, see Section 4.4. The algorithm deriving the exact formulas is discussed later.
Table 2.
Central Moments of Binomial Distribution. As above we denote ${\sigma ^{2}}=p(1-p)$
d | $\mathbb{E}[{(S-\mathbb{E}[S])^{d}}],\hspace{1em}S\sim \mathrm{Binom}(n,p)$ |
2 | $n{\sigma ^{2}}$ |
3 | $n{\sigma ^{2}}\left(-2p+1\right)$ |
4 | $3{n^{2}}{\sigma ^{4}}+n\left(-6{\sigma ^{4}}+{\sigma ^{2}}\right)$ |
5 | $\left(-2p+1\right)\left(10{n^{2}}{\sigma ^{4}}+n\left(-12{\sigma ^{4}}+{\sigma ^{2}}\right)\right)$ |
6 | $15{n^{3}}{\sigma ^{6}}+{n^{2}}\left(-130{\sigma ^{6}}+25{\sigma ^{4}}\right)+n\left(120{\sigma ^{6}}-30{\sigma ^{4}}+{\sigma ^{2}}\right)$ |
7 | $\left(-2p+1\right)\left(105{n^{3}}{\sigma ^{6}}+{n^{2}}\left(-462{\sigma ^{6}}+56{\sigma ^{4}}\right)+n\left(360{\sigma ^{6}}-60{\sigma ^{4}}+{\sigma ^{2}}\right)\right)$ |
8 | $105{n^{4}}{\sigma ^{8}}+{n^{3}}\left(-2380{\sigma ^{8}}+490{\sigma ^{6}}\right)+{n^{2}}\left(7308{\sigma ^{8}}-2156{\sigma ^{6}}+119{\sigma ^{4}}\right)+n\left(-5040{\sigma ^{8}}+1680{\sigma ^{6}}-126{\sigma ^{4}}+{\sigma ^{2}}\right)$ |
9 | $\left(-2p+1\right)\left(1260{n^{4}}{\sigma ^{8}}+{n^{3}}\left(-13216{\sigma ^{8}}+1918{\sigma ^{6}}\right)+{n^{2}}\left(32112{\sigma ^{8}}-6948{\sigma ^{6}}+246{\sigma ^{4}}\right)+n\left(-20160{\sigma ^{8}}+5040{\sigma ^{6}}-252{\sigma ^{4}}+{\sigma ^{2}}\right)\right)$ |
10 | $945{n^{5}}{\sigma ^{10}}+{n^{4}}\left(-44100{\sigma ^{10}}+9450{\sigma ^{8}}\right)+{n^{3}}\left(303660{\sigma ^{10}}-99120{\sigma ^{8}}+6825{\sigma ^{6}}\right)+{n^{2}}\left(-623376{\sigma ^{10}}+240840{\sigma ^{8}}-24438{\sigma ^{6}}+501{\sigma ^{4}}\right)+n\left(362880{\sigma ^{10}}-151200{\sigma ^{8}}+17640{\sigma ^{6}}-510{\sigma ^{4}}+{\sigma ^{2}}\right)$ |
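The variance-based rows of Table 2 can likewise be checked against a direct computation. Below is a minimal sketch (our own helper, not the paper's implementation) verifying the $d=4$ and $d=5$ rows with exact rational arithmetic.

```python
from fractions import Fraction
from math import comb

def central_moment(n, p, d):
    # exact E[(S - np)^d] over the Binom(n, p) mass function
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) * (s - n*p)**d
               for s in range(n + 1))

n, p = 12, Fraction(1, 4)
s2 = p * (1 - p)  # sigma^2
# Table 2, d = 4: 3 n^2 sigma^4 + n(sigma^2 - 6 sigma^4)
assert central_moment(n, p, 4) == 3*n**2*s2**2 + n*(s2 - 6*s2**2)
# Table 2, d = 5: (1 - 2p)(10 n^2 sigma^4 + n(sigma^2 - 12 sigma^4))
assert central_moment(n, p, 5) == (1 - 2*p) * (10*n**2*s2**2 + n*(s2 - 12*s2**2))
```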
Corollary 1 (Skewness and Kurtosis of Binomial Distribution).
The skewness and excess kurtosis equal, respectively, $\frac{q-p}{\sqrt{npq}}$ and $\frac{1-6pq}{npq}$.
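The corollary can be confirmed numerically from the central moments. The sketch below (assumed helper names, exact arithmetic) compares the squared skewness, which avoids the square root, and the excess kurtosis against the stated closed forms.

```python
from fractions import Fraction
from math import comb

def central_moment(n, p, d):
    # exact E[(S - np)^d] over the Binom(n, p) mass function
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) * (s - n*p)**d
               for s in range(n + 1))

n, p = 9, Fraction(2, 7)
q = 1 - p
m2, m3, m4 = (central_moment(n, p, d) for d in (2, 3, 4))
# skewness = (q - p) / sqrt(npq); compare squares to stay in exact arithmetic
assert m3**2 / m2**3 == (q - p)**2 / (n * p * q)
# excess kurtosis = (1 - 6pq) / (npq)
assert m4 / m2**2 - 3 == (1 - 6*p*q) / (n * p * q)
```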
Positive polynomial representation. As mentioned in the introduction, the closed-form formula due to [15] is an alternating sum with no readable leading term, which makes it hard to use; in particular, the discussion in [15] fails to give nontrivial bounds on binomial moments. The novelty of this work is a formula consisting of positive terms, which makes it more stable for numerical computations and handier for theoretical analysis.
Theorem 3 (Explicit Stable Formulas).
For $S\sim \mathrm{Binom}(n,p)$, $q\triangleq 1-p$ and any positive integer d the following holds:
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]={\sum \limits_{k=1}^{\lfloor \frac{d}{2}\rfloor }}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){(pq)^{k}}\sum \limits_{{d_{1}}\dots {d_{k}}\geqslant 2}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right){\prod \limits_{i=1}^{k}}({q^{{d_{i}}-1}}-{(-p)^{{d_{i}}-1}}).\]
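The formula of Theorem 3 can be implemented verbatim by enumerating the compositions $({d_{1}},\dots ,{d_{k}})$ with ${d_{i}}\geqslant 2$. The sketch below (function names are ours) checks it against the direct computation over the mass function.

```python
from fractions import Fraction
from math import comb, factorial, prod

def compositions(d, k, lo=2):
    # ordered tuples (d_1, ..., d_k) with d_i >= lo and d_1 + ... + d_k = d
    if k == 0:
        if d == 0:
            yield ()
        return
    for first in range(lo, d - lo * (k - 1) + 1):
        for rest in compositions(d - first, k - 1, lo):
            yield (first,) + rest

def stable_central_moment(n, p, d):
    # the positive-term formula of Theorem 3
    q = 1 - p
    total = Fraction(0)
    for k in range(1, d // 2 + 1):
        inner = sum(factorial(d) // prod(factorial(di) for di in parts)
                    * prod(q**(di - 1) - (-p)**(di - 1) for di in parts)
                    for parts in compositions(d, k))
        total += comb(n, k) * (p * q)**k * inner
    return total

def central_moment(n, p, d):
    # exact E[(S - np)^d] over the Binom(n, p) mass function
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) * (s - n*p)**d
               for s in range(n + 1))

p = Fraction(1, 5)
assert all(stable_central_moment(11, p, d) == central_moment(11, p, d)
           for d in range(2, 9))
```

Note that for $p\leqslant \frac{1}{2}$ every summand here is nonnegative, in line with Remark 1, so no cancellation occurs.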
Remark 1 (Nonnegativity).
The terms under the sum are all nonnegative when $p\leqslant \frac{1}{2}$. Since $n-S\sim \mathrm{Binom}(n,q)$, we have $S-\mathbb{E}[S]\sim -(\mathrm{Binom}(n,q)-qn)$; with the help of this identity, working with central binomial moments can always be reduced to the case $p\leqslant \frac{1}{2}$.
Remark 2 (Closed-form Symmetric Formula).
The above result gives an explicit formula for Theorem 2, and provides an alternative proof of that result.
3.3 Asymptotically sharp moment bounds
To illustrate how useful the positive representation established in Theorem 3 is, we derive sharp bounds on the (normalized) central binomial moments. This problem has remained open so far; some recent works used ad hoc upper bounds to estimate the binomial moments and tails (works on random projections, particularly [23]).
Theorem 4.
Let $S\sim \mathrm{Binom}(n,p)$ and ${\sigma ^{2}}=p(1-p)$. Then for any integer $d\gt 1$ we have
\[ {\left|\mathbb{E}[{(S-\mathbb{E}[S])^{d}}]\right|^{1/d}}=C(n,p,d)\cdot \max \left\{{k^{1-\frac{k}{d}}}\cdot {(n{\sigma ^{2}})^{\frac{k}{d}}}:k=1,\dots ,\lfloor d/2\rfloor \right\},\]
where $C(n,p,d)$ is uniformly bounded by ${(3\mathrm{e})^{-1}}\leqslant C(n,p,d)\leqslant {(5/2)^{1/5}}{\mathrm{e}^{1/2}}$.
The estimate is uniformly sharp in all parameters; in the special case when $n{\sigma ^{2}}\to \infty $ and d is fixed, the d-th central moment grows, up to a constant, as ${(n{\sigma ^{2}})^{d/2}}$, which matches the central limit theorem combined with the explicit formulas for moments of the normal distribution [20]. In contrast to Theorem 4, the formula in [15] gives in this setup only the much worse bound $O((nq)^{d})$, which anyway follows trivially since S is bounded by n. The proof uses Theorem 3.
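Theorem 4 can be probed numerically for a few parameter choices: the ratio between the normalized moment and the max-term should stay between the stated constants ${(3\mathrm{e})^{-1}}$ and ${(5/2)^{1/5}}{\mathrm{e}^{1/2}}$. A minimal sketch (our own helper; even d only, so the moments are nonzero):

```python
from fractions import Fraction
from math import comb, e

def central_moment(n, p, d):
    # exact E[(S - np)^d] over the Binom(n, p) mass function
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) * (s - n*p)**d
               for s in range(n + 1))

lo, hi = 1 / (3 * e), (5 / 2)**(1 / 5) * e**0.5
ratios = []
for n in (10, 50):
    for p in (Fraction(1, 10), Fraction(1, 2)):
        s2 = float(p * (1 - p))
        for d in (2, 4, 6, 8):
            lhs = abs(central_moment(n, p, d)) ** (1 / d)
            rhs = max(k**(1 - k / d) * (n * s2)**(k / d)
                      for k in range(1, d // 2 + 1))
            ratios.append(lhs / rhs)
assert all(lo <= r <= hi for r in ratios)
```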
3.4 Algorithms and implementation
We have seen that the variance-based representation in Theorem 2 is useful, yet it is not immediate how to efficiently compute this polynomial. To this end, we develop two different algorithms, both implemented in the popular Python symbolic algebra package Sympy [18].
Algorithm 1 takes advantage of the fundamental theorem on symmetric polynomials (see, for example, [10]). Specifically, there is an explicit procedure for converting any symmetric polynomial in $p,q$ into a polynomial in variables $p+q,pq$ (the basic symmetric polynomials in two variables); substituting $p+q=1$ we are left with a polynomial in ${\sigma ^{2}}=pq$. For even d we start with a symmetric polynomial and obtain a polynomial in ${\sigma ^{2}}$. In turn, for odd d we start with an antisymmetric polynomial and apply this procedure to its symmetric part, which results in a polynomial in ${\sigma ^{2}}$ plus the factor $q-p=1-2p$.
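One possible realization of this conversion step uses SymPy's `symmetrize` helper; the sketch below is an illustration on the $d=4$ instance of the Theorem 3 sum and may differ from the paper's Appendix A implementation.

```python
from sympy import expand, symbols
from sympy.polys.polyfuncs import symmetrize

p, q, n, s = symbols('p q n sigma2')

# The d = 4 instance of the sum in Theorem 3, kept as a polynomial in p, q:
# k = 1 gives n*pq*(q^3 + p^3); k = 2 gives binom(n,2)*(pq)^2 * 6*(q + p)^2.
expr = n*p*q*(p**3 + q**3) + 3*n*(n - 1)*(p*q)**2*(p + q)**2

# Express the symmetric polynomial through s1 = p + q and s2 = p*q,
# then substitute s1 = 1 and s2 = sigma^2.
sym, remainder, basis = symmetrize(expand(expr), p, q, formal=True)
assert remainder == 0  # expr is fully symmetric in p, q
s1, s2 = (b[0] for b in basis)
result = expand(sym.subs({s1: 1, s2: s}))
assert expand(result - (3*n**2*s**2 + n*(s - 6*s**2))) == 0  # Table 2, d = 4
```

For odd d one would first factor out $q-p$ and apply the same procedure to the remaining symmetric part, as described above.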
In turn, Algorithm 2 uses the power of elimination theory, to recover the representation in ${\sigma ^{2}}$ from any formula written in terms of p. Essentially, it simplifies the input polynomial in p with respect to the polynomial ${\sigma ^{2}}-p(1-p)$ leveraging the elimination properties of Groebner bases (see, for example, [5]). The output is a polynomial in ${\sigma ^{2}}$ (plus the factor $1-2p$ for odd d).
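Since the ideal here is generated by the single polynomial ${\sigma ^{2}}-p(1-p)$, which is its own Groebner basis, the elimination step can be imitated by one polynomial division in p. The sketch below (our illustration, not the paper's code) recovers the variance-based $d=4$ formula from the p-based one.

```python
from sympy import expand, rem, symbols

p, n, s = symbols('p n sigma2')

# d = 4 central moment in terms of p alone: n p (1-p) (1 + 3 (n-2) p (1-p))
f = expand(n*p*(1 - p) * (1 + 3*(n - 2)*p*(1 - p)))

# Dividing by p^2 - p + sigma2 rewrites p^2 as p - sigma2; for even d the
# remainder is free of p, leaving the polynomial in sigma2 alone.
r = expand(rem(f, p**2 - p + s, p))
assert not r.has(p)
assert expand(r - (3*n**2*s**2 + n*(s - 6*s**2))) == 0  # Table 2, d = 4
```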
4 Proofs
4.1 First proof of Theorem 1
The proof is based on the fact that the factorial moments of the binomial distribution are easy to compute. Namely (see [21]), for $S\sim \mathrm{Binom}(n,p)$ we have $\mathbb{E}[{S^{\underline{d}}}]={n^{\underline{d}}}{p^{d}}$.
It then remains to connect factorial moments to raw moments or, in other words, to express ordinary powers through falling factorials. It is well known (see, for example, the discussion in [4]) that this change of basis is given by the Stirling numbers of the second kind, namely ${x^{d}}={\textstyle\sum _{k=0}^{d}}\left\{\genfrac{}{}{0.0pt}{}{d}{k}\right\}{x^{\underline{k}}}$.
4.2 Second proof of Theorem 1
Here we take a direct approach, writing $S={\textstyle\sum _{i=1}^{n}}{X_{i}}$ where ${X_{i}}\stackrel{\mathrm{iid}}{\sim }\mathrm{Bern}(p)$. Using the multinomial expansion and the independence of the ${X_{i}}$ we obtain
\[ \mathbb{E}[{S^{d}}]=\sum \limits_{{d_{1}},\dots ,{d_{n}}}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{n}}}\right)\prod \limits_{i}\mathbb{E}[{X_{i}^{{d_{i}}}}].\]
We now group the expressions in the above sum, depending on the number of nonzero elements in ${({d_{i}})_{i}}$. Denoting $\| ({d_{i}}){\| _{0}}=\mathrm{\# }\{i:{d_{i}}\gt 0\}$ and using the fact that $\mathbb{E}{X_{i}^{{d_{i}}}}=p$ when ${d_{i}}\gt 0$, we obtain
\[ \mathbb{E}[{S^{d}}]={\sum \limits_{k=0}^{d}}\sum \limits_{{({d_{i}})_{i}}:\| {({d_{i}})_{i}}{\| _{0}}=k}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{n}}}\right){p^{k}}.\]
By the symmetry of multinomial coefficients this equals
\[ \mathbb{E}[{S^{d}}]={\sum \limits_{k=0}^{d}}\sum \limits_{{d_{1}}\dots {d_{k}}\gt 0}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p^{k}}.\]
Finally, we observe that the expression ${\textstyle\sum _{{d_{1}}\dots {d_{k}}\gt 0}}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)$ counts the number of ways of partitioning $\{1,\dots ,d\}$ into k nonempty labeled subsets; thus, this number equals $k!\cdot \left\{\genfrac{}{}{0.0pt}{}{d}{k}\right\}$, which finishes the proof.
4.3 First proof of Theorem 2
From Equation (1) we obtain
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]=\sum \limits_{k}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p^{k}}{q^{n-k}}{(k-np)^{d}}.\]
Replacing k by $n-k$ and using the symmetry of binomial coefficients we obtain
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]=\sum \limits_{k}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p^{n-k}}{q^{k}}{(nq-k)^{d}}.\]
When d is even, comparing these two equivalent expressions we see that they are symmetric as polynomials in p and q. By the fundamental theorem of symmetric polynomials, the moment can be written as a polynomial in $pq$ and $p+q$; in our case $p+q=1$ and the claim follows. If d is odd then ${(nq-k)^{d}}=-{(k-nq)^{d}}$ and we get antisymmetric polynomials in $p,q$, which can be written as a product of $p-q$ and a symmetric polynomial. The latter, by the fundamental theorem, is a polynomial in $p+q$ and $pq$; since $p+q=1$ the result follows.
4.4 Second proof of Theorem 2
By inspecting the products ${\textstyle\prod _{i=1}^{k}}({q^{{d_{i}}-1}}-{(-p)^{{d_{i}}-1}})$ that appear in Theorem 3, it can be seen that each of them is symmetric in $p,q$ when ${\textstyle\sum _{i}}{d_{i}}=d$ is even, and antisymmetric when d is odd. Indeed, the factor $(p,q)\mapsto {q^{{d_{i}}-1}}-{(-p)^{{d_{i}}-1}}$ is symmetric when ${d_{i}}$ is even and antisymmetric when ${d_{i}}$ is odd, and the number of antisymmetric factors has the same parity as $d={\textstyle\sum _{i}}{d_{i}}$. The claim now follows as in the first proof.
4.5 Proof of Theorem 3
As in the proof of Theorem 1 we arrive at
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]={\sum \limits_{k=0}^{d}}\sum \limits_{{d_{1}}\dots {d_{k}}\gt 0}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right){\prod \limits_{i=1}^{k}}\mathbb{E}[{({X_{i}}-\mathbb{E}[{X_{i}}])^{{d_{i}}}}].\]
Denote $x=1-\frac{1}{1-p}$; then $\mathbb{E}[{({X_{i}}-\mathbb{E}[{X_{i}}])^{{d_{i}}}}]=p{(1-p)^{{d_{i}}}}(1-{x^{{d_{i}}-1}})$ and thus
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]={(1-p)^{d}}{\sum \limits_{k=0}^{d}}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p^{k}}\sum \limits_{{d_{1}}\dots {d_{k}}\gt 0}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right){\prod \limits_{i=1}^{k}}(1-{x^{{d_{i}}-1}}).\]
With some further simplifications and grouping (the terms with some ${d_{i}}=1$ vanish because $1-{x^{0}}=0$, which forces ${d_{i}}\geqslant 2$ and hence $k\leqslant \lfloor \frac{d}{2}\rfloor $) we can write
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]={(1-p)^{d}}{\sum \limits_{k=1}^{\lfloor \frac{d}{2}\rfloor }}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){p^{k}}\underset{{U_{k}}}{\underbrace{\sum \limits_{{d_{1}}\dots {d_{k}}\geqslant 2}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right){\prod \limits_{i=1}^{k}}(1-{x^{{d_{i}}-1}})}},\]
or equivalently
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]={\sum \limits_{k=1}^{\lfloor \frac{d}{2}\rfloor }}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){(pq)^{k}}\sum \limits_{{d_{1}}\dots {d_{k}}\gt 1}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right){\prod \limits_{i=1}^{k}}({q^{{d_{i}}-1}}-{(-p)^{{d_{i}}-1}}).\]
This finishes the proof. In addition to that, in what follows, we discuss how to further group terms and speed up computations. We can write
\[ {U_{k}}={\sum \limits_{j=0}^{k}}\left(\genfrac{}{}{0.0pt}{}{k}{j}\right)\sum \limits_{{d_{1}}\dots {d_{j}}\gt 1}\sum \limits_{{d_{j+1}}\dots {d_{k}}\gt 1}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right){(-1)^{j}}{x^{{d_{1}}+\cdots +{d_{j}}-j}},\]
and since $\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)=\left(\genfrac{}{}{0.0pt}{}{{d_{1}}+\cdots +{d_{j}}}{{d_{1}}\dots {d_{j}}}\right)\cdot \left(\genfrac{}{}{0.0pt}{}{d-({d_{1}}+\cdots +{d_{j}})}{{d_{j+1}}\dots {d_{k}}}\right)\cdot \left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}+\cdots +{d_{j}}}\right)$ we obtain
\[ {U_{k}}={\sum \limits_{j=0}^{k}}\left(\genfrac{}{}{0.0pt}{}{k}{j}\right)\sum \limits_{\ell }\left(\genfrac{}{}{0.0pt}{}{d}{\ell }\right)\sum \limits_{{d_{1}}\dots {d_{k}}\gt 1}\left(\genfrac{}{}{0.0pt}{}{\ell }{{d_{1}}\dots {d_{j}}}\right)\left(\genfrac{}{}{0.0pt}{}{d-\ell }{{d_{j+1}}\dots {d_{k}}}\right){(-1)^{j}}{x^{\ell -j}},\]
and thus
\[\begin{aligned}{}{U_{k}}& ={\sum \limits_{j=0}^{k}}{\sum \limits_{\ell =0}^{d}}\left(\genfrac{}{}{0.0pt}{}{k}{j}\right)\left(\genfrac{}{}{0.0pt}{}{d}{\ell }\right)j!(k-j)!{S_{2}}(\ell ,j){S_{2}}(d-\ell ,k-j){(-1)^{j}}{x^{\ell -j}}\\ {} & =k!{\sum \limits_{\ell =0}^{d}}\left(\genfrac{}{}{0.0pt}{}{d}{\ell }\right){\sum \limits_{j=0}^{k}}{S_{2}}(\ell ,j){S_{2}}(d-\ell ,k-j){(-1)^{j}}{x^{\ell -j}},\end{aligned}\]
where ${S_{2}}(n,k)$ denotes the number of ways of partitioning an n-element set into k subsets of cardinality at least 2 (a variation on Stirling numbers of the second kind). This can be used to develop an equivalent, but faster to compute, formula.
5 Applications
5.1 Proof of Theorem 4
Throughout the proof, we will use the elementary estimates ${(m/\mathrm{e})^{m}}\leqslant m!\leqslant {m^{m}}$ (the second inequality is obvious; the first follows by rearranging it to ${\mathrm{e}^{m}}\geqslant \frac{{m^{m}}}{m!}$ and noting that $\frac{{m^{m}}}{m!}$ is one term of the Taylor expansion of ${\mathrm{e}^{m}}$) and ${(n/k)^{k}}\leqslant \left(\genfrac{}{}{0.0pt}{}{n}{k}\right)\leqslant {(n\mathrm{e}/k)^{k}}$ (see, for example, [6]).
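These elementary estimates can be sanity-checked numerically over a small range of parameters, for example:

```python
from math import comb, e, factorial

# (m/e)^m <= m! <= m^m
for m in range(1, 20):
    assert (m / e)**m <= factorial(m) <= m**m

# (n/k)^k <= binom(n, k) <= (n e / k)^k
for n in (5, 20, 100):
    for k in range(1, n + 1):
        assert (n / k)**k <= comb(n, k) <= (n * e / k)**k
```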
By applying Remark 1 we can assume $p\leqslant \frac{1}{2}$. Then $q\geqslant p$, and thus $0\leqslant {q^{{d_{i}}-1}}-{(-p)^{{d_{i}}-1}}\leqslant p+q=1$ for any ${d_{i}}\geqslant 2$. In view of Theorem 3, we obtain the bound
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]\leqslant {\sum \limits_{k=1}^{\lfloor \frac{d}{2}\rfloor }}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){(pq)^{k}}\sum \limits_{{d_{1}}\dots {d_{k}}\geqslant 2}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right).\]
Since we have
\[ \sum \limits_{{d_{1}}\dots {d_{k}}\geqslant 2}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)\leqslant \sum \limits_{{d_{1}}\dots {d_{k}}\geqslant 0}\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)={k^{d}},\]
we further obtain
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]\leqslant {\sum \limits_{k=1}^{\lfloor \frac{d}{2}\rfloor }}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){(pq)^{k}}{k^{d}}.\]
Denoting ${\sigma ^{2}}=pq$ and using the bound $\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)\leqslant {(n\mathrm{e}/k)^{k}}$, we finally obtain
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]\leqslant \lfloor d/2\rfloor \cdot \max \{{\mathrm{e}^{k}}{k^{d-k}}\cdot {(n{\sigma ^{2}})^{k}}:1\leqslant k\leqslant \lfloor d/2\rfloor \},\]
so that, noticing that ${\lfloor d/2\rfloor ^{1/d}}\leqslant {(d/2)^{1/d}}\leqslant {(5/2)^{1/5}}$ for $d\geqslant 2$ and ${\mathrm{e}^{k/d}}\leqslant {\mathrm{e}^{1/2}}$,
\[ {\left|\mathbb{E}[{(S-\mathbb{E}[S])^{d}}]\right|^{1/d}}\leqslant {(5/2)^{1/5}}{\mathrm{e}^{1/2}}\cdot \max \left\{{k^{1-\frac{k}{d}}}\cdot {(n{\sigma ^{2}})^{\frac{k}{d}}}:k=1,\dots ,\lfloor d/2\rfloor \right\}.\]
We now move on to the lower bound. The idea is to fix k and look at terms such that ${d_{1}},\dots ,{d_{k}}$ are nearly equal, to make $\left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)$ possibly large.
Let us write $d=k\cdot r+\ell $ with an integer $r\geqslant 2$ and an integer ℓ such that $0\leqslant \ell \lt k$; this is possible because $k\leqslant \lfloor d/2\rfloor $. Define ${d_{i}}=r+1$ when $1\leqslant i\leqslant \ell $ and ${d_{i}}=r$ when $\ell \lt i\leqslant k$, so that ${\textstyle\sum _{i}}{d_{i}}=d$. For this tuple $({d_{i}})$ we have
\[ \left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)=\frac{d!}{{\textstyle\prod _{i=1}^{k}}{d_{i}}!}\geqslant \frac{d!}{{(r+1)^{{\textstyle\sum _{i}}{d_{i}}}}}=\frac{d!}{{(r+1)^{d}}},\]
where we used ${d_{i}}!\leqslant {d_{i}^{{d_{i}}}}\leqslant {(r+1)^{{d_{i}}}}$.
Since $d!\geqslant {(d/\mathrm{e})^{d}}$, $r+1\leqslant \frac{3}{2}r$, and $d/r\geqslant k$, it follows that
\[ \left(\genfrac{}{}{0.0pt}{}{d}{{d_{1}}\dots {d_{k}}}\right)\geqslant {\left(\frac{2d}{3\mathrm{e}r}\right)^{d}}\geqslant {\left(\frac{2k}{3\mathrm{e}}\right)^{d}}.\]
When the ${d_{i}}$ are even, we have ${q^{{d_{i}}-1}}-{(-p)^{{d_{i}}-1}}={q^{{d_{i}}-1}}+{p^{{d_{i}}-1}}\geqslant 2\cdot \frac{1}{{2^{{d_{i}}-1}}}$ by Jensen’s inequality applied to the function $u\mapsto {u^{{d_{i}}-1}}$ and $p+q=1$. Since in the summation we consider ${({d_{i}})_{i}}$ such that ${\textstyle\sum _{i}}{d_{i}}=d$ and ${d_{i}}\geqslant 2$, we obtain
\[ {\prod \limits_{i=1}^{k}}({q^{{d_{i}}-1}}-{(-p)^{{d_{i}}-1}})\geqslant {\prod \limits_{i=1}^{k}}{2^{2-{d_{i}}}}={2^{2k-d}}.\]
The above two bounds, in view of Theorem 3, imply that
\[ \mathbb{E}[{(S-\mathbb{E}[S])^{d}}]\geqslant {(3\mathrm{e})^{-d}}{\sum \limits_{k=1}^{\lfloor \frac{d}{2}\rfloor }}\left(\genfrac{}{}{0.0pt}{}{n}{k}\right){(pq)^{k}}{k^{d}}.\]
Denoting ${\sigma ^{2}}=pq$ and using the bound $\left(\genfrac{}{}{0.0pt}{}{n}{k}\right)\geqslant {(n/k)^{k}}$, we finally obtain
\[ {\left|\mathbb{E}[{(S-\mathbb{E}[S])^{d}}]\right|^{1/d}}\geqslant {(3\mathrm{e})^{-1}}\cdot \max \left\{{k^{1-\frac{k}{d}}}\cdot {(n{\sigma ^{2}})^{\frac{k}{d}}}:k=1,\dots ,\lfloor d/2\rfloor \right\},\]
which finishes the proof.
6 Conclusion
This paper introduces novel and simpler formulas for binomial moments, derived by a combinatorial argument coupled with algebraic simplification relying on symmetrization. An important application gives sharp asymptotics for the growth of central binomial moments. Moreover, explicit algorithms and a working implementation are provided.