1 Introduction
Let $(\Omega ,\mathfrak{F},\mathbf{P})$ be a probability space supporting all distributions considered below. For any $N\ge 1$ introduce the family of discrete distributions $p=({p_{1}},{p_{2}},\dots ,{p_{N}})$ with probabilities ${p_{k}}\ge 0$, $1\le k\le N$, ${\textstyle\sum _{k=1}^{N}}{p_{k}}=1$.
In the present paper we investigate some properties of the Rényi entropy, which was proposed by Rényi in [14],
\[ {\mathcal{H}_{\alpha }}(p)=\frac{1}{1-\alpha }\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right),\hspace{0.2778em}\alpha >0,\hspace{0.2778em}\alpha \ne 1,\]
including its limit value as $\alpha \to 1$, i.e., the Shannon entropy
\[ \mathcal{H}(p)=-{\sum \limits_{k=1}^{N}}{p_{k}}\log {p_{k}}.\]
Due to this continuity, it is possible to put ${\mathcal{H}_{1}}(p)=\mathcal{H}(p)$. We consider the Rényi entropy as a functional of various parameters. The first approach is to fix the distribution and consider ${\mathcal{H}_{\alpha }}(p)$ as a function of $\alpha >0$. Some of the properties of ${\mathcal{H}_{\alpha }}(p)$ as a function of $\alpha >0$ are well known. In particular, it is known that ${\mathcal{H}_{\alpha }}(p)$ is continuous and nonincreasing in $\alpha \in (0,\infty )$, ${\lim \nolimits_{\alpha \to 0+}}{\mathcal{H}_{\alpha }}(p)=\log m$, where m is the number of nonzero probabilities, and ${\lim \nolimits_{\alpha \to +\infty }}{\mathcal{H}_{\alpha }}(p)=-\log {\max _{k}}{p_{k}}$. However, for the reader’s convenience, we provide short proofs of these and some other simple statements in the Appendix. One can see that these properties of the entropy itself and of its first derivative are common to all finite distributions. Also, it is known that the Rényi entropy is Schur concave as a function of a discrete distribution, that is,
\[ ({p_{i}}-{p_{j}})\left(\frac{\partial {\mathcal{H}_{\alpha }}(p)}{\partial {p_{i}}}-\frac{\partial {\mathcal{H}_{\alpha }}(p)}{\partial {p_{j}}}\right)\le 0,\hspace{0.2778em}i\ne j.\]
Some additional results, such as lower bounds on the difference of Rényi entropies for distributions defined on countable alphabets, can be found in [12]. Those results usually use the Rényi divergence of order α of a distribution P from a distribution Q
\[ {D_{\alpha }}\left(P||Q\right)=\frac{1}{\alpha -1}\log \left({\sum \limits_{i=1}^{N}}\frac{{p_{i}^{\alpha }}}{{q_{i}^{\alpha -1}}}\right),\]
which is very similar to the Kullback–Leibler divergence. Some of the most important properties of the Rényi divergence were reviewed and extended in [7]. Rényi divergences for the most commonly used univariate continuous distributions can be found in [8]. The Rényi entropy and divergence are widely used in majorization theory [6, 15], statistics [4, 13], information theory [12, 7, 1] and many other fields. Boundedness of the Rényi entropy of discrete log-concave distributions, in terms of the variance, was established in [5]. Other operational definitions of the Rényi entropy, which are used in practice, are given in [11]. The Rényi entropy is also used in the analysis of financial time series. As stated in [16], it can deal effectively with heavy-tailed distributions and reflect short-range characteristics of financial time series. So, in some sense, entropy can be connected to the long- and short-range dependence of the selected stochastic processes and allows one to compare the memory inherent in various processes. Other methods of comparing the memory of processes are proposed, e.g., in [3] and [2]. Moreover, the Rényi entropy is used in physics. For it to be physically meaningful as a thermostatistical quantity, it should not change drastically if the probability distribution is slightly changed; it is important that experimental uncertainty in determining the distribution function does not cause the entropy to diverge. In [9] it is shown that the Rényi entropy is uniformly continuous for probabilities on finite sets. In our paper we go further and find the rate of the respective convergence.
In the present paper we restrict ourselves to the standard Rényi entropy and go a step beyond the comparison of standard properties, namely, we investigate the convexity of the Rényi entropy with the help of the second derivative. It turns out that from this point of view the situation is much more interesting and uncertain than the behavior of the first derivative, and it crucially depends on the distribution. One might say that all the standard guesses are wrong. Of course, the second derivative is continuous (this simply means that it is continuous at 1, because at all other points the continuity is obvious), but then the surprises begin. If the second derivative starts with a positive value at zero, it can either remain positive or have inflection points, depending on the distribution. If it starts from a negative value, its first inflection point can be located either before 1 or after 1, again depending on the distribution (the point 1 is a crucial point for the entropy, so we compare the location of the inflection points with it). The value of the second derivative at zero is bounded from below but unbounded from above. Due to the complexity of some expressions, which defied analytical treatment, we provide several illustrations performed by numerical methods. We also investigate the robustness of the Rényi entropy w.r.t. the distribution, and it turns out that the rate of the respective convergence depends on the initial distribution, too. Further, we establish convergence of the disturbed entropy when the initial distribution is uniform but the number of events increases to ∞, and prove that the limit of the Rényi entropy of the binomial distribution equals the Rényi entropy of the Poisson distribution. It was previously proved in [10] that the Shannon entropy of the binomial distribution increases to the entropy of the Poisson distribution.
Our proof of this particular fact is simpler because it uses only Lebesgue’s dominated convergence theorem. The paper is organized as follows. Section 2 is devoted to the convexity properties of the Rényi entropy, Section 3 describes the robustness of the Rényi entropy, and Appendix A contains some auxiliary results.
2 Convexity of the Rényi entropy
To start, we consider the general properties of the 2nd derivative of the Rényi entropy.
2.1 The form and the continuity of the 2nd derivative
Let us denote ${S_{i}}(\alpha )={\textstyle\sum _{k=1}^{N}}{p_{k}^{\alpha }}{\log ^{i}}{p_{k}}$, $i=0,1,2,3$. Denote also $f(\alpha )=\log \big({\textstyle\sum _{k=1}^{N}}{p_{k}^{\alpha }}\big)$. Obviously, the function $f\in {C^{\infty }}({\mathbb{R}^{+}})$, and its first three derivatives equal
\[\begin{array}{l}\displaystyle {f^{\prime }}(\alpha )=\frac{{S_{1}}(\alpha )}{{S_{0}}(\alpha )},\hspace{0.2778em}{f^{\prime\prime }}(\alpha )=\frac{{S_{2}}(\alpha ){S_{0}}(\alpha )-{S_{1}^{2}}(\alpha )}{{S_{0}^{2}}(\alpha )},\\ {} \displaystyle {f^{\prime\prime\prime }}(\alpha )=\frac{{S_{3}}(\alpha ){S_{0}^{2}}(\alpha )-3{S_{2}}(\alpha ){S_{1}}(\alpha ){S_{0}}(\alpha )+2{S_{1}^{3}}(\alpha )}{{S_{0}^{3}}(\alpha )}.\end{array}\]
In particular, if one considers the random variable ξ taking values $\log {p_{k}}$ with probability ${p_{k}}$, then
(1)
\[ \begin{array}{c}\displaystyle {f^{\prime }}(1)=E(\xi )<0,\hspace{0.2778em}{f^{\prime\prime }}(1)=E({\xi ^{2}})-{(E(\xi ))^{2}}>0,\\ {} \displaystyle {f^{\prime\prime\prime }}(1)=E({\xi ^{3}})-3E({\xi ^{2}})E(\xi )+2{(E(\xi ))^{3}},\end{array}\]
and the sign of ${f^{\prime\prime\prime }}(1)$ is not fixed (as we can see below, it can be both + and −).
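The identities in (1) are easy to check numerically. The following minimal Python sketch (an illustration of ours; the distribution $(0.4,0.4,0.2)$ is that of Figure 1 below, while $(0.4,0.3,0.3)$ is an extra choice) computes ${S_{i}}(\alpha )$, evaluates ${f^{\prime }}$, ${f^{\prime\prime }}$, ${f^{\prime\prime\prime }}$ at $\alpha =1$, verifies that ${f^{\prime }}(1)=E(\xi )$ and ${f^{\prime\prime }}(1)=\operatorname{Var}(\xi )$, and shows that ${f^{\prime\prime\prime }}(1)$ takes values of opposite signs for the two choices.

```python
import numpy as np

def S(p, alpha, i):
    """S_i(alpha) = sum_k p_k^alpha * (log p_k)^i for a strictly positive vector p."""
    lp = np.log(p)
    return np.sum(p**alpha * lp**i)

def f_derivatives(p, alpha):
    """First three derivatives of f(alpha) = log(sum_k p_k^alpha), as in the text."""
    S0, S1, S2, S3 = (S(p, alpha, i) for i in range(4))
    f1 = S1 / S0
    f2 = (S2 * S0 - S1**2) / S0**2
    f3 = (S3 * S0**2 - 3 * S2 * S1 * S0 + 2 * S1**3) / S0**3
    return f1, f2, f3

for p in (np.array([0.4, 0.4, 0.2]),    # Figure 1 distribution
          np.array([0.4, 0.3, 0.3])):   # extra example with the opposite sign of f'''(1)
    f1, f2, f3 = f_derivatives(p, 1.0)
    xi_mean = np.sum(p * np.log(p))              # E(xi), xi = log p_k with probability p_k
    xi_var = np.sum(p * np.log(p)**2) - xi_mean**2
    assert np.isclose(f1, xi_mean) and np.isclose(f2, xi_var)
    print(p, "f'(1)=%.4f  f''(1)=%.4f  f'''(1)=%+.5f" % (f1, f2, f3))
```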
Lemma 1.
Let ${p_{k}}\ne 0$ for all $1\le k\le N$. Then
Proof.
Equality (2) is a result of direct calculations. Concerning equality (3), recall that $f(1)=\log {\textstyle\sum _{k=1}^{N}}{p_{k}}=0$, so we can present ${\mathcal{H}_{\alpha }}(p)$ as
\[ {\mathcal{H}_{\alpha }}(p)=\frac{f(\alpha )}{1-\alpha }=-\frac{f(\alpha )-f(1)}{\alpha -1};\]
therefore, $-{\mathcal{H}_{\alpha }}(p)$ is a slope function for f. Taking successive derivatives, we get from the standard Taylor formula that
\[ {\mathcal{H}_{\alpha }^{{^{\prime }}}}(p)=\frac{{f^{\prime }}(\alpha )(1-\alpha )+f(\alpha )}{{(1-\alpha )^{2}}}=-\frac{1}{2}{f^{{^{\prime\prime }}}}(\eta ),\]
and
\[ {\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)=\frac{{f^{\prime\prime }}(\alpha ){(1-\alpha )^{2}}+2{f^{\prime }}(\alpha )(1-\alpha )+2f(\alpha )}{{(1-\alpha )^{3}}}=-\frac{1}{3}{f^{{^{\prime\prime\prime }}}}(\theta ),\]
where η and θ lie between 1 and α. If $\alpha \to 1$, then both η and θ tend to 1. Taking into account (1), we immediately get both equality (3) and statement $(ii)$. □
2.2 Behavior of the 2nd derivative at the origin
Let us consider the starting point for the 2nd derivative, i.e., the behavior of ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$ at zero as a function of the distribution vector p. Analyzing (2), we see that ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$ as a function of α is continuous at 0. Moreover, denoting ${q_{k}}(\alpha )={p_{k}^{\alpha }}/{S_{0}}(\alpha )$, we have
\[ {q_{k}}(0)=1/N,\hspace{0.2778em}{q^{\prime }_{k}}(0)=\frac{\log {p_{k}}}{N}-\frac{{\textstyle\textstyle\sum _{k=1}^{N}}\log {p_{k}}}{{N^{2}}},\]
so we can present ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ as
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =-{\sum \limits_{k=1}^{N}}\left(\frac{1}{N}\log {p_{k}}-\frac{1}{{N^{2}}}{\sum \limits_{i=1}^{N}}\log {p_{i}}+\frac{2}{N}\right)\log \frac{1}{N{p_{k}}}\\ {} & ={\sum \limits_{k=1}^{N}}\left(\frac{1}{N}\log {p_{k}}-\frac{1}{{N^{2}}}{\sum \limits_{i=1}^{N}}\log {p_{i}}+\frac{2}{N}\right)\left(\log N+\log {p_{k}}\right)\\ {} & =2\log N+\frac{1}{N}{\sum \limits_{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\sum \limits_{k=1}^{N}}\log {p_{k}}.\end{aligned}\]
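The closed-form expression just obtained is convenient for numerical experiments. The following short sketch (an illustration of ours, not part of the derivation) evaluates it for the distribution of Figure 1 and compares the result with a finite-difference approximation of the second derivative of $\alpha \mapsto {\mathcal{H}_{\alpha }}(p)$ near $\alpha =0$.

```python
import numpy as np

def renyi(p, alpha):
    """Renyi entropy H_alpha(p) for alpha != 1 (natural logarithm)."""
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

def H2_at_zero(p):
    """Closed-form value of the second derivative of alpha -> H_alpha(p) at alpha = 0."""
    N, lp = len(p), np.log(p)
    return (2 * np.log(N) + np.sum(lp**2) / N
            - np.sum(lp)**2 / N**2 + 2 * np.sum(lp) / N)

p = np.array([0.4, 0.4, 0.2])        # Figure 1 distribution: the value is positive
a, h = 1e-3, 1e-3                    # small alpha and step for a finite-difference check
fd = (renyi(p, a + h) - 2 * renyi(p, a) + renyi(p, a - h)) / h**2
print(H2_at_zero(p), fd)             # the two numbers should nearly coincide
```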
Now we are interested in the sign of ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$. It is very simple to give an example of a distribution for which ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)>0$; one such example is given in Figure 1. Negative ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is also possible, however, at this moment we prefer to start with a more general result.
Lemma 2.
If some probability vector p is a point of local extremum of ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$, then either $p=p(uniform)=\left(\frac{1}{N},\dots ,\frac{1}{N}\right)$ or p contains exactly two different probability values.
Proof.
Let us formulate the necessary conditions for ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ to have a local extremum at some point. Taking into account the constraint ${\textstyle\sum _{k=1}^{N}}{p_{k}}=1$, these conditions have the form
\[ \left\{\begin{array}{l}2\log N+\frac{1}{N}{\textstyle\textstyle\sum _{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\textstyle\textstyle\sum _{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\textstyle\textstyle\sum _{k=1}^{N}}\log {p_{k}}\longrightarrow extr\hspace{1em}\\ {} {\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}}=1.\hspace{1em}\end{array}\right.\]
We write the Lagrangian function
\[\begin{array}{c}\displaystyle L={\lambda _{0}}\left(2\log N+\frac{1}{N}{\sum \limits_{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\sum \limits_{k=1}^{N}}\log {p_{k}}\right)\\ {} \displaystyle +\lambda \left({\sum \limits_{k=1}^{N}}{p_{k}}-1\right).\end{array}\]
If some p is an extremum point then there exist ${\lambda _{0}}$ and λ such that ${\lambda _{0}^{2}}+{\lambda ^{2}}\ne 0$ and $\frac{\partial L}{\partial {p_{i}}}(p)=0$ for all $1\le i\le N$, i.e.,
\[ \frac{\partial L}{\partial {p_{i}}}={\lambda _{0}}\left(\frac{2}{N{p_{i}}}\log {p_{i}}-\frac{2}{{N^{2}}{p_{i}}}\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)+\frac{2}{N{p_{i}}}\right)+\lambda =0.\]
If ${\lambda _{0}}=0$ then $\lambda =0$. However, ${\lambda _{0}^{2}}+{\lambda ^{2}}\ne 0$, therefore we can put ${\lambda _{0}}=1$. Then
\[ -\lambda {p_{i}}=\frac{2}{N}\log {p_{i}}-\frac{2}{{N^{2}}}\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)+\frac{2}{N}.\]
Taking the sum of these equalities, we get that $\lambda =-2$, whence
\[ {p_{i}}-\frac{1}{N}\log {p_{i}}=\frac{1}{N}-\frac{1}{{N^{2}}}{\sum \limits_{k=1}^{N}}\log {p_{k}},\hspace{0.2778em}1\le i\le N.\]
So, if the distribution vector p is an extremum point then ${p_{1}}-\frac{1}{N}\log {p_{1}}=\cdots ={p_{N}}-\frac{1}{N}\log {p_{N}}$. Let us take a look at the continuous function $f(x)=x-\frac{1}{N}\log x$, $x\in (0,1)$. Its derivative equals
\[\begin{array}{l}\displaystyle {f^{\prime }}(x)=1-\frac{1}{Nx}=0\Leftrightarrow x=\frac{1}{N},\hspace{0.2778em}\operatorname{sign}({f^{\prime }}(x))=\operatorname{sign}\left(x-\frac{1}{N}\right),\\ {} \displaystyle \underset{x\to 0+}{\lim }f(x)=+\infty ,\hspace{0.2778em}\underset{x\to 1-}{\lim }f(x)=1.\end{array}\]
So, $f(x)$ attains its global minimum at the point $x=\frac{1}{N}$, and for any $f(\frac{1}{N})<y<1$ there exist two points ${x^{\prime }}\ne {x^{\prime\prime }}$, ${x^{\prime }},{x^{\prime\prime }}\in (0,1)$, such that $f({x^{\prime }})=f({x^{\prime\prime }})=y$. Thus, if ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ achieves a local extremum at a vector p, then p contains no more than two different probability values. Obviously, it can be $p=p(uniform)=\left(\frac{1}{N},\dots ,\frac{1}{N}\right)$. □
Remark 1.
Note that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p(uniform))=0$. Therefore, in order to find the distribution for which ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$, let us consider the distribution vector that contains only two different probabilities ${p_{0}}$, ${q_{0}}$ such that:
(5)
\[ \left\{\begin{array}{l}{p_{0}}-{q_{0}}=\frac{1}{N}\left(\log {p_{0}}-\log {q_{0}}\right),\hspace{1em}\\ {} k{p_{0}}+(N-k){q_{0}}=1,\hspace{1em}\end{array}\right.\]
where $N,k\in \mathbb{N}$, $N>k$ and ${p_{0}},{q_{0}}\in (0,1)$.
Lemma 3.
Let p be the distribution vector satisfying (5). Then ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$.
Proof.
First, we will show that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is nonpositive. For that we rewrite ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ in terms of ${p_{0}}$ and ${q_{0}}$:
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =2\log N+\frac{1}{N}\left(k{(\log {p_{0}})^{2}}+(N-k){(\log {q_{0}})^{2}}\right)\\ {} & -\frac{1}{{N^{2}}}{\left(k\log {p_{0}}+(N-k)\log {q_{0}}\right)^{2}}+\frac{2}{N}\left(k\log {p_{0}}+(N-k)\log {q_{0}}\right)\\ {} & =2\log N+\frac{k(N-k)}{{N^{2}}}\left({(\log {p_{0}})^{2}}-2\log {p_{0}}\log {q_{0}}+{(\log {q_{0}})^{2}}\right)\\ {} & +\frac{2k}{N}\left(\log {p_{0}}-\log {q_{0}}\right)+2\log {q_{0}}\\ {} & =2\log N{q_{0}}+k(N-k){({p_{0}}-{q_{0}})^{2}}+2k({p_{0}}-{q_{0}}).\end{aligned}\]
We know that $k{p_{0}}+(N-k){q_{0}}=1$, whence $k=\frac{N{q_{0}}-1}{{q_{0}}-{p_{0}}}$, and $N-k=\frac{1-N{p_{0}}}{{q_{0}}-{p_{0}}}$. Then
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =2\log N{q_{0}}+(1-N{q_{0}})(N{p_{0}}-1)+2(1-N{q_{0}})\\ {} & =2\log N{q_{0}}+N({p_{0}}-{q_{0}})+1-{N^{2}}{p_{0}}{q_{0}}\\ {} & =\log {(N{q_{0}})^{2}}+\log \frac{{p_{0}}}{{q_{0}}}+1-{N^{2}}{p_{0}}{q_{0}}=\log {N^{2}}{p_{0}}{q_{0}}-{N^{2}}{p_{0}}{q_{0}}+1.\end{aligned}\]
Note that $\log x-x+1<0$ for $x>0$, $x\ne 1$. We want to show that under conditions (5) ${N^{2}}{p_{0}}{q_{0}}$ cannot be equal to 1. Suppose that ${N^{2}}{p_{0}}{q_{0}}=1$. Then it follows from (5) that ${q_{0}}=\frac{1}{{N^{2}}{p_{0}}}$ and $k{p_{0}^{2}}-{p_{0}}+\frac{N-k}{{N^{2}}}=0$, i.e., ${p_{0}}$ is a root of a quadratic equation with rational coefficients.
It means that ${p_{0}}$ and ${q_{0}}$ are algebraic numbers. Thus, their difference ${p_{0}}-{q_{0}}$ is also algebraic. On the other hand, by the Lindemann–Weierstrass theorem $\frac{1}{N}(\log {p_{0}}-\log {q_{0}})$ is a transcendental number, which contradicts (5). So ${N^{2}}{p_{0}}{q_{0}}\ne 1$ and ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$. □
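A concrete distribution satisfying (5) is easy to produce numerically. The sketch below (an illustration of ours; the choice $N=10$, $k=6$ is arbitrary) eliminates ${q_{0}}$ via the second equation of (5), finds a nontrivial root ${p_{0}}\ne \frac{1}{N}$ of the first equation by bisection, and evaluates ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$, which comes out (slightly) negative, in accordance with Lemma 3.

```python
import numpy as np

def H2_at_zero(p):
    """Closed-form H''_0(p) from Section 2.2 (natural logarithm)."""
    N, lp = len(p), np.log(p)
    return 2*np.log(N) + np.sum(lp**2)/N - np.sum(lp)**2/N**2 + 2*np.sum(lp)/N

# System (5): p0 - q0 = (log p0 - log q0)/N and k*p0 + (N-k)*q0 = 1.
# Eliminate q0 and look for a nontrivial root p0 != 1/N by bisection on (0, 1/N).
N, k = 10, 6                         # our own illustrative choice
q0 = lambda p0: (1.0 - k*p0) / (N - k)
g  = lambda p0: (p0 - q0(p0)) - (np.log(p0) - np.log(q0(p0))) / N

lo, hi = 1e-9, 1.0/N - 1e-4          # g(lo) > 0 and g(hi) < 0 for this choice of N, k
for _ in range(200):                 # plain bisection
    mid = 0.5*(lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)

p0 = 0.5*(lo + hi)
p = np.array([p0]*k + [q0(p0)]*(N - k))
print("p0=%.6f  q0=%.6f  sum=%.12f  H''_0(p)=%.6f" % (p0, q0(p0), p.sum(), H2_at_zero(p)))
```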
Theorem 1.
For any $n>2$ there exist $N\ge n$ and a probability vector $p=({p_{1}},\dots ,{p_{N}})$ such that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$.
Proof.
Consider a distribution vector p that satisfies conditions (5). From Lemma 3 we know that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$. Now we want to show that, for arbitrarily large $N\in \mathbb{N}$, there exists a distribution vector p of length N satisfying those conditions. For that we denote $x=N{p_{0}}$, $y=N{q_{0}}$ and $r=\frac{k}{N}$.
Then $0<x<1<y$, $r<1$ and $x-y=\log x-\log y$. The function $x-\log x$ is decreasing on $(0,1)$, increasing on $(1,+\infty )$ and equal to 1 at the point 1. Let $y=y(x)$ be the implicit function defined by $x-y=\log x-\log y$; in this way we get a one-to-one correspondence between $x\in (0,1)$ and $y\in (1,+\infty )$. We also have the function $r(x)=\frac{y(x)-1}{y(x)-x}$. If we find ${x^{\prime }}\in (0,1)$ such that ${r^{\prime }}=r({x^{\prime }})$ is rational, then we can pick $N,k\in \mathbb{N}$ such that ${r^{\prime }}=\frac{k}{N}$ and obtain a distribution vector p satisfying (5) with ${p_{0}}=\frac{{x^{\prime }}}{N}$, ${q_{0}}=\frac{y({x^{\prime }})}{N}$. However, we will not look for such ${x^{\prime }}$ explicitly, but will just show that they exist. To do that, observe that $y(x)$ is a continuous function of x and so is the function $r(x)=\frac{y(x)-1}{y(x)-x}$. What is more,
\[ y(x)\to +\infty ,\hspace{0.2778em}x\to 0+\hspace{0.2778em}\mathrm{so}\hspace{0.2778em}r(x)\to 1,\hspace{0.2778em}x\to 0+.\]
Let us fix ${x_{0}}\in (0,1)$; then $r({x_{0}})<1$. For any ${r^{\prime }}\in (r({x_{0}}),1)$ there exists ${x^{\prime }}\in (0,{x_{0}})$ such that $r({x^{\prime }})={r^{\prime }}$. Taking ${r^{\prime }}\in \mathbb{Q}$, we get that there exists ${x^{\prime }}$ such that $r({x^{\prime }})=\frac{k}{N}$ for some $k,N\in \mathbb{N}$ with $\frac{k}{N}<1$. Finally, we want to show that N can be arbitrarily large. For that simply observe that $\frac{k}{N}={r^{\prime }}$, so as ${r^{\prime }}\to 1-$ we get that $N\to +\infty $. □
Lemma 4.
Let N be fixed. Then ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$, as a function of the vector p, is bounded from below and unbounded from above.
Proof.
Recall that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)=0$ for the uniform distribution; we exclude this case from further consideration. In order to simplify the notation, we denote ${x_{k}}=\log {p_{k}}$, and let
\[ {S_{N}}:=N({\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)-2\log N)={\sum \limits_{k=1}^{N}}{({x_{k}})^{2}}-\frac{1}{N}{\left({\sum \limits_{k=1}^{N}}{x_{k}}\right)^{2}}+2{\sum \limits_{k=1}^{N}}{x_{k}}.\]
Note that, possibly after renumbering the probabilities, there exists n with $1\le n\le N-1$ such that
\[ {x_{1}}<\log \frac{1}{N},\dots ,{x_{n}}<\log \frac{1}{N},\hspace{0.2778em}{x_{n+1}}\ge \log \frac{1}{N},\dots ,{x_{N}}\ge \log \frac{1}{N}.\]
Further, denote the rectangle $A={[\log \frac{1}{N};0]^{N-n}}\subset {\mathbb{R}^{N-n}}$, and let
\[ {S_{N,1}}={\sum \limits_{k=1}^{n}}{x_{k}},\hspace{0.2778em}{S_{N,2}}={\sum \limits_{k=n+1}^{N}}{x_{k}}.\]
Let us establish that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is bounded from below. In this connection, rewrite ${S_{N}}$ as
\[ {S_{N}}={\sum \limits_{k=1}^{n}}{x_{k}^{2}}+{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}-\frac{1}{N}\left({({S_{N,1}})^{2}}+2{S_{N,1}}{S_{N,2}}+{({S_{N,2}})^{2}}\right)+2{S_{N,1}}+2{S_{N,2}}.\]
By the Cauchy–Schwarz inequality we have
\[ {\left({\sum \limits_{k=1}^{n}}{x_{k}}\right)^{2}}\le n{\sum \limits_{k=1}^{n}}{x_{k}^{2}},\hspace{0.2778em}{\left({\sum \limits_{k=n+1}^{N}}{x_{k}}\right)^{2}}\le (N-n){\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}.\]
Therefore
\[\begin{aligned}{}{S_{N}}& \ge \left(1-\frac{n}{N}\right){\sum \limits_{k=1}^{n}}{x_{k}^{2}}+\frac{n}{N}{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}-\frac{2}{N}{S_{N,1}}{S_{N,2}}+2{S_{N,1}}+2{S_{N,2}}\\ {} & ={\sum \limits_{k=1}^{n}}\left(\left(1-\frac{n}{N}\right){x_{k}^{2}}+{x_{k}}\left(2-\frac{2}{N}{S_{N,2}}\right)\right)+\frac{n}{N}{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}+2{S_{N,2}}\\ {} & =\frac{1}{N}{\sum \limits_{k=1}^{n}}\left(\left(N-n\right){x_{k}^{2}}+2{x_{k}}\left(N-{S_{N,2}}\right)\right)+\frac{n}{N}{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}+2{S_{N,2}}.\end{aligned}\]
There exists $M>0$ such that for every $n\le N-1$ we have $|{S_{N,2}}|\le M$, because A is compact and ${S_{N,2}}$ is continuous on A. Obviously, $\frac{n}{N}{\textstyle\sum _{k=n+1}^{N}}{x_{k}^{2}}\ge 0$. Finally, for every $1\le k\le n$ the expression $\left(N-n\right){x_{k}^{2}}+2{x_{k}}\left(N-{S_{N,2}}\right)$ is bounded from below by the value $-\frac{{(N-{S_{N,2}})^{2}}}{N-n}\ge -{(N+M)^{2}}$. Summing up, we get that ${S_{N}}$ is bounded from below, and consequently ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is bounded from below for fixed N.
Now we want to establish that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is not bounded from above. In this connection, let $\varepsilon >0$, and let us consider the distribution of the form ${p_{1}}=\varepsilon $, ${p_{2}}=\cdots ={p_{N}}=\frac{1-\varepsilon }{N-1}$. Then we have
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =2\log N+\frac{1}{N}{\sum \limits_{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\sum \limits_{k=1}^{N}}\log {p_{k}}\\ {} & =2\log N+\frac{N-1}{N}{\left(\log \frac{1-\varepsilon }{N-1}\right)^{2}}+\frac{1}{N}{(\log \varepsilon )^{2}}\\ {} & -\frac{1}{{N^{2}}}{\left((N-1)\log \frac{1-\varepsilon }{N-1}+\log \varepsilon \right)^{2}}+\frac{2(N-1)}{N}\log \frac{1-\varepsilon }{N-1}+\frac{2}{N}\log \varepsilon \\ {} & =\left(\frac{1}{N}-\frac{1}{{N^{2}}}\right){\left(\log \varepsilon \right)^{2}}+\left(\frac{2}{N}-\frac{2(N-1)}{{N^{2}}}\log \frac{1-\varepsilon }{N-1}\right)\log \varepsilon +2\log N\\ {} & +\left(\frac{N-1}{N}-\frac{{(N-1)^{2}}}{{N^{2}}}\right){\left(\log \frac{1-\varepsilon }{N-1}\right)^{2}}\\ {} & +\frac{2(N-1)}{N}\log \frac{1-\varepsilon }{N-1}\to +\infty ,\hspace{0.2778em}\varepsilon \to 0+.\end{aligned}\]
□
2.3 Superposition of entropy that is convex
Now we establish that the superposition of entropy with some decreasing function is convex. Namely, we shall consider the function
(6)
\[ {\mathcal{G}_{\beta }}(p)=-{\mathcal{H}_{1+\frac{1}{\beta }}}(p)=\beta \log \left({\sum \limits_{k=1}^{N}}{p_{k}^{1+1/\beta }}\right),\hspace{0.1667em}\beta >0,\]
and prove its convexity. Because here we use tools that do not involve differentiation, we can allow some probabilities to be zero. In order to prove the convexity, we start with the following simple and well-known result, whose proof is included for the reader's convenience.
Lemma 5.
For any measure space $(\mathcal{X},\Sigma ,\mu )$ and any measurable function f such that $f\in {L^{p}}(\mathcal{X},\Sigma ,\mu )$ for all p from some interval $[a,b]$, the norm ${\left\| f\right\| _{p}}={\left\| f\right\| _{{L^{p}}(\mathcal{X},\Sigma ,\mu )}}$ is log-convex as a function of $1/p$ on this interval.
Proof.
For any ${p_{1}},{p_{2}}>0$ and $\theta \in (0,1)$, denote $p={\big(\theta /{p_{1}}+(1-\theta )/{p_{2}}\big)^{-1}}$ and observe that
\[ \frac{\theta p}{{p_{1}}}+\frac{(1-\theta )p}{{p_{2}}}=1.\]
Therefore, by the Hölder inequality
\[\begin{aligned}{}{\left\| f\right\| _{p}^{p}}& ={\int _{\mathcal{X}}}|f(x){|^{\theta p}}\cdot |f(x){|^{(1-\theta )p}}\mu (dx)\\ {} & \le {\left({\int _{\mathcal{X}}}|f(x){|^{{p_{1}}}}\mu (dx)\right)^{\theta p/{p_{1}}}}{\left({\int _{\mathcal{X}}}|f(x){|^{{p_{2}}}}\mu (dx)\right)^{(1-\theta )p/{p_{2}}}},\end{aligned}\]
whence
\[ {\left\| f\right\| _{p}}\le {\left\| f\right\| _{{p_{1}}}^{\theta }}{\left\| f\right\| _{{p_{2}}}^{1-\theta }},\]
as required. □
Corollary 1.
For any probability vector $p=({p_{k}},1\le k\le N)$, the function ${\mathcal{G}_{\beta }}(p),\beta >0$, is convex.
Proof.
It follows from Lemma 5 by setting $\mathcal{X}=\left\{1,\dots ,N\right\}$, $\mu (A)={\textstyle\sum _{k\in A}}{p_{k}}$, $f(k)={p_{k}}$, $k\in \mathcal{X}$. □
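Corollary 1 is also easy to check numerically. The following sketch (an illustration of ours; the probability vector, which includes one zero entry, is an arbitrary choice) evaluates ${\mathcal{G}_{\beta }}(p)$ on a grid of β and verifies that the discrete second differences are nonnegative up to rounding errors.

```python
import numpy as np

def G(p, beta):
    """G_beta(p) = beta * log(sum_k p_k^{1+1/beta}); zero probabilities contribute nothing."""
    q = p[p > 0.0]
    return beta * np.log(np.sum(q ** (1.0 + 1.0 / beta)))

p = np.array([0.5, 0.3, 0.2, 0.0])        # an illustrative vector with one zero probability
betas = np.linspace(0.05, 20.0, 400)
vals = np.array([G(p, b) for b in betas])
second_diff = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]   # discrete convexity test on the grid
print("min second difference:", second_diff.min())      # nonnegative up to rounding errors
```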
Remark 2.
It follows immediately from (6) that for the function
\[ {\mathcal{G}_{\beta }}(p)=\beta \log {\sum \limits_{k=1}^{N}}{p_{k}^{1+1/\beta }},\hspace{0.2778em}\beta >0,\]
$-{\mathcal{H}_{\alpha }}(p)={\mathcal{G}_{\frac{1}{\alpha -1}}}(p)$ holds for $\alpha >1$. Moreover, for $\alpha >1$ the map $\alpha \mapsto \frac{1}{\alpha -1}$ is convex. If there were a p such that ${\mathcal{G}_{\cdot }}(p)$ was nondecreasing on an interval, then ${\mathcal{G}_{\frac{1}{\alpha -1}}}(p)=-{\mathcal{H}_{\alpha }}(p)$ would be convex on the corresponding interval, i.e., ${\mathcal{H}_{\alpha }}(p)$ would be concave there. However,
\[\begin{array}{c}\displaystyle {\mathcal{G}^{\prime }_{\beta }}(p)=\log {\sum \limits_{k=1}^{N}}{p_{k}^{1+1/\beta }}-\frac{1}{\beta }\frac{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}\log {p_{k}}}{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}}\\ {} \displaystyle =-{\sum \limits_{k=1}^{N}}\frac{{p_{k}^{1+1/\beta }}}{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}}\log \frac{{p_{k}^{1/\beta }}}{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}}\le 0.\end{array}\]
where the last inequality follows from Jensen's inequality applied to the convex function $x\mapsto x\log x$ and the random variable taking the value ${p_{k}^{1/\beta }}$ with probability ${p_{k}}$. In some sense, this is the reason why we cannot say anything definite concerning the 2nd derivative of the entropy either on the whole positive semiaxis or even on the interval $[1,+\infty )$.
2.4 Graphs of ${\mathcal{H}_{\alpha }}(p)$ and its second derivative for several probability distributions
Fig. 1.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}={p_{2}}=0.4$, ${p_{3}}=0.2$. Here ${\mathcal{H}_{\alpha }}(p)$ is convex as a function of $\alpha >0$
Fig. 2.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}=\cdots ={p_{198}}=\frac{1}{400}$, ${p_{199}}={p_{200}}=\frac{101}{400}$. The dot marks the point where ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)=0$, namely $\alpha =0.99422$
Fig. 3.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}=\cdots ={p_{10}}=0.01$, ${p_{11}}={p_{12}}=0.15$, ${p_{13}}={p_{14}}=0.3$. Here the second derivative becomes positive long before point 1 (at point 0.11318)
Fig. 4.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}=\cdots ={p_{10}}=0.08$, ${p_{11}}=0.2$. Here the second derivative becomes positive after point 1 (at point 2.9997)
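Graphs such as those in Figures 1–4 can be reproduced with a few lines of numerical code. The sketch below (an illustration of ours) computes a finite-difference approximation of ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$ for the distribution of Figure 2 and locates the points where the second derivative changes sign; one of them should lie near the value $\alpha =0.99422$ reported in the caption of Figure 2.

```python
import numpy as np

def renyi(p, alpha):
    """H_alpha(p), with the Shannon entropy as the continuous extension at alpha = 1."""
    if abs(alpha - 1.0) < 1e-9:
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def second_derivative(p, alpha, h=1e-3):
    """Central finite-difference approximation of d^2 H_alpha / d alpha^2."""
    return (renyi(p, alpha + h) - 2.0 * renyi(p, alpha) + renyi(p, alpha - h)) / h**2

# Distribution of Figure 2: 198 probabilities 1/400 and 2 probabilities 101/400.
p = np.array([1.0 / 400] * 198 + [101.0 / 400] * 2)
grid = np.arange(0.02, 3.0, 0.002)
H2 = np.array([second_derivative(p, a) for a in grid])
sign_change = grid[np.where(np.diff(np.sign(H2)) != 0)]
print("H'' changes sign near alpha =", sign_change)   # compare with 0.99422 from Figure 2
```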
3 Robustness of the Rényi entropy
Now we study the asymptotic behavior of the Rényi entropy depending on the behavior of the involved probabilities. The first problem is the stability of the entropy w.r.t. the involved probabilities and the rate of its convergence to the limit value when the probabilities tend to their limit values at a fixed rate.
3.1 Rate of convergence of the disturbed entropy when the initial distribution is arbitrary but fixed
Let us look at distributions that are “near” some fixed distribution $p=({p_{k}},\hspace{0.2778em}1\le k\le N)$ and construct the approximate distribution $p(\epsilon )=({p_{k}}(\epsilon ),\hspace{0.2778em}1\le k\le N)$ as follows. We can assume that some probabilities are zero, and we shall see that this assumption influences the rate of convergence of the Rényi entropy to the limit value. So, let $0\le {N_{1}}<N$ be the number of zero probabilities, and for them we consider approximate values of the form ${p_{k}}(\epsilon )={c_{k}}\varepsilon $, $0\le {c_{k}}\le 1$, $1\le k\le {N_{1}}$. Further, let ${N_{2}}=N-{N_{1}}\ge 1$ be the number of nonzero probabilities, and for them we consider approximate values of the form ${p_{k}}(\epsilon )={p_{k}}+{c_{k}}\varepsilon $, $|{c_{k}}|\le 1$, ${N_{1}}+1\le k\le N$, where ${c_{1}}+\cdots +{c_{N}}=0$ and $\varepsilon \le \underset{{N_{1}}+1\le k\le N}{\min }{p_{k}}$. Assume also that there exists $k\le N$ such that ${c_{k}}\ne 0$, otherwise ${\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))=0$. So, we disturb the initial probabilities linearly in ε with different weights whose sum should necessarily be zero. These assumptions imply that $0\le {p_{k}}(\epsilon )\le 1$ and ${p_{1}}(\epsilon )+\cdots +{p_{N}}(\epsilon )=1$. Now we want to find out how the entropy of the disturbed distribution will differ from the initial entropy, depending on the parameters ε, N and α. We start with $\alpha =1$.
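Before stating the precise asymptotics, we illustrate them numerically. The following sketch (an illustration of ours; the distributions and weights are arbitrary admissible choices) computes the normalized differences for $\alpha =1$ and shows that they approach the limits given in Theorem 2 below.

```python
import numpy as np

def shannon(p):
    """H_1(p) = -sum p_k log p_k, with the convention 0*log 0 = 0."""
    q = p[p > 0]
    return -np.sum(q * np.log(q))

# Case (i) of Theorem 2: one zero probability disturbed as c_1*eps with c_1 = 1.
p = np.array([0.0, 0.5, 0.3, 0.2]); c = np.array([1.0, -0.5, -0.3, -0.2])
for eps in (1e-2, 1e-3, 1e-4):
    ratio = (shannon(p) - shannon(p + c * eps)) / (eps * np.log(eps))
    print("eps=%.0e  ratio=%.4f  (limit c_1 = 1)" % (eps, ratio))

# Case (ii): no zero probabilities; the limit of the ratio over eps is sum_k c_k log p_k.
p = np.array([0.5, 0.3, 0.2]); c = np.array([0.5, -0.3, -0.2])
print("predicted limit:", np.sum(c * np.log(p)))
for eps in (1e-2, 1e-3, 1e-4):
    print("eps=%.0e  ratio=%.4f" % (eps, (shannon(p) - shannon(p + c * eps)) / eps))
```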
Theorem 2.
Let the number N and coefficients ${c_{1}},\dots ,{c_{N}}$ be fixed, and let $\alpha =1$. Then we have three different situations:
-
$(i)$ Let ${N_{1}}\ge 1$ and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon \log \varepsilon }={\sum \limits_{k=1}^{{N_{1}}}}{c_{k}}.\]
-
$(ii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon }={\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}.\]
-
$(iii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}=0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{1}{2}{\sum \limits_{k={N_{1}}+1}^{N}}\frac{{c_{k}^{2}}}{{p_{k}}}.\]
Proof.
First of all, we will find the asymptotic behavior of two auxiliary functions as $\varepsilon \to 0$. First, let $0\le {c_{k}}\le 1$. Then we immediately get that
\[ {c_{k}}\varepsilon \log ({c_{k}}\varepsilon )={c_{k}}\varepsilon \log \varepsilon +{c_{k}}\varepsilon \log {c_{k}}={c_{k}}\varepsilon \log \varepsilon +o(\varepsilon \log \varepsilon ),\hspace{0.2778em}\varepsilon \to 0.\]
Second, let ${p_{k}}>0$, $|{c_{k}}|\le 1$. Taking into account the Taylor expansion of the logarithm
\[ \log (1+x)=x-\frac{{x^{2}}}{2}+o({x^{2}}),\hspace{0.2778em}x\to 0,\]
we can write:
(7)
\[\begin{array}{l}\displaystyle ({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}}={c_{k}}\varepsilon \log {p_{k}}+({p_{k}}+{c_{k}}\varepsilon )\log (1+{c_{k}}{p_{k}^{-1}}\varepsilon )\\ {} \displaystyle ={c_{k}}\varepsilon \log {p_{k}}+({p_{k}}+{c_{k}}\varepsilon )\left({c_{k}}{p_{k}^{-1}}\varepsilon -\frac{1}{2}{({c_{k}}{p_{k}^{-1}}\varepsilon )^{2}}+o({\varepsilon ^{2}})\right)\\ {} \displaystyle =\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+{\varepsilon ^{2}}\left({c_{k}^{2}}{p_{k}^{-1}}-\frac{1}{2}{c_{k}^{2}}{p_{k}^{-1}}\right)+o({\varepsilon ^{2}})\\ {} \displaystyle =\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+\frac{{c_{k}^{2}}{\varepsilon ^{2}}}{2{p_{k}}}+o({\varepsilon ^{2}}),\hspace{0.2778em}\varepsilon \to 0.\end{array}\]
In particular,
\[ ({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}}=o(\varepsilon \log \varepsilon ),\hspace{0.2778em}\varepsilon \to 0,\]
and
\[ ({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}}=\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+o(\varepsilon ),\hspace{0.2778em}\varepsilon \to 0\]
Now simply observe the following.
\[\begin{aligned}{}\hspace{-23.0pt}(i)\hspace{2em}\hspace{1em}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon \log \varepsilon }\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon \log \varepsilon }{\sum \limits_{k=1}^{{N_{1}}}}{c_{k}}\varepsilon \log {c_{k}}\varepsilon \\ {} & \hspace{1em}+\frac{1}{\varepsilon \log \varepsilon }{\sum \limits_{k={N_{1}}+1}^{N}}(({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}})\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon \log \varepsilon }\left({\sum \limits_{k=1}^{{N_{1}}}}({c_{k}}\varepsilon \log \varepsilon +o(\varepsilon \log \varepsilon ))+{\sum \limits_{k={N_{1}}+1}^{N}}o(\varepsilon \log \varepsilon )\right)\\ {} & \hspace{1em}={\sum \limits_{k=1}^{{N_{1}}}}{c_{k}}.\end{aligned}\]
$(ii)$ Since ${c_{k}}=0$ for any $k\le {N_{1}}$ and the total sum ${c_{1}}+\cdots +{c_{N}}=0$, we have ${c_{{N_{1}}+1}}+\cdots +{c_{N}}=0$. Furthermore, in this case
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon }& =\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon }{\sum \limits_{k={N_{1}}+1}^{N}}(({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}})\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon }{\sum \limits_{k={N_{1}}+1}^{N}}(\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+o(\varepsilon ))\\ {} & ={\sum \limits_{k={N_{1}}+1}^{N}}({c_{k}}\log {p_{k}}+{c_{k}})={\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}.\end{aligned}\]
$(iii)$ In this case we have the following relations:
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{{\varepsilon ^{2}}}& =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}{\sum \limits_{k={N_{1}}+1}^{N}}(({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}})\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}{\sum \limits_{k={N_{1}}+1}^{N}}(\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+\frac{{c_{k}^{2}}{\varepsilon ^{2}}}{2{p_{k}}}+o({\varepsilon ^{2}}))\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}{\sum \limits_{k={N_{1}}+1}^{N}}\left(\frac{{c_{k}^{2}}{\varepsilon ^{2}}}{2{p_{k}}}+o({\varepsilon ^{2}})\right)=\frac{1}{2}{\sum \limits_{k={N_{1}}+1}^{N}}\frac{{c_{k}^{2}}}{{p_{k}}}.\end{aligned}\]
The theorem is proved. □
Now we proceed with $\alpha <1$.
Theorem 3.
Let the number N and coefficients ${c_{1}},\dots ,{c_{N}}$ be fixed, and let $\alpha <1$. Then we have three different situations:
-
$(i)$ Let ${N_{1}}\ge 1$ and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}=\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(ii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }=\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(iii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
Proof.
Similarly to the proof of Theorem 2, we start with several asymptotic relations as $\varepsilon \to 0$. Namely, let ${p_{k}}>0$, $|{c_{k}}|\le 1$. Taking into account the Taylor expansion of ${(1+x)^{\alpha }}$ that has the form
\[ {(1+x)^{\alpha }}=1+\alpha x+o(x),\hspace{0.2778em}x\to 0,\]
we can write:
(8)
\[ \begin{array}{c}\displaystyle \alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}=\alpha {c_{k}}{p_{k}^{\alpha -1}}{(1+{c_{k}}{p_{k}^{-1}}\varepsilon )^{\alpha -1}}\\ {} \displaystyle =\alpha {c_{k}}{p_{k}^{\alpha -1}}(1+(\alpha -1){c_{k}}{p_{k}^{-1}}\varepsilon +o(\varepsilon ))\\ {} \displaystyle =\alpha {c_{k}}{p_{k}^{\alpha -1}}+\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ),\hspace{0.2778em}\varepsilon \to 0.\end{array}\]
As a consequence, we get the following asymptotic relations:
\[ \alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}=o({\varepsilon ^{\alpha -1}}),\hspace{0.2778em}\varepsilon \to 0,\]
and
(9)
\[ \alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}=\alpha {c_{k}}{p_{k}^{\alpha -1}}+o(1),\hspace{0.2778em}\varepsilon \to 0.\]
$(i)$ Applying L’Hospital’s rule, we get:
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{H_{\alpha }}(p)-{H_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}& =\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & -\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{\alpha {\varepsilon ^{\alpha -1}}}\\ {} & \times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & =\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}+{\varepsilon ^{1-\alpha }}{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}o({\varepsilon ^{\alpha -1}})}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & =\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(ii)$ In this case we can transform the expression under the limit as follows:
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1)\varepsilon }\left(\log \left({\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)\right)\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha {c_{k}}{p_{k}^{\alpha -1}}+o(1))}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(iii)$ Finally, in the 3rd case,
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{2}}}\left(\log \left({\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)\right)\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha {c_{k}}{p_{k}^{\alpha -1}}+\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
The theorem is proved. □
Finally, we consider the case $\alpha >1$. Here, five different asymptotics are possible.
Theorem 4.
Let the number N and coefficients ${c_{1}},\dots ,{c_{N}}$ be fixed, and let $\alpha >1$. Then five different situations are possible:
-
$(i)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\ne 0$. Then for any ${N_{1}}\ge 0$ and $\alpha >1$, we have that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }=\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(ii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 1$, and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then for $\alpha <2$ it holds that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}=\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(iii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 0$ and for all $k\le {N_{1}}$ we have that ${c_{k}}=0$. Then for $\alpha <2$ it holds that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(iv)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha =2$. Then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{2}}(p)-{\mathcal{H}_{2}}(p(\epsilon ))}{{\varepsilon ^{2}}}=\left({\sum \limits_{k=1}^{N}}{c_{k}^{2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{2}}\right)^{-1}}.\]
-
$(v)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha >2$. Then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
Proof.
As in the proof of Theorem 3, we shall use expansions (8) and (9). The main tool will be L’Hospital’s rule.
$(i)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\ne 0$. Then for any ${N_{1}}\ge 0$ and $\alpha >1$, we have the following relations:
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }& =\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1)\varepsilon }\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & -\frac{1}{(\alpha -1)\varepsilon }\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\\ {} & \times \underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & =\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(ii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 1$, and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then for $\alpha <2$ we have that
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & \hspace{1em}-\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{\alpha {\varepsilon ^{\alpha -1}}}\\ {} & \hspace{1em}\times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}((\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{{\varepsilon ^{\alpha -1}}\left({\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(iii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 0$ and for all $k\le {N_{1}}$ we have that ${c_{k}}=0$. Then for $\alpha <2$ it holds that
\[\begin{aligned}{}& \hspace{1em}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{2}}}\left(\log \left({\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)\right)\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha {c_{k}}{p_{k}^{\alpha -1}}+\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{2\varepsilon \left({\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{2\varepsilon \left({\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(iv)$ Obviously, in the case $\alpha =2$ we have the simple expression for the entropy:
\[ {\mathcal{H}_{2}}(p)=-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{2}}\right).\]
Therefore, if ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha =2$, then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{2}}(p)-{\mathcal{H}_{2}}(p(\epsilon ))}{{\varepsilon ^{2}}}& =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{2}}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{2}}\right)\\ {} & -\frac{1}{{\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{2}}\right)=\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\\ {} & \times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}2{c_{k}^{2}}\varepsilon +{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}2{c_{k}}({p_{k}}+{c_{k}}\varepsilon )}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{2}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{2}}}\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{c_{k}^{2}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}^{2}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{2}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{2}}}\\ {} & =\left({\sum \limits_{k=1}^{N}}{c_{k}^{2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{2}}\right)^{-1}}.\end{aligned}\]
$(v)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha >2$. Then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & \hspace{1em}-\frac{1}{(\alpha -1){\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\\ {} & \hspace{1em}\times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{2\varepsilon \left({\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
The theorem is proved. □
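The rates established in Theorems 3 and 4 are also easy to observe numerically. The sketch below (an illustration of ours, with an arbitrary admissible choice of p and c and no zero probabilities) checks the limit $\frac{\alpha }{\alpha -1}\big({\textstyle\sum _{k}}{c_{k}}{p_{k}^{\alpha -1}}\big)\big({\textstyle\sum _{k}}{p_{k}^{\alpha }}\big)^{-1}$ for one $\alpha <1$ (Theorem 3$(ii)$) and one $\alpha >1$ (Theorem 4$(i)$).

```python
import numpy as np

def renyi(p, alpha):
    """H_alpha(p) for alpha != 1 (natural logarithm)."""
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# A distribution without zero probabilities (N_1 = 0) and an admissible disturbance direction.
p = np.array([0.5, 0.3, 0.2])
c = np.array([0.5, -0.3, -0.2])          # |c_k| <= 1 and the weights sum to zero
for alpha in (0.5, 1.5):                 # Theorem 3(ii) for alpha < 1, Theorem 4(i) for alpha > 1
    limit = alpha / (alpha - 1.0) * np.sum(c * p**(alpha - 1)) / np.sum(p**alpha)
    print("alpha=%.1f  predicted limit %.5f" % (alpha, limit))
    for eps in (1e-2, 1e-3, 1e-4):
        ratio = (renyi(p, alpha) - renyi(p + c * eps, alpha)) / eps
        print("   eps=%.0e  ratio=%.5f" % (eps, ratio))
```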
3.2 Convergence of the disturbed entropy when the initial distribution is uniform but the number of events increases to ∞
The second problem is to establish conditions for the stability of the entropy of the uniform distribution when the number of events tends to ∞. Let $N>1$, let ${p_{N}}(uni)=(\frac{1}{N},\dots ,\frac{1}{N})$ be the vector of the uniform distribution with N possible states, let $\varepsilon =\varepsilon (N)\le \frac{1}{N}$, and let $\{{c_{kN}};\hspace{0.2778em}N\ge 1,\hspace{0.2778em}1\le k\le N\}$ be a family of fixed numbers (not all zero) such that $|{c_{kN}}|\le 1$ and ${\textstyle\sum _{k=1}^{N}}{c_{kN}}=0$. Note that for any $N\ge 1$ the numbers ${c_{kN}}$ are strictly positive for some k. Consider the disturbed distribution vector ${p_{N}^{{^{\prime }}}}=(\frac{1}{N}+{c_{1N}}\varepsilon ,\dots ,\frac{1}{N}+{c_{NN}}\varepsilon )$.
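The following sketch (an illustration of ours; the choices $\varepsilon (N)={N^{-2}}$ and alternating weights $\pm 1$ are arbitrary admissible ones) shows numerically that the entropy differences vanish as $N\to \infty $ once $N\varepsilon (N)\to 0$, which is the stability property established below.

```python
import numpy as np

def renyi(p, alpha):
    if abs(alpha - 1.0) < 1e-12:
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# Disturb the uniform distribution: c_{kN} = +1/-1 alternating, eps(N) = N^{-2}, so N*eps -> 0.
for N in (10, 100, 1000, 10000):
    c = np.tile([1.0, -1.0], N // 2)           # the weights sum to zero (N is even)
    eps = 1.0 / N**2
    p_uni = np.full(N, 1.0 / N)
    p_dist = p_uni + c * eps
    gaps = [renyi(p_uni, a) - renyi(p_dist, a) for a in (0.5, 1.0, 2.0)]
    print(N, ["%.2e" % g for g in gaps])       # nonnegative and decreasing to zero
```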
Theorem 5.
Let $N\varepsilon (N)\to 0$ as $N\to \infty $. Then for any $\alpha >0$,
\[ {\mathcal{H}_{\alpha }}({p_{N}}(uni))-{\mathcal{H}_{\alpha }}({p_{N}^{{^{\prime }}}})\to 0,\hspace{0.2778em}N\to \infty .\]
Proof.
We know that $N\varepsilon \to 0$, as $N\to \infty $, and the family of numbers $\{{c_{kn}};\hspace{0.2778em}n\ge 1,\hspace{0.2778em}1\le k\le n\}$ is bounded. Therefore the values
\[ \underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\to 1,\hspace{0.2778em}\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\inf }(1+N{c_{kn}}\varepsilon )\to 1,\hspace{0.2778em}N\to \infty ,\]
as functions of N, and for every $N\ge 1$, $\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\ge 1$. Recall that the function $x\log x$ is increasing for $x\ge 1$, and $x\log x\le 0$ for $0<x<1$. Moreover, the Rényi entropy attains its maximum at the uniform distribution. As a consequence of all these observations and assumptions we get that
\[\begin{aligned}{}0\le {\mathcal{H}_{1}}({p_{N}})-{\mathcal{H}_{1}}({p_{N}^{{^{\prime }}}})& =\frac{1}{N}{\sum \limits_{k=1}^{N}}(1+N{c_{kN}}\varepsilon )\log (1+N{c_{kN}}\varepsilon )\\ {} & \le \frac{1}{N}{\sum \limits_{k=1}^{N}}\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\log \underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\\ {} & =\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\log \underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\\ {} & \to 0,\hspace{0.2778em}N\to \infty .\end{aligned}\]
Let $\alpha >1$. Then
\[\begin{aligned}{}0\le {\mathcal{H}_{\alpha }}({p_{N}})-{\mathcal{H}_{\alpha }}({p_{N}^{{^{\prime }}}})& =\frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{(1+N{c_{kN}}\varepsilon )^{\alpha }}\right)\\ {} & \le \frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{\left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\right)^{\alpha }}\right)\\ {} & =\frac{\alpha }{\alpha -1}\log \left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\right)\to 0,\hspace{0.2778em}N\to \infty .\end{aligned}\]
Similarly, for $0<\alpha <1$ we produce the transformations
\[\begin{aligned}{}0\le {\mathcal{H}_{\alpha }}({p_{N}})-{\mathcal{H}_{\alpha }}({p_{N}^{{^{\prime }}}})& =\frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{(1+N{c_{kN}}\varepsilon )^{\alpha }}\right)\\ {} & \le \frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{\left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\inf }(1+N{c_{kn}}\varepsilon )\right)^{\alpha }}\right)\\ {} & =\frac{\alpha }{\alpha -1}\log \left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\inf }(1+N{c_{kn}}\varepsilon )\right)\to 0,\hspace{0.2778em}N\to \infty ,\end{aligned}\]
and the proof follows. □
3.3 Binomial and Poisson distributions
In this section we look at the convergence of the Rényi entropy of the binomial distribution to the Rényi entropy of the Poisson distribution.
Theorem 6.
Let $\lambda >0$ and $\alpha >0$ be fixed. Then
\[ \underset{n\to \infty }{\lim }{\mathcal{H}_{\alpha }}\left(B\left(n,\frac{\lambda }{n}\right)\right)={\mathcal{H}_{\alpha }}(Poi(\lambda )).\]
Proof.
First, let $\alpha =1$. We will compute and regroup the entropies of the binomial and Poisson distributions.
\[\begin{aligned}{}{\mathcal{H}_{1}}\left(B\left(n,p\right)\right)& =-{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log \left(\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\right)\\ {} & =-{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log \left(\genfrac{}{}{0pt}{}{n}{k}\right)\\ {} & -n\left(p\log p+(1-p)\log (1-p)\right)\\ {} & =-{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!)+np\log n\\ {} & -np\log np-n\log (1-p)+np\log (1-p)\\ {} & =np\log (1-p)-n\log (1-p)-np\log np+np\log n\\ {} & -{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!).\\ {} {\mathcal{H}_{1}}(Poi(\lambda ))& =-{\sum \limits_{k=0}^{\infty }}{e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\log \left({e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\right)=\lambda -\lambda \log \lambda +{\sum \limits_{k=0}^{\infty }}{e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\log k!\end{aligned}\]
We want to show the convergence of the entropies term by term. For that, let us take $np=\lambda $ and observe that
\[\begin{array}{l}\displaystyle np\log (1-p)=\lambda \log (1-p)\to \lambda \log 1=0,\hspace{0.2778em}n\to \infty ,\hspace{0.2778em}p\to 0.\\ {} \displaystyle -n\log (1-p)=\log {\left(1-\frac{\lambda }{n}\right)^{-n}}\to \lambda ,\hspace{0.2778em}n\to \infty ,\hspace{0.2778em}p\to 0.\\ {} \displaystyle -np\log np=-\lambda \log \lambda ,\end{array}\]
\[\begin{aligned}{}np\log n& -{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!)\\ {} & ={\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}k\log n\\ {} & -{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!)\\ {} & ={\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log {n^{k}}-\log n!+\log k!+\log (n-k)!\right)\\ {} & ={\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right).\end{aligned}\]
It is well known that $\frac{\log x}{x}\le 1$ for $x>0$. Using this fact, we get the following estimate:
\[\begin{aligned}{}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log \frac{{n^{k}}(n-k)!}{n!}& =\frac{n!}{(n-k)!k!}{\left(\frac{\lambda }{n}\right)^{k}}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\log \frac{{n^{k}}(n-k)!}{n!}\\ {} & =\frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{{n^{k}}(n-k)!}\log \frac{{n^{k}}(n-k)!}{n!}\\ {} & \le \frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\le \frac{{\lambda ^{k}}}{k!}.\end{aligned}\]
For the second part of the sum simply observe that
\[\begin{aligned}{}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log k!& =\frac{n!}{(n-k)!k!}{\left(\frac{\lambda }{n}\right)^{k}}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\log k!\\ {} & =\frac{{\lambda ^{k}}}{k!}\log k!{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{(n-k)!{n^{k}}}\\ {} & \le \frac{{\lambda ^{k}}}{k!}\log k!\end{aligned}\]
As ${\textstyle\sum _{k=0}^{\infty }}\frac{{\lambda ^{k}}}{k!}\left(1+\log k!\right)<\infty $, by Lebesgue’s dominated convergence theorem:
\[\begin{aligned}{}\underset{n\to \infty }{\lim }& {\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right)\\ {} & ={\sum \limits_{k=0}^{\infty }}\underset{n\to \infty }{\lim }\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right)\\ {} & ={\sum \limits_{k=0}^{\infty }}\underset{n\to \infty }{\lim }\frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{(n-k)!{n^{k}}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right)\\ {} & ={\sum \limits_{k=0}^{\infty }}{e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\log k!\end{aligned}\]
Finally, we get that
\[ \underset{n\to \infty }{\lim }{\mathcal{H}_{1}}\left(B\left(n,\frac{\lambda }{n}\right)\right)={\mathcal{H}_{1}}(Poi(\lambda )).\]
For $\alpha \ne 1$ we have:
\[\begin{array}{l}\displaystyle {\mathcal{H}_{\alpha }}\left(B\left(n,p\right)\right)=\frac{1}{1-\alpha }\log {\sum \limits_{k=0}^{n}}{\left(\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\right)^{\alpha }},\\ {} \displaystyle {\mathcal{H}_{\alpha }}(Poi(\lambda ))=\frac{1}{1-\alpha }\log {\sum \limits_{k=0}^{+\infty }}{\left({e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\right)^{\alpha }}.\end{array}\]
Thus, to show that
\[ \underset{n\to \infty }{\lim }{\mathcal{H}_{\alpha }}\left(B\left(n,\frac{\lambda }{n}\right)\right)={\mathcal{H}_{\alpha }}(Poi(\lambda )),\]
it is enough to show the convergence of the sums, which follows from Lebesgue's dominated convergence theorem and the bound
\[ \left(\genfrac{}{}{0pt}{}{n}{k}\right){\left(\frac{\lambda }{n}\right)^{k}}{\left(1-\frac{\lambda }{n}\right)^{n-k}}=\frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{(n-k)!{n^{k}}}\le \frac{{\lambda ^{k}}}{k!},\]
together with ${\textstyle\sum _{k=0}^{\infty }}{\left(\frac{{\lambda ^{k}}}{k!}\right)^{\alpha }}<\infty $.
□
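The convergence just proved is easy to observe numerically. The sketch below (an illustration of ours; it assumes that the SciPy library is available and truncates the Poisson distribution at $k=200$, which is harmless for $\lambda =3$) compares the Rényi entropy of $B(n,\lambda /n)$ with that of $Poi(\lambda )$ for growing n.

```python
import numpy as np
from scipy.stats import binom, poisson

def renyi_from_pmf(pmf, alpha):
    """Renyi entropy of a discrete distribution given by its probability masses."""
    pmf = pmf[pmf > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -np.sum(pmf * np.log(pmf))
    return np.log(np.sum(pmf ** alpha)) / (1.0 - alpha)

lam, alpha = 3.0, 0.7
ks = np.arange(0, 200)                       # truncation; the tail beyond 200 is negligible here
target = renyi_from_pmf(poisson.pmf(ks, lam), alpha)
for n in (10, 100, 1000, 10000):
    val = renyi_from_pmf(binom.pmf(np.arange(0, n + 1), n, lam / n), alpha)
    print("n=%5d  H_alpha(Binomial)=%.6f  (Poisson value %.6f)" % (n, val, target))
```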